[[tutorial-load-dataset]]
== Loading Sample Data

The tutorials in this section rely on the following data sets:

* The complete works of William Shakespeare, suitably parsed into fields. Download
  this data set by clicking here:
  https://download.elastic.co/demos/kibana/gettingstarted/shakespeare_6.0.json[shakespeare_6.0.json].
* A set of fictitious accounts with randomly generated data. Download this data
  set by clicking here:
  https://download.elastic.co/demos/kibana/gettingstarted/accounts.zip[accounts.zip].
* A set of randomly generated log files. Download this data set by clicking here:
  https://download.elastic.co/demos/kibana/gettingstarted/logs.jsonl.gz[logs.jsonl.gz].

Two of the data sets are compressed. Use the following commands to extract the files:

[source,shell]
unzip accounts.zip
gunzip logs.jsonl.gz

The Shakespeare data set is organized in the following schema:

[source,json]
{
    "line_id": INT,
    "play_name": "String",
    "speech_number": INT,
    "line_number": "String",
    "speaker": "String",
    "text_entry": "String"
}

The accounts data set is organized in the following schema:

[source,json]
{
    "account_number": INT,
    "balance": INT,
    "firstname": "String",
    "lastname": "String",
    "age": INT,
    "gender": "M or F",
    "address": "String",
    "employer": "String",
    "email": "String",
    "city": "String",
    "state": "String"
}

The schema for the logs data set has dozens of different fields, but the notable
ones used in this tutorial are:

[source,json]
{
    "memory": INT,
    "geo.coordinates": "geo_point",
    "@timestamp": "date"
}

Before we load the Shakespeare and logs data sets, we need to set up
{ref}/mapping.html[_mappings_] for the fields. Mapping divides the documents in
the index into logical groups and specifies a field's characteristics, such as
whether the field is searchable and whether it is _tokenized_, that is, broken
up into separate words.

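To make the idea of tokenization concrete, the following shell sketch contrasts keeping a string as a single term with splitting it into words. It is only a rough approximation and does not invoke Elasticsearch's analyzers; the function names `as_keyword` and `as_text` and the sample value are hypothetical, chosen for illustration.

```shell
# Illustration only: a rough stand-in for analysis, not Elasticsearch's analyzer.
# The sample value and both function names are assumptions made for this sketch.
speaker="KING HENRY IV"

# Untokenized handling: the whole string is kept as one term.
as_keyword() {
  printf '%s\n' "$1"
}

# Tokenized handling (approximation): lowercase, then one term per word.
as_text() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '\n'
}

as_keyword "$speaker"   # prints the string unchanged, on one line
as_text "$speaker"      # prints one lowercase word per line
```

A search for an exact speaker name needs the first form, which is why the mapping for that field matters.
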
Run the following request, for example from the Kibana *Dev Tools* Console, to
set up a mapping for the Shakespeare data set:

[source,js]
PUT /shakespeare
{
 "mappings": {
  "doc": {
   "properties": {
    "speaker": {"type": "keyword"},
    "play_name": {"type": "keyword"},
    "line_id": {"type": "integer"},
    "speech_number": {"type": "integer"}
   }
  }
 }
}
//CONSOLE

This mapping specifies the following qualities for the data set:

* Because the _speaker_ and _play_name_ fields are keyword fields, they are not
  analyzed. Each string is treated as a single unit even if it contains multiple
  words.
* The _line_id_ and _speech_number_ fields are integers.

The logs data set requires a mapping to label the latitude/longitude pairs in the
logs as geographic locations by applying the `geo_point` type to those fields.

Use the following requests, one for each daily index, to establish the
`geo_point` mapping for the logs:

[source,js]
PUT /logstash-2015.05.18
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
//CONSOLE

[source,js]
PUT /logstash-2015.05.19
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
//CONSOLE

[source,js]
PUT /logstash-2015.05.20
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
//CONSOLE

The accounts data set doesn't require any mappings, so at this point we're ready
to use the Elasticsearch {ref}/docs-bulk.html[`bulk`] API to load the data sets.
Run the following commands in a terminal (e.g. `bash`) from the directory that
contains the downloaded files:

[source,shell]
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/shakespeare/doc/_bulk?pretty' --data-binary @shakespeare_6.0.json
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/_bulk?pretty' --data-binary @logs.jsonl

These commands may take some time to execute, depending on the computing resources
available.

Verify successful loading with the following command:

[source,js]
GET /_cat/indices?v
//CONSOLE

You should see output similar to the following:

[source,shell]
health status index               pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank                  5   1       1000            0    418.2kb        418.2kb
yellow open   shakespeare           5   1     111396            0     17.6mb         17.6mb
yellow open   logstash-2015.05.18   5   1       4631            0     15.6mb         15.6mb
yellow open   logstash-2015.05.19   5   1       4624            0     15.7mb         15.7mb
yellow open   logstash-2015.05.20   5   1       4750            0     16.4mb         16.4mb
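
If you prefer to check the result from a script rather than by eye, the comparison can be sketched in shell as follows. This is a minimal sketch, not part of the tutorial's required steps: the helper name `check_counts` is hypothetical, the expected document counts are copied from the sample output above (your counts may differ if the data sets change), and the usage line assumes Elasticsearch is reachable at `localhost:9200`.

```shell
# Hypothetical helper: reads "index docs.count" pairs on stdin and compares
# them with the counts shown in the sample output (assumed to be current).
check_counts() {
  awk '
    BEGIN {
      bad = 0
      want["bank"] = 1000
      want["shakespeare"] = 111396
      want["logstash-2015.05.18"] = 4631
      want["logstash-2015.05.19"] = 4624
      want["logstash-2015.05.20"] = 4750
    }
    ($1 in want) {
      seen[$1] = 1
      if ($2 != want[$1]) {
        printf "MISMATCH %s: got %s, expected %s\n", $1, $2, want[$1]
        bad = 1
      }
    }
    END {
      for (name in want) {
        if (!(name in seen)) {
          printf "MISSING %s\n", name
          bad = 1
        }
      }
      if (!bad) print "all indices loaded"
      exit bad
    }
  '
}

# Usage against a local node; the h= parameter limits the output to two columns:
# curl -s 'localhost:9200/_cat/indices?h=index,docs.count' | check_counts
```

Requesting only the `index` and `docs.count` columns keeps the parsing independent of the exact column layout, which varies between Elasticsearch versions.
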