kibana/docs/maps/reverse-geocoding-tutorial.asciidoc
Nathan Reese 11341ffe0e
[Maps] update docs for index pattern -> data view rename (#117400)
* [Maps] update docs for index pattern -> data view rename

* Update docs/maps/reverse-geocoding-tutorial.asciidoc

Co-authored-by: gchaps <33642766+gchaps@users.noreply.github.com>

Co-authored-by: gchaps <33642766+gchaps@users.noreply.github.com>
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
2021-11-04 12:04:05 -06:00

183 lines
7.7 KiB
Plaintext
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

[role="xpack"]
[[reverse-geocoding-tutorial]]
== Map custom regions with reverse geocoding
*Maps* comes with https://maps.elastic.co/#file[predefined regions] that allow you to quickly visualize regions by metrics. *Maps* also offers the ability to map your own regions. You can use any region data you'd like, as long as your source data contains an identifier for the corresponding region.
But how can you map regions when your source data does not contain a region identifier? This is where reverse geocoding comes in. Reverse geocoding is the process of assigning a region identifier to a feature based on its location.
In this tutorial, youll use reverse geocoding to visualize United States Census Bureau Combined Statistical Area (CSA) regions by web traffic.
Youll learn to:
- Upload custom regions.
- Reverse geocode with the {es} {ref}/enrich-processor.html[enrich processor].
- Create a map and visualize CSA regions by web traffic.
When you complete this tutorial, youll have a map that looks like this:
[role="screenshot"]
image::maps/images/reverse-geocoding-tutorial/csa_regions_by_web_traffic.png[]
[float]
=== Step 1: Index web traffic data
GeoIP is a common way of transforming an IP address to a longitude and latitude. GeoIP is roughly accurate on the city level globally and neighborhood level in selected countries. Its not as good as an actual GPS location from your phone, but its much more precise than just a country, state, or province.
Youll use the <<get-started, web logs sample data set>> that comes with Kibana for this tutorial. Web logs sample data set has longitude and latitude. If your web log data does not contain longitude and latitude, use {ref}/geoip-processor.html[GeoIP processor] to transform an IP address into a {ref}/geo-point.html[geo_point] field.
To install web logs sample data set:
. On the home page, click *Try sample data*.
. On the *Sample web logs* card, click *Add data*.
[float]
=== Step 2: Index Combined Statistical Area (CSA) regions
GeoIP level of detail is very useful for driving decision-making. For example, say you want to spin up a marketing campaign based on the locations of your users or show executive stakeholders which metro areas are experiencing an uptick of traffic.
That kind of scale in the United States is often captured with what the Census Bureau calls the Combined Statistical Area (CSA). Combined Statistical Area is roughly equivalent with how people intuitively think of which urban area they live in. It does not necessarily coincide with state or city boundaries.
CSAs generally share the same telecom providers and ad networks. New fast food franchises expand to a CSA rather than a particular city or municipality. Basically, people in the same CSA shop in the same IKEA.
To get the CSA boundary data:
. Download the https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html[Cartographic Boundary shapefile (.shp)] from the Census Bureaus website.
. To use the data in Kibana, convert it to GeoJSON format. Follow this https://gist.github.com/YKCzoli/b7f5ff0e0f641faba0f47fa5d16c4d8d[helpful tutorial] to use QGIS to convert the Cartographic Boundary shapefile to GeoJSON. Or, download a https://raw.githubusercontent.com/elastic/examples/master/blog/reverse-geocoding/csba.json[prebuilt GeoJSON version].
Once you have your GeoJSON file:
. Open the main menu, and click *Maps*.
. Click *Create map*.
. Click *Add layer*.
. Click *Upload GeoJSON*.
. Use the file chooser to import the CSA GeoJSON file.
. Set index name to *csa* and click *Import file*.
. When importing is complete, click *Add as document layer*.
. Add Tooltip fields:
.. Click *+ Add* to open field select.
.. Select *NAME*, *GEOID*, and *AFFGEOID*.
.. Click *Add*.
. Click *Save & close*.
Looking at the map, you get a sense of what constitutes a metro area in the eyes of the Census Bureau.
[role="screenshot"]
image::maps/images/reverse-geocoding-tutorial/csa_regions.jpeg[]
[float]
=== Step 3: Reverse geocoding
To visualize CSA regions by web log traffic, the web log traffic must contain a CSA region identifier. You'll use {es} {ref}/enrich-processor.html[enrich processor] to add CSA region identifiers to the web logs sample data set. You can skip this step if your source data already contains region identifiers.
. Open the main menu, then click *Dev Tools*.
. In *Console*, create a {ref}/geo-match-enrich-policy-type.html[geo_match enrichment policy]:
+
[source,js]
----------------------------------
PUT /_enrich/policy/csa_lookup
{
"geo_match": {
"indices": "csa",
"match_field": "coordinates",
"enrich_fields": [ "GEOID", "NAME"]
}
}
----------------------------------
. To initialize the policy, run:
+
[source,js]
----------------------------------
POST /_enrich/policy/csa_lookup/_execute
----------------------------------
. To create a ingest pipeline, run:
+
[source,js]
----------------------------------
PUT _ingest/pipeline/lonlat-to-csa
{
"description": "Reverse geocode longitude-latitude to combined statistical area",
"processors": [
{
"enrich": {
"field": "geo.coordinates",
"policy_name": "csa_lookup",
"target_field": "csa",
"ignore_missing": true,
"ignore_failure": true,
"description": "Lookup the csa identifier"
}
},
{
"remove": {
"field": "csa.coordinates",
"ignore_missing": true,
"ignore_failure": true,
"description": "Remove the shape field"
}
}
]
}
----------------------------------
. To update your existing data, run:
+
[source,js]
----------------------------------
POST kibana_sample_data_logs/_update_by_query?pipeline=lonlat-to-csa
----------------------------------
. To run the pipeline on new documents at ingest, run:
+
[source,js]
----------------------------------
PUT kibana_sample_data_logs/_settings
{
"index": {
"default_pipeline": "lonlat-to-csa"
}
}
----------------------------------
. Open the main menu, and click *Discover*.
. Set the data view to *kibana_sample_data_logs*.
. Open the <<set-time-filter, time filter>>, and set the time range to the last 30 days.
. Scan through the list of *Available fields* until you find the `csa.GEOID` field. You can also search for the field by name.
. Click image:images/reverse-geocoding-tutorial/add-icon.png[Add icon] to toggle the field into the document table.
. Find the 'csa.NAME' field and add it to your document table.
Your web log data now contains `csa.GEOID` and `csa.NAME` fields from the matching *csa* region. Web log traffic not contained in a CSA region does not have values for `csa.GEOID` and `csa.NAME` fields.
[role="screenshot"]
image::maps/images/reverse-geocoding-tutorial/discover_enriched_web_log.png[]
[float]
=== Step 4: Visualize Combined Statistical Area (CSA) regions by web traffic
Now that our web traffic contains CSA region identifiers, you'll visualize CSA regions by web traffic.
. Open the main menu, and click *Maps*.
. Click *Create map*.
. Click *Add layer*.
. Click *Choropleth*.
. For *Boundaries source*:
.. Select *Points, lines, and polygons from Elasticsearch*.
.. Set *Data view* to *csa*.
.. Set *Join field* to *GEOID*.
. For *Statistics source*:
.. Set *Data view* to *kibana_sample_data_logs*.
.. Set *Join field* to *csa.GEOID.keyword*.
. Click *Add layer*.
. Scroll to *Layer Style* and Set *Label* to *Fixed*.
. Click *Save & close*.
. *Save* the map.
.. Give the map a title.
.. Under *Add to dashboard*, select *None*.
.. Click *Save and add to library*.
[role="screenshot"]
image::maps/images/reverse-geocoding-tutorial/csa_regions_by_web_traffic.png[]
Congratulations! You have completed the tutorial and have the recipe for visualizing custom regions. You can now try replicating this same analysis with your own data.