kibana/docs/maps/maps-aggregations.asciidoc
Nathan Reese 11341ffe0e
[Maps] update docs for index pattern -> data view rename (#117400)
* [Maps] update docs for index pattern -> data view rename

* Update docs/maps/reverse-geocoding-tutorial.asciidoc

Co-authored-by: gchaps <33642766+gchaps@users.noreply.github.com>

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
2021-11-04 12:04:05 -06:00


[role="xpack"]
[[maps-aggregations]]
== Plot big data without plotting too much data
++++
<titleabbrev>Plot big data</titleabbrev>
++++
Use {ref}/search-aggregations.html[aggregations] to plot large data sets without overwhelming your network or your browser.
When using aggregations, the documents stay in Elasticsearch and only the calculated values for each group are returned to your computer.
Aggregations group your documents into buckets and calculate metrics for each bucket.
Use metric aggregations for <<maps-vector-style-data-driven, data driven styling>>. For example, use the count aggregation to shade world countries by web log traffic.
You can add the following metric aggregations:
* *Average.* The mean of the values.
* *Count.* The number of documents.
* *Max.* The highest value.
* *Min.* The lowest value.
* *Percentile.* The value below which a given percentage of observed values fall. For example, the 95th percentile is the value that is greater than 95% of the observed values.
* *Sum.* The total value.
* *Top term.* The most common value.
* *Unique count.* The number of distinct values.
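
As a rough illustration of the bucket-then-metric idea, the following Python sketch groups a few in-memory documents and computes some of the metrics listed above. It does not call {es}. The documents, the field names `country` and `bytes`, and the `aggregate` helper are all hypothetical and exist only for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical documents; the field names are illustrative only.
docs = [
    {"country": "SE", "bytes": 1837},
    {"country": "SE", "bytes": 971},
    {"country": "US", "bytes": 4277},
]


def aggregate(documents, group_field, metric_field):
    """Group documents into buckets, then compute metrics for each bucket."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc[group_field]].append(doc[metric_field])
    # Only the per-bucket summary is returned, not the documents themselves.
    return {
        key: {
            "count": len(values),
            "avg": mean(values),
            "max": max(values),
            "min": min(values),
            "sum": sum(values),
        }
        for key, values in buckets.items()
    }


summary = aggregate(docs, "country", "bytes")
```

The result holds one small summary per bucket, which is why aggregations scale to large data sets: the amount of data returned depends on the number of buckets, not the number of documents.
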
Use aggregated layers with document layers to show aggregated views when the map shows large
areas of the globe and individual documents when the map shows smaller regions.
In the following example, the Grid aggregation layer is only visible when the map is at zoom levels 0 through 5. The Documents layer is only visible when the map is at zoom levels 4 through 24.
See the <<maps-add-elasticsearch-layer, Getting started>> tutorial for more details on configuring the layers.
[role="screenshot"]
image::maps/images/grid_to_docs.gif[]
[role="xpack"]
[[maps-grid-aggregation]]
=== Grid aggregation
Grid aggregation layers use {ref}/search-aggregations-bucket-geotilegrid-aggregation.html[GeoTile grid aggregation] to group your documents into grids. You can calculate metrics for each gridded cell.
Symbolize grid aggregation metrics as:
*Clusters*:: Creates a <<vector-layer, vector layer>> with a cluster symbol for each gridded cell.
The cluster location is the weighted centroid for all documents in the gridded cell.
*Grid rectangles*:: Creates a <<vector-layer, vector layer>> with a bounding box polygon for each gridded cell.
*Heat map*:: Creates a <<heatmap-layer, heat map layer>> that clusters the weighted centroids for each gridded cell.
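
The following Python sketch shows one way to think about gridding. It assigns each point to a map tile using the standard Web Mercator tile formula, labels the tile with a `zoom/x/y` key, and then reports a document count and a centroid per cell. This is a simplified illustration, not the {es} implementation: the function names are hypothetical, and the centroid here is a plain average of the coordinates.

```python
import math
from collections import defaultdict


def geotile_key(lat, lon, zoom):
    """Return the "zoom/x/y" key of the Web Mercator tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return f"{zoom}/{x}/{y}"


def grid_clusters(points, zoom):
    """Group (lat, lon) points by tile and summarize each gridded cell."""
    cells = defaultdict(list)
    for lat, lon in points:
        cells[geotile_key(lat, lon, zoom)].append((lat, lon))
    return {
        key: {
            "doc_count": len(members),
            # Assumption: a plain average stands in for the weighted centroid.
            "centroid": (
                sum(p[0] for p in members) / len(members),
                sum(p[1] for p in members) / len(members),
            ),
        }
        for key, members in cells.items()
    }


# Hypothetical sample points as (lat, lon) pairs.
clusters = grid_clusters([(10.0, -10.0), (20.0, -20.0), (-10.0, 10.0)], zoom=1)
```

Higher zoom values produce more, smaller cells, so the same documents split into more clusters as you zoom in.
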
To enable a grid aggregation layer:
. Click *Add layer*, then select the *Clusters and grids* or *Heat map* layer.
To enable a blended layer that dynamically shows clusters or documents:
. Click *Add layer*, then select the *Documents* layer.
. Configure *Data view* and the *Geospatial field*.
. In *Scaling*, select *Show clusters when results exceed 10000*.
[role="xpack"]
[[maps-top-hits-aggregation]]
=== Display the most relevant documents per entity
Use *Top hits per entity* to display the most relevant documents per entity, for example, the most recent GPS tracks per flight route.
To get this data, {es} first groups your data using a {ref}/search-aggregations-bucket-terms-aggregation.html[terms aggregation],
then accumulates the most relevant documents based on sort order for each entry using a {ref}/search-aggregations-metrics-top-hits-aggregation.html[top hits metric aggregation].
To enable top hits:
. Click *Add layer*, then select the *Top hits per entity* layer.
. Configure *Data view* and *Geospatial field*.
. Set *Entity* to the field that identifies entities in your documents.
This field will be used in the terms aggregation to group your documents into entity buckets.
. Set *Documents per entity* to configure the maximum number of documents accumulated per entity.
This setting is limited by the `index.max_inner_result_window` index setting, which defaults to 100.
[role="screenshot"]
image::maps/images/top_hits.png[]
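
The grouping and sorting described above can be sketched in plain Python. The example below groups in-memory documents by an entity field, sorts each group by a sort field in descending order, and keeps at most `size` documents per entity. It only mirrors the logic and does not query {es}. The `top_hits_per_entity` helper, the sample tracks, and the field names are hypothetical.

```python
from collections import defaultdict


def top_hits_per_entity(documents, entity_field, sort_field, size):
    """Keep the `size` highest-sorted documents for each entity."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[entity_field]].append(doc)
    return {
        entity: sorted(members, key=lambda d: d[sort_field], reverse=True)[:size]
        for entity, members in groups.items()
    }


# Hypothetical GPS track documents; the field names are illustrative only.
tracks = [
    {"flight": "AB123", "timestamp": "2019-02-21T04:57:05", "location": (59.6, 17.9)},
    {"flight": "AB123", "timestamp": "2019-02-21T05:24:33", "location": (59.9, 18.4)},
    {"flight": "AB123", "timestamp": "2019-02-27T08:10:45", "location": (60.2, 19.0)},
    {"flight": "CD456", "timestamp": "2019-02-28T07:23:08", "location": (55.6, 12.6)},
]

# Two most recent tracks per flight.
latest = top_hits_per_entity(tracks, "flight", "timestamp", size=2)
```

Here `size` plays the role of *Documents per entity*, and the entity field plays the role of *Entity*.
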
[role="xpack"]
[[point-to-point]]
=== Point to point
A point-to-point connection plots aggregated data paths between the source and the destination.
Thicker, darker lines symbolize more connections between a source and destination, and thinner, lighter lines symbolize fewer connections.
Point to point uses an {es} {ref}/search-aggregations-bucket-terms-aggregation.html[terms aggregation] to group your documents by destination.
Then, a nested {ref}/search-aggregations-bucket-geotilegrid-aggregation.html[GeoTile grid aggregation] groups sources for each destination into grids.
A line connects each source grid centroid to its destination.
Point-to-point layers are used in several common use cases:
* Source-destination maps for network traffic
* Origin-destination maps for flight data
* Origin-destination flows for import/export/migration
* Origin-destination for pick-up/drop-off data
image::maps/images/point_to_point.png[]
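
The two-level grouping can be sketched in Python as follows. Documents are first grouped by destination, then the sources for each destination are grouped into tiles, and one line is produced per source cell. This is an illustration under stated assumptions, not the actual request that the layer sends: the helper names and document shape are hypothetical, each destination is represented directly as a `(lat, lon)` pair, and a plain average stands in for the cell centroid.

```python
import math
from collections import defaultdict


def geotile_key(lat, lon, zoom):
    """Return the "zoom/x/y" key of the Web Mercator tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return f"{zoom}/{min(max(x, 0), n - 1)}/{min(max(y, 0), n - 1)}"


def point_to_point(documents, zoom):
    """Build one line per (destination, source cell) pair."""
    # destination -> source cell key -> list of source points
    by_destination = defaultdict(lambda: defaultdict(list))
    for doc in documents:
        cell = geotile_key(doc["source"][0], doc["source"][1], zoom)
        by_destination[doc["destination"]][cell].append(doc["source"])

    lines = []
    for destination, cells in by_destination.items():
        for sources in cells.values():
            centroid = (
                sum(s[0] for s in sources) / len(sources),
                sum(s[1] for s in sources) / len(sources),
            )
            # doc_count can drive line width and color.
            lines.append(
                {"from": centroid, "to": destination, "doc_count": len(sources)}
            )
    return lines


# Hypothetical documents with (lat, lon) source and destination pairs.
flows = point_to_point(
    [
        {"source": (10.0, -10.0), "destination": (59.0, 18.0)},
        {"source": (20.0, -20.0), "destination": (59.0, 18.0)},
        {"source": (-10.0, 10.0), "destination": (59.0, 18.0)},
    ],
    zoom=1,
)
```

In this sample, the two sources that share a tile collapse into a single line with a `doc_count` of 2, which would be drawn thicker and darker than the line for the lone third source.
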
[role="xpack"]
[[terms-join]]
=== Term join
Use term joins to augment vector features with properties for <<maps-vector-style-data-driven, data driven styling>> and richer tooltip content.
Term joins are available for the following <<vector-layer, vector layers>>:
* Configured GeoJSON
* Documents
* EMS Boundaries
==== Example term join
The <<maps-add-choropleth-layer, choropleth layer example>> uses a term join to shade world countries by web log traffic.
Darker shades symbolize countries with more web log traffic, and lighter shades symbolize countries with less traffic.
[role="screenshot"]
image::maps/images/gs_add_cloropeth_layer.png[]
===== How a term join works
A term join uses a shared key to combine vector features, the left source, with the results of an {es} terms aggregation, the right source.
The choropleth example uses the shared key, https://wikipedia.org/wiki/ISO_3166-1_alpha-2[ISO 3166-1 alpha-2 code], to join world countries and web log traffic.
ISO 3166-1 alpha-2 code is an international standard that identifies countries by a two-letter country code.
For example, *Sweden* has an ISO 3166-1 alpha-2 code of *SE*.
[role="screenshot"]
image::maps/images/terms_join_shared_key_config.png[]
===== Left source
The left source for the term join is the https://www.elastic.co/elastic-maps-service[Elastic Maps Service (EMS)] World Countries. Vector features for this source are provided by EMS. You can also use your own vector features.
In the following example, the *iso2* property defines the shared key for the left source.
--------------------------------------------------
{
  geometry: {
    coordinates: [...],
    type: "Polygon"
  },
  properties: {
    name: "Sweden",
    iso2: "SE"
  },
  type: "Feature"
}
--------------------------------------------------
===== Right source
The right source uses the Kibana sample data set "Sample web logs".
In this data set, the *geo.src* field contains the ISO 3166-1 alpha-2 code of the country of origin.
A {ref}/search-aggregations-bucket-terms-aggregation.html[terms aggregation] groups the sample web log documents by *geo.src* and calculates metrics for each term.
The METRICS configuration defines two metric aggregations:
* The count of all documents in the terms bucket.
* The average of the field "bytes" for all documents in the terms bucket.
[role="screenshot"]
image::maps/images/terms_join_metric_config.png[]
The right source does not provide individual documents, but instead provides the metrics from a terms aggregation.
The metrics are calculated from the following sample web logs documents.
--------------------------------------------------
{
  bytes: 1837,
  geo: {
    src: "SE"
  },
  timestamp: "Feb 28, 2019 @ 07:23:08.754"
},
{
  bytes: 971,
  geo: {
    src: "SE"
  },
  timestamp: "Feb 27, 2019 @ 08:10:45.205"
},
{
  bytes: 4277,
  geo: {
    src: "SE"
  },
  timestamp: "Feb 21, 2019 @ 05:24:33.945"
},
{
  bytes: 5624,
  geo: {
    src: "SE"
  },
  timestamp: "Feb 21, 2019 @ 04:57:05.921"
}
--------------------------------------------------
The terms aggregation creates a bucket for each unique *geo.src* value. Metrics are calculated for all documents in a bucket.
The following shows an example terms aggregation response. Note the *key* property, which defines the shared key for the right source.
--------------------------------------------------
{
  aggregations: {
    join: {
      buckets: [
        {
          doc_count: 4,
          key: "SE",
          avg_of_bytes: {
            value: 3177.25
          }
        },
        ...
      ]
    }
  }
}
--------------------------------------------------
==== Augment the left source with metrics from the right source
The join adds metrics for each terms aggregation bucket to the world country feature with the corresponding ISO 3166-1 alpha-2 code. Features that do not have a corresponding terms aggregation bucket are not visible on the map.
The world country features now have two additional properties:
* Count of web log traffic originating from the world country
* Average bytes of web log traffic originating from the world country
The choropleth example uses the count of web log traffic to symbolize countries by web log traffic.
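
The join itself can be sketched in a few lines of Python. The example below looks up each left-source feature's *iso2* value among the right-source bucket keys, copies the metrics onto matching features, and drops features without a bucket. The `term_join` helper and the added property names `count` and `avg_of_bytes` are hypothetical, and the Norway feature is made up to show an unmatched case. The Sweden values come from the sample data above.

```python
def term_join(features, buckets, left_key="iso2"):
    """Add bucket metrics to features whose shared key matches a bucket key."""
    buckets_by_key = {bucket["key"]: bucket for bucket in buckets}
    joined = []
    for feature in features:
        bucket = buckets_by_key.get(feature["properties"][left_key])
        if bucket is None:
            # No matching bucket: the feature is left off the map.
            continue
        properties = dict(feature["properties"])
        properties["count"] = bucket["doc_count"]
        properties["avg_of_bytes"] = bucket["avg_of_bytes"]["value"]
        joined.append({**feature, "properties": properties})
    return joined


# Left source: vector features (geometry omitted for brevity).
features = [
    {"type": "Feature", "properties": {"name": "Sweden", "iso2": "SE"}},
    {"type": "Feature", "properties": {"name": "Norway", "iso2": "NO"}},
]

# Right source: terms aggregation buckets.
buckets = [
    {"key": "SE", "doc_count": 4, "avg_of_bytes": {"value": 3177.25}},
]

joined = term_join(features, buckets)
```

Only Sweden survives the join, because no bucket has the key `NO`. Its properties now include the count and average bytes that data-driven styling can use.
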