[docs] 7.10 APM docs updates (#80605) (#81248)

This commit is contained in:
Brandon Morelli 2020-10-21 07:42:41 -07:00 committed by GitHub
parent 9a4e2af97f
commit 633dc4f6eb
31 changed files with 97 additions and 47 deletions

View file

@ -18,12 +18,22 @@ image::apm/images/apm-alert.png[Create an alert in the APM app]
For a walkthrough of the alert flyout panel, including detailed information on each configurable property,
see Kibana's <<defining-alerts,defining alerts>>.
The APM app supports two different types of threshold alerts: transaction duration, and error rate.
Below, we'll create one of each.
The APM app supports four different types of alerts:
* Transaction duration anomaly:
alerts when the service's transaction duration reaches a certain anomaly score
* Transaction duration threshold:
alerts when the service's transaction duration exceeds a given time limit over a given time frame
* Transaction error rate threshold:
alerts when the service's transaction error rate is above the selected rate over a given time frame
* Error count threshold:
alerts when the service exceeds a selected number of errors over a given time frame
Below, we'll walk through the creation of two of these alerts.
[float]
[[apm-create-transaction-alert]]
=== Create a transaction duration alert
=== Example: create a transaction duration alert
Transaction duration alerts trigger when the duration of a specific transaction type in a service exceeds a defined threshold.
This guide will create an alert for the `opbeans-java` service based on the following criteria:
@ -57,9 +67,9 @@ Enter a name for the connector,
and paste the webhook URL.
See Slack's webhook documentation if you need to create one.
Add a message body in markdown format.
A default message is provided as a starting point for your alert.
You can use the https://mustache.github.io/[Mustache] template syntax, i.e., `{{variable}}`
to pass alert values at the time a condition is detected to an action.
to pass additional alert values at the time a condition is detected to an action.
A list of available variables can be accessed by selecting the
**add variable** button image:apm/images/add-variable.png[add variable button].
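
To make the `{{variable}}` syntax concrete, here is a minimal sketch of how Mustache-style placeholders are filled in. It is not Kibana's implementation: the `renderTemplate` helper, the variable names `serviceName` and `threshold`, and the sample values are all assumptions made for illustration. Real Mustache also HTML-escapes values and supports sections, which this sketch omits.

```typescript
// Illustrative only: a tiny stand-in for Mustache variable substitution.
// `renderTemplate` and the variable names below are hypothetical.
function renderTemplate(
  template: string,
  values: Record<string, string | number>
): string {
  // Replace each {{name}} with its value; like Mustache, a missing
  // variable renders as an empty string.
  return template.replace(
    /\{\{\s*([\w.]+)\s*\}\}/g,
    (_match: string, name: string) => {
      const value = values[name];
      return value === undefined ? "" : String(value);
    }
  );
}

const message = renderTemplate(
  "{{serviceName}} exceeded {{threshold}} ms",
  { serviceName: "opbeans-java", threshold: 1500 }
);
console.log(message);
```

In the alert flyout, use the names listed under **add variable** rather than the hypothetical ones shown here.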
@ -67,7 +77,7 @@ Select **Save**. The alert has been created and is now active!
[float]
[[apm-create-error-alert]]
=== Create an error rate alert
=== Example: create an error rate alert
Error rate alerts trigger when the number of errors in a service exceeds a defined threshold.
This guide creates an alert for the `opbeans-python` service based on the following criteria:
@ -94,9 +104,9 @@ Based on the alert criteria, define the following alert details:
Select the **Email** action type and click **Create a connector**.
Fill out the required details: sender, host, port, etc., and click **save**.
Add a message body in markdown format.
A default message is provided as a starting point for your alert.
You can use the https://mustache.github.io/[Mustache] template syntax, i.e., `{{variable}}`
to pass alert values at the time a condition is detected to an action.
to pass additional alert values at the time a condition is detected to an action.
A list of available variables can be accessed by selecting the
**add variable** button image:apm/images/add-variable.png[add variable button].

View file

@ -69,7 +69,7 @@ the host filter will still be applied.
These filters are very useful for quickly and easily removing noise from your data.
With just a click, you can filter your transactions by the transaction result,
host, container ID, and more.
host, container ID, Kubernetes pod, and more.
[role="screenshot"]
image::apm/images/local-filter.png[Local filters available in the APM app in Kibana]

22 binary image files changed (not shown), including one newly added image.

View file

@ -14,7 +14,12 @@ Machine learning jobs are created per environment, and are based on a service's
Because jobs are created at the environment level,
you can add new services to your existing environments without the need for additional machine learning jobs.
After a machine learning job is created, results are shown in two places:
Results from machine learning jobs are shown in multiple places throughout the APM app:
* The **Services overview** provides a quick-glance view of the general health of all of your services.
+
[role="screenshot"]
image::apm/images/service-quick-health.png[Example view of anomaly scores on response times in the APM app]
* The transaction duration chart will show the expected bounds and add an annotation when the anomaly score is 75 or above.
+

View file

@ -33,7 +33,7 @@ distributed tracing will not work, and the connection will not be drawn on the m
Select the **Service Map** tab to get started.
By default, all instrumented services and connections are shown.
Whether you're onboarding a new engineer, or just trying to grasp the big picture,
click around, zoom in and out, and begin to visualize how your services are connected.
drag things around, zoom in and out, and begin to visualize how your services are connected.
If there's a specific service that interests you, select that service to highlight its connections.
Clicking **Focus map** will refocus the map on that specific service and lock the connection highlighting.

View file

@ -2,8 +2,13 @@
[[services]]
=== Services overview
The *Services* overview gives you quick insights into the health and general performance of all of your instrumented services.
Services are sorted by the `service.name` configured in each of the {apm-agents-ref}[APM agents] you've installed.
The *Services* overview page provides a quick, high-level overview of the health and general
performance of all instrumented services.
To help surface potential issues, services are sorted by their health status:
**critical** > **warning** > **healthy** > **unknown**.
Health status is powered by machine learning and requires anomaly detection to be enabled.
Learn more in <<machine-learning-integration,machine learning>>.
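
The health-status ordering described above can be sketched as a simple ranked sort. This is an assumption-laden illustration, not the APM app's code: the `ServiceRow` shape, the `sortByHealth` helper, and the sample rows are hypothetical.

```typescript
// Illustrative only: order services as critical > warning > healthy > unknown.
type HealthStatus = "critical" | "warning" | "healthy" | "unknown";

// Position in this array defines the sort rank.
const HEALTH_ORDER: HealthStatus[] = ["critical", "warning", "healthy", "unknown"];

// Hypothetical row shape for a services table.
interface ServiceRow {
  name: string;
  health: HealthStatus;
}

function sortByHealth(services: ServiceRow[]): ServiceRow[] {
  // Copy first so the caller's array is left untouched.
  return services.slice().sort(
    (a, b) => HEALTH_ORDER.indexOf(a.health) - HEALTH_ORDER.indexOf(b.health)
  );
}

const sorted = sortByHealth([
  { name: "opbeans-python", health: "healthy" },
  { name: "opbeans-node", health: "unknown" },
  { name: "opbeans-java", health: "critical" },
  { name: "opbeans-go", health: "warning" },
]);
console.log(sorted.map((s) => s.name));
```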
[role="screenshot"]
image::apm/images/apm-services-overview.png[Example view of services table in the APM app in Kibana]
image::apm/images/apm-services-overview.png[Example view of services table in the APM app in Kibana]

View file

@ -3,7 +3,7 @@
=== Trace sample timeline
The trace sample timeline visualization is a bird's-eye view of what your application was doing while it was trying to respond to a request.
This makes it useful for visualizing where the selected transaction spent most of its time.
This makes it useful for visualizing where a selected transaction spent most of its time.
[role="screenshot"]
image::apm/images/apm-transaction-sample.png[Example of distributed trace colors in the APM app in Kibana]
@ -43,9 +43,12 @@ this makes finding possible bottlenecks throughout your application much easier
image::apm/images/apm-distributed-tracing.png[Example view of the distributed tracing in APM app in Kibana]
Don't forget: by definition, a distributed trace includes more than one transaction.
When viewing these distributed traces in the timeline waterfall, you'll see this image:apm/images/transaction-icon.png[APM icon] icon,
When viewing distributed traces in the timeline waterfall,
you'll see this icon: image:apm/images/transaction-icon.png[APM icon],
which indicates the next transaction in the trace.
These transactions can be expanded and viewed in detail by clicking on them.
For easier problem isolation, transactions can be collapsed in the waterfall by clicking
the icon to the left of the transactions.
Transactions can also be expanded and viewed in detail by clicking on them.
After exploring these traces,
you can return to the full trace by clicking *View full trace*.

View file

@ -7,7 +7,8 @@ and which services were part of it.
In addition to the Traces overview, you can view your application traces in the <<spans,trace sample timeline waterfall>>.
The *Traces* overview displays the entry transaction for all traces in your application.
If you're using <<distributed-tracing>>, this view is key to finding the critical paths within your application.
If you're using <<distributed-tracing,distributed tracing>>,
this view is key to finding the critical paths within your application.
Transactions with the same name are grouped together and only shown once in this table.
By default, transactions are sorted by _Impact_.

View file

@ -10,7 +10,24 @@ Selecting a <<services,*service*>> brings you to the *transactions* overview.
[role="screenshot"]
image::apm/images/apm-transactions-overview.png[Example view of transactions table in the APM app in Kibana]
The *time spent by span type*, *transaction duration*, and *requests per minute* charts display information on all transactions associated with the selected service:
The *transaction duration*, *transactions per minute*, *transaction error rate*, and *time spent by span type*
charts display information on all transactions associated with the selected service:
*Transaction duration*::
Response times for this service, broken down into average, 95th, and 99th percentile.
If there's a weird spike that you'd like to investigate,
you can simply zoom in on the graph - this will adjust the specific time range,
and all of the data on the page will update accordingly.
*Transactions per minute*::
Visualize response codes: `2xx`, `3xx`, `4xx`, etc.
This chart is useful for determining whether you're serving more of one code than you typically do.
Like in the Transaction duration graph, you can zoom in on anomalies to further investigate them.
*Transaction error rate*::
Visualize the total number of transactions with errors divided by the total number of transactions.
Any unexpected increases, decreases, or irregular patterns can be investigated further
with the <<errors,errors overview>>.
*Time spent by span type*::
Visualize where your application is spending most of its time.
@ -22,17 +39,6 @@ This could be a sign that the agent does not have auto-instrumentation for whate
+
It's important to note that if you have asynchronous spans, the sum of all span times may exceed the duration of the transaction.
*Transaction duration*::
Response times for this service, broken down into average, 95th, and 99th percentile.
If there's a weird spike that you'd like to investigate,
you can simply zoom in on the graph - this will adjust the specific time range,
and all of the data on the page will update accordingly.
*Requests per minute*::
Visualize response codes: `2xx`, `3xx`, `4xx`, etc.
This chart is useful for determining whether you're serving more of one code than you typically do.
Like in the Transaction duration graph, you can zoom in on anomalies to further investigate them.
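
The transaction error rate described above is a plain ratio: transactions with errors divided by all transactions. A minimal sketch follows. The `TransactionOutcome` shape and the sample data are assumptions, and how the APM app decides that a transaction has an error is not covered here.

```typescript
// Illustrative only: error rate = transactions with errors / total transactions.
// `TransactionOutcome` is a hypothetical shape, not an APM data type.
interface TransactionOutcome {
  name: string;
  hasError: boolean;
}

function transactionErrorRate(transactions: TransactionOutcome[]): number {
  // Avoid dividing by zero when no transactions were recorded.
  if (transactions.length === 0) {
    return 0;
  }
  const failed = transactions.filter((t) => t.hasError).length;
  return failed / transactions.length;
}

const sample: TransactionOutcome[] = [
  { name: "GET /api/orders", hasError: false },
  { name: "GET /api/orders", hasError: true },
  { name: "GET /api/products", hasError: false },
  { name: "GET /api/products", hasError: false },
];
console.log(transactionErrorRate(sample)); // 0.25
```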
[[transactions-table]]
==== Transactions table
@ -61,42 +67,45 @@ refer to the documentation for each {apm-agents-ref}[APM Agent] you've implement
==== RUM Transaction overview
The transaction overview page is customized for the JavaScript RUM Agent.
This page highlights things like *page load times*, *transactions per minute*, and even the *average page load duration distribution by country*.
Specifically, the page highlights *page load times* for your service:
[role="screenshot"]
image::apm/images/apm-geo-ui.png[average page load duration distribution]
This data is available due to the geo-ip and user agent pipelines being enabled by default,
which allows for the capture of geo-location and user agent data.
These visualizations make it easy for you to visualize performance information about your
end-users' experience based on their location.
Additional RUM goodies, like core vitals, and visitor breakdown by browser, location, and device,
are available in the Observability User Experience tab.
// To do
// Add link to the Observability UE docs when complete
[[transaction-details]]
==== Transaction details
Selecting a transaction group will bring you to the *transaction* details.
Transaction details include a high-level overview of the time spent by span type,
transaction group duration, requests per minute, and transaction group duration distribution.
It's important to note that all of these graphs show data from every transaction within the selected transaction group.
This page is visually similar to the transaction overview, but it shows data from all transactions within
the selected transaction group.
[role="screenshot"]
image::apm/images/apm-transaction-response-dist.png[Example view of response time distribution]
Up to ten sampled transactions are also displayed.
These sampled transactions are based on your selection in the *Transactions duration distribution*.
You can update the sampled transactions by selecting a new _bucket_ in the transactions duration distribution graph.
The number of requests per bucket is displayed when hovering over the graph, and the selected bucket is highlighted to stand out.
These sampled transactions are based on the _bucket_ selection in the *Transactions duration distribution* chart.
You can update the sampled transactions by selecting a new _bucket_.
The number of requests per bucket is displayed when hovering over the graph,
and the selected bucket is highlighted to stand out.
The screenshot below shows a typical distribution, and indicates most of our requests were served quickly--awesome!
It's the requests on the right, the ones taking longer than average, that we probably want to focus on.
[role="screenshot"]
image::apm/images/apm-transaction-duration-dist.png[Example view of transactions duration distribution graph]
This graph shows a typical distribution, and indicates most of our requests were served quickly--awesome!
It's the requests on the right, the ones taking longer than average, that we probably want to focus on.
When you select one of these buckets,
When you select a bucket,
you're presented with up to ten trace samples.
Each sample has a trace timeline waterfall that shows what a typical request in that bucket was doing.
By investigating this timeline waterfall, we can hopefully determine _why_ this request was slow and then implement a fix.
Each sample has a trace timeline waterfall that shows how a typical request in that bucket executed.
This waterfall is useful for understanding the parent/child hierarchy of transactions and spans,
and ultimately determining _why_ a request was slow.
For large waterfalls, expand problematic transactions and collapse well-performing ones
for easier problem isolation and troubleshooting.
[role="screenshot"]
image::apm/images/apm-transaction-sample.png[Example view of transactions sample]

View file

@ -14,6 +14,7 @@ Also, check out the https://discuss.elastic.co/c/apm[APM discussion forum].
* <<troubleshooting-too-many-transactions>>
* <<troubleshooting-unknown-route>>
* <<troubleshooting-fields-unsearchable>>
* <<service-map-rum-connections>>
[float]
[[no-apm-data-found]]
@ -180,3 +181,19 @@ setup.template.append_fields:
type: object
dynamic: true
----
[float]
[[service-map-rum-connections]]
=== Service maps: no connection between client and server
If the service map is not showing an expected connection between the client and server,
it's likely because you haven't configured
{apm-agent-rum}/configuration.html#distributed-tracing-origins[`distributedTracingOrigins`].
This setting is necessary, for example, for cross-origin requests.
If you have a basic web application that provides data via an API on `localhost:4000`,
and serves HTML from `localhost:4001`, you'd need to set `distributedTracingOrigins: ['https://localhost:4000']`
to ensure the origin is monitored as a part of distributed tracing.
In other words, `distributedTracingOrigins` is consulted prior to the agent adding the
distributed tracing `traceparent` header to each request.
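
The check described above can be sketched as follows. This is not the RUM agent's code: `shouldAddTraceparent` is a hypothetical helper, and the sketch assumes that requests to the page's own origin always qualify while cross-origin requests qualify only when their origin is listed in `distributedTracingOrigins`.

```typescript
// Illustrative only: decide whether a request would carry the
// `traceparent` header. `shouldAddTraceparent` is a hypothetical helper.
function shouldAddTraceparent(
  requestUrl: string,
  pageOrigin: string,
  distributedTracingOrigins: string[]
): boolean {
  // Resolve relative URLs against the page origin, then compare origins
  // (scheme + host + port).
  const requestOrigin = new URL(requestUrl, pageOrigin).origin;
  if (requestOrigin === pageOrigin) {
    return true; // assumed: same-origin requests are always traced
  }
  return distributedTracingOrigins.indexOf(requestOrigin) !== -1;
}

// HTML served from port 4001, API on port 4000, as in the example above.
const page = "https://localhost:4001";
const api = "https://localhost:4000/api/items";

console.log(shouldAddTraceparent(api, page, []));
console.log(shouldAddTraceparent(api, page, ["https://localhost:4000"]));
```

Without the setting, the cross-origin API call is skipped, which matches the missing connection on the service map.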