Alerting user interface documentation and guide (#61701)

Peter Schretlen 2020-03-27 17:01:01 -04:00 committed by GitHub
parent 451fc263b1
commit c4b4ea11c9
53 changed files with 793 additions and 2 deletions

@ -0,0 +1,34 @@
[role="xpack"]
[[alert-details]]
=== Alert details
beta[]
The *Alert details* page tells you about the state of the alert and provides granular control over the actions it is taking.
[role="screenshot"]
image::images/alerts-details-instances-active.png[Alert details page with three alert instances]
In this example, the alert detects when a site serves more than a threshold number of bytes in a 24-hour period. Three sites are above the threshold. Each of these is called an alert instance (an occurrence of the condition being detected), and this view shows the instance name, status, time of detection, and duration of the condition.
Upon detection, each instance can trigger one or more actions. If the condition persists, the same actions trigger either on the next scheduled alert check or, if defined, after the alert's re-notify period has passed. To prevent re-notification, you can suppress future actions by clicking the eye icon to mute an individual alert instance. Muting means that the alert checks continue to run on a schedule, but that instance does not trigger any actions.
[role="screenshot"]
image::images/alerts-details-instance-muting.png[Muting an alert instance]
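
The firing rules above can be modeled as a small decision function. This is a minimal sketch, not {kib}'s implementation: `should_run_actions` and its parameters are hypothetical names, and it assumes the re-notify period is measured from the last time the instance's actions fired.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_run_actions(
    instance_muted: bool,
    alert_muted: bool,
    last_fired_at: Optional[datetime],
    now: datetime,
    renotify_every: Optional[timedelta],
) -> bool:
    """Hypothetical model: does an active alert instance fire its actions on this check?"""
    # Muting the instance (or the whole alert) suppresses actions;
    # the alert check itself still runs.
    if instance_muted or alert_muted:
        return False
    # First detection of this instance: fire.
    if last_fired_at is None:
        return True
    # No re-notify period defined: fire again on every check while the condition persists.
    if renotify_every is None:
        return True
    # Assumption: the re-notify period is measured from the last time actions fired.
    return now - last_fired_at >= renotify_every


t0 = datetime(2020, 3, 27, 0, 0)
print(should_run_actions(False, False, t0, t0 + timedelta(hours=4), timedelta(days=1)))  # False
```

In the usage line, the instance fired four hours ago with a one-day re-notify period, so the check runs but its actions stay quiet.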
Alert instances appear in and disappear from the list depending on whether they meet the alert conditions, unless they are muted. A muted instance that no longer meets the alert conditions stays in the list as inactive. This prevents the instance from triggering actions if it reappears in the future.
[role="screenshot"]
image::images/alerts-details-instances-inactive.png[Alert details page with three inactive alert instances]
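
The list behavior can be sketched as a union of the instances detected on the latest check and the instances the user has muted. This is a simplified, hypothetical model (`instance_list` and `InstanceRow` are illustrative names), not how {kib} stores instance state.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class InstanceRow:
    instance_id: str
    status: str  # "active" or "inactive"
    muted: bool


def instance_list(detected: Set[str], muted: Set[str]) -> List[InstanceRow]:
    """Hypothetical model of the rows shown after an alert check.

    detected: instance IDs that meet the alert conditions on this check.
    muted:    instance IDs the user has muted (muting persists across checks).
    """
    rows = []
    for instance_id in sorted(detected | muted):
        # Unmuted instances that stop meeting the conditions drop out of the
        # union entirely; muted ones stay listed as inactive.
        status = "active" if instance_id in detected else "inactive"
        rows.append(InstanceRow(instance_id, status, instance_id in muted))
    return rows
```

Keeping muted instances in the list, even when inactive, is what lets the mute survive the instance disappearing and later reappearing.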
If you want to suppress actions on all current and future instances, you can mute the entire alert. Alert checks continue to run and the instance list will update as instances activate or deactivate, but no actions will be triggered.
[role="screenshot"]
image::images/alerts-details-muting.png[Use the mute toggle to suppress all action on current and future instances]
You can also disable an alert entirely. A disabled alert stops running checks and clears any instances it is tracking. You may want to disable alerts that are not currently needed to reduce the load on {kib} and {es}.
[role="screenshot"]
image::images/alerts-details-disabling.png[Use the disable toggle to turn off alert checks and clear instances tracked]
* For further information on alerting concepts and examples, see <<alerting-getting-started>>.

@ -0,0 +1,59 @@
[role="xpack"]
[[alert-management]]
=== Managing Alerts
beta[]
The *Alerts* tab provides a cross-app view of alerting. Different {kib} apps like <<xpack-infra, Metrics>>, <<xpack-apm, APM>>, <<xpack-uptime, Uptime>>, and <<xpack-siem, SIEM>> can offer their own alerts, and the *Alerts* tab provides a central place to:
* <<create-edit-alerts, Create and edit>> alerts
* <<controlling-alerts, Control alerts>> including enabling/disabling, muting/unmuting, and deleting
* Drill-down to <<alert-details, alert details>>
[role="screenshot"]
image::management/alerting/images/alerts-and-actions-ui.png[Example alert listing in the Alerts and Actions UI]
For more information on alerting concepts and the types of alerts and actions available, see <<alerting-getting-started>>.
[float]
==== Finding alerts
The *Alerts* tab lists all alerts in the current space, including summary information about their execution frequency, tags, and type.
The *search bar* can be used to quickly find alerts by name or tag.
[role="screenshot"]
image::images/alerts-filter-by-search.png[Filtering the alerts list using the search bar]
The *type* dropdown lets you filter to a subset of alert types.
[role="screenshot"]
image::images/alerts-filter-by-type.png[Filtering the alerts list by types of alert]
The *Action type* dropdown lets you filter by the type of action used in the alert.
[role="screenshot"]
image::images/alerts-filter-by-action-type.png[Filtering the alert list by type of action]
[float]
[[create-edit-alerts]]
==== Creating and editing alerts
Many alerts must be created within the context of a {kib} app like <<xpack-infra, Metrics>>, <<xpack-apm, APM>>, or <<xpack-uptime, Uptime>>, but others are generic. Generic alert types can be created in the *Alerts* management UI by clicking the *Create* button. This launches a flyout that guides you through selecting an alert type and configuring its properties. Refer to <<alert-types>> for details on what types of alerts are available and how to configure them.
After an alert is created, you can re-open the flyout and change the alert's properties by clicking the *Edit* button shown on each row of the alert listing.
[float]
[[controlling-alerts]]
==== Controlling alerts
The alert listing allows you to quickly mute/unmute, disable/enable, and delete individual alerts by clicking the action button at the right of each row.
[role="screenshot"]
image::management/alerting/images/individual-mute-disable.png[The actions button allows an individual alert to be muted, disabled, or deleted]
These operations can also be performed in bulk by multi-selecting alerts and clicking the *Manage alerts* button:
[role="screenshot"]
image::management/alerting/images/bulk-mute-disable.png[The Manage alerts button lets you mute/unmute, enable/disable, and delete in bulk]

@ -0,0 +1,25 @@
[role="xpack"]
[[managing-alerts-and-actions]]
== Alerts and Actions
beta[]
The *Alerts and Actions* UI lets you <<alert-management, see and control all the alerts>> in a space, and provides tools to <<connector-management, create and manage connectors>> so that alerts can trigger actions like notification, indexing, and ticketing.
To manage alerting and connectors, go to *Management > {kib} > Alerts and Actions*.
[role="screenshot"]
image::management/alerting/images/alerts-and-actions-ui.png[Example alert listing in the Alerts and Actions UI]
[NOTE]
============================================================================
Similar to dashboards, alerts and connectors reside in a <<xpack-spaces, space>>.
The *Alerts and Actions* UI only shows alerts and connectors for the current space.
============================================================================
[NOTE]
============================================================================
{es} also offers alerting capabilities through Watcher, which
can be managed through the <<watcher-ui, Watcher UI>>. See
<<alerting-concepts-differences>> for more information.
============================================================================

@ -0,0 +1,47 @@
[role="xpack"]
[[connector-management]]
=== Managing Connectors
beta[]
Alerts use *Connectors* to route actions to different destinations like log files, ticketing systems, and messaging tools. While each {kib} app can offer its own types of alerts, the apps typically share connectors. The *Connectors* tab offers a central place to view and manage all the connectors in the current space.
For more information on connectors and the types of actions available, see <<action-types>>.
[role="screenshot"]
image::images/connector-listing.png[Example connector listing in the Alerts and Actions UI]
[float]
==== Connector list
The *Connectors* tab lists all connectors in the current space. The *search bar* can be used to find specific connectors by name and/or type.
[role="screenshot"]
image::images/connector-filter-by-search.png[Filtering the connector list using the search bar]
The *type* dropdown also lets you filter to a subset of action types.
[role="screenshot"]
image::images/connector-filter-by-type.png[Filtering the connector list by types of actions]
The *Actions* column indicates the number of actions that reference the connector. This count helps you confirm a connector is unused before you delete it, and tells you how many actions will be affected when a connector is modified.
[role="screenshot"]
image::images/connector-action-count.png[The Actions column shows how many actions reference each connector]
You can delete individual connectors using the trash icon on the right of each row. Connectors can also be deleted in bulk by multi-selecting them and clicking the *Delete* button to the left of the search box.
[role="screenshot"]
image::images/connector-delete.png[Deleting connectors individually or in bulk]
[NOTE]
============================================================================
You can delete a connector even if there are still actions referencing it.
When this happens, those actions fail to execute and appear as errors in the {kib} logs.
============================================================================
[float]
==== Creating a new connector
New connectors can be created by clicking the *Create connector* button, which guides you to select the type of connector and configure its properties. Refer to <<action-types>> for the types of connectors available and how to configure them. Once you create a connector, it is available any time you set up an action in the current space.

@ -27,7 +27,7 @@ If not set, {kib} will generate a random key on startup, but all alert and actio
Although the key can be specified in clear text in `kibana.yml`, it's recommended to store this key securely in the <<secure-settings,{kib} Keystore>>.
[float]
[[alert-settings]]
[[action-settings]]
==== Action settings
`xpack.actions.whitelistedHosts`::
@ -41,7 +41,7 @@ A list of action types that are enabled. It defaults to `[*]`, enabling all type
Disabled action types will not appear as an option when creating new connectors, but existing connectors and actions of that type will remain in {kib} and will not function.
[float]
[[action-settings]]
[[alert-settings]]
==== Alert settings
You do not need to configure any additional settings to use alerting in {kib}.

@ -0,0 +1,182 @@
[role="xpack"]
[[action-types]]
== Action and connector types
{kib} provides the following types of actions:
* <<email-action-type, Email>>
* <<index-action-type, Index>>
* <<pagerduty-action-type, PagerDuty>>
* <<server-log-action-type, ServerLog>>
* <<slack-action-type, Slack>>
* <<webhook-action-type, Webhook>>
This section describes how to configure connectors and actions for each type.
[NOTE]
==============================================
Some action types are paid commercial features, while others are free.
For a comparison of the Elastic license levels,
see https://www.elastic.co/subscriptions[the subscription page].
==============================================
[float]
[[email-action-type]]
=== Email
The email action type uses the SMTP protocol to send mail messages, through an integration of https://nodemailer.com/[Nodemailer]. Email message text is sent as both plain text and HTML.
[float]
[[email-connector-configuration]]
==== Connector configuration
Email connectors have the following configuration properties:
Name:: The name of the connector. The name is used to identify a connector in the management UI connector listing, or in the connector list when configuring an action.
Sender:: The from address for all emails sent with this connector, specified in `user@host-name` format.
Host:: Host name of the service provider. If you are using the <<action-settings, `xpack.actions.whitelistedHosts`>> setting, make sure this hostname is whitelisted.
Port:: The port to connect to on the service provider.
Secure:: If true, the connection uses TLS when connecting to the service provider. See https://nodemailer.com/smtp/#tls-options[nodemailer TLS documentation] for more information.
Username:: Username for 'login' type authentication.
Password:: Password for 'login' type authentication.
[float]
[[email-action-configuration]]
==== Action configuration
Email actions have the following configuration properties:
To, CC, BCC:: Each is a list of addresses. Addresses can be specified in `user@host-name` format, or in `name <user@host-name>` format. At least one of To, CC, or BCC must contain an entry.
Subject:: The subject line of the email.
Message:: The message text of the email. Markdown format is supported.
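
As a rough illustration of how the connector and action properties fit together, the sketch below builds and sends a message with Python's standard `email` and `smtplib` modules instead of Nodemailer. It is not how {kib} sends mail. Several pieces are assumptions: mapping *Secure* to an implicit-TLS connection (`SMTP_SSL`), the HTML part (which only escapes the text rather than rendering Markdown), and the helper names `build_email` and `send_email`, which are hypothetical.

```python
import html
import smtplib
from email.message import EmailMessage
from typing import List, Optional


def build_email(
    sender: str,
    subject: str,
    message: str,
    to: Optional[List[str]] = None,
    cc: Optional[List[str]] = None,
    bcc: Optional[List[str]] = None,
) -> EmailMessage:
    """Build a message from email action properties (hypothetical helper)."""
    to, cc, bcc = to or [], cc or [], bcc or []
    if not (to or cc or bcc):
        raise ValueError("at least one of To, CC, or BCC must contain an entry")
    msg = EmailMessage()
    msg["From"] = sender  # the connector's Sender
    if to:
        msg["To"] = ", ".join(to)
    if cc:
        msg["Cc"] = ", ".join(cc)
    if bcc:
        # smtplib's send_message() delivers to Bcc addresses but never transmits the header.
        msg["Bcc"] = ", ".join(bcc)
    msg["Subject"] = subject
    msg.set_content(message)  # plain-text part
    # HTML part: escaped text only; Markdown rendering is out of scope for this sketch.
    msg.add_alternative(f"<pre>{html.escape(message)}</pre>", subtype="html")
    return msg


def send_email(msg: EmailMessage, host: str, port: int, secure: bool,
               username: Optional[str] = None, password: Optional[str] = None) -> None:
    """Deliver msg using connector properties (Host, Port, Secure, Username, Password)."""
    # Assumption: Secure means TLS from the start of the connection.
    smtp_class = smtplib.SMTP_SSL if secure else smtplib.SMTP
    with smtp_class(host, port) as smtp:
        if username:
            smtp.login(username, password or "")
        smtp.send_message(msg)
```

Building the message separately from sending it keeps the action properties (recipients, subject, message) apart from the connector properties (host, port, credentials), which mirrors the connector/action split described above.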
[float]
[[index-action-type]]
=== Index
The index action type will index a document into {es}.
[float]
[[index-connector-configuration]]
==== Connector configuration
Index connectors have the following configuration properties:
Name:: The name of the connector. The name is used to identify a connector in the management UI connector listing, or in the connector list when configuring an action.
Index:: The {es} index to be written to.
Refresh:: Setting for the {ref}/docs-refresh.html[refresh] policy for the write request.
Execution time field:: This field will be automatically set to the time the alert condition was detected.
[float]
[[index-action-configuration]]
==== Action configuration
Index actions have the following properties:
Document:: The document to index, in JSON format.
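
In {es} terms, an index action comes down to writing one document with the index API, optionally stamped with the detection time. The sketch below only builds the request and does not send it. It assumes *Refresh* is an on/off flag passed as the index API's `refresh` parameter. The helper `build_index_request` and the example field names are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple


def build_index_request(
    index: str,
    document: Dict[str, Any],
    refresh: bool = False,
    execution_time_field: Optional[str] = None,
    detected_at: Optional[datetime] = None,
) -> Tuple[str, str, str]:
    """Return (method, path, body) for an Elasticsearch index request (hypothetical helper)."""
    doc = dict(document)  # copy so the caller's document is not modified
    if execution_time_field:
        # Stamp the document with the time the alert condition was detected.
        when = detected_at or datetime.now(timezone.utc)
        doc[execution_time_field] = when.isoformat()
    path = f"/{index}/_doc?refresh={'true' if refresh else 'false'}"
    return "POST", path, json.dumps(doc)


method, path, body = build_index_request(
    "alert-history", {"host": "web-1", "bytes": 500000},
    refresh=True, execution_time_field="detected_at",
)
```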
[float]
[[pagerduty-action-type]]
=== PagerDuty
The PagerDuty action type uses the https://v2.developer.pagerduty.com/docs/events-api-v2[v2 Events API] to trigger, acknowledge, and resolve PagerDuty alerts.
[float]
[[pagerduty-connector-configuration]]
==== Connector configuration
PagerDuty connectors have the following configuration properties:
Name:: The name of the connector. The name is used to identify a connector in the management UI connector listing, or in the connector list when configuring an action.
API URL:: An optional PagerDuty event URL. Defaults to `https://events.pagerduty.com/v2/enqueue`. If you are using the <<action-settings, `xpack.actions.whitelistedHosts`>> setting, make sure the hostname is whitelisted.
Routing Key:: A 32-character PagerDuty Integration Key for an integration on a service or on a global ruleset.
[float]
[[pagerduty-action-configuration]]
==== Action configuration
PagerDuty actions have the following properties:
Severity:: The perceived severity of the event on the affected system. This can be one of `Critical`, `Error`, `Warning`, or `Info` (default).
Event action:: One of `Trigger` (default), `Resolve`, or `Acknowledge`. See https://v2.developer.pagerduty.com/docs/events-api-v2#event-action[event action] for more details.
Dedup Key:: All actions sharing this key will be associated with the same PagerDuty alert. This value is used to correlate trigger and resolution. This value is *optional*, and if unset defaults to `action:<action saved object id>`. The maximum length is *255* characters. See https://v2.developer.pagerduty.com/docs/events-api-v2#alert-de-duplication[alert deduplication] for details.
Timestamp:: An *optional* https://v2.developer.pagerduty.com/v2/docs/types#datetime[ISO-8601 format date-time], indicating the time the event was detected or generated.
Component:: An *optional* value indicating the component of the source machine that is responsible for the event, for example `mysql` or `eth0`.
Group:: An *optional* value indicating the logical grouping of components of a service, for example `app-stack`.
Source:: An *optional* value indicating the affected system, preferably a hostname or fully qualified domain name. Defaults to the {kib} saved object id of the action.
Summary:: An *optional* text summary of the event, defaults to `No summary provided`. The maximum length is 1024 characters.
Class:: An *optional* value indicating the class/type of the event, for example `ping failure` or `cpu load`.
For more details on these properties, see https://v2.developer.pagerduty.com/v2/docs/send-an-event-events-api-v2[PagerDuty v2 event parameters].
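
For orientation, the sketch below assembles a PagerDuty Events API v2 request body from the action properties above and applies the documented defaults. It is not {kib}'s code. The helper name is hypothetical, and so is the choice to reject over-long values rather than truncate them. Sending the result would be a JSON `POST` to the connector's API URL.

```python
from typing import Any, Dict, Optional

SEVERITIES = {"critical", "error", "warning", "info"}
EVENT_ACTIONS = {"trigger", "resolve", "acknowledge"}


def build_pagerduty_event(
    routing_key: str,
    action_id: str,
    event_action: str = "trigger",
    severity: str = "info",
    dedup_key: Optional[str] = None,
    summary: Optional[str] = None,
    source: Optional[str] = None,
    timestamp: Optional[str] = None,
    component: Optional[str] = None,
    group: Optional[str] = None,
    event_class: Optional[str] = None,
) -> Dict[str, Any]:
    """Events API v2 body built from PagerDuty action properties (hypothetical helper)."""
    event_action, severity = event_action.lower(), severity.lower()
    if event_action not in EVENT_ACTIONS:
        raise ValueError(f"unknown event action: {event_action}")
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    dedup_key = dedup_key or f"action:{action_id}"  # documented default
    summary = summary or "No summary provided"       # documented default
    if len(dedup_key) > 255 or len(summary) > 1024:
        raise ValueError("dedup key or summary exceeds the maximum length")
    payload: Dict[str, Any] = {
        "summary": summary,
        "source": source or action_id,  # defaults to the action's saved object id
        "severity": severity,
    }
    optional = {"timestamp": timestamp, "component": component, "group": group, "class": event_class}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return {
        "routing_key": routing_key,
        "event_action": event_action,
        "dedup_key": dedup_key,
        "payload": payload,
    }
```

Reusing the same dedup key for a `trigger` event and a later `resolve` event is what lets PagerDuty close the incident that the trigger opened.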
[float]
[[server-log-action-type]]
=== Server log
This action type writes an entry to the {kib} server log.
[float]
[[server-log-connector-configuration]]
==== Connector configuration
Server log connectors have the following configuration properties:
Name:: The name of the connector. The name is used to identify a connector in the management UI connector listing, or in the connector list when configuring an action.
[float]
[[server-log-action-configuration]]
==== Action configuration
Server log actions have the following properties:
Message:: The message to log.
[float]
[[slack-action-type]]
=== Slack
The Slack action type uses https://api.slack.com/incoming-webhooks[Slack Incoming Webhooks].
[float]
[[slack-connector-configuration]]
==== Connector configuration
Slack connectors have the following configuration properties:
Name:: The name of the connector. The name is used to identify a connector in the management UI connector listing, or in the connector list when configuring an action.
Webhook URL:: The URL of the incoming webhook. See https://api.slack.com/messaging/webhooks#getting_started[Slack Incoming Webhooks] for instructions on generating this URL. If you are using the <<action-settings, `xpack.actions.whitelistedHosts`>> setting, make sure the hostname is whitelisted.
[float]
[[slack-action-configuration]]
==== Action configuration
Slack actions have the following properties:
Message:: The message text, converted to the `text` field in the Webhook JSON payload. Currently only the text field is supported. Markdown, images, and other advanced formatting are not yet supported.
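
A Slack incoming webhook receives a JSON body whose `text` field carries the message, which is all this action type sends. The sketch below only builds the request with the standard library and does not send it. The helper name is hypothetical, and the URL is a placeholder.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, message: str) -> urllib.request.Request:
    """POST request carrying the action's Message as the payload's `text` field."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real one comes from Slack's incoming-webhook setup.
req = build_slack_request("https://example.com/placeholder-webhook", "Site traffic above threshold")
# urllib.request.urlopen(req) would send it.
```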
[float]
[[webhook-action-type]]
=== Webhook
The Webhook action type uses https://github.com/axios/axios[axios] to send a POST or PUT request to a web service.
[float]
[[webhook-connector-configuration]]
==== Connector configuration
Webhook connectors have the following configuration properties:
Name:: The name of the connector. The name is used to identify a connector in the management UI connector listing, or in the connector list when configuring an action.
URL:: The request URL. If you are using the <<action-settings, `xpack.actions.whitelistedHosts`>> setting, make sure the hostname is whitelisted.
Method:: HTTP request method, either `post` (default) or `put`.
Headers:: A set of key-value pairs sent as headers with the request.
User:: An optional username. If set, HTTP basic authentication is used. Currently only basic authentication is supported.
Password:: An optional password. If set, HTTP basic authentication is used. Currently only basic authentication is supported.
[float]
[[webhook-action-configuration]]
==== Action configuration
Webhook actions have the following properties:
Body:: A JSON payload sent to the request URL.
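
The connector and action properties map directly onto an HTTP request. The sketch below builds that request with Python's `urllib` instead of axios, and adds basic authentication when a user or password is set. It is not {kib}'s implementation, and `build_webhook_request` is a hypothetical name. When data is sent without a `Content-Type` header, urllib defaults it to `application/x-www-form-urlencoded`, so a JSON endpoint usually needs that header configured.

```python
import base64
import json
import urllib.request
from typing import Dict, Optional


def build_webhook_request(
    url: str,
    body: str,
    method: str = "post",
    headers: Optional[Dict[str, str]] = None,
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> urllib.request.Request:
    """Build the request a webhook action would send (hypothetical helper)."""
    method = method.lower()
    if method not in ("post", "put"):
        raise ValueError("method must be 'post' or 'put'")
    json.loads(body)  # Body must be valid JSON; raises ValueError otherwise
    all_headers = dict(headers or {})
    if user is not None or password is not None:
        # HTTP basic authentication: "Basic " + base64("user:password").
        credentials = f"{user or ''}:{password or ''}".encode("utf-8")
        all_headers["Authorization"] = "Basic " + base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(url, data=body.encode("utf-8"), headers=all_headers, method=method.upper())
```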

@ -0,0 +1,115 @@
[role="xpack"]
[[alert-types]]
== Alert types
{kib} supplies alert types in two ways: some are built into {kib}, while domain-specific alert types are registered by {kib} apps such as <<xpack-apm,*APM*>>, <<xpack-infra,*Metrics*>>, and <<xpack-uptime,*Uptime*>>.
This section covers built-in alert types. For domain-specific alert types, refer to the documentation for that app.
Currently {kib} provides one built-in alert type: the <<alert-type-index-threshold>> type.
[float]
[[alert-type-index-threshold]]
=== Index threshold
The index threshold alert type is designed to run an {es} query over indices, aggregating field values from documents, comparing them to threshold values, and scheduling actions to run when the thresholds are met.
[float]
==== Creating the alert
An index threshold alert can be created from the *Create* button in the <<alert-management, alert management UI>>. Fill in the <<defining-alerts-general-details, general alert details>>, then select *Index Threshold*.
[role="screenshot"]
image::images/alert-types-index-threshold-select.png[Choosing an index threshold alert type]
[float]
==== Defining the conditions
The index threshold alert has five clauses that define the condition to detect.
[role="screenshot"]
image::images/alert-types-index-threshold-conditions.png[Five clauses define the condition to detect]
Index:: This clause requires an *index or index pattern* and a *time field* that will be used for the *time window*.
When:: This clause specifies how the value to be compared to the threshold is calculated. The value is calculated by aggregating a numeric field in the *time window*. The aggregation options are: `count`, `average`, `sum`, `min`, and `max`. When using `count`, the document count is used, and an aggregation field is not necessary.
Over/Grouped Over:: This clause lets you configure whether the aggregation is applied over all documents, or should be split into groups using a grouping field. If grouping is used, an <<alerting-concepts-alert-instances, alert instance>> will be created for each group when it exceeds the threshold. To limit the number of instances on high cardinality fields, you must specify the number of groups to check against the threshold. Only the *top* groups are checked.
Threshold:: This clause defines a threshold value and a comparison operator (one of `is above`, `is above or equals`, `is below`, `is below or equals`, or `is between`). The result of the aggregation is compared to this threshold.
Time window:: This clause determines how far back to search for documents, using the *time field* set in the *index* clause. Generally, set this value higher than the *check every* value in the <<defining-alerts-general-details, general alert details>> to avoid gaps in detection.
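
To make the five clauses concrete, the sketch below evaluates them over an in-memory list of documents instead of an {es} query. It is an illustrative model only, with all names hypothetical. It assumes "top" groups are ranked by their aggregated value and that `is between` includes both bounds. Each returned group corresponds to an alert instance.

```python
import operator
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional, Sequence

AGGREGATIONS = {
    "count": lambda values: float(len(values)),
    "sum": lambda values: float(sum(values)),
    "min": lambda values: float(min(values)),
    "max": lambda values: float(max(values)),
    "average": lambda values: sum(values) / len(values),
}
COMPARATORS = {
    "is above": operator.gt,
    "is above or equals": operator.ge,
    "is below": operator.lt,
    "is below or equals": operator.le,
}


def threshold_met(value: float, comparator: str, threshold: Sequence[float]) -> bool:
    if comparator == "is between":
        low, high = threshold  # assumption: both bounds inclusive
        return low <= value <= high
    return COMPARATORS[comparator](value, threshold[0])


def evaluate_index_threshold(
    docs: List[Dict[str, Any]],
    now: datetime,
    time_field: str,             # Index clause
    time_window: timedelta,      # Time window clause
    agg: str,                    # When clause
    agg_field: Optional[str],
    group_field: Optional[str],  # Over / Grouped over clause
    top_n: Optional[int],
    comparator: str,             # Threshold clause
    threshold: Sequence[float],
) -> Dict[str, float]:
    """Return {group: aggregated value} for each group that meets the threshold."""
    start = now - time_window
    buckets: Dict[str, List[float]] = defaultdict(list)
    for doc in docs:
        if not start <= doc[time_field] <= now:
            continue  # outside the time window
        key = str(doc[group_field]) if group_field else "all documents"
        # `count` needs no aggregation field; every document contributes 1.
        buckets[key].append(1.0 if agg == "count" else float(doc[agg_field]))
    values = {key: AGGREGATIONS[agg](vals) for key, vals in buckets.items()}
    if group_field and top_n is not None:
        # Only the top groups are checked; ranking by aggregated value is an assumption.
        top = sorted(values, key=values.get, reverse=True)[:top_n]
        values = {key: values[key] for key in top}
    return {key: v for key, v in values.items() if threshold_met(v, comparator, threshold)}
```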
If data is available and all clauses have been defined, a preview chart will render the threshold value and display a line chart showing the value for the last 30 intervals. This can provide an indication of recent values and their proximity to the threshold, and help you tune the clauses.
[role="screenshot"]
image::images/alert-types-index-threshold-preview.png[Preview chart showing recent values against the threshold]
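
Both the *time window* guidance and the preview chart come down to simple interval arithmetic. The sketch below assumes checks run exactly on schedule, which <<alerting-scale-performance>> explains is not guaranteed. The function names are hypothetical.

```python
from datetime import timedelta


def uncovered_time_per_check(check_every: timedelta, time_window: timedelta) -> timedelta:
    """Time between two consecutive on-time checks that neither check's window covers."""
    # Each check looks back over [check_time - time_window, check_time].
    return max(check_every - time_window, timedelta(0))


def preview_span(check_every: timedelta, intervals: int = 30) -> timedelta:
    """Period shown by the preview chart: the last `intervals` check intervals."""
    return check_every * intervals


print(uncovered_time_per_check(timedelta(minutes=5), timedelta(minutes=1)))  # 0:04:00
print(preview_span(timedelta(hours=4)))  # 5 days, 0:00:00
```

With a 5-minute check interval and a 1-minute window, documents timestamped during four of every five minutes are never examined. A window at least as long as the check interval closes that gap.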
[float]
=== Example
In this section, you will use the {kib} <<add-sample-data, weblog sample dataset>> to set up and tune the conditions on an index threshold alert. For this example, we want to detect when any of our top three sites has served more than 420,000 bytes over a 24-hour period.
From the <<alert-management, alert management UI>>, create a new alert, and fill in the <<defining-alerts-general-details, general alert details>>. This alert will be checked every 4 hours, and will not execute actions more than once per day. Choose the index threshold alert type.
[role="screenshot"]
image::images/alert-types-index-threshold-select.png[Choosing an index threshold alert type]
Click on each clause to open a control that helps you set the value:
[float]
==== Index clause
The index clause control lists the available indices and lets you search them. Choose *kibana_sample_data_logs*.
[role="screenshot"]
image::images/alert-types-index-threshold-example-index.png[Choosing an index]
Once an index is selected, the list of time fields for that index will be available to select. Choose *@timestamp*.
[role="screenshot"]
image::images/alert-types-index-threshold-example-timefield.png[Choosing a time field]
[float]
==== When clause
We want to detect the number of bytes served during the time window, so we select `sum` as the aggregation, and `bytes` as the field to aggregate.
[role="screenshot"]
image::images/alert-types-index-threshold-example-aggregation.png[Choosing the aggregation]
[float]
==== Over/Grouped over clause
We want to alert on the three sites that have the most traffic, so we'll group the sum of bytes by the `host.keyword` field and take the top 3 values.
[role="screenshot"]
image::images/alert-types-index-threshold-example-grouping.png[Choosing the groups]
[float]
==== Threshold clause
We want to alert when any site exceeds 420,000 bytes over a 24-hour period, so we'll set the threshold to 420,000 and use the `is above` comparison.
[role="screenshot"]
image::images/alert-types-index-threshold-example-threshold.png[Setting the threshold]
[float]
==== Time window clause
Finally, set the time window to 24 hours to complete the alert configuration.
[role="screenshot"]
image::images/alert-types-index-threshold-example-window.png[Setting the time window]
The preview chart renders the 24-hour sum of bytes at 4-hour intervals (the *check every* interval) for the past 120 hours (the last 30 intervals).
[role="screenshot"]
image::images/alert-types-index-threshold-example-preview.png[Preview chart of the 24-hour sum of bytes]
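
For readers who think in {es} queries, the sketch below shows a search body that roughly matches the example clauses (sum of `bytes` per `host.keyword`, top 3, last 24 hours), along with a helper that applies the `is above` threshold to the response. This is an approximation for understanding only. {kib}'s actual request may differ, and the function and aggregation names are hypothetical. The body would be sent to the `_search` endpoint of `kibana_sample_data_logs`.

```python
from typing import Any, Dict


def example_query(
    time_field: str = "@timestamp",
    window: str = "24h",
    group_field: str = "host.keyword",
    top_n: int = 3,
    agg_field: str = "bytes",
) -> Dict[str, Any]:
    """Search body roughly matching the example clauses (not Kibana's actual request)."""
    return {
        "size": 0,  # only aggregations are needed, not the documents themselves
        "query": {"range": {time_field: {"gte": f"now-{window}", "lte": "now"}}},
        "aggs": {
            "groups": {
                # Top N groups, ordered by the summed metric.
                "terms": {"field": group_field, "size": top_n, "order": {"metric": "desc"}},
                "aggs": {"metric": {"sum": {"field": agg_field}}},
            }
        },
    }


def instances_above(response: Dict[str, Any], threshold: float) -> Dict[str, float]:
    """Map each group whose summed value is above the threshold to that value."""
    buckets = response["aggregations"]["groups"]["buckets"]
    return {
        b["key"]: b["metric"]["value"]
        for b in buckets
        if b["metric"]["value"] > threshold
    }
```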
[float]
==== Comparing time windows
You can interactively change the time window and observe the effect it has on the chart. Compare a 24-hour window to a 12-hour window. Notice the variability in the sum of bytes, caused by different traffic levels during the day and at night. This variability would result in noisy alerts, so the 24-hour window is better. The preview chart can help you find the right values for your alert.
[role="screenshot"]
image::images/alert-types-index-threshold-example-comparison.png[Comparing two time windows]

@ -0,0 +1,28 @@
[role="xpack"]
[[alerting-scale-performance]]
== Scale and performance
{kib} alerting runs both alert checks and actions as persistent background tasks. This has two major benefits:
* *Persistence*: all task state and scheduling is stored in {es}, so if {kib} is restarted, alerts and actions will pick up where they left off.
* *Scaling*: multiple {kib} instances can read from and update the same task queue in {es}, allowing the alerting and action load to be distributed across instances. In cases where a {kib} instance no longer has capacity to run alert checks or actions, capacity can be increased by adding additional {kib} instances.
[float]
=== Running background alert checks and actions
{kib} manages background tasks as follows:
* {kib} polls an {es} task index for overdue tasks at 3-second intervals.
* Overdue tasks are claimed by updating them in the {es} index, using optimistic concurrency control to prevent conflicts. Each {kib} instance can run a maximum of 10 concurrent tasks, so at most 10 tasks are claimed each interval.
* Claimed tasks run on the {kib} server.
* Alerts are recurring background checks, so when an alert task completes, it is scheduled again according to the <<defining-alerts-general-details, check interval>>.
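
The claiming step can be sketched as a compare-and-set: a {kib} instance reads overdue tasks, then writes its claim only if each task is unchanged since the read. {es} supports this kind of conditional write, for example through the `if_seq_no` and `if_primary_term` request parameters. However, this page does not describe the exact mechanism the task manager uses. The in-memory store and every name below are hypothetical.

```python
import threading
from dataclasses import dataclass, replace
from typing import Dict, List, Optional

MAX_CONCURRENT_TASKS = 10  # per Kibana instance, as described above


@dataclass
class Task:
    task_id: str
    run_at: float                # scheduled time, in seconds
    owner: Optional[str] = None  # Kibana instance that claimed the task
    version: int = 0             # bumped on every write


class TaskStore:
    """In-memory stand-in for the task index, with version-checked writes."""

    def __init__(self, tasks: List[Task]) -> None:
        self._tasks: Dict[str, Task] = {t.task_id: t for t in tasks}
        self._lock = threading.Lock()

    def find_overdue(self, now: float, limit: int) -> List[Task]:
        # Return copies, like search hits: they may be stale by the time we write.
        with self._lock:
            due = [t for t in self._tasks.values() if t.owner is None and t.run_at <= now]
            return [replace(t) for t in due[:limit]]

    def claim_if_unchanged(self, task_id: str, seen_version: int, owner: str) -> bool:
        with self._lock:  # the compare-and-set itself must be atomic
            task = self._tasks[task_id]
            if task.version != seen_version:
                return False  # another instance wrote the task since we read it
            task.owner = owner
            task.version += 1
            return True


def claim_tasks(store: TaskStore, instance: str, now: float,
                free_slots: int = MAX_CONCURRENT_TASKS) -> List[str]:
    """One polling cycle: claim up to `free_slots` overdue tasks, skipping conflicts."""
    claimed = []
    for snapshot in store.find_overdue(now, free_slots):
        if store.claim_if_unchanged(snapshot.task_id, snapshot.version, instance):
            claimed.append(snapshot.task_id)
    return claimed
```

A losing instance simply skips the conflicting task. No locks are held across the read and the write, which is why several {kib} instances can share one task index.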
[IMPORTANT]
==============================================
Because tasks are polled at 3-second intervals and only 10 tasks can run concurrently per {kib} instance, alert and action tasks may run late. This can happen if:
* Alerts use a small *check interval*. The lowest interval possible is 3 seconds, though intervals of 30 seconds or higher are recommended.
* Many alerts or actions must be *run at once*. In this case pending tasks will queue in {es}, and be pulled 10 at a time from the queue at 3 second intervals.
* *Long running tasks* occupy slots for an extended time, leaving fewer slots for other tasks.
==============================================
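
To see how these limits translate into lateness, the sketch below estimates how long the last task in a burst of simultaneously due tasks waits to be claimed. It is a back-of-the-envelope model, not a guarantee. It assumes all instances poll in lockstep and every task finishes within one poll interval, so all slots are free each cycle. Long-running tasks only make the delay worse. `last_claim_delay` is a hypothetical name.

```python
import math


def last_claim_delay(pending: int, instances: int = 1, slots: int = 10,
                     poll_interval_s: float = 3.0) -> float:
    """Seconds after the first poll until the last of `pending` simultaneously due tasks is claimed."""
    if pending <= 0:
        return 0.0
    per_poll = instances * slots            # tasks claimable per polling cycle
    polls = math.ceil(pending / per_poll)   # cycles needed to drain the backlog
    return (polls - 1) * poll_interval_s    # the first batch is claimed on the first poll


print(last_claim_delay(100))               # 27.0
print(last_claim_delay(100, instances=2))  # 12.0
```

Doubling the number of {kib} instances roughly halves the drain time, which matches the scaling benefit described at the top of this page.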

@ -0,0 +1,80 @@
[role="xpack"]
[[defining-alerts]]
== Defining alerts
{kib} alerts can be created in a variety of apps, including <<xpack-apm,*APM*>>, <<xpack-infra,*Metrics*>>, <<xpack-siem,*SIEM*>>, <<xpack-uptime,*Uptime*>>, and the <<management,*Management*>> UI. While alerting details may differ from app to app, all apps share a common interface for defining and configuring alerts, which this section describes in more detail.
[float]
=== Alert flyout
When an alert is created in an app, the app will display a flyout panel with three main sections to configure:
. <<defining-alerts-general-details, General alert details>>
. <<defining-alerts-type-conditions, Alert type and conditions>>
. <<defining-alerts-actions-details, Action type and action details>>
image::images/alert-flyout-sections.png[The three sections of an alert definition]
[float]
[[defining-alerts-general-details]]
=== General alert details
All alerts share the following four properties:
[role="screenshot"]
image::images/alert-flyout-general-details.png[All alerts have name, tags, check every, and re-notify every properties in common]
Name:: The name of the alert. While this name does not have to be unique, the name can be referenced in actions and also appears in the searchable alert listing in the management UI. A distinctive name can help identify and find an alert.
Tags:: A list of tag names that can be applied to an alert. Tags help you organize and find alerts, because the alert listing in the management UI is searchable by tag.
Check every:: This value determines how frequently the alert conditions below are checked. Note that the timing of background alert checks is not guaranteed, particularly for intervals of less than 10 seconds. See <<alerting-scale-performance>> for more information.
Re-notify every:: This value limits how often actions are repeated when an alert instance remains active across alert checks. See <<alerting-concepts-suppressing-duplicate-notifications>> for more information.
[float]
[[defining-alerts-type-conditions]]
=== Alert type and conditions
Depending upon the {kib} app and context, you may be prompted to choose the type of alert you wish to create. Some apps will pre-select the type of alert for you.
[role="screenshot"]
image::images/alert-flyout-alert-type-selection.png[Choosing the type of alert to create]
Each alert type provides its own way of defining the conditions to detect, but an expression formed by a series of clauses is a common pattern. Each clause has a UI control that allows you to define the clause. For example, in an index threshold alert the `WHEN` clause allows you to select an aggregation operation to apply to a numeric field.
[role="screenshot"]
image::images/alert-flyout-alert-conditions.png[UI for defining alert conditions on an index threshold alert]
[float]
[[defining-alerts-actions-details]]
=== Action type and action details
To add an action to an alert, you first select the type of action:
[role="screenshot"]
image::images/alert-flyout-action-type-selection.png[UI for selecting an action type]
Each action must specify a <<alerting-concepts-connectors, connector>> instance. If no connectors exist for that action type, click "Add new" to create one.
Each action type exposes different properties. For example, an email action allows you to set the recipients, the subject, and a message body in markdown format. See <<action-types>> for details on the types of actions provided by {kib} and their properties.
[role="screenshot"]
image::images/alert-flyout-action-details.png[UI for defining an email action]
Using the https://mustache.github.io/[Mustache] template syntax `{{variable name}}`, you can pass alert values from the time a condition is detected to an action. Available variables differ by alert type, and you can list them using the "add variable" button at the right of the text box.
[role="screenshot"]
image::images/alert-flyout-action-variables.png[Passing alert values to an action]
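
The substitution step can be pictured with the minimal renderer below. It is much simpler than real Mustache. It handles only plain and dotted variable names, and it renders missing variables as empty strings, as Mustache does. It does not HTML-escape values, support sections, or handle triple braces. The variable names in the usage line are hypothetical. Use the "add variable" button to see the real ones for an alert type.

```python
import re
from typing import Any, Mapping

# Matches a double-braced variable such as a plain name or a dotted path.
_VARIABLE = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, variables: Mapping[str, Any]) -> str:
    """Replace double-braced variables with values from `variables` (simplified Mustache)."""
    def lookup(match: "re.Match[str]") -> str:
        value: Any = variables
        for part in match.group(1).split("."):
            if isinstance(value, Mapping) and part in value:
                value = value[part]
            else:
                return ""  # Mustache renders missing variables as empty strings
        return str(value)

    return _VARIABLE.sub(lookup, template)


# Hypothetical variable names, for illustration only.
print(render("Alert {{alertName}} fired for {{context.group}}",
             {"alertName": "Big sites", "context": {"group": "www.elastic.co"}}))
```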
You can attach more than one action. Clicking the "Add action" button prompts you to select another action type and repeat the steps above.
[role="screenshot"]
image::images/alert-flyout-add-action.png[You can add multiple actions on an alert]
[NOTE]
==============================================
Actions are not required on alerts. In some cases you may want to run an alert without actions first to understand its behavior, and configure actions later.
==============================================
[float]
=== Managing alerts
To modify an alert after it was created, including muting or disabling it, use the <<alert-management, alert listing in the Management UI>>.

@ -0,0 +1,202 @@
[role="xpack"]
[[alerting-getting-started]]
= Alerting and Actions
beta[]
--
Alerting allows you to detect complex conditions within different {kib} apps and trigger actions when those conditions are met. Alerting is integrated with <<xpack-apm,*APM*>>, <<xpack-infra,*Metrics*>>, <<xpack-siem,*SIEM*>>, and <<xpack-uptime,*Uptime*>>, can be centrally managed from the <<management,*Management*>> UI, and provides a set of built-in <<action-types, actions>> and <<alert-types, alerts>> for you to use.
image::images/alerting-overview.png[Alerts and actions UI]
[IMPORTANT]
==============================================
To make sure you can access alerting and actions, see the <<alerting-setup-prerequisites, setup and prerequisites>> section.
==============================================
[float]
== Concepts and terminology
*Alerts* work by running checks on a schedule to detect conditions. When a condition is met, the alert tracks it as an *alert instance* and responds by triggering one or more *actions*.
Actions typically involve interaction with {kib} services or third-party integrations. *Connectors* allow actions to talk to these services and integrations.
This section describes all of these elements and how they operate together.
[float]
=== What is an alert?
An alert specifies a background task that runs on the {kib} server to check for specific conditions. It consists of three main parts:
* *Conditions*: what needs to be detected?
* *Schedule*: when/how often should detection checks run?
* *Actions*: what happens when a condition is detected?
For example, when monitoring a set of servers, an alert might check for average CPU usage > 0.9 on each server over the last two minutes (condition), checked every minute (schedule), sending a warning email message via SMTP with subject `CPU on {{server}} is high` (action).
image::images/what-is-an-alert.svg[Three components of an alert]
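To make the three parts concrete, the server monitoring example can be written down as a plain data structure. This is a hypothetical TypeScript shape for illustration only. The field names (`conditions`, `scheduleIntervalMs`, `actions`, and so on) are assumptions and are not the {kib} alerting API.

```typescript
// Hypothetical shape of an alert definition; NOT the Kibana alerting API.
interface Condition {
  metric: string;          // e.g. average CPU usage
  comparator: ">" | "<";
  threshold: number;
  windowMs: number;        // how far back each check looks
}

interface ActionTemplate {
  actionType: string;      // e.g. "email"
  subject: string;         // may contain {{placeholders}}
}

interface AlertDefinition {
  name: string;
  conditions: Condition;       // what needs to be detected?
  scheduleIntervalMs: number;  // how often should detection checks run?
  actions: ActionTemplate[];   // what happens when a condition is detected?
}

const MINUTE = 60000;

// The server monitoring example described above.
const highCpuAlert: AlertDefinition = {
  name: "High CPU",
  conditions: { metric: "cpu.avg", comparator: ">", threshold: 0.9, windowMs: 2 * MINUTE },
  scheduleIntervalMs: 1 * MINUTE,
  actions: [{ actionType: "email", subject: "CPU on {{server}} is high" }],
};

// Summarize the three parts of an alert in one line.
function describeAlert(a: AlertDefinition): string {
  const c = a.conditions;
  return `${c.metric} ${c.comparator} ${c.threshold} over ${c.windowMs / MINUTE}m, ` +
    `checked every ${a.scheduleIntervalMs / MINUTE}m, ` +
    `${a.actions.length} action(s)`;
}

console.log(describeAlert(highCpuAlert));
// cpu.avg > 0.9 over 2m, checked every 1m, 1 action(s)
```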
The following sections describe each part of the alert in more detail.
[float]
[[alerting-concepts-conditions]]
==== Conditions
Under the hood, {kib} alerts detect conditions by running a JavaScript function on the {kib} server. This gives alerts the flexibility to support a wide range of detections, anything from the results of a simple {es} query to heavy computations involving data from multiple sources or external systems.
These detections are packaged and exposed as *alert types*. An alert type hides the underlying details of the detection and exposes a set of parameters to control the details of the conditions to detect.
For example, an <<alert-types, index threshold alert type>> lets you specify the index to query, an aggregation field, and a time window, but the details of the underlying {es} query are hidden.
See <<alert-types>> for the types of alerts provided by {kib} and how they express their conditions.
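An alert type hides its query behind a few parameters. One way to sketch this is a function that accepts only high-level parameters and performs the aggregation itself. This is a simplified, hypothetical stand-in for an index threshold alert type that works on an in-memory array instead of querying {es}. The parameter names (`timeField`, `aggType`, `aggField`, `timeWindowMs`, `threshold`) are assumptions and are not the real alert type's parameters.

```typescript
// Hypothetical index-threshold-style check over in-memory documents.
// A real alert type would build and run an Elasticsearch query instead.
type Doc = Record<string, number>;

interface ThresholdParams {
  timeField: string;   // field holding an epoch-millis timestamp
  aggType: "avg" | "max" | "min" | "sum" | "count";
  aggField?: string;   // not needed for "count"
  timeWindowMs: number;
  threshold: number;   // the condition is met when aggregate > threshold
}

function aggregate(values: number[], aggType: ThresholdParams["aggType"]): number {
  switch (aggType) {
    case "count": return values.length;
    case "sum": return values.reduce((a, b) => a + b, 0);
    case "avg": return values.length ? values.reduce((a, b) => a + b, 0) / values.length : NaN;
    case "max": return Math.max(...values);
    case "min": return Math.min(...values);
  }
}

// The caller only supplies parameters; windowing and aggregation are hidden here.
function checkThreshold(docs: Doc[], p: ThresholdParams, now: number): { value: number; met: boolean } {
  // Keep only documents inside the time window ending at `now`.
  const inWindow = docs.filter(d => d[p.timeField] > now - p.timeWindowMs && d[p.timeField] <= now);
  const values = p.aggType === "count" ? inWindow.map(() => 1) : inWindow.map(d => d[p.aggField ?? ""]);
  const value = aggregate(values, p.aggType);
  return { value, met: value > p.threshold };
}

const byteDocs: Doc[] = [{ ts: 1000, bytes: 500 }, { ts: 2000, bytes: 700 }, { ts: 9000, bytes: 100 }];
console.log(checkThreshold(byteDocs,
  { timeField: "ts", aggType: "sum", aggField: "bytes", timeWindowMs: 10000, threshold: 1000 }, 10000));
```

Narrowing `timeWindowMs` to 5000 in the same call leaves only the last document in the window, so the sum drops to 100 and the condition is no longer met.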
[float]
[[alerting-concepts-scheduling]]
==== Schedule
Alert schedules are defined as an interval between subsequent checks, and can range from a few seconds to months.
[IMPORTANT]
==============================================
The intervals of alert checks in {kib} are approximate. The timing of their execution is affected by factors such as the frequency at which tasks are claimed and the task load on the system. See <<alerting-scale-performance>> for more information.
==============================================
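To see why intervals are approximate, consider a hypothetical model in which a check only starts when the task manager next polls for work. This sketch assumes a fixed poll interval. It also assumes that the next check is scheduled relative to the time the previous check actually ran. Both are simplifying assumptions for illustration and do not describe {kib}'s task manager internals.

```typescript
// Hypothetical model: a check due at time t runs at the first poll tick >= t,
// and the next check becomes due one interval after the actual run time.
function simulateRuns(intervalMs: number, pollMs: number, count: number): number[] {
  const runs: number[] = [];
  let due = 0;
  for (let i = 0; i < count; i++) {
    const runAt = Math.ceil(due / pollMs) * pollMs; // wait for the next poll tick
    runs.push(runAt);
    due = runAt + intervalMs;
  }
  return runs;
}

// A 10 s alert interval with a 3 s poll cycle drifts away from the ideal 0, 10, 20, 30 s:
console.log(simulateRuns(10000, 3000, 4)); // [ 0, 12000, 24000, 36000 ]
```

Under these assumptions, each check runs up to one poll cycle late, and the lateness accumulates because every run is scheduled from the previous actual run.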
[float]
[[alerting-concepts-actions]]
==== Actions
Actions are invocations of {kib} services or integrations with third-party systems that run as background tasks on the {kib} server when alert conditions are met.
When defining actions in an alert, you specify:
* the *action type*: the type of service or integration to use.
* the connection for that type, by referencing a <<alerting-concepts-connectors, connector>>.
* a mapping of alert values to properties exposed for that type of action.
The result is a template: all the parameters needed to invoke a service are supplied except for specific values that are only known at the time the alert condition is detected.
In the server monitoring example, the `email` action type is used, and `server` is mapped to the subject of the email using the template string `CPU on {{server}} is high`.
When the alert detects the condition, it creates an <<alerting-concepts-alert-instances, alert instance>> containing the details of the condition. It then renders the template with these details, such as the server name, and executes the action on the {kib} server by invoking the `email` action type.
image::images/what-is-an-action.svg[Actions are like templates that are rendered when an alert detects a condition]
See <<action-types>> for details on the types of actions provided by {kib}.
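The template idea can be illustrated with a small sketch. It renders `{{variable}}` placeholders from an alert instance's context and hands the result to an action type's executor. The renderer is a simplified stand-in for Mustache: it only substitutes flat `{{name}}` variables and does no HTML escaping, sections, or partials. The `ActionDefinition` type, the `actionTypes` registry, and the field names are hypothetical and are not {kib}'s actions API.

```typescript
// Hypothetical action definition: an action type plus parameter templates.
interface ActionDefinition {
  actionTypeId: string;
  params: Record<string, string>; // values may contain {{placeholders}}
}

type ActionTypeExecutor = (params: Record<string, string>) => string;

// Simplified stand-in for Mustache: replaces {{ name }} with context[name],
// or with an empty string when the variable is missing.
function render(template: string, context: Record<string, string>): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_, name: string) => context[name] ?? "");
}

// Hypothetical action type registry; this "email" executor only formats a string.
const actionTypes: Record<string, ActionTypeExecutor> = {
  email: p => `To: ${p.to} | Subject: ${p.subject}`,
};

// Render every parameter template with the instance's details, then invoke the action type.
function executeAction(action: ActionDefinition, instanceContext: Record<string, string>): string {
  const rendered: Record<string, string> = {};
  for (const [key, tpl] of Object.entries(action.params)) {
    rendered[key] = render(tpl, instanceContext);
  }
  const executor = actionTypes[action.actionTypeId];
  if (!executor) throw new Error(`Unknown action type: ${action.actionTypeId}`);
  return executor(rendered);
}

const cpuEmail: ActionDefinition = {
  actionTypeId: "email",
  params: { to: "ops@example.com", subject: "CPU on {{server}} is high" },
};
console.log(executeAction(cpuEmail, { server: "X123" }));
// To: ops@example.com | Subject: CPU on X123 is high
```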
[float]
[[alerting-concepts-alert-instances]]
=== Alert instances
When checking for a condition, an alert might identify multiple occurrences of the condition. {kib} tracks each of these *alert instances* separately and takes action per instance.
Using the server monitoring example, each server with average CPU > 0.9 is tracked as an alert instance. This means a separate email is sent for each server that exceeds the threshold.
image::images/alert-instances.svg[{kib} tracks each detected condition as an alert instance and takes action on each instance]
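Per-instance tracking can be sketched in three steps. First, group the checked data by an instance key. Then emit one instance for each group that meets the condition. Each instance then gets its own action. The sketch assumes samples arrive as `{ server, cpu }` records and uses the server name as the instance ID. These names are illustrative only.

```typescript
// Hypothetical CPU samples collected over the check window.
interface Sample { server: string; cpu: number }

// Average CPU per server, then one alert instance per server above the threshold.
function detectInstances(samples: Sample[], threshold: number): Map<string, number> {
  const sums = new Map<string, { total: number; n: number }>();
  for (const s of samples) {
    const acc = sums.get(s.server) ?? { total: 0, n: 0 };
    acc.total += s.cpu;
    acc.n += 1;
    sums.set(s.server, acc);
  }
  const instances = new Map<string, number>(); // instance ID -> average CPU
  sums.forEach(({ total, n }, server) => {
    const avg = total / n;
    if (avg > threshold) instances.set(server, avg);
  });
  return instances;
}

const samples: Sample[] = [
  { server: "X123", cpu: 0.95 }, { server: "Y456", cpu: 0.5 }, { server: "Z789", cpu: 0.92 },
  { server: "X123", cpu: 0.93 }, { server: "Y456", cpu: 0.6 }, { server: "Z789", cpu: 0.96 },
];

// One email per instance, that is, one per server whose average exceeds 0.9.
detectInstances(samples, 0.9).forEach((avg, server) => {
  console.log(`CPU on ${server} is high (avg ${avg.toFixed(2)})`);
});
```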
[float]
[[alerting-concepts-suppressing-duplicate-notifications]]
=== Suppressing duplicate notifications
Since actions are taken per instance, alerts can end up generating a large number of actions. Take the following example where an alert is monitoring three servers every minute for CPU usage > 0.9:
* Minute 1: server X123 > 0.9. *One email* is sent for server X123.
* Minute 2: X123 and Y456 > 0.9. *Two emails* are sent, one for X123 and one for Y456.
* Minute 3: X123, Y456, Z789 > 0.9. *Three emails* are sent, one for each of X123, Y456, Z789.
In the above example, three emails are sent for server X123 in the span of 3 minutes for the same condition. Often it's desirable to suppress frequent re-notification. Operations like muting and re-notification throttling can be applied at the instance level. If we set the alert re-notify interval to 5 minutes, we reduce noise by only getting emails for new servers that exceed the threshold:
* Minute 1: server X123 > 0.9. *One email* is sent for server X123.
* Minute 2: X123 and Y456 > 0.9. *One email* is sent for Y456.
* Minute 3: X123, Y456, Z789 > 0.9. *One email* is sent for Z789.
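The effect of the re-notify interval can be reproduced with a small simulation. In this hypothetical sketch, each instance remembers when it last triggered actions. An action fires only if the instance has never fired or the re-notify interval has elapsed, and muted instances never fire. This is a simplified model of throttling for illustration, not {kib}'s implementation.

```typescript
// Hypothetical per-instance throttling and muting (not Kibana's implementation).
class InstanceThrottler {
  private readonly renotifyMs: number;
  private readonly lastFired = new Map<string, number>(); // instance ID -> time of last action
  private readonly muted = new Set<string>();

  constructor(renotifyMs: number) {
    this.renotifyMs = renotifyMs;
  }

  mute(instanceId: string): void {
    this.muted.add(instanceId);
  }

  // Returns the IDs of the active instances that should trigger actions at time `now`.
  check(activeInstances: string[], now: number): string[] {
    const toNotify: string[] = [];
    for (const id of activeInstances) {
      if (this.muted.has(id)) continue; // muted: checks still run, but no actions
      const last = this.lastFired.get(id);
      if (last === undefined || now - last >= this.renotifyMs) {
        this.lastFired.set(id, now);
        toNotify.push(id);
      }
    }
    return toNotify;
  }
}

const MIN = 60000;
const throttler = new InstanceThrottler(5 * MIN);
console.log(throttler.check(["X123"], 1 * MIN));                 // [ 'X123' ]
console.log(throttler.check(["X123", "Y456"], 2 * MIN));         // [ 'Y456' ]
console.log(throttler.check(["X123", "Y456", "Z789"], 3 * MIN)); // [ 'Z789' ]
```

With this model, X123 would notify again at minute 6, once five minutes have passed since its first email. The sketch never forgets instances that become inactive, which is another simplification.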
[float]
[[alerting-concepts-connectors]]
=== Connectors
Actions often involve connecting with services inside {kib} or integrations with third-party systems.
Rather than repeatedly entering connection information and credentials for each action, {kib} simplifies action setup using *connectors*.
*Connectors* provide a central place to store connection information for services and integrations. For example, if four alerts send email notifications via the same SMTP service,
they all reference the same SMTP connector. When the SMTP settings change, they are updated once in the connector instead of in each of the four alerts.
image::images/alert-concepts-connectors.svg[Connectors provide a central place to store service connection settings]
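Actions hold a reference to a connector rather than a copy of its settings. As a result, changing the connector once changes what every referencing alert uses. The following hypothetical sketch models that with an in-memory map keyed by connector ID. The connector fields, IDs, and the `resolveHost` helper are illustrative assumptions.

```typescript
// Hypothetical SMTP connector settings.
interface SmtpConnector { host: string; port: number }

// Connectors are stored once and looked up by ID when an action runs.
const connectors = new Map<string, SmtpConnector>();
connectors.set("smtp-1", { host: "smtp.example.com", port: 587 });

// Four alerts whose email actions all reference the same connector ID.
const alertActions = ["alert-a", "alert-b", "alert-c", "alert-d"].map(alertId => ({
  alertId,
  connectorId: "smtp-1",
}));

function resolveHost(connectorId: string): string {
  const c = connectors.get(connectorId);
  if (!c) throw new Error(`Missing connector ${connectorId}`);
  return `${c.host}:${c.port}`;
}

// Updating the connector once...
connectors.set("smtp-1", { host: "mail.example.org", port: 465 });

// ...is picked up by all four alerts; every entry is now "mail.example.org:465".
console.log(alertActions.map(a => resolveHost(a.connectorId)));
```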
[float]
=== Summary
An _alert_ consists of conditions, _actions_, and a schedule. When conditions are met, _alert instances_ are created that render _actions_ and invoke them. To make action setup and update easier, actions refer to _connectors_ that centralize the information used to connect with {kib} services and third-party integrations.
image::images/alert-concepts-summary.svg[Alerts, actions, alert instances and connectors work together to convert detection into action]
* *Alert*: a specification of the conditions to be detected, the schedule for detection, and the response when detection occurs.
* *Action*: the response to a detected condition defined in the alert. Typically, actions specify a service or third-party integration along with alert details that will be sent to it.
* *Alert instance*: state tracked by {kib} for every occurrence of a detected condition. Actions as well as controls like muting and re-notification are controlled at the instance level.
* *Connector*: centralized configurations for services and third-party integrations that are referenced by actions.
[float]
[[alerting-concepts-differences]]
== Differences from Watcher
{kib} alerting and <<watcher-ui, {es} alerting>> are both used to detect conditions and can trigger actions in response, but they are completely independent alerting systems.
This section will clarify some of the important differences in the function and intent of the two systems.
Functionally, {kib} alerting differs in that:
* Scheduled checks are run on {kib} instead of {es}.
* {kib} <<alerting-concepts-conditions, alerts hide the details of detecting conditions>> through *alert types*, whereas watches provide low-level control over inputs, conditions, and transformations.
* {kib} alerting tracks and persists the state of each detected condition through *alert instances*. This makes it possible to mute and throttle individual instances, and to detect changes in state such as resolution.
* Actions are linked to *alert instances* in {kib} alerting. Actions are fired for each occurrence of a detected condition, rather than for the entire alert.
At a higher level, {kib} alerts allow rich integrations across use cases like <<xpack-apm,*APM*>>, <<xpack-infra,*Metrics*>>, <<xpack-siem,*SIEM*>>, and <<xpack-uptime,*Uptime*>>.
Pre-packaged *alert types* simplify setup and hide the details of complex domain-specific detections, while providing a consistent interface across {kib}.
[float]
[[alerting-setup-prerequisites]]
== Setup and prerequisites
If you are using an *on-premises* Elastic Stack deployment with <<using-kibana-with-security, *security*>>:
* TLS must be configured for communication <<configuring-tls-kib-es, between {es} and {kib}>>. {kib} alerting uses <<api-keys, API keys>> to secure background alert checks and actions, and API keys require {ref}/configuring-tls.html#tls-http[TLS on the HTTP interface].
* In the `kibana.yml` configuration file, add the <<alert-action-settings-kb,`xpack.encrypted_saved_objects.encryptionKey` setting>>.
[float]
[[alerting-security]]
== Security
To access alerting in a space, a user must have access to one of the following features:
* <<xpack-apm,*APM*>>
* <<xpack-infra,*Metrics*>>
* <<xpack-siem,*SIEM*>>
* <<xpack-uptime,*Uptime*>>
See <<kibana-feature-privileges, feature privileges>> for more information on configuring roles that provide access to these features.
[float]
[[alerting-spaces]]
=== Space isolation
Alerts and connectors are isolated to the {kib} space in which they were created. An alert or connector created in one space will not be visible in another.
[float]
[[alerting-authorization]]
=== Authorization
Alerts, including all background detection and the actions they generate, are authorized using an <<api-keys, API key>> associated with the last user to edit the alert. Upon creating or modifying an alert, an API key is generated for that user, capturing a snapshot of their privileges at that moment in time. The API key is then used to run all background tasks associated with the alert, including detection checks and executing actions.
[IMPORTANT]
==============================================
If an alert requires certain privileges to run, such as index privileges, keep in mind that if a user without those privileges updates the alert, the alert will no longer function.
==============================================
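The consequence described in the note above can be modeled with a hypothetical sketch. Each edit replaces the alert's stored key with a snapshot of the editing user's privileges at that moment. Each background check is authorized only against that snapshot. Privileges are simplified to a set of readable index names, and the types and functions are illustrative. They are not {kib}'s API key implementation.

```typescript
// Hypothetical user with a simplified privilege model: readable index names.
interface User { name: string; readableIndices: string[] }

// Stand-in for an API key: a frozen copy of the editor's privileges.
interface ApiKeySnapshot { owner: string; readableIndices: ReadonlySet<string> }

interface StoredAlert { index: string; apiKey: ApiKeySnapshot }

// Creating or updating an alert captures the editing user's current privileges.
function saveAlert(index: string, editor: User): StoredAlert {
  return { index, apiKey: { owner: editor.name, readableIndices: new Set(editor.readableIndices) } };
}

// Background checks run with the stored snapshot, not with a live user session.
function runCheck(a: StoredAlert): "ok" | "unauthorized" {
  return a.apiKey.readableIndices.has(a.index) ? "ok" : "unauthorized";
}

const admin: User = { name: "admin", readableIndices: ["web-logs"] };
const viewer: User = { name: "viewer", readableIndices: [] };

let storedAlert = saveAlert("web-logs", admin);
console.log(runCheck(storedAlert)); // ok

storedAlert = saveAlert(storedAlert.index, viewer); // a user without index privileges edits the alert
console.log(runCheck(storedAlert)); // unauthorized
```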
[float]
[[alerting-restricting-actions]]
=== Restricting actions
For security reasons, you may wish to limit the extent to which {kib} can connect to external services. <<action-settings>> allows you to disable certain <<action-types>> and whitelist the hostnames that {kib} can connect with.
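A hostname whitelist of this kind can be sketched as a check that runs before any outbound request. In this hypothetical sketch, the allowed list is a plain array of hostnames, and `*` is treated as "allow any host". That wildcard convention and the `isHostAllowed` function are assumptions for illustration. See <<action-settings>> for the actual setting names and semantics.

```typescript
// Hypothetical outbound-host check applied before an action contacts a service.
function isHostAllowed(targetUrl: string, allowedHosts: string[]): boolean {
  if (allowedHosts.includes("*")) return true; // assumed wildcard: allow every host
  const host = new URL(targetUrl).hostname.toLowerCase();
  return allowedHosts.some(h => h.toLowerCase() === host);
}

const allowedHosts = ["hooks.slack.com", "smtp.example.com"];
console.log(isHostAllowed("https://hooks.slack.com/services/T000", allowedHosts)); // true
console.log(isHostAllowed("https://evil.example.net/collect", allowedHosts));      // false
```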
--
include::defining-alerts.asciidoc[]
include::action-types.asciidoc[]
include::alert-types.asciidoc[]
include::alerting-scale-performance.asciidoc[]

@ -40,6 +40,8 @@ include::management.asciidoc[]
include::reporting/index.asciidoc[]
include::alerting/index.asciidoc[]
include::api.asciidoc[]
include::plugins.asciidoc[]

@ -83,6 +83,10 @@ a| <<advanced-options, *Advanced Settings*>>
Customize {kib} to suit your needs. Change the format for displaying dates, turn on dark mode,
set the timespan for notification messages, and much more.
| <<managing-alerts-and-actions, *Alerts and Actions*>>
Centrally manage your alerts from across {kib}. Create and manage re-usable connectors for triggering actions.
| <<managing-fields, *Index Patterns*>>
Create and manage the index patterns that help you retrieve your data from {es}.
@ -111,6 +115,14 @@ so you can tailor it to your needs without impacting others.
include::{kib-repo-dir}/management/advanced-options.asciidoc[]
include::{kib-repo-dir}/management/alerting/alerts-and-actions-intro.asciidoc[]
include::{kib-repo-dir}/management/alerting/alert-management.asciidoc[]
include::{kib-repo-dir}/management/alerting/alert-details.asciidoc[]
include::{kib-repo-dir}/management/alerting/connector-management.asciidoc[]
include::{kib-repo-dir}/management/managing-beats.asciidoc[]
include::{kib-repo-dir}/management/index-lifecycle-policies/intro-to-lifecycle-policies.asciidoc[]