kibana/docs/user/alerting/alerting-getting-started.asciidoc
ymao1 a7c9d3f1e0
[Alerting] Update UI to reflect new terminology (#93597)
* Renaming alerts to rules

* Updating formatted messages

* Updating i18n labels

* Completed renaming in UI

* Updating client routes including redirect

* wip docs update

* Reverting title changes for now

* Fixing types check

* Fixing unit tests

* Fixing functional test

* Fixing functional test

* docs wip

* wip docs update

* Finished first run through docs

* docs docs docs

* Fixing bad merge

* Fixing functional test

* Docs cleanup

* Cleaning up i18n labels

* Fixing functional test

* Updating screenshots

* Updating screenshots

* Updating screenshots

* Updating terminology in alerting examples

* Updating terminology in alerting examples

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
2021-03-15 10:03:39 -04:00

215 lines
13 KiB
Plaintext

[role="xpack"]
[[alerting-getting-started]]
= Alerting
--
Alerting allows you to define *rules* to detect complex conditions within different {kib} apps and trigger actions when those conditions are met. Alerting is integrated with {observability-guide}/create-alerts.html[*Observability*], {security-guide}/prebuilt-rules.html[*Security*], <<geo-alerting,*Maps*>> and {ml-docs}/ml-configuring-alerts.html[*{ml-app}*], can be centrally managed from the <<management,*Management*>> UI, and provides a set of built-in <<action-types, connectors>> and <<rule-types, rules>> (known as stack rules) for you to use.
image::images/alerting-overview.png[Rules and Connectors UI]
[IMPORTANT]
==============================================
To make sure you can access alerting and actions, see the <<alerting-setup-prerequisites, setup and pre-requisites>> section.
==============================================
[float]
== Concepts and terminology
Alerting works by running checks on a schedule to detect conditions defined by a *rule*. When a condition is met, the rule tracks it as an *alert* and responds by triggering one or more *actions*.
Actions typically involve interaction with {kib} services or third party integrations. *Connectors* allow actions to talk to these services and integrations.
This section describes all of these elements and how they operate together.
[float]
=== What is a rule?
A rule specifies a background task that runs on the {kib} server to check for specific conditions. It consists of three main parts:
* *Conditions*: what needs to be detected?
* *Schedule*: when/how often should detection checks run?
* *Actions*: what happens when a condition is detected?
For example, when monitoring a set of servers, a rule might check for average CPU usage > 0.9 on each server for the last two minutes (condition), checked every minute (schedule), sending a warning email message via SMTP with subject `CPU on {{server}} is high` (action).
image::images/what-is-a-rule.svg[Three components of a rule]
The following sections describe each part of the rule in more detail.
[float]
[[alerting-concepts-conditions]]
==== Conditions
Under the hood, {kib} rules detect conditions by running a javascript function on the {kib} server, which gives it the flexibility to support a wide range of conditions, anything from the results of a simple {es} query to heavy computations involving data from multiple sources or external systems.
These conditions are packaged and exposed as *rule types*. A rule type hides the underlying details of the condition, and exposes a set of parameters
to control the details of the conditions to detect.
For example, an <<rule-type-index-threshold, index threshold rule type>> lets you specify the index to query, an aggregation field, and a time window, but the details of the underlying {es} query are hidden.
See <<rule-types>> for the types of rules provided by {kib} and how they express their conditions.
[float]
[[alerting-concepts-scheduling]]
==== Schedule
Rule schedules are defined as an interval between subsequent checks, and can range from a few seconds to months.
[IMPORTANT]
==============================================
The intervals of rule checks in {kib} are approximate. The timing of their execution is affected by factors such as the frequency at which tasks are claimed and the task load on the system. See <<alerting-production-considerations, Alerting production considerations>> for more information.
==============================================
[float]
[[alerting-concepts-actions]]
==== Actions
Actions are invocations of connectors, which allow interaction with {kib} services or integrations with third-party systems. Actions run as background tasks on the {kib} server when rule conditions are met.
When defining actions in a rule, you specify:
* the *connector type*: the type of service or integration to use
* the connection for that type by referencing a <<alerting-concepts-connectors, connector>>
* a mapping of rule values to properties exposed for that type of action
The result is a template: all the parameters needed to invoke a service are supplied except for specific values that are only known at the time the rule condition is detected.
In the server monitoring example, the `email` connector type is used, and `server` is mapped to the body of the email, using the template string `CPU on {{server}} is high`.
When the rule detects the condition, it creates an <<alerting-concepts-alert-instances, alert>> containing the details of the condition, renders the template with these details such as server name, and executes the action on the {kib} server by invoking the `email` connector type.
image::images/what-is-an-action.svg[Actions are like templates that are rendered when an alert detects a condition]
See <<action-types>> for details on the types of connectors provided by {kib}.
[float]
[[alerting-concepts-alert-instances]]
=== Alerts
When checking for a condition, a rule might identify multiple occurrences of the condition. {kib} tracks each of these *alerts* separately and takes an action per alert.
Using the server monitoring example, each server with average CPU > 0.9 is tracked as an alert. This means a separate email is sent for each server that exceeds the threshold.
image::images/alerts.svg[{kib} tracks each detected condition as an alert and takes action on each alert]
[float]
[[alerting-concepts-suppressing-duplicate-notifications]]
=== Suppressing duplicate notifications
Since actions are executed per alert, a rule can end up generating a large number of actions. Take the following example where a rule is monitoring three servers every minute for CPU usage > 0.9:
* Minute 1: server X123 > 0.9. *One email* is sent for server X123.
* Minute 2: X123 and Y456 > 0.9. *Two emails* are sent, one for X123 and one for Y456.
* Minute 3: X123, Y456, Z789 > 0.9. *Three emails* are sent, one for each of X123, Y456, Z789.
In the above example, three emails are sent for server X123 in the span of 3 minutes for the same rule. Often it's desirable to suppress frequent re-notification. Operations like muting and throttling can be applied at the alert level. If we set the rule re-notify interval to 5 minutes, we reduce noise by only getting emails for new servers that exceed the threshold:
* Minute 1: server X123 > 0.9. *One email* is sent for server X123.
* Minute 2: X123 and Y456 > 0.9. *One email* is sent for Y456.
* Minute 3: X123, Y456, Z789 > 0.9. *One email* is sent for Z789.
[float]
[[alerting-concepts-connectors]]
=== Connectors
Actions often involve connecting with services inside {kib} or integrating with third-party systems.
Rather than repeatedly entering connection information and credentials for each action, {kib} simplifies action setup using *connectors*.
*Connectors* provide a central place to store connection information for services and integrations. For example if four rules send email notifications via the same SMTP service, they can all reference the same SMTP connector. When the SMTP settings change, you can update them once in the connector, instead of having to update four rules.
image::images/rule-concepts-connectors.svg[Connectors provide a central place to store service connection settings]
[float]
=== Summary
A *rule* consists of conditions, *actions*, and a schedule. When conditions are met, *alerts* are created that render *actions* and invoke them. To make action setup and update easier, actions use *connectors* that centralize the information used to connect with {kib} services and third-party integrations. The following example ties these concepts together:
image::images/rule-concepts-summary.svg[Rules, connectors, alerts and actions work together to convert detection into action]
. Anytime a *rule*'s conditions are met, an *alert* is created. This example checks for servers with average CPU > 0.9. Three servers meet the condition, so three alerts are created.
. Alerts create *actions* as long as they are not muted or throttled. When actions are created, the template that was setup in the rule is filled with actual values. In this example, three actions are created, and the template string {{server}} is replaced with the server name for each alert.
. {kib} invokes the actions, sending them to a third party *integration* like an email service.
. If the third party integration has connection parameters or credentials, {kib} will fetch these from the *connector* referenced in the action.
[float]
[[alerting-concepts-differences]]
== Differences from Watcher
{kib} alerting and <<watcher-ui, {es} alerting>> are both used to detect conditions and can trigger actions in response, but they are completely independent alerting systems.
This section will clarify some of the important differences in the function and intent of the two systems.
Functionally, {kib} alerting differs in that:
* Scheduled checks are run on {kib} instead of {es}
* {kib} <<alerting-concepts-conditions, rules hide the details of detecting conditions>> through *rule types*, whereas watches provide low-level control over inputs, conditions, and transformations.
* {kib} rules track and persist the state of each detected condition through *alerts*. This makes it possible to mute and throttle individual alerts, and detect changes in state such as resolution.
* Actions are linked to *alerts* in {kib} alerting. Actions are fired for each occurrence of a detected condition, rather than for the entire rule.
At a higher level, {kib} alerting allows rich integrations across use cases like <<xpack-apm,*APM*>>, <<metrics-app,*Metrics*>>, <<xpack-siem,*Security*>>, and <<uptime-app,*Uptime*>>.
Pre-packaged *rule types* simplify setup and hide the details of complex, domain-specific detections, while providing a consistent interface across {kib}.
[float]
[[alerting-setup-prerequisites]]
== Setup and prerequisites
If you are using an *on-premises* Elastic Stack deployment:
* In the kibana.yml configuration file, add the <<general-alert-action-settings,`xpack.encryptedSavedObjects.encryptionKey`>> setting.
* For emails to have a footer with a link back to {kib}, set the <<server-publicBaseUrl, `server.publicBaseUrl`>> configuration setting.
If you are using an *on-premises* Elastic Stack deployment with <<using-kibana-with-security, *security*>>:
* You must enable Transport Layer Security (TLS) for communication <<configuring-tls-kib-es, between {es} and {kib}>>. {kib} alerting uses <<api-keys, API keys>> to secure background rule checks and actions, and API keys require {ref}/configuring-tls.html#tls-http[TLS on the HTTP interface]. A proxy will not suffice.
[float]
[[alerting-setup-production]]
== Production considerations and scaling guidance
When relying on alerting and actions as mission critical services, make sure you follow the <<alerting-production-considerations,Alerting production considerations>>.
See <<alerting-scaling-guidance>> for more information on the scalability of {kib} alerting.
[float]
[[alerting-security]]
== Security
To access alerting in a space, a user must have access to one of the following features:
* Alerting
* <<xpack-apm,*APM*>>
* <<logs-app,*Logs*>>
* <<xpack-ml,*{ml-cap}*>>
* <<metrics-app,*Metrics*>>
* <<xpack-siem,*Security*>>
* <<uptime-app,*Uptime*>>
See <<kibana-feature-privileges, feature privileges>> for more information on configuring roles that provide access to these features.
Also note that a user will need +read+ privileges for the *Actions and Connectors* feature to attach actions to a rule or to edit a rule that has an action attached to it.
[float]
[[alerting-spaces]]
=== Space isolation
Rules and connectors are isolated to the {kib} space in which they were created. A rule or connector created in one space will not be visible in another.
[float]
[[alerting-authorization]]
=== Authorization
Rules, including all background detection and the actions they generate are authorized using an <<api-keys, API key>> associated with the last user to edit the rule. Upon creating or modifying a rule, an API key is generated for that user, capturing a snapshot of their privileges at that moment in time. The API key is then used to run all background tasks associated with the rule including detection checks and executing actions.
[IMPORTANT]
==============================================
If a rule requires certain privileges to run, such as index privileges, keep in mind that if a user without those privileges updates the rule, the rule will no longer function.
==============================================
[float]
[[alerting-restricting-actions]]
=== Restricting actions
For security reasons you may wish to limit the extent to which {kib} can connect to external services. <<action-settings>> allows you to disable certain <<action-types>> and allowlist the hostnames that {kib} can connect with.
--