kibana/docs/user/monitoring/kibana-alerts.asciidoc
Mat Schaffer 0ec5148c73
Fix grammar on stack monitoring alerts modal (#109360)
Pretty sure the rules only get created in the current kibana space, but the message is plural. Switching to singular.

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
2021-08-23 13:06:24 +09:00

133 lines
5.8 KiB
Text

[role="xpack"]
[[kibana-alerts]]
= {kib} alerts
The {stack} {monitor-features} provide
<<alerting-getting-started,{kib} alerting rules>> out-of-the box to notify you
of potential issues in the {stack}. These rules are preconfigured based on the
best practices recommended by Elastic. However, you can tailor them to meet your
specific needs.
[role="screenshot"]
image::user/monitoring/images/monitoring-kibana-alerting-notification.png["{kib} alerting notifications in {stack-monitor-app}"]
When you open *{stack-monitor-app}* for the first time, you will be asked to acknowledge the creation of these default rules. They are initially configured to detect and notify on various
conditions across your monitored clusters. You can view notifications for: *Cluster health*, *Resource utilization*, and *Errors and exceptions* for {es}
in real time.
NOTE: The default {watcher} based "cluster alerts" for {stack-monitor-app} have
been recreated as rules in {kib} {alert-features}. For this reason, the existing
{watcher} email action
`monitoring.cluster_alerts.email_notifications.email_address` no longer works.
The default action for all {stack-monitor-app} rules is to write to {kib} logs
and display a notification in the UI.
To review and modify existing *{stack-monitor-app}* rules, click *Enter setup mode* on the *Cluster overview* page.
Alternatively, to manage all rules, including create and delete functionality go to *Stack Management > Rules and Connectors*.
[discrete]
[[kibana-alerts-cpu-threshold]]
== CPU usage threshold
This rule checks for {es} nodes that run a consistently high CPU load. By
default, the condition is set at 85% or more averaged over the last 5 minutes.
The default rule checks on a schedule time of 1 minute with a re-notify interval of 1 day.
[discrete]
[[kibana-alerts-disk-usage-threshold]]
== Disk usage threshold
This rule checks for {es} nodes that are nearly at disk capacity. By default,
the condition is set at 80% or more averaged over the last 5 minutes. The default rule
checks on a schedule time of 1 minute with a re-notify interval of 1 day.
[discrete]
[[kibana-alerts-jvm-memory-threshold]]
== JVM memory threshold
This rule checks for {es} nodes that use a high amount of JVM memory. By
default, the condition is set at 85% or more averaged over the last 5 minutes.
The default rule checks on a schedule time of 1 minute with a re-notify interval of 1 day.
[discrete]
[[kibana-alerts-missing-monitoring-data]]
== Missing monitoring data
This rule checks for {es} nodes that stop sending monitoring data. By default,
the condition is set to missing for 15 minutes looking back 1 day. The default rule checks on a schedule
time of 1 minute with a re-notify interval of 6 hours.
[discrete]
[[kibana-alerts-thread-pool-rejections]]
== Thread pool rejections (search/write)
This rule checks for {es} nodes that experience thread pool rejections. By
default, the condition is set at 300 or more over the last 5 minutes. The default rule
checks on a schedule time of 1 minute with a re-notify interval of 1 day. Thresholds can be set
independently for `search` and `write` type rejections.
[discrete]
[[kibana-alerts-ccr-read-exceptions]]
== CCR read exceptions
This rule checks for read exceptions on any of the replicated {es} clusters. The
condition is met if 1 or more read exceptions are detected in the last hour. The
default rule checks on a schedule time of 1 minute with a re-notify interval of 6 hours.
[discrete]
[[kibana-alerts-large-shard-size]]
== Large shard size
This rule checks for a large average shard size (across associated primaries) on
any of the specified index patterns in an {es} cluster. The condition is met if
an index's average shard size is 55gb or higher in the last 5 minutes. The default rule
matches the pattern of `-.*` by running checks on a schedule time of 1 minute with a re-notify interval of 12 hours.
[discrete]
[[kibana-alerts-cluster-alerts]]
== Cluster alerting
These rules check the current status of your {stack}. You can drill down into
the metrics to view more information about your cluster and specific nodes, instances, and indices.
An action is triggered if any of the following conditions are met within the
last minute:
* {es} cluster health status is yellow (missing at least one replica)
or red (missing at least one primary).
* {es} version mismatch. You have {es} nodes with
different versions in the same cluster.
* {kib} version mismatch. You have {kib} instances with different
versions running against the same {es} cluster.
* Logstash version mismatch. You have Logstash nodes with different
versions reporting stats to the same monitoring cluster.
* {es} nodes changed. You have {es} nodes that were recently added or removed.
* {es} license expiration. The cluster's license is about to expire.
+
--
If you do not preserve the data directory when upgrading a {kib} or
Logstash node, the instance is assigned a new persistent UUID and shows up
as a new instance.
--
* Subscription license expiration. When the expiration date
approaches, you will get notifications with a severity level relative to how
soon the expiration date is:
** 60 days: Informational alert
** 30 days: Low-level alert
** 15 days: Medium-level alert
** 7 days: Severe-level alert
+
The 60-day and 30-day thresholds are skipped for Trial licenses, which are only
valid for 30 days.
[discrete]
== Alerts and rules
[discrete]
=== Create default rules
This option can be used to create default rules in this kibana space. This is
useful for scenarios when you didn't choose to create these default rules initially
or anytime later if the rules were accidentally deleted.
NOTE: Some action types are subscription features, while others are free.
For a comparison of the Elastic subscription levels, see the alerting section of
the {subscriptions}[Subscriptions page].