[DOCS] Move machine learning details out of Kibana Guide (#45855)

Lisa Cawley 2019-09-17 14:00:46 -07:00 committed by GitHub
parent f6be95a751
commit c17188010c
14 changed files with 31 additions and 269 deletions

@@ -35,3 +35,18 @@ This page has moved. Please see <<infra-configure-source>>.
This page has moved. Please see <<xpack-logs-configuring>>.
[role="exclude",id="creating-df-kib"]
== Creating {transforms}
This page is deleted. Please see
{stack-ov}/ecommerce-dataframes.html[Transforming the eCommerce sample data].
[role="exclude",id="ml-jobs"]
== Creating {anomaly-jobs}
This page has moved. Please see {stack-ov}/create-jobs.html[Creating {anomaly-jobs}].
[role="exclude",id="job-tips"]
== Machine learning job tips
This page has moved. Please see {stack-ov}/job-tips.html[Machine learning job tips].

docs/user/extend.asciidoc Normal file
@@ -0,0 +1,12 @@
[[extend]]
= Extend your use case
[partintro]
--
//TBD
* <<xpack-ml>>
--
include::ml/index.asciidoc[]

@@ -16,7 +16,7 @@ include::dashboard.asciidoc[]
include::canvas.asciidoc[]
include::ml/index.asciidoc[]
include::extend.asciidoc[]
include::{kib-repo-dir}/maps/index.asciidoc[]

@@ -1,50 +0,0 @@
[role="xpack"]
[[creating-df-kib]]
== Creating {dataframe-transforms}
beta[]
You can create {stack-ov}/ml-dataframes.html[{dataframe-transforms}] in the
{kib} Machine Learning application.
[role="screenshot"]
image::user/ml/images/ml-definepivot.jpg["Defining a {dataframe} pivot"]
Select the index pattern or saved search you want to transform. To pivot your
data, you must group the data by at least one field and apply at least one
aggregation. The {dataframe} pivot preview on the right side provides a visual
verification.
Once you have created the pivot, add a job ID and define the index for the
transformed data (_target index_). If the target index does not exist, it will be
created automatically. You can optionally select to create a {kib} index pattern
for the target index. At the end of the process, a {dataframe} job is created as
a result.
[role="screenshot"]
image::user/ml/images/ml-jobid.jpg["Job ID and target index"]
After you create {dataframe} jobs, you can start, stop, and delete them
and explore their progress and statistics from the jobs list.
For a more detailed example of using {dataframes} with the {kib} sample data,
see {stack-ov}/ecommerce-dataframes.html[Transforming your data].
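The pivot rules above (group by at least one field, apply at least one aggregation, then write to a target index) can be sketched as a request body. This is a minimal Python sketch, not taken from this page: the index names, field names, and job ID are hypothetical, and the `PUT _data_frame/transforms/<job_id>` endpoint mentioned in the comment is the 7.x-era path, so verify it against your stack version.

```python
# Hypothetical pivot configuration for a transform: group the source data by
# at least one field and apply at least one aggregation, then write the
# result to a target index. Index, field, and job names are illustrative.
# The job would typically be created with: PUT _data_frame/transforms/<job_id>
transform_config = {
    "source": {"index": "kibana_sample_data_ecommerce"},
    "dest": {"index": "ecommerce-customer-summary"},  # the target index
    "pivot": {
        "group_by": {  # at least one group_by defines the pivot rows
            "customer_id": {"terms": {"field": "customer_id"}}
        },
        "aggregations": {  # at least one aggregation defines the pivot values
            "total_spent": {"sum": {"field": "taxful_total_price"}}
        },
    },
}

def is_valid_pivot(config):
    """The wizard's rule above: at least one group_by and one aggregation."""
    pivot = config.get("pivot", {})
    return bool(pivot.get("group_by")) and bool(pivot.get("aggregations"))
```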
[NOTE]
===============================
If {stack} {security-features} are enabled, you must have appropriate authority
to work with {dataframes}. For example, there are built-in
`data_frame_transforms_admin` and `data_frame_transforms_user` roles that have
`manage_data_frame_transforms` and `monitor_data_frame_transforms` cluster
privileges respectively. See
{stack-ov}/built-in-roles.html[Built-in roles] and
{stack-ov}/security-privileges.html[Security privileges].
Depending on what tasks you perform, you might require additional privileges.
For example, to create a {dataframe-transform} and generate a new target index,
you need `manage_data_frame_transforms` cluster privileges, `read` and
`view_index_metadata` privileges on the source index, and `read`, `create_index`,
and `index` privileges on the target index. For more information, see the
authorization details for each {ref}/data-frame-apis.html[{dataframe} API].
===============================
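The privileges the note lists can be collected into a custom role. A hedged sketch follows: the role shape matches the `PUT _security/role/<name>` API, the privilege names come from the note, but the role and index names are hypothetical.

```python
# Hypothetical custom role granting the privileges the note above lists for
# creating a transform and writing to a new target index. The privilege
# names come from the note; role and index names are illustrative only.
# Typically applied via: PUT _security/role/<role_name>
custom_transform_role = {
    "cluster": ["manage_data_frame_transforms"],
    "indices": [
        {  # source index: read access plus mapping metadata
            "names": ["ecommerce-source"],
            "privileges": ["read", "view_index_metadata"],
        },
        {  # target index: create it and index documents into it
            "names": ["ecommerce-summary"],
            "privileges": ["read", "create_index", "index"],
        },
    ],
}
```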

@@ -1,83 +0,0 @@
[role="xpack"]
[[ml-jobs]]
== Creating {anomaly-jobs}
{anomaly-jobs-cap} contain the configuration information and metadata
necessary to perform an analytics task.
{kib} provides the following wizards to make it easier to create jobs:
[role="screenshot"]
image::user/ml/images/ml-create-job.jpg[Create New Job]
A _single metric job_ is a simple job that contains a single _detector_. A
detector defines the type of analysis that will occur and which fields to
analyze. In addition to limiting the number of detectors, the single metric job
creation wizard omits many of the more advanced configuration options.
A _multi-metric job_ can contain more than one detector, which is more efficient
than running multiple jobs against the same data.
A _population job_ detects activity that is unusual compared to the behavior of
the population. For more information, see
{stack-ov}/ml-configuring-pop.html[Performing population analysis].
An _advanced job_ can contain multiple detectors and enables you to configure all
job settings.
{kib} can also recognize certain types of data and provide specialized wizards
for that context. For example, if you
<<add-sample-data,added the sample web log data set>>, the following wizard
appears:
[role="screenshot"]
image::user/ml/images/ml-data-recognizer-sample.jpg[A screenshot of the {kib} sample data web log job creation wizard]
TIP: Alternatively, after you load a sample data set on the {kib} home page, you can click *View data* > *ML jobs*. There are {anomaly-jobs} for both the sample eCommerce orders data set and the sample web logs data set.
If you use {filebeat-ref}/index.html[{filebeat}]
to ship access logs from your
http://nginx.org/[Nginx] and https://httpd.apache.org/[Apache] HTTP servers to
{es} and store them using fields and data types from the
{ecs-ref}/ecs-reference.html[Elastic Common Schema (ECS)], the following wizards
appear:
[role="screenshot"]
image::user/ml/images/ml-data-recognizer-filebeat.jpg[A screenshot of the {filebeat} job creation wizards]
If you use {auditbeat-ref}/index.html[{auditbeat}] to audit process
activity on your systems, the following wizards appear:
[role="screenshot"]
image::user/ml/images/ml-data-recognizer-auditbeat.jpg[A screenshot of the {auditbeat} job creation wizards]
Likewise, if you use the {metricbeat-ref}/metricbeat-module-system.html[{metricbeat} system module] to monitor your servers, the following
wizards appear:
[role="screenshot"]
image::user/ml/images/ml-data-recognizer-metricbeat.jpg[A screenshot of the {metricbeat} job creation wizards]
These wizards create {anomaly-jobs}, dashboards, searches, and visualizations
that are customized to help you analyze your {auditbeat}, {filebeat}, and
{metricbeat} data.
[NOTE]
===============================
If your data is located outside of {es}, you cannot use {kib} to create
your jobs and you cannot use {dfeeds} to retrieve your data in real time.
{anomaly-detect-cap} is still possible, however, by using APIs to
create and manage jobs and post data to them. For more information, see
{ref}/ml-apis.html[{ml-cap} {anomaly-detect} APIs].
===============================
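The API-only workflow the note describes boils down to three calls: create the job, open it, and post data to it. A minimal Python sketch of the request bodies, assuming the 7.x {ml} endpoint paths shown in the comments; the job ID, field names, and records are hypothetical.

```python
# Sketch of managing an anomaly detection job purely through the APIs, as the
# note above describes for data stored outside the cluster. Endpoint paths in
# the comments follow the 7.x ML APIs; job ID and field names are invented.
job_id = "http-requests"
job_body = {
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [{"function": "count"}],  # at least one detector required
    },
    "data_description": {"time_field": "timestamp"},
}
# Create the job:      PUT  _ml/anomaly_detectors/http-requests
# Open the job:        POST _ml/anomaly_detectors/http-requests/_open
# Post data directly:  POST _ml/anomaly_detectors/http-requests/_data
records = [{"timestamp": 1568757600000}, {"timestamp": 1568757660000}]
```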
////
Ready to get some hands-on experience? See
{stack-ov}/ml-getting-started.html[Getting Started with Machine Learning].
The following video tutorials also demonstrate single metric, multi-metric, and
advanced jobs:
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-single-metric-job[Machine Learning for the Elastic Stack: Creating a single metric job]
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-multi-metric-job[Machine Learning for the Elastic Stack: Creating a multi-metric job]
* https://www.elastic.co/videos/machine-learning-lab-3-detect-outliers-in-a-population[Machine Learning for the Elastic Stack: Detect Outliers in a Population]
////

7 binary image files deleted (not shown): 117 KiB, 204 KiB, 190 KiB, 156 KiB, 72 KiB, 319 KiB, 237 KiB

@@ -1,9 +1,6 @@
[role="xpack"]
[[xpack-ml]]
= {ml-cap}
[partintro]
--
== {ml-cap}
As datasets increase in size and complexity, the human effort required to
inspect dashboards or maintain rules for spotting infrastructure problems,
@@ -29,7 +26,7 @@ size). The *Data Visualizer* identifies the file format and field mappings. You
can then optionally import that data into an {es} index.
If you have a trial or platinum license, you can
<<ml-jobs,create {anomaly-jobs}>> and manage jobs and {dfeeds} from the *Job
create {anomaly-jobs} and manage jobs and {dfeeds} from the *Job
Management* pane:
[role="screenshot"]
@@ -67,11 +64,6 @@ browser so that it does not block pop-up windows or create an exception for your
{kib} URL.
For more information about the {anomaly-detect} feature, see
https://www.elastic.co/what-is/elastic-stack-machine-learning and
{stack-ov}/xpack-ml.html[{ml-cap} {anomaly-detect}].
--
include::creating-jobs.asciidoc[]
include::job-tips.asciidoc[]
include::creating-df-kib.asciidoc[]

@@ -1,124 +0,0 @@
[role="xpack"]
[[job-tips]]
=== Machine learning job tips
++++
<titleabbrev>Job tips</titleabbrev>
++++
When you create an {anomaly-job} in {kib}, the job creation wizards can
provide advice based on the characteristics of your data. By heeding these
suggestions, you can create jobs that are more likely to produce insightful {ml}
results.
[[bucket-span]]
==== Bucket span
The bucket span is the time interval that {ml} analytics use to summarize and
model data for your job. When you create an {anomaly-job} in {kib}, you can
choose to estimate a bucket span value based on your data characteristics.
NOTE: The bucket span must contain a valid time interval. For more information,
see {ref}/ml-job-resource.html#ml-analysisconfig[Analysis configuration objects].
If you choose a value that is larger than one day or is significantly different
than the estimated value, you receive an informational message. For more
information about choosing an appropriate bucket span, see
{stack-ov}/ml-buckets.html[Buckets].
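The bucket span sits in the job's `analysis_config` and must be a valid time interval such as `"15m"` or `"1h"`. A sketch with hypothetical values, plus a crude unit converter for illustration (the converter is not part of any Elastic API):

```python
# Hypothetical analysis_config fragment: the bucket span summarizes data
# into fixed intervals. Values much larger than one day, or far from the
# wizard's estimate, trigger an informational message per the text above.
analysis_config = {
    "bucket_span": "15m",
    "detectors": [{"function": "mean", "field_name": "responsetime"}],
}

def bucket_span_minutes(span):
    """Crude converter for common unit suffixes, for illustration only."""
    units = {"m": 1, "h": 60, "d": 1440}
    return int(span[:-1]) * units[span[-1]]
```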
[[cardinality]]
==== Cardinality
If there are logical groupings of related entities in your data, {ml} analytics
can make data models and generate results that take these groupings into
consideration. For example, you might choose to split your data by user ID and
detect when users are accessing resources differently than they usually do.
If the field that you use to split your data has many different values, the
job uses more memory resources. In particular, if the cardinality of the
`by_field_name`, `over_field_name`, or `partition_field_name` is greater than
1000, you are advised that there might be high memory usage.
Likewise, if you are performing population analysis and the cardinality of the
`over_field_name` is below 10, you are advised that this might not be a suitable
field to use. For more information, see
{stack-ov}/ml-configuring-pop.html[Performing Population Analysis].
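The two cardinality thresholds above (over 1000 for a split field, under 10 for a population field) can be sketched as a small advisory check. This helper is purely illustrative, not an Elastic API; the thresholds come from the text.

```python
# Sketch of the cardinality advice above: a by/over/partition field with more
# than 1000 distinct values suggests high memory usage, while a population
# (over) field with fewer than 10 may be unsuitable. Thresholds per the text.
def cardinality_advice(field_cardinality, is_population_field=False):
    if is_population_field and field_cardinality < 10:
        return "over_field may be unsuitable for population analysis"
    if field_cardinality > 1000:
        return "high memory usage likely"
    return "ok"
```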
[[detectors]]
==== Detectors
Each {anomaly-job} must have one or more _detectors_. A detector applies an
analytical function to specific fields in your data. If your job does not
contain a detector or the detector does not contain a
{stack-ov}/ml-functions.html[valid function], you receive an error.
If a job contains duplicate detectors, you also receive an error. Detectors are
duplicates if they have the same `function`, `field_name`, `by_field_name`,
`over_field_name` and `partition_field_name`.
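The duplicate-detector rule above can be sketched as a check over the five fields the text names. The detector dicts are hypothetical examples; the field names come from the text.

```python
# Two detectors are duplicates when function, field_name, by_field_name,
# over_field_name, and partition_field_name all match (per the text above).
DUP_KEYS = ("function", "field_name", "by_field_name",
            "over_field_name", "partition_field_name")

def has_duplicate_detectors(detectors):
    seen = set()
    for detector in detectors:
        signature = tuple(detector.get(key) for key in DUP_KEYS)
        if signature in seen:
            return True
        seen.add(signature)
    return False
```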
[[influencers]]
==== Influencers
When you create an {anomaly-job}, you can specify _influencers_, which are also
sometimes referred to as _key fields_. Picking an influencer is strongly
recommended for the following reasons:
* It allows you to more easily assign blame for the anomaly
* It simplifies and aggregates the results
The best influencer is the person or thing that you want to blame for the
anomaly. In many cases, users or client IP addresses make excellent influencers.
Influencers can be any field in your data; they do not need to be fields that
are specified in your detectors, though they often are.
As a best practice, do not pick too many influencers. For example, you generally
do not need more than three. If you pick many influencers, the results can be
overwhelming and there is a small overhead to the analysis.
The job creation wizards in {kib} can suggest which fields to use as influencers.
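Influencers are specified at the job level, separate from the detectors. A hypothetical job fragment following the advice above (a small number of fields naming the entities you might "blame" for an anomaly); the field names are invented:

```python
# Hypothetical analysis_config fragment with influencers. Note that the
# influencer fields need not appear in the detectors, per the text above,
# and keeping the list short (e.g. at most three) is the stated best practice.
job_fragment = {
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [{"function": "count", "partition_field_name": "status"}],
        "influencers": ["clientip", "username"],
    }
}
```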
[[model-memory-limits]]
==== Model memory limits
For each {anomaly-job}, you can optionally specify a `model_memory_limit`, which
is the approximate maximum amount of memory resources that are required for
analytical processing. The default value is 1 GB. Once this limit is approached,
data pruning becomes more aggressive. Upon exceeding this limit, new entities
are not modeled.
You can also optionally specify the `xpack.ml.max_model_memory_limit` setting.
By default, it's not set, which means there is no upper bound on the acceptable
`model_memory_limit` values in your jobs.
TIP: If you set the `model_memory_limit` too high, it will be impossible to open
the job; jobs cannot be allocated to nodes that have insufficient memory to run
them.
If the estimated model memory limit for an {anomaly-job} is greater than the
model memory limit for the job or the maximum model memory limit for the cluster,
the job creation wizards in {kib} generate a warning. If the estimated memory
requirement is only a little higher than the `model_memory_limit`, the job will
probably produce useful results. Otherwise, the actions you take to address
these warnings vary depending on the resources available in your cluster:
* If you are using the default value for the `model_memory_limit` and the {ml}
nodes in the cluster have lots of memory, the best course of action might be to
simply increase the job's `model_memory_limit`. Before doing this, however,
double-check that the chosen analysis makes sense. The default
`model_memory_limit` is relatively low to avoid accidentally creating a job that
uses a huge amount of memory.
* If the {ml} nodes in the cluster do not have sufficient memory to accommodate
a job of the estimated size, the only options are:
** Add bigger {ml} nodes to the cluster, or
** Accept that the job will hit its memory limit and will not necessarily find
all the anomalies it could otherwise find.
If you are using {ece} or the hosted Elasticsearch Service on Elastic Cloud,
`xpack.ml.max_model_memory_limit` is set to prevent you from creating jobs
that cannot be allocated to any {ml} nodes in the cluster. If you find that you
cannot increase `model_memory_limit` for your {ml} jobs, the solution is to
increase the size of the {ml} nodes in your cluster.
For more information about the `model_memory_limit` property and the
`xpack.ml.max_model_memory_limit` setting, see
{ref}/ml-job-resource.html#ml-analysisconfig[Analysis limits] and
{ref}/ml-settings.html[Machine learning settings].
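The warning logic described above can be sketched as a comparison of the estimated requirement against the job's `model_memory_limit`, capped by the optional cluster-wide `xpack.ml.max_model_memory_limit`. This helper is illustrative only; the 1 GB default comes from the text, and all values are in MB.

```python
# Sketch of the model-memory warning above: a warning fires when the
# estimated requirement exceeds the effective limit, which is the job's
# model_memory_limit (default 1 GB per the text) capped by the cluster-wide
# xpack.ml.max_model_memory_limit when that setting is configured.
def memory_warning(estimated_mb, job_limit_mb=1024, cluster_max_mb=None):
    effective_limit = job_limit_mb
    if cluster_max_mb is not None:
        effective_limit = min(job_limit_mb, cluster_max_mb)
    return estimated_mb > effective_limit
```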