Commit graph

73 commits

Author SHA1 Message Date
Georgii Gorbachev d16101f377
[Event Log] Extend ECS event schema with fields needed for Detection Engine (#95067)
**Related to:** https://github.com/elastic/kibana/pull/94143

## Summary

This PR adds new fields to the schema (`EventSchema`, `IEvent`):

- standard ECS fields: `error.*`, `event.*`, `log.level`, `log.logger`, `rule.*`
- custom field set `kibana.detection_engine`

We need these fields on the Detections side to implement the detection rule execution log. See the related proposal (https://github.com/elastic/kibana/pull/94143) for more details.

Also, this PR bumps the ECS version used in Event Log from `1.6.0` to the current `1.8.0`. The two versions are identical in terms of the fields used in Event Log, so this version increment caused no changes in the schema.
2021-03-29 14:59:36 +02:00
Gidi Meir Morris 619db36591
[Task manager] Adds support for limited concurrency tasks (#90365)
Adds support for limited concurrency on a Task Type.
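
A minimal sketch of how a per-type concurrency limit could be applied when choosing which tasks to claim. The names `maxConcurrency`, `selectClaimable`, and the shapes below are assumptions made for illustration; they are not taken from the PR.

```typescript
// Hypothetical sketch: these names are illustrative, not Task Manager's API.
interface TaskTypeDefinition {
  type: string;
  // Undefined means the type has no concurrency limit of its own.
  maxConcurrency?: number;
}

interface PendingTask {
  id: string;
  type: string;
}

// Picks tasks to claim while respecting each type's limit and the
// overall number of available workers.
function selectClaimable(
  pending: PendingTask[],
  runningByType: Map<string, number>,
  definitions: Map<string, TaskTypeDefinition>,
  availableWorkers: number
): PendingTask[] {
  const counts = new Map(runningByType);
  const claimed: PendingTask[] = [];
  for (const task of pending) {
    if (claimed.length >= availableWorkers) break;
    const limit = definitions.get(task.type)?.maxConcurrency;
    const inUse = counts.get(task.type) ?? 0;
    if (limit !== undefined && inUse >= limit) continue; // this type is saturated
    counts.set(task.type, inUse + 1);
    claimed.push(task);
  }
  return claimed;
}
```

Tasks of a saturated type are skipped rather than blocking the queue, so unrestricted types can still use the remaining workers.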
2021-02-11 14:46:14 +00:00
Brandon Kobel 4584a8b570
Elastic License 2.0 (#90099)
* Updating everything except the license headers themselves

* Applying ESLint rules

* Manually replacing the stragglers
2021-02-03 18:12:39 -08:00
Gidi Meir Morris c89f1f18d3
[Task Manager] Increment task attempts when they fail during markTaskAsRunning (#88669)
When something causes an exception in `TaskRunner.markTaskAsRunning()`, its execution fails, but this happens before we update the SO, which means that this failure does not count towards the `attempts` on the task. Task Manager will continue to try running this task forever.

This PR increments the `attempts` when a failure occurs during `TaskRunner.markTaskAsRunning()` to ensure such a task doesn't continue to run to infinity.
Note that this fix will not affect `scheduled` tasks, as they are designed to _ignore_ their `attempts` and run forever. In such a case this task will continue to consume Task Manager resources until canceled, but these failures will be logged and could be identified when needed.
2021-01-21 14:04:42 +00:00
Tiago Costa 69182a8628
chore(NA): create new x-pack cigroups and rebalancing them all (#88366)
* chore(NA): create new x-pack cigroups and rebalancing them all

* chore(NA): better cigroups balancing

* chore(NA): push rollup tests back into ciGroup1

* chore(NA): move some functional ml tests from cigroup3 into cigroup13

* chore(NA): move some more tests into ciGroup13

* chore(NA): use a single top level describe at x-pack/test/functional/apps/ml

* chore(NA): move settings into ciGroup13

* temporary test for es snapshots env

* Revert "temporary test for es snapshots env"

This reverts commit 789ebe7b9c.

* docs(NA): add missing documentation on the function tests describe split

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
2021-01-20 13:47:08 +00:00
Gidi Meir Morris 5e4402c374
[Alerting] Shift polling interval by random amount when Task Manager experiences consistent claim version conflicts (#88020)
This PR introduces a `pollingDelay` which is applied to the polling interval whenever the average percentage of tasks experiencing a version conflict is higher than a preconfigured threshold (defaults to 80%).
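
The idea can be sketched roughly as follows. The function name, its parameters, and the choice to draw the delay from within one polling interval are assumptions for illustration, not the PR's implementation.

```typescript
// Hypothetical sketch: names and the delay range are assumptions.
function nextPollingDelay(
  recentConflictPercentages: number[],
  pollIntervalMs: number,
  thresholdPercent: number = 80,
  random: () => number = Math.random
): number {
  if (recentConflictPercentages.length === 0) return 0;
  const average =
    recentConflictPercentages.reduce((sum, p) => sum + p, 0) /
    recentConflictPercentages.length;
  // Only shift the interval when conflicts are consistently above the threshold.
  if (average <= thresholdPercent) return 0;
  // Random shift so that competing instances drift out of lockstep.
  return Math.floor(random() * pollIntervalMs);
}
```

Passing `random` as a parameter keeps the sketch deterministic under test.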
2021-01-12 23:34:07 +00:00
Gidi Meir Morris f384c484b7
[Task Manager] adds additional polling stats to Task Manager monitoring (#87766)
Adds additional polling stats to Task Manager monitoring:

- **duration**: Running average of polling duration measuring the time from the scheduled polling cycle start until all claimed tasks are marked as running
- **claim_conflicts**: Running average of number of version clashes caused by the markAvailableTasksAsClaimed stage of the polling cycle
- **claim_mismatches**: Running average of mismatch between the number of tasks updated by the markAvailableTasksAsClaimed stage of the polling cycle and the number of docs found by the sweepForClaimedTasks stage
- **load**: Running average of the percentage of workers in use at the end of each polling cycle.
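
A running average of the kind listed above could be kept over a fixed window of recent polling cycles. This is a minimal sketch; `createRunningAverage`, `workerLoadPercent`, and the window size are hypothetical names and choices, not taken from the PR.

```typescript
// Hypothetical sketch: a windowed running average over recent polling cycles.
function createRunningAverage(windowSize: number): (value: number) => number {
  const samples: number[] = [];
  return (value: number): number => {
    samples.push(value);
    if (samples.length > windowSize) samples.shift(); // drop the oldest sample
    return samples.reduce((sum, v) => sum + v, 0) / samples.length;
  };
}

// "load" as described above: percentage of workers in use at the end of a cycle.
function workerLoadPercent(busyWorkers: number, totalWorkers: number): number {
  return totalWorkers === 0 ? 0 : (busyWorkers / totalWorkers) * 100;
}
```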
2021-01-11 18:32:24 +00:00
Gidi Meir Morris e0db4a3f0b
[Task Manager] adds more granular polling results to monitoring stats (#87494)
Added the following values to the Polling stats:

- **NoAvailableWorkers**: This tells us when a polling cycle resulted in no tasks being claimed due to there being no available workers 
- **RunningAtCapacity**: This tells us when a polling cycle resulted in tasks being claimed at 100% capacity of the available workers
- **Failed**: This tells us when the poller failed to claim
2021-01-06 18:00:52 +00:00
Tyler Smalley c5e9543fc9 Revert "chore(NA): rebalance x-pack cigroups (#85797)"
This reverts commit 1e3a483b06.
2020-12-16 15:28:53 -08:00
Tiago Costa 1e3a483b06
chore(NA): rebalance x-pack cigroups (#85797) 2020-12-16 09:58:46 -08:00
ymao1 cbc61afcce
[Task Manager] Skip removed task types when claiming tasks (#84273)
* Checking if task type is in registered list

* Loading esArchiver data with removed task type for testing

* PR fixes
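
The check described in the first bullet might look like this in outline. The `StoredTask` shape and the function name are assumptions for illustration only.

```typescript
// Hypothetical sketch: skip tasks whose type is no longer registered.
interface StoredTask {
  id: string;
  taskType: string;
}

function skipRemovedTaskTypes(
  tasks: StoredTask[],
  registeredTypes: ReadonlySet<string>
): StoredTask[] {
  // A task whose type was removed has no definition to run it with.
  return tasks.filter((task) => registeredTypes.has(task.taskType));
}
```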
2020-12-02 11:49:24 -05:00
Patrick Mueller 50dbe8f171
[event_log] index event docs in bulk instead of individually (redo) (#83927)
resolves #55634
resolves #65746

Buffers event docs being written for a fixed interval / buffer size,
and indexes those docs via a bulk ES call.

Also now flushing those buffers at plugin stop() time, which
we couldn't do before with the single index calls, which were
run via `setImmediate()`.

This is a redo of PR https://github.com/elastic/kibana/pull/80941 which
had to be reverted.
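
A minimal sketch of the buffering approach, assuming a synchronous `bulkIndex` callback for simplicity (a real bulk ES call would be asynchronous). `createBufferedWriter` and `maxBufferSize` are hypothetical names, not the plugin's API.

```typescript
// Hypothetical sketch: buffer docs and hand them to a bulk indexer in batches.
type BulkIndexer<T> = (docs: T[]) => void;

function createBufferedWriter<T>(bulkIndex: BulkIndexer<T>, maxBufferSize: number) {
  let buffer: T[] = [];

  const flush = (): void => {
    if (buffer.length === 0) return;
    const docs = buffer;
    buffer = []; // reset first so docs written during indexing start a new batch
    bulkIndex(docs);
  };

  return {
    write(doc: T): void {
      buffer.push(doc);
      if (buffer.length >= maxBufferSize) flush();
    },
    // Intended to be called on a fixed interval and once more from stop().
    flush,
  };
}
```

Calling `flush()` from the plugin's `stop()` is what lets buffered docs be written before shutdown.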
2020-11-20 13:49:30 -05:00
spalger 2fef237ca0 Revert "[event_log] index event docs in bulk instead of individually (#80941)"
This reverts commit 5bfe665028.
2020-11-19 19:15:58 -07:00
Patrick Mueller 5bfe665028
[event_log] index event docs in bulk instead of individually (#80941)
resolves https://github.com/elastic/kibana/issues/55634
resolves https://github.com/elastic/kibana/issues/65746

Buffers event docs being written for a fixed interval / buffer size,
and indexes those docs via a bulk ES call.

Also now flushing those buffers at plugin stop() time, which
we couldn't do before with the single index calls, which were
run via `setImmediate()`.
2020-11-19 20:21:34 -05:00
Gidi Meir Morris 3b0215c26b
[Task Manager] Ensures retries are inferred from the schedule of recurring tasks (#83682)
This addresses a bug in Task Manager in the task timeout behaviour. When a recurring task's `retryAt` field is set (which happens at task run), it is currently scheduled to the task definition's `timeout` value, but the original intention was for these tasks to retry on their next scheduled run (originally identified as part of https://github.com/elastic/kibana/issues/39349).

In this PR we ensure recurring task retries are scheduled according to their recurring schedule, rather than the default `timeout` of the task type.
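
The change in behaviour can be sketched as follows. Durations are plain milliseconds here as a simplifying assumption, and `computeRetryAt` and the field names are illustrative rather than Task Manager's own.

```typescript
// Hypothetical sketch: recurring tasks retry on their schedule, others on the timeout.
interface RunningTask {
  startedAt: Date;
  // Present only for recurring tasks (assumed to be in milliseconds here).
  schedule?: { intervalMs: number };
  // Timeout from the task type definition, in milliseconds.
  timeoutMs: number;
}

function computeRetryAt(task: RunningTask): Date {
  const delayMs = task.schedule ? task.schedule.intervalMs : task.timeoutMs;
  return new Date(task.startedAt.getTime() + delayMs);
}
```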
2020-11-19 14:37:28 +00:00
Gidi Meir Morris 13fe95b400
Enables the EventLog Client to query across ILM versions of the .event-log index (#81920)
Fixes a bug in the EventLog client which caused it to query for events created in the current version instead of querying across versions.
2020-10-29 12:32:36 +00:00
ymao1 e7f425a8ab
Fixing flaky test (#81901) 2020-10-28 14:21:41 -04:00
Tyler Smalley 07930895cf skip flaky suite (#81853) 2020-10-27 14:22:33 -07:00
Gidi Meir Morris 5dfa45d666
[Task Manager] adds basic observability into Task Manager's runtime operations (#77868)
This PR adds an internal monitoring mechanism in Task Manager which keeps track of a variety of metrics, and a health API endpoint which makes the monitored statistics accessible.
2020-10-27 15:58:04 +00:00
ymao1 e6ab812891
[Task Manager] Mark task as failed if maxAttempts has been met. (#80681)
* wip

* Adding updateFieldsAndMarkAsFailed function

* Updating UBQ

* Only updating retryAt if marking as claiming

* Updating query

* Updating query to only fail one time tasks that have exceeded max attempts

* Fixing tests

* Fixing tests

* Handling claiming tasks by id

* Removing unused function

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
2020-10-27 07:40:44 -04:00
Gidi Meir Morris eb03295f85
[Task manager] Prevents edge case where already running tasks are rescheduled every polling interval (#74606)
Fixes flaky tests in Task Manager and Alerting.

The fix in #73244 was correct, but it missed an edge case which causes the already running task to be rescheduled over and over.

This prevents that edge case, which was affecting both TM in general and Alerting specifically.
2020-08-13 12:20:38 +01:00
spalger b001301f5a skip flaky suite (#71390)
(cherry picked from commit d0afbd887d)
2020-08-05 15:02:24 -07:00
Gidi Meir Morris 5c770e5930
[Task Manager] Correctly handle running tasks when calling RunNow and reduce flakiness in related tests (#73244)
This PR addresses two issues which caused several tests to be flaky in TM.

When `runNow` was introduced to TM we added a pinned query which returned specific tasks by ID.
This query does not have the filter applied to it, which causes tasks to be returned when they're already marked as `running`, but we didn't handle these correctly, which caused flakiness in the tests.
This didn't cause broken behaviour, but it did cause behaviour that was hard to reason about - we now address them correctly.

It seems that sometimes, especially if the ES queue is overworked, it can take some time for the update to the underlying task to be visible (we don't use `refresh:true` on purpose), so we're adding a wait for the index to refresh to make sure the task is updated in time for the next stage of the test.
2020-08-05 17:35:38 +01:00
Mikhail Shustov 88c0631344
Update @typescript-eslint to ensure compatibility with TypeScript v3.9 (#74091)
* bump @typescript-eslint deps

* update rules

* fix errors in packages

* fix src/

* fix x-pack

* fix test

* fix typings

* fix examples

* allow _ as prefix and suffix

* roll back prefix and suffix changes

* add eslint-plugin-eslint-comments

* report unused rules

* remove unused eslint comments from tests

* remove unused eslint comments 2nd pass

* remove unused eslint comments from src/

* remove unused comments in x-pack

* use no-script-url and no-unsanitized/property for ts files

* remove unused eslint comments

* eui/href-or-on-click removed when not complained

* no import/* rules for ts files

* cleanup

* remove the unused eslint-disable

* rollback unnecessary changes

* allow underscore prefix & suffix in type name

* update docs

* fix type error in enterprise search plugin mocks

* rename platform hack __coreProvider --> _coreProvider

* rollback space removal in src/core/public/legacy/legacy_service.test.ts

* fix naming convention in APM
2020-08-05 17:32:19 +02:00
Mikhail Shustov 585d58c202
[KP] Expose new es client (#73651)
* mark legacy ES client types as deprecated

* expose es client to plugins and update mocks

* ElasticSearchClientMock --> ElasticsearchClientMock

* expose es client mocks

* expose es client via RequestHandlerContext

* convert test/plugin_functional/config into ts

* convert top_nav test into ts

* add an integration test for the es client

* update comments to refer to the new es client

* fix import paths. do not use extensions

temp

* update docs

* fix other refs

* add test for a custom client

* fix context

* add test for scoped client

* update docs
2020-07-30 19:12:37 +02:00
spalger d0afbd887d skip flaky suite (#71390) 2020-07-16 08:47:23 -07:00
Patrick Mueller b167d77e3e
[eventLog] search for actions/alerts as hidden saved objects (#70395)
resolves https://github.com/elastic/kibana/issues/70086

Configures the saved object client for the event log to access the recently
hidden action and alert saved objects.

We didn't have tests for action/alert event log activity, so added some now.

Also found a buglet that was preventing access to event log data from actions
and alerts in non-default spaces.
2020-07-16 09:10:51 -04:00
Larry Gregory a9b2d50e76
Record security feature usage (#67526)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-04 12:29:28 -04:00
Brandon Kobel ce47ef5d24
Updating the licensed feature usage API response format (#67712)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-01 10:09:07 -07:00
restrry bf04235dae apply prettier styles 2020-05-22 09:08:58 +02:00
Gidi Meir Morris 64c09318fe
[Event log] Fix flaky test (#65658)
fixes flaky test in Event Log
2020-05-07 20:52:11 +01:00
spalger a1f8ad039e skip flaky suite (#64812) (#64723) 2020-05-05 16:17:23 -07:00
Gidi Meir Morris 9cfe4cf659
[Event Log] Ensure sorting tests are less flaky (#64781)
Creating events in parallel may be causing slight flakiness; this change staggers creation to ensure this doesn't happen.
In addition it turned out the `event.end` field was missing in certain cases, causing the test that sorts by `end` to fail.
2020-05-04 13:39:25 +01:00
Patrick Mueller f85b3898f6
[Event Log] add rel=primary to saved objects for query targets (#64615)
resolves https://github.com/elastic/kibana/issues/62668

Adds a property named `rel` to the nested saved objects in the event
documents; its value is either left unset or set to `primary`.
The query-by-saved-object function now matches an event document only
if the saved object being queried has the `rel: primary` value in it.

This is used to limit searching alerting's executeAction event document
with only the alert saved object, and not the action saved object (this
document has an alert and action saved object). The alert saved object
has the `rel: primary` field set, and the action does not.  Previously,
those documents were returned with a query of the action saved object.
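
The matching rule can be sketched in memory like this. The `EventDoc` shape and the function name are assumptions for illustration; the real query runs against the nested saved objects in Elasticsearch.

```typescript
// Hypothetical sketch: only saved objects marked rel: 'primary' are query targets.
interface SavedObjectRef {
  type: string;
  id: string;
  rel?: 'primary';
}

interface EventDoc {
  action: string;
  savedObjects: SavedObjectRef[];
}

function matchesSavedObject(event: EventDoc, type: string, id: string): boolean {
  return event.savedObjects.some(
    (so) => so.rel === 'primary' && so.type === type && so.id === id
  );
}
```

An executeAction event that references both an alert (with `rel: 'primary'`) and an action (without it) would then match a query for the alert but not one for the action.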
2020-04-30 00:27:51 -04:00
spalger 2e410d8952 skip flaky suite (#64812) (#64723) 2020-04-29 12:53:06 -07:00
Gidi Meir Morris 9fe7229357
[Alerting] migrates all remaining plugins to new platform (#64335)
Completes the migration of all Alerting Services plugins onto the Kibana Platform

It includes:

1. Actions plugin
2. Alerting plugin
3. Task Manager plugin
4. Triggers UI plugin

It also touches the Uptime and Siem plugins, as their use of the Task Manager relied on some of the legacy lifecycle to work (registering AlertTypes and Telemetry tasks after the Start stage has already begun). The fix was simply to move these registrations to the Setup stage.
2020-04-29 15:46:54 +01:00
Patrick Mueller 4e0c11ea40
[Event Log] use @timestamp field for queries (#64391)
resolves https://github.com/elastic/kibana/issues/64275

Changes the fields used to query the event log by time range to use the
`@timestamp` field.

Also allow `@timestamp` as a sort option, and make it the default sort option.
2020-04-28 12:37:25 -04:00
Pierre Gayvallet 2b3fadebf9
add licensed feature usage API (#63549)
* add licensed feature usage API

* add FTR test and plugin

* jsdoc

* fix FTR test suite name

* remove clear API

* accept Date for notifyUsage
2020-04-28 09:39:57 +02:00
Yuliia Naumenko 2af91b3c51
Added server api tests for event log service (#63540)
* Added server api tests for event log service

* fixed tests

* fixed type check issue

* Fixed failing tests

* fixed jest tests

* Fixed due to comments

* Removed flakiness tests

* fixed type check error

* Fixed func test
2020-04-17 09:50:08 -07:00
Gidi Meir Morris 1f732ad29a
[Event Log] Adds namespace into saved objects (#62974)
Adds a namespace attribute to the saved object object within the Event Log so that each Saved Object can have its own. This change also removes the existing kibana.namespace field.

As Event Log is not yet in use, this does not include a migration.
2020-04-14 10:57:46 +01:00
Gidi Meir Morris e7a4ca261b
[Event Log] adds query support to the Event Log (#62015)
* added Start api on Event Log plugin

* added empty skeleton for Event Log FTs

* added functional test to public find events api

* added test for pagination

* fixed unit tests

* added support for date ranges

* removed unused code

* replaces validation typing

* Revert "replaces validation typing"

This reverts commit 711c098e9b.

* replaces match with term

* added sorting

* fixed saved objects nested query

* updated plugin FTs path

* Update x-pack/plugins/encrypted_saved_objects/README.md

Co-Authored-By: Aleh Zasypkin <aleh.zasypkin@gmail.com>

* Update x-pack/plugins/encrypted_saved_objects/README.md

Co-Authored-By: Aleh Zasypkin <aleh.zasypkin@gmail.com>

* removed validation from tests

* fixed typos

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Aleh Zasypkin <aleh.zasypkin@gmail.com>
2020-04-06 18:02:58 +01:00
Aleh Zasypkin 4d8bae4a4c
Migrate Security and EncryptedSavedObjects test plugins to the Kibana Platform (#61614) 2020-03-30 19:38:39 +02:00
Gidi Meir Morris a02232d62b adds ability to fetch Alert and Alert Instance state (#56625)
Enables access to the Alert State, which allows us to see which current Alert Instances are active.

This includes:

1. Addition of a `get` api on Task Manager
2. Typing and validation on Serialisation & Deserialisation of the State of an Alert's underlying Task
3. Addition of the `getAlertState` api on AlertsClient
2020-02-10 12:11:20 +13:00
Gidi Meir Morris 8458e47614
removes usage of the _id field in Task manager (#54765)
As of Elasticsearch 8.0.0 it will no longer be possible to use the _id field on documents.
This PR removes the usage that Task Manager makes of this field and switches to pinned queries to achieve a similar effect.
2020-01-16 09:55:51 +00:00
Gidi Meir Morris ea9a7b8a16
migrate TaskManager Plugin to the Kibana Platform (#53869)
Migrates the existing TaskManager plugin from Legacy to Kibana Platform.
We retain the Legacy API to prevent a breaking change, but under the hood, the legacy plugin is now using the Kibana Platform plugin.

Another reason we retain the Legacy plugin is to support several features that the Platform team has yet to migrate to Kibana Platform (mapping, SO schema and migrations).
2020-01-13 19:09:57 +00:00
Gidi Meir Morris b09653ac74
moved Task Manager server code under "server" directory (#53777)
Changes Task Manager folder structure to include a "server" folder as required by our linting rules as part of the migration to the Kibana Platform
2020-01-03 12:07:17 +00:00
Gidi Meir Morris 2b6ef5c2bb
Moves Task manager's interval under a generic schedule field (#52727)
This moves the interval field under a generic schedule object field in preparation for the introduction of richer scheduling options (such as cron).

It includes a migration for existing tasks, and we've ensured no existing Task Type Definitions exist in Kibana that rely on Interval.

This includes support for the deprecated interval field (which gets mapped to schedule) but that support will be removed in 8.0.0, as it's a breaking change.
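
The mapping from the deprecated field can be sketched as below. The `TaskDoc` shape is an assumption, as is the choice to keep an existing `schedule` when both fields are present.

```typescript
// Hypothetical sketch: map the deprecated top-level interval onto schedule.
interface TaskDoc {
  id: string;
  taskType: string;
  interval?: string; // deprecated
  schedule?: { interval: string };
}

function migrateIntervalToSchedule(task: TaskDoc): TaskDoc {
  const { interval, ...rest } = task;
  if (interval === undefined) return rest;
  // Assumption: an explicit schedule wins over the deprecated field.
  return { ...rest, schedule: rest.schedule ?? { interval } };
}
```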
2019-12-17 15:16:40 +00:00
Gidi Meir Morris bb98e9a2b8
[Task Manager] Adds runNow api to Task Manager (#51601)
Adds a `runNow` api to Task Manager, allowing us to force the refresh of a recurring task.

This PR includes a couple of sustainability changes as well as the feature itself.

1. **Declarative query composition.** At the moment the queries in the TaskStore are huge JSON objects that are hard to maintain and understand. This PR introduces a pattern where the different parts of the query are composed out of type-checked functions, making it easier to maintain and to construct dynamically as needs change. _This was included in this PR as the **markAvailableTasksAsClaimed** query needs different query clauses depending on whether there are specific Tasks we wish to claim first._

2. **Refactoring of the Task Poller** As the `runNow` api is introduced we find Task Manager's lifecycle in a weird state where it has both a _pull_ model, where timeouts & callbacks interact without having to respond to any external requests, and a _push_ model where requests are made to the new `runNow` api. Balancing these two proved error prone, hard to maintain and had the potential of _lossy_ behaviour where requests are dropped accidentally. To address this TaskPoller has been refactored using Rxjs observables, remodelling the existing _pull_ mechanism as a _push_ mechanism so Task Manager can _respond_ to both _polling_ calls and _runNow_ in a similar fashion.

And of course the main feature of this PR:

3. **runNow api** An api on TaskManager that takes a _task ID_ and attempts to run the task. The call returns a promise which resolves with a result which notifies the caller when the task has either completed successfully, or resulted in an error.
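
The declarative query composition described in item 1 can be illustrated with a few small, typed helpers. This is a sketch under assumptions: the helper names, the field names `task.status` and `task.runAt`, and the use of an `ids` clause are chosen for illustration and are not taken from the TaskStore.

```typescript
// Hypothetical sketch: compose an Elasticsearch-style query from small typed helpers.
type QueryClause = { [key: string]: unknown };

const term = (field: string, value: string): QueryClause => ({ term: { [field]: value } });
const rangeLte = (field: string, value: string): QueryClause => ({
  range: { [field]: { lte: value } },
});
const allOf = (...clauses: QueryClause[]): QueryClause => ({ bool: { must: clauses } });
const anyOf = (...clauses: QueryClause[]): QueryClause => ({ bool: { should: clauses } });
const byIds = (ids: string[]): QueryClause => ({ ids: { values: ids } });

// The clauses differ depending on whether specific tasks should be claimed first.
function claimableTasksQuery(now: string, claimFirstIds: string[] = []): QueryClause {
  const idleAndDue = allOf(term('task.status', 'idle'), rangeLte('task.runAt', now));
  return claimFirstIds.length > 0 ? anyOf(byIds(claimFirstIds), idleAndDue) : idleAndDue;
}
```

Because each helper returns a plain object, the composed query stays serialisable while the call sites read closer to the intent than a hand-written JSON literal.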
2019-12-16 14:12:25 +00:00
spalger 8e9a8a84dc autofix all violations 2019-12-13 23:17:13 -07:00
Gidi Meir Morris cfed9c6c48
[Task Manager] Tests for the ability to run tasks of varying durations in parallel (#51572)
This PR adds a test that ensures Task Manager is capable of picking up new tasks in parallel to a long-running task that might otherwise hold up task execution.

This doesn't add functionality - just a missing test case.
2019-11-26 10:35:56 +00:00