opendistro-for-elasticsearch / alerting

📟 Open Distro Alerting Plugin

Home Page: https://opendistro.github.io/for-elasticsearch/features/alerting.html

License: Apache License 2.0



Open Distro for Elasticsearch Alerting

The Open Distro for Elasticsearch Alerting plugin enables you to monitor your data and automatically send alert notifications to your stakeholders. With an intuitive Kibana interface and a powerful API, it is easy to set up, manage, and monitor your alerts. You can craft highly specific alert conditions using Elasticsearch's full query language and scripting capabilities.

Highlights

Scheduled searches use cron expressions or intervals (e.g. every five minutes) and the Elasticsearch query DSL.

To define trigger conditions, use the Painless scripting language or simple thresholds (e.g. count > 100).

When trigger conditions are met, you can publish notification messages to configured destinations (for example Slack, email, or custom webhooks).

Messages can be static strings, or you can use Mustache templates to include contextual information.
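Taken together, a monitor definition combines all of the pieces above: a schedule, a search input, a Painless trigger condition, and a Mustache message template. The following is a minimal sketch based on the Open Distro Alerting API; the index name, threshold, and destination id are hypothetical placeholders:

```json
POST _opendistro/_alerting/monitors
{
  "type": "monitor",
  "name": "errors-every-five-minutes",
  "enabled": true,
  "schedule": {
    "cron": { "expression": "*/5 * * * *", "timezone": "UTC" }
  },
  "inputs": [{
    "search": {
      "indices": ["logs-*"],
      "query": { "query": { "match": { "level": "ERROR" } } }
    }
  }],
  "triggers": [{
    "name": "too-many-errors",
    "severity": "1",
    "condition": {
      "script": {
        "source": "ctx.results[0].hits.total.value > 100",
        "lang": "painless"
      }
    },
    "actions": [{
      "name": "notify",
      "destination_id": "<destination-id>",
      "message_template": {
        "source": "Monitor {{ctx.monitor.name}} just entered an alert state."
      }
    }]
  }]
}
```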

Documentation

Please see our documentation.

Setup

  1. Check out this package from version control.
  2. Launch IntelliJ IDEA, choose Import Project, and select the settings.gradle file in the root of this package.
  3. To build from the command line, set JAVA_HOME to point to a JDK >= 14 before running ./gradlew.

Build

This package is organized into subprojects, most of which contribute JARs to the top-level plugin in the alerting subproject.

All subprojects in this package use the Gradle build system. Gradle comes with excellent documentation that should be your first stop when trying to figure out how to operate or modify the build.

However, to build the alerting plugin subproject, we also use the Elastic build tools for Gradle. These tools are idiosyncratic and don't always follow the conventions and instructions for building regular Java code using Gradle. Not everything in alerting will work the way it's described in the Gradle documentation. If you encounter such a situation, the Elastic build tools source code is your best bet for figuring out what's going on.

Building from the command line

  1. ./gradlew build builds and tests all subprojects.
  2. ./gradlew :alerting:run launches a single node cluster with the alerting plugin installed.
  3. ./gradlew :alerting:run -PnumNodes=3 launches a multi-node cluster with the alerting plugin installed.
  4. ./gradlew :alerting:integTest launches a single node cluster with the alerting plugin installed and runs all integ tests.
  5. ./gradlew :alerting:integTest -PnumNodes=3 launches a multi-node cluster with the alerting plugin installed and runs all integ tests.
  6. ./gradlew :alerting:integTest -Dtests.class="*MonitorRunnerIT" runs a single integ test class
  7. ./gradlew :alerting:integTest -Dtests.method="test execute monitor with dryrun" runs a single integ test method (remember to quote the test method name if it contains spaces).

When launching a cluster using one of the above commands, logs are placed in alerting/build/testclusters/integTest-0/logs/. Though the logs are teed to the console, in practice it is best to check the actual log file.

Run integration tests with Security enabled

  1. Set up a local ODFE cluster with the security plugin, then run the integration tests against it:

    • ./gradlew :alerting:integTestRunner -Dtests.rest.cluster=localhost:9200 -Dtests.cluster=localhost:9200 -Dtests.clustername=es-integrationtest -Dhttps=true -Dsecurity=true -Duser=admin -Dpassword=admin

    • ./gradlew :alerting:integTestRunner -Dtests.rest.cluster=localhost:9200 -Dtests.cluster=localhost:9200 -Dtests.clustername=es-integrationtest -Dhttps=true -Dsecurity=true -Duser=admin -Dpassword=admin --tests "com.amazon.opendistroforelasticsearch.alerting.MonitorRunnerIT.test execute monitor returns search result"

Debugging

Sometimes it's useful to attach a debugger to either the Elasticsearch cluster or the integ tests to see what's going on. When running unit tests, hit Debug from the IDE's gutter to debug the tests. For the remote debugging commands below, you must start your debugger listening for a remote JVM before running them.

To debug code running in an actual server, run:

./gradlew :alerting:integTest -Dcluster.debug # to start a cluster and run integ tests

OR

./gradlew :alerting:run --debug-jvm # to just start a cluster that can be debugged

The Elasticsearch server JVM will launch suspended and wait for a debugger to attach to localhost:5005 before starting the Elasticsearch server. The IDE needs to listen for the remote JVM: if you are using IntelliJ, set your debug configuration to "Listen to remote JVM", make sure "Auto Restart" is checked, and start the debugger before running the commands.

To debug code running in an integ test (which exercises the server from a separate JVM), run:

./gradlew :alerting:integTest -Dtest.debug 

The test runner JVM will start suspended and wait for a debugger to attach to localhost:8000 before running the tests.

Additionally, it is possible to attach one debugger to the cluster JVM and another debugger to the test runner. First, make sure one debugger is listening on port 5005 and the other is listening on port 8000. Then, run:

./gradlew :alerting:integTest -Dtest.debug -Dcluster.debug

Advanced: Launching multi-node clusters locally

Sometimes you need to launch a cluster with more than one Elasticsearch server process.

You can do this by running ./gradlew :alerting:run -PnumNodes=<numberOfNodesYouWant>

You can also run the integration tests against a multi-node cluster by running ./gradlew :alerting:integTest -PnumNodes=<numberOfNodesYouWant>

You can also debug a multi-node cluster by combining the multi-node and debug steps above. However, you must set up debugger configurations to listen on a port for each node, starting at 5005 and increasing by 1 for each additional node.

Code of Conduct

This project has adopted an Open Source Code of Conduct.

Security issue notifications

If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our vulnerability reporting page. Please do not create a public GitHub issue.

Licensing

See the LICENSE file for our project's licensing. We will ask you to confirm the licensing of your contribution.

Copyright

Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.


Issues

Attempt to send notification in case of alert ERROR

Currently, we do not send notifications for alerts that are in the ERROR state. We should try sending out a notification when a trigger is in the ERROR state.

There could be a case where the ERROR is due to issues in the notification itself, but I think it makes sense to at least try to notify when the ERROR occurred in a prior stage (i.e., running the monitor, querying, or a Painless failure).

Acknowledge alert for period of time

Use case:
I have received an alert notification and plan to work on it soon, so I just want to suppress the alert notification for the next 4 hours. Currently, acknowledging suppresses the notification entirely.

Suggestion
Enhance the current acknowledge behavior to support acknowledging an alert for a period of time.
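One possible shape for this feature, sketched against the existing acknowledge endpoint: note that the `suppress_for` field below does not exist today and is purely hypothetical, as are the placeholder ids.

```json
POST _opendistro/_alerting/monitors/<monitor_id>/_acknowledge/alerts
{
  "alerts": ["<alert_id>"],
  "suppress_for": "4h"
}
```

Under this sketch, the alert would return to its normal notification behavior once the suppression window expires, rather than staying acknowledged indefinitely.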

Opendistro configuration support

Hi all,
I use ElastiFlow to monitor network traffic, and I'm stuck on the Open Distro configuration. I want Open Distro to trigger an alert, via email or webhook, when a client IP's total traffic exceeds 1 GB in the last 15 minutes. I don't know how Open Distro gets the index data, or how to configure the condition for when a client's traffic sum exceeds 1 GB. Here is my index document; I need the fields "flow.bytes" and "flow.client_hostname". Hope someone can help me, thanks in advance.

{
	"_index": "elastiflow-3.4.1-2019.06.13",
	"_type": "doc",
	"_id": "XjDTTmsBwJmiCTy6dE1w",
	"_version": 1,
	"_score": null,
	"_source": {
		"@Version": "3.4.1",
		"event": {
			"host": "10.10.101.1",
			"type": "netflow"
		},
		"node": {
			"ipaddr": "10.10.101.1",
			"hostname": "10.10.101.1"
		},
		"flow": {
			"tos": 20,
			"src_hostname": "172.217.24.206",
			"input_snmp": 24,
			"client_autonomous_system": "private",
			"ip_version": "IPv4",
			"server_hostname": "172.217.24.206",
			"bytes": 2917,
			"dst_mask_len": 23,
			"dst_port": 60813,
			"server_autonomous_system": "Google LLC (15169)",
			"client_hostname": "10.213.221.184",
			"dst_addr": "10.213.221.184",
			"src_country_code": "US",
			"sampling_interval": 0,
			"traffic_locality": "public",
			"src_geo_location": {
				"lon": -97.822,
				"lat": 37.751
			},
			"output_ifname": "index: 3",
			"autonomous_system": "Google LLC (15169)",
			"service_port": "443",
			"src_port_name": "https (TCP/443)",
			"service_name": "https (TCP/443)",
			"server_geo_location": "37.751,-97.822",
			"country_code": "US",
			"server_country": "United States",
			"dst_port_name": "TCP/60813",
			"dst_hostname": "10.213.221.184",
			"ip_protocol": "TCP",
			"input_ifname": "index: 24",
			"dst_autonomous_system": "private",
			"src_addr": "172.217.24.206",
			"server_asn": "15169",
			"application": "Google Docs/Drive",
			"client_addr": "10.213.221.184",
			"src_mask_len": 0,
			"src_country": "United States",
			"server_country_code": "US",
			"src_port": 443,
			"src_autonomous_system": "Google LLC (15169)",
			"packets": 31,
			"country": "United States",
			"server_addr": "172.217.24.206",
			"src_asn": 15169,
			"direction": "unspecified",
			"output_snmp": 3
		},
		"@timestamp": "2019-06-13T03:13:28.000Z",
		"netflow": {
			"first_switched": "2019-06-13T03:12:28.999Z",
			"in_bytes": 2917,
			"version": 9,
			"flowset_id": 289,
			"ipv4_dst_prefix": "10.213.220.0",
			"flow_seq_num": 4902055,
			"last_switched": "2019-06-13T03:13:20.999Z",
			"in_pkts": 31
		}
	},
	"fields": {
		"netflow.first_switched": [
			"2019-06-13T03:12:28.999Z"
		],
		"@timestamp": [
			"2019-06-13T03:13:28.000Z"
		],
		"netflow.last_switched": [
			"2019-06-13T03:13:20.999Z"
		]
	},
	"sort": [
		1560395608000
	]
}
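For a question like this, one approach (a sketch only, not tested against ElastiFlow; the aggregation names `clients` and `bytes_total` and the bucket size are made up here) is to give the monitor a search input that sums bytes per client over the last 15 minutes, then a Painless trigger condition that fires when any bucket exceeds 1 GB:

```json
{
  "inputs": [{
    "search": {
      "indices": ["elastiflow-*"],
      "query": {
        "size": 0,
        "query": {
          "range": { "@timestamp": { "gte": "now-15m" } }
        },
        "aggregations": {
          "clients": {
            "terms": { "field": "flow.client_addr", "size": 100 },
            "aggregations": {
              "bytes_total": { "sum": { "field": "flow.bytes" } }
            }
          }
        }
      }
    }
  }],
  "triggers": [{
    "name": "client-over-1gb",
    "severity": "2",
    "condition": {
      "script": {
        "lang": "painless",
        "source": "for (b in ctx.results[0].aggregations.clients.buckets) { if (b.bytes_total.value > 1073741824L) { return true } } return false"
      }
    }
  }]
}
```

Whether `flow.client_addr` and `flow.bytes` are mapped as aggregatable fields depends on the ElastiFlow index template in use.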

Running integration tests fails on Windows

Using a current fork and fresh clone I'm getting this:

>gradlew :alerting:integTest

> Configure project :alerting
=======================================
Elasticsearch Build Hamster says Hello!
  Gradle Version        : 5.2.1
  OS Info               : Windows 7 6.1 (amd64)
  JDK Version           : 12 (Oracle Corporation 12.0.1 [Java HotSpot(TM) 64-Bit
 Server VM 12.0.1+12])
  JAVA_HOME             : C:\Program Files\Java\jdk-12.0.1
  Random Testing Seed   : FC05C630D576ED08
=======================================

> Task :alerting-notification:compileJava
Note: \alerting\notification\src\main\java\com\amazon\opendistrofore
lasticsearch\alerting\destination\Notification.java uses unchecked or unsafe ope
rations.
Note: Recompile with -Xlint:unchecked for details.

> Task :alerting:integTestCluster#wait FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':alerting:integTestCluster#wait'.
> Failed to locate seed node transport file [\alerting\alerting\buil
d\cluster\integTestCluster node0\elasticsearch-7.1.1\logs\transport.ports]: time
d out waiting for it to be created after 40 seconds
...
BUILD FAILED in 1m 20s
25 actionable tasks: 24 executed, 1 up-to-date

Figure out how to expose domain model objects directly to mustache action scripts

Currently we can directly pass our domain model objects to the painless trigger scripts without converting them into an intermediate representation like Map<String, String>. This ensures that new fields are automatically available to trigger scripts when they're introduced.

However, due to the Mustache script sandbox in ES, we've been unable to do this for action template scripts, which means we need to manually convert domain models to String maps. This is inefficient and leads to issues like #26.

This issue is meant to investigate approaches to avoiding this conversion and exposing the domain model directly to action template scripts.

Manual plugin installation

I have compiled and installed the alerting plugin for both Elasticsearch and Kibana (version 7.1.1) on a fresh install via Docker.

When I visit the alerts page and try to create a monitor, nothing happens, but I get this error in my Elasticsearch log:

{"type": "server", "timestamp": "2019-07-13T11:49:32,176+0000", "level": "ERROR", "component": "c.a.o.a.MonitorRunner", "cluster.name": "docker-cluster", "node.name": "es01", "cluster.uuid": "L1uBBYf8T_qajnQ6fq4HlQ", "node.id": "wMMTAusnSG20x1lV8YEakA",  "message": "Error loading alerts for monitor: _na_" ,
"stacktrace": ["org.elasticsearch.ElasticsearchSecurityException: action [indices:admin/exists] is unauthorized for user [_system]",
"at org.elasticsearch.xpack.core.security.support.Exceptions.authorizationError(Exceptions.java:34) ~[?:?]",
"at org.elasticsearch.xpack.security.authz.AuthorizationService.denialException(AuthorizationService.java:576) ~[?:?]",
"at org.elasticsearch.xpack.security.authz.AuthorizationService.authorizeSystemUser(AuthorizationService.java:378) ~[?:?]",
"at org.elasticsearch.xpack.security.authz.AuthorizationService.authorize(AuthorizationService.java:183) ~[?:?]",
"at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.authorizeRequest(SecurityActionFilter.java:172) ~[?:?]",
"at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$applyInternal$3(SecurityActionFilter.java:158) ~[?:?]",
"at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-7.1.1.jar:7.1.1]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$writeAuthToContext$24(AuthenticationService.java:563) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.writeAuthToContext(AuthenticationService.java:572) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$handleNullToken$18(AuthenticationService.java:465) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.handleNullToken(AuthenticationService.java:472) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.consumeToken(AuthenticationService.java:356) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$extractToken$9(AuthenticationService.java:327) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.extractToken(AuthenticationService.java:345) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$checkForApiKey$3(AuthenticationService.java:288) ~[?:?]",
"at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-7.1.1.jar:7.1.1]",
"at org.elasticsearch.xpack.security.authc.ApiKeyService.authenticateWithApiKeyIfPresent(ApiKeyService.java:357) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.checkForApiKey(AuthenticationService.java:269) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$0(AuthenticationService.java:252) ~[?:?]",
"at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-7.1.1.jar:7.1.1]",
"at org.elasticsearch.xpack.security.authc.TokenService.getAndValidateToken(TokenService.java:297) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$2(AuthenticationService.java:248) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$lookForExistingAuthentication$6(AuthenticationService.java:306) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lookForExistingAuthentication(AuthenticationService.java:317) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.authenticateAsync(AuthenticationService.java:244) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.access$000(AuthenticationService.java:196) ~[?:?]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:139) ~[?:?]",
"at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.applyInternal(SecurityActionFilter.java:155) ~[?:?]",
"at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:107) ~[?:?]",
"at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:143) ~[elasticsearch-7.1.1.jar:7.1.1]",
"at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:121) ~[elasticsearch-7.1.1.jar:7.1.1]",
"at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:64) ~[elasticsearch-7.1.1.jar:7.1.1]",
"at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:83) ~[elasticsearch-7.1.1.jar:7.1.1]",
"at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:72) ~[elasticsearch-7.1.1.jar:7.1.1]",
"at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:393) ~[elasticsearch-7.1.1.jar:7.1.1]",
"at org.elasticsearch.client.support.AbstractClient$IndicesAdmin.execute(AbstractClient.java:1213) ~[elasticsearch-7.1.1.jar:7.1.1]",
"at org.elasticsearch.client.support.AbstractClient$IndicesAdmin.exists(AbstractClient.java:1228) ~[elasticsearch-7.1.1.jar:7.1.1]",
"at com.amazon.opendistroforelasticsearch.alerting.alerts.AlertIndices$createIndex$existsResponse$1.invoke(AlertIndices.kt:183) ~[opendistro_alerting-1.1.0.0-SNAPSHOT.jar:1.1.0.0-SNAPSHOT]",
"at com.amazon.opendistroforelasticsearch.alerting.alerts.AlertIndices$createIndex$existsResponse$1.invoke(AlertIndices.kt:58) ~[opendistro_alerting-1.1.0.0-SNAPSHOT.jar:1.1.0.0-SNAPSHOT]",
"at com.amazon.opendistroforelasticsearch.alerting.elasticapi.ElasticExtensionsKt.suspendUntil(ElasticExtensions.kt:142) ~[alerting-core-1.1.0.0.jar:?]",
"at com.amazon.opendistroforelasticsearch.alerting.alerts.AlertIndices.createIndex(AlertIndices.kt:182) ~[opendistro_alerting-1.1.0.0-SNAPSHOT.jar:1.1.0.0-SNAPSHOT]",
"at com.amazon.opendistroforelasticsearch.alerting.alerts.AlertIndices.createIndex$default(AlertIndices.kt:178) ~[opendistro_alerting-1.1.0.0-SNAPSHOT.jar:1.1.0.0-SNAPSHOT]",
"at com.amazon.opendistroforelasticsearch.alerting.alerts.AlertIndices.createOrUpdateAlertIndex(AlertIndices.kt:159) ~[opendistro_alerting-1.1.0.0-SNAPSHOT.jar:1.1.0.0-SNAPSHOT]",
"at com.amazon.opendistroforelasticsearch.alerting.MonitorRunner.runMonitor(MonitorRunner.kt:184) [opendistro_alerting-1.1.0.0-SNAPSHOT.jar:1.1.0.0-SNAPSHOT]",
"at com.amazon.opendistroforelasticsearch.alerting.resthandler.RestExecuteMonitorAction$prepareRequest$1$executeMonitor$1$1.invokeSuspend(RestExecuteMonitorAction.kt:71) [opendistro_alerting-1.1.0.0-SNAPSHOT.jar:1.1.0.0-SNAPSHOT]",

I get this error in the kibana log:

  body:
   { error:
      { root_cause: [Array],
        type: 'index_not_found_exception',
        reason: 'no such index [.opendistro-alerting-config]',
        'resource.type': 'index_or_alias',
        'resource.id': '.opendistro-alerting-config',
        index_uuid: '_na_',
        index: '.opendistro-alerting-config' },
     status: 404 },

It looks like it doesn't create the indices automatically, but there is no indication of any problem during startup; the plugin starts without errors:

{"type":"log","@timestamp":"2019-07-13T12:20:26Z","tags":["status","plugin:[email protected]","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}

Support for retriggering notification

Hi.

I think it's a must to be able to select whether an alarm should retrigger every time the condition is fulfilled, potentially with a delay timer before retriggering.

Usecase:
As briefly discussed in this issue: #13

If you have an alarm condition checking several independent entities at once, how do you make sure that you get notified when new entities fall into the category that matches the trigger?

As it is now, you will never know unless you are able to fix all other alarms before another alarm is triggered; I would have to fix every entity before getting alerts about new entities being triggered.

Example:
system-1 is somehow low on disk space, so an alarm is triggered.
I see the alarm, but since it is not very critical, I plan to fix it in a couple of hours.
Then, suddenly, system-2 is almost totally full, which is much more critical than the triggered event for system-1, but I won't get a notification about it.

I realize that one could build separate levels of alarming to handle this, and sometimes this is the best approach, but the complexity would quickly increase.

I see your point about not wanting to implement a full-fledged alerting system, but this lack of optional re-triggering of actions makes it almost useless when monitoring "multi-tenant" systems.

Impossible to send message with Korean/Japanese on Action Message

I tried to send a message containing Korean/Japanese text in the Action Message ("Send test message").
The Slack message came through as "?????".
How do I send a message in a language other than English?

I tested with below docker images.
amazon/opendistro-for-elasticsearch:0.8.0
amazon/opendistro-for-elasticsearch-kibana:0.8.0

(Screenshots from 2019-04-16 attached, showing the garbled Slack messages.)

Alerts not reflecting on dashboard

As of now, the alerting function does not work like SIEM or real-time alerting. The expectation is that alerts should show on the dashboard, or notify, on each and every condition match, irrespective of whether the alert is active, acknowledged, or completed. If the condition matches twice, then two alerts should appear on the dashboard, not just one.

False positive alerts

I have an alert like:

{
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "*",
                        "fields": [],
                        "type": "best_fields",
                        "default_operator": "or",
                        "max_determinized_states": 10000,
                        "enable_position_increments": true,
                        "fuzziness": "AUTO",
                        "fuzzy_prefix_length": 0,
                        "fuzzy_max_expansions": 50,
                        "phrase_slop": 0,
                        "escape": false,
                        "auto_generate_synonyms_phrase_query": true,
                        "fuzzy_transpositions": true,
                        "boost": 1
                    }
                },
                {
                    "range": {
                        "@timestamp": {
                            "from": "now-5m",
                            "to": null,
                            "include_lower": true,
                            "include_upper": true,
                            "boost": 1
                        }
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1
        }
    },
    "sort": [
        {
            "@timestamp": {
                "order": "desc"
            }
        }
    ]
}

I can see many logs in "Discover", but I also get false-positive messages saying that all logs are missing.

The server also works slowly because of this (Elasticsearch log screenshot attached).

Do you use any kind of query response timeout in the monitor? I think we should check the query status (running / failed / completed) before sending an alert.

Index mapping schema versioning

Unrelated to this specific PR, but:
Adding throttling introduced an update to our mappings. We need to add "schema_version" to the mappings and set it to 1 with this update. In the code, if schema_version does not exist on the document, we'll assume it is 0 (i.e., the initial release, before throttling was added).

We also need to add a way to update the mappings on each cluster. Currently, none of the index mappings are ever updated, since we only set them once on the initial use of alerting.

All of this needs to happen ASAP before we cut a release for 7.0

Originally posted by @dbbaughe in #59 (comment)
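One common way to express this (a sketch; the exact key the plugin ends up using may differ, and the properties shown are illustrative) is to store the version in the mapping's `_meta` block, so the plugin can compare it on startup and reapply the mapping when the stored version is behind:

```json
{
  "_meta": { "schema_version": 1 },
  "properties": {
    "monitor_id": { "type": "keyword" },
    "trigger_id": { "type": "keyword" }
  }
}
```

A document without `_meta.schema_version` would then be treated as version 0, matching the assumption described above.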

Change alert ids to be deterministic instead of auto generated

Right now we index alerts and use the auto-generated ids for the documents.
We should move to deterministic ids to solve multiple problems.

ACTIVE alerts should use an id of: <monitor_id>-<trigger_id>
COMPLETED alerts should use an id of: <monitor_id>-<trigger_id>-<start_time>

Benefits:
This would allow us to provide the soon-to-be-created alert's alert_id in the mustache context variable for users to use.
It also removes the chance of duplicate alerts being created at the same time.

Stop sending notification to one destination

Use case:
We are sending alert notification to multiple destinations. For some reason, we don't want to send notification to one destination any more.

I see that a destination can be deleted on the "Destinations" page. But if the destination is in use, it can't be deleted, and a warning pops up: "Couldn't delete destination dest1. One or more monitors uses this destination." As the message doesn't show which monitor is using the destination, I have to go through all of the triggers and delete the actions that use it. (Currently, one action can only send a notification to one destination; maybe that could be enhanced to support multiple destinations per action?)

Suggestions:
1. Show which monitors/triggers/actions are using the destination.
2. Allow suspending a destination, if deleting it is too heavy an operation.

Alerting - monitors context.alert null or missing id

Hello,

The first alert arrives with a null alert property. This is the first time I am using alerts, but I would expect it to be ACTIVE and have an id.

first alert
"alert": null

and then the second one is missing the alert id:
"alert":{ "state":"ACTIVE", "error_message":null, "acknowledged_time":null, "last_notification_time":1553847937681 },

I am trying to send a message containing the alert id, so I can use alert.id to automatically acknowledge the alert via a URL in the first alert message.

Destination: Slack

Thanks in advance.

Support for Alert Monitor Chaining

Provide users with the ability to run a chain of monitor queries from multiple data sources to trigger on. Chains should run queries in order and enable customers to use the previous query results as inputs to subsequent queries.

Allow users to set up a special alert when their Monitor is erroring

Current behavior is to send the user's alert text when their monitor or trigger is in the "Error" state. This is confusing, since I don't know whether the problem is captured by the data or whether the problem is in evaluating the trigger.

I need some way to have Open Distro alert me that there's a problem in my setup.

Custom Webhook not triggered [Custom Webhook not triggered [ endpoint empty. Fall back to host:port/path] ]

Hi everyone,

I have some strange behavior with a custom webhook.

I'm testing the opendistro-alerting plugin with an Elasticsearch cluster. The webhook is not triggered when I try the "send test message" button.

To summarize, I get this error stack trace on the cluster side when clicking "send test message":

 [2019-06-18T09:21:04,490][INFO` ][c.a.o.a.d.c.DestinationHttpClient] [es-cluster-2] endpoint empty. Fall back to host:port/path
    [2019-06-18T09:21:04,524][ERROR][c.a.o.a.d.f.CustomWebhookDestinationFactory] [es-cluster-2] Exception publishing Message: DestinationType: CUSTOMWEBHOOK, DestinationName:Opsgenie_XXXXX, Url: , scheme: https, Host: api.eu.opsgenie.com, Port: 443, Path: null, Message: Monitor test just entered an alert state. Please investigate the issue.
Send test message
Message preview

- Trigger: test
- Severity: 1
- Period start: 2019-06-18T09:20:04.484Z
- Period end: 2019-06-18T09:21:04.484Z
java.io.IOException: Failed: HttpResponseProxy{HTTP/1.1 404 Not Found [Content-Type: application/json; charset=UTF-8, Transfer-Encoding: chunked, Connection: keep-alive, Date: Tue, 18 Jun 2019 09:21:04 GMT, Allow: HEAD, GET, X-Response-Time: 0.002, X-Request-ID: ad95db49-64ca-461e-af95-9eaee870ae68, X-Cache: Error from cloudfront, Via: 1.1 3ccfbae98f5816b531634c1e82e45259.cloudfront.net (CloudFront), X-Amz-Cf-Pop: FRA50, X-Amz-Cf-Id: j5a5w82Z-DyyZDFdrfXQIFaou3t3TYTQ5dj3i-Cx8SzxvwDiXszNPg==] ResponseEntityProxy{[Content-Type: application/json; charset=UTF-8,Chunked: true]}}
    at com.amazon.opendistroforelasticsearch.alerting.destination.client.DestinationHttpClient.validateResponseStatus(DestinationHttpClient.java:156) ~[alerting-notification-0.9.0.0.jar:?]
    at com.amazon.opendistroforelasticsearch.alerting.destination.client.DestinationHttpClient.execute(DestinationHttpClient.java:81) ~[alerting-notification-0.9.0.0.jar:?]
    at com.amazon.opendistroforelasticsearch.alerting.destination.factory.CustomWebhookDestinationFactory.publish(CustomWebhookDestinationFactory.java:42) [alerting-notification-0.9.0.0.jar:?]
    at com.amazon.opendistroforelasticsearch.alerting.destination.factory.CustomWebhookDestinationFactory.publish(CustomWebhookDestinationFactory.java:29) [alerting-notification-0.9.0.0.jar:?]
    at com.amazon.opendistroforelasticsearch.alerting.destination.Notification.lambda$publish$0(Notification.java:43) [alerting-notification-0.9.0.0.jar:?]
    at java.security.AccessController.doPrivileged(AccessController.java:310) [?:?]
    at com.amazon.opendistroforelasticsearch.alerting.destination.Notification.publish(Notification.java:41) [alerting-notification-0.9.0.0.jar:?]
    at com.amazon.opendistroforelasticsearch.alerting.model.destination.Destination.publish(Destination.kt:167) [opendistro_alerting-0.9.0.0.jar:0.9.0.0]
    at com.amazon.opendistroforelasticsearch.alerting.MonitorRunner.runAction(MonitorRunner.kt:393) [opendistro_alerting-0.9.0.0.jar:0.9.0.0]
    at com.amazon.opendistroforelasticsearch.alerting.MonitorRunner.runMonitor(MonitorRunner.kt:196) [opendistro_alerting-0.9.0.0.jar:0.9.0.0]
    at com.amazon.opendistroforelasticsearch.alerting.resthandler.RestExecuteMonitorAction$prepareRequest$1$executeMonitor$1$1.run(RestExecuteMonitorAction.kt:65) [opendistro_alerting-0.9.0.0.jar:0.9.0.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-6.7.1.jar:6.7.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:835) [?:?]
[2019-06-18T09:21:04,525][INFO ][c.a.o.a.m.MonitorRunResult] [es-cluster-2] Internal error: java.io.IOException: Failed: HttpResponseProxy{HTTP/1.1 404 Not Found [Content-Type: application/json; charset=UTF-8, Transfer-Encoding: chunked, Connection: keep-alive, Date: Tue, 18 Jun 2019 09:21:04 GMT, Allow: HEAD, GET, X-Response-Time: 0.002, X-Request-ID: ad95db49-64ca-461e-af95-9eaee870ae68, X-Cache: Error from cloudfront, Via: 1.1 3ccfbae98f5816b531634c1e82e45259.cloudfront.net (CloudFront), X-Amz-Cf-Pop: FRA50, X-Amz-Cf-Id: j5a5w82Z-DyyZDFdrfXQIFaou3t3TYTQ5dj3i-Cx8SzxvwDiXszNPg==] ResponseEntityProxy{[Content-Type: application/json; charset=UTF-8,Chunked: true]}}. See the Elasticsearch.log for details
java.lang.IllegalStateException: java.io.IOException: Failed: HttpResponseProxy{HTTP/1.1 404 Not Found [Content-Type: application/json; charset=UTF-8, Transfer-Encoding: chunked, Connection: keep-alive, Date: Tue, 18 Jun 2019 09:21:04 GMT, Allow: HEAD, GET, X-Response-Time: 0.002, X-Request-ID: ad95db49-64ca-461e-af95-9eaee870ae68, X-Cache: Error from cloudfront, Via: 1.1 3ccfbae98f5816b531634c1e82e45259.cloudfront.net (CloudFront), X-Amz-Cf-Pop: FRA50, X-Amz-Cf-Id: j5a5w82Z-DyyZDFdrfXQIFaou3t3TYTQ5dj3i-Cx8SzxvwDiXszNPg==] ResponseEntityProxy{[Content-Type: application/json; charset=UTF-8,Chunked: true]}}
    at com.amazon.opendistroforelasticsearch.alerting.destination.factory.CustomWebhookDestinationFactory.publish(CustomWebhookDestinationFactory.java:46) ~[alerting-notification-0.9.0.0.jar:?]
    at com.amazon.opendistroforelasticsearch.alerting.destination.factory.CustomWebhookDestinationFactory.publish(CustomWebhookDestinationFactory.java:29) ~[alerting-notification-0.9.0.0.jar:?]
    at com.amazon.opendistroforelasticsearch.alerting.destination.Notification.lambda$publish$0(Notification.java:43) ~[alerting-notification-0.9.0.0.jar:?]
    at java.security.AccessController.doPrivileged(AccessController.java:310) ~[?:?]
    at com.amazon.opendistroforelasticsearch.alerting.destination.Notification.publish(Notification.java:41) ~[alerting-notification-0.9.0.0.jar:?]
    at com.amazon.opendistroforelasticsearch.alerting.model.destination.Destination.publish(Destination.kt:167) ~[opendistro_alerting-0.9.0.0.jar:0.9.0.0]
    at com.amazon.opendistroforelasticsearch.alerting.MonitorRunner.runAction(MonitorRunner.kt:393) ~[opendistro_alerting-0.9.0.0.jar:0.9.0.0]
    at com.amazon.opendistroforelasticsearch.alerting.MonitorRunner.runMonitor(MonitorRunner.kt:196) [opendistro_alerting-0.9.0.0.jar:0.9.0.0]
    at com.amazon.opendistroforelasticsearch.alerting.resthandler.RestExecuteMonitorAction$prepareRequest$1$executeMonitor$1$1.run(RestExecuteMonitorAction.kt:65) [opendistro_alerting-0.9.0.0.jar:0.9.0.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-6.7.1.jar:6.7.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: java.io.IOException: Failed: HttpResponseProxy{HTTP/1.1 404 Not Found [Content-Type: application/json; charset=UTF-8, Transfer-Encoding: chunked, Connection: keep-alive, Date: Tue, 18 Jun 2019 09:21:04 GMT, Allow: HEAD, GET, X-Response-Time: 0.002, X-Request-ID: ad95db49-64ca-461e-af95-9eaee870ae68, X-Cache: Error from cloudfront, Via: 1.1 3ccfbae98f5816b531634c1e82e45259.cloudfront.net (CloudFront), X-Amz-Cf-Pop: FRA50, X-Amz-Cf-Id: j5a5w82Z-DyyZDFdrfXQIFaou3t3TYTQ5dj3i-Cx8SzxvwDiXszNPg==] ResponseEntityProxy{[Content-Type: application/json; charset=UTF-8,Chunked: true]}}
    at com.amazon.opendistroforelasticsearch.alerting.destination.client.DestinationHttpClient.validateResponseStatus(DestinationHttpClient.java:156) ~[alerting-notification-0.9.0.0.jar:?]
    at com.amazon.opendistroforelasticsearch.alerting.destination.client.DestinationHttpClient.execute(DestinationHttpClient.java:81) ~[alerting-notification-0.9.0.0.jar:?]
    at com.amazon.opendistroforelasticsearch.alerting.destination.factory.CustomWebhookDestinationFactory.publish(CustomWebhookDestinationFactory.java:42) ~[alerting-notification-0.9.0.0.jar:?]
    ... 14 more

https://discuss.opendistrocommunity.dev/t/custom-webhook-not-triggered-for-opsgenie-integration/911

Thanks for your help. :-)

Integrate Security with Alerting

Integrate Alerting with Security, allowing for the following:

  • Create action groups and roles to control alerting CRUD operations
  • Enable alert and alert history indexes to be assigned to tenants, so that individual teams can share alerts, alert history, and notification channels with each other while being isolated from those outside their tenant.
  • Pre-validate that users have access to an index before allowing them to create a monitor against it

API allows invalid values for schedule

The API allows us to create monitors that run every -1 minutes.

We then start seeing a bunch of errors in the logs:
Caused by: java.lang.IllegalArgumentException: Zero or negative time interval not supported

Validation should be added to the monitor (schedule) data classes to ensure the values are valid.
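The fix being requested can be sketched in a few lines. This is illustrative Python (the plugin itself is Kotlin), and the function and unit names are hypothetical, not the plugin's actual API:

```python
# Hypothetical sketch of the requested validation; names are illustrative,
# not the plugin's actual data classes.
VALID_UNITS = {"MINUTES", "HOURS", "DAYS"}

def validate_interval_schedule(interval: int, unit: str) -> None:
    """Reject zero/negative intervals and unknown units at creation time."""
    if interval <= 0:
        raise ValueError("Zero or negative time interval not supported")
    if unit.upper() not in VALID_UNITS:
        raise ValueError("Unsupported time unit: " + unit)

validate_interval_schedule(5, "MINUTES")   # valid schedule, no error
try:
    validate_interval_schedule(-1, "MINUTES")
except ValueError as e:
    print(e)  # prints: Zero or negative time interval not supported
```

Rejecting the value when the monitor is created surfaces the error to the API caller instead of filling the logs at execution time.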

How do I complete an alert?

I have created a monitor and trigger, which has resulted in an alert being triggered.

In the UI I can acknowledge the alert. However, there doesn't appear to be a UI element that allows me to move the alert from the "Acknowledged" state to the "Completed" state. This means that when new instances of the condition fire, I do not receive a new alert.

How do I move the existing acknowledged alert into a completed state?

I am using Open Distro for Elasticsearch as packaged with AWS Elasticsearch v6.1.3.

More logging

We are currently missing some logging information around trigger evaluation, and in general we could improve users' ability to debug with more logging.

See here for an example

Validating WebHook-Response

Hello,

we created an alert with an action that calls a webhook. The webhook calls a REST service which creates a ticket for the responsible person. Our ticket system returns a 201 if the ticket was created successfully.

The problem is that the DestinationHttpClient only accepts a 200 HTTP status as valid.

The HTTP client should also accept a 201 return code.
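The requested change amounts to treating any 2xx status as success. A minimal sketch in Python (illustrative only; the actual validation lives in the plugin's Java DestinationHttpClient):

```python
def validate_response_status(status_code: int) -> None:
    """Accept any 2xx response (200 OK, 201 Created, 204 No Content, ...)
    instead of only 200; raise on everything else."""
    if not (200 <= status_code < 300):
        raise IOError("Failed: HTTP " + str(status_code))

validate_response_status(200)  # OK
validate_response_status(201)  # Created: the ticket was made, treat as success
```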

Add support for Cross-cluster search (CCS)

Hello,

Do you have any plan to support CCS?

I have a use case where we use a centralized CCS cluster which holds no data but connects to remote clusters and serves as a proxy.
I'd like to create alerts the same way we search, in order to eliminate the need to duplicate and maintain the same alerts N times.

Currently, the Alerting plugin for Kibana only shows local indices and doesn't expose any remote indices.
As far as I know, Elasticsearch does not support retrieving a remote cluster's indices through CCS, so I thought about fetching this information from Kibana's Saved Objects API.

WDYT?

Alerting per document/event

Please add functionality to create alerts per document/event.
For example: I have created virus detection alerts, and an alert triggered for system xyz. Within 30 seconds another system, abc, got infected; virus detection events for system abc arrived within 30 seconds of the first alert. But no alert is triggered for the second system, because the first alert stays active for one minute.
This is a big issue from a security detection perspective.

Feature Request: Kibana URL Link in Alerts

Thanks for the awesome work on the alerting plugin. I was curious whether it would be feasible for generated alerts to include a link to a Kibana dashboard. For example, here is an alert I have:

Monitor Free taco Monitor just entered alert status. Please investigate the issue.
- Trigger: Taco Parrot is Hungry
- Severity: 5
- Period start: 2019-04-25T18:32:14.670Z
- Period end: 2019-04-25T18:33:14.670Z

It would be great to include a link to the Kibana URL for the query in this monitor. Alternatively, a link to the monitor page would provide context around the alert and ease the triage process.

Add SQL support to Alerting

Make it easier for customers to write sophisticated monitors by adding support for SQL in the Alerting plugin.

We do not give access to the full alert properties in the context variable; the name is confusing

Currently the alert in the context variable only exposes four of its properties.
In the documentation we mention a few properties which are missing:

The current, active alert (if it exists). Includes ctx.alert.id, ctx.alert.version, and ctx.alert.isAcknowledged. Null if no alert is active.

It's also confusing, since people expect this to be the new alert being generated rather than the existing alert (if one exists). Perhaps we should rename this variable.

This would be a simple change to make, by adding the properties here.

Allow triggering an action when a monitor changes back to green

Hi,

we are looking for a new alerting system, and we are wondering if it is possible to trigger an action when a monitor changes from the "error" state back to the "normal" state.

It is a good automated feedback for all in the team that a problem is fixed again.

What do you think about this? Or am I missing something and this kind of alert is already possible?

Support for GET _opendistro/_alerting/destinations in API

Currently the API lacks a way to discover the identifiers of existing destination definitions.

Two attempts to create a destination with exactly the same definition will result in duplicate definitions of the same destination, with no way to reconcile the two or do a preceding lookup to discover the identifier of an existing definition.

To prevent this, it seems the user of the API must record the identifiers of all previously created destinations in some external scratch pad, which is a fragile solution at best.
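Until a list/GET API exists, one possible workaround is to search the index where the plugin stores its configuration. The index name `.opendistro-alerting-config` and the `destination.name.keyword` field below are assumptions about the plugin's storage layout; verify them against your cluster before relying on this. A sketch that builds such a lookup query:

```python
import json

def destination_lookup_query(name: str) -> str:
    """Build a search body that finds destination documents by name.
    Field names are assumptions about the config index mapping."""
    body = {
        "query": {
            "bool": {
                "must": [
                    {"exists": {"field": "destination"}},
                    {"term": {"destination.name.keyword": name}},
                ]
            }
        }
    }
    return json.dumps(body)

# e.g. POST /.opendistro-alerting-config/_search with this body
print(destination_lookup_query("team-a-slack"))
```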

Unable to update the cluster settings to disable alerting

Currently trying to disable the monitors via the cluster settings following your documentation.

curl -X PUT ".../_cluster/settings" -H 'Content-Type: application/json' -d'
{
    "persistent" : {
        "opendistro.alerting.enabled" : "false"
    }
}
'

I've also tried adding a cluster. prefix to the setting. Not sure what I'm missing.
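One possible cause, based on my reading of the Open Distro alerting settings documentation: the toggle is `opendistro.scheduled_jobs.enabled`, not `opendistro.alerting.enabled`. Treat that setting name as something to verify against your version's docs. A sketch of the settings body:

```python
import json

# Assumed setting name per the Open Distro alerting settings docs; verify
# against your version before applying it to a cluster.
settings = {"persistent": {"opendistro.scheduled_jobs.enabled": False}}

# PUT _cluster/settings with this body
print(json.dumps(settings, indent=2))
```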

Alerting monitors should allow per-second intervals

Currently, when setting by interval, I can choose minutes, hours, or days under "Every". Please add seconds to this menu as well. There's no need to make me fiddle with a custom cron expression for an uncomplicated every-30-second (or every-10-second) monitor.

Open Distro for ELK 7.x

Hi all,
I see that Open Distro v0.9.0 only supports ELK 6.7.1. With ELK 7.x.x released, Open Distro seems to be incompatible with the new version, so please consider updating it to be compatible with ELK 7.x.x. Thanks in advance.

Throttle Period

If we configure email or Slack in the action block, it will send an alert each time the monitor is triggered. In X-Pack for Elasticsearch, this can be controlled by setting throttle_period in the action block: it waits for the throttle_period amount of time after the first alert and then resends the alert if the issue is still not resolved. Can we have the same functionality here?

For more info, you can look at https://www.elastic.co/guide/en/x-pack/current/actions.html#actions-ack-throttle
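The behavior being requested can be sketched as a simple time-window check (illustrative Python, not the plugin's implementation): after a notification is sent, further sends are suppressed until the throttle period has elapsed.

```python
class Throttle:
    """Suppress repeat notifications inside a throttle window."""

    def __init__(self, period_seconds: float):
        self.period = period_seconds
        self.last_sent = None  # timestamp of the last notification sent

    def should_send(self, now: float) -> bool:
        if self.last_sent is None or now - self.last_sent >= self.period:
            self.last_sent = now
            return True
        return False

t = Throttle(period_seconds=600)   # 10-minute throttle period
assert t.should_send(now=0)        # first alert: notify
assert not t.should_send(now=300)  # still inside the window: suppress
assert t.should_send(now=600)      # window elapsed, issue persists: notify
```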

Add support for tags on destinations / monitors

When using the API to configure alerting, it would be helpful to create destinations (and potentially monitors) with tags so that the destinations could be retrieved by searching for a given set of tags.

For example, when creating a new monitor, I might want to re-use the destination for "production" and "team A", but there doesn't appear to be a good way to explicitly label destinations so they can be programmatically retrieved.

As a couple of workarounds, I suppose:

  • this information could be stored in the name of a destination
  • the underlying object in the index could be updated

Related to #56 a bit.

Deleted monitors keep executing queries

I created a monitor in the cluster and proceeded to delete it. After deletion I verified in my instrumentation that there was still a query running on a schedule, so I checked via curl /_opendistro/_alerting/stats, and the deleted id still shows up:

...
    "kmCLAWwB22FWnZ-IG6_Y": {
      "last_execution_time": 1563541589590,
      "running_on_time": true
    },
...

But the monitor reference seems to not exist anymore. If I try to GET it, no response comes back; I've also tried to execute it, and I get the following back:

{
  "message" : "Can't find monitor with id: kmCLAWwB22FWnZ-IG6_Y"
}

How can I make sure I delete this job and stop it from querying?

Low disk alerting does not work

Hello. I'm trying to create a monitor that will send notifications about low disk space.
This query works fine

{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "system.filesystem.used.pct": {
              "gte": 0
            }
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-5m",
              "lte": "now"
            }
          }
        }
      ]
    }
  }
}

but when I try to change "range": { "system.filesystem.used.pct": { "gte": 0 from 0 to 0.5, it stops working 😦
In the logs I can see system.filesystem.used.pct values:

0.521
0.702


Can anyone help? I've already tried double-quoting it as "0.5".
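One thing worth checking: the threshold must be a JSON number, not a quoted string, and the field must be mapped as a numeric type. Assuming system.filesystem.used.pct is mapped as a float, the 0.5 variant of the query above would be built like this:

```python
import json

# The 0.5 variant of the range query; note the unquoted numeric threshold.
query = {
    "query": {
        "bool": {
            "must": [
                {"range": {"system.filesystem.used.pct": {"gte": 0.5}}},
                {"range": {"@timestamp": {"gte": "now-5m", "lte": "now"}}},
            ]
        }
    }
}
print(json.dumps(query, indent=2))
```

If this still returns no hits, the field mapping may not be numeric; check it with GET <index>/_mapping.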

Support for Alert Trigger Chaining

A mechanism of chaining we should support is chaining triggers. The focus is to conditionally execute additional monitors/triggers based on previous trigger outcomes.
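The idea can be illustrated with a rough sketch (a hypothetical model, not the plugin's; the trigger names and the "after" field are invented for illustration): evaluate triggers in order and skip any trigger whose declared parent did not fire.

```python
def run_chain(triggers):
    """triggers: list of dicts with 'name', 'condition' (callable -> bool),
    and an optional 'after' naming a parent trigger."""
    fired = set()
    for t in triggers:
        parent = t.get("after")
        if parent is not None and parent not in fired:
            continue  # parent did not fire, so skip this chained trigger
        if t["condition"]():
            fired.add(t["name"])
    return fired

chain = [
    {"name": "errors_high", "condition": lambda: True},
    {"name": "page_oncall", "condition": lambda: True, "after": "errors_high"},
    {"name": "escalate", "condition": lambda: False, "after": "page_oncall"},
]
print(sorted(run_chain(chain)))  # prints: ['errors_high', 'page_oncall']
```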
