ibmstreams / streamsx.monitoring

The com.ibm.streamsx.monitoring toolkit provides capabilities to create applications that monitor IBM Streams and its applications.

Home Page: https://ibmstreams.github.io/streamsx.monitoring/

License: Other

Java 98.65% Python 1.35%
metrics stream-processing ibm-streams performance monitoring toolkit

streamsx.monitoring's Introduction

streamsx.monitoring

The com.ibm.streamsx.monitoring toolkit provides capabilities to create applications that monitor IBM Streams and its applications.

The toolkit contains operators that use the JMX API to monitor applications:

  • com.ibm.streamsx.monitoring.metrics::MetricsSource retrieves metrics from one or more jobs and provides them as a tuple stream.
  • com.ibm.streamsx.monitoring.jobs::JobStatusMonitor receives notifications of PE status changes from one or more jobs and provides them as a tuple stream.
  • com.ibm.streamsx.monitoring.system::LogSource receives notifications of application error and warning logs and provides them as a tuple stream.

Documentation

Find the full documentation here.

IBM Streams 5.x - IBM Cloud Pak for Data

This toolkit is compatible with IBM Streams version 5.x running in IBM Cloud Pak for Data.

IBM Streams 4.3.x - Streaming Analytics service on IBM Cloud

For IBM Streams version 4.3.x and Streaming Analytics service on IBM Cloud you need to use version 2 of the com.ibm.streamsx.monitoring toolkit.

streamsx.monitoring's People

Contributors

nelsonong, petenicholls


streamsx.monitoring's Issues

Need Getting Started instructions for toolkit

I wanted to try out the toolkit to monitor my application. When I tried to build by running ant, I got the following errors.

I am unsure how to get started with this toolkit.

[chanskw@chanskw1 streamsx.metrics]$ ant
Buildfile: /home/chanskw/git/streamsx.metrics/build.xml

toolkit:

init:
[copy] Copying 1 file to /home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics

compile:
[javac] Compiling 38 source files to /home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics/impl/java/build
[javac] /home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics/impl/java/src/com/ibm/streamsx/metrics/internal/Closeable.java:5: error: illegal start of type
[javac] default void close() throws Exception {
[javac] ^
[javac] /home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics/impl/java/src/com/ibm/streamsx/metrics/internal/Closeable.java:5: error: = expected
[javac] default void close() throws Exception {
[javac] ^
[javac] /home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics/impl/java/src/com/ibm/streamsx/metrics/internal/Closeable.java:5: error: ';' expected
[javac] default void close() throws Exception {
[javac] ^
[javac] /home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics/impl/java/src/com/ibm/streamsx/metrics/internal/Closeable.java:5: error: illegal start of type
[javac] default void close() throws Exception {
[javac] ^
[javac] /home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics/impl/java/src/com/ibm/streamsx/metrics/internal/Closeable.java:5: error: expected
[javac] default void close() throws Exception {
[javac] ^
[javac] /home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics/impl/java/src/com/ibm/streamsx/metrics/internal/Closeable.java:5: error: = expected
[javac] default void close() throws Exception {
[javac] ^
[javac] /home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics/impl/java/src/com/ibm/streamsx/metrics/internal/Closeable.java:5: error: ';' expected
[javac] default void close() throws Exception {
[javac] ^
[javac] /home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics/impl/java/src/com/ibm/streamsx/metrics/internal/Closeable.java:5: error: illegal start of type
[javac] default void close() throws Exception {
[javac] ^
[javac] /home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics/impl/java/src/com/ibm/streamsx/metrics/internal/Closeable.java:5: error: expected
[javac] default void close() throws Exception {
[javac] ^
[javac] /home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics/impl/java/src/com/ibm/streamsx/metrics/internal/Closeable.java:7: error: = expected
[javac] }
[javac] ^
[javac] /home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics/impl/java/src/com/ibm/streamsx/metrics/internal/Closeable.java:7: error: reached end of file while parsing
[javac] }
[javac] ^
[javac] /home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics/impl/java/src/com/ibm/streamsx/metrics/internal/Closeable.java:8: error: reached end of file while parsing
[javac] 12 errors

BUILD FAILED
/home/chanskw/git/streamsx.metrics/build.xml:28: The following error occurred while executing this line:
/home/chanskw/git/streamsx.metrics/com.ibm.streamsx.metrics/build.xml:44: Compile failed; see the compiler error output for details.

Total time: 1 second
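The errors are characteristic of compiling a Java 8 default interface method with an older (pre-Java-8) javac, which does not recognize the default keyword and reports "illegal start of type" at that line. A minimal sketch of such an interface (a hypothetical reconstruction, not the toolkit's actual source) that compiles cleanly under Java 8:

```java
// Hypothetical reconstruction of the failing construct: a "default" method
// in an interface, legal only from Java 8 on. A Java 7 javac reports
// "illegal start of type" at exactly this point.
interface Closeable {
    default void close() throws Exception {
        // default no-op implementation inherited by implementors
    }
}

public class DefaultMethodDemo implements Closeable {
    public static void main(String[] args) throws Exception {
        // The implementing class inherits close() without overriding it.
        new DefaultMethodDemo().close();
        System.out.println("closed");
    }
}
```

If this is the cause, building with a Java 8 (or later) JDK, or setting source/target to 1.8 in the Ant javac task, should resolve the compile errors.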

Add notification lost & connection notifications to JMX sources.

A JMX notification indicating that notifications may have been lost is useful information for downstream operators.

Similarly, the loss of the connection itself would be useful to report.

These are all standard JMX notifications, so they should be able to be submitted on the existing schemas, with their existing type and a timestamp, and default values for everything else.
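Sketch of how such a listener might classify the standard connection notifications; the type constants come from javax.management.remote.JMXConnectionNotification, while the classify helper and the class itself are hypothetical:

```java
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.remote.JMXConnectionNotification;

// Hypothetical listener mapping standard JMX connection notifications
// onto names that could populate the existing output schemas.
public class ConnectionEventListener implements NotificationListener {

    public static String classify(Notification n) {
        switch (n.getType()) {
            case JMXConnectionNotification.NOTIFS_LOST:
                return "notificationsLost";
            case JMXConnectionNotification.FAILED:
                return "connectionFailed";
            case JMXConnectionNotification.CLOSED:
                return "connectionClosed";
            default:
                return "other";
        }
    }

    @Override
    public void handleNotification(Notification notification, Object handback) {
        // In the operator this would submit a tuple on the existing schema,
        // carrying the type and timestamp, with defaults for other attributes.
        System.out.println(classify(notification) + " @ " + notification.getTimeStamp());
    }
}
```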

Role of operator name in filter document for JobStatusSource?

From the SPLDOC I couldn't fully understand the role of the operator name in the filter document.

I guess it's probably selecting which PEs are monitored, but it would be good to clearly state this in the operator description and filter document overview.

MetricsSource: if a new job is submitted, PeBean.retrieveMetrics fails

If the test application is running and a new job is submitted (in this case, the same test app), PeBean.retrieveMetrics raises a NullPointerException.

ERROR #splapptrc,J[0],P[0],ChangeNotifications M[MetricsSource.java:com.ibm.streamsx.metrics.MetricsSource$1.run:357]  - Operator error
ERROR #splapptrc,J[0],P[0],ChangeNotifications M[?:?:0]  - java.lang.NullPointerException
ERROR #splapptrc,J[0],P[0],ChangeNotifications M[?:?:0]  - 	at com.ibm.streams.instance.srm.client.BaseMetrics.getTimestamp(BaseMetrics.java:340)
ERROR #splapptrc,J[0],P[0],ChangeNotifications M[?:?:0]  - 	at com.ibm.streams.instance.srm.client.BaseMetrics.<init>(BaseMetrics.java:192)
ERROR #splapptrc,J[0],P[0],ChangeNotifications M[?:?:0]  - 	at com.ibm.streams.instance.srm.client.PEMetrics.<init>(PEMetrics.java:64)
ERROR #splapptrc,J[0],P[0],ChangeNotifications M[?:?:0]  - 	at com.ibm.streams.instance.srm.client.SrmClient.getPEMetrics(SrmClient.java:292)
ERROR #splapptrc,J[0],P[0],ChangeNotifications M[?:?:0]  - 	at com.ibm.streams.management.internal.PeBean.retrieveMetrics(PeBean.java:952)

Not sure whether this is important, but just before the exception is raised, the following notification is received four times.

ERROR #splapptrc,J[0],P[0],Numbers M[PeHandler.java:com.ibm.streamsx.metrics.internal.PeHandler.handleNotification:114]  - notification: javax.management.Notification[source=com.ibm.streams.management:type=domain.instance.pe,domain="domain4.2.0.0",instance="streams4.2.0.0",id=1][type=com.ibm.streams.management.pe.changed][message=], userData=null

Change style of SPL types in toolkits.

The SPL types in this toolkit have the convention of ending with _t.
https://github.com/IBMStreams/streamsx.metrics/blob/master/com.ibm.streamsx.metrics/com.ibm.streamsx.metrics/Types.spl

That doesn't match the recommended style:
http://www.ibm.com/support/knowledgecenter/en/SSCRJU_4.0.0/com.ibm.streams.dev.doc/doc/str_nametype.html

or types in other toolkits.

For consistency within the toolkits and the product, can the types be changed to names like Origin and Notification?

JobStatusSource generates near duplicate events when a PE starts up.

When a PE starts up, I consistently see two events saying it is healthy and running. They differ only in the timestamp.

This may be due to some underlying cause, where the PE has changed but the differences cannot be seen in the notification info. If that's the case then the SPLDOC should document more info about the events and what tuples might be emitted.

{notifyType="com.ibm.streams.management.pe.changed",domainId="standard1",instanceId="3a4b0956-ec2c-4cd0-98e7-9da7c70c0fa3",jobId=28,jobName="test2_ec::test_app_log_28",resource="10.143.13.241",peId=29,peHealth="healthy",peStatus="running",eventTimestamp=(1505363270,520000000,0)}
{notifyType="com.ibm.streams.management.pe.changed",domainId="standard1",instanceId="3a4b0956-ec2c-4cd0-98e7-9da7c70c0fa3",jobId=28,jobName="test2_ec::test_app_log_28",resource="10.143.13.241",peId=29,peHealth="healthy",peStatus="running",eventTimestamp=(1505363270,515000000,0)}
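One way to suppress these near-duplicates would be to compare each event against the previous one while ignoring the timestamp; a minimal sketch (StatusEvent is a hypothetical class mirroring a few of the tuple attributes above):

```java
import java.util.Objects;

// Hypothetical de-duplication of PE status events that differ only in
// their timestamp, as seen in the two tuples above.
public class DuplicateFilter {
    public static final class StatusEvent {
        final long peId;
        final String health;
        final String status;
        public StatusEvent(long peId, String health, String status) {
            this.peId = peId; this.health = health; this.status = status;
        }
    }

    private StatusEvent last;

    /** Returns true if the event should be emitted (it is not a duplicate). */
    public boolean accept(StatusEvent e) {
        boolean dup = last != null
                && last.peId == e.peId
                && Objects.equals(last.health, e.health)
                && Objects.equals(last.status, e.status);
        last = e;
        return !dup;
    }
}
```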

Handle JMX problems

The current implementation does not handle JMX connection problems, for example, a wrong connectionURL is provided, or the domain is stopped.

Identify problems and handle them accordingly.

From issue #19:

  • I assume it won't currently work on Bluemix, but it should be improved to work on Bluemix. The issue is how to get the JMX connect URL.
  • The JMX connect URL can change on a failure, so ideally the operator would take a list of URLs, or even be able to figure out the URLs automatically and retry on a connection failure.

Misleading default comment in JobStatusSource

Per default, the JobStatusSource operator monitors neither any domain, nor any instances, nor job, nor any other Streams job-related object.

I don't think this is technically true: the operator fails if there is no filter document, so there is no default setting; the filtering is driven by the document.

Though I think if there is a default, it should be the opposite: monitor every available item. See #79

Support Metric Sink for integration in IBM Bluemix Metric/Log Service

IBM Bluemix provides a solution/service for metric/log storage and appropriate visualization with Grafana (metrics) and Kibana (logs).
This is ready to use: an application or service just needs to write its metric/log records via the provided API, and one can immediately see the metrics/logs in dashboards created for one's own needs.
Any Streams application could immediately show and process its own metrics in IBM Bluemix once this new MetricSink is available.

Composites/microservices that perform high-level monitoring.

Is this a suitable repo for microservices that support common monitoring tasks, such as a service that creates an alert tuple each time a job is having issues?

These alerts could then be subscribed to by alert distribution services, for example to send a Slack or text message.

Solve issue with user-specific paths in .classpath files

The .classpath files that are part of Streams Studio projects contain absolute paths to JAR files in $STREAMS_INSTALL/lib and $STREAMS_INSTALL/ext/lib. Find a solution to get rid of the already expanded (to /home/xxx/InfoSphere_Streams/4.2.0.0/) environment variables.

In other projects, the required JAR files are copied into the project, for example into an opt folder. From my point of view, this is a bad solution because it requires updating the JAR files for each release. Instead, I would prefer a preparation script that must be called before the projects are opened in Streams Studio.

The preparation script could create soft links (ln -s) or copy the files from $STREAMS_INSTALL.

This task is related to issue #3. The build environment that is used from the command line probably uses the direct paths via $STREAMS_INSTALL.

Provide LogSource Operator

Per discussion with the Streams architecture team, in order to support immediate consumption of Streams log messages for alerting purposes, we'd like to have an operator that can be used to stream the Streams log messages.
The operator would subscribe to JMX and expose the messages as structured tuples, making the following available per message:

  • Job ID
  • PE ID
  • Operator ID
  • Event timestamp
  • Message code
  • Message text
  • Parameter map (optional)

It's assumed that in most cases the functionality exists today in the JMX interfaces and is being leveraged in most toolkits, but actually building this source and using it will help determine whether there are any gaps in the message subscription mechanism and/or in the use of the appropriate logging by common toolkits.

MetricsSource supports application configuration

Credentials or other application configuration can change during the lifetime of a job. Streams offers application configuration objects to store such information in Apache ZooKeeper in an encoded state.

The following MetricsSource parameters are candidates to be stored in application configuration objects:

  • connectionURL: unless we find a solution to retrieve it automatically (see issue #5)
  • password: passwords often change periodically and we do not want to restart or recompile apps
  • filterDocument: instead of storing the configuration as JSON-encoded file, it might be useful to have the configuration centralized; we must be aware that different operators can have different configurations in the same application

Using application configuration objects stands in contrast to issue #18 because this feature is available only since Streams v4.2.

Change structure of the JSON filter document (filterDocument parameter)

The current implementation uses a quite simple JSON format to specify the filters for domain, instance, job, operator, and metric names.

Excerpt from the checked-in filters.json file:

[
	{
		"domainNames":"domain[0-9.]+",
		"instanceNames":"instance[0-9.]+",
		"jobNames":"\\S+",
		"operatorNames":".*",
		"metricNames":"nWindowPunctsSubmitted",
	},
	{
		"domainNames":"domain\\S+",
		"instanceNames":
		[
			"instance4\\.2\\.0\\.0",
			"instance4\\.2\\.0\\.1",
			"streams4\\.2\\.0\\.0",
		],
		"jobNames":".*",
		"operatorNames":"\\S+",
		"metricNames":
		[
			"nTuplesSubmitted",
			"nWindowPunctsQueued",
			"nTuplesProcessed",
		]
	},

The filter document contains a list of tuples. Each tuple currently has 5 attributes: domainNames, instanceNames, jobNames, operatorNames, and metricNames.

Each attribute can be a single string value or a list of string values. The string values are regular expressions. An instance is monitored if its instance name matches at least one of the specified filters; this means that the domain name and the instance name (of the same filter tuple) must both match.
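The matching rule described above can be sketched as a regex check against a list of patterns (FilterMatch is a hypothetical helper, not toolkit code):

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the filter semantics: a name matches an attribute
// if it matches at least one of the attribute's regular expressions; a
// filter tuple applies only when every level (domain, instance, ...) matches.
public class FilterMatch {
    public static boolean matchesAny(String name, List<String> patterns) {
        return patterns.stream().anyMatch(p -> Pattern.matches(p, name));
    }
}
```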

With the introduction of port metrics (and in future perhaps also PE and PE connection metrics), a restructured filter document might be useful.

The proposal is to use the same containment relation as it exists in Streams: domains have instances, instances have jobs, jobs have operators, etc. This tree-like structure allows adding new elements on every level. The following JSON sketch illustrates the idea:

[
	{
		domainNames:["a","b"]
		instances:
		[
			{
				instanceNames:["a","b"]
				jobs:
				[
					{
						jobNames:["x","y"]
						operators:
						[
							{
								operatorNames:["a","s","d"]
								metrics:[nPuncts,nSent,rnd]
								(input|output)Ports: # optional
								[
									{
										index:[1,2,3]
										metrics:["a","b","c"]
									}
								]
							},
							...
						],
					},
				]
			}
		]
	},
	{
		...
	},
]

The idea is to have a structure that can be easily parsed and that allows extensions without the need to rework the whole operator.

Use multi-threading to retrieve metrics for many MXBean objects in parallel

The current implementation uses a sequential approach to retrieve the metrics: The operator holds MXBean objects for all monitored objects, OperatorMXBean, OperatorInputPortMXBean, and OperatorOutputPortMXBean. It iterates through all these objects and calls the retrieveMetrics method. The result is processed before the function is called on the next object.

It might be useful to parallelize the function call, for example, processing the metrics of 5, 10, or 20 objects in parallel.

The task is to analyze whether this is a required and useful enhancement, and how it can be solved.
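A minimal sketch of the parallel variant using a bounded thread pool; retrieve() is a hypothetical stand-in for the per-MXBean retrieveMetrics call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: issue the (potentially slow) per-bean JMX calls on a
// bounded pool of, e.g., 5, 10, or 20 threads, then process the results in
// submission order, instead of the current sequential loop.
public class ParallelRetrieval {

    static String retrieve(String beanName) {
        // Placeholder for the real MXBean retrieveMetrics call.
        return beanName + ":ok";
    }

    public static List<String> retrieveAll(List<String> beanNames, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : beanNames) {
                futures.add(pool.submit(() -> retrieve(name)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that call completes
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```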

Feedback on toolkit:

Some feedback from using the toolkit and sample: It was easy (with some modifications) to get the sample app running, but thinking about its use for real monitoring, I came up with these items:

  • Should MetricsSource domain parameter be optional, defaulting to the domain the job is running in? For Bluemix it's not actually documented what the domain identifier is.
  • Should the MetricsSource filterDocument parameter be optional, defaulting to all metrics for all jobs in all instances in the specified domain?
  • I believe all existing apis use domainId or domainID (i.e. it's a domain identifier), but MetricsSource uses domain (operator parameter) and domainName in the filter document. Maybe the same issue for instance name vs. identifier.
  • The sample app fails on the quick start vm since the domain is StreamsDomain which is not matched in the filter document.
  • The filter document being a file would seem to be limiting, especially on Bluemix. Having it accept a JSON-serialized object as an rstring would allow the filtering to be stored in an application configuration.
  • I assume it won't currently work on Bluemix, but it should be improved to work on Bluemix. The issue is how to get the JMX connect URL.
  • The JMX connect URL can change on a failure, so ideally the operator would take a list of URLs, or even be able to figure out the URLs automatically and retry on a connection failure.

Have a consistent notification type?

Just to note that the SPL standard toolkit already has a notification type for JMX; if Notification_t is a JMX notification, it should be consistent with the existing type.

Mainly entered as a reminder to follow up with more investigation.

Change Origin.Type enum to be consistent with existing terminology.

Origin.Type uses the value Operator to describe custom metrics; why not just use Custom, to align with the existing terminology? I also wonder if the type can be simplified to just MetricType rather than wrapping it in an Origin composite. I tend to find composite-embedded types non-intuitive to use, so a simple type is clearer.

type MetricType = enum {
    Custom,
    OperatorInputPort,
    OperatorOutputPort,
    PEInputPort,
    PEOutputPort
};

DomainHandler errors on instance created and deleted ChangeNotifications

Noticed these in the logs - didn't verify whether this was only a logging issue or if it was preventing these events from being emitted:
16 Aug 2017 13:23:18.081 [5747] ERROR #splapptrc,J[501],P[1280],ChangeNotifications M[DomainHandler.java:com.ibm.streamsx.monitoring.jmx.internal.DomainHandler.handleNotification:130] - received INSTANCE_DELETED notification: user data is not an instance of String

16 Aug 2017 13:42:46.499 [5747] ERROR #splapptrc,J[501],P[1280],ChangeNotifications M[DomainHandler.java:com.ibm.streamsx.monitoring.jmx.internal.DomainHandler.handleNotification:130] - received INSTANCE_DELETED notification: user data is not an instance of String

@markheger ^

Better default for filterDocument?

While JobStatusSource was fairly easy to get running, it seemed strange that I had to copy and paste a filter document from the SPLDOC and then figure out how to use it with the operator.

For the common use with the Streaming Analytics service, it seems a filter of all jobs and all PEs in my single instance should be easier.

I was thinking that having standard filter docs in com.ibm.streamsx.monitoring/etc might be useful, e.g. jobSourceAll.json. But I'm not sure whether there's an SPL function to pick up the root of another toolkit (there is one in the C++/Java operator API).

Then maybe the default could be changed to include everything, so that filterDocument is optional?

Discuss MetricsSource parameters

The first approach for the operator parameters was:

  • connectionURL: Specifies the connection URL as returned by the streamtool getjmxconnect command.
  • user: Specifies the user that is required for the JMX connection.
  • password: Specifies the password that is required for the JMX connection.
  • domain: Specifies the domain that is monitored.
  • retryPeriod: Specifies the period after which a failed JMX connect is retried. The default is 10.0 seconds.
  • retryCount: Specifies the retry count for failed JMX connects. The default is -1, which means infinite retries.
  • filterDocument: Specifies the path to a JSON-formatted document that specifies the domain, instance, job, operator, and metric name filters as regular expressions. Each regular expression must follow the rules that are specified for Java Pattern.
  • scanPeriod: Specifies the period after which a new metrics scan is initiated. The default is 5.0 seconds.

The following questions have to be discussed:

  • Similar to many other operators, the user and password parameters would be specified as plain text. Shall the operator be prepared to (a) receive password changes on a control port, or (b) receive notifications from some kind of configuration plane? Or can we assume that this password is stable forever? Or is there another mechanism that we can use to get rid of the two parameters?
  • Is it possible that JMX provides a list of available domains without the need to know the domain?
  • Does it make sense to have the retryPeriod and retryCount parameters, or should the operator always retry? A use case might be that the MetricsSource-using application runs on domain A but monitors domain B, which might be down for whatever reason. Do we want the application on domain A to fail?
  • Similar to many other operators, the scanPeriod is a pause between two scan cycles. Shall the parameter be renamed to scanPause?
  • The filterDocument parameter will be discussed in another issue. See issue #4 for details.

Error when application configuration's filterDocument has tabs

If the application configuration's filterDocument contains tabs in its value, the MetricsSource retrieves a string that looks like this:

[ \t{ \t\t"domainIdPatterns":".*", \t\t"instances": \t\t[ \t\t\t{ \t\t\t\t"instanceIdPatterns":".*", \t\t\t\t"jobs": \t\t\t\t[ \t\t\t\t\t{ \t\t\t\t\t\t"jobNamePatterns":"com.ibm.streamsx.metricsMonitor.*", \t\t\t\t\t\t"pes": \t\t\t\t\t\t[ \t\t\t\t\t\t\t{ \t\t\t\t\t\t\t\t"metricNamePatterns":".*",
...

This isn't recognized as valid JSON and, thus, I get this operator trace error:

...  - Unexpected character '\' on line 1, column 16
...  - java.io.IOException: Unexpected character '\' on line 1, column 16
...  - com.ibm.json.java.internal.Tokenizer.next(Tokenizer.java:129)
...  - com.ibm.json.java.internal.Parser.parseArray(Parser.java:147)
...  - com.ibm.json.java.internal.Parser.parseValue(Parser.java:230)
...  - com.ibm.json.java.internal.Parser.parseObject(Parser.java:110)
...  - com.ibm.json.java.internal.Parser.parse(Parser.java:58)
...  - com.ibm.json.java.internal.Parser.parse(Parser.java:47)
...  - com.ibm.json.java.JSONObject.parse(JSONObject.java:79)
...  - com.ibm.json.java.JSONObject.parse(JSONObject.java:91)
...  - com.ibm.json.java.JSONArray.parse(JSONArray.java:161)
...  - com.ibm.json.java.JSON.parse(JSON.java:80)
...  - com.ibm.json.java.JSON.parse(JSON.java:135)
...  - com.ibm.json.java.JSON.parse(JSON.java:154)
...  - com.ibm.streamsx.metrics.internal.filter.Filters.setupFilters(Filters.java:276)
...  - com.ibm.streamsx.metrics.MetricsSource.setupFilters(MetricsSource.java:625)
...  - com.ibm.streamsx.metrics.MetricsSource.initialize(MetricsSource.java:363)
...  - com.ibm.streams.operator.internal.runtime.api.OperatorAdapter.initialize(OperatorAdapter.java:735)
...  - com.ibm.streams.operator.internal.jni.JNIBridge.<init>(JNIBridge.java:271)

My temporary fix for this is removing all the \t's from the JSON string with: .replaceAll("\\\\t", "");.
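As a sketch, the workaround amounts to stripping the literal backslash-t sequences before handing the string to the JSON parser (FilterDocumentSanitizer is a hypothetical helper):

```java
// Hypothetical helper for the workaround described above: the application
// configuration value arrives with literal "\t" character pairs, which the
// JSON parser rejects, so strip them before parsing.
public class FilterDocumentSanitizer {
    public static String stripLiteralTabs(String raw) {
        // "\\\\t" is the regex for a literal backslash followed by 't'.
        return raw.replaceAll("\\\\t", "");
    }
}
```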

Provide build environment

Currently, Streams Studio is used to build and run the toolkit and the sample application.

Provide, for example, an Ant build.xml or Makefile to support building (including SPLDOC) and running the toolkit and the sample(s).

Leverage the updateWeb.pl tool from the streamsx.inet toolkit to update SPLDOC in the gh-pages branch.

Document how to build the toolkit in BUILD.md and SPLDOC.

NullPointerException on retrieveMetrics()

I'm not sure if anybody else has run across this error, but it pops up frequently for me. I run the MetricsSource and sometimes it randomly throws an NPE; other times, it doesn't. When it does, relaunching the MetricsSource usually fixes the issue. Here is the operator trace I get:

01 Jun 2017 13:57:34.480 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[MetricsSource.java:com.ibm.streamsx.metrics.MetricsSource$1.run:398]  - Operator error
01 Jun 2017 13:57:34.482 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - java.lang.NullPointerException
01 Jun 2017 13:57:34.482 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.ibm.streams.instance.srm.client.BaseMetrics.getTimestamp(BaseMetrics.java:340)
01 Jun 2017 13:57:34.482 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.ibm.streams.instance.srm.client.BaseMetrics.<init>(BaseMetrics.java:192)
01 Jun 2017 13:57:34.482 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.ibm.streams.instance.srm.client.PEMetrics.<init>(PEMetrics.java:64)
01 Jun 2017 13:57:34.482 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.ibm.streams.instance.srm.client.SrmClient.getPEMetrics(SrmClient.java:292)
01 Jun 2017 13:57:34.483 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.ibm.streams.management.internal.PeBean.retrieveMetrics(PeBean.java:952)
01 Jun 2017 13:57:34.483 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
01 Jun 2017 13:57:34.483 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
01 Jun 2017 13:57:34.483 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at java.lang.reflect.Method.invoke(Method.java:508)
01 Jun 2017 13:57:34.483 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:83)
01 Jun 2017 13:57:34.484 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
01 Jun 2017 13:57:34.484 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
01 Jun 2017 13:57:34.484 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at java.lang.reflect.Method.invoke(Method.java:508)
01 Jun 2017 13:57:34.484 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:287)
01 Jun 2017 13:57:34.484 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:205)
01 Jun 2017 13:57:34.485 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:187)
01 Jun 2017 13:57:34.485 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:129)
01 Jun 2017 13:57:34.485 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:66)
01 Jun 2017 13:57:34.485 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:249)
01 Jun 2017 13:57:34.485 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:150)
01 Jun 2017 13:57:34.486 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:264)
01 Jun 2017 13:57:34.486 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:831)
01 Jun 2017 13:57:34.486 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:813)
01 Jun 2017 13:57:34.486 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at javax.management.remote.generic.ServerIntermediary.handleRequest(ServerIntermediary.java:280)
01 Jun 2017 13:57:34.486 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at javax.management.remote.generic.ServerIntermediary$PrivilegedRequestJob.run(ServerIntermediary.java:951)
01 Jun 2017 13:57:34.486 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at java.security.AccessController.doPrivileged(AccessController.java:686)
01 Jun 2017 13:57:34.487 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at javax.management.remote.generic.ServerIntermediary$RequestHandler.handleMBSReqMessage(ServerIntermediary.java:727)
01 Jun 2017 13:57:34.487 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at javax.management.remote.generic.ServerIntermediary$RequestHandler.execute(ServerIntermediary.java:629)
01 Jun 2017 13:57:34.487 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.sun.jmx.remote.generic.ServerSynchroMessageConnectionImpl$RemoteJob.run(ServerSynchroMessageConnectionImpl.java:266)
01 Jun 2017 13:57:34.487 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208)
01 Jun 2017 13:57:34.487 [24022] ERROR #splapptrc,J[39],P[40],MetricsSource_1 M[?:?:0]  - 	at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59)

I've been try-catching the retrieveMetrics() function call in MetricOwningHandler.captureAndSubmitMetrics(), and that seems to have fixed the problem.

Maybe the MetricsSource tries to retrieve the PeMXBean for jobs too soon after they're launched (I have not come across this error for any other MXBean).
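A sketch of that guard, assuming the call is wrapped per bean so a transient NullPointerException skips the affected PE instead of failing the operator (MetricsProvider is a hypothetical stand-in for the PE MXBean):

```java
// Hypothetical sketch of the try-catch workaround described above.
public class SafeRetrieval {
    public interface MetricsProvider {
        Object retrieveMetrics();
    }

    /** Returns the metrics, or null if the bean is not ready yet. */
    public static Object tryRetrieve(MetricsProvider bean) {
        try {
            return bean.retrieveMetrics();
        } catch (NullPointerException e) {
            // PE metrics not initialized yet; skip and retry on the next scan.
            return null;
        }
    }
}
```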

MetricsSource: Support snapshotMetrics in addition to retrieveMetrics

The MetricsSource operator calls retrieveMetrics on operator, port, or PE MxBean objects to get the corresponding metrics according to the specified filters. This might result in a huge amount of JMX calls, depending on the criteria in the filter document and the size of the monitored applications.

For some use cases, it might be useful to use snapshotMetrics instead of retrieveMetrics. snapshotMetrics can be called on instance, job, or PE MxBean objects. The returned URL is used to get all metrics via an HTTP GET request. The advantage is the significantly reduced number of JMX calls, the disadvantage is the probably larger amount of data that is returned and which requires JSON parsing.

Some code for snapshotMetrics is available as a comment from earlier prototyping. Also, the JmxTrustManager is available. Its code is copied from here. It might be necessary to implement additional security features in the trust manager.

So far, snapshotMetrics returns all available metrics, but this might change in the future, for example, filters might be supported. If filters are supported, it has to be evaluated whether they match to the regex syntax and features of the filter document, and whether they can be used to reduce the number of JMX calls.

So far, I recommend introducing a parameter that allows switching between snapshotMetrics and retrieveMetrics modes.

Add tests for toolkit and samples

Add test applications and test suites for unit testing:

  • Locate test scripts in the tests directory.
  • Verify operators in standalone and distributed mode.
  • Verify with multiple domains, where the monitor runs in a different domain than the jobs to be monitored, for example a "reconnect scenario".

MetricsSource shall send tuples even if a metric value did not change

The current implementation of the MetricsSource operator sends tuples only if a metric value has changed.

For certain use cases, it might be required to get a metric value periodically even if the value did not change.

Either completely remove this feature from the operator and implement it as, for example, a composite operator that is also available in this toolkit, or let the application developer decide (via a parameter) whether this feature is enabled.

To avoid duplicating functionality, the recommendation is to remove the feature from this operator.
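If the parameter route were chosen instead, the decision logic could look like this sketch (EmitPolicy and its names are hypothetical):

```java
// Hypothetical sketch: emit a metric value either on every scan cycle
// (periodic mode) or only when it differs from the previous value.
public class EmitPolicy {
    private final boolean emitUnchanged;
    private Long lastValue;

    public EmitPolicy(boolean emitUnchanged) {
        this.emitUnchanged = emitUnchanged;
    }

    /** Returns true if a tuple should be submitted for this value. */
    public boolean shouldEmit(long value) {
        boolean changed = lastValue == null || lastValue != value;
        lastValue = value;
        return emitUnchanged || changed;
    }
}
```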
