
Comments (7)

dmuino commented on June 24, 2024

Servo does not support MBean operations, and BasicCounter does not support resetting its value. One way to get that functionality for a counter would be to use annotations around an AtomicLong or AtomicInteger and then expose some mechanism that resets it:

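```java
import com.netflix.servo.annotations.DataSourceType;
import com.netflix.servo.annotations.Monitor;
import java.util.concurrent.atomic.AtomicInteger;

// Field and method inside the monitored class; resetErrors() is a plain method, not an MBean operation.
@Monitor(name = "TotalErrors", type = DataSourceType.COUNTER)
private AtomicInteger totalErrors = new AtomicInteger(0);

public void resetErrors() {
    totalErrors.set(0);
}
```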


brharrington commented on June 24, 2024

It should also be pointed out that it generally isn't a good idea to reset a BasicCounter. If you have multiple observers polling the value, they each need their own independent state to get accurate values. The CounterToRateTransform can be wrapped around a particular observer to keep that state and convert the cumulative total to a rate per second, which is typically more useful. If you reset the value, you corrupt the state for the other observers.
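To illustrate why each observer needs its own state, here is a minimal sketch in plain Java. `RateTracker` is a hypothetical stand-in, not Servo's CounterToRateTransform: it remembers the previous sample for one observer and turns the cumulative count into a per-second rate. It deliberately does no reset handling, so a reset between two polls shows up as a negative rate.

```java
/**
 * Hypothetical per-observer state for converting a monotonic counter into a
 * per-second rate. Not Servo's CounterToRateTransform; a simplified sketch.
 */
final class RateTracker {
    private long previousValue;
    private long previousTimeMillis;
    private boolean hasPrevious;

    /**
     * Records a sample of the cumulative count and returns the per-second rate
     * since this observer's previous sample, or NaN for the first sample.
     */
    double sample(long value, long timeMillis) {
        double rate = Double.NaN;
        if (hasPrevious) {
            double seconds = (timeMillis - previousTimeMillis) / 1000.0;
            // No reset handling: if the counter went down, the delta is negative.
            rate = (value - previousValue) / seconds;
        }
        previousValue = value;
        previousTimeMillis = timeMillis;
        hasPrevious = true;
        return rate;
    }
}
```

For example, an observer that sees the count go from 100 to 160 over 60 seconds reports 1.0 per second. If someone resets the counter to 0 and 30 more errors occur before the next poll, that observer computes (30 - 160) / 60, a negative rate. A real transform might clamp or drop such a sample, but either way the 30 errors from that interval are misreported or lost, and every other observer with its own tracker is affected in the same way.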


freesoft commented on June 24, 2024

@dmuino Thank you for your suggestion, but I don't think that is something an external system can do from outside the running JVM. By the way, the value in your example needs to be an AtomicLong.

@brharrington That really depends on how the LiveOps/DevOps/GNOC folks (or whoever is in charge of live operations) want to use it, and I still think resetting values through an MBean operation is handy and useful.
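For reference, this is roughly what a reset operation exposed over JMX looks like using only the standard `javax.management` API, without Servo. It is a minimal sketch: the class name `ResettableErrorCounter`, the attribute `TotalErrors`, the operation `resetErrors`, and the object name `example:type=ResettableErrorCounter` are all hypothetical. It implements `DynamicMBean` directly, which sidesteps the naming rules for standard MBean interfaces; an operator could then invoke `resetErrors` from a JMX client such as JConsole.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanOperationInfo;
import javax.management.MBeanParameterInfo;
import javax.management.ObjectName;
import javax.management.ReflectionException;

/** Hypothetical error count exposed over JMX with a "resetErrors" operation. */
final class ResettableErrorCounter implements DynamicMBean {
    private static final String ATTR = "TotalErrors";
    private static final String RESET = "resetErrors";
    private final AtomicLong totalErrors = new AtomicLong();

    void increment() {
        totalErrors.incrementAndGet();
    }

    /** Registers this counter with the platform MBean server under a hypothetical name. */
    ObjectName register() throws JMException {
        ObjectName name = new ObjectName("example:type=ResettableErrorCounter");
        ManagementFactory.getPlatformMBeanServer().registerMBean(this, name);
        return name;
    }

    @Override
    public Object getAttribute(String attribute) throws AttributeNotFoundException {
        if (ATTR.equals(attribute)) {
            return totalErrors.get();
        }
        throw new AttributeNotFoundException(attribute);
    }

    @Override
    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
        // Read-only attribute: resetting is exposed as an operation instead.
        throw new AttributeNotFoundException(attribute.getName() + " is not writable");
    }

    @Override
    public AttributeList getAttributes(String[] attributes) {
        AttributeList list = new AttributeList();
        for (String a : attributes) {
            if (ATTR.equals(a)) {
                list.add(new Attribute(a, totalErrors.get()));
            }
        }
        return list;
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        return new AttributeList(); // nothing is writable
    }

    @Override
    public Object invoke(String actionName, Object[] params, String[] signature)
            throws ReflectionException {
        if (RESET.equals(actionName)) {
            totalErrors.set(0);
            return null;
        }
        throw new ReflectionException(new NoSuchMethodException(actionName));
    }

    @Override
    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo[] attrs = {
            new MBeanAttributeInfo(ATTR, "long", "Total errors", true, false, false)
        };
        MBeanOperationInfo[] ops = {
            new MBeanOperationInfo(RESET, "Reset TotalErrors to 0",
                new MBeanParameterInfo[0], "void", MBeanOperationInfo.ACTION)
        };
        return new MBeanInfo(getClass().getName(), "Resettable error counter",
            attrs, null, ops, null);
    }
}
```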


brharrington commented on June 24, 2024

Can you elaborate on the overall workflow you are wanting?

Internally we typically have a setup that supports multiple observers:

  • Send data to internal time series database.
    • Small subset with higher resolution, every 10s
    • Most data every 1m to main stack
  • Send some data to CloudWatch to support auto-scaling
  • Optionally log values to local file for debugging purposes
  • Local observer on the instance that checks conditions and can trigger an alert
  • Viewing data via JMX

So we typically have 5+ different observers configured to receive all or part of the data, and we need each of them to get a consistent view. Resetting a BasicCounter means that at least some of them will get a bad rate value for the polling interval in which the reset occurred. Servo does support ResettableMonitor types; these would be configured so that a primary poller is responsible for resetting the value, and the observer that receives the data would then typically tee it to all the others that need samples at that interval.
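The primary-poller arrangement described above can be sketched in plain Java. This is not Servo's ResettableMonitor or observer API; `PrimaryPoller` is a hypothetical class, and `LongConsumer` stands in for whatever the downstream observers accept. The point is that exactly one component reads and resets the value, atomically via `AtomicLong.getAndSet(0)`, and every observer receives the same per-interval delta, so no observer keeps history that a reset could corrupt.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

/**
 * Hypothetical primary poller: the only component allowed to reset the count.
 * Each poll hands the same interval delta to every downstream observer.
 */
final class PrimaryPoller {
    private final AtomicLong counter;
    private final List<LongConsumer> observers;

    PrimaryPoller(AtomicLong counter, List<LongConsumer> observers) {
        this.counter = counter;
        this.observers = observers;
    }

    /** Called once per polling interval: atomically read-and-reset, then tee. */
    void poll() {
        long delta = counter.getAndSet(0);
        for (LongConsumer observer : observers) {
            observer.accept(delta);
        }
    }
}
```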


freesoft commented on June 24, 2024

Let's assume the following:

  • You already have your own monitoring/alert system written in C/C++ or Python. The system was built to cover different operating systems and languages, and it has its own logging protocol rather than JMX.
  • You have a small agent running on each server that collects metrics and sends them to the centralized monitoring server. So yes, it has an observer, but it is neither a Servo observer nor written in Java. This is a common situation for companies or developers who have worked on different projects with different platforms and languages for several years.
  • Now, say you have a new Java application that uses the Servo library. You need to add a JMX-attribute-gathering feature to your agent so it can send those metrics to the monitoring server, which is a non-Java system.
  • The monitoring server triggers alerts based on different measures: some are rates, others are specific counts or amounts. Say one alert condition is based on the current fail/success ratio (like "alert when FAIL CNT / SUCCESS CNT > 0.1"). Once the fail rate goes over 10%, it keeps alerting every 5 minutes (or any given time frame) until someone fixes the issue and SUCCESS CNT has grown enough to bring the fail rate below 10%. => The monitoring system keeps alerting until the service accumulates enough successes, EVEN AFTER THE PROBLEM HAS BEEN SOLVED.

Without resetting JMX attributes through an MBean operation, the solutions would be:

  1. Restart the service to stop the alert.
  2. Change every alert measure to use a rate per given time frame instead of a raw counter.

But both of those sound odd to me.

I understand your concern about inconsistency when a counter is reset, but this feature can still be useful depending on the system or situation.
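The arithmetic behind the second option can be made concrete with a small sketch. The class and the numbers are hypothetical: suppose that when the incident ends the lifetime totals are 5,000 failures and 20,000 successes, and five minutes later, after the fix, they are 5,000 failures and 21,000 successes.

```java
/** Hypothetical helpers comparing lifetime and per-interval failure ratios. */
final class FailRatio {
    /** Ratio of lifetime totals, which is what an alert on raw counters sees. */
    static double cumulative(long failTotal, long successTotal) {
        return (double) failTotal / successTotal;
    }

    /**
     * Ratio of what happened between two samples. If the interval had no
     * successes the result is Infinity or NaN, so a real alert needs a guard.
     */
    static double interval(long failPrev, long failNow, long successPrev, long successNow) {
        return (double) (failNow - failPrev) / (successNow - successPrev);
    }
}
```

With these numbers the lifetime ratio is 5,000 / 21,000 ≈ 0.24, still above the 0.1 threshold, while the ratio for the last interval is 0 / 1,000 = 0. An alert on the interval ratio clears on the first poll after the fix, without resetting anything.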


freesoft commented on June 24, 2024

Can I get an update on whether you guys are considering this feature? Or would you accept code changes if I submitted them? If you think it's unnecessary, I'll find a workaround for my case rather than waiting for a response.
Thank you!


brharrington commented on June 24, 2024

In response to your first three bullets:

  • Servo can send data to other systems that are not JVM-based. For example, there is an observer implementation that forwards to Graphite, which is written in Python. You could also write one that sends to the local agent on your machine.
  • For servo, JMX is just a view of the data. Servo data can be captured by plugging in an observer that then communicates with whatever your backend is.
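As a rough sketch of that bridging idea, the class below formats metric samples using Graphite's plaintext protocol (`<path> <value> <epoch-seconds>`, one per line) and writes them to any `Writer`, such as a socket stream to a local agent. It does not use Servo's observer interface; `LineProtocolBridge` and its nested `Sample` type are hypothetical, and a real Servo observer would perform the same conversion on the metrics it receives.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

/** Hypothetical bridge that writes samples in Graphite's plaintext line format. */
final class LineProtocolBridge {
    /** Hypothetical metric sample: name, value, and timestamp in epoch milliseconds. */
    static final class Sample {
        final String name;
        final double value;
        final long timestampMillis;

        Sample(String name, double value, long timestampMillis) {
            this.name = name;
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }

    private final Writer out;

    LineProtocolBridge(Writer out) {
        this.out = out;
    }

    /** Writes one "<name> <value> <epochSeconds>" line per sample, then flushes. */
    void update(List<Sample> samples) throws IOException {
        for (Sample s : samples) {
            out.write(s.name + " " + s.value + " " + (s.timestampMillis / 1000) + "\n");
        }
        out.flush();
    }
}
```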

On the last bullet, I disagree. The goal of Servo is to provide a way to indicate what should be monitored and to collect the data via observers. It should be able to tell you what happened during the last polling interval (provided you have wrapped the observer in CounterToRateTransform for monotonic counters). This is critical because it means the signal reaching the monitoring system will also tell you when the problem actually goes away in terms of what is measured, not just when someone clicks reset and says it is fixed. In our case the monitoring system supports defining alerts, and we can visually depict this information, so we'll see something like:

[Image: failure_example]

We'll then resolve the alert after confirming that the state is back to normal. In short, I don't think we will accept this change because:

  1. It doesn't seem necessary. I don't see why you couldn't write an observer that bridges your internal system with the data coming in from Servo. Look at the Graphite observer as an example of talking to a non-Java system.
  2. Resetting the state of a BasicCounter breaks the model and has undesirable pitfalls. If you really need this, it should follow the gauge contract, and you can wrap a gauge around any Number implementation like AtomicLong, which would give you full control if you needed it. Note that "counter" is a bit overloaded: we use the RRD notion, where it is a monotonically increasing value used to generate a rate per second.
  3. As described above, I think the current Servo approach is better in that it gives the downstream monitoring/collection system an input signal that can tell you when the actual measurement shows the issue was resolved.
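To illustrate the gauge alternative from the second point without depending on Servo, here is a hypothetical wrapper that exposes an `AtomicLong` through `LongSupplier`. Under a gauge contract each observer just reports the value it samples, with no previous sample to compare against, so a reset by the owner shows up as a drop in the reported level rather than as a corrupted rate. In Servo the equivalent would be a gauge-typed monitor (for example `@Monitor` with `DataSourceType.GAUGE`) around the same `AtomicLong`; that wiring is not shown here.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical gauge-style wrapper. Observers sample the current value as-is,
 * so the owner can reset it without breaking any per-observer rate state.
 */
final class ResettableGauge implements LongSupplier {
    private final AtomicLong value = new AtomicLong();

    void increment() {
        value.incrementAndGet();
    }

    /** Owner-controlled reset, acceptable because the value is reported as a level. */
    void reset() {
        value.set(0);
    }

    /** The value every observer reads on each poll. */
    @Override
    public long getAsLong() {
        return value.get();
    }
}
```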

