Comments (9)

BrunoBonacci commented on September 4, 2024

> Generally, it is better to apply the transformation to the destination publisher rather than the source.

> That's an interesting take: for filtering transforms I think that makes sense (and it's what we do). For the type of operation I'm hoping to do (i.e. split mulog events into multiple events) requiring that transform on every other publisher feels a bit icky.

The reason why I would say it is better to apply the transformation at the destination publisher is that the transformation and target representation might be valid (or required) only for one specific system.

You shouldn't need to change a log event's representation at the source only because one target system doesn't support something. For example, if the same event is sent to Kinesis and then stored in S3, you might want to keep the full sample in that system.

However, the point of μ/log is to give you the possibility to manage your metering data the way you want, and you know your particular use case better than anyone else.

> Will transform-samples be the only way to transform samples (which I'm OK with since it's a superset of the currently implemented transform code)?

Yes, the intention is to have transform-samples apply a generic transformation to all the samples before they are logged,
and then use transform to apply a generic transformation to all events before they are sent to their destination.

To summarise:

  • transform: will be available in all built-in publishers (but not samplers)
  • transform-samples: will be available in all built-in samplers (but not publishers)
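As a rough sketch of where each option lives (assuming μ/log 0.8.0 or later), the two can sit side by side in one :multi configuration. The :mbeans-patterns key is the mbean sampler option as I understand it from the μ/log README, so double-check it there; the pattern and both filters are illustrative assumptions. The config is plain data, so it could be passed to μ/start-publisher! as-is:

```clojure
;; Sketch only: where each option goes in a :multi publisher config.
;; The MBean pattern and both filters are illustrative assumptions.
(def publisher-config
  {:type :multi
   :publishers
   [;; sampler: :transform-samples gets a seq of raw MBean samples
    {:type              :mbean
     :mbeans-patterns   ["java.nio:*"]
     ;; sample-seq -> sample-seq: keep only BufferPool MBeans
     :transform-samples (fn [samples]
                          (filter #(= "BufferPool" (get-in % [:keys "type"])) samples))}

    ;; publisher: :transform gets a seq of μ/log events
    {:type      :console
     ;; event-seq -> event-seq: keep MBean sample events off the console
     :transform (fn [events]
                  (remove #(= :mulog/mbean-sampled (:mulog/event-name %)) events))}]})
```

Usage, with μ/log required as μ: (μ/start-publisher! publisher-config).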

from mulog.

BrunoBonacci commented on September 4, 2024

Hi @darindouglass

I've made the following change:

662e3d0

I also added a deprecation warning for the old configuration; the change is described here:

#75

If everything looks good, I will release the new version this weekend.

BrunoBonacci commented on September 4, 2024

Hi @darindouglass,

Keep in mind that it is possible to increase the total-fields limit in Elasticsearch (the index.mapping.total_fields.limit index setting).

All built-in publishers support a custom :transform. I would say this is the preferred approach, since you can control on a per-publisher basis how the data will be sent to each destination.

For example, you could even put the mbean samples in their own index:

(require '[com.brunobonacci.mulog :as μ]
         '[where.core :refer [where]])

(μ/start-publisher!
 {:type :multi
  :publishers
  [{:type :console}
 
    ;; exclude mbean samples
    {:type :elasticsearch :url  "http://localhost:9200/" 
     :transform (partial filter (where :mulog/event-name not= :mulog/mbean-sampled))}

    ;; secondary index with only mbean samples
    {:type :elasticsearch :url  "http://localhost:9200/" 
     :index-pattern "'mulog-mbeans-'yyyy.MM.dd"
     ;; select only mbean sample and split by attribute
     :transform #(->> % 
                   (filter (where :mulog/event-name :is? :mulog/mbean-sampled))
                   (split-by-attribute)))}]})
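The split-by-attribute function used in the example is not defined in the thread. A minimal hypothetical sketch, assuming each :mulog/mbean-sampled event carries the raw sample's :attributes map (check what your events actually contain), could emit one event per MBean attribute, which also keeps the number of distinct fields per document small:

```clojure
;; Hypothetical helper: the :attributes key on the event is an assumption,
;; mirroring the raw MBean sample shape.
(defn split-by-attribute
  "Turns each event into one event per entry of its :attributes map.
   Events without :attributes pass through unchanged."
  [events]
  (mapcat
   (fn [event]
     (if-let [attrs (not-empty (:attributes event))]
       (for [[attr-name attr-value] attrs]
         (-> event
             (dissoc :attributes)
             (assoc :attribute attr-name :value attr-value)))
       [event]))
   events))
```

Because it takes and returns a sequence of events, it composes with the filter inside the publisher's :transform via ->>.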

bn-darindouglass-zz commented on September 4, 2024

Yeah, I know you can increase the field limit in ES, but that's something we want to avoid (though it's definitely an option). We're not using the ES publisher (the ELK stack is managed by another team and they don't expose ES directly); otherwise, that'd work.

We make liberal use of transform in our code (we have a wrapper around mulog that adds tagging/filtering/etc automagically). I'd love to use transform on the mbean publisher but when looking at solutions I noticed that the mbean transform is applied differently than other publishers': mbean acts on a single map vs a seq of maps like other publishers.

If the behavior were the same, I'd just be able to use {:type :mbean :transform #(mapcat split! %)}, but that won't work as is. Is there any chance we could have the transform applied here instead?

BrunoBonacci commented on September 4, 2024

Generally, it is better to apply the transformation to the destination publisher rather than the source.
For example, I would apply the transform on the ELS publisher rather than the sampler.

However, the issue you pointed out about the different signature for :transform is my mistake.
Fixing it will cause a breaking change ;-( but I think it needs to conform to all the other transform functions.

BrunoBonacci commented on September 4, 2024

On a deeper look, I see why it is this way. Let me explain.

The publishers' transform functions have the following signature:

transform -> event-seq -> event-seq

The transform function goes from a sequence of events to a modified sequence of events.
A μ/log event is a map with the following structure:

{:mulog/event-name :your-ns/event-name,
 :mulog/timestamp 1587501375129,
 :mulog/trace-id #mulog/flake "4VTCYUcCs5KRbiRibgulnns3l6ZW_yxk",
 :mulog/namespace "your-ns",
 :your-custom-key "and-values"}
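Since a publisher's transform receives the whole event sequence, it can drop events, reshape them, or turn one event into several, all in one function. A minimal sketch over the event shape above; split-event is a hypothetical stand-in for the split! function mentioned earlier in the thread, and the splitting rule is an illustrative assumption:

```clojure
;; event-seq -> event-seq: the shape a built-in publisher's :transform expects.
(defn split-event
  "Hypothetical splitter: emits one event per value when :your-custom-key
   holds a collection, otherwise returns the event unchanged."
  [event]
  (let [v (:your-custom-key event)]
    (if (coll? v)
      (map #(assoc event :your-custom-key %) v)
      [event])))

(defn my-transform
  "Drops events without a :mulog/event-name, then splits the rest."
  [events]
  (->> events
       (filter :mulog/event-name)
       (mapcat split-event)))
```

It would be attached per publisher, e.g. {:type :console :transform my-transform}.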

Samplers like mbean-sampler use the publisher infrastructure to take samples at a regular interval.
However, when the sample is logged in μ/log (here), the event doesn't exist yet at that stage.
What you have here is a sample value that is not in the same shape as the event above.

The shape of the value transformed here (the sample) depends on the MBean being sampled; for example, for java.nio:* it might look like:

{:canonical-name "java.nio:name=direct,type=BufferPool",
 :domain "java.nio",
 :keys {"name" "direct", "type" "BufferPool"},
 :attributes
 {:Name "direct",
  :Count 10,
  :TotalCapacity 228226,
  :MemoryUsed 228226,
  :ObjectName "java.nio:name=direct,type=BufferPool"}}

Therefore, it cannot accept the same event-transform functions. The current transform on the mbean sampler looks like this:

;; currently (0.7.1)
transform -> sample -> sample

But I can see how this can be confusing.
It is probably better to change the name of the function to avoid confusion.
My current thinking is to change it to: transform-samples with the following signature:

;; future (0.8.0)
transform-samples -> sample-seq -> sample-seq

This type of transform will be applied to the sample values (not the events) prior to logging, thus offering the possibility to filter/transform/split at will.
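To illustrate the difference using the java.nio sample shape shown above: a sample -> sample function can only reshape one sample at a time, while a sample-seq -> sample-seq function (the proposed transform-samples shape) can also drop samples or turn one sample into several. Both function names below are hypothetical:

```clojure
;; 0.7.1 style (sample -> sample): can only reshape a single sample
(defn only-attributes [sample]
  (select-keys sample [:canonical-name :attributes]))

;; proposed 0.8.0 style (sample-seq -> sample-seq): can filter and split
(defn nio-attribute-samples
  "Keeps only java.nio samples and emits one sample per attribute."
  [samples]
  (for [sample samples
        :when (= "java.nio" (:domain sample))
        [attr-name attr-value] (:attributes sample)]
    {:canonical-name (:canonical-name sample)
     :attribute      attr-name
     :value          attr-value}))
```

An existing sample -> sample function such as only-attributes can be adapted to the sequence signature with (partial map only-attributes).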

What do you think, does this make sense to you?

bn-darindouglass-zz commented on September 4, 2024

> Generally, it is better to apply the transformation to the destination publisher rather than the source.
> For example, I would apply the transform on the ELS publisher rather than the sampler.

That's an interesting take: for filtering transforms I think that makes sense (and it's what we do). For the type of operation I'm hoping to do (i.e. split mulog events into multiple events) requiring that transform on every other publisher feels a bit icky.

> My current thinking is to change it to: transform-samples with the following signature:

I think this is fine. Will transform-samples be the only way to transform samples (which I'm OK with since it's a superset of the currently implemented transform code)?

bn-darindouglass-zz commented on September 4, 2024

Looks good to me. Thanks again!

BrunoBonacci commented on September 4, 2024

Released in: v0.8.0 (please see deprecation warning: #75)
