Tags in Object Key about fluent-plugin-s3 (CLOSED)

fluent commented on August 25, 2024
Tags in Object Key

from fluent-plugin-s3.

Comments (21)

repeatedly commented on August 25, 2024

Currently no.
Do you want to specify %{tag} in s3_object_key_format?

sgessa commented on August 25, 2024

Yes please! How can I achieve this? I could just implement this locally, but I have only just started playing with fluentd and plugins.

repeatedly commented on August 25, 2024

How would you implement it?

We can't assume a single tag in the s3 plugin because each event has its own tag.
For example:

<match foo.**>
  type s3
  # ...
</match>

In this case, events in the s3 plugin may have tags such as foo.bar, foo.baz, and so on.

sgessa commented on August 25, 2024

Yep, I just want to add ${tag} to the object key, like this:

<match foo.**>
  type s3
  s3_object_key_format %{time_slice}_${tag}_%{index}.%{file_extension}
   ....
</match>

If an event has the tag foo.bar, for example, I expect to find it in the object key.
Also, if I want to keep only "bar", I should be able to set remove_tag_prefix foo.

Thanks

repeatedly commented on August 25, 2024

Hm.

In your approach, the S3 plugin stores multiple objects in S3 at the same time, right?

sgessa commented on August 25, 2024

Yes. I need the %{tag} because I'm storing access logs grouped by domain, and I'm passing the domain name in the tag.

repeatedly commented on August 25, 2024

Okay. Could you send a pull request?
Error handling and the risk of breaking idempotency are probably important factors.

sgessa commented on August 25, 2024

I don't know how to implement this, that's why I asked here :(
I started playing with fluentd yesterday :D

repeatedly commented on August 25, 2024

I see. We will need some time if we implement it.

dave7373 commented on August 25, 2024

There is another plugin that adds this feature to the s3 plugin. Please check it out here:
https://github.com/campanja/fluent-output-router

jsermeno commented on August 25, 2024

We just began using fluentd in production. Right now we're using the plugin that dave7373 mentioned to achieve storing logs for each event in a different folder. It's working, although I would like to explore if there is a more efficient way. The fluent-output-router starts a new fluent-plugin-s3 for every event. This creates a lot of threads if you have a lot of events. Is it due to fluentd having a single buffer queue structure that new outputs must be instantiated if you want to have separate chunks for each event?

> In your approach, the S3 plugin stores multiple objects in S3 at the same time, right?

In the approach you discussed above, did you mean that in the write method you would split the chunk into separate pieces based on tag and then write each sub-chunk to a different S3 file? The only problem I see here is that you may get very small S3 files if an event only occurs a few times within a chunk, whereas if you had an individual chunk for each event this would occur less often. Maybe that is not a problem and can be mitigated by making the chunk size larger? Perhaps it is also more efficient than creating a new output for each event. Are there downsides to making the chunk size larger? According to the documentation, the default chunk size is 8m.

I should also mention that I would love to work on implementation if we come to some agreement on what the best solution is.

Thanks!

repeatedly commented on August 25, 2024

@jsermeno

> We just began using fluentd in production.

Coool 👍

> The fluent-output-router starts a new fluent-plugin-s3 for every event

The forest and router plugins create a new output when they receive a new tag, not for every event, so the number of outputs and threads doesn't explode in most cases.

> The only problem I see here is that you may get very small S3 files if an event only occurs a few times within a chunk.

Hmm... My concern is error handling.
Both the S3 plugin and forest-based tag separation rely on Fluentd's retry mechanism when an error occurs.

On the other hand, if we support tag separation inside the S3 plugin, we would have to implement our own retry mechanism similar to Fluentd's,
because tag separation often issues multiple requests to S3. I already mentioned this point: error handling and the risk of breaking idempotency are important factors.

Maintaining a duplicated retry feature seems costly and does not bring many advantages, I think.
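
To make the concern concrete, here is a minimal Ruby sketch of what splitting one buffered chunk by tag could look like. It is not code from fluent-plugin-s3: `split_by_tag` and `object_key` are hypothetical helpers, the events are assumed to be plain `[tag, time, record]` triples, and the key layout simply mirrors the format proposed earlier in the thread.

```ruby
require "json"

# Hypothetical helper (not part of fluent-plugin-s3): group events by tag.
# Each event is assumed to be a [tag, time, record] triple.
def split_by_tag(events)
  events.group_by { |tag, _time, _record| tag }
end

# Hypothetical key builder mirroring the format proposed in this thread:
# %{time_slice}_${tag}_%{index}.%{file_extension}
def object_key(time_slice, tag, index, file_extension)
  "#{time_slice}_#{tag}_#{index}.#{file_extension}"
end

# One chunk that happens to contain events for two different tags.
events = [
  ["foo.bar", 1_400_000_000, { "path" => "/a" }],
  ["foo.baz", 1_400_000_001, { "path" => "/b" }],
  ["foo.bar", 1_400_000_002, { "path" => "/c" }],
]

# Build one object body per tag; each entry would be a separate upload.
uploads = {}
split_by_tag(events).each do |tag, tagged_events|
  body = tagged_events.map { |_tag, _time, record| JSON.generate(record) }.join("\n")
  uploads[object_key("2014051300", tag, 0, "json")] = body
end

uploads.each_key { |key| puts key }
```

Each entry in `uploads` corresponds to its own request to S3. If the second request fails after the first has succeeded, retrying the whole chunk would upload the first object again, which is why splitting inside the plugin would need its own retry bookkeeping.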

jsermeno commented on August 25, 2024

> The forest and router plugins create a new output when they receive a new tag, not for every event, so the number of outputs and threads doesn't explode in most cases.

Oops, sorry, I did mean a new tag.

> Hmm... My concern is error handling.
> Both the S3 plugin and forest-based tag separation rely on Fluentd's retry mechanism when an error occurs.

I see. Do you believe this optimization would be better suited to become part of fluentd itself? Perhaps there could be a configuration option that limits the number of threads somehow. Scribe, for example, has a configuration option to prevent creating a new thread for each category / tag.

> Maintaining a duplicated retry feature seems costly and does not bring many advantages, I think.

The cost does seem larger than I initially thought. There are many advantages, though. A number of use cases require a high number of tags, particularly when handling multiple applications. The number of tags in our case could easily exceed 1000 in the near future and could grow larger; we are already at several hundred. The main benefit I see in storing that many tags in separate folders is that, if you want to perform analytics on a small subset of events, you do not have to open every file to search for them, which could speed up queries by quite a bit.

ryanc4 commented on August 25, 2024

Can we follow the same approach as in this plugin?

https://github.com/fluent/fluent-plugin-mongo/blob/master/lib/fluent/plugin/out_mongo.rb#L93

repeatedly commented on August 25, 2024

Sorry for the late reply.

@jsermeno

> Scribe, for example, has a configuration option to prevent creating a new thread for each category / tag.

This is interesting. I will check Scribe source code later.

@ryanc4

Currently no, because the S3 plugin already uses the same approach to separate records by event time.
For almost all users, forest plus the S3 plugin is enough.
But for a case like jsermeno's above, we need a better-performing option.

ryanc4 commented on August 25, 2024

@repeatedly I don't see the s3 plugin using emit to split by tag. I think allowing splitting by tag would let us do log analysis in S3 (with EMR) more quickly.

repeatedly commented on August 25, 2024

@ryanc4 The S3 plugin itself doesn't override emit, but TimeSlicedOutput, the superclass of the S3 plugin, sets the time-sliced string as the key in emit. To support a tag-included key in the S3 plugin, we would have to extend TimeSlicedOutput#emit. For now the forest plugin is probably the better choice unless the user has a special reason.

repeatedly commented on August 25, 2024

@jsermeno I checked Scribe's newThreadPerCategory, and I now understand Scribe's buffer and thread management. I will think about implementing the same feature on top of fluentd.

dieend commented on August 25, 2024

Is there any update on using tags in the object key?

repeatedly commented on August 25, 2024

You can use fluent-plugin-forest to realize this goal: https://github.com/tagomoris/fluent-plugin-forest

prtk-ngm commented on August 25, 2024

Please provide an example of how we can integrate the forest plugin with the s3 plugin to get dynamic tag support in the path.
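
A minimal configuration sketch of the forest plus s3 combination recommended above might look like the following. It is an untested illustration, not an official example: the credentials, bucket name, and paths are placeholders, and it assumes fluent-plugin-forest expands ${tag} inside <template> to the event tag with remove_prefix already applied. It uses the older `type` syntax seen earlier in this thread; newer Fluentd versions write `@type`.

```
<match foo.**>
  type forest
  subtype s3
  # Assumption: with this setting, ${tag} becomes "bar" for events tagged foo.bar
  remove_prefix foo
  <template>
    # Placeholder values: replace with your own credentials and bucket
    aws_key_id YOUR_AWS_KEY_ID
    aws_sec_key YOUR_AWS_SECRET_KEY
    s3_bucket YOUR_BUCKET_NAME
    # One S3 "folder" per tag
    path logs/${tag}/
    # Each per-tag output gets its own buffer location
    buffer_path /var/log/fluent/s3.${tag}
    time_slice_format %Y%m%d%H
  </template>
</match>
```

With this layout forest creates one s3 output per distinct tag, so each tag keeps its own buffer and relies on Fluentd's normal retry mechanism, at the cost of one output (and its threads) per tag as discussed earlier in the thread.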
