is it possible to have tags in the object key?

Tags in Object Key about fluent-plugin-s3 HOT 21 CLOSED

fluent commented on August 25, 2024

Tags in Object Key

from fluent-plugin-s3.

Comments (21)

repeatedly commented on August 25, 2024

Currently no.
Do you want to specify %{tag} in s3_object_key_format?

sgessa commented on August 25, 2024

Yes please! How can I achieve this? I could try implementing this locally, but I've only just started playing with fluentd and plugins.

repeatedly commented on August 25, 2024

How would you implement it?

We can't assume a single tag in the s3 plugin because each event has its own tag.
For example:

<match foo.**>
  type s3
  # ...
</match>

In this case, events reaching the s3 plugin may have tags such as foo.bar, foo.baz, and so on.

sgessa commented on August 25, 2024

Yep I just want to add ${tag} in object key like this:

<match foo.**>
  type s3
  s3_object_key_format %{time_slice}_${tag}_%{index}.%{file_extension}
   ....
</match>

If an event has the tag foo.bar, for example, I expect to find it in the object key.
Also, if I want to keep only "bar", I should be able to call remove_tag_prefix foo.

Thanks

repeatedly commented on August 25, 2024

Hm.

In your approach, S3 plugin stores multiple objects into S3 at the same time, right?

sgessa commented on August 25, 2024

Yes. I need the %{tag} because I'm storing access logs grouped by domain, and I pass the domain name in the tag.

repeatedly commented on August 25, 2024

Okay. Could you send a pull request?
Error handling and the risk of breaking idempotency are probably important factors.

sgessa commented on August 25, 2024

I don't know how to implement this, that's why I asked here :(
I started playing with fluentd yesterday :D

repeatedly commented on August 25, 2024

I see. We will need some time if we are to implement it.

dave7373 commented on August 25, 2024

There is another plugin that adds this feature to the s3 plugin. Please check it out here:
https://github.com/campanja/fluent-output-router

jsermeno commented on August 25, 2024

We just began using fluentd in production. Right now we're using the plugin that dave7373 mentioned to achieve storing logs for each event in a different folder. It's working, although I would like to explore if there is a more efficient way. The fluent-output-router starts a new fluent-plugin-s3 for every event. This creates a lot of threads if you have a lot of events. Is it due to fluentd having a single buffer queue structure that new outputs must be instantiated if you want to have separate chunks for each event?

> In your approach, S3 plugin stores multiple objects into S3 at the same time, right?

In the approach you discussed above, did you mean that in the write method you would split the chunk into separate pieces based on tag and then write each sub-chunk to a different S3 file? The only problem I see here is that you may get very small S3 files if an event only occurs a few times within a chunk. Whereas if you had an individual chunk for each event, this would occur less often. Maybe that is not a problem and can be mitigated by making the chunk size larger? Perhaps it is also more efficient than creating a new output for each event. Are there downsides to making the chunk size larger? According to the documentation the default chunk size is 8m.

I should also mention that I would love to work on implementation if we come to some agreement on what the best solution is.

Thanks!

repeatedly commented on August 25, 2024

@jsermeno

> We just began using fluentd in production.

Coool 👍

> The fluent-output-router starts a new fluent-plugin-s3 for every event

The forest and router plugins create a new output when they receive a new tag, not for every event. So the number of outputs / threads doesn't explode in most cases.

> The only problem I see here is that you may get very small S3 files if an event only occurs a few times within a chunk.

Hmm... My concern is error handling.
The S3 plugin and forest-based tag separation both rely on Fluentd's retry mechanism when an error occurs.

On the other hand, if we support tag separation inside the S3 plugin, we would have to implement our own retry mechanism, similar to Fluentd's,
because tag separation often issues multiple requests to S3. I already mentioned this point:

"Error handling and the risk of breaking idempotency are probably important factors."

Maintaining a duplicate retry feature seems costly, with few advantages, I think.

jsermeno commented on August 25, 2024

> The forest and router plugins create a new output when they receive a new tag, not for every event. So the number of outputs / threads doesn't explode in most cases.

Oops sorry, I did mean new tag.

> Hmm... My concern is error handling.
> The S3 plugin and forest-based tag separation both rely on Fluentd's retry mechanism when an error occurs.

I see, do you believe that this optimization would be better suited to become part of fluentd itself? Perhaps there would be a configuration option that limits the number of threads somehow. Scribe for example, has a configuration option to prevent creating a new thread for each category / tag.

> Maintaining a duplicate retry feature seems costly, with few advantages, I think.

The cost does seem to be becoming larger than I initially thought. There are many advantages though. There are a number of use cases that require a high number of tags. Particularly when handling multiple applications. The number of tags in our case could easily exceed 1000 in the near future and could grow larger. We are already at several hundred. The main benefit I see in storing that many tags in separate folders is if you want to perform analytics on a small subset of events you do not have to open every file to search for the events and potentially speed up queries by quite a bit.

ryanc4 commented on August 25, 2024

Can we follow the same approach as in this plugin?

https://github.com/fluent/fluent-plugin-mongo/blob/master/lib/fluent/plugin/out_mongo.rb#L93

repeatedly commented on August 25, 2024

Sorry for the late reply.

@jsermeno

> Scribe for example, has a configuration option to prevent creating a new thread for each category / tag.

This is interesting. I will check Scribe source code later.

@ryanc4

Currently no, because the S3 plugin already uses the same approach to separate records by event time.
For most users, forest plus the S3 plugin is enough.
But for a case like jsermeno's above, we need a better-performing option.

ryanc4 commented on August 25, 2024

@repeatedly I don't see the s3 plugin using emit to split by tag. I think allowing a split by tag would let us do log analysis in S3 (with EMR) more quickly.

repeatedly commented on August 25, 2024

@ryanc4 The S3 plugin itself doesn't override emit, but TimeSlicedOutput, the superclass of the S3 plugin, sets the time-sliced string as the key in emit. To support a tag-included key in the S3 plugin, we would have to extend TimeSlicedOutput#emit. For now the forest plugin is probably the better choice unless the user has a special reason.

repeatedly commented on August 25, 2024

@jsermeno I checked Scribe's newThreadPerCategory and now understand Scribe's buffer and thread management. I will think about implementing the same feature on top of fluentd.

dieend commented on August 25, 2024

Is there any update for using tags in object key?

repeatedly commented on August 25, 2024

You can use fluent-plugin-forest to realize this goal: https://github.com/tagomoris/fluent-plugin-forest
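
For readers on newer versions: with Fluentd v1 and fluent-plugin-s3 1.x, `${tag}` can be used in `path` directly when `tag` is listed as a buffer chunk key, so forest is not required. A minimal sketch under that assumption (the bucket, region, and paths are placeholders, not values from this thread):

```conf
<match foo.**>
  @type s3
  s3_bucket YOUR_S3_BUCKET_NAME
  s3_region YOUR_S3_REGION
  path logs/${tag}/
  s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
  # Chunks are keyed by tag and time, so each tag gets its own objects.
  <buffer tag,time>
    @type file
    path /var/log/fluent/s3
    timekey 3600
    timekey_wait 10m
  </buffer>
</match>
```

Because the buffer itself groups events by tag, retries stay with Fluentd's normal per-chunk mechanism.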

prtk-ngm commented on August 25, 2024

Please provide an example of how to integrate the forest plugin with the s3 plugin to get dynamic tag support in the path.
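
A minimal sketch of one way to combine the two plugins, assuming the old-style `type` syntax used earlier in this thread and that fluent-plugin-forest is installed. The bucket name, credentials, and paths are placeholders, not values from this thread. forest creates one s3 output per tag from the `<template>` section and expands `${tag}` inside it:

```conf
<match foo.**>
  type forest
  subtype s3
  <template>
    # Placeholder credentials and bucket; replace with your own.
    aws_key_id YOUR_AWS_KEY_ID
    aws_sec_key YOUR_AWS_SECRET_KEY
    s3_bucket YOUR_S3_BUCKET_NAME

    # forest expands ${tag} before the s3 output is created,
    # so a foo.bar event lands under logs/foo.bar/.
    path logs/${tag}/
    s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}

    # Each generated output needs its own buffer path.
    buffer_path /var/log/fluent/s3.${tag}
    time_slice_format %Y%m%d%H
  </template>
</match>
```

To keep only part of the tag, for example "bar" from foo.bar, `${tag_parts[1]}` can be used in place of `${tag}`.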
