Coder Social home page Coder Social logo

Comments (16)

antonymilne avatar antonymilne commented on June 2, 2024 3
  • +1 on setting the default level of the Kedro logger to INFO (if just for backwards compatibility), and then having -q set it to WARNING and -qq to ERROR

And -v to DEBUG?

  • I don't think the current logging.yml logic is the only way to achieve that though
  • I'm also ambivalent on whether we should change the global logging level to INFO

Not sure whether @noklam filled you in on the chat we had about #3657 (there's a brief summary there but not all the details). But I think we should definitely not set the global logging level to INFO. This is actually what used to be done on kedro and I think it was wrong - see #1732 for more.

Agree that the current logging.yml isn't the only way to achieve setting kedro logger to INFO. Also discussed in #1732, kedro could just add a Python setLevel call manually rather than configuring logging from a file config.

from kedro.

astrojuanlu avatar astrojuanlu commented on June 2, 2024 2

Quick 2 cents

  • +1 on setting the default level of the Kedro logger to INFO (if just for backwards compatibility), and then having -q set it to WARNING and -qq to ERROR
    • I don't think the current logging.yml logic is the only way to achieve that though
    • I'm also ambivalent on whether we should change the global logging level to INFO
  • -1 on keeping the current logging.yml logic - whoever wants fine grained control of logs, file logging, rotation etc should be using journald, supervisor, Datadog, or whatever other solution. this is not Kedro's responsibility
  • +1 on directing users to current best-practices for logging, including creating official integrations and docs for Sentry, OpenTelemetry

from kedro.

astrojuanlu avatar astrojuanlu commented on June 2, 2024 1

Not sure about this. There's a myriad mechanisms to manage environment variables, for example:

$ cat .rtx.toml
[tools]
python = {version="3.10.11", virtualenv=".venv"}

[env]
GOOGLE_APPLICATION_CREDENTIALS = 'kedro-pypi-stats-dde2342d3c96.json'

I'm not sure what makes .env special. On one hand we could say "this is what we support", but what if folks are using something else? Also, what if there's a clash between Kedro loading .env and then direnv reading .envrc?

I think we should stay away from being too smart about the user environment, and let them manage it.

The core issue is:

With the new kedro new flow, when logging is selected, it is not automatically used until user set KEDRO_LOGGING_CONFIG manually.

which is a real user problem. But maybe we need to explore other solutions.

from kedro.

noklam avatar noklam commented on June 2, 2024 1

Updated the title to focus more on the logging problem.

from kedro.

noklam avatar noklam commented on June 2, 2024 1

@DimedS IIRC it's because of two conflicting features:

  1. ConfigLoader is able to load from conf_source which can be a zip file (default of kedro package)
  2. Logging is decoupled from ConfigLoader

This means that we need to read logging without configloader, and it's very common to put logging.yml in conf folder even it's not used by the config loader. I agree that should be the default and the environment variable should be a workaround, thus I suggest to load this automatically with our best effort.

Cc @merelcht

from kedro.

astrojuanlu avatar astrojuanlu commented on June 2, 2024 1

Some extra reading material ahead of a TD session:

from kedro.

noklam avatar noklam commented on June 2, 2024 1

The solution is basically this: be3d28d in #3577. It should be handles the pattern slightly better, and try to access CONF_SOURCE if possible (may be not).

from kedro.

sheldontsen-qb avatar sheldontsen-qb commented on June 2, 2024 1

(Not adding to the solution but more sharing my view...)

As a user for several years (+5 years) of kedro since the early days before it was even called kedro, I was a bit thrown off that logging was removed by default in 0.19. Personally would still prefer it out of the box enabled by default (instead of having to read the docs to find out how to enable it, I would rather read the docs on how to disable if not needed). Also appreciate my view is one of many out there.

To counter Juan's views:
(counter) +1 on keeping the current logging.yml logic - whoever wants fine grained control of logs, file logging, rotation etc should be using journald, supervisor, Datadog, or whatever other solution. this is not Kedro's responsibility.

This is super helpful to just get people setup with something basic.

(for) +1 on directing users to current best-practices for logging, including creating official integrations and docs for Sentry, OpenTelemetry

More docs on how to integrate is always helpful

Both would be good in my case but it was really nice that logging was configured out of the box for a standard project, just like how we can choose spark to be part of the starter.

Last resort: add documentation on how to configure logging.yaml to replicate old behaviour as we know after_context_created hook would have the caveat of not logging anything before context is created. It's a one-time setup per new project which is fine but right now there's no advice on the best place to achieve this without the caveat above.

Thankssss!

from kedro.

noklam avatar noklam commented on June 2, 2024

Related: #3474

https://blog.bytehackr.in/5-effective-ways-to-prevent-directory-traversal#heading-input-validation-and-sanitization

It seems that having user input for filepath is not the best idea. Besides, do we have a reason to do this? Why would we want user to have absolute freedom to put this logging file, when in reality they should just packaged with conf_source? De-coupling the read of logging file from ConfigLoader doesn't mean that we want user to keep the logging config separately.

from kedro.

noklam avatar noklam commented on June 2, 2024

Supplement on this, this will be a breaking change - because the config will no longer accept a full path but only the name of the files (or only relative to conf_source)

from kedro.

astrojuanlu avatar astrojuanlu commented on June 2, 2024

Another user confused by this https://linen-slack.kedro.org/t/16362029/hi-i-have-recently-updated-the-kedro-version-but-now-i-am-ge#f6a6d4d0-7615-496e-b1a5-18037b944eda

from kedro.

DimedS avatar DimedS commented on June 2, 2024

Why is this environment variable necessary? Can we instead observe the project directory for a 'logging.yml' file and use it if found, or default to standard logging options if not?
From what I understand, the environment variable might also introduce a security issue we're currently unable to address, as outlined here: (#3474)

from kedro.

astrojuanlu avatar astrojuanlu commented on June 2, 2024

I think we should consider @noklam's #3591 seriously, abstain from fiddling with user-specified logging settings (hence avoid #3474 instead of ignoring it), and if the user wants to provide a file-based logging config, they can always do so in their own code and place it wherever they like. It's clear that logging is an ongoing pain (almost 2 years since #1470 was opened), and we haven't yet fully solved it.

from kedro.

noklam avatar noklam commented on June 2, 2024

This issue is partially overlap to #3591

  • Automatic enable logging.yml and kedro run -v is slightly overlap
  • logging.yml affect beyonds kedro run, do we need to have -v for all kedro command?
  • The default of the logging.yml is more independent, as I am challenging the idea of having too many LEVEL default there. This video seems to support that everything should propagate to root unless necessary. Only for rare case you need non-root level logger filtering. (info-handler)

I am particularly interested in your opinion about this @antonymilne.

from kedro.

antonymilne avatar antonymilne commented on June 2, 2024

I'm a bit out of the loop here and will talk to @noklam to share my thoughts on it all in more detail, but some rough points:

  • the current (I think?) environment variable solution was added after a lot of thought and consideration with @idanov. You should definitely pick his brains on the topic too since he will definitely have some useful things to say here
  • the kedro handling of logging is indeed a direct departure from the 12 factor app guidance. I believe this was a conscious decision made in the early days of kedro since users wouldn't know how to e.g. show INFO level messages or direct their logs to a file easily. Hence the logging functionality was built into kedro out of the box for better UX. Possibly expertise of users and logging tools has changed since then, but the handling of logging in kedro has always been a matter of "practicality beats purity". 12FA is useful guidance but not gospel on kedro, e.g. our handling of run environments does exactly what 12FA says not to do
  • personally I don't care much about the file-based logging, but being able to expose INFO level messages by default I think is important. There is an annoying conflict here where the default Python log level of WARNING contradicts with the fact that it's assumed to be good UX to update the user with INFO on progress as their pipeline runs. Of course this assumption could be questioned with some user research since AFAIK we've always just taken it for granted. Remember that kedro pipelines can take a loooong time, and unless there's some other way of demonstrating progress (e.g. a progress bar, which we did also try in a hackathon and I liked) then I feel like kedro should display INFO messages by default. So maybe a better way to handle it would instead be a --quiet option that changes the level from INFO to WARNING?
  • generally agree that kedro should interfere as little as possible with Python's default logging system, e.g. by delegating as much as possible to the root logger. But with the caveat that again this is a question of purity vs. practicality, and Python default's logging level of WARNING doesn't seem suitable on kedro, and the default format of log messages is also pretty gross.

from kedro.

noklam avatar noklam commented on June 2, 2024

I notice the discussion in this thread are mainly related to #3591, though they are related and maybe overlapping.

I have this comment in #3657 #3657 (comment)

Automatic loading logging is fine, but there are edge cases with CONF_SOURCE. I think it's still an improvement and it doesn't make anything worse in the current states.

The idea is almost the same what @DimedS suggested, which we try to locate the logging.yml in the project directory. There are some complication of whether CONF_SOURCE will be available. In any case, having a non-default CONF_SOURCE is not the majority case, having logging.yml not inside the conf folder is even rarer. This is a 90% solution and maybe good enough unless we come up with something better.

from kedro.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.