Comments (12)

jlewi commented on August 11, 2024

Lots of workarounds are discussed in sirupsen/logrus#63. Prometheus implemented it this way, so we could potentially copy them.

from training-operator.

jlewi commented on August 11, 2024

I'm adding this to our next milestone because I think structured logging will be critical to scaling. As we scale up to more jobs and larger jobs, we will need to be able to easily filter logs by pod, job, etc. to get to the relevant logs.

gaocegege commented on August 11, 2024

Personally, I recommend glog, since most repos in the Kubernetes community use glog.

ScorpioCPH commented on August 11, 2024

+1 for glog :)

jlewi commented on August 11, 2024

If we use glog, is there a way to output JSON logs with metadata such as the job and replica that a log message is associated with?

gaocegege commented on August 11, 2024

I am afraid not 🤔, since there is no function for it in the docs: https://godoc.org/github.com/golang/glog

jlewi commented on August 11, 2024

With glog, how do we make it really easy to filter the TFJob operator logs so we can see the log messages for a particular job?

I think this will be super useful for debugging and troubleshooting.

If we use structured logging then we can add a tag corresponding to the job name. Then it should be very easy to filter the logs to find all log messages for a particular job.

jlewi commented on August 11, 2024

This solution looks promising
sirupsen/logrus#63 (comment)
sirupsen/logrus#63 (comment)

I believe this solution just uses the filename hook
https://github.com/onrik/logrus

I think we can just define a logrus logger with that hook and it will work.
https://github.com/onrik/logrus

Would be great if someone could just try it out using the example here:
https://github.com/onrik/logrus

ankushagarwal commented on August 11, 2024

I'm looking into this. I will try the filenameHook from onrik/logrus and post the results here.

gaocegege commented on August 11, 2024

We now use the flag package to support command-line flags, and glog also uses it by default. As a result, our binary has more flags than we think, even though we use logrus instead of glog:

➜  tf-operator git:(416) ✗ ./tf-operator -h               
Usage of ./tf-operator:
  -alsologtostderr
    	log to standard error as well as files
  -chaos-level int
    	DO NOT USE IN PRODUCTION - level of chaos injected into the TFJob created by the operator. (default -1)
  -controller-config-file string
    	Path to file containing the controller config.
  -gc-interval duration
    	GC interval (default 10m0s)
  -json-log-format
    	Set true to use json style log format. Set false to use plaintext style log format (default true)
  -log_backtrace_at value
    	when logging hits line file:N, emit a stack trace
  -log_dir string
    	If non-empty, write log files in this directory
  -logtostderr
    	log to standard error instead of files
  -stderrthreshold value
    	logs at or above this threshold go to stderr
  -v value
    	log level for V logs
  -version
    	Show version and quit
  -vmodule value
    	comma-separated list of pattern=N settings for file-filtered logging

There are some pros and cons:

  • We can support glog in vendored and client code, since we have glog's flags
  • But users may be confused, since tf-operator outputs logs regardless of the -logtostderr flag

gaocegege commented on August 11, 2024

I think we can close this issue after #416 is merged, and I will file a new issue for the extra flag problem. But it is now a big problem. We can refer to etcd/etcd-operator.

gaocegege commented on August 11, 2024

xref #424
