
MLJFlow.jl's People

Contributors

ablaom, pebeto

MLJFlow.jl's Issues

Improve test of `save`.

Can we make this test a little stronger? For example, make a prediction using the original machine and check it agrees with the prediction of the reconstructed machine.

@test loaded_mach.model isa ProbabilisticPipeline
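
A minimal sketch of the stronger check, assuming `mach` is the original fitted machine, `loaded_mach` the reconstructed one, and `X` the training input (names illustrative):

@test loaded_mach.model isa ProbabilisticPipeline
@test predict_mode(loaded_mach, X) == predict_mode(mach, X)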

Can't find my saved machine artifact

This used to work for me but doesn't any longer. What's strange is that tests successfully pass locally for me, and I believe saving artifacts is in the tests.

Following the instructions in the README.md:

using MLJ, MLFlowClient

X, y = make_moons()
model = ConstantClassifier()
mach = machine(model, X, y) |> fit!

logger = MLJFlow.Logger("http://127.0.0.1:5000/api")
run = MLJ.save(logger, mach)

service = MLJFlow.service(logger)

artifacts = MLFlowClient.listartifacts(service, run)
@assert !isempty(artifacts)

# ERROR: AssertionError: !(isempty(artifacts))
# Stacktrace:
#  [1] top-level scope
#    @ REPL[27]:1

# julia> versioninfo()
# Julia Version 1.10.3
# Commit 0b4590a5507 (2024-04-30 10:59 UTC)
# Build Info:
#   Official https://julialang.org/ release
# Platform Info:
#   OS: macOS (x86_64-apple-darwin22.4.0)
#   CPU: 12 × Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
#   WORD_SIZE: 64
#   LIBM: libopenlibm
#   LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
# Threads: 12 default, 0 interactive, 6 GC (on 12 virtual cores)
# Environment:
#   JULIA_LTS_PATH = /Applications/Julia-1.6.app/Contents/Resources/julia/bin/julia
#   JULIA_PATH = /Applications/Julia-1.10.app/Contents/Resources/julia/bin/julia
#   JULIA_EGLOT_PATH = /Applications/Julia-1.7.app/Contents/Resources/julia/bin/julia
#   JULIA_NUM_THREADS = 12
#   JULIA_NIGHTLY_PATH = /Applications/Julia-1.10.app/Contents/Resources/julia/bin/julia

# (jl_I6VBrN) pkg> st
# Status `/private/var/folders/4n/gvbmlhdc8xj973001s6vdyw00000gq/T/jl_I6VBrN/Project.toml`
#   [64a0f543] MLFlowClient v0.5.1
#   [add582a8] MLJ v0.20.5

Export names sparingly

https://github.com/pebeto/MLJFlow.jl/blob/70e961e9244645b4dfc52a99eab06aea0852f0b1/src/MLJFlow.jl#L20

Generally, I prefer to export names sparingly, as removing them later on is always a breaking change. Typically, export a name only if it is going to be used by an ordinary user; otherwise don't.

I think the only method your package needs to export is MLFlowLogger (for constructing loggers), and, if you are providing it, the client(::MLFlowLogger) method that the general user needs to access the client. I'm assuming runs is an internal method, yes?

Do not export save: this is such a common method name (provided by serialization packages, for example) that exporting it routinely leads to name conflicts. No MLJ package exports save.
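
A minimal sketch of the suggested export policy (struct and method bodies are stubs; layout illustrative):

module MLJFlow

export MLFlowLogger                  # the only name an ordinary user constructs directly

struct MLFlowLogger end              # stub standing in for the real definition

# Unexported accessors: users reach them as MLJFlow.client(logger), etc.
client(logger::MLFlowLogger) = nothing

end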

[Tracking] Towards registration

Todo:

  • Add tests for simple pipelines (e.g., Standardizer() |> DecisionTreeClassifier(); see the sketch after this list)
  • Add code coverage badge
  • Add keys for CompatHelper
  • Add example to readme
  • #7
  • #8
  • #10
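
A hedged sketch of such a pipeline test, assuming MLJ and MLJDecisionTreeInterface are installed, an MLflow service is running locally, and the logger constructor is MLFlowLogger as at the time of this issue (all illustrative):

using MLJ
using Test
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree

X, y = make_moons()
pipe = Standardizer() |> DecisionTreeClassifier()   # simple two-stage pipeline

logger = MLFlowLogger("http://127.0.0.1:5000", experiment_name="pipeline test")
e = evaluate(pipe, X, y; logger)                    # evaluation is logged to MLflow
@test e isa PerformanceEvaluation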

Make artifact retrieval more convenient

I wonder whether, while we think of a remedy for #42, we could also think about a better way for MLJ users to retrieve artifacts, one that requires neither familiarity with the MLFlowClient API nor so many steps. What about something along the lines of a function MLJFlow.artifact(logger, run) to enable this workflow:

run = MLJ.save(logger, mach)
restored_mach = MLJFlow.artifact(logger, run)

Also, what if the user failed to record run? Is there no way to retrieve the artifact? Can artifacts be automatically tagged in some way that provides a user handle for later retrieval, without knowing the actual run object? Then, instead of run above, we could substitute the string handle.

Add accessor method to extract the "client" from an `MLFlowLogger` object

As discussed on a call.

I know we have been calling this client, but I wonder if this is really the correct word, since it is on a server most (all?) of the time. I see that the MLFlowClient.jl documentation calls an instance of MLFlow an "MLFlow API service", so maybe "service" is a better word for this.
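
A minimal sketch of the accessor, assuming the wrapped MLFlowClient instance is stored in a field also named service (field name illustrative):

service(logger::MLFlowLogger) = logger.service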

Improve local testability

Running Pkg.test("MLJFlow") locally requires that an MLflow service is already running on your machine and that the uri "http://localhost:5000" will connect to it. On my Mac that uri does not work and I must manually edit it to "http://127.0.0.1:5000", which is a pain.

Here's one suggestion: to run the tests, one must set a local environment variable MLFLOW_URI to the uri of an active MLflow service. If the variable is unset or empty, a helpful error explaining what to do is thrown.
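
A minimal sketch of what runtests.jl could do under this suggestion (variable name as proposed; message wording illustrative):

uri = get(ENV, "MLFLOW_URI", "")
if isempty(uri)
    error(
        "MLJFlow tests require a running MLflow service. Set the MLFLOW_URI " *
        "environment variable, e.g. ENV[\"MLFLOW_URI\"] = \"http://127.0.0.1:5000\".",
    )
end
logger = MLJFlow.Logger(uri)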

@pebeto Do you have any other suggestions?

Rename `MLFlowLogger` ?

I think having MLJFlow, MLFlowLogger, etc. - sometimes there is a "j" and sometimes not - is a bit of a cognitive burden. What if we rename MLFlowLogger to Logger? So, in MLJ the user does

logger = Logger(...)

In the rare case that there are multiple logging platforms at play, the user can resolve the ambiguity with MLJFlow.Logger. Or we could just not export the name and have the user do logger = MLJFlow.Logger(...). I.e., the user does

logger = MLJFlow.Logger()

Of course this is a breaking change but I think that's fine at this early stage of development.

@pebeto What do you think?

Need handling for models with zero hyperparameters

ConstantClassifier is a model with no hyperparameters. If I change ConstantClassifier below to DecisionTreeClassifier, for example, then no error is thrown.

using MLJ
using .Threads
nthreads()
# 5

logger = MLFlowLogger("http://127.0.0.1:5000", experiment_name="rooster")
X, y = make_moons()
model = ConstantClassifier()
#model = (@load RandomForestClassifier pkg=DecisionTree)()

evaluate(
    model,
    X,
    y;
    logger,
)

# ERROR: HTTP.Exceptions.StatusError(400, "POST", "/api/2.0/mlflow/runs/log-parameter", HTTP.Messages.Response:
# """
# HTTP/1.1 400 Bad Request
# Server: gunicorn
# Date: Mon, 11 Sep 2023 19:09:40 GMT
# Connection: close
# Content-Type: application/json
# Content-Length: 163

# {"error_code": "INVALID_PARAMETER_VALUE", "message": "Missing value for required parameter 'key'. See the API docs for more information about request parameters."}""")
# Stacktrace:
#   [1] mlfpost(mlf::MLFlowClient.MLFlow, endpoint::String; kwargs::Base.Pairs{Symbol, String, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:run_id, :key, :value), Tuple{String, String, String}}})
#     @ MLFlowClient ~/.julia/packages/MLFlowClient/Szkbv/src/utils.jl:74
#   [2] mlfpost
#     @ ~/.julia/packages/MLFlowClient/Szkbv/src/utils.jl:66 [inlined]
#   [3] logparam(mlf::MLFlowClient.MLFlow, run_id::String, key::Symbol, value::ConstantClassifier)       
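
A hedged sketch of one possible guard (not necessarily the package's actual fix), assuming MLJBase.params returns the model's hyperparameters as a (possibly empty) named tuple:

hyperparams = MLJBase.params(model)
if !isempty(hyperparams)             # empty for hyperparameter-free models like ConstantClassifier
    for (name, value) in pairs(hyperparams)
        # logparam signature as seen in the stacktrace above
        MLFlowClient.logparam(service, run_id, string(name), string(value))
    end
end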

Improve `show` for logger instances

I think it would be better if a logger instance displays as you would construct it. That way the user can easily guess how to add keywords without looking up the docstring.

So, instead of

julia> MLFlowLogger("http://127.0.0.1:5000")
MLFlowLogger(MLFlow(
    baseuri = "http://127.0.0.1:5000", 
    apiversion = 2.0
), 1, "MLJ experiment", nothing)

we arrange

julia> MLFlowLogger("http://127.0.0.1:5000")
MLFlowLogger("http://127.0.0.1:5000",
    experiment_name="MLJ experiment",
    artifact_location=nothing,
)

This skips display of the apiversion, but I don't think that's a big deal. But we could do:

julia> MLFlowLogger("http://127.0.0.1:5000")
MLFlowLogger("http://127.0.0.1:5000",
    experiment_name="MLJ experiment",
    artifact_location=nothing,
) using MLflow version 2.0
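
A hedged sketch of one way to arrange the simpler display above, assuming the wrapped MLFlowClient.MLFlow instance is stored in a field called service and that the other fields are experiment_name and artifact_location (all field names illustrative):

function Base.show(io::IO, logger::MLFlowLogger)
    print(io,
        "MLFlowLogger(\"", logger.service.baseuri, "\",\n",
        "    experiment_name=\"", logger.experiment_name, "\",\n",
        "    artifact_location=", repr(logger.artifact_location), ",\n",
        ")",
    )
end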

@pebeto What do you think?

Add a `verbosity` field to the `MLFlowLogger` wrapper

defaulting to 1.

I think we will want this as the user interface point for specifying how much to log in tuning and iteration. For example, if verbosity < 1 then only log the performance evaluation for the best model.
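
A hedged sketch of the addition (struct layout and constructor are illustrative; the point is only the new verbosity field, defaulting to 1):

struct MLFlowLogger
    service::MLFlowClient.MLFlow
    experiment_name::String
    artifact_location::Union{String,Nothing}
    verbosity::Int                   # new field: how much to log in tuning and iteration
end

MLFlowLogger(uri; experiment_name="MLJ experiment",
             artifact_location=nothing, verbosity=1) =
    MLFlowLogger(MLFlowClient.MLFlow(uri), experiment_name, artifact_location, verbosity)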

Include type of `resampling` in the run tag

As discussed on a call.

This will require us to add resampling to the PerformanceEvaluation objects in MLJBase, as we did for the model, to make it available to our overloading of log_evaluation.

Duplicate run names intentional?

When I tried out the example and repeated the evaluation to generate new runs, they all had the same name. Is that intentional?

[Screenshot, 2023-08-17, showing the duplicated run names]

`mlflow` health endpoint is doubtful

Local mlflow instances provide /health and /ping endpoints to check whether the service is running. Platforms like DagsHub don't have them (well, as far as I have been able to test), leaving users unable to use this package. Should we remove that check?

Recheck readme example after MLJ.jl update

In the pending PR #10, the example does `using MLJBase`, `using MLJModels`, and so forth. After we roll out the new version of MLJ, which will include MLJFlow, we can simplify the example to do `using MLJ` and so forth.

Method ambiguity (due to type piracy) in extension of `MLJBase.save`

using MLJ
model = ConstantClassifier()
tmodel = TunedModel(models=[model, model])
mach = machine(tmodel, (@load_iris)...) |> fit!
MLJ.save(IOBuffer(), mach)

# ERROR: MethodError: save(::MLJTuning.ProbabilisticTunedModel{Explicit, Probabilistic}, ::Machine{ConstantClassifier, true}) is ambiguous.

# Candidates:
#   save(tmodel::Union{MLJTuning.DeterministicTunedModel{T, M}, MLJTuning.ProbabilisticTunedModel{T, M}} where {T, M}, fitresult)
#     @ MLJTuning ~/.julia/packages/MLJTuning/CLXum/src/tuned_models.jl:832
#   save(logger, machine::Machine)
#     @ MLJFlow ~/.julia/packages/MLJFlow/AAdc1/src/base.jl:22

# Possible fix, define
#   save(::Union{MLJTuning.DeterministicTunedModel{T, M}, MLJTuning.ProbabilisticTunedModel{T, M}} where {T, M}, ::Machine)

Clean up doc string

https://github.com/pebeto/MLJFlow.jl/blob/70e961e9244645b4dfc52a99eab06aea0852f0b1/src/types.jl#L9

The first part of the docstring should address the casual user. So your "Fields" section should not mention client at all (this is part of the implementation, not the user interface) and somewhere you should explain the meaning of the baseuri argument.

(Implementation details generally go in code comments, unless you are providing a method that is part of a public interface, such as our log_evaluation method, in which case put that at the end, possibly under a separate "New implementations" section.)
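
A hedged sketch of the suggested docstring shape (wording and signature illustrative):

"""
    MLFlowLogger(baseuri; experiment_name="MLJ experiment", artifact_location=nothing)

Logger for tracking MLJ workflows on an MLflow service reachable at `baseuri`,
for example "http://127.0.0.1:5000".

# New implementations

`log_evaluation(logger, performance_evaluation)` is called by MLJ on the outcome of
every `evaluate` call made with this `logger`.
"""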

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
