how to save the model (autots) - CLOSED

winedarksea commented on July 19, 2024
how to save the model

Comments (9)

asgeorges commented on July 19, 2024

Gotcha, all the above makes sense. Thanks again! Might check in again with you a bit later when I've been able to play around more :)

winedarksea commented on July 19, 2024

@catchlui
Most models should be retrained when re-run in the future. This is because most time series datasets are continually evolving - this isn't image classification where a cat is still a cat. You could pickle the model, but only if you plan to use it to generate forecasts for the exact same time period.
My usual plan is to export the top dozen or so models, then at each new forecast time run a much smaller number of generations; this way the model used is always the best fit for whatever the data currently looks like. This is especially true when using seasonal cross validation.

You can import and export results like so:

# train your model (an AutoTS object), then:
# set n=1 if you only want your single best model
model.export_template("my_models.csv", models='best', n=15, max_per_model_class=3)

# later, in a new session:
# set max_generations=0 on the AutoTS object and it will only attempt the imported models
model = model.import_template("my_models.csv", method='only')

catchlui commented on July 19, 2024

"... best fit for whatever the data currently looks like."

Thanks a lot.
So this is the sequence:
initialize AutoTS,
fit the data to the model,
export the best models,
then initialize AutoTS with max_generations=0,
then fit with the new data,
forecast?
It would be great if you could add a code snippet, that would be very helpful.

winedarksea commented on July 19, 2024

I have plans to build a production code example with non-proprietary data at some point soon, see #45.
Yes, your sequence sounds correct: fit, export best ... import best, run with 0 generations (or more than 0 is fine, if you want active learning), and output the prediction/forecast.
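Roughly, something like this (a minimal sketch; forecast_length, frequency, and the dataframes here are placeholders to adjust for your own setup):

from autots import AutoTS

# --- initial training run: search across many models ---
model = AutoTS(forecast_length=14, frequency='infer', max_generations=10)  # placeholder settings
model = model.fit(df)  # df: your wide-format DataFrame of historical values

# export the best models found (n=1 if you only want the single best)
model.export_template("my_models.csv", models='best', n=15, max_per_model_class=3)

# --- later, when new data arrives ---
model = AutoTS(forecast_length=14, frequency='infer', max_generations=0)
model = model.import_template("my_models.csv", method='only')
model = model.fit(new_df)  # refit only the imported models on the latest data
prediction = model.predict()
forecast_df = prediction.forecast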
@catchlui

asgeorges commented on July 19, 2024

Hey! I also have the same question.

The above points about the online-training aspects of time series are understood.

A pickled model can still be useful though - for instance, training a model on system1 and running inference on system2 (this is my current use case). System2 is specialized and can really only do inference.

It would be awesome if you could still provide guidance on pickling here.

winedarksea commented on July 19, 2024

The .fit(result_file=...) option only saves the model result values, not the actual models. If the run crashes for some reason, or you want to restart your computer partway through, those results can be reloaded.
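For reference, a minimal sketch of what that looks like (the settings here are placeholders):

from autots import AutoTS

# sketch: result_file saves intermediate search results (not fitted models),
# so an interrupted search can be picked back up from those results later
model = AutoTS(forecast_length=14, max_generations=10)  # placeholder settings
model = model.fit(df, result_file="search_results.pickle")  # df: wide-format history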

I understand your concern about inference, probably on an edge device. For perspective, I have run a full .fit() on a Raspberry Pi 3 with 1 GB of RAM and it worked fine, although only with the less memory-intensive models. For most models, .fit() runs just fine on tiny devices. I haven't tried running any of these on a microcontroller, but I've got a Pi Pico and an ESP32 here; if that's your use case, I'd be happy to test on those.

The exceptions are GluonTS and the ~Regression models (which sometimes use Tensorflow). I have a Coral Edge board that can only run Tensorflow inference and can't train, so it's best just to use one of the excellent non-neural-network models; the Edge board will run Numpy and pandas just fine. But here's the thing: the neural network models are... not that great. At best they equal the other models, but ten times slower.

The ultimate problem is that most time series models here (from other packages like Statsmodels, as well as some I have written) simply can't be pickled and refreshed on new data. They will only forecast jumping off from the most recent data point given in training, which makes it impossible to write a generalizable picklable API. I could do so for a small subset of models, but that is a niche and not much use. And it remains true that it is always best to retrain on fresh data for time series, because the markets and world are always changing and models drift to unusable very quickly - unlike image recognition or something, where a picture of a cat is always just a cat.

Long response, sorry. Happy to try to make something work, but I feel that the current API style is the best for time series and other approaches will just lead to problems.

winedarksea commented on July 19, 2024

I should add, have you seen the model_forecast function? It's in the extended_tutorial under Running Just One Model.
Basically, you do the AutoTS.fit() on a more powerful device. Then you take the model parameters and run model_forecast. Super fast and lightweight for all but the neural nets. @asgeorges
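Along these lines (a sketch modeled on that tutorial section; the model name and parameter dicts shown are placeholders to swap for your own best_model values):

from autots import model_forecast

# sketch: run one already-chosen model directly, with no model search
prediction = model_forecast(
    model_name="AverageValueNaive",        # placeholder: your best model's name
    model_param_dict={"method": "Mean"},   # placeholder: your best model's parameters
    model_transform_dict={
        "fillna": "mean",
        "transformations": {"0": "DifferencedTransformer"},
        "transformation_params": {"0": {}},
    },
    df_train=df,            # recent wide-format history to forecast from
    forecast_length=14,
    frequency="infer",
    prediction_interval=0.9,
    no_negatives=False,
)
forecast_df = prediction.forecast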

asgeorges commented on July 19, 2024

Hey! Thanks for such a comprehensive answer!

I'm looking to deploy this model into production - not an edge device. I'm not a pro at deploying to prod, but I'd imagine the solution you listed above won't suffice (a lot more moving parts than a static binarized file). My approach would be to:

  1. Batch train (for instance every day)
  2. Re-pickle model
  3. Run QA/sanity checks on dev system
  4. If model passes, send pickle to prod system
  5. Run online inference using pickle for 1 day

Above is how I'm currently thinking about it, but I'll think a bit more on whether your approach can be used instead.

Sidenotes

  • I'm a big fan of this project...it's pretty awesome
  • I've found a few nuggets in the code :)

winedarksea commented on July 19, 2024

I should be clear: .predict() is entirely determined by the best_model params and internally calls the model_forecast function. Using a wide-style dataframe, your best_model params, and any other keyword args, you can exactly duplicate AutoTS.predict() with the model_forecast function and save space and time. You don't need to pickle the entire AutoTS object - that would be potentially massive, because the AutoTS class includes an entire copy of the original dataset, among other things.

Here's a simplified approach that should do the same as the above.

  1. Batch train (on dev or otherwise) to select your best_model params (model_name, model_params, transformation_params). Make sure there are plenty of validations so that the chosen model looks good for the entire year.
  2. Set up a production script with model_forecast and drop in your best model params. You could pickle the best model params, but a simple plain-text JSON file will do just fine for those (a sketch follows this list).
  3. Include sanity checks on inputs and outputs. Be aware there are constraints (about to get a major update in 0.4.1) and no_negatives that can help enforce expected forecasts. I would say the top sanity check to perform on inputs is that there isn't a bunch of missing data, or a massive shift (like a definition change), in the most recent data. You could use something like the Great Expectations package, but usually I do something much simpler.
  4. Whenever you feel a refresh is needed, manually train in dev and then update the model parameters in prod when ready.
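A rough sketch of steps 1-3 of that handoff (the best_model_* attribute names and the missing-data threshold are assumptions for illustration, so check them against your installed version):

import json
import pandas as pd

# --- dev side: after AutoTS.fit(), write the chosen model out as plain text ---
# (attribute names below are an assumption; they can also be pulled from the exported template)
best = {
    "model_name": model.best_model_name,
    "model_params": model.best_model_params,
    "transformation_params": model.best_model_transformation_params,
}
with open("best_model.json", "w") as f:
    json.dump(best, f, indent=2)

# --- prod side: load the params, sanity-check the input, then forecast ---
with open("best_model.json") as f:
    best = json.load(f)

df = pd.read_csv("latest_data.csv", index_col=0, parse_dates=True)  # placeholder data load
# simple input check: refuse to forecast if the recent history is mostly missing
if df.tail(30).isnull().mean().max() > 0.2:  # assumed threshold, tune to your data
    raise ValueError("too much missing data in the most recent history")
# then call model_forecast(model_name=best["model_name"], ...) as sketched earlier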

Otherwise I personally use a slight variation on the production_example.py for my own production code. It's a different philosophy (requiring more compute), but it works for me.

Glad you're enjoying the easter eggs. Please post feedback on anything you find annoying or difficult!
