
fast model evaluation (mist, closed)

geoHeil commented on June 7, 2024
fast model evaluation

from mist.

Comments (9)

spushkarev commented on June 7, 2024

Thanks, @geoHeil.
We looked at MLeap, but we are not comfortable inventing yet another model serialization format and rewriting mllib-local.
There is parallel work in the Spark community to separate out and rewrite mllib-local (https://issues.apache.org/jira/browse/SPARK-13944), so our current solution should stay aligned with those changes.


geoHeil commented on June 7, 2024

@spushkarev v0.8 seems to support "Fast local ML serving (experimental)", but it is unclear to me how to configure it.


spushkarev commented on June 7, 2024

Hi @geoHeil. Yes, we have "silently" released experimental code for two ML models. We are working on the remaining ML models and on the documentation; the official release should follow within 2-3 weeks.


geoHeil commented on June 7, 2024

In https://github.com/Hydrospheredata/mist/blob/master/docs/use-cases/ml-realtime.md you mention:

"We are going to add this abstract entry point in future releases."

Could you clarify what you mean by "abstract entry point" and when it is supposed to land? Thanks.

Additionally, could you explain what LocalDataColumn is, and in particular how it differs from a normal Spark DataFrame (when running with a local SparkContext)?

Do I understand correctly that Mist will still use a local SparkContext for its fast model evaluation?

What if the model is built with a popular library such as XGBoost (https://github.com/dmlc/xgboost/tree/master/jvm-packages), which relies on JNI and native C dependencies? In that case Mist will not "work around" those dependencies the way PMML would in order to speed up scoring.


spushkarev commented on June 7, 2024

@geoHeil

  1. By "abstract entry point" I mean a generic base Mist job, which is easy to write, that accepts a "model name" or "model UUID" as a parameter and serves that model. It could be useful for internal or debugging purposes. It is not a priority for us at the moment, but you could try adding it yourself.

  2. LocalDataColumn is just a local, in-memory data structure backed by arrays (as opposed to a distributed DataFrame). It applies transformations locally while exposing the same interface as a DataFrame.

  3. Local serving does not use a SparkContext at all.

  4. I'm not sure I follow your question. If an XGBoost model is trained in Spark and saved to Parquet, we will be able to serve it the same way we serve MLlib models, using xgboost core. This approach is better in terms of usability (everything in one context, no export/import), predictability (the same code base is used for training and serving) and performance.
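To make point 1 more concrete, here is a minimal sketch of such a generic "serve a model by name or UUID" job. It is an illustration under stated assumptions, not Mist's job API: `ModelRegistry`, `serve_model`, `linear` and the `params` keys (`"model"`, `"features"`) are all hypothetical, and the toy linear function stands in for a real trained pipeline loaded from storage.

```python
import uuid
from typing import Any, Callable, Dict, List

# Hypothetical: a "loaded model" is any callable mapping feature rows to predictions.
Model = Callable[[List[List[float]]], List[float]]


class ModelRegistry:
    """Hypothetical in-memory registry; a model can be looked up by name or by UUID."""

    def __init__(self) -> None:
        self._by_name: Dict[str, Model] = {}
        self._by_id: Dict[str, Model] = {}

    def register(self, name: str, model: Model) -> str:
        """Store the model under its name and a freshly generated UUID; return the UUID."""
        model_id = str(uuid.uuid4())
        self._by_name[name] = model
        self._by_id[model_id] = model
        return model_id

    def lookup(self, key: str) -> Model:
        """Resolve either a model name or a model UUID."""
        if key in self._by_name:
            return self._by_name[key]
        if key in self._by_id:
            return self._by_id[key]
        raise KeyError(f"unknown model: {key}")


def serve_model(registry: ModelRegistry, params: Dict[str, Any]) -> Dict[str, List[float]]:
    """Generic entry point: params name the model and carry the rows to score."""
    model = registry.lookup(str(params["model"]))
    return {"predictions": model(params["features"])}


def linear(weights: List[float], bias: float) -> Model:
    """Toy stand-in for a trained model: dot product plus bias for each row."""
    def predict(rows: List[List[float]]) -> List[float]:
        return [sum(w * x for w, x in zip(weights, row)) + bias for row in rows]
    return predict


# Usage: the same job serves the model whether it is addressed by name or by UUID.
registry = ModelRegistry()
model_id = registry.register("toy-linear", linear([2.0, 1.0], 0.5))
rows = [[1.0, 1.0], [0.0, 4.0]]
by_name = serve_model(registry, {"model": "toy-linear", "features": rows})
by_id = serve_model(registry, {"model": model_id, "features": rows})
print(by_name)  # {'predictions': [3.5, 4.5]}
```

The design choice is simply that one job body serves every registered model; only the lookup key changes between requests.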


geoHeil commented on June 7, 2024


geoHeil commented on June 7, 2024


spushkarev commented on June 7, 2024
  1. I need to take a deeper look at xgboost. In general, I think reusing dependencies is better than reimplementing them and then having to maintain compatibility.
  2. The LocalData structure supports only the minimal set of methods required for Pipeline.transform.
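The idea behind LocalData (a local, array-backed table that exposes just enough of the DataFrame interface for Pipeline.transform, with no SparkContext) can be sketched as follows. This is a hypothetical Python illustration, not Mist's actual JVM implementation: `LocalData`, `column`, `with_column`, `Scale`, `Threshold` and `LocalPipeline` are made-up names, and the two stages are toy transformers standing in for real fitted MLlib stages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class LocalData:
    """Hypothetical local, in-memory, column-oriented table backed by plain lists."""
    columns: Dict[str, List[float]] = field(default_factory=dict)

    def column(self, name: str) -> List[float]:
        """Read one column (the only 'select' the stages below need)."""
        return self.columns[name]

    def with_column(self, name: str, values: List[float]) -> "LocalData":
        """Return a new table with an added column; the input is left unchanged."""
        new_cols = dict(self.columns)
        new_cols[name] = values
        return LocalData(new_cols)


class Transformer(Protocol):
    def transform(self, data: LocalData) -> LocalData: ...


@dataclass
class Scale:
    """Toy stage: multiply an input column by a constant factor."""
    input_col: str
    output_col: str
    factor: float

    def transform(self, data: LocalData) -> LocalData:
        return data.with_column(self.output_col, [v * self.factor for v in data.column(self.input_col)])


@dataclass
class Threshold:
    """Toy stage: emit 1.0 where the input exceeds the cutoff, else 0.0."""
    input_col: str
    output_col: str
    cutoff: float

    def transform(self, data: LocalData) -> LocalData:
        return data.with_column(self.output_col, [1.0 if v > self.cutoff else 0.0 for v in data.column(self.input_col)])


@dataclass
class LocalPipeline:
    """Apply each stage's transform in order, entirely in local memory."""
    stages: List[Transformer]

    def transform(self, data: LocalData) -> LocalData:
        for stage in self.stages:
            data = stage.transform(data)
        return data


data = LocalData({"x": [0.5, 2.0, 4.0]})
pipeline = LocalPipeline([Scale("x", "x_scaled", 10.0), Threshold("x_scaled", "prediction", 15.0)])
result = pipeline.transform(data)
print(result.column("prediction"))  # [0.0, 1.0, 1.0]
```

Because each stage only ever calls `column` and `with_column`, the table type can stay this small, which mirrors the "minimal set of methods" point above.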


dos65 commented on June 7, 2024

We started the hydro-serving project to solve the ML serving problem.
