Comments (9)
Thanks @geoHeil
We looked at MLeap, but we are not comfortable inventing yet another model serialization format and rewriting mllib-local.
There is a parallel effort in the Spark community to separate and rewrite mllib-local: https://issues.apache.org/jira/browse/SPARK-13944 . So our current solution should stay aligned with these changes.
from mist.
@spushkarev v0.8 seems to support "Fast local ML serving (experimental)", but it is unclear to me how to configure that.
Hi @geoHeil. Yes, we have "silently" released experimental code for 2 ML models. We are working on the remaining ML models and the documentation. It will be officially released within 2-3 weeks.
In https://github.com/Hydrospheredata/mist/blob/master/docs/use-cases/ml-realtime.md you mention: "We are going to add this abstract entry point in future releases."
Could you clarify what you mean by "abstract entry point" and when this is supposed to happen? Thanks.
Additionally, could you explain what LocalDataColumn is, especially how it differs from a normal Spark DataFrame (when considering a local Spark context)?
Do I understand correctly that MIST will still use a local Spark context for its fast model evaluation?
What if a popular algorithm like https://github.com/dmlc/xgboost/tree/master/jvm-packages (which relies on JNI and C dependencies) is used to build the model? Would MIST then not "work around it", as PMML would, to increase the speed of scoring?
- By "abstract entry point" I mean that it is easy to write a base Mist job that accepts a "model name" or "model UUID" as a parameter and serves that model. It might be useful for internal or debugging purposes. It's not a priority for us at the moment, but you could try to add it yourself.
- LocalDataColumn is just a local in-memory data structure (arrays), as opposed to a distributed DataFrame. It applies local transformations while exposing the same interface as a DataFrame.
- Local serving does not use SparkContext at all.
- I'm not sure I follow your question. If an xgboost model is trained in Spark and saved to Parquet, we will be able to serve it the same way we do with MLlib, using the xgboost core. This solution is better in terms of usability (everything in one context, no export/import), predictability (the same code base is used for training and serving), and performance.
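To make the "abstract entry point" idea a bit more concrete, here is a minimal sketch in plain Python. It is not Mist's API: `MODEL_REGISTRY`, `serve_model_job`, and the parameter keys `model_name` and `features` are all hypothetical names invented for illustration, and the "models" are trivial functions standing in for models loaded from storage.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for "a model": a function from features to a score.
Model = Callable[[List[float]], float]

# Hypothetical in-process registry mapping a model name to a model.
# A real job would load a saved model from storage instead.
MODEL_REGISTRY: Dict[str, Model] = {
    "sum": lambda features: float(sum(features)),
    "mean": lambda features: sum(features) / len(features),
}


def serve_model_job(parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Generic job: look up the model named in the parameters and score the input."""
    name = parameters["model_name"]
    if name not in MODEL_REGISTRY:
        raise KeyError(f"unknown model: {name}")
    model = MODEL_REGISTRY[name]
    return {"model_name": name, "prediction": model(parameters["features"])}
```

The point of the sketch is only that one generic job can serve any registered model, so no per-model job has to be written.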
- I need to take a deeper look into xgboost. In general, I think that re-using dependencies is better than re-implementing them and then maintaining compatibility.
- The LocalData structure supports only the minimal set of methods required for Pipeline.transform.
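As a rough illustration of what a local, array-backed structure with only the methods needed for a pipeline transform could look like, here is a toy sketch in plain Python. It does not reproduce Mist's actual classes: the method names `column` and `with_column`, and the `LocalScaler` and `LocalPipeline` classes, are assumptions made up for this example.

```python
from typing import Dict, List, Sequence


class LocalDataColumn:
    """A named column held as a plain in-memory list (no Spark involved)."""

    def __init__(self, name: str, data: Sequence) -> None:
        self.name = name
        self.data = list(data)


class LocalData:
    """A tiny column store exposing only what transform() below needs."""

    def __init__(self, *columns: LocalDataColumn) -> None:
        self._columns: Dict[str, LocalDataColumn] = {c.name: c for c in columns}

    def column(self, name: str) -> LocalDataColumn:
        return self._columns[name]

    def with_column(self, column: LocalDataColumn) -> "LocalData":
        # Return a new LocalData; existing column objects are shared, not copied.
        merged = dict(self._columns)
        merged[column.name] = column
        return LocalData(*merged.values())


class LocalScaler:
    """Hypothetical stage: multiplies one column by a constant factor."""

    def __init__(self, input_col: str, output_col: str, factor: float) -> None:
        self.input_col = input_col
        self.output_col = output_col
        self.factor = factor

    def transform(self, data: LocalData) -> LocalData:
        values = data.column(self.input_col).data
        scaled = [v * self.factor for v in values]
        return data.with_column(LocalDataColumn(self.output_col, scaled))


class LocalPipeline:
    """Applies each stage's transform() in order, entirely in local memory."""

    def __init__(self, stages: List[LocalScaler]) -> None:
        self.stages = list(stages)

    def transform(self, data: LocalData) -> LocalData:
        for stage in self.stages:
            data = stage.transform(data)
        return data
```

For example, `LocalPipeline([LocalScaler("x", "x2", 2.0)]).transform(LocalData(LocalDataColumn("x", [1.0, 2.0])))` adds an `x2` column without creating any Spark context.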
We started the hydro-serving project to solve the ML serving problem.