JPMML-H2O

Java library and command-line application for converting H2O.ai models to PMML.

Features

Supported MOJO types

Prerequisites

  • H2O.ai 3.34(.0.1) or newer
  • Java 1.8 or newer

Installation

Enter the project root directory and build using Apache Maven:

mvn clean install

The build produces a library JAR file pmml-h2o/target/pmml-h2o-1.2-SNAPSHOT.jar, and an executable uber-JAR file pmml-h2o-example/target/pmml-h2o-example-executable-1.2-SNAPSHOT.jar.

Usage

A typical workflow can be summarized as follows:

  1. Use H2O.ai to train a model.
  2. Download the model in Model ObJect, Optimized (MOJO) data format to a file in a local filesystem.
  3. Use the JPMML-H2O command-line converter application to convert the MOJO file into a PMML file.

The H2O.ai side of operations

Using the h2o package to train a regression model on the example Boston housing dataset:

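from h2o import H2OFrame
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
from pandas import DataFrame, Series
# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older scikit-learn release
from sklearn.datasets import load_boston

import h2o
import pandas

boston = load_boston()

# Combine the feature columns and the target column (MEDV) into a single data frame
df = pandas.concat((DataFrame(data = boston.data, columns = boston.feature_names), Series(boston.target, name = "MEDV")), axis = 1)

# Connect to an already running H2O server
h2o.connect()

df = H2OFrame(df)

# A Gaussian-family GLM is an ordinary linear regression
glm = H2OGeneralizedLinearEstimator(family = "gaussian")
glm.train(boston.feature_names.tolist(), "MEDV", df)

# Export the trained model in MOJO format
glm.download_mojo(path = "mojo.zip")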

The Java side of operations

Converting the MOJO file mojo.zip to a PMML file mojo.pmml:

java -jar pmml-h2o-example/target/pmml-h2o-example-executable-1.2-SNAPSHOT.jar --mojo-input mojo.zip --pmml-output mojo.pmml

Getting help:

java -jar pmml-h2o-example/target/pmml-h2o-example-executable-1.2-SNAPSHOT.jar --help

Documentation

License

JPMML-H2O is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use JPMML-H2O in a proprietary software project, then it is possible to enter into a licensing agreement which makes JPMML-H2O available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

JPMML-H2O is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact [email protected]

jpmml-h2o's Issues

Capturing variable importances

The MojoModel#_modelDescriptor field is a model descriptor. Among other things, it exposes variable importances information via the ModelDescriptor#variableImportances() method.

A user has requested that variable importance (varimp) information be collected and stored in the generated PMML document.
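PMML has a natural home for this information: the `importance` attribute of the `MiningField` element, which the schema types as a number in the [0, 1] range (so normalized importances fit better than raw ones). The following is a minimal post-processing sketch, not the converter's implementation. It assumes the importances have already been extracted into a plain `{field name: importance}` dict (for example from `ModelDescriptor#variableImportances()`) and that the document uses the PMML 4.4 namespace; `annotate_importances` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def annotate_importances(pmml_text, importances, ns="http://www.dmg.org/PMML-4_4"):
    """Set MiningField@importance from a {field name: importance} dict.

    Assumption: importances are already normalized to the [0, 1] range.
    Fields that are absent from the dict (e.g. the target) are left untouched.
    """
    # Keep the PMML namespace as the default (unprefixed) namespace on output
    ET.register_namespace("", ns)
    root = ET.fromstring(pmml_text)
    for mining_field in root.iter("{%s}MiningField" % ns):
        name = mining_field.get("name")
        if name in importances:
            mining_field.set("importance", repr(float(importances[name])))
    return ET.tostring(root, encoding="unicode")
```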

Missing number of values in the array

Hi,
I tried to convert an H2O random forest model. If a variable uses an IsIn operator, I'm expecting to get a SimpleSetPredicate similar to the figure below. It should contain the number of values (n=?) in the array, and quotes around the values.

[Screenshot: expected SimpleSetPredicate]

However, I'm getting the following result (no n=? attribute and no quotes):
[Screenshot: actual output]

Please let me know your thoughts on how to fix this issue.
Let me know if you need additional info.
Thank you
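For context, the PMML `Array` element treats `n` as an optional attribute, and quotes are only mandatory around string values that contain whitespace, so output without them may still be valid PMML. If a downstream consumer insists on the explicit form, a post-processing step could emit it. The sketch below is illustrative only: `quote_value` and `make_simple_set_predicate` are hypothetical helpers, and it assumes the PMML convention that an embedded double quote is escaped with a backslash.

```python
import xml.etree.ElementTree as ET

def quote_value(value, always_quote=True):
    """Quote one Array entry; entries with whitespace or quotes are always quoted."""
    must_quote = value == "" or '"' in value or any(ch.isspace() for ch in value)
    if not (always_quote or must_quote):
        return value
    # Assumption: embedded double quotes are escaped with a backslash
    return '"' + value.replace('"', '\\"') + '"'

def make_simple_set_predicate(field, values, with_count=True, always_quote=True):
    """Build a SimpleSetPredicate whose string Array carries an explicit n attribute."""
    predicate = ET.Element("SimpleSetPredicate", {"field": field, "booleanOperator": "isIn"})
    array = ET.SubElement(predicate, "Array", {"type": "string"})
    if with_count:
        array.set("n", str(len(values)))
    # Array content is a whitespace-separated list of (optionally quoted) values
    array.text = " ".join(quote_value(v, always_quote) for v in values)
    return ET.tostring(predicate, encoding="unicode")
```

For example, `make_simple_set_predicate("Sex", ["male", "female"])` yields an `Array` element with `n="2"` and the content `"male" "female"`.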

Throws an error when max_depth > 5 and ntrees = 100 for a GBM

When I change the max_depth for a GBM model in H2O, export the mojo, and try to convert it with the tool, it throws the following error:

Exception in thread "main" java.lang.IllegalArgumentException
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:87)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeTreeModel(SharedTreeMojoModelConverter.java:68)
        at org.jpmml.h2o.GbmMojoModelConverter.lambda$encodeModel$0(GbmMojoModelConverter.java:73)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.jpmml.h2o.GbmMojoModelConverter.encodeModel(GbmMojoModelConverter.java:74)
        at org.jpmml.h2o.GbmMojoModelConverter.encodeModel(GbmMojoModelConverter.java:48)
        at org.jpmml.h2o.Converter.encodePMML(Converter.java:71)
        at org.jpmml.h2o.Main.run(Main.java:107)
        at org.jpmml.h2o.Main.main(Main.java:88)

The error happens when I set max_depth to any value above 5 with ntrees = 100.

H2O version 3.19.0.4274

release 1.0.10 is not compatible with h2o-3.28.x

Release 1.0.10 is not compatible with h2o-3.28.x. When converting to PMML, it fails with an error message like:

Exception in thread"main" ... MOJO version imcompatibility - the model MOJO version (1.10) is higher than the current h2o version (1.00) supports ...
    hex.genmodel.ModelMojoReader.checkMaxSupportedMojoVersion(ModelMojoReader.java:296)
    ...
    at org.jpmml.h2o.Main.main(Main.java:88)

Rebuilding with an updated h2o dependency version does not solve the issue.
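The error compares the MOJO format version recorded in the model file with the highest version that the converter's h2o runtime can read. To see which version a given file carries before converting it, a small sketch can read it straight from the archive. It assumes the MOJO zip contains a `model.ini` file at its root whose `[info]` section has a `mojo_version` key; those names are assumptions about the MOJO layout, and `parse_model_ini`, `read_mojo_info` and `is_supported` are hypothetical helpers.

```python
import zipfile

def parse_model_ini(text):
    """Collect key = value pairs from the [info] section of a MOJO's model.ini."""
    info, section = {}, None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "info" and "=" in line:
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip()
    return info

def read_mojo_info(mojo_path):
    """Read model.ini from the root of a MOJO zip archive (assumed layout)."""
    with zipfile.ZipFile(mojo_path) as archive:
        return parse_model_ini(archive.read("model.ini").decode("utf-8"))

def version_tuple(version):
    """Compare versions numerically, so that "1.10" ranks above "1.9"."""
    return tuple(int(part) for part in version.split("."))

def is_supported(mojo_version, max_supported):
    return version_tuple(mojo_version) <= version_tuple(max_supported)
```

With the versions from the error message above, `is_supported("1.10", "1.00")` returns `False`. Comparing numeric tuples rather than strings avoids surprises such as `"1.10" < "1.9"` under string ordering.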

Output PMML file too big in size

Hi @vruusmann,
I've converted an H2O DRF model (9 MB), but the output PMML is 200 MB and occupies almost 1 GB of RAM.
There seems to be no option to produce a more compact output.

Do you have any idea what can be done?
Thanks.

Throws an error when converting a Poisson GBM to PMML

When I try to convert a Poisson GBM, built in H2O and exported as a MOJO, to PMML format, I get the following error:
[Screenshot: jpmml-h2o error message]

I managed to solve this error by adding the following piece of code to the "\jpmml-h2o-master\src\main\java\org\jpmml\h2o\GbmMojoModelConverter.java" file:

// Poisson family: copied from the Gaussian branch (sum of tree outputs, rescaled by the initial score _init_f)
if((DistributionFamily.poisson).equals(model._family)){
	ContinuousLabel continuousLabel = (ContinuousLabel)label;

	MiningModel miningModel = new MiningModel(MiningFunction.REGRESSION, ModelUtil.createMiningSchema(continuousLabel))
		.setSegmentation(MiningModelUtil.createSegmentation(MultipleModelMethod.SUM, treeModels))
		.setTargets(ModelUtil.createRescaleTargets(null, (double)model._init_f, continuousLabel));

	return miningModel;
} else
// ...the existing distribution family checks continue here

Once I added this piece of code (by copying the gaussian section and changing gaussian to poisson), the converter worked. However, I'd like you to confirm whether this is correct, and whether it could be added to the tool.
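One point worth checking in the copied branch: the Gaussian branch adds the initial score `_init_f` to the sum of tree outputs, which is right for an identity link. H2O's Poisson distribution uses a log link, so under that assumption the same sum is on the log scale and the expected count is its exponential; an additive rescale alone would return the log-scale value. The sketch below only illustrates the arithmetic with hypothetical numbers and hypothetical function names; it is not how the converter encodes the model.

```python
import math

def gaussian_prediction(init_f, tree_outputs):
    """Identity link: the prediction is the initial score plus the summed tree outputs."""
    return init_f + sum(tree_outputs)

def poisson_prediction(init_f, tree_outputs):
    """Log link: the same sum is a log-mean, so the prediction is its exponential."""
    return math.exp(init_f + sum(tree_outputs))

# Hypothetical numbers: an initial score of log(2) and two trees that cancel out
init_f = math.log(2.0)
trees = [0.1, -0.1]
print(gaussian_prediction(init_f, trees))  # log-scale value, about 0.693
print(poisson_prediction(init_f, trees))   # expected count, about 2.0
```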

Thanks for your time.

Regards,
Paulo

Support of categorical variables

My model is an H2O Gradient Boosting Machine trained in the KNIME IDE with the H2O Machine Learning extension. I save it to a MOJO file and then use your project following all the steps. The model uses both categorical and numerical variables; the categorical ones are stored as Strings, and I use KNIME's Domain Calculator to treat them as categorical.

I'm getting this error stack:

Exception in thread "main" java.lang.IllegalArgumentException: Field nivel_1 has data type string
        at org.jpmml.converter.PMMLEncoder.toContinuous(PMMLEncoder.java:209)
        at org.jpmml.converter.CategoricalFeature.toContinuousFeature(CategoricalFeature.java:56)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:195)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:257)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:221)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:257)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:221)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:257)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:221)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:221)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:221)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:221)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeTreeModel(SharedTreeMojoModelConverter.java:98)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.lambda$encodeTreeModels$0(SharedTreeMojoModelConverter.java:74)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeTreeModels(SharedTreeMojoModelConverter.java:75)
        at org.jpmml.h2o.GbmMojoModelConverter.encodeModel(GbmMojoModelConverter.java:58)
        at org.jpmml.h2o.Converter.encodePMML(Converter.java:87)
        at org.jpmml.h2o.example.Main.run(Main.java:123)
        at org.jpmml.h2o.example.Main.main(Main.java:93)

Does your project work with categorical variables stored as Strings? If not, what type should categorical variables have?

Thanks in advance!
Julio Paciello
from Paraguay

Downgrading PMML 4.4 to PMML 4.2

Hi there,

I have a GBM model that I converted to PMML using the jpmml-h2o library. The problem is that I'm trying to import this model file into software that only accepts PMML 4.2 or lower. Is there a way to downgrade the PMML version?

Regards.
Valentina
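A naive workaround is to relabel the document, that is, rewrite the PMML namespace and the `version` attribute from 4.4 to 4.2. This only produces a valid PMML 4.2 document if the model happens to use no elements or attributes introduced after PMML 4.2, and this sketch does not check for them. `relabel_pmml_version` is a hypothetical helper, not part of jpmml-h2o.

```python
import xml.etree.ElementTree as ET

PMML_4_4_NS = "http://www.dmg.org/PMML-4_4"
PMML_4_2_NS = "http://www.dmg.org/PMML-4_2"

def relabel_pmml_version(xml_text):
    """Naively relabel a PMML 4.4 document as PMML 4.2.

    Only the namespace and the version attribute are rewritten; constructs
    introduced after PMML 4.2 are NOT removed or translated.
    """
    root = ET.fromstring(xml_text)
    old_prefix = "{" + PMML_4_4_NS + "}"
    new_prefix = "{" + PMML_4_2_NS + "}"
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith(old_prefix):
            elem.tag = new_prefix + elem.tag[len(old_prefix):]
    root.set("version", "4.2")
    # Serialize the PMML 4.2 namespace as the default (unprefixed) namespace
    ET.register_namespace("", PMML_4_2_NS)
    return ET.tostring(root, encoding="unicode")
```

Validating the result against the PMML 4.2 XML schema before importing it is the safer check.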

Score mismatch between PMML file and H2O Mojo prediction

Hi,
I am seeing inconsistencies between the scores predicted by the PMML file and by the H2O MOJO file for exactly the same feature values. I am using the Flow UI to get predicted scores from the H2O MOJO file, and the difference is large: for example, 0.005 with H2O vs. 0.90 with PMML for exactly the same feature values. Could you please help?

openscoring can't read output .pmml

I am testing this tool and Openscoring for a future production use case.
I was able to get Openscoring working using the PMML files that come with the repository.
However, when I create a PMML file from H2O using jpmml-h2o, Openscoring can't read the output PMML, and I can't figure out why. Any help pointing to a solution would be appreciated.
Thank you!
Warning: Couldn't read data from file "xgboost_test1.pmml", this makes an empty
Warning: POST.

{ "message" : "Bad Request" }
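The curl warning means that curl could not open `xgboost_test1.pmml` (typically a wrong path or working directory), so it sent an empty request body, which would be consistent with the "Bad Request" response. The sketch below reads the file up front so that a missing or empty file fails loudly before anything is sent. The endpoint layout (`PUT <base>/model/<id>` with a `text/xml` body) is an assumption about Openscoring's REST API, and `build_deploy_request` is a hypothetical helper.

```python
import pathlib
import urllib.request

def build_deploy_request(pmml_path, model_id, base_url="http://localhost:8080/openscoring"):
    """Build (but do not send) a PUT request that uploads a PMML file.

    Reading the file first raises FileNotFoundError for a bad path, instead of
    silently sending an empty body the way curl does when "@file" cannot be read.
    """
    body = pathlib.Path(pmml_path).read_bytes()
    if not body:
        raise ValueError(f"{pmml_path} is empty")
    # Assumed endpoint layout: <base_url>/model/<model_id>
    return urllib.request.Request(
        url=f"{base_url}/model/{model_id}",
        data=body,
        headers={"Content-Type": "text/xml"},
        method="PUT",
    )
```

Sending it is then a matter of `urllib.request.urlopen(build_deploy_request("xgboost_test1.pmml", "XGBoostTest"))`, where the model id is a placeholder.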

Detect feature promotion from (high cardinality-) categorical to pseudo-numeric

Hi Villu,

Thanks for your assistance with the previous issue that I raised, it's greatly appreciated!

However, I have stumbled across a new issue and was wondering whether you could take a look at it. I'm getting the following error when trying to convert a Tweedie GBM to PMML:

[Screenshot: error message]

It seems that one of the model inputs (MAKE) is causing the issue. However, the same input was used in the Poisson model that I referred to previously, and there were no issues with it once you added support for Poisson models to your code.

Your assistance with this would be greatly appreciated.

Thanks for your time.

Regards,
Paulo

Support for `quantile` distribution in GBM

INFO 2021-10-29 16:26:52 [main] org.jpmml.h2o.Main [TID: N/A]- Loading MOJO..
INFO 2021-10-29 16:26:52 [main] org.jpmml.h2o.Main [TID: N/A]- Loaded MOJO in 305 ms.
INFO 2021-10-29 16:26:52 [main] org.jpmml.h2o.Main [TID: N/A]- Converting MOJO to PMML..
ERROR 2021-10-29 16:26:53 [main] org.jpmml.h2o.Main [TID: N/A]- Failed to convert MOJO to PMML
java.lang.IllegalArgumentException: Distribution family quantile is not supported
at org.jpmml.h2o.GbmMojoModelConverter.encodeModel(GbmMojoModelConverter.java:111)
at org.jpmml.h2o.GbmMojoModelConverter.encodeModel(GbmMojoModelConverter.java:45)
at org.jpmml.h2o.Converter.encodePMML(Converter.java:88)
at org.jpmml.h2o.Main.run(Main.java:120)
at org.jpmml.h2o.Main.main(Main.java:90)

Handling of missing values

I don't see support for missing values in the MOJO-to-PMML conversion.
When evaluating the PMML file using jpmml-evaluator, the following error is thrown:
Exception in thread "main" org.jpmml.evaluator.InvalidResultException: Field "X" cannot accept input value NaN

This error doesn't appear with the jpmml-lightgbm package, since there missing values are declared as 'NaN' in the PMML file.
Could missing value handling be added to the jpmml-h2o package?
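The jpmml-lightgbm behaviour described above corresponds to a PMML `Value` element with `property="missing"` inside a `DataField`, which tells the evaluator to treat that input value as missing rather than invalid. As an interim post-processing step, such declarations could be added to the continuous fields of an existing document. This is a sketch under stated assumptions: the PMML 4.4 namespace, and the idea that declaring `NaN` as missing matches H2O's own treatment of missing numeric inputs. `declare_nan_as_missing` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_4"  # assumption: adjust to the document's PMML version

def declare_nan_as_missing(pmml_text, ns=PMML_NS):
    """Add <Value value="NaN" property="missing"/> to every continuous DataField."""
    ET.register_namespace("", ns)
    root = ET.fromstring(pmml_text)
    for data_field in root.iter("{%s}DataField" % ns):
        if data_field.get("optype") != "continuous":
            continue
        # Skip fields that already carry the declaration, so the step is idempotent
        already_declared = any(
            value.get("value") == "NaN" and value.get("property") == "missing"
            for value in data_field.findall("{%s}Value" % ns)
        )
        if not already_declared:
            # Value elements come after any Extension/Interval children, so appending is safe
            ET.SubElement(data_field, "{%s}Value" % ns, {"value": "NaN", "property": "missing"})
    return ET.tostring(root, encoding="unicode")
```

An alternative that needs no document changes is to map `NaN` inputs to `None`/`null` before passing them to the evaluator, since a null argument is the usual way of signalling a missing value.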
