Java Evaluator API for PMML

License: GNU Affero General Public License v3.0

Java 100.00%

jpmml-evaluator's Introduction

JPMML-Evaluator

Java Evaluator API for Predictive Model Markup Language (PMML).

Features
Prerequisites
Installation
API
Basic usage
Advanced usage
Example applications
Documentation
Support
License
Additional information

Features

JPMML-Evaluator is de facto the reference implementation of the PMML specification versions 3.0, 3.1, 3.2, 4.0, 4.1, 4.2, 4.3 and 4.4 for the Java/JVM platform:

Pre-processing of input fields according to the DataDictionary and MiningSchema elements:
- Complete data type system.
- Complete operational type system.
- Treatment of outlier, missing and/or invalid values.
Model evaluation:
Post-processing of target fields according to the Targets element:
- Rescaling and/or casting regression results.
- Replacing a missing regression result with the default value.
- Replacing a missing classification result with the map of prior probabilities.
Calculation of auxiliary output fields according to the Output element:
- Over 20 different result feature types.
Model verification according to the ModelVerification element.
Vendor extensions:
- Memory and security sandboxing.
- Java-backed model, expression and predicate types - integrate any 3rd party Java library into PMML data flow.
- MathML prediction reports.

For more information please see the features.md file.

JPMML-Evaluator is interoperable with most popular statistics and data mining software:

R and Rattle:
- JPMML-R library.
- r2pmml package.
- pmml and pmmlTransformations packages.
Python and Scikit-Learn:
- JPMML-SkLearn library.
- sklearn2pmml package.
Apache Spark:
- JPMML-SparkML library.
- pyspark2pmml and sparklyr2pmml packages.
- mllib.pmml.PMMLExportable interface.
H2O.ai:
- JPMML-H2O library.
XGBoost:
- JPMML-XGBoost library.
LightGBM:
- JPMML-LightGBM library.
TensorFlow:
- JPMML-TensorFlow library.
KNIME
RapidMiner
SAS
SPSS

JPMML-Evaluator is fast and memory efficient. It can deliver one million scorings per second even on a desktop computer.

Prerequisites

Java Platform, Standard Edition 8 or newer.

Installation

JPMML-Evaluator library JAR files (together with accompanying Java source and Javadocs JAR files) are released via Maven Central Repository.

The current version is 1.6.5 (7 July, 2024).

The main component of JPMML-Evaluator is org.jpmml:pmml-evaluator. However, in most application scenarios, this component is not included directly, but via one or more data format-specific runtime components org.jpmml:pmml-evaluator-${runtime} that handle the loading and storage of PMML class model objects.

The recommended data format for PMML documents is XML, and the recommended implementation is Jakarta XML Binding via the GlassFish Metro JAXB runtime:

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-evaluator-metro</artifactId>
	<version>1.6.5</version>
</dependency>

Available components:

Component	Data format(s)
`org.jpmml:pmml-evaluator`	Java serialization
`org.jpmml:pmml-evaluator-jackson`	JSON, YAML, TOML etc. via the FasterXML Jackson suite
`org.jpmml:pmml-evaluator-kryo`	Kryo serialization
`org.jpmml:pmml-evaluator-metro`	XML via the GlassFish Metro JAXB runtime
`org.jpmml:pmml-evaluator-moxy`	JSON and XML via the EclipseLink MOXy JAXB runtime

API

Core types:

Interface org.jpmml.evaluator.EvaluatorBuilder
- Class org.jpmml.evaluator.ModelEvaluatorBuilder - Builds a ModelEvaluator instance based on an org.dmg.pmml.PMML instance
  - Class org.jpmml.evaluator.LoadingModelEvaluatorBuilder - Builds a ModelEvaluator instance from a PMML byte stream or a PMML file
  - Class org.jpmml.evaluator.ServiceLoadingModelEvaluatorBuilder - Builds a ModelEvaluator instance from a PMML service provider JAR file
Interface org.jpmml.evaluator.Evaluator
- Abstract class org.jpmml.evaluator.ModelEvaluator - Implements model evaluator functionality based on an org.dmg.pmml.Model instance
  - Classes org.jpmml.evaluator.<Model>Evaluator (GeneralRegressionModelEvaluator, MiningModelEvaluator, NeuralNetworkEvaluator, RegressionEvaluator, TreeModelEvaluator, SupportVectorMachineEvaluator etc.)
Abstract class org.jpmml.evaluator.ModelField
- Abstract class org.jpmml.evaluator.InputField - Describes a model input field
- Abstract class org.jpmml.evaluator.ResultField
  - Class org.jpmml.evaluator.TargetField - Describes a primary model result field
  - Class org.jpmml.evaluator.OutputField - Describes a secondary model result field
Abstract class org.jpmml.evaluator.FieldValue
- Class org.jpmml.evaluator.CollectionValue
- Abstract class org.jpmml.evaluator.ScalarValue
  - Class org.jpmml.evaluator.ContinuousValue
  - Abstract class org.jpmml.evaluator.DiscreteValue
    - Class org.jpmml.evaluator.CategoricalValue
    - Class org.jpmml.evaluator.OrdinalValue
Utility class org.jpmml.evaluator.EvaluatorUtil
Utility class org.jpmml.evaluator.FieldValueUtil

Core methods:

EvaluatorBuilder
- #build()
Evaluator
- #verify()
- #getInputFields()
- #getTargetFields()
- #getOutputFields()
- #evaluate(Map<String, ?>)
InputField
- #prepare(Object)

Target value types:

Interface org.jpmml.evaluator.Computable
- Abstract class org.jpmml.evaluator.AbstractComputable
  - Class org.jpmml.evaluator.Classification
  - Class org.jpmml.evaluator.Regression
  - Class org.jpmml.evaluator.Vote
Interface org.jpmml.evaluator.ResultFeature
- Interface org.jpmml.evaluator.HasCategoricalResult
  - Interface org.jpmml.evaluator.HasAffinity
    - Interface org.jpmml.evaluator.HasAffinityRanking
  - Interface org.jpmml.evaluator.HasConfidence
  - Interface org.jpmml.evaluator.HasProbability
- Interface org.jpmml.evaluator.HasDisplayValue
- Interface org.jpmml.evaluator.HasEntityId
  - Interface org.jpmml.evaluator.HasEntityAffinity
  - Interface org.jpmml.evaluator.HasEntityIdRanking
- Interface org.jpmml.evaluator.HasPrediction
- Interface org.jpmml.evaluator.HasReasonCodeRanking
- Interface org.jpmml.evaluator.HasRuleValues
- Interface org.jpmml.evaluator.mining.HasSegmentResults
- Interface org.jpmml.evaluator.scorecard.HasPartialScores
- Interface org.jpmml.evaluator.tree.HasDecisionPath
Abstract class org.jpmml.evaluator.Report
Utility class org.jpmml.evaluator.ReportUtil

Target value methods:

Computable
- #getResult()
HasProbability
- #getProbability(String)
- #getProbabilityReport(String)
HasPrediction
- #getPrediction()
- #getPredictionReport()

Exception types:

Abstract class org.jpmml.model.PMMLException
- Abstract class org.jpmml.model.MarkupException
  - Abstract class org.jpmml.model.InvalidMarkupException
  - Abstract class org.jpmml.model.MissingMarkupException
  - Abstract class org.jpmml.model.UnsupportedMarkupException
- Abstract class org.jpmml.evaluator.EvaluationException

Basic usage

// Building a model evaluator from a PMML file
Evaluator evaluator = new LoadingModelEvaluatorBuilder()
	.load(new File("model.pmml"))
	.build();

// Performing the self-check
evaluator.verify();

// Printing input (x1, x2, .., xn) fields
List<InputField> inputFields = evaluator.getInputFields();
System.out.println("Input fields: " + inputFields);

// Printing primary result (y) field(s)
List<TargetField> targetFields = evaluator.getTargetFields();
System.out.println("Target field(s): " + targetFields);

// Printing secondary result (eg. probability(y), decision(y)) fields
List<OutputField> outputFields = evaluator.getOutputFields();
System.out.println("Output fields: " + outputFields);

// Iterating through columnar data (eg. a CSV file, an SQL result set)
while(true){
	// Reading a record from the data source
	Map<String, ?> arguments = readRecord();
	if(arguments == null){
		break;
	}

	// Evaluating the model
	Map<String, ?> results = evaluator.evaluate(arguments);

	// Decoupling results from the JPMML-Evaluator runtime environment
	results = EvaluatorUtil.decodeAll(results);

	// Writing a record to the data sink
	writeRecord(results);
}

// Making the model evaluator eligible for garbage collection
evaluator = null;
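
The readRecord() and writeRecord() calls in the loop above are placeholders supplied by the application, not JPMML-Evaluator methods. A minimal sketch of what such helpers might build on, assuming plain comma-separated text without quoting or escaping; the class name CsvRecords and both method names are hypothetical:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helpers; not part of JPMML-Evaluator
final class CsvRecords {

	private CsvRecords(){
	}

	// Maps one comma-separated line onto the header names; empty cells become null (ie. missing values)
	static Map<String, String> parseRecord(List<String> header, String line){
		String[] cells = line.split(",", -1);
		if(cells.length != header.size()){
			throw new IllegalArgumentException("Expected " + header.size() + " cells, got " + cells.length);
		}

		Map<String, String> record = new LinkedHashMap<>();
		for(int i = 0; i < cells.length; i++){
			String cell = cells[i].trim();

			record.put(header.get(i), cell.isEmpty() ? null : cell);
		}

		return record;
	}

	// Formats decoded results as one comma-separated line, in map iteration order
	static String formatRecord(Map<String, ?> results){
		return results.values().stream()
			.map(value -> (value != null ? value.toString() : ""))
			.collect(Collectors.joining(","));
	}

	public static void main(String... args){
		List<String> header = Arrays.asList("x1", "x2", "x3");

		Map<String, String> arguments = parseRecord(header, "5.1,,setosa");
		System.out.println(arguments);

		System.out.println(formatRecord(arguments));
	}
}
```

Representing an empty cell as null lets the model's own missing value treatment decide what to do with it. Real-world CSV data with quoted cells would need a proper CSV parser instead of String#split.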

Advanced usage

Loading models

The PMML standard defines a large number of model types. The evaluation logic for each model type is encapsulated into a corresponding ModelEvaluator subclass.

Even though ModelEvaluator subclasses can be instantiated and configured directly, the recommended approach is to follow the Builder design pattern as implemented by the ModelEvaluatorBuilder builder class.

A model evaluator builder provides configuration and loading services.

The default configuration corresponds to most common needs. It can be overridden to customize the behaviour of model evaluators for more specific needs. A model evaluator is given a copy of the configuration that was effective when the ModelEvaluatorBuilder#build() method was invoked. It is not affected by later configuration changes.

For example, creating two differently configured model evaluators from a PMML instance:

import org.jpmml.evaluator.reporting.ReportingValueFactoryFactory;

PMML pmml = ...;

ModelEvaluatorBuilder modelEvaluatorBuilder = new ModelEvaluatorBuilder(pmml);

Evaluator evaluator = modelEvaluatorBuilder.build();

// Activate the generation of MathML prediction reports
modelEvaluatorBuilder.setValueFactoryFactory(ReportingValueFactoryFactory.newInstance());

Evaluator reportingEvaluator = modelEvaluatorBuilder.build();

Configurations and model evaluators are fairly lightweight, which makes them cheap to create and destroy. However, for maximum performance, it is advisable to maintain a one-to-one mapping between PMML, ModelEvaluatorBuilder and ModelEvaluator instances (ie. an application should load a PMML byte stream or file exactly once, and then maintain and reuse the resulting model evaluator as long as needed).
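
The load-once-and-reuse advice can be sketched with a small registry that builds an evaluator on first request and hands out the same instance afterwards. This is a sketch under assumptions: EvaluatorRegistry is a hypothetical class, the type parameter E stands in for the evaluator type, and the loader function is where an application would plug in its own model loading code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper; E stands in for the evaluator type
final class EvaluatorRegistry<E> {

	private final Map<String, E> evaluators = new ConcurrentHashMap<>();

	private final Function<String, E> loader;

	EvaluatorRegistry(Function<String, E> loader){
		this.loader = loader;
	}

	// Loads the model on first request, and returns the same instance afterwards
	E get(String path){
		return this.evaluators.computeIfAbsent(path, this.loader);
	}

	// Drops the mapping, so that the evaluator becomes eligible for garbage collection
	E remove(String path){
		return this.evaluators.remove(path);
	}

	public static void main(String... args){
		AtomicInteger loadCount = new AtomicInteger();

		// Stand-in loader; a real one would build an evaluator from the PMML file
		EvaluatorRegistry<Object> registry = new EvaluatorRegistry<>(path -> {
			loadCount.incrementAndGet();

			return new Object();
		});

		Object first = registry.get("model.pmml");
		Object second = registry.get("model.pmml");

		System.out.println((first == second) + " " + loadCount.get());
	}
}
```

ConcurrentHashMap#computeIfAbsent invokes the loader at most once per key, so concurrent first requests for the same path do not load the model twice.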

Some ModelEvaluator subclasses contain static caches that are lazily populated on a PMML instance basis. This may cause the first ModelEvaluator#evaluate(Map<String, ?>) method invocation to take somewhat longer to complete (relative to all the subsequent method invocations). If the model contains model verification data, then this "warm-up cost" is paid once and for all during the initial ModelEvaluator#verify() method invocation.

Thread safety

The ModelEvaluatorBuilder base class is thread safe. It is permitted to construct and configure a central ModelEvaluatorBuilder instance, and invoke its ModelEvaluatorBuilder#build() method concurrently.

Some ModelEvaluatorBuilder subclasses may extend the base class with functionality that is not thread safe. Cases in point are all sorts of "loading" implementations, which modify the value of the ModelEvaluatorBuilder#pmml and/or ModelEvaluatorBuilder#model fields.

The ModelEvaluator base class and all its subclasses are completely thread safe. It is permitted to share a central ModelEvaluator instance between any number of threads, and invoke its ModelEvaluator#evaluate(Map<String, ?>) method concurrently.

The JPMML-Evaluator library follows functional programming principles. In a multi-threaded environment, its data throughput capabilities should scale linearly with respect to the number of threads.
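
Sharing one evaluator between worker threads might look like the following sketch. To keep it self-contained, the shared evaluator is represented by a plain java.util.function.Function; the class ParallelScoring and its scoreAll method are hypothetical names, and in an application the scorer would delegate to the evaluate method of a central model evaluator:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper; the scorer stands in for a shared, thread-safe evaluator
final class ParallelScoring {

	private ParallelScoring(){
	}

	static <R> List<R> scoreAll(Function<Map<String, ?>, R> scorer, List<Map<String, ?>> records, int threadCount){
		ExecutorService executorService = Executors.newFixedThreadPool(threadCount);

		try {
			List<Future<R>> futures = new ArrayList<>();

			// All tasks invoke the same scorer instance concurrently
			for(Map<String, ?> record : records){
				futures.add(executorService.submit(() -> scorer.apply(record)));
			}

			// Collecting results in the order of input records
			List<R> results = new ArrayList<>();
			for(Future<R> future : futures){
				results.add(future.get());
			}

			return results;
		} catch(InterruptedException | ExecutionException e){
			throw new IllegalStateException(e);
		} finally {
			executorService.shutdown();
		}
	}

	public static void main(String... args){
		List<Map<String, ?>> records = new ArrayList<>();
		for(int i = 0; i < 8; i++){
			records.add(Collections.singletonMap("x", i));
		}

		// Stand-in scorer that doubles the input
		List<Integer> results = scoreAll(record -> 2 * (Integer)record.get("x"), records, 4);

		System.out.println(results);
	}
}
```

No locking is needed around the scorer, because the approach relies on the evaluator itself being thread safe. Only the loading step should stay confined to a single thread.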

Querying the "data schema" of models

The model evaluator can be queried for the list of input (ie. independent), target (ie. primary dependent) and output (ie. secondary dependent) field definitions, which provide information about field name, data type, operational type, value domain etc.

Querying and analyzing input fields:

List<? extends InputField> inputFields = evaluator.getInputFields();
for(InputField inputField : inputFields){
	org.dmg.pmml.DataField pmmlDataField = (org.dmg.pmml.DataField)inputField.getField();
	org.dmg.pmml.MiningField pmmlMiningField = inputField.getMiningField();

	org.dmg.pmml.DataType dataType = inputField.getDataType();
	org.dmg.pmml.OpType opType = inputField.getOpType();

	switch(opType){
		case CONTINUOUS:
			com.google.common.collect.RangeSet<Double> validInputRanges = inputField.getContinuousDomain();
			break;
		case CATEGORICAL:
		case ORDINAL:
			List<?> validInputValues = inputField.getDiscreteDomain();
			break;
		default:
			break;
	}
}

Querying and analyzing target fields:

List<? extends TargetField> targetFields = evaluator.getTargetFields();
for(TargetField targetField : targetFields){
	org.dmg.pmml.DataField pmmlDataField = targetField.getField();
	org.dmg.pmml.MiningField pmmlMiningField = targetField.getMiningField(); // Could be null
	org.dmg.pmml.Target pmmlTarget = targetField.getTarget(); // Could be null

	org.dmg.pmml.DataType dataType = targetField.getDataType();
	org.dmg.pmml.OpType opType = targetField.getOpType();

	switch(opType){
		case CONTINUOUS:
			break;
		case CATEGORICAL:
		case ORDINAL:
			List<?> validTargetValues = targetField.getDiscreteDomain();

			// The list of target category values for querying HasCategoricalResult subinterfaces (HasProbability, HasConfidence etc).
			// The default element type is String.
			// If the PMML instance is pre-parsed, then the element type changes to the appropriate Java primitive type
			List<?> categories = targetField.getCategories();
			break;
		default:
			break;
	}
}

Querying and analyzing output fields:

List<? extends OutputField> outputFields = evaluator.getOutputFields();
for(OutputField outputField : outputFields){
	org.dmg.pmml.OutputField pmmlOutputField = outputField.getOutputField();

	org.dmg.pmml.DataType dataType = outputField.getDataType(); // Could be null
	org.dmg.pmml.OpType opType = outputField.getOpType(); // Could be null

	boolean finalResult = outputField.isFinalResult();
	if(!finalResult){
		continue;
	}
}

Evaluating models

A model may contain verification data, which is a small but representative set of data records (inputs plus expected outputs) for ensuring that the model evaluator is behaving correctly in this deployment configuration (JPMML-Evaluator version, Java/JVM version and vendor etc. variables). The model evaluator should be verified once, before putting it into actual use.

Performing the self-check:

evaluator.verify();
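
Conceptually, a self-check replays the stored inputs through the model and compares each actual output with the expected one. The following sketch illustrates that idea only; it is not the implementation behind verify(). The class SelfCheck, the single-input numeric model and the relative tolerance parameter precision are all assumptions made for illustration:

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper; illustrates the idea of a self-check, not the actual verify() implementation
final class SelfCheck {

	private SelfCheck(){
	}

	// Returns true if the actual value is within the relative tolerance of the expected value
	// An expected value of zero therefore requires an exact match
	static boolean acceptable(double expected, double actual, double precision){
		double tolerance = Math.abs(expected) * precision;

		return Math.abs(actual - expected) <= tolerance;
	}

	// Replays verification inputs through the model, and fails on the first mismatch
	static void verify(DoubleUnaryOperator model, double[] inputs, double[] expectedOutputs, double precision){
		for(int i = 0; i < inputs.length; i++){
			double actual = model.applyAsDouble(inputs[i]);

			if(!acceptable(expectedOutputs[i], actual, precision)){
				throw new IllegalStateException("Record " + i + ": expected " + expectedOutputs[i] + ", got " + actual);
			}
		}
	}

	public static void main(String... args){
		// Stand-in model: y = 2 * x + 1
		DoubleUnaryOperator model = x -> 2 * x + 1;

		verify(model, new double[]{0d, 1d, 2d}, new double[]{1d, 3d, 5d}, 1e-6);

		System.out.println("Verification passed");
	}
}
```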

During scoring, the application code should iterate over data records (eg. rows of a table), and apply the following encode-evaluate-decode sequence of operations to each one of them.

The processing of the first data record will be significantly slower than the processing of all subsequent data records, because the model evaluator needs to look up, validate and pre-parse model content. If the model contains verification data, then this warm-up cost is borne during the self-check.

Preparing the argument map:

Map<String, ?> inputDataRecord = ...;

Map<String, FieldValue> arguments = new LinkedHashMap<>();

List<InputField> inputFields = evaluator.getInputFields();
for(InputField inputField : inputFields){
	String inputName = inputField.getName();

	Object rawValue = inputDataRecord.get(inputName);

	// Transforming an arbitrary user-supplied value to a known-good PMML value
	// The user-supplied value is passed through: 1) outlier treatment, 2) missing value treatment, 3) invalid value treatment and 4) type conversion
	FieldValue inputValue = inputField.prepare(rawValue);

	arguments.put(inputName, inputValue);
}

Performing the evaluation:

Map<String, ?> results = evaluator.evaluate(arguments);

Extracting primary results from the result map:

List<TargetField> targetFields = evaluator.getTargetFields();
for(TargetField targetField : targetFields){
	String targetName = targetField.getName();

	Object targetValue = results.get(targetName);
}

The target value is either a Java primitive value (as a wrapper object) or a complex value as a Computable instance.

A complex target value may expose additional information about the prediction by implementing appropriate ResultFeature subinterfaces:

// Test for "entityId" result feature
if(targetValue instanceof HasEntityId){
	HasEntityId hasEntityId = (HasEntityId)targetValue;

	HasEntityRegistry<?> hasEntityRegistry = (HasEntityRegistry<?>)evaluator;
	BiMap<String, ? extends Entity> entities = hasEntityRegistry.getEntityRegistry();

	Entity winner = entities.get(hasEntityId.getEntityId());
}

// Test for "probability" result feature
if(targetValue instanceof HasProbability){
	HasProbability hasProbability = (HasProbability)targetValue;

	Set<?> categories = hasProbability.getCategories();
	for(Object category : categories){
		Double categoryProbability = hasProbability.getProbability(category);
	}
}

A complex target value may hold a reference to the model evaluator that created it. It is advisable to decode it to a Java primitive value (ie. decoupling from the JPMML-Evaluator runtime environment) as soon as all the additional information has been retrieved:

if(targetValue instanceof Computable){
	Computable computable = (Computable)targetValue;

	targetValue = computable.getResult();
}

Extracting secondary results from the result map:

List<OutputField> outputFields = evaluator.getOutputFields();
for(OutputField outputField : outputFields){
	String outputName = outputField.getName();

	Object outputValue = results.get(outputName);
}

The output value is always a Java primitive value (as a wrapper object).

Example applications

Module pmml-evaluator-example exemplifies the use of the JPMML-Evaluator library.

This module can be built using Apache Maven:

mvn clean install

The resulting uber-JAR file target/pmml-evaluator-example-executable-1.6-SNAPSHOT.jar contains the following command-line applications:

org.jpmml.evaluator.example.EvaluationExample (source).
org.jpmml.evaluator.example.RecordCountingExample (source).
org.jpmml.evaluator.example.TestingExample (source).

Evaluating model model.pmml with data records from input.csv. The predictions are stored to output.csv:

java -cp target/pmml-evaluator-example-executable-1.6-SNAPSHOT.jar org.jpmml.evaluator.example.EvaluationExample --model model.pmml --input input.csv --output output.csv

Evaluating model model.pmml with data records from input.csv. The predictions are verified against data records from expected-output.csv:

java -cp target/pmml-evaluator-example-executable-1.6-SNAPSHOT.jar org.jpmml.evaluator.example.TestingExample --model model.pmml --input input.csv --expected-output expected-output.csv

Enhancing model model.pmml with verification data records from input_expected-output.csv:

java -cp target/pmml-evaluator-example-executable-1.6-SNAPSHOT.jar org.jpmml.evaluator.example.EnhancementExample --model model.pmml --verification input_expected_output.csv

Getting help:

java -cp target/pmml-evaluator-example-executable-1.6-SNAPSHOT.jar <application class name> --help

Documentation

Up-to-date:

Slightly outdated:

Testing PMML applications

Support

Limited public support is available via the JPMML mailing list.

License

JPMML-Evaluator is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0. For a quick summary of your rights ("Can") and obligations ("Cannot" and "Must") under AGPLv3, please refer to TLDRLegal.

If you would like to use JPMML-Evaluator in a proprietary software project, then it is possible to enter into a licensing agreement which makes JPMML-Evaluator available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

JPMML-Evaluator is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact [email protected]


jpmml-evaluator's Issues

Migrating active field preparation code from 1.2.X API to 1.3.X API

When I run the following code

Object rawValue = 1.0;
FieldValue activeValue = input.prepare(rawValue);

The error always happens:

Exception in thread "main" org.jpmml.evaluator.InvalidResultException
    at org.jpmml.evaluator.FieldValueUtil.performInvalidValueTreatment(FieldValueUtil.java:190)
    at org.jpmml.evaluator.FieldValueUtil.prepareInputValue(FieldValueUtil.java:94)
    at org.jpmml.evaluator.InputField.prepare(InputField.java:64)
    at cn.pmml.test1.PMMLTest.arguments(PMMLTest.java:87)
    at cn.pmml.test1.PMMLTest.main(PMMLTest.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

I tried many ways to fix that, but all failed.

Does JPMML support TransformationDictionary in PMML 4.3 for pre-processing

I am using a neural network (NN) model, and providing an API for querying predictions.

I am expecting normal input params like age=26,gender=m.

So I need to do some pre-processing before passing these into the NN evaluator.

Does JPMML support TransformationDictionary?

If yes, in which package? and how?
If no, any plan scheduled?

Facing Typed check exception org.jpmml.evaluator.TypeCheckException: Expected FLOAT, but got DOUBLE (3.4)

While trying to read a PMML file created using sklearn2pmml with JPMML-Evaluator for prediction, I am facing this error:

org.jpmml.evaluator.TypeCheckException: Expected FLOAT, but got DOUBLE (3.4)
	at org.jpmml.evaluator.TypeUtil.toFloat(TypeUtil.java:419)
	at org.jpmml.evaluator.TypeUtil.cast(TypeUtil.java:333)

I am using version 1.3.5 of the evaluator. Please find below the pipeline used while creating the PMML file; no transformation was specified:

iris_pipeline = PMMLPipeline([
  ("mapper", DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), Imputer()])
  ])),
  ("classifier", RandomForestClassifier(n_estimators = 100))
])
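
The message names a double value (3.4) where a float was expected. The check performed inside TypeUtil is not shown in the report, so the following is only an illustration of the underlying Java arithmetic: 3.4 has no exact binary representation, and narrowing the double to float and widening it back yields a different number. The class FloatNarrowing and the method isLossless are hypothetical names:

```java
// Illustrates plain Java arithmetic only; the check performed inside TypeUtil is not shown in the report
final class FloatNarrowing {

	private FloatNarrowing(){
	}

	// Returns true if narrowing the double to float and widening it back recovers the same value
	static boolean isLossless(double value){
		return ((double)(float)value) == value;
	}

	public static void main(String... args){
		System.out.println(isLossless(3.4d));
		System.out.println(isLossless(3.5d));

		// Parsing the original text as float avoids the double detour
		System.out.println(Float.parseFloat("3.4"));
	}
}
```

If this is indeed the cause, supplying the value as a Float or as its original String form, rather than as a Double, would be one thing to try.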

jpmml-evaluator requires terminal classification TreeModel Nodes to have score attributes even if they have ScoreDistributions

From the PMML spec (versions 2.0 and up):

When a Node is selected as the final Node and if this Node has no score attribute, then the highest recordCount in the ScoreDistribution determines which value is selected as the predicted class. If a Node contains a sequence of ScoreDistribution elements such that there is more than one entry where recordCount_i is an upper bound, then the first entry is selected.

Note: If a Node has an attribute score then this attribute value overrides the computation of a predicted value from the ScoreDistribution.

The above suggests that it should be OK for a terminal Node in a TreeModel to omit the score attribute so long as it contains at least one ScoreDistribution element and, further, that including a score attribute may in fact weaken the contribution of the ScoreDistributions (though it is of course always possible to add a score attribute that accurately reflects the behavior specified in the above).
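
The selection rule quoted from the specification can be sketched as follows, assuming the ScoreDistribution elements of a Node are modelled as an insertion-ordered map from value to recordCount. The class ScoreDistributions and the method predict are hypothetical names, not jpmml-evaluator code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; models ScoreDistribution elements as an ordered value-to-recordCount map
final class ScoreDistributions {

	private ScoreDistributions(){
	}

	// Returns the value with the highest recordCount; on ties, the first entry in document order wins
	static String predict(Map<String, Double> recordCounts){
		String result = null;
		double max = Double.NEGATIVE_INFINITY;

		for(Map.Entry<String, Double> entry : recordCounts.entrySet()){
			// Strict comparison keeps the earlier entry when counts are equal
			if(entry.getValue() > max){
				result = entry.getKey();
				max = entry.getValue();
			}
		}

		return result;
	}

	public static void main(String... args){
		Map<String, Double> recordCounts = new LinkedHashMap<>();
		recordCounts.put("setosa", 2d);
		recordCounts.put("versicolor", 5d);
		recordCounts.put("virginica", 5d);

		System.out.println(predict(recordCounts));
	}
}
```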

Note that, when using multipleModelMethod="average" for a series of TreeModels, jpmml-evaluator (as of 1.1.17) appears to completely ignore the score attributes (i.e. you can set them all to "foo"), instead relying entirely on the ScoreDistributions to make its prediction. It seems odd to be required to provide an attribute that isn't going to be used at all.

Current version 1.1.6 is not tagged as release

Getting wrong svm model result

#20

Following your suggestions in the issue referenced above, we have generated a PMML file, but the output does not match the desired output.

PMML snippet:

Output file: we are getting the output (Predicted_Cluster) as 1->1, 2->3 and 3->2.

Please advise on the above.

Evaluator#getActiveFields() should include a synthetic InputField if the model needs to calculate residual values

This issue is based on the following JPMML mailing list thread: https://groups.google.com/forum/#!topic/jpmml/1IsR9zTm4KY

Technically, it is possible to detect if the model contains a residual-type output field, and if so, add an extra value to the argument data record:

List<OutputField> outputFields = evaluator.getOutputFields();
for(OutputField outputField : outputFields){
  if((ResultFeature.RESIDUAL).equals(outputField.getResultFeature())){
    TargetField targetField = Iterables.getOnlyElement(evaluator.getTargetFields()); // Get the sole target field
    arguments.put(targetField.getName(), userArguments.get(targetField.getName()));
  }
}

However, this assumes great familiarity with the PMML specification and the JPMML-Evaluator way of doing things, which is an unreasonable expectation (also, the above code might not work if the residual value is calculated at some deeper model nesting level).

org.dmg.pmml.MiningField.getOptype()Lorg/dmg/pmml/OpType

16/12/12 19:04:26 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 2, byd0158): java.lang.NoSuchMethodError: org.dmg.pmml.MiningField.getOptype()Lorg/dmg/pmml/OpType;
at org.jpmml.evaluator.ArgumentUtil.isOutlier(ArgumentUtil.java:153)
at org.jpmml.evaluator.ArgumentUtil.prepare(ArgumentUtil.java:69)
at org.jpmml.evaluator.ModelEvaluator.prepare(ModelEvaluator.java:110)
at org.jpmml.spark.PMMLTransformer$2.apply(PMMLTransformer.java:120)
at org.jpmml.spark.PMMLTransformer$2.apply(PMMLTransformer.java:110)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply119245_186$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Extended support for the `clusterAffinity` output feature

Hi,

following code (using 1.2.5 release):

final Map<FieldName, ?> results = kMeansModel.evaluate(params);
for (final Entry<FieldName, ?> resultEntry : results.entrySet())
{
    System.out.printf("%s = %s%n", resultEntry.getKey(), resultEntry.getValue());
}

returns this:

null = ClusterAffinityDistribution{result=5, distance_entries=[1=46.498128117308376, 2=47.12002804402491, 3=49.17335819210169, 4=43.117652229258695, 5=39.95722874558617, 6=45.533022467040844, 7=46.711182656888525], entityId=5}
predictedValue = 5
clusterAffinity_1 = 39.95722874558617
clusterAffinity_2 = 39.95722874558617
clusterAffinity_3 = 39.95722874558617
clusterAffinity_4 = 39.95722874558617
clusterAffinity_5 = 39.95722874558617
clusterAffinity_6 = 39.95722874558617
clusterAffinity_7 = 39.95722874558617

shouldn't the clusterAffinity_? have the same values as in the first line?

Regards,
Juraj.

Why is PMMLEvaluationContext invisible?

Hello,
In this commit (a309d50) "Restricted the visibility of EvaluationContext constructors", you removed the "public" access modifier, so I can no longer instantiate PMMLEvaluationContext.

Why was this done?
In my code, I use PMMLEvaluationContext to process DataTransformation with a non-model PMML file. In the latest version, how can I do DataTransformation (PMML file only contain ) in another way?

TypeUtil.getDataType(Object value) does not recognize BigDecimal

An exception is thrown when you pass a BigDecimal into the ModelEvaluator method prepare(activeField, rawValue). This is because TypeUtil.getDataType(Object value) does not check for BigDecimal values and an EvaluationException is thrown.

BigDecimal values are recognized as superior to Doubles/Floats for financial calculations and considered 'best practice'. It is recommended that the JPMML framework handles them without having to convert to a Double first.

Adding the following to the getDataType(Object value) method should resolve this issue:

if(value instanceof BigDecimal){
	return DataType.DOUBLE;
} else
In addition: improving the message provided within the EvaluationException would also help with diagnosis of future issues. For example, the message 'the class java.math.BigDecimal is not a supported type' would improve the usability of the framework.
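
Until such support exists, a caller-side workaround is to convert BigDecimal values before handing them to the evaluator. A minimal sketch, assuming the loss of precision implied by a double is acceptable for the model in question; the class RawValues and the method normalize are hypothetical names:

```java
import java.math.BigDecimal;

// Hypothetical caller-side workaround; not part of JPMML-Evaluator
final class RawValues {

	private RawValues(){
	}

	// Converts BigDecimal values to Double, and passes all other values through unchanged
	static Object normalize(Object rawValue){
		if(rawValue instanceof BigDecimal){
			// May lose precision beyond what a double can represent
			return ((BigDecimal)rawValue).doubleValue();
		}

		return rawValue;
	}

	public static void main(String... args){
		System.out.println(normalize(new BigDecimal("3.4")));
		System.out.println(normalize("setosa"));
	}
}
```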

LoadingCache may lead to OOM. Does JPMML support the model iteration scenario?

Hi, I have a problem. In my scenario the model is iterated every day, but the JPMML framework uses LoadingCache as its cache, which deletes entries lazily. As a result, JVM memory usage grows very large, and can even end in OOM.
Proposed solution: use weakKeys() and weakValues() at the same time:

private static LoadingCache<MiningModel, BiMap<String, Segment>> entityCache = CacheUtil.buildLoadingCache(new CacheLoader<MiningModel, BiMap<String, Segment>>(){

		@Override
		public BiMap<String, Segment> load(MiningModel miningModel){
			Segmentation segmentation = miningModel.getSegmentation();

			return EntityUtil.buildBiMap(segmentation.getSegments());
		}
	});
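
The idea of letting cache entries disappear together with their model can be sketched using only the JDK. This is an illustration of the weak-key technique, not a patch for the library: WeakKeyCache is a hypothetical class, K stands in for the model object and V for the data derived from it. Unlike a cache keyed by object identity, java.util.WeakHashMap compares keys using equals and hashCode:

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical helper; K stands in for the model object, V for the data derived from it
final class WeakKeyCache<K, V> {

	// Entries become removable once their key is no longer strongly reachable
	private final Map<K, V> cache = Collections.synchronizedMap(new WeakHashMap<>());

	private final Function<K, V> loader;

	WeakKeyCache(Function<K, V> loader){
		this.loader = loader;
	}

	// Computes the value on first access, and reuses it while the key is alive
	V get(K key){
		return this.cache.computeIfAbsent(key, this.loader);
	}

	public static void main(String... args){
		WeakKeyCache<Object, String> cache = new WeakKeyCache<>(key -> "derived@" + System.identityHashCode(key));

		Object model = new Object();

		System.out.println(cache.get(model) == cache.get(model));
	}
}
```

A weak key only helps if the cached value does not hold a strong reference back to the key; otherwise the entry can never be reclaimed.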

Incorrect number of active fields for NeuralNetwork PMML

I have 132 NeuralInputs in my PMML but evaluator.getActiveFields() method keeps giving me 100.
Is there something missing in my PMML?
Attached is my PMML for your reference.
Thanks,

jpmml is not a member of package org

I'm using Scala and SBT. In my build.sbt, I added this line:

libraryDependencies += "org.jpmml" % "jpmml-evaluator" % "1.3.3"

But I still get the error "jpmml is not a member of package org" when importing.

For more information: Scala version is 2.11.8

RuleSet Model doesn't support defaultScore attribute

According to http://www.dmg.org/v4-2-1/RuleSet.html#RuleSet it should be possible to define a default score for a rule set which is returned when none of the rules fire. However, OpenScoring returns a server error in this scenario.

My pmml model:

<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">

  <DataDictionary numberOfFields="1">
    <DataField name="$Result" displayName="$Result" optype="categorical" dataType="string"/>   
  </DataDictionary>

  <RuleSetModel modelName="Trivial" functionName="classification" algorithmName="RuleSet">

    <MiningSchema>
      <MiningField name="$Result" usageType="target"/>
    </MiningSchema>

    <LocalTransformations>
      <DerivedField name="foobar" displayName="foobar" optype="categorical" dataType="boolean">
        <Constant>true</Constant>
      </DerivedField>
    </LocalTransformations>

    <RuleSet defaultScore="True" defaultConfidence="0.0">
      <RuleSelectionMethod criterion="firstHit"/>

      <SimpleRule id="RULE1" score="Something">
        <SimplePredicate field="foobar" operator="equal" value="false"/>
      </SimpleRule>
    </RuleSet>

  </RuleSetModel>
</PMML>

JSON request:

{
    "id": "example-001", 
    "arguments": {}
}

The result:

$ curl -X POST --data-binary @trivial-example-request.json -H "Content-type: application/json" http://localhost:8080/openscoring/model/trivial
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body>
<h2>HTTP ERROR: 500</h2>
<p>Problem accessing /openscoring/model/trivial. Reason:
<pre>    Internal Server Error</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>
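
The expected behaviour can be sketched in plain Java. The class below is a hypothetical illustration of firstHit selection with a default score, not JPMML-Evaluator or Openscoring code: rules are tried in order, the first rule whose predicate holds supplies the score, and the default score is returned when no rule fires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class FirstHitRuleSet<T> {

    // One rule: a predicate plus the score it yields when it fires
    private static final class Rule<T> {
        final Predicate<T> predicate;
        final String score;

        Rule(Predicate<T> predicate, String score) {
            this.predicate = predicate;
            this.score = score;
        }
    }

    private final List<Rule<T>> rules = new ArrayList<>();
    private final String defaultScore;

    FirstHitRuleSet(String defaultScore) {
        this.defaultScore = defaultScore;
    }

    FirstHitRuleSet<T> addRule(Predicate<T> predicate, String score) {
        rules.add(new Rule<>(predicate, score));
        return this;
    }

    // Returns the score of the first rule that fires, or the default score if none does
    String score(T input) {
        for (Rule<T> rule : rules) {
            if (rule.predicate.test(input)) {
                return rule.score;
            }
        }
        return defaultScore;
    }
}
```

In the model from this issue the derived field foobar is always true, so RULE1 (foobar equal false) never fires and the sketch would yield the default score "True".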

Clustering pmml model execution

Hi,
I am new to PMML execution using JPMML-Evaluator.
When I tried to execute a clustering PMML model (KNIME) for the Iris data from the DMG site, I got this exception:
Exception in thread "main" java.lang.NullPointerException
at org.jpmml.evaluator.BatchUtil.formatRecords(BatchUtil.java:190)
at org.jpmml.evaluator.EvaluationExample.execute(EvaluationExample.java:295)
at org.jpmml.evaluator.Example.execute(Example.java:60)
at org.jpmml.evaluator.EvaluationExample.main(EvaluationExample.java:127)

I used the following command line for PMML execution in my local command prompt:
java -cp target/example-1.2-SNAPSHOT.jar org.jpmml.evaluator.EvaluationExample --model model.pmml --input input.tsv --output output.tsv

Please help me.

Impose soft limit on the maximum number of input fields

People are working with models that specify tens to hundreds of THOUSANDS of input fields:

Evaluator evaluator = ...;
List<InputField> inputFields = evaluator.getInputFields();
System.out.println(inputFields.size()); // Prints 100'000

For example: http://stats.stackexchange.com/questions/152891/bad-performance-of-pmml-evaluator and http://stackoverflow.com/questions/42074491/evaluate-method-takes-long-time-pmml-models-using-jpmml

Understandably, such "structurally valid but conceptually/functionally invalid" models cannot be made to perform, neither by the JPMML-Evaluator library nor by any other PMML scoring engine.

By default, the JPMML-Evaluator library should simply refuse to deal with them:

if(inputFields.size() > 1000){
  throw new EvaluationException("The model specifies unreasonably large number of input fields, which is indicative of bad data science/engineering process. Please refactor the model");
}

However, the limit should be programmatically customizable. If people want to do stupid things, then they should have technical means to do so.
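
A customizable limit could look roughly like the sketch below. Everything in it is hypothetical: the InputFieldLimit class, its default of 1000 and the use of IllegalStateException are placeholders for illustration, whereas the library itself would presumably throw an EvaluationException as shown above.

```java
final class InputFieldLimit {

    // Hypothetical default, taken from the example in this issue
    static final int DEFAULT_MAX_INPUT_FIELDS = 1000;

    private final int maxInputFields;

    InputFieldLimit() {
        this(DEFAULT_MAX_INPUT_FIELDS);
    }

    // Callers who really need more fields can raise the limit explicitly
    InputFieldLimit(int maxInputFields) {
        if (maxInputFields < 0) {
            throw new IllegalArgumentException("Limit must not be negative");
        }
        this.maxInputFields = maxInputFields;
    }

    // Throws if the model declares more input fields than the configured limit
    void check(int inputFieldCount) {
        if (inputFieldCount > maxInputFields) {
            throw new IllegalStateException("The model specifies " + inputFieldCount
                + " input fields, which exceeds the limit of " + maxInputFields);
        }
    }
}
```

With this shape, new InputFieldLimit().check(inputFields.size()) enforces the default, and new InputFieldLimit(200000) opts out of it deliberately.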

Evaluate error

Sorry to trouble you again.
JPMML works well when I use LogisticRegression, but fails with other models such as RandomForest.
The model comes from sklearn, and I used your awesome tool sklearn2pmml.

The error is:

Exception in thread "main" org.jpmml.evaluator.EvaluationException
    at org.jpmml.evaluator.CategoricalValue.compareToString(CategoricalValue.java:39)
    at org.jpmml.evaluator.FieldValue.compareTo(FieldValue.java:139)
    at org.jpmml.evaluator.PredicateUtil.evaluateSimplePredicate(PredicateUtil.java:131)
    at org.jpmml.evaluator.PredicateUtil.evaluatePredicate(PredicateUtil.java:63)
    at org.jpmml.evaluator.PredicateUtil.evaluate(PredicateUtil.java:51)
    at org.jpmml.evaluator.tree.TreeModelEvaluator.evaluateNode(TreeModelEvaluator.java:201)
    at org.jpmml.evaluator.tree.TreeModelEvaluator.handleTrue(TreeModelEvaluator.java:218)
    at org.jpmml.evaluator.tree.TreeModelEvaluator.evaluateTree(TreeModelEvaluator.java:162)
    at org.jpmml.evaluator.tree.TreeModelEvaluator.evaluateClassification(TreeModelEvaluator.java:137)
    at org.jpmml.evaluator.tree.TreeModelEvaluator.evaluate(TreeModelEvaluator.java:106)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:407)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:240)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:207)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:185)
    at com.ctrip.hotelbi.jpmml.Score.gettingProbability(Score.java:32)
    at com.ctrip.hotelbi.jpmml.Score.gettingProbability(Score.java:53)
    at com.ctrip.hotelbi.jpmml.PMMLTest.main(PMMLTest.java:41)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

The code is:

public class Process {
    private String[] data;
    private Evaluator evaluator;

    public Process() {
    }

    public Process(String[] data, Evaluator evaluator) {
        this.data = data;
        this.evaluator = evaluator;
    }

    /**
     * Prepare test data
     * @return input data for prediction
     */
    public Map<FieldName, FieldValue> testData() {
        Map<FieldName, FieldValue> arguments = new LinkedHashMap<>();
        List<InputField> inputs = this.evaluator.getActiveFields();
        for (InputField input : inputs) {
            FieldName activeName = input.getName();
            int i = inputs.indexOf(input);
            FieldValue activeValue = null;
            try {
                if (input.getDataType().equals(DataType.DOUBLE)) {
                    activeValue = input.prepare(Double.parseDouble(this.data[i]));
                } else {
                    activeValue = FieldValueUtil.create(this.data[i]);
                }
            } catch (Exception e) {
                activeValue = FieldValueUtil.create(0.0);
                e.printStackTrace();
            }
            arguments.put(activeName, activeValue);
        }

        return arguments;
    }
}

public class Score extends Process{
    private String[] data;
    private Evaluator evaluator;

    public Score(String[] data, Evaluator evaluator) {
        super(data, evaluator);
    }

    /**
     * Predict all target label probabilities
     * @param evaluator pmml model
     * @return probability score of each label
     */
    public ArrayList<?> gettingProbability(Evaluator evaluator){
        Map<FieldName, FieldValue> testData = super.testData();

        ArrayList<Object> score = new ArrayList<>();

        System.out.println(testData.size());
        Map<FieldName,?> finalResults = evaluator.evaluate(testData);

        for(FieldName t : finalResults.keySet()){
            if (finalResults.get(t) instanceof Double) {
                score.add((Double) finalResults.get(t));
            }else{
                score.add(finalResults.get(t));
            }
        }
        return score;
    }

    /**
     * Predict target label probability
     * @param evaluator pmml model
     * @param targetLabelIndex the index of target label that you want to predict
     * @return probability score of each label
     */
    public Double gettingProbability(Evaluator evaluator,int targetLabelIndex){
        ArrayList<?> scoreArray = this.gettingProbability(evaluator);
        Double targetScore = (Double) scoreArray.get(targetLabelIndex);
        return targetScore;
    }
}

public class PMMLTest {
    public static void main(String[] args) throws IOException, JAXBException, SAXException {
        //Loading data
        CSVReader reader = new CSVReader(new FileReader("d:\\Users\\shuangyangwang\\Desktop\\JPMML\\Iris1.csv"));
        List<String[]> data = reader.readAll();
        data.remove(0);
        reader.close();

        //Loading model

        InputStream is = new FileInputStream("d:\\Users\\shuangyangwang\\Desktop\\Test\\ExtraTreesClassifier.pmml");
        PMML model = PMMLUtil.unmarshal(is);
        is.close();

        ModelEvaluatorFactory mef = ModelEvaluatorFactory.newInstance();
        ModelEvaluator<?> modelEvaluator = mef.newModelEvaluator(model);
        Evaluator evaluator = (Evaluator) modelEvaluator;
        evaluator.verify();

        //Predicting probability
        List<ArrayList<?>> listArray = new ArrayList<>();
        for (String[] s : data) {
//            PreprocessData ppd = new PreprocessData(s, evaluator);
//            Map<FieldName, FieldValue> testData = ppd.testData();
            Score scoreE = new Score(s, evaluator);
            //ArrayList<Double> result = (ArrayList<Double>) scoreE.gettingProbability(evaluator);
            Double score = scoreE.gettingProbability( evaluator ,1);
            System.out.println(score);
            //listArray.add(result);
        }

    }
}

I really don't know what is wrong with this. Please give me some suggestions.
Thank you very much.

Getting wrong results for my data using SVM PMML model

Hi,
Using JPMML-Evaluator to execute an SVM PMML model works fine for the Audit data set, but for my own data I am getting wrong results. The data set has 4 fields, one of which is the target field, which contains three categories.
I used the following command line for execution in my console:

java -cp target/example-1.2-SNAPSHOT.jar org.jpmml.evaluator.EvaluationExample --model model.pmml --input input.tsv --output output.tsv

Please help me with the above query.
Thanks in advance.

Add a changelog

Hello,

could you please add (and maintain) a changelog?

Cheers,
Thomas

Generalized Regression Model: Output not a valid pmf (probability mass function)

Hello,

My issue pertains to GeneralRegressionModelEvaluator.java and specifically to Generalized Linear Model.

If we take into consideration a two-class (here, +1 and -1) classification problem, then the generalized linear model would estimate the Pr(class = class1) and Pr(class = class2) for any data point given its feature vector. This is done by modeling the distributions with a logit function. Since we're estimating a pmf, we will have Pr(class = class1) + Pr(class = class2) = 1.

If we look at the loop starting at line 337, it is basically supposed to do the same thing: iterate over the different classes/categories and compute the probability of each. Everything goes well for class1, but when the code does the computation for class2 (which is the last category), it assigns value = 0 in line 417 and passes that through the logit function. This will always give the probability of the last category as 0.5, no matter how many categories there are.

For a two-category problem, say the probability we compute in the first iteration of the for loop starting at line 337 for category 1 is value1, then the probability of the other class should be simply (1 - value1). This is not achieved by the code. In fact it would always assign the probability for the last category to be equal to 0.5.

If I'm right, a quick fix could be that for the last category, the probability should be just 1 - sum(all the other probabilities).

Thanks
Akshay
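
For comparison, the textbook multinomial logit with a reference category can be sketched as below. This illustrates the math only and is not a description of what GeneralRegressionModelEvaluator does: the hypothetical probabilities method takes the linear predictors of the non-reference categories, fixes the linear predictor of the last (reference) category at 0, and normalizes the exponentials so that the result sums to 1. In the two-category case this yields exactly 1 - value1 for the last category.

```java
final class MultinomialLogit {

    private MultinomialLogit() {
    }

    // linearPredictors holds eta for every category except the last one,
    // whose eta is fixed at 0 (the reference category)
    static double[] probabilities(double[] linearPredictors) {
        int n = linearPredictors.length + 1;
        double[] result = new double[n];
        double sum = 0d;
        for (int i = 0; i < n; i++) {
            double eta = (i < n - 1) ? linearPredictors[i] : 0d;
            result[i] = Math.exp(eta);
            sum += result[i];
        }
        // Normalize so that the values form a valid probability mass function
        for (int i = 0; i < n; i++) {
            result[i] /= sum;
        }
        return result;
    }
}
```

The key point is the shared normalization step: a linear predictor of 0 for the last category only gives 0.5 if it is passed through the inverse logit on its own.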

InvalidFeatureException: MiningField

Hi Villu,
I have also attached my file.

I am generating PMML for a NeuralNetwork, but when I use the evaluator it keeps throwing this exception:

org.jpmml.evaluator.InvalidFeatureException: MiningField
	at org.jpmml.evaluator.IndexableUtil.buildMap(IndexableUtil.java:72)
	at org.jpmml.evaluator.IndexableUtil.buildMap(IndexableUtil.java:61)
	at org.jpmml.evaluator.ModelEvaluator$4.load(ModelEvaluator.java:688)
	at org.jpmml.evaluator.ModelEvaluator$4.load(ModelEvaluator.java:684)
	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3628)
	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2336)
	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2295)
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2208)
	at com.google.common.cache.LocalCache.get(LocalCache.java:4053)
	at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4057)
	at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4986)
	at org.jpmml.evaluator.CacheUtil.getValue(CacheUtil.java:51)
	at org.jpmml.evaluator.ModelEvaluator.<init>(ModelEvaluator.java:128)
	at org.jpmml.evaluator.neural_network.NeuralNetworkEvaluator.<init>(NeuralNetworkEvaluator.java:90)
	at org.jpmml.evaluator.neural_network.NeuralNetworkEvaluator.<init>(NeuralNetworkEvaluator.java:86)
	at com.baesystems.ai.analytics.smile.pmml.NeuralNetworkPMMLTest.createEvaluator(NeuralNetworkPMMLTest.java:130)
	at com.baesystems.ai.analytics.smile.pmml.NeuralNetworkPMMLTest.testLeastMeanSqaures(NeuralNetworkPMMLTest.java:123)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)

InvalidFeatureException from spark context

Hi,

When I try to use the evaluator from a spark context, it will not create the model manager because of pmml validation problems.

Exception in thread "main" org.jpmml.evaluator.InvalidFeatureException (at or around line 8): DataDictionary
        at org.jpmml.evaluator.CacheUtil.getValue(CacheUtil.java:58)
        at org.jpmml.evaluator.ModelEvaluator.<init>(ModelEvaluator.java:113)
        at org.jpmml.evaluator.TreeModelEvaluator.<init>(TreeModelEvaluator.java:54)
        at org.jpmml.evaluator.ModelEvaluatorFactory.newModelManager(ModelEvaluatorFactory.java:101)
        at org.jpmml.evaluator.ModelEvaluatorFactory.newModelManager(ModelEvaluatorFactory.java:45)
        at org.jpmml.evaluator.ModelManagerFactory.newModelManager(ModelManagerFactory.java:66)
        at org.jpmml.evaluator.ModelManagerFactory.newModelManager(ModelManagerFactory.java:46)
        at com.example.Main.main(Main.java:23)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassCastException: org.dmg.pmml.DataField cannot be cast to org.dmg.pmml.Indexable
        at org.jpmml.evaluator.IndexableUtil.ensureKey(IndexableUtil.java:78)
        at org.jpmml.evaluator.IndexableUtil.buildMap(IndexableUtil.java:64)
        at org.jpmml.evaluator.ModelEvaluator$1.load(ModelEvaluator.java:538)
        at org.jpmml.evaluator.ModelEvaluator$1.load(ModelEvaluator.java:534)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
        at org.jpmml.evaluator.CacheUtil.getValue(CacheUtil.java:50)
        ... 16 more

Here is my java class I am submitting:

package com.example;

import org.dmg.pmml.PMML;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.ModelEvaluatorFactory;
import org.jpmml.model.ImportFilter;
import org.jpmml.model.JAXBUtil;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.bind.JAXBException;
import javax.xml.transform.Source;

public final class Main {
    public static void main(String[] args) {
        System.out.println("hello world");

        try {
            Source transformedSource = ImportFilter.apply(new InputSource(Main.class.getResourceAsStream("/DecisionTreeIris.pmml")));
            PMML pmml = JAXBUtil.unmarshalPMML(transformedSource);
            ModelEvaluatorFactory modelEvaluatorFactory = ModelEvaluatorFactory.newInstance();
            Evaluator evaluator = modelEvaluatorFactory.newModelManager(pmml);
            evaluator.verify();
        } catch (SAXException | JAXBException e) {
            // could not parse pmml as xml
            throw new RuntimeException(e);
        }
    }
}

Submitting with spark-submit --class com.example.Main /path/to/example-assembly.jar.

It does not throw the error when I run the assembled jar like java -jar /path/to/example-assembly.jar.

DecisionTreeIris.pmml is from here.

Thanks for the project. Any help is appreciated.

Performance issues with 'at org.jpmml.evaluator.ModelEvaluator.getInputFields(ModelEvaluator.java:207)' for multiple threads #2

With around 900 input fields of type double in my model, most of the threads waste time (28% of the execution time) in the method 'org.jpmml.evaluator.ModelEvaluator.getInputFields(ModelEvaluator.java:207)', which is called on every evaluation in every thread.

The same method is called while creating the arguments in each thread. I introduced a shared inputFields list there, which solved that part of the issue, but evaluate still calls the method and performance suffers.

Can we pass inputFields into the evaluate method along with the arguments? This could save 28% of the execution time, although it would require changing the arguments everywhere.

Thread Dump:

    at org.jpmml.evaluator.ModelEvaluator.createInputFields(ModelEvaluator.java:397)
    at org.jpmml.evaluator.ModelEvaluator.getInputFields(ModelEvaluator.java:207)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.createSegmentHandler(MiningModelEvaluator.java:600)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:367)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:240)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:207)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:185)
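
The caller-side workaround mentioned above (building the list once and sharing it across threads) can be sketched generically. The Memoized class is hypothetical and knows nothing about JPMML; it wraps any expensive Supplier, for example a lambda that calls evaluator.getInputFields(), and computes the value at most once even under concurrent access. It does not address the call made inside evaluate, which is the subject of this issue.

```java
import java.util.function.Supplier;

final class Memoized<T> implements Supplier<T> {

    private final Supplier<T> delegate;

    // volatile so that a value published by one thread is visible to all others
    private volatile T value;

    Memoized(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    // Double-checked locking; assumes the delegate never returns null
    @Override
    public T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    result = delegate.get();
                    value = result;
                }
            }
        }
        return result;
    }
}
```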

Subclasses of org.jpmml.evaluator.Computable throw EvaluationException if result is null

Why do subclasses of Computable like InstanceClassificationMap throw an EvaluationException if the getResult is called and result is null (e.g: https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/main/java/org/jpmml/evaluator/InstanceClassificationMap.java)? Wouldn't it be more correct to leave the interpretation of an expected/unexpected "null" result up to the caller?

The method FieldValueUtil#getStatus(DataField, MiningField, Object) could return `VALID` even if the argument is invalid

https://groups.google.com/forum/#!topic/jpmml/3PAKCCXfil4

Lost task 0.0 in stage 1.0 (TID 2, byd0158): org.jpmml.evaluator.InvalidFeatureException (at or around line 5759): Target

16/12/13 14:58:12 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 2, byd0158): org.jpmml.evaluator.InvalidFeatureException (at or around line 5759): Target
at org.jpmml.evaluator.IndexableUtil.ensureKey(IndexableUtil.java:81)
at org.jpmml.evaluator.IndexableUtil.buildMap(IndexableUtil.java:64)
at org.jpmml.evaluator.ModelEvaluator$7.load(ModelEvaluator.java:586)
at org.jpmml.evaluator.ModelEvaluator$7.load(ModelEvaluator.java:582)
at com.shaded.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
at com.shaded.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323)
at com.shaded.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
at com.shaded.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
at com.shaded.google.common.cache.LocalCache.get(LocalCache.java:3953)
at com.shaded.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
at com.shaded.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
at org.jpmml.evaluator.CacheUtil.getValue(CacheUtil.java:50)
at org.jpmml.evaluator.ModelEvaluator.<init>(ModelEvaluator.java:139)
at org.jpmml.evaluator.MiningModelEvaluator.<init>(MiningModelEvaluator.java:79)
at org.jpmml.evaluator.ModelEvaluatorFactory.newModelManager(ModelEvaluatorFactory.java:66)
at org.jpmml.evaluator.MiningModelEvaluator.createSegmentHandler(MiningModelEvaluator.java:559)
at org.jpmml.evaluator.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:355)
at org.jpmml.evaluator.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:223)
at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:190)
at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:167)
at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:162)
at org.jpmml.spark.PMMLTransformer$2.apply(PMMLTransformer.java:128)
at org.jpmml.spark.PMMLTransformer$2.apply(PMMLTransformer.java:113)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.evalExpr2$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

JPMML Compilation Error

Description - Duplicate methods named spliterator with the parameters () and () are inherited from the types Collection and Iterable
Resource - SparseArrayUtil.java
Path - /pmml-evaluator/src/main/java/org/jpmml/evaluator

While compiling the project in Eclipse, I get the above error. I downloaded the latest code yesterday.

Logistic regression fails in 1.1.7

In 1.1.7, when we try to consume a logistic regression under RegressionModel, we encounter the error message below.

We also tried linear regression and regression with more than two categories, and both work fine. We also tried switching back to 1.1.3, where the logistic regression works fine as well.

Exception in thread "main" org.jpmml.manager.InvalidFeatureException (at or around line 33): RegressionModel
at org.jpmml.evaluator.RegressionModelEvaluator.evaluateClassification(RegressionModelEvaluator.java:130)
at org.jpmml.evaluator.RegressionModelEvaluator.evaluate(RegressionModelEvaluator.java:71)
at org.jpmml.evaluator.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:425)
at org.jpmml.evaluator.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:211)
at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:108)
at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:86)
at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:68)
at org.jpmml.evaluator.CsvEvaluationExample.evaluateAll(CsvEvaluationExample.java:226)
at org.jpmml.evaluator.CsvEvaluationExample.execute(CsvEvaluationExample.java:97)
at org.jpmml.evaluator.Example.execute(Example.java:45)
at org.jpmml.evaluator.CsvEvaluationExample.main(CsvEvaluationExample.java:72)

different results in evaluation

I get different evaluation results from predict in R than from the published PMML model via jpmml-xgboost and Openscoring.

Would you be interested in a sample data set and the R code?

Null in Result

Hi villu,

ProbabilityDistribution prob = (ProbabilityDistribution) results.get(evaluator.getTargetField().getName());

This returns null for me. I don't know what is going on.
I have tried to match my PMML with the example that you showed me, but it still fails.
Can you please look into it and guide me?

Thanks

Boolean input variables not recognised by evaluator function.

I have the following R code for generating 2 csv and 2 pmml files based on the iris dataset:

data(iris)
library(pmml)

# build a model for Sepal.Length based on remaining variables
model.glm <- glm(Sepal.Length ~ ., data=iris)
saveXML(pmml(model.glm), "iris.glm.pmml")

# write csv file for testing
write.csv(iris, 'iris.csv', quote=FALSE, row.names=FALSE)

# set remaining variables to booleans
iris$Sepal.Width <- as.logical(iris$Sepal.Width > 3)
iris$Petal.Length <- as.logical(iris$Petal.Length > 4)
iris$Petal.Width <- as.logical(iris$Petal.Width > 1)
iris$Species   <- as.logical(iris$Species=='setosa')

# rebuild model for Sepal.Length
model.glm <- glm(Sepal.Length ~ ., data=iris)
saveXML(pmml(model.glm), "iris.glm.bool.pmml")

# write csv file for testing
write.csv(iris, 'iris.bool.csv', quote=FALSE, row.names=FALSE)

The problem becomes apparent when doing predictions. The files iris.csv and iris.glm.pmml produce the desired output. The files iris.bool.csv and iris.glm.bool.pmml produce the same value for every record, regardless of the input data.

Incompatible Google Guava library dependency

When running the following code:
ModelEvaluator<RegressionModel> modelEvaluator = new RegressionModelEvaluator(model);
Evaluator evaluator = (Evaluator) modelEvaluator;

I got an error like this:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.from(Lcom/google/common/cache/CacheBuilderSpec;)Lcom/google/common/cache/CacheBuilder;
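
A NoSuchMethodError of this kind usually means that a different version of the class was loaded at run time than the one the code was compiled against. One way to check is to print where a class was loaded from. The ClassOrigin helper below is a hypothetical diagnostic sketch using only the standard library; calling it with "com.google.common.cache.CacheBuilder" on the affected classpath would show which JAR supplies Guava.

```java
import java.security.CodeSource;

final class ClassOrigin {

    private ClassOrigin() {
    }

    // Describes the location (typically a JAR URL) that a class was loaded from
    static String describe(String className) {
        Class<?> clazz;
        try {
            // initialize=false: only locate the class, do not run its static initializers
            clazz = Class.forName(className, false, ClassOrigin.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            return className + " is not on the classpath";
        }
        CodeSource codeSource = clazz.getProtectionDomain().getCodeSource();
        if (codeSource == null || codeSource.getLocation() == null) {
            // Classes of the Java runtime itself typically have no code source
            return className + " has no code source (runtime class)";
        }
        return className + " was loaded from " + codeSource.getLocation();
    }
}
```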

Maven pulling snapshot dependencies

Hi, I'm using this library as a dependency in a maven project.

            <dependency>
                <groupId>org.jpmml</groupId>
                <artifactId>pmml-evaluator</artifactId>
                <version>1.3.3</version>
            </dependency>

When I compile the project, I get

[WARNING] The POM for com.google.guava:guava:jar:19.0-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for org.apache.commons:commons-math3:jar:3.5-SNAPSHOT is missing, no dependency information available

It looks like this is due to the version 'constraint' of guava for example <version>[14.0, 19.0]</version>.

The documentation regarding version ranges says that snapshots may be taken into account when resolving them. https://docs.oracle.com/middleware/1221/core/MAVEN/maven_version.htm#MAVEN8903

Why do you use version constraints instead of just picking one version? And is there a way to get rid of this SNAPSHOT resolution?

org.jpmml.evaluator.EvaluationException

I am working on a RulesInduction model, and JPMML keeps complaining about the file whose contents I have copied into this issue. I cannot figure out what the problem is. Please help me with it.

Exception:

org.jpmml.evaluator.EvaluationException
    at org.jpmml.evaluator.CategoricalValue.compareToString(CategoricalValue.java:39)
    at org.jpmml.evaluator.FieldValue.compareTo(FieldValue.java:143)
    at org.jpmml.evaluator.PredicateUtil.evaluateSimplePredicate(PredicateUtil.java:131)
    at org.jpmml.evaluator.PredicateUtil.evaluatePredicate(PredicateUtil.java:63)
    at org.jpmml.evaluator.PredicateUtil.evaluate(PredicateUtil.java:51)
    at org.jpmml.evaluator.PredicateUtil.evaluateCompoundPredicateInternal(PredicateUtil.java:200)
    at org.jpmml.evaluator.PredicateUtil.evaluateCompoundPredicate(PredicateUtil.java:168)
    at org.jpmml.evaluator.PredicateUtil.evaluatePredicate(PredicateUtil.java:71)
    at org.jpmml.evaluator.PredicateUtil.evaluate(PredicateUtil.java:51)
    at org.jpmml.evaluator.RuleSetModelEvaluator.evaluateRule(RuleSetModelEvaluator.java:190)
    at org.jpmml.evaluator.RuleSetModelEvaluator.evaluateRules(RuleSetModelEvaluator.java:216)
    at org.jpmml.evaluator.RuleSetModelEvaluator.evaluateClassification(RuleSetModelEvaluator.java:109)
    at org.jpmml.evaluator.RuleSetModelEvaluator.evaluate(RuleSetModelEvaluator.java:84)
    at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:406)
    at com.norkom.blake.pmml.IrepRuleTest.makePredictions(IrepRuleTest.java:177)
    at com.norkom.blake.pmml.IrepRuleTest.testRules(IrepRuleTest.java:106)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at junit.framework.TestCase.runTest(TestCase.java:176)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
    at junit.framework.TestSuite.runTest(TestSuite.java:255)
    at junit.framework.TestSuite.run(TestSuite.java:250)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)

PMML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_2">
    <DataDictionary numberOfFields="13">
        <DataField name="TransactionType" optype="categorical" dataType="string">
            <Value value="ATM"/>
            <Value value="Point of Sale"/>
            <Value value="Point of Sale BGC"/>
            <Value value="Term Deposit Post Office"/>
        </DataField>
        <DataField name="Amount" optype="continuous" dataType="double"/>
        <DataField name="CreditOrDebit" optype="categorical" dataType="string">
            <Value value="Credit"/>
            <Value value="Debit"/>
        </DataField>
        <DataField name="Currency" optype="categorical" dataType="string">
            <Value value="DOLLAR"/>
            <Value value="EUR"/>
        </DataField>
        <DataField name="POSAmount3Days.acc.day.present" optype="continuous" dataType="double"/>
        <DataField name="POSAmount3Days.acc.day.total" optype="continuous" dataType="double"/>
        <DataField name="POSAmount4hr.acc.hour4" optype="continuous" dataType="double"/>
        <DataField name="POSAmount60Mins.acc.minute60" optype="continuous" dataType="double"/>
        <DataField name="POSCount3Days.cnt.day.present" optype="continuous" dataType="double"/>
        <DataField name="POSCount3Days.cnt.day.total" optype="continuous" dataType="double"/>
        <DataField name="POSCount4hr.cnt.hour4" optype="continuous" dataType="double"/>
        <DataField name="POSCount60Mins.cnt.minute60" optype="continuous" dataType="double"/>
        <DataField name="Fraud" optype="categorical" dataType="double">
            <Value value="0.0"/>
            <Value value="1.0"/>
        </DataField>
    </DataDictionary>
    <RuleSetModel modelName="RulesSetModel" functionName="classification">
        <MiningSchema>
            <MiningField name="TransactionType"/>
            <MiningField name="Amount"/>
            <MiningField name="CreditOrDebit"/>
            <MiningField name="Currency"/>
            <MiningField name="POSAmount3Days.acc.day.present"/>
            <MiningField name="POSAmount3Days.acc.day.total"/>
            <MiningField name="POSAmount4hr.acc.hour4"/>
            <MiningField name="POSAmount60Mins.acc.minute60"/>
            <MiningField name="POSCount3Days.cnt.day.present"/>
            <MiningField name="POSCount3Days.cnt.day.total"/>
            <MiningField name="POSCount4hr.cnt.hour4"/>
            <MiningField name="POSCount60Mins.cnt.minute60"/>
            <MiningField name="Fraud" usageType="target"/>
        </MiningSchema>
        <RuleSet recordCount="5152.0" nbCorrect="5033.0" defaultScore="0" defaultConfidence="0.0">
            <RuleSelectionMethod criterion="firstHit"/>
            <SimpleRule id="Rule0" score="1.0" recordCount="95.0" nbCorrect="89.0" confidence="0.9325842696629213">
                <CompoundPredicate booleanOperator="and">
                    <SimplePredicate field="POSAmount4hr.acc.hour4" operator="greaterOrEqual" value="104.1"/>
                    <SimplePredicate field="POSAmount4hr.acc.hour4" operator="greaterOrEqual" value="182.63"/>
                </CompoundPredicate>
                <ScoreDistribution value="0.0" recordCount="6.0"/>
                <ScoreDistribution value="1.0" recordCount="89.0"/>
            </SimpleRule>
            <SimpleRule id="Rule1" score="1.0" recordCount="8.0" nbCorrect="8.0" confidence="1.0">
                <CompoundPredicate booleanOperator="and">
                    <SimplePredicate field="POSAmount4hr.acc.hour4" operator="greaterOrEqual" value="80.0"/>
                    <SimplePredicate field="Amount" operator="greaterOrEqual" value="104.1"/>
                    <SimplePredicate field="Amount" operator="lessOrEqual" value="104.16"/>
                </CompoundPredicate>
                <ScoreDistribution value="0.0" recordCount="0.0"/>
                <ScoreDistribution value="1.0" recordCount="8.0"/>
            </SimpleRule>
            <SimpleRule id="Rule2" score="1.0" recordCount="16.0" nbCorrect="13.0" confidence="0.7692307692307693">
                <CompoundPredicate booleanOperator="and">
                    <SimplePredicate field="POSAmount60Mins.acc.minute60" operator="greaterOrEqual" value="37.64"/>
                    <SimplePredicate field="POSAmount3Days.acc.day.present" operator="greaterOrEqual" value="148.57"/>
                    <SimplePredicate field="TransactionType" operator="greaterOrEqual" value="13.0"/>
                    <SimplePredicate field="POSAmount3Days.acc.day.present" operator="greaterOrEqual" value="261.19"/>
                </CompoundPredicate>
                <ScoreDistribution value="0.0" recordCount="3.0"/>
                <ScoreDistribution value="1.0" recordCount="13.0"/>
            </SimpleRule>
            <SimpleRule id="Rule3" score="1.0" recordCount="8.0" nbCorrect="6.0" confidence="0.6666666666666666">
                <CompoundPredicate booleanOperator="and">
                    <SimplePredicate field="POSAmount4hr.acc.hour4" operator="greaterOrEqual" value="90.95"/>
                    <SimplePredicate field="Amount" operator="greaterOrEqual" value="147.57"/>
                </CompoundPredicate>
                <ScoreDistribution value="0.0" recordCount="2.0"/>
                <ScoreDistribution value="1.0" recordCount="6.0"/>
            </SimpleRule>
            <SimpleRule id="Rule4" score="1.0" recordCount="4.0" nbCorrect="3.0" confidence="0.6666666666666666">
                <CompoundPredicate booleanOperator="and">
                    <SimplePredicate field="POSAmount4hr.acc.hour4" operator="greaterOrEqual" value="90.95"/>
                    <SimplePredicate field="POSAmount60Mins.acc.minute60" operator="lessOrEqual" value="90.95"/>
                </CompoundPredicate>
                <ScoreDistribution value="0.0" recordCount="1.0"/>
                <ScoreDistribution value="1.0" recordCount="3.0"/>
            </SimpleRule>
        </RuleSet>
    </RuleSetModel>
</PMML>
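
One possible lead, offered as a guess rather than a confirmed diagnosis: the stack trace passes through CategoricalValue.compareToString, and Rule2 in the file above applies greaterOrEqual to TransactionType, which the DataDictionary declares as a categorical string. The sketch below uses only the JDK's DOM parser to list predicates of that kind. OrderedPredicateCheck and findSuspectFields are hypothetical names and are not part of JPMML-Evaluator.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for inspecting a PMML file; not a JPMML-Evaluator class.
final class OrderedPredicateCheck {

    private static final Set<String> ORDER_OPERATORS = new HashSet<>(
        Arrays.asList("lessThan", "lessOrEqual", "greaterThan", "greaterOrEqual"));

    /** Returns the names of categorical fields that a SimplePredicate order-compares. */
    static List<String> findSuspectFields(String pmmlXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(pmmlXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse PMML", e);
        }

        // Collect the declared optype of every DataField.
        Map<String, String> optypes = new HashMap<>();
        NodeList fields = doc.getElementsByTagName("DataField");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            optypes.put(field.getAttribute("name"), field.getAttribute("optype"));
        }

        // Flag predicates that use an ordering operator on a categorical field.
        List<String> suspects = new ArrayList<>();
        NodeList predicates = doc.getElementsByTagName("SimplePredicate");
        for (int i = 0; i < predicates.getLength(); i++) {
            Element predicate = (Element) predicates.item(i);
            String name = predicate.getAttribute("field");
            if (ORDER_OPERATORS.contains(predicate.getAttribute("operator"))
                    && "categorical".equals(optypes.get(name))) {
                suspects.add(name);
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        // Reduced stand-in for the file above: one categorical and one continuous field.
        String xml = "<PMML><DataDictionary>"
            + "<DataField name=\"TransactionType\" optype=\"categorical\" dataType=\"string\"/>"
            + "<DataField name=\"Amount\" optype=\"continuous\" dataType=\"double\"/>"
            + "</DataDictionary>"
            + "<SimplePredicate field=\"TransactionType\" operator=\"greaterOrEqual\" value=\"13.0\"/>"
            + "<SimplePredicate field=\"Amount\" operator=\"greaterOrEqual\" value=\"104.1\"/>"
            + "</PMML>";
        System.out.println(findSuspectFields(xml));
    }
}
```

If the check reports a field, the rule generator that produced the file is the place to look, since an ordering comparison against a string category has no defined result.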

FunctionUtil.evaluate order

We have an issue in production where we have a DefineFunction in the PMML.

Looking at FunctionUtil.evaluate, it first uses reflection to look for a user-defined function before trying the one defined in the PMML. The problem is that reflection is a bit too slow for us in production. It would be great to either have some way to supply our own FunctionRegistry or to change the order in which functions are resolved in FunctionUtil.
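
To illustrate the kind of workaround being asked for, here is a minimal sketch that memoizes a slow lookup so that it runs once per function name, including for names that are not found. CachingResolver is a hypothetical class written for this illustration; it does not reflect how FunctionUtil or FunctionRegistry are actually implemented.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch; the names do not correspond to JPMML-Evaluator classes.
final class CachingResolver<T> {

    // Optional lets the cache remember misses, which ConcurrentHashMap cannot store as null.
    private final Map<String, Optional<T>> cache = new ConcurrentHashMap<>();
    private final Function<String, T> slowLookup;

    CachingResolver(Function<String, T> slowLookup) {
        this.slowLookup = slowLookup;
    }

    /** Returns the cached result for the name, running the slow lookup only on first use. */
    T resolve(String name) {
        return cache
            .computeIfAbsent(name, key -> Optional.ofNullable(slowLookup.apply(key)))
            .orElse(null);
    }

    public static void main(String[] args) {
        // Stand-in for a reflective lookup of a Java-backed function class.
        CachingResolver<Class<?>> resolver = new CachingResolver<>(name -> {
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException e) {
                return null; // not a Java class; a caller would fall back to the PMML definition
            }
        });
        System.out.println(resolver.resolve("java.lang.String"));
        System.out.println(resolver.resolve("com.example.NoSuchFunction"));
    }
}
```

Caching misses matters here: a function that is only defined in the PMML would otherwise pay the reflection cost on every evaluation.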

MiningModelEvaluator multipleModelMethod with weightedAverage not support probability OutputField feature

My scenario is this: I trained a random forest model and exported it as a PMML file.
I use multipleModelMethod=weightedAverage and want to output the multi-class probabilities of the label.
The PMML file looks like this:

Then it throws an exception:

org.jpmml.evaluator.TypeCheckException: Expected org.jpmml.evaluator.HasProbability, but got org.jpmml.evaluator.ClassificationMap ({0=0.6526508348685987, 1=0.3473491651314011})
at org.jpmml.evaluator.OutputUtil.asResultFeature(OutputUtil.java:848)
at org.jpmml.evaluator.OutputUtil.getProbability(OutputUtil.java:478)
at org.jpmml.evaluator.OutputUtil.evaluate(OutputUtil.java:182)
at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:117)
at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:85)
at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:79)
at com.alipay.mymdp.model.component.impl.pmml.engine.PmmlComponentEngine.execute(PmmlComponentEngine.java:49)
at com.alipay.mymdp.model.component.impl.pmml.engine.PmmlComponentEngine.executePmmlComponentEngine(PmmlComponentEngine.java:35)
at com.alipay.mymdp.model.component.impl.pmml.engine.TestPmmlComponentEngine.testRF2PmmlCom

Question: multipleModelMethod="max" when multiple classes have max

Hi @vruusmann,

Looking through the implementation of multipleModelMethod="max" for classification, particularly: https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/main/java/org/jpmml/evaluator/ProbabilityAggregator.java#L207

Suppose we have a case with three segments that are predicting three classes and we have the following probabilities:

{a: 0.8, b: 0.1, c: 0.1},
{a: 0.5, b: 0.1, c: 0.4},
{a: 0.1, b: 0.1, c: 0.8}

Then using the max I would expect the average of the first and third model:

{a: 0.45, b: 0.1, c: 0.45}

Is that your interpretation of the spec? max: consider the model(s) that have contributed the chosen probability for the winning category. Return their average probabilities;
Will the implementation linked to above return that?
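
For reference, the interpretation described above can be written out as a short sketch. It encodes only the reading given in this question (find the largest probability reported by any segment, keep the segments that reported it, average their distributions) and makes no claim about what ProbabilityAggregator does. MaxAggregationSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the interpretation in the question; not the JPMML-Evaluator implementation.
final class MaxAggregationSketch {

    static Map<String, Double> aggregate(List<Map<String, Double>> segments) {
        // Step 1: the winning probability is the largest value reported by any segment.
        double winning = Double.NEGATIVE_INFINITY;
        for (Map<String, Double> segment : segments) {
            for (double probability : segment.values()) {
                winning = Math.max(winning, probability);
            }
        }

        // Step 2: keep every segment that contributed that probability.
        // A tie across categories (a and c in the example) keeps both segments.
        List<Map<String, Double>> contributors = new ArrayList<>();
        for (Map<String, Double> segment : segments) {
            if (segment.containsValue(winning)) {
                contributors.add(segment);
            }
        }

        // Step 3: average the contributors' distributions category by category.
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map<String, Double> contributor : contributors) {
            for (Map.Entry<String, Double> entry : contributor.entrySet()) {
                result.merge(entry.getKey(), entry.getValue() / contributors.size(), Double::sum);
            }
        }
        return result;
    }

    /** Convenience builder for a three-class distribution over a, b and c. */
    static Map<String, Double> dist(double a, double b, double c) {
        Map<String, Double> distribution = new LinkedHashMap<>();
        distribution.put("a", a);
        distribution.put("b", b);
        distribution.put("c", c);
        return distribution;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> segments = Arrays.asList(
            dist(0.8, 0.1, 0.1),
            dist(0.5, 0.1, 0.4),
            dist(0.1, 0.1, 0.8));
        System.out.println(aggregate(segments));
    }
}
```

Note that the example data contains a tie: both a and c reach 0.8, so "the winning category" is itself ambiguous, and the sketch resolves that by keeping both contributing segments.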

installation

I realise this might sound like a stupid question, but I am not used to Java or mvn. I've already spent more than an hour trying to install the evaluator. I first tried mvn get with the central repository, then git clone and mvn build; neither has got me anywhere. Please advise, highly appreciated.

First approach:

mvn org.apache.maven.plugins:maven-dependency-plugin:2.8:get -Dartifact=org.jpmml:pmml-evaluator:1.2.5:jar -DoutputDirectory=.
I've tried a lot of variants of this command, searching around and looking over tutorials, but I am still not sure where to go from here.
When I try
java -jar target/pmml-evaluator-1.2.5-sources.jar
it tells me about a missing manifest file. I've tried including this in the pom file provided in the central repository, along with the option -DpomFile=pom.xml, but it complains about the execution ids.

Second approach:

git clone https://github.com/jpmml/jpmml-evaluator
cd jpmml-evaluator/
mvn build pom.xml

[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.jpmml:pmml-evaluator:jar:1.2-SNAPSHOT
[WARNING] 'parent.relativePath' of POM org.jpmml:jpmml-evaluator:1.2-SNAPSHOT (/Users/<>/target/jpmml-evaluator/pom.xml) points at org.jpmml:pmml-evaluator instead of org.sonatype.oss:oss-parent, please verify your project structure @ org.jpmml:jpmml-evaluator:1.2-SNAPSHOT, /Users/benjamin/target/jpmml-evaluator/pom.xml, line 5, column 10
...
[much more of this]
...
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] JPMML-Evaluator
[INFO] JPMML evaluator
[INFO] JPMML evaluator example
[INFO] JPMML KNIME integration tests
[INFO] JPMML RapidMiner integration tests
[INFO] JPMML R/Rattle integration tests
[INFO] JPMML evaluator code coverage
[INFO] JPMML extension
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building JPMML-Evaluator 1.2-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] JPMML-Evaluator .................................... FAILURE [ 0.388 s]
[INFO] JPMML evaluator .................................... SKIPPED
[INFO] JPMML evaluator example ............................ SKIPPED
[INFO] JPMML KNIME integration tests ...................... SKIPPED
[INFO] JPMML RapidMiner integration tests ................. SKIPPED
[INFO] JPMML R/Rattle integration tests ................... SKIPPED
[INFO] JPMML evaluator code coverage ...................... SKIPPED
[INFO] JPMML extension .................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.158 s
[INFO] Finished at: 2015-10-05T16:24:08+01:00
[INFO] Final Memory: 5M/65M
[INFO] ------------------------------------------------------------------------
[ERROR] Unknown lifecycle phase "pom.xml". You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, pre-site, site, post-site, site-deploy. -> [Help 1]
...

Attempt to invoke virtual method 'boolean org.dmg.pmml.PMML.hasModels()' on a null object reference

I am currently attempting to evaluate a .pmml model created with sklearn2pmml. However, whenever I run the code ModelEvaluator<NearestNeighborModel> modelEvaluator = new NearestNeighborModelEvaluator(pmml);, I get the following error:

java.lang.NullPointerException: Attempt to invoke virtual method 'boolean org.dmg.pmml.PMML.hasModels()' on a null object reference
  at org.jpmml.evaluator.ModelEvaluator.selectModel(ModelEvaluator.java:584)
  at org.jpmml.evaluator.nearest_neighbor.NearestNeighborModelEvaluator.<init>(NearestNeighborModelEvaluator.java:105)
  at com.mygdx.game.DrawView.pitchAngle(DrawView.java:295)
  at com.mygdx.game.StartGdxGame.render(StartGdxGame.java:113)
  at com.badlogic.gdx.backends.android.AndroidGraphics.onDrawFrame(AndroidGraphics.java:459)
  at android.opengl.GLSurfaceView$GLThread.guardedRun(GLSurfaceView.java:1522)
  at android.opengl.GLSurfaceView$GLThread.run(GLSurfaceView.java:1239)

I have double-checked that the code can find and access the .pmml file, and a model does exist in the file in the form <NearestNeighborModel functionName="regression" numberOfNeighbors="400" continuousScoringMethod="average">.

Is there any other reason for the error? Did I maybe compile the .pmml incorrectly?

Openscoring not supporting ensemble.GradientBoostingClassifier

Hello Vilu,

I've trained an ensemble.GradientBoostingClassifier classifier and deployed it to openscoring, but I keep getting a 400 response to my requests.

Using the same pipeline to generate the PMML (using sklearn2pmml) and sending requests with the same input works well for simpler models (like linear_model.LogisticRegression()).

Is GradientBoostingClassifier supported by sklearn2pmml but not by openscoring?

Thanks!

TreeModel prediction mismatch between KNIME and JPMML

Hello,

I trained two models in KNIME: a Neural Network and a Decision Tree.

I'm comparing the results in KNIME and in Java.

For the Neural Network, I'm getting the same results.

For the Decision Tree model, all observations are predicted as false.

I also tried reading the PMML model back into KNIME, and there I do not get these results.

Can you help me?

Accessing the output values of individual Neurons in NeuralNetwork model using the `entityId` output feature

See openscoring/openscoring#14

Missing Value Penalty in Tree Model

Looking at the following from http://dmg.org/pmml/v4-3/TreeModel.html#xsdType_MISSING-VALUE-STRATEGY

missingValuePenalty:

This optional attribute of TreeModel allows computed confidences to be reduced by a specified factor each time certain kinds of missing value handling are invoked during the scoring of a case. For each Node where either surrogate rules or the defaultChild strategy had to be used to select a child, the final confidences are multiplied by this factor. Note that this is based on the number of Nodes, not on the overall number of missing values that were encountered (with operator surrogate, multiple missing values can be encountered within a single Node). For example, if two Nodes with missing values were encountered to get to the final prediction, confidence is multiplied by the two missingValuePenalty values.

It sounds like the value of missingLevels in https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/main/java/org/jpmml/evaluator/tree/TreeModelEvaluator.java should be the number of nodes that evaluate to Unknown, and nodes that rescue a missing value using a surrogate should not count.

That seems to be contrary to the logic here: https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/main/java/org/jpmml/evaluator/tree/TreeModelEvaluator.java#L193-L195

Am I reading the code wrong, or misinterpreting PMML?
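
The arithmetic in the quoted specification text can be sketched as follows: the final confidence is multiplied by missingValuePenalty once for each node that counts. Which nodes count is exactly the open question here, so the sketch takes that number as an input rather than deciding it. MissingValuePenaltySketch and penalize are hypothetical names, and the sample values are made up for illustration.

```java
// Worked arithmetic for the quoted specification text; not JPMML-Evaluator code.
final class MissingValuePenaltySketch {

    /**
     * Multiplies the confidence by the penalty once per penalized node.
     * How penalizedNodes is counted is left to the caller.
     */
    static double penalize(double confidence, double missingValuePenalty, int penalizedNodes) {
        if (penalizedNodes < 0) {
            throw new IllegalArgumentException("penalizedNodes must not be negative");
        }
        return confidence * Math.pow(missingValuePenalty, penalizedNodes);
    }

    public static void main(String[] args) {
        // Two penalized nodes with an assumed penalty of 0.8: 0.9 * 0.8 * 0.8.
        System.out.println(penalize(0.9, 0.8, 2));
        // No penalized nodes leaves the confidence unchanged.
        System.out.println(penalize(0.9, 0.8, 0));
    }
}
```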

Getting exception for clustering

Hi,
I am new to executing clustering PMML models with the JPMML evaluator.
I get an exception when I run the line below.

java -cp target/example-1.2-SNAPSHOT.jar org.jpmml.evaluator.EvaluationExample --model D:\analytics\Test\pmml\AuditKMeans.pmml --input D:\analytics\Test\csv\AuditData_Test.csv--output D:\analytics\Test\output\Audit_KmeansRes.csv

Exception in thread "main" java.lang.IllegalArgumentException: Missing active field(s): [Age, Income, Deductions, Hours]
at org.jpmml.evaluator.EvaluationExample.execute(EvaluationExample.java:217)
at org.jpmml.evaluator.Example.execute(Example.java:60)
at org.jpmml.evaluator.EvaluationExample.main(EvaluationExample.java:127)

I have attached input and model files.
Sample.zip

Please help me sort out the above exception.
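
The message says that none of the model's active fields were found among the input columns. A quick way to narrow this down is to compare the CSV header against the expected field names. The sketch below does that with plain JDK classes; ActiveFieldCheck and missingFields are hypothetical names, and the separator mismatch in the demo is only an assumed cause, not something confirmed for the attached files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; not part of the JPMML-Evaluator example application.
final class ActiveFieldCheck {

    /** Returns the active fields that do not appear as columns in the CSV header line. */
    static List<String> missingFields(String headerLine, String separator, List<String> activeFields) {
        Set<String> columns = new HashSet<>();
        for (String cell : headerLine.split(Pattern.quote(separator))) {
            // Trim whitespace and one pair of surrounding double quotes, if present.
            columns.add(cell.trim().replaceAll("^\"|\"$", ""));
        }
        List<String> missing = new ArrayList<>();
        for (String field : activeFields) {
            if (!columns.contains(field)) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> active = Arrays.asList("Age", "Income", "Deductions", "Hours");
        // Header and separator agree: nothing is reported as missing.
        System.out.println(missingFields("Age,Income,Deductions,Hours", ",", active));
        // Header uses ';' but is split on ',': the whole line becomes one column.
        System.out.println(missingFields("Age;Income;Deductions;Hours", ",", active));
    }
}
```

When every active field is reported at once, as in the exception above, a header-level problem such as a different separator or differently named columns is more likely than a few individual missing columns.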

Duplicate value Exception for Tree Model using Iris Data --- execution

Hi,
I am new to this. The exception below occurs when I execute an R-PMML tree model.

I used the line below for execution:
D:\JPMML\jpmml-evaluator-master\pmml-evaluator-example>java -cp target/example-1.2-SNAPSHOT.jar org.jpmml.evaluator.EvaluationExample --model D:\JPMML\Test\pmml\IrisTree.pmml --input D:\JPMML\Test\csv\Iris.csv --output D:\JPMML\Test\output\TreeOutput.csv

Exception in thread "main" org.jpmml.evaluator.DuplicateValueException: class
at org.jpmml.evaluator.EvaluationContext.declare(EvaluationContext.java:91)
at org.jpmml.evaluator.OutputUtil.evaluate(OutputUtil.java:330)
at org.jpmml.evaluator.TreeModelEvaluator.evaluate(TreeModelEvaluator.java:93)
at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:406)
at org.jpmml.evaluator.EvaluationExample.execute(EvaluationExample.java:261)
at org.jpmml.evaluator.Example.execute(Example.java:60)
at org.jpmml.evaluator.EvaluationExample.main(EvaluationExample.java:127)

WARNUNG: CSV evaluation failed: Mark invalid

I got a very simple message:

Apr 20, 2016 5:21:58 PM org.openscoring.client.CsvEvaluator run
WARNUNG: CSV evaluation failed: Mark invalid

from the evaluation:

java -cp $jpmml/target/client-executable-1.2-SNAPSHOT.jar org.openscoring.client.CsvEvaluator --model http://localhost:8080/openscoring/model/460012_p_aktiv --input ~/test.csv --output ~/test_output.csv

Any idea?

Does it have to do with the (jpmml-xgboost) step that I skipped:

# Generate DMatrix file
mpg.dmatrix = genDMatrix(mpg_y, mpg_X, "xgboost.svm")

I realised that I don't need xgboost.svm in order to get the PMML file.

I simply used

xgboost(param=param,
data = data.matrix(training[,feature.names]),
label=training$aktiv_target,
nrounds=trounds_tmp,
base_score = base,
missing=NA
)

So I relied on xgboost's implicit transformation of the data.

Performance issues while running evaluator for multiple threads #1

My .pmml file contains ~900 input fields of type double.
I'm running an application in a multi-threaded environment that evaluates the model with 30 threads.
The statement at org.jpmml.evaluator.TypeUtil line 208, return (Double.parseDouble(value) + 0d);, ends up in a synchronized method, which blocks 29 threads and hurts overall performance.
Ref: http://dalelane.co.uk/blog/?p=2936
As a workaround I added this class from
https://gist.github.com/dalelane/7720269
and called
return (DoubleParser.parseDouble(value) + 0d);
on line 208, which solved the issue.

I suggest you do the same if required.

org.jpmml.evaluator.InvalidFeatureException: DataField

Hi,
I keep getting this exception for my LogisticRegression model and LinearRegression model. This is my XML. Please guide me as to what the problem is.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_2">
    <DataDictionary numberOfFields="9">
        <DataField name="Attribute0" optype="continuous" dataType="double"/>
        <DataField name="Attribute1" optype="continuous" dataType="double"/>
        <DataField name="Attribute2" optype="continuous" dataType="double"/>
        <DataField name="Attribute3" optype="continuous" dataType="double"/>
        <DataField name="Attribute4" optype="continuous" dataType="double"/>
        <DataField name="Attribute5" optype="continuous" dataType="double"/>
        <DataField name="Attribute6" optype="continuous" dataType="double"/>
        <DataField name="Attribute7" optype="continuous" dataType="double"/>
        <DataField name="Attribute8" optype="continuous" dataType="double"/>
    </DataDictionary>
    <RegressionModel functionName="classification" algorithmName="logisticRegression" normalizationMethod="logit">
        <MiningSchema>
            <MiningField name="Attribute0"/>
            <MiningField name="Attribute1"/>
            <MiningField name="Attribute2"/>
            <MiningField name="Attribute3"/>
            <MiningField name="Attribute4"/>
            <MiningField name="Attribute5"/>
            <MiningField name="Attribute6"/>
            <MiningField name="Attribute7"/>
            <MiningField name="Attribute8" usageType="target"/>
        </MiningSchema>
        <RegressionTable intercept="0.0" targetCategory="1"/>
        <RegressionTable intercept="-8.397856251858588" targetCategory="0">
            <NumericPredictor name="Attribute0" coefficient="0.1230185712966992"/>
            <NumericPredictor name="Attribute1" coefficient="0.03514316177407176"/>
            <NumericPredictor name="Attribute2" coefficient="-0.013282878621280676"/>
            <NumericPredictor name="Attribute3" coefficient="6.631624570875322E-4"/>
            <NumericPredictor name="Attribute4" coefficient="-0.0011962985482762522"/>
            <NumericPredictor name="Attribute5" coefficient="0.08961636497438935"/>
            <NumericPredictor name="Attribute6" coefficient="0.943894934066085"/>
            <NumericPredictor name="Attribute7" coefficient="0.014842809237409734"/>
        </RegressionTable>
    </RegressionModel>
</PMML>

This is the .xml file for LinearRegression:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_2">
    <DataDictionary numberOfFields="9">
        <DataField name="Attribute0" optype="continuous" dataType="double"/>
        <DataField name="Attribute1" optype="continuous" dataType="double"/>
        <DataField name="Attribute2" optype="continuous" dataType="double"/>
        <DataField name="Attribute3" optype="continuous" dataType="double"/>
        <DataField name="Attribute4" optype="continuous" dataType="double"/>
        <DataField name="Attribute5" optype="continuous" dataType="double"/>
        <DataField name="Attribute6" optype="continuous" dataType="double"/>
        <DataField name="Attribute7" optype="continuous" dataType="double"/>
        <DataField name="Attribute8" optype="continuous" dataType="double"/>
    </DataDictionary>
    <RegressionModel functionName="regression" algorithmName="LinearRegression" normalizationMethod="logit">
        <MiningSchema>
            <MiningField name="Attribute0"/>
            <MiningField name="Attribute1"/>
            <MiningField name="Attribute2"/>
            <MiningField name="Attribute3"/>
            <MiningField name="Attribute4"/>
            <MiningField name="Attribute5"/>
            <MiningField name="Attribute6"/>
            <MiningField name="Attribute7"/>
            <MiningField name="Attribute8" usageType="target"/>
        </MiningSchema>
        <RegressionTable intercept="-8.397856251858588" targetCategory="0">
            <NumericPredictor name="Attribute0" coefficient="0.1230185712966992"/>
            <NumericPredictor name="Attribute1" coefficient="0.03514316177407176"/>
            <NumericPredictor name="Attribute2" coefficient="-0.013282878621280676"/>
            <NumericPredictor name="Attribute3" coefficient="6.631624570875322E-4"/>
            <NumericPredictor name="Attribute4" coefficient="-0.0011962985482762522"/>
            <NumericPredictor name="Attribute5" coefficient="0.08961636497438935"/>
            <NumericPredictor name="Attribute6" coefficient="0.943894934066085"/>
            <NumericPredictor name="Attribute7" coefficient="0.014842809237409734"/>
        </RegressionTable>
    </RegressionModel>
</PMML>

Here is the stack trace
org.jpmml.evaluator.InvalidFeatureException: DataField
at org.jpmml.evaluator.RegressionModelEvaluator.evaluateClassification(RegressionModelEvaluator.java:119)
at org.jpmml.evaluator.RegressionModelEvaluator.evaluate(RegressionModelEvaluator.java:69)
at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:406)
at com.norkorm.blake.pmml.LogisticRegressionPMMLTest.makePredictions(LogisticRegressionPMMLTest.java:250)
at com.norkorm.blake.pmml.LogisticRegressionPMMLTest.testLogisticPMML(LogisticRegressionPMMLTest.java:217)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:131)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
