
jpmml-evaluator-spark's Introduction

Java API for producing and scoring models in Predictive Model Markup Language (PMML).

IMPORTANT

This is a legacy codebase.

Starting from March 2014, this project has been superseded by the [JPMML-Model](https://github.com/jpmml/jpmml-model) and [JPMML-Evaluator](https://github.com/jpmml/jpmml-evaluator) projects.

Features

Class model

  • Full support for PMML 3.0, 3.1, 3.2, 4.0 and 4.1 schemas:
    • Class hierarchy.
    • Schema version annotations.
  • Fluent API:
    • Value constructors.
  • SAX Locator information.
  • [Visitor pattern](http://en.wikipedia.org/wiki/Visitor_pattern):
    • Validation agents.
    • Optimization and transformation agents.

Evaluation engine

Installation

JPMML library JAR files (together with accompanying Java source and Javadocs JAR files) are released via the [Maven Central Repository](http://repo1.maven.org/maven2/org/jpmml/). Please join the [JPMML mailing list](https://groups.google.com/forum/#!forum/jpmml) for release announcements.

The current version is 1.0.22 (17 February, 2014).

Class model

<!-- Class model classes -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-model</artifactId>
	<version>${jpmml.version}</version>
</dependency>
<!-- Class model annotations -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-schema</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Evaluation engine

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-evaluator</artifactId>
	<version>${jpmml.version}</version>
</dependency>
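Both dependency snippets reference the release version through the ${jpmml.version} Maven property. Assuming the current 1.0.22 release stated above, it could be declared as follows:

<properties>
	<jpmml.version>1.0.22</jpmml.version>
</properties>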

Usage

Class model

The class model consists of two types of classes. There is a small number of manually crafted classes that are used for structuring the class hierarchy. They are permanently stored in the Java sources directory /pmml-model/src/main/java. Additionally, there is a much greater number of automatically generated classes that represent actual PMML elements. They can be found in the generated Java sources directory /pmml-model/target/generated-sources/xjc after a successful build operation.

All class model classes descend from class org.dmg.pmml.PMMLObject. Additional class hierarchy levels, if any, represent common behaviour and/or features. For example, all model classes descend from class org.dmg.pmml.Model.

There is not much documentation accompanying the class model classes. The application developer should consult the [PMML specification](http://www.dmg.org/v4-1/GeneralStructure.html) for details about individual PMML elements and attributes.

Example applications

Evaluation engine

A model evaluator class can be instantiated directly when the contents of the PMML document are known:

PMML pmml = ...;

ModelEvaluator<TreeModel> modelEvaluator = new TreeModelEvaluator(pmml);

Otherwise, a PMML manager class should be instantiated first, which will inspect the contents of the PMML document and instantiate the right model evaluator class later:

PMML pmml = ...;

PMMLManager pmmlManager = new PMMLManager(pmml);
 
ModelEvaluator<?> modelEvaluator = (ModelEvaluator<?>)pmmlManager.getModelManager(null, ModelEvaluatorFactory.getInstance());
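
In both snippets, the placeholder PMML pmml = ...; stands for an already unmarshalled PMML class model object. A minimal sketch of obtaining one with plain JAXB follows; the file name "model.pmml" is illustrative, and the exact unmarshalling helpers vary between JPMML releases (newer JPMML-Model releases ship dedicated utilities such as org.jpmml.model.PMMLUtil):

// A sketch only; requires imports javax.xml.bind.JAXBContext, javax.xml.bind.Unmarshaller,
// java.io.FileInputStream and java.io.InputStream
JAXBContext context = JAXBContext.newInstance(PMML.class);
Unmarshaller unmarshaller = context.createUnmarshaller();

PMML pmml = null;

try(InputStream is = new FileInputStream("model.pmml")){
	pmml = (PMML)unmarshaller.unmarshal(is);
}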

Model evaluator classes follow functional programming principles. Model evaluator instances are cheap enough to be created and discarded as needed (i.e. not worth the pooling effort).

It is advisable for application code to work against the org.jpmml.evaluator.Evaluator interface:

Evaluator evaluator = (Evaluator)modelEvaluator;

An evaluator instance can be queried for the definition of active (i.e. independent), predicted (i.e. primary dependent) and output (i.e. secondary dependent) fields:

List<FieldName> activeFields = evaluator.getActiveFields();
List<FieldName> predictedFields = evaluator.getPredictedFields();
List<FieldName> outputFields = evaluator.getOutputFields();

The PMML scoring operation must be invoked with valid arguments. Otherwise, the behaviour of the model evaluator class is unspecified.

The preparation of field values:

Map<FieldName, FieldValue> arguments = new LinkedHashMap<FieldName, FieldValue>();

List<FieldName> activeFields = evaluator.getActiveFields();
for(FieldName activeField : activeFields){
	// The raw (i.e. user-supplied) value could be any Java primitive value
	Object rawValue = ...;

	// The raw value is passed through: 1) outlier treatment, 2) missing value treatment, 3) invalid value treatment and 4) type conversion
	FieldValue activeValue = evaluator.prepare(activeField, rawValue);

	arguments.put(activeField, activeValue);
}

The scoring:

Map<FieldName, ?> results = evaluator.evaluate(arguments);

Typically, a model has exactly one predicted field, which is called the target field:

FieldName targetName = evaluator.getTargetField();
Object targetValue = results.get(targetName);

The target value is either a Java primitive value (as a wrapper object) or an instance of org.jpmml.evaluator.Computable:

if(targetValue instanceof Computable){
	Computable computable = (Computable)targetValue;

	Object primitiveValue = computable.getResult();
}

The target value may implement interfaces that descend from interface org.jpmml.evaluator.ResultFeature:

// Test for "entityId" result feature
if(targetValue instanceof HasEntityId){
	HasEntityId hasEntityId = (HasEntityId)targetValue;
	HasEntityRegistry<?> hasEntityRegistry = (HasEntityRegistry<?>)evaluator;
	BiMap<String, ? extends Entity> entities = hasEntityRegistry.getEntityRegistry();
	Entity winner = entities.get(hasEntityId.getEntityId());

	// Test for "probability" result feature
	if(targetValue instanceof HasProbability){
		HasProbability hasProbability = (HasProbability)targetValue;
		Double winnerProbability = hasProbability.getProbability(winner.getId());
	}
}
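
For reference, the snippets above can be combined into a single scoring routine. A minimal sketch, where the variable userRecord is a hypothetical Map<String, Object> holding the raw input values:

Map<FieldName, FieldValue> arguments = new LinkedHashMap<FieldName, FieldValue>();

List<FieldName> activeFields = evaluator.getActiveFields();
for(FieldName activeField : activeFields){
	// "userRecord" is a hypothetical source of raw values
	Object rawValue = userRecord.get(activeField.getValue());

	arguments.put(activeField, evaluator.prepare(activeField, rawValue));
}

Map<FieldName, ?> results = evaluator.evaluate(arguments);

Object targetValue = results.get(evaluator.getTargetField());

// Unwrap complex result values into a Java primitive value
if(targetValue instanceof Computable){
	Computable computable = (Computable)targetValue;

	targetValue = computable.getResult();
}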
Example applications

Additional information

Please contact [[email protected]](mailto:[email protected])

jpmml-evaluator-spark's People

Contributors

vruusmann


jpmml-evaluator-spark's Issues

How to support Spark 2.2?

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.CreateStruct.<init>(Lscala/collection/Seq;)V
at org.jpmml.evaluator.spark.PMMLTransformer.transform(PMMLTransformer.java:151)
at org.apache.spark.ml.PipelineModel$$anonfun$transform$1.apply(Pipeline.scala:305)
at org.apache.spark.ml.PipelineModel$$anonfun$transform$1.apply(Pipeline.scala:305)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:305)
at org.jpmml.spark.SVMEvaluationSparkExample.main(SVMEvaluationSparkExample.java:77)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

UnmarshalException with Spark 2.0.1

I ran into UnmarshalException with Spark 2.0.1 when I was trying to run the readPMML function (had the same issue with 1.6.1, but I upgraded the Spark version)

javax.xml.bind.UnmarshalException: unexpected element (uri:"http://www.dmg.org/PMML-4_3", local:"PMML"). Expected elements are <{http://www.dmg.org/PMML-4_2}ARIMA>,<{http://www.dmg.org/PMML-4_2}Aggregate>,<{http://www.dmg.org/PMML-4_2}Alternate>,<{http://www.dmg.org/PMML-4_2}Annotation>,<{http://www.dmg.org/PMML-4_2}Anova>,<{http://www.dmg.org/PMML-4_2}AnovaRow>,<{http://www.dmg.org/PMML-4_2}AntecedentSequence>,<{http://www.dmg.org/PMML-4_2}AnyDistribution>,<{http://www.dmg.org/PMML-4_2}Application>,<{http://www.dmg.org/PMML-4_2}Apply>,<{http://www.dmg.org/PMML-4_2}Array>,<{http://www.dmg.org/PMML-4_2}AssociationModel>,<{http://www.dmg.org/PMML-4_2}AssociationRule>,<{http://www.dmg.org/PMML-4_2}Attribute>,<{http://www.dmg.org/PMML-4_2}BaseCumHazardTables>,<{http://www.dmg.org/PMML-4_2}Baseline>,<{http://www.dmg.org/PMML-4_2}BaselineCell>,<{http://www.dmg.org/PMML-4_2}BaselineModel>,<{http://www.dmg.org/PMML-4_2}BaselineStratum>,<{http://www.dmg.org/PMML-4_2}BayesInput>,<{http://www.dmg.org/PMML-4_2}BayesInputs>,<{http://www.dmg.org/PMML-4_2}BayesOutput>,<{http://www.dmg.org/PMML-4_2}BoundaryValueMeans>,<{http://www.dmg.org/PMML-4_2}BoundaryValues>,<{http://www.dmg.org/PMML-4_2}CategoricalPredictor>,<{http://www.dmg.org/PMML-4_2}Categories>,<{http://www.dmg.org/PMML-4_2}Category>,<{http://www.dmg.org/PMML-4_2}CenterFields>,<{http://www.dmg.org/PMML-4_2}Characteristic>,<{http://www.dmg.org/PMML-4_2}Characteristics>,<{http://www.dmg.org/PMML-4_2}ChildParent>,<{http://www.dmg.org/PMML-4_2}ClassLabels>,<{http://www.dmg.org/PMML-4_2}Cluster>,<{http://www.dmg.org/PMML-4_2}ClusteringField>,<{http://www.dmg.org/PMML-4_2}ClusteringModel>,<{http://www.dmg.org/PMML-4_2}ClusteringModelQuality>,<{http://www.dmg.org/PMML-4_2}Coefficient>,<{http://www.dmg.org/PMML-4_2}Coefficients>,<{http://www.dmg.org/PMML-4_2}ComparisonMeasure>,<{http://www.dmg.org/PMML-4_2}Comparisons>,<{http://www.dmg.org/PMML-4_2}ComplexPartialScore>,<{http://www.dmg.org/PMML-4_2}CompoundPredicate>,<{http://www.dmg.org/PMML-4_2}CompoundRule>,<{http://www.dmg.org/PMML-4_2}Con>,<{http://www.dmg.org/PMML-4_2}ConfusionMatrix>,<{http://www.dmg.org/PMML-4_2}ConsequentSequence>,<{http://www.dmg.org/PMML-4_2}Constant>,<{http://www.dmg.org/PMML-4_2}Constraints>,<{http://www.dmg.org/PMML-4_2}ContStats>,<{http://www.dmg.org/PMML-4_2}CorrelationFields>,<{http://www.dmg.org/PMML-4_2}CorrelationMethods>,<{http://www.dmg.org/PMML-4_2}CorrelationValues>,<{http://www.dmg.org/PMML-4_2}Correlations>,<{http://www.dmg.org/PMML-4_2}CountTable>,<{http://www.dmg.org/PMML-4_2}Counts>,<{http://www.dmg.org/PMML-4_2}Covariances>,<{http://www.dmg.org/PMML-4_2}CovariateList>,<{http://www.dmg.org/PMML-4_2}DataDictionary>,<{http://www.dmg.org/PMML-4_2}DataField>,<{http://www.dmg.org/PMML-4_2}Decision>,<{http://www.dmg.org/PMML-4_2}DecisionTree>,<{http://www.dmg.org/PMML-4_2}Decisions>,<{http://www.dmg.org/PMML-4_2}DefineFunction>,<{http://www.dmg.org/PMML-4_2}Delimiter>,<{http://www.dmg.org/PMML-4_2}DerivedField>,<{http://www.dmg.org/PMML-4_2}DiscrStats>,<{http://www.dmg.org/PMML-4_2}Discretize>,<{http://www.dmg.org/PMML-4_2}DiscretizeBin>,<{http://www.dmg.org/PMML-4_2}DocumentTermMatrix>,<{http://www.dmg.org/PMML-4_2}EventValues>,<{http://www.dmg.org/PMML-4_2}ExponentialSmoothing>,<{http://www.dmg.org/PMML-4_2}Extension>,<{http://www.dmg.org/PMML-4_2}FactorList>,<{http://www.dmg.org/PMML-4_2}False>,<{http://www.dmg.org/PMML-4_2}FieldColumnPair>,<{http://www.dmg.org/PMML-4_2}FieldRef>,<{http://www.dmg.org/PMML-4_2}FieldValue>,<{http://www.dmg.org/PMML-4_2}FieldValue
Count>,<{http://www.dmg.org/PMML-4_2}GaussianDistribution>,<{http://www.dmg.org/PMML-4_2}GeneralRegressionModel>,<{http://www.dmg.org/PMML-4_2}Header>,<{http://www.dmg.org/PMML-4_2}INT-Entries>,<{http://www.dmg.org/PMML-4_2}INT-SparseArray>,<{http://www.dmg.org/PMML-4_2}Indices>,<{http://www.dmg.org/PMML-4_2}InlineTable>,<{http://www.dmg.org/PMML-4_2}InstanceField>,<{http://www.dmg.org/PMML-4_2}InstanceFields>,<{http://www.dmg.org/PMML-4_2}Interval>,<{http://www.dmg.org/PMML-4_2}Item>,<{http://www.dmg.org/PMML-4_2}ItemRef>,<{http://www.dmg.org/PMML-4_2}Itemset>,<{http://www.dmg.org/PMML-4_2}KNNInput>,<{http://www.dmg.org/PMML-4_2}KNNInputs>,<{http://www.dmg.org/PMML-4_2}KohonenMap>,<{http://www.dmg.org/PMML-4_2}Level>,<{http://www.dmg.org/PMML-4_2}LiftData>,<{http://www.dmg.org/PMML-4_2}LiftGraph>,<{http://www.dmg.org/PMML-4_2}LinearNorm>,<{http://www.dmg.org/PMML-4_2}LocalTransformations>,<{http://www.dmg.org/PMML-4_2}MapValues>,<{http://www.dmg.org/PMML-4_2}MatCell>,<{http://www.dmg.org/PMML-4_2}Matrix>,<{http://www.dmg.org/PMML-4_2}MiningBuildTask>,<{http://www.dmg.org/PMML-4_2}MiningField>,<{http://www.dmg.org/PMML-4_2}MiningModel>,<{http://www.dmg.org/PMML-4_2}MiningSchema>,<{http://www.dmg.org/PMML-4_2}MissingValueWeights>,<{http://www.dmg.org/PMML-4_2}ModelExplanation>,<{http://www.dmg.org/PMML-4_2}ModelLiftGraph>,<{http://www.dmg.org/PMML-4_2}ModelStats>,<{http://www.dmg.org/PMML-4_2}ModelVerification>,<{http://www.dmg.org/PMML-4_2}MultivariateStat>,<{http://www.dmg.org/PMML-4_2}MultivariateStats>,<{http://www.dmg.org/PMML-4_2}NaiveBayesModel>,<{http://www.dmg.org/PMML-4_2}NearestNeighborModel>,<{http://www.dmg.org/PMML-4_2}NeuralInput>,<{http://www.dmg.org/PMML-4_2}NeuralInputs>,<{http://www.dmg.org/PMML-4_2}NeuralLayer>,<{http://www.dmg.org/PMML-4_2}NeuralNetwork>,<{http://www.dmg.org/PMML-4_2}NeuralOutput>,<{http://www.dmg.org/PMML-4_2}NeuralOutputs>,<{http://www.dmg.org/PMML-4_2}Neuron>,<{http://www.dmg.org/PMML-4_2}Node>,<{http://www.dmg.org/PMML-4_2}NormContinuous>,<{http://www.dmg.org/PMML-4_2}NormDiscrete>,<{http://www.dmg.org/PMML-4_2}NormalizedCountTable>,<{http://www.dmg.org/PMML-4_2}NumericInfo>,<{http://www.dmg.org/PMML-4_2}NumericPredictor>,<{http://www.dmg.org/PMML-4_2}OptimumLiftGraph>,<{http://www.dmg.org/PMML-4_2}Output>,<{http://www.dmg.org/PMML-4_2}OutputField>,<{http://www.dmg.org/PMML-4_2}PCell>,<{http://www.dmg.org/PMML-4_2}PCovCell>,<{http://www.dmg.org/PMML-4_2}PCovMatrix>,<{http://www.dmg.org/PMML-4_2}PMML>,<{http://www.dmg.org/PMML-4_2}PPCell>,<{http://www.dmg.org/PMML-4_2}PPMatrix>,<{http://www.dmg.org/PMML-4_2}PairCounts>,<{http://www.dmg.org/PMML-4_2}ParamMatrix>,<{http://www.dmg.org/PMML-4_2}Parameter>,<{http://www.dmg.org/PMML-4_2}ParameterField>,<{http://www.dmg.org/PMML-4_2}ParameterList>,<{http://www.dmg.org/PMML-4_2}Partition>,<{http://www.dmg.org/PMML-4_2}PartitionFieldStats>,<{http://www.dmg.org/PMML-4_2}PoissonDistribution>,<{http://www.dmg.org/PMML-4_2}PredictiveModelQuality>,<{http://www.dmg.org/PMML-4_2}Predictor>,<{http://www.dmg.org/PMML-4_2}PredictorTerm>,<{http://www.dmg.org/PMML-4_2}Quantile>,<{http://www.dmg.org/PMML-4_2}REAL-Entries>,<{http://www.dmg.org/PMML-4_2}REAL-SparseArray>,<{http://www.dmg.org/PMML-4_2}ROC>,<{http://www.dmg.org/PMML-4_2}ROCGraph>,<{http://www.dmg.org/PMML-4_2}RandomLiftGraph>,<{http://www.dmg.org/PMML-4_2}Regression>,<{http://www.dmg.org/PMML-4_2}RegressionModel>,<{http://www.dmg.org/PMML-4_2}RegressionTable>,<{http://www.dmg.org/PMML-4_2}ResultField>,<{http://www.dmg.org/PMML-4_2}RuleSelectionMethod>,<{http:/
/www.dmg.org/PMML-4_2}RuleSet>,<{http://www.dmg.org/PMML-4_2}RuleSetModel>,<{http://www.dmg.org/PMML-4_2}ScoreDistribution>,<{http://www.dmg.org/PMML-4_2}Scorecard>,<{http://www.dmg.org/PMML-4_2}SeasonalTrendDecomposition>,<{http://www.dmg.org/PMML-4_2}Seasonality_ExpoSmooth>,<{http://www.dmg.org/PMML-4_2}Segment>,<{http://www.dmg.org/PMML-4_2}Segmentation>,<{http://www.dmg.org/PMML-4_2}Sequence>,<{http://www.dmg.org/PMML-4_2}SequenceModel>,<{http://www.dmg.org/PMML-4_2}SequenceReference>,<{http://www.dmg.org/PMML-4_2}SequenceRule>,<{http://www.dmg.org/PMML-4_2}SetPredicate>,<{http://www.dmg.org/PMML-4_2}SetReference>,<{http://www.dmg.org/PMML-4_2}SimplePredicate>,<{http://www.dmg.org/PMML-4_2}SimpleRule>,<{http://www.dmg.org/PMML-4_2}SimpleSetPredicate>,<{http://www.dmg.org/PMML-4_2}SpectralAnalysis>,<{http://www.dmg.org/PMML-4_2}SupportVector>,<{http://www.dmg.org/PMML-4_2}SupportVectorMachine>,<{http://www.dmg.org/PMML-4_2}SupportVectorMachineModel>,<{http://www.dmg.org/PMML-4_2}SupportVectors>,<{http://www.dmg.org/PMML-4_2}TableLocator>,<{http://www.dmg.org/PMML-4_2}Target>,<{http://www.dmg.org/PMML-4_2}TargetValue>,<{http://www.dmg.org/PMML-4_2}TargetValueCount>,<{http://www.dmg.org/PMML-4_2}TargetValueCounts>,<{http://www.dmg.org/PMML-4_2}TargetValueStat>,<{http://www.dmg.org/PMML-4_2}TargetValueStats>,<{http://www.dmg.org/PMML-4_2}Targets>,<{http://www.dmg.org/PMML-4_2}Taxonomy>,<{http://www.dmg.org/PMML-4_2}TestDistributions>,<{http://www.dmg.org/PMML-4_2}TextCorpus>,<{http://www.dmg.org/PMML-4_2}TextDictionary>,<{http://www.dmg.org/PMML-4_2}TextDocument>,<{http://www.dmg.org/PMML-4_2}TextIndex>,<{http://www.dmg.org/PMML-4_2}TextIndexNormalization>,<{http://www.dmg.org/PMML-4_2}TextModel>,<{http://www.dmg.org/PMML-4_2}TextModelNormalization>,<{http://www.dmg.org/PMML-4_2}TextModelSimiliarity>,<{http://www.dmg.org/PMML-4_2}Time>,<{http://www.dmg.org/PMML-4_2}TimeAnchor>,<{http://www.dmg.org/PMML-4_2}TimeCycle>,<{http://www.dmg.org/PMML-4_2}TimeException>,<{http://www.dmg.org/PMML-4_2}TimeSeries>,<{http://www.dmg.org/PMML-4_2}TimeSeriesModel>,<{http://www.dmg.org/PMML-4_2}TimeValue>,<{http://www.dmg.org/PMML-4_2}Timestamp>,<{http://www.dmg.org/PMML-4_2}TrainingInstances>,<{http://www.dmg.org/PMML-4_2}TransformationDictionary>,<{http://www.dmg.org/PMML-4_2}TreeModel>,<{http://www.dmg.org/PMML-4_2}True>,<{http://www.dmg.org/PMML-4_2}UniformDistribution>,<{http://www.dmg.org/PMML-4_2}UnivariateStats>,<{http://www.dmg.org/PMML-4_2}Value>,<{http://www.dmg.org/PMML-4_2}VectorDictionary>,<{http://www.dmg.org/PMML-4_2}VectorFields>,<{http://www.dmg.org/PMML-4_2}VectorInstance>,<{http://www.dmg.org/PMML-4_2}VerificationField>,<{http://www.dmg.org/PMML-4_2}VerificationFields>,<{http://www.dmg.org/PMML-4_2}XCoordinates>,<{http://www.dmg.org/PMML-4_2}YCoordinates>,<{http://www.dmg.org/PMML-4_2}binarySimilarity>,<{http://www.dmg.org/PMML-4_2}chebychev>,<{http://www.dmg.org/PMML-4_2}cityBlock>,<{http://www.dmg.org/PMML-4_2}euclidean>,<{http://www.dmg.org/PMML-4_2}jaccard>,<{http://www.dmg.org/PMML-4_2}minkowski>,<{http://www.dmg.org/PMML-4_2}row>,<{http://www.dmg.org/PMML-4_2}simpleMatching>,<{http://www.dmg.org/PMML-4_2}squaredEuclidean>,<{http://www.dmg.org/PMML-4_2}tanimoto>
	at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.handleEvent(UnmarshallingContext.java:647)
	at com.sun.xml.bind.v2.runtime.unmarshaller.Loader.reportError(Loader.java:258)
	at com.sun.xml.bind.v2.runtime.unmarshaller.Loader.reportError(Loader.java:253)
	at com.sun.xml.bind.v2.runtime.unmarshaller.Loader.reportUnexpectedChildElement(Loader.java:120)
	at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext$DefaultRootLoader.childElement(UnmarshallingContext.java:1052)
	at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:483)
	at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:464)
	at com.sun.xml.bind.v2.runtime.unmarshaller.SAXConnector.startElement(SAXConnector.java:152)
	at org.xml.sax.helpers.XMLFilterImpl.startElement(XMLFilterImpl.java:551)
	at org.xml.sax.helpers.XMLFilterImpl.startElement(XMLFilterImpl.java:551)
	at org.jpmml.model.filters.PMMLFilter.startElement(PMMLFilter.java:69)
	at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
	at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
	at org.apache.xerces.impl.XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:357)
	at org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:357)
	at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:216)
	at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:189)
	at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:140)
	at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:123)
	at org.jpmml.model.JAXBUtil.unmarshal(JAXBUtil.java:78)
	at org.jpmml.model.JAXBUtil.unmarshalPMML(JAXBUtil.java:64)
	at org.jpmml.model.PMMLUtil.unmarshal(PMMLUtil.java:31)
	at ConfidencePredictor.readPMML(ConfidencePredictor.java:187)
       ....

In my pom file, I declared the dependency:

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>jpmml-evaluator-spark</artifactId>
	<version>1.2.0</version>
</dependency>

which is used alongside the Spark dependency:

<dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-core_2.11</artifactId>
          <version>2.0.1</version>
</dependency>

I know that Spark bundles a different version of PMML, which conflicts with newer versions. But isn't this project supposed to solve that issue too? Is this a bug, or am I doing something wrong?

mvn package project failed

I changed the version of pmml-evaluator in pmml-spark/pom.xml, and then the Maven package succeeded.

                <dependency>
                        <groupId>org.jpmml</groupId>
                        <artifactId>pmml-evaluator</artifactId>
-                       <version>1.2.12</version>
+                       <version>1.2.15</version>
                </dependency>

Memory issues

Hello,
I've been using the Spark evaluator with a few PMMLs and datasets, but now I'm using a 250MB PMML and a dataset with 129 columns and about 138k records, and I very quickly get an OutOfMemoryError.
I reduced the dataset to 500 records (divided into 5 partitions) and had about 2GB on every executor and the driver.
I had about 5-6 executors and still got an OOM.

During the process I dumped the heap to a file; from analyzing it, it seems that org.jpmml.evaluator.mining.MiningModelEvaluator occupied 400MB, and there were 5 threads that occupied about 250MB each.
Does that make sense?

I didn't attach the dump because it's around 2GB, but I'll provide whatever you need.

Thanks in advance :)

Can I use Scala to load PMML model to complete prediction?

I built and saved a PMML model with Python, but I don't know how to load and invoke it in Scala.

If possible, what functions should be used to read and call the PMML model?

For example, in Scala I can use

val model = PipelineModel.load("loadpath")

and then

model.transform(data)

to apply the model.

How do I write the code if I want to load a PMML model instead?

In the README, it seems that the code can only run in Java.

Row-oriented exception handling

At the moment, if a single data record fails to evaluate, then the whole job fails.

There should be a mechanism to "annotate" predictions with success/failure markers, possibly capturing the exception type/message.
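
Until such a mechanism exists, a minimal caller-side sketch (illustrative only, using the plain JPMML-Evaluator API rather than this library) of capturing the failure for a single record:

// Illustrative only: "evaluator" and "arguments" are prepared as usual
Map<FieldName, ?> results = null;
String error = null;

try {
	results = evaluator.evaluate(arguments);
} catch(Exception e){
	// Capture the exception type and message instead of failing the whole job
	error = e.getClass().getName() + ": " + e.getMessage();
}

// "results" and "error" could then be emitted as additional output columns,
// marking the row as a success or a failure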

java.lang.NoClassDefFoundError: org.jpmml.model.ImportFilter

When trying to use org.jpmml.evaluator.spark.EvaluatorUtil.createEvaluator() I get:

java.lang.NoClassDefFoundError: org/jpmml/model/ImportFilter
	at org.jpmml.evaluator.spark.EvaluatorUtil.createEvaluator(EvaluatorUtil.java:53)
	at org.jpmml.evaluator.spark.EvaluatorUtil.createEvaluator(EvaluatorUtil.java:47)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)

Seems like EvaluatorUtil.java#L32 is importing org.jpmml.model.ImportFilter but it should be importing org.jpmml.model.filters.ImportFilter

The change occurred in jpmml/jpmml-model@d4b7688 on Feb 10.

Replace `java.util.List<E>` parameters with `E[]` parameters in method signatures

This library is developed using the Java language, but the majority of users are working with it using the Scala language.

Method signatures should be "designed" in a way that would eliminate the need for Scala-to-Java type casts and/or conversions. For example, the List type is different/incompatible between these two languages, whereas the array type is the same.

Resolving an application classpath conflict

I used Scala to write an LR model that starts with a StringIndexer model, because I want to use the original data (including string features and double features) for prediction.

scala 2.11.8
spark 2.3.3
jmmp-spark 1.4.18
pmml-model 1.5.12
jpmml-evaluator-spark 1.2.2

List inputFields = model.getInputFields();
System.out.println(inputFields);

result:

[InputField{name=Sex, displayName=null, dataType=string, opType=categorical}, InputField{name=Pclass, displayName=null, dataType=double, opType=continuous}, InputField{name=Age, displayName=null, dataType=double, opType=continuous}, InputField{name=SibSp, displayName=null, dataType=double, opType=continuous}, InputField{name=Parch, displayName=null, dataType=double, opType=continuous}, InputField{name=Fare, displayName=null, dataType=double, opType=continuous}]

System.out.println("@@@@ "+rawValue);
FieldValue inputFieldValue = inputField.prepare(rawValue);

result:

@@@@   female
Exception in thread "main" java.lang.NoSuchMethodError: org.dmg.pmml.Value.getValue()Ljava/lang/String;
	at org.jpmml.evaluator.InputFieldUtil.getStatus(InputFieldUtil.java:309)
	at org.jpmml.evaluator.InputFieldUtil.getStatus(InputFieldUtil.java:269)
	at org.jpmml.evaluator.InputFieldUtil.prepareScalarInputValue(InputFieldUtil.java:133)
	at org.jpmml.evaluator.InputFieldUtil.prepareInputValue(InputFieldUtil.java:112)

Thanks in advance for helping.

When will Spark 2.1.x be supported?

Hi, vruusmann.
I met a problem when using jpmml-evaluator-spark with Spark 2.1.1.

java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.CreateStruct.<init>(Lscala/collection/Seq;)V
at org.jpmml.evaluator.spark.PMMLTransformer.transform(PMMLTransformer.java:151)
at com.michaelxu.spark.TestPipeLine.testJPMML(TestPipeLine.scala:312)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)

The version of jpmml-evaluator-spark I used is 1.1-SNAPSHOT, which supports Spark versions 2.0.0 to 2.0.2.
In Spark 2.0.2, CreateStruct is defined as a case class:

 /**
 * Returns a Row containing the evaluation of all children expressions.
 */
@ExpressionDescription(
  usage = "_FUNC_(col1, col2, col3, ...) - Creates a struct with the given field values.")
case class CreateStruct(children: Seq[Expression]) extends Expression {

  override def foldable: Boolean = children.forall(_.foldable)

  override lazy val dataType: StructType = {
    val fields = children.zipWithIndex.map { case (child, idx) =>
      child match {
        case ne: NamedExpression =>
          StructField(ne.name, ne.dataType, ne.nullable, ne.metadata)
        case _ =>
          StructField(s"col${idx + 1}", child.dataType, child.nullable, Metadata.empty)
      }
    }
    StructType(fields)
  }
......

But in Spark 2.1.1, CreateStruct is defined as an object:

/**
 * Returns a Row containing the evaluation of all children expressions.
 */
object CreateStruct extends FunctionBuilder {
  def apply(children: Seq[Expression]): CreateNamedStruct = {
    CreateNamedStruct(children.zipWithIndex.flatMap {
      case (e: NamedExpression, _) if e.resolved => Seq(Literal(e.name), e)
      case (e: NamedExpression, _) => Seq(NamePlaceholder, e)
      case (e, index) => Seq(Literal(s"col${index + 1}"), e)
    })
  }

  /**
   * Entry to use in the function registry.
   */
  val registryEntry: (String, (ExpressionInfo, FunctionBuilder)) = {
    val info: ExpressionInfo = new ExpressionInfo(
      "org.apache.spark.sql.catalyst.expressions.NamedStruct",
      null,
      "struct",
      "_FUNC_(col1, col2, col3, ...) - Creates a struct with the given field values.",
      "")
    ("struct", (info, this))
  }
}

I'm not sure how to change the code in PMMLTransformer. Can you give me some suggestions?
Thank you.

Expression evaluateExpression = new ScalaUDF(evaluatorFunction, getOutputSchema(), ScalaUtil.<Expression>singletonSeq(new CreateStruct(ScalaUtil.<Expression>toSeq(activeExpressions))), ScalaUtil.<DataType>emptySeq());

org.jpmml.evaluator.UnsupportedFeatureException (at or around line 30): GeneralRegressionModel

Hi,

We are trying to evaluate a PMML model in Spark which uses a general regression (GLM) algorithm. However, when implementing it as documented in the README, we are unable to evaluate the model against the DataFrame. We are able to run and evaluate the PMML with the JPMML-Evaluator example, but it fails with jpmml-evaluator-spark.

This works, so I assume the PMML syntax is correct:

java -cp example-1.3-SNAPSHOT.jar org.jpmml.evaluator.EvaluationExample --model sample_pmml.pmml --input PMML_Dummy_Data.csv --output output.csv

The GLM line which gets called out in stderr:
<GeneralRegressionModel modelName="General_Regression_Model" modelType="generalizedLinear" functionName="classification" algorithmName="glm" distribution="binomial" linkFunction="logit">

The following fails when submitted as a Spark job:

spark-submit --master yarn --deploy-mode cluster --class com.mapr.examples.SparkSQLHiveContextExample SparkSQLExample-1.0-SNAPSHOT-jar-with-dependencies.jar

CODE:


package com.mapr.examples;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Method;
import java.util.List;
import org.apache.commons.io.IOUtils;
import org.apache.spark.ml.Transformer;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.InputField;
import org.jpmml.evaluator.ModelEvaluatorFactory;
import org.jpmml.evaluator.spark.EvaluatorUtil;
import org.jpmml.evaluator.spark.TransformerBuilder;

public class SparkSQLHiveContextExample {

	public static void main(String[] args) throws Exception {

		String warehouseLocation = "spark-warehouse";

		SparkSession spark = SparkSession.builder().appName("PMML").config("spark.sql.warehouse.dir", warehouseLocation)
				.enableHiveSupport().getOrCreate();

		File pmmlFile = new File("some_pmml.pmml");
		ClassLoader loader = Thread.currentThread().getContextClassLoader();
		Evaluator evaluator = EvaluatorUtil.createEvaluator(pmmlFile);
		List<InputField> inputFields = evaluator.getInputFields();
		TransformerBuilder pmmlTransformerBuilder = new TransformerBuilder(evaluator)
				.withTargetCols()
				.withOutputCols()
				.exploded(false);

		Transformer pmmlTransformer = pmmlTransformerBuilder.build();

		Dataset<org.apache.spark.sql.Row> sqlDF = spark.sql(
				"SELECT pmml_col_1, pmml_col_2, pmml_col_3, pmml_col_4, pmml_col_5 from pmml_sample");

		Dataset<org.apache.spark.sql.Row> output = pmmlTransformer.transform(sqlDF);

		output.createOrReplaceTempView("pmml_output");
	}


}

STDERR trace:

Log Type: stderr
Log Upload Time: Fri Jun 23 22:20:26 +0000 2017
Log Length: 6574
Showing 4096 bytes of 6574 total.
ewModelEvaluator(ModelEvaluatorFactory.java:126)
at org.jpmml.evaluator.ModelEvaluatorFactory.newModelEvaluator(ModelEvaluatorFactory.java:66)
at org.jpmml.evaluator.spark.EvaluatorUtil.createEvaluator(EvaluatorUtil.java:63)
at org.jpmml.evaluator.spark.EvaluatorUtil.createEvaluator(EvaluatorUtil.java:47)
at com.mapr.examples.SparkSQLHiveContextExample.main(SparkSQLHiveContextExample.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
17/06/23 22:20:25 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.jpmml.evaluator.UnsupportedFeatureException (at or around line 30): GeneralRegressionModel)
17/06/23 22:20:25 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:401)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:766)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:764)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.jpmml.evaluator.UnsupportedFeatureException (at or around line 30): GeneralRegressionModel
at org.jpmml.evaluator.ModelEvaluatorFactory.newModelEvaluator(ModelEvaluatorFactory.java:126)
at org.jpmml.evaluator.ModelEvaluatorFactory.newModelEvaluator(ModelEvaluatorFactory.java:66)
at org.jpmml.evaluator.spark.EvaluatorUtil.createEvaluator(EvaluatorUtil.java:63)
at org.jpmml.evaluator.spark.EvaluatorUtil.createEvaluator(EvaluatorUtil.java:47)
at com.mapr.examples.SparkSQLHiveContextExample.main(SparkSQLHiveContextExample.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
17/06/23 22:20:25 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.jpmml.evaluator.UnsupportedFeatureException (at or around line 30): GeneralRegressionModel)

Exception in thread "main" org.jpmml.evaluator.UnsupportedFeatureException (at or around line 15): RegressionModel

It can run via the IDEA main method, but errors out via spark-submit.
Spark 1.6, using branch 1.x.
The XML:

     <DataDictionary numberOfFields="4">
        <DataField name="field_0" optype="continuous" dataType="double"/>
        <DataField name="field_1" optype="continuous" dataType="double"/>
        <DataField name="field_2" optype="continuous" dataType="double"/>
        <DataField name="target" optype="categorical" dataType="string"/>
    </DataDictionary>
    <RegressionModel modelName="linear SVM" functionName="classification" normalizationMethod="none">...

The error is:

Exception in thread "main" org.jpmml.evaluator.UnsupportedFeatureException (at or around line 15): RegressionModel
        at org.jpmml.evaluator.ModelEvaluatorFactory.newModelEvaluator(ModelEvaluatorFactory.java:126)
        at org.jpmml.evaluator.ModelEvaluatorFactory.newModelEvaluator(ModelEvaluatorFactory.java:66)
        at org.jpmml.evaluator.spark.EvaluatorUtil.createEvaluator(EvaluatorUtil.java:63)
        at org.jpmml.spark.SVMEvaluationSparkExample.main(SVMEvaluationSparkExample.java:50)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Problem with parsing gbdt models

I tried to construct a GBDT model in Spark from a PMML file which describes a GBDT model, and I got the following error:

"caused by:org.shaded.jpmml.evaluator.MissingFieldException: Field 'decisionFunction(1.0)' is not defined". What are the possible things I need to check?

Thanks!

Spark SQL Analysis Exception cannot resolve column name "Sepal.Length"

Hi folks,

I was trying to deploy the neural network Iris PMML model using Spark, and I am facing a Spark SQL AnalysisException: cannot resolve column name "Sepal.Length". For more info I am attaching a screenshot of the error. Can you please help me figure this out?

[Screenshot attachment: neuralnetwork_issue]

The neural network Iris model I used:

<?xml version="1.0" encoding="UTF-8"?>
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header copyright="morent">
    <Application name="KNIME" version="2.4.1"/>
  </Header>
  <DataDictionary numberOfFields="5">
    <DataField name="sepal_length" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="4.3" rightMargin="7.9"/>
    </DataField>
    <DataField name="sepal_width" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="2.0" rightMargin="4.4"/>
    </DataField>
    <DataField name="petal_length" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="1.0" rightMargin="6.9"/>
    </DataField>
    <DataField name="petal_width" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="0.1" rightMargin="2.5"/>
    </DataField>
    <DataField name="class" optype="categorical" dataType="string">
      <Value value="Iris-setosa"/>
      <Value value="Iris-versicolor"/>
      <Value value="Iris-virginica"/>
    </DataField>
  </DataDictionary>
  <NeuralNetwork functionName="classification" algorithmName="RProp" activationFunction="logistic" normalizationMethod="none" width="0.0" numberOfLayers="2">
    <MiningSchema>
      <MiningField name="sepal_length" invalidValueTreatment="asIs"/>
      <MiningField name="sepal_width" invalidValueTreatment="asIs"/>
      <MiningField name="petal_length" invalidValueTreatment="asIs"/>
      <MiningField name="petal_width" invalidValueTreatment="asIs"/>
      <MiningField name="class" invalidValueTreatment="asIs" usageType="predicted"/>
    </MiningSchema>
    <NeuralInputs numberOfInputs="4">
      <NeuralInput id="0,0">
        <DerivedField optype="continuous" dataType="double">
          <FieldRef field="sepal_length"/>
        </DerivedField>
      </NeuralInput>
      <NeuralInput id="0,1">
        <DerivedField optype="continuous" dataType="double">
          <FieldRef field="sepal_width"/>
        </DerivedField>
      </NeuralInput>
      <NeuralInput id="0,2">
        <DerivedField optype="continuous" dataType="double">
          <FieldRef field="petal_length"/>
        </DerivedField>
      </NeuralInput>
      <NeuralInput id="0,3">
        <DerivedField optype="continuous" dataType="double">
          <FieldRef field="petal_width"/>
        </DerivedField>
      </NeuralInput>
    </NeuralInputs>
    <NeuralLayer>
      <Neuron id="1,0" bias="-0.9441846137121959">
        <Con from="0,0" weight="-0.04486308459743835"/>
        <Con from="0,1" weight="-1.007767488246452"/>
        <Con from="0,2" weight="0.4116250900429575"/>
        <Con from="0,3" weight="1.209150565102718"/>
      </Neuron>
      <Neuron id="1,1" bias="-1.5806031030377694">
        <Con from="0,0" weight="0.33364337105537917"/>
        <Con from="0,1" weight="2.454247860268515"/>
        <Con from="0,2" weight="-0.15509338500436265"/>
        <Con from="0,3" weight="-1.8389081867014756"/>
      </Neuron>
      <Neuron id="1,2" bias="-2.7781887628433255">
        <Con from="0,0" weight="0.6896588815822495"/>
        <Con from="0,1" weight="0.4155487213880096"/>
        <Con from="0,2" weight="-0.4693479739569995"/>
        <Con from="0,3" weight="0.024472433148318866"/>
      </Neuron>
      <Neuron id="1,3" bias="-1.8804828048612006">
        <Con from="0,0" weight="-0.046367292285697366"/>
        <Con from="0,1" weight="0.15139622470906453"/>
        <Con from="0,2" weight="0.15126787451340934"/>
        <Con from="0,3" weight="0.5798317894556151"/>
      </Neuron>
      <Neuron id="1,4" bias="-1.0479686251266709">
        <Con from="0,0" weight="0.3022349918305707"/>
        <Con from="0,1" weight="0.5696576400734125"/>
        <Con from="0,2" weight="-1.0984794716289736"/>
        <Con from="0,3" weight="-0.27270023464908766"/>
      </Neuron>
      <Neuron id="1,5" bias="-0.3025179949964037">
        <Con from="0,0" weight="-0.22289067665818585"/>
        <Con from="0,1" weight="-0.5029455541094195"/>
        <Con from="0,2" weight="0.6748013255475972"/>
        <Con from="0,3" weight="0.2769427141691832"/>
      </Neuron>
      <Neuron id="1,6" bias="0.5185804721896162">
        <Con from="0,0" weight="-0.06630369718678882"/>
        <Con from="0,1" weight="0.9366878508504322"/>
        <Con from="0,2" weight="-2.0505238545746756"/>
        <Con from="0,3" weight="-1.4180466765594852"/>
      </Neuron>
      <Neuron id="1,7" bias="-0.14392081477403149">
        <Con from="0,0" weight="0.47767196169403736"/>
        <Con from="0,1" weight="0.9231599358415787"/>
        <Con from="0,2" weight="-1.521293106276155"/>
        <Con from="0,3" weight="-0.9151590899627176"/>
      </Neuron>
      <Neuron id="1,8" bias="-1.1991037918096945">
        <Con from="0,0" weight="-0.0368452580731863"/>
        <Con from="0,1" weight="-0.6438633470456006"/>
        <Con from="0,2" weight="0.09328089052662898"/>
        <Con from="0,3" weight="1.6829695286970527"/>
      </Neuron>
      <Neuron id="1,9" bias="-0.07098521306546789">
        <Con from="0,0" weight="1.1130349585302026"/>
        <Con from="0,1" weight="-0.37623467152890655"/>
        <Con from="0,2" weight="-1.0691145207939827"/>
        <Con from="0,3" weight="-0.6233268000787716"/>
      </Neuron>
    </NeuralLayer>
    <NeuralLayer>
      <Neuron id="2,0" bias="-0.5982891836398166">
        <Con from="1,0" weight="-131.91572569969986"/>
        <Con from="1,1" weight="-2.0803811849944296"/>
        <Con from="1,2" weight="-0.6084039564897556"/>
        <Con from="1,3" weight="-1.066354849840332"/>
        <Con from="1,4" weight="1.7727568679658863"/>
        <Con from="1,5" weight="-8.098634498899418"/>
        <Con from="1,6" weight="19.128469000024445"/>
        <Con from="1,7" weight="47.99808211407523"/>
        <Con from="1,8" weight="-48.02276063336108"/>
        <Con from="1,9" weight="0.8376316854289872"/>
      </Neuron>
      <Neuron id="2,1" bias="0.6887290187921082">
        <Con from="1,0" weight="-16.591716865586278"/>
        <Con from="1,1" weight="0.7787222775775234"/>
        <Con from="1,2" weight="-0.21185970643262134"/>
        <Con from="1,3" weight="-0.5385365682832596"/>
        <Con from="1,4" weight="0.1494344843029574"/>
        <Con from="1,5" weight="-0.2908246414912338"/>
        <Con from="1,6" weight="-209.54696149112175"/>
        <Con from="1,7" weight="41.03251658642203"/>
        <Con from="1,8" weight="-4.481423572323473"/>
        <Con from="1,9" weight="20.57673314463075"/>
      </Neuron>
      <Neuron id="2,2" bias="-0.33370153632947136">
        <Con from="1,0" weight="3.284832540721896"/>
        <Con from="1,1" weight="1.2279556829167277"/>
        <Con from="1,2" weight="0.7602122721765426"/>
        <Con from="1,3" weight="1.1997577238312889"/>
        <Con from="1,4" weight="-11.325671536250567"/>
        <Con from="1,5" weight="1.483961591977191"/>
        <Con from="1,6" weight="-1821.8108555592457"/>
        <Con from="1,7" weight="-148.9767379958393"/>
        <Con from="1,8" weight="6.619929393199292"/>
        <Con from="1,9" weight="-8.681838533973998"/>
      </Neuron>
    </NeuralLayer>
    <NeuralOutputs numberOfOutputs="3">
      <NeuralOutput outputNeuron="2,0">
        <DerivedField optype="categorical" dataType="string">
          <NormDiscrete field="class" value="Iris-setosa"/>
        </DerivedField>
      </NeuralOutput>
      <NeuralOutput outputNeuron="2,1">
        <DerivedField optype="categorical" dataType="string">
          <NormDiscrete field="class" value="Iris-versicolor"/>
        </DerivedField>
      </NeuralOutput>
      <NeuralOutput outputNeuron="2,2">
        <DerivedField optype="categorical" dataType="string">
          <NormDiscrete field="class" value="Iris-virginica"/>
        </DerivedField>
      </NeuralOutput>
    </NeuralOutputs>
  </NeuralNetwork>
</PMML>

question about build error

Hi dear author,
I built this Maven project and it fails during the 'scala-compile-first' generation with the error "can't find value DatasetUtil". It also can't find other classes that are used in PMMLTransformer.scala but defined in Java classes. Do you know why? I don't know how to fix it; please help me.

java.lang.AbstractMethodError when calling pmmlTransformer.transform

A java.lang.AbstractMethodError occurred when I was testing jpmml-evaluator-spark locally; the error occurred when I called pmmlTransformer.transform.
Here is the code:

def pmmlPredict(spark: SparkSession, pmmlModelSavePath: String, predictData: String, predictResultSavePath: String): Unit = {
    val fs = FileSystem.get(new Configuration())
    val evaluator = EvaluatorUtil.createEvaluator(new File("E:/testModel/pmmlModel/pipelinePMMLModel.xml"))
    val pmmlTransformerBuilder = new TransformerBuilder(evaluator).withTargetCols().withOutputCols().exploded(true)
    val pmmlTransformer = pmmlTransformerBuilder.build()

    val fields = new ArrayBuffer[StructField]
    val it = evaluator.getActiveFields.iterator()
    while (it.hasNext) {
      fields.:+(StructField(it.next().getName.getValue, StringType, true))
    }
    val schema = StructType(fields)

    val predictStringRDD = spark.sparkContext.textFile(predictData)

    val predictRowRDD = predictStringRDD.map(_.split(",").map(_.toDouble)).map(Row.fromSeq(_))
    val predictDF = spark.createDataFrame(predictRowRDD, schema)

    val predictResultDF = pmmlTransformer.transform(predictDF)

    predictResultDF.write.csv(predictResultSavePath)

    predictResultDF.show()
  }

The full stack trace is:

Exception in thread "main" java.lang.AbstractMethodError: org.apache.spark.ml.Transformer.transform(Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/Dataset;
	at org.apache.spark.ml.PipelineModel$$anonfun$transform$1.apply(Pipeline.scala:299)
	at org.apache.spark.ml.PipelineModel$$anonfun$transform$1.apply(Pipeline.scala:299)
	at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
	at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
	at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:299)
	at com.myhexin.oryx.TestPMML$.pmmlPredict(TestPMML.scala:154)
	at com.myhexin.oryx.TestPMML$.main(TestPMML.scala:42)
	at com.myhexin.oryx.TestPMML.main(TestPMML.scala)

TestPMML.scala:154 refers to the “val predictResultDF = pmmlTransformer.transform(predictDF)” line, and I don't know why this error occurred.

AbstractMethodError: org.shaded.jpmml.evaluator.spark.PMMLTransformer

I am getting the above exception when invoking the transform() method. I've shaded my jar (incl. relocation). Using Spark 2, jpmml-evaluator-spark "1.1-SNAPSHOT" version. I'm passing in a DataFrame to transform(). Did I miss something?

Here's my code:
case class Attribute(ATTR1: Double, ATTR2:Double,ATTR3:Double, ATTR4:Double)
val ds = spark.read.option("header", "true").schema(StructType(Array(StructField("ATTR1", DoubleType, true),StructField("ATTR2", DoubleType, true),StructField("ATTR3", DoubleType, true),StructField("ATTR4", DoubleType, true)))).csv("my.csv").as[Attribute]

val file = new File("mypmml.xml")
val evaluator = org.jpmml.evaluator.spark.EvaluatorUtil.createEvaluator(file)

val pmmlTransformerBuilder = new org.jpmml.evaluator.spark.TransformerBuilder(evaluator).withOutputCols
val pmmlTransformer = pmmlTransformerBuilder.build()
val output = pmmlTransformer.transform(ds)   // exception triggered here:

I get an exception on the last line:

Exception in thread "main" java.lang.AbstractMethodError: org.shaded.jpmml.evaluator.spark.PMMLTransformer.transform(Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/Dataset;
at org.equifax.spark.pmml.examples.TestPMML$.main(TestPMML.scala:52)
at org.equifax.spark.pmml.examples.TestPMML.main(TestPMML.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Why does a NoClassDefFoundError occur in the LoadingModelEvaluatorBuilder method?

I create the evaluator using this code:
val evaluator = new LoadingModelEvaluatorBuilder().load(pmmlIs).build()

but it cannot run:
Exception in thread "main" java.lang.NoClassDefFoundError: jakarta/xml/bind/JAXBContent

JDK version is 8
spark version 2.12
scala version 2.12

I tried different jpmml-evaluator-spark versions, but it did not work.

org.jpmml.evaluator.UnsupportedFeatureException: TreeModel

Hello!

My colleague gave me a PMML file like:

<?xml version="1.0" encoding="UTF-8"?>
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header copyright="rosy">
    <Application name="KNIME" version="2.11.0"/>
  </Header>
  <DataDictionary numberOfFields="21">
...
  <TreeModel modelName="DecisionTree" functionName="classification" splitCharacteristic="binarySplit" missingValueStrategy="lastPrediction" noTrueChildStrategy="returnNullPrediction">
    <MiningSchema>
      <MiningField name="VMail Message" invalidValueTreatment="asIs"/>
      <MiningField name="Day Mins" invalidValueTreatment="asIs"/>
...

I build a transformer:

        InputStream is = MyModel.class.getResourceAsStream("/model.pmml");
        Evaluator evaluator = EvaluatorUtil.createEvaluator(is);

        TransformerBuilder modelBuilder = new TransformerBuilder(evaluator)
                .withOutputCols()
                .withTargetCols()
                .exploded(false);

        Transformer transformer = modelBuilder.build();

and run it in Spark local mode without any problem; the job appends the prediction column to the DataFrame.

But when I run it on an AWS EMR cluster with 1 master node and 2 worker nodes, the Java code cannot process the PMML file:

Exception in thread "main" org.jpmml.evaluator.UnsupportedFeatureException: TreeModel
	at org.jpmml.evaluator.ModelEvaluatorFactory.createModelEvaluator(ModelEvaluatorFactory.java:134)
	at org.jpmml.evaluator.ModelEvaluatorFactory.newModelEvaluator(ModelEvaluatorFactory.java:74)
	at org.jpmml.evaluator.ModelEvaluatorFactory.newModelEvaluator(ModelEvaluatorFactory.java:70)
	at org.jpmml.evaluator.spark.EvaluatorUtil.createEvaluator(EvaluatorUtil.java:63)
	at javaCode.mlModel.MyModel.getClassifier(MyModel.java:21)
	at testpackage.test_s3$.main(test_s3.scala:17)

I don't know why.

my env is:
java 7
scala 2.11.8
spark 2.0.2 (AWS EMR 5.2.1)

        <dependency>
            <groupId>org.jpmml</groupId>
            <artifactId>pmml-model</artifactId>
            <version>1.3.8</version>
        </dependency>
        <dependency>
            <groupId>org.jpmml</groupId>
            <artifactId>pmml-evaluator</artifactId>
            <version>1.3.10</version>
        </dependency>
        <dependency>
            <groupId>org.jpmml</groupId>
            <artifactId>jpmml-evaluator-spark</artifactId>
            <version>1.1-SNAPSHOT</version>
        </dependency>

Last time, in order to run a PMML 4.3 document exported from sklearn, I used jpmml-evaluator-spark 1.1.
This time the PMML version is 4.2,
but when I use jpmml-evaluator-spark 1.0.0, it has the same problem.

Forgive my poor English...

Thank you!

Simple prediction mode

The PMML transformer currently operates in "appending" mode - the Transformer#transform(Dataset<Row>) method appends prediction columns to the input dataset.

The "appending mode" appears to be Apache Spark convention. However, in some situations (eg. demonstration purposes) it might be desirable to operate in "simple" (aka "standalone") mode - the transform method creates a new dataset that only contains prediction columns.

Could be easily achieved by introducing a TransformerBuilder#setTransformationMode(Mode.APPENDING|Mode.STANDALONE) configuration method.
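
Until such a configuration method is available, a minimal workaround sketch (assumptions: Spark 2.x Dataset API, where Dataset#drop(String...) ignores unknown column names; the method name transformStandalone is hypothetical):

import org.apache.spark.ml.Transformer;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

static public Dataset<Row> transformStandalone(Transformer pmmlTransformer, Dataset<Row> input){
	// Transform in the current "appending" mode
	Dataset<Row> output = pmmlTransformer.transform(input);

	// Drop the original input columns, so that only the appended prediction columns remain
	return output.drop(input.columns());
}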

dependency version not consistent

Hi dear author, thank you for taking the time to answer my question.
What do you mean by "fix my classpath"? My local JDK is version 8 and the JDK in Spark is also version 8; is there anything else I need to fix? I still can't fix this problem.
Another question: I read your README.md, and I think maybe I can use the 1.2.2 JAR release from your Maven repository to construct my project rather than using your source code directly, but I don't understand your code. For example, where does the class "LoadingModelEvaluatorBuilder" come from?
In fact, I don't know how to keep the JAR versions consistent between my local environment and the Spark environment. Can you tell me some methods?

jpmml-spark fails

jpmml-spark fails to build from a GitHub pull.

I downloaded the latest source and updated the POM to Spark 1.6.0.

mvn -V = 3.3.1
mvn clean install

[INFO] 3 errors
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] JPMML-Spark ........................................ SUCCESS [ 0.592 s]
[INFO] pmml-spark ......................................... FAILURE [ 2.775 s]
[INFO] pmml-spark-example ................................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.492 s
[INFO] Finished at: 2016-06-23T17:55:38-07:00
[INFO] Final Memory: 39M/316M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) on project pmml-spark: Compilation failure: Compilation failure:
[ERROR] /PMML-SPARK/jpmml-spark-master/pmml-spark/src/main/java/org/jpmml/spark/PMMLPredictionModel.java:[31,28] cannot find symbol
[ERROR] symbol: class DataFrame
[ERROR] location: package org.apache.spark.sql
[ERROR] /PMML-SPARK/jpmml-spark-master/pmml-spark/src/main/java/org/jpmml/spark/PMMLPredictionModel.java:[88,46] cannot find symbol
[ERROR] symbol: class DataFrame
[ERROR] location: class org.jpmml.spark.PMMLPredictionModel
[ERROR] /PMML-SPARK/jpmml-spark-master/pmml-spark/src/main/java/org/jpmml/spark/PMMLPredictionModel.java:[88,16] cannot find symbol
[ERROR] symbol: class DataFrame
[ERROR] location: class org.jpmml.spark.PMMLPredictionModel
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :pmml-spark

support for spark 3.x ?

Hi @vruusmann,

Is there any plan to support the evaluator on Spark 3.x?

Or maybe I could attempt to build jpmml-evaluator-spark against Spark 3.x on my own?

Thank you

question about class 'PMMLTransformer'

Hi dear author,
Thank you for your project; it really helps me.
I'm new to Spark and PMML, and I can't find the class 'PMMLTransformer' that is used in the class 'Transformer'. Could you please tell me where I can find the 'PMMLTransformer' class?

How to get the function name of a PMML model?

When I have a PMML file, how do I get the function name? I can't find the relevant API.
I want to know which of the following values it is:
<xs:simpleType name="MINING-FUNCTION">
<xs:restriction base="xs:string">
<xs:enumeration value="associationRules"/>
<xs:enumeration value="sequences"/>
<xs:enumeration value="classification"/>
<xs:enumeration value="regression"/>
<xs:enumeration value="clustering"/>
<xs:enumeration value="timeSeries"/>
<xs:enumeration value="mixed"/>
</xs:restriction>
</xs:simpleType>
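A minimal sketch using the JPMML-Model class model, assuming a reasonably recent pmml-model version where the attribute getter is Model#getMiningFunction() (older versions exposed it as Model#getFunctionName()); the class name and the command-line argument are placeholders:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.dmg.pmml.MiningFunction;
import org.dmg.pmml.Model;
import org.dmg.pmml.PMML;
import org.jpmml.model.PMMLUtil;

public class MiningFunctionDemo {

	public static void main(String... args) throws Exception {
		try(InputStream is = Files.newInputStream(Paths.get(args[0]))){
			PMML pmml = PMMLUtil.unmarshal(is);

			// A PMML document may contain several models; each one declares its own mining function
			for(Model model : pmml.getModels()){
				MiningFunction miningFunction = model.getMiningFunction();

				System.out.println(model.getClass().getSimpleName() + ": " + miningFunction);
			}
		}
	}
}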

local class incompatible

Hi dear author,
I use your library to build a prediction pipeline that reads data from HDFS and writes results back to HDFS.
In the final step, I use dataframe.write.format("...csv").path("/../") to save the data into HDFS,
and then I get an error message that says
"java.io.InvalidClassException: org.jpmml.evaluator.ModelEvaluator, local class incompatible, serialVersionUID=..., local serialVersionUID=..."
I searched for solutions on the internet and tried several things. First I changed the pom.xml to use the JAR versions shipped with CDH 6.1.1, because my Spark environment is based on CDH 6.1.1, but that didn't work.
Next, I used the maven-shade-plugin to relocate org.jpmml to org.shaded.jpmml,
but it still doesn't work.
I don't know how to solve this problem.
I would appreciate it if you could give me some suggestions to help me track it down.
Thank you so much!

When I use only jpmml-evaluator-spark, it throws an exception

When I declare only the jpmml-evaluator-spark dependency, it throws an exception. I must explicitly declare the other two dependencies as well.

-------------------- error --------------------

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>jpmml-evaluator-spark</artifactId>
	<version>1.2.2</version>
</dependency>

-------------------- correct: I have to use 2 extra dependencies --------------------

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>jpmml-evaluator-spark</artifactId>
	<version>1.2.2</version>
</dependency>
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-evaluator</artifactId>
	<version>1.4.7</version>
</dependency>
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-model</artifactId>
	<version>1.4.8</version>
</dependency>

Doesn't work with Spark 1.6.1 (UnmarshalException)

I tried using this with Spark 1.6.1, but I get the following UnmarshalException:

javax.xml.bind.UnmarshalException: unexpected element (uri:"http://www.dmg.org/PMML-4_3", local:"PMML"). Expected elements are <{http://www.dmg.org/PMML-4_2}ARIMA>,<{http://www.dmg.org/PMML-4_2}Aggregate>,<{http://www.dmg.org/PMML-4_2}Alternate>,<{http://www.dmg.org/PMML-4_2}Annotation>,<{http://www.dmg.org/PMML-4_2}Anova>,<{http://www.dmg.org/PMML-4_2}AnovaRow>,<{http://www.dmg.org/PMML-4_2}AntecedentSequence>,<{http://www.dmg.org/PMML-4_2}AnyDistribution>,<{http://www.dmg.org/PMML-4_2}Application>,<{http://www.dmg.org/PMML-4_2}Apply>,<{http://www.dmg.org/PMML-4_2}Array>,<{http://www.dmg.org/PMML-4_2}AssociationModel>,<{http://www.dmg.org/PMML-4_2}AssociationRule>,<{http://www.dmg.org/PMML-4_2}Attribute>,<{http://www.dmg.org/PMML-4_2}BaseCumHazardTables>,<{http://www.dmg.org/PMML-4_2}Baseline>,<{http://www.dmg.org/PMML-4_2}BaselineCell>,<{http://www.dmg.org/PMML-4_2}BaselineModel>,<{http://www.dmg.org/PMML-4_2}BaselineStratum>,<{http://www.dmg.org/PMML-4_2}BayesInput>,<{http://www.dmg.org/PMML-4_2}BayesInputs>,<{http://www.dmg.org/PMML-4_2}BayesOutput>,<{http://www.dmg.org/PMML-4_2}BoundaryValueMeans>,<{http://www.dmg.org/PMML-4_2}BoundaryValues>,<{http://www.dmg.org/PMML-4_2}CategoricalPredictor>,<{http://www.dmg.org/PMML-4_2}Categories>,<{http://www.dmg.org/PMML-4_2}Category>,<{http://www.dmg.org/PMML-4_2}CenterFields>,<{http://www.dmg.org/PMML-4_2}Characteristic>,<{http://www.dmg.org/PMML-4_2}Characteristics>,<{http://www.dmg.org/PMML-4_2}ChildParent>,<{http://www.dmg.org/PMML-4_2}ClassLabels>,<{http://www.dmg.org/PMML-4_2}Cluster>,<{http://www.dmg.org/PMML-4_2}ClusteringField>,<{http://www.dmg.org/PMML-4_2}ClusteringModel>,<{http://www.dmg.org/PMML-4_2}ClusteringModelQuality>,<{http://www.dmg.org/PMML-4_2}Coefficient>,<{http://www.dmg.org/PMML-4_2}Coefficients>,<{http://www.dmg.org/PMML-4_2}ComparisonMeasure>,<{http://www.dmg.org/PMML-4_2}Comparisons>,<{http://www.dmg.org/PMML-4_2}ComplexPartialScore>,<{http://www.dmg.org/PMML-4_2}CompoundPredicate>,<{http://www.dmg.org/PMML-4_2}CompoundRule>,<{http://www.dmg.org/PMML-4_2}Con>,<{http://www.dmg.org/PMML-4_2}ConfusionMatrix>,<{http://www.dmg.org/PMML-4_2}ConsequentSequence>,<{http://www.dmg.org/PMML-4_2}Constant>,<{http://www.dmg.org/PMML-4_2}Constraints>,<{http://www.dmg.org/PMML-4_2}ContStats>,<{http://www.dmg.org/PMML-4_2}CorrelationFields>,<{http://www.dmg.org/PMML-4_2}CorrelationMethods>,<{http://www.dmg.org/PMML-4_2}CorrelationValues>,<{http://www.dmg.org/PMML-4_2}Correlations>,<{http://www.dmg.org/PMML-4_2}CountTable>,<{http://www.dmg.org/PMML-4_2}Counts>,<{http://www.dmg.org/PMML-4_2}Covariances>,<{http://www.dmg.org/PMML-4_2}CovariateList>,<{http://www.dmg.org/PMML-4_2}DataDictionary>,<{http://www.dmg.org/PMML-4_2}DataField>,<{http://www.dmg.org/PMML-4_2}Decision>,<{http://www.dmg.org/PMML-4_2}DecisionTree>,<{http://www.dmg.org/PMML-4_2}Decisions>,<{http://www.dmg.org/PMML-4_2}DefineFunction>,<{http://www.dmg.org/PMML-4_2}Delimiter>,<{http://www.dmg.org/PMML-4_2}DerivedField>,<{http://www.dmg.org/PMML-4_2}DiscrStats>,<{http://www.dmg.org/PMML-4_2}Discretize>,<{http://www.dmg.org/PMML-4_2}DiscretizeBin>,<{http://www.dmg.org/PMML-4_2}DocumentTermMatrix>,<{http://www.dmg.org/PMML-4_2}EventValues>,<{http://www.dmg.org/PMML-4_2}ExponentialSmoothing>,<{http://www.dmg.org/PMML-4_2}Extension>,<{http://www.dmg.org/PMML-4_2}FactorList>,<{http://www.dmg.org/PMML-4_2}False>,<{http://www.dmg.org/PMML-4_2}FieldColumnPair>,<{http://www.dmg.org/PMML-4_2}FieldRef>,<{http://www.dmg.org/PMML-4_2}FieldValue>,<{http://www.dmg.org/PMML-4_2}FieldValue
Count>,<{http://www.dmg.org/PMML-4_2}GaussianDistribution>,<{http://www.dmg.org/PMML-4_2}GeneralRegressionModel>,<{http://www.dmg.org/PMML-4_2}Header>,<{http://www.dmg.org/PMML-4_2}INT-Entries>,<{http://www.dmg.org/PMML-4_2}INT-SparseArray>,<{http://www.dmg.org/PMML-4_2}Indices>,<{http://www.dmg.org/PMML-4_2}InlineTable>,<{http://www.dmg.org/PMML-4_2}InstanceField>,<{http://www.dmg.org/PMML-4_2}InstanceFields>,<{http://www.dmg.org/PMML-4_2}Interval>,<{http://www.dmg.org/PMML-4_2}Item>,<{http://www.dmg.org/PMML-4_2}ItemRef>,<{http://www.dmg.org/PMML-4_2}Itemset>,<{http://www.dmg.org/PMML-4_2}KNNInput>,<{http://www.dmg.org/PMML-4_2}KNNInputs>,<{http://www.dmg.org/PMML-4_2}KohonenMap>,<{http://www.dmg.org/PMML-4_2}Level>,<{http://www.dmg.org/PMML-4_2}LiftData>,<{http://www.dmg.org/PMML-4_2}LiftGraph>,<{http://www.dmg.org/PMML-4_2}LinearKernelType>,<{http://www.dmg.org/PMML-4_2}LinearNorm>,<{http://www.dmg.org/PMML-4_2}LocalTransformations>,<{http://www.dmg.org/PMML-4_2}MapValues>,<{http://www.dmg.org/PMML-4_2}MatCell>,<{http://www.dmg.org/PMML-4_2}Matrix>,<{http://www.dmg.org/PMML-4_2}MiningBuildTask>,<{http://www.dmg.org/PMML-4_2}MiningField>,<{http://www.dmg.org/PMML-4_2}MiningModel>,<{http://www.dmg.org/PMML-4_2}MiningSchema>,<{http://www.dmg.org/PMML-4_2}MissingValueWeights>,<{http://www.dmg.org/PMML-4_2}ModelExplanation>,<{http://www.dmg.org/PMML-4_2}ModelLiftGraph>,<{http://www.dmg.org/PMML-4_2}ModelStats>,<{http://www.dmg.org/PMML-4_2}ModelVerification>,<{http://www.dmg.org/PMML-4_2}MultivariateStat>,<{http://www.dmg.org/PMML-4_2}MultivariateStats>,<{http://www.dmg.org/PMML-4_2}NaiveBayesModel>,<{http://www.dmg.org/PMML-4_2}NearestNeighborModel>,<{http://www.dmg.org/PMML-4_2}NeuralInput>,<{http://www.dmg.org/PMML-4_2}NeuralInputs>,<{http://www.dmg.org/PMML-4_2}NeuralLayer>,<{http://www.dmg.org/PMML-4_2}NeuralNetwork>,<{http://www.dmg.org/PMML-4_2}NeuralOutput>,<{http://www.dmg.org/PMML-4_2}NeuralOutputs>,<{http://www.dmg.org/PMML-4_2}Neuron>,<{http://www.dmg.org/PMML-4_2}Node>,<{http://www.dmg.org/PMML-4_2}NormContinuous>,<{http://www.dmg.org/PMML-4_2}NormDiscrete>,<{http://www.dmg.org/PMML-4_2}NormalizedCountTable>,<{http://www.dmg.org/PMML-4_2}NumericInfo>,<{http://www.dmg.org/PMML-4_2}NumericPredictor>,<{http://www.dmg.org/PMML-4_2}OptimumLiftGraph>,<{http://www.dmg.org/PMML-4_2}Output>,<{http://www.dmg.org/PMML-4_2}OutputField>,<{http://www.dmg.org/PMML-4_2}PCell>,<{http://www.dmg.org/PMML-4_2}PCovCell>,<{http://www.dmg.org/PMML-4_2}PCovMatrix>,<{http://www.dmg.org/PMML-4_2}PMML>,<{http://www.dmg.org/PMML-4_2}PPCell>,<{http://www.dmg.org/PMML-4_2}PPMatrix>,<{http://www.dmg.org/PMML-4_2}PairCounts>,<{http://www.dmg.org/PMML-4_2}ParamMatrix>,<{http://www.dmg.org/PMML-4_2}Parameter>,<{http://www.dmg.org/PMML-4_2}ParameterField>,<{http://www.dmg.org/PMML-4_2}ParameterList>,<{http://www.dmg.org/PMML-4_2}Partition>,<{http://www.dmg.org/PMML-4_2}PartitionFieldStats>,<{http://www.dmg.org/PMML-4_2}PoissonDistribution>,<{http://www.dmg.org/PMML-4_2}PolynomialKernelType>,<{http://www.dmg.org/PMML-4_2}PredictiveModelQuality>,<{http://www.dmg.org/PMML-4_2}Predictor>,<{http://www.dmg.org/PMML-4_2}PredictorTerm>,<{http://www.dmg.org/PMML-4_2}Quantile>,<{http://www.dmg.org/PMML-4_2}REAL-Entries>,<{http://www.dmg.org/PMML-4_2}REAL-SparseArray>,<{http://www.dmg.org/PMML-4_2}ROC>,<{http://www.dmg.org/PMML-4_2}ROCGraph>,<{http://www.dmg.org/PMML-4_2}RadialBasisKernelType>,<{http://www.dmg.org/PMML-4_2}RandomLiftGraph>,<{http://www.dmg.org/PMML-4_2}Regression>,<{http://www.dmg.org/PMML-4_2}RegressionMod
el>,<{http://www.dmg.org/PMML-4_2}RegressionTable>,<{http://www.dmg.org/PMML-4_2}ResultField>,<{http://www.dmg.org/PMML-4_2}RuleSelectionMethod>,<{http://www.dmg.org/PMML-4_2}RuleSet>,<{http://www.dmg.org/PMML-4_2}RuleSetModel>,<{http://www.dmg.org/PMML-4_2}ScoreDistribution>,<{http://www.dmg.org/PMML-4_2}Scorecard>,<{http://www.dmg.org/PMML-4_2}SeasonalTrendDecomposition>,<{http://www.dmg.org/PMML-4_2}Seasonality_ExpoSmooth>,<{http://www.dmg.org/PMML-4_2}Segment>,<{http://www.dmg.org/PMML-4_2}Segmentation>,<{http://www.dmg.org/PMML-4_2}Sequence>,<{http://www.dmg.org/PMML-4_2}SequenceModel>,<{http://www.dmg.org/PMML-4_2}SequenceReference>,<{http://www.dmg.org/PMML-4_2}SequenceRule>,<{http://www.dmg.org/PMML-4_2}SetPredicate>,<{http://www.dmg.org/PMML-4_2}SetReference>,<{http://www.dmg.org/PMML-4_2}SigmoidKernelType>,<{http://www.dmg.org/PMML-4_2}SimplePredicate>,<{http://www.dmg.org/PMML-4_2}SimpleRule>,<{http://www.dmg.org/PMML-4_2}SimpleSetPredicate>,<{http://www.dmg.org/PMML-4_2}SpectralAnalysis>,<{http://www.dmg.org/PMML-4_2}SupportVector>,<{http://www.dmg.org/PMML-4_2}SupportVectorMachine>,<{http://www.dmg.org/PMML-4_2}SupportVectorMachineModel>,<{http://www.dmg.org/PMML-4_2}SupportVectors>,<{http://www.dmg.org/PMML-4_2}TableLocator>,<{http://www.dmg.org/PMML-4_2}Target>,<{http://www.dmg.org/PMML-4_2}TargetValue>,<{http://www.dmg.org/PMML-4_2}TargetValueCount>,<{http://www.dmg.org/PMML-4_2}TargetValueCounts>,<{http://www.dmg.org/PMML-4_2}TargetValueStat>,<{http://www.dmg.org/PMML-4_2}TargetValueStats>,<{http://www.dmg.org/PMML-4_2}Targets>,<{http://www.dmg.org/PMML-4_2}Taxonomy>,<{http://www.dmg.org/PMML-4_2}TestDistributions>,<{http://www.dmg.org/PMML-4_2}TextCorpus>,<{http://www.dmg.org/PMML-4_2}TextDictionary>,<{http://www.dmg.org/PMML-4_2}TextDocument>,<{http://www.dmg.org/PMML-4_2}TextIndex>,<{http://www.dmg.org/PMML-4_2}TextIndexNormalization>,<{http://www.dmg.org/PMML-4_2}TextModel>,<{http://www.dmg.org/PMML-4_2}TextModelNormalization>,<{http://www.dmg.org/PMML-4_2}TextModelSimiliarity>,<{http://www.dmg.org/PMML-4_2}Time>,<{http://www.dmg.org/PMML-4_2}TimeAnchor>,<{http://www.dmg.org/PMML-4_2}TimeCycle>,<{http://www.dmg.org/PMML-4_2}TimeException>,<{http://www.dmg.org/PMML-4_2}TimeSeries>,<{http://www.dmg.org/PMML-4_2}TimeSeriesModel>,<{http://www.dmg.org/PMML-4_2}TimeValue>,<{http://www.dmg.org/PMML-4_2}Timestamp>,<{http://www.dmg.org/PMML-4_2}TrainingInstances>,<{http://www.dmg.org/PMML-4_2}TransformationDictionary>,<{http://www.dmg.org/PMML-4_2}TreeModel>,<{http://www.dmg.org/PMML-4_2}True>,<{http://www.dmg.org/PMML-4_2}UniformDistribution>,<{http://www.dmg.org/PMML-4_2}UnivariateStats>,<{http://www.dmg.org/PMML-4_2}Value>,<{http://www.dmg.org/PMML-4_2}VectorDictionary>,<{http://www.dmg.org/PMML-4_2}VectorFields>,<{http://www.dmg.org/PMML-4_2}VectorInstance>,<{http://www.dmg.org/PMML-4_2}VerificationField>,<{http://www.dmg.org/PMML-4_2}VerificationFields>,<{http://www.dmg.org/PMML-4_2}XCoordinates>,<{http://www.dmg.org/PMML-4_2}YCoordinates>,<{http://www.dmg.org/PMML-4_2}binarySimilarity>,<{http://www.dmg.org/PMML-4_2}chebychev>,<{http://www.dmg.org/PMML-4_2}cityBlock>,<{http://www.dmg.org/PMML-4_2}euclidean>,<{http://www.dmg.org/PMML-4_2}jaccard>,<{http://www.dmg.org/PMML-4_2}minkowski>,<{http://www.dmg.org/PMML-4_2}row>,<{http://www.dmg.org/PMML-4_2}simpleMatching>,<{http://www.dmg.org/PMML-4_2}squaredEuclidean>,<{http://www.dmg.org/PMML-4_2}tanimoto>
	at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.handleEvent(UnmarshallingContext.java:662)
	at com.sun.xml.bind.v2.runtime.unmarshaller.Loader.reportError(Loader.java:258)
	at com.sun.xml.bind.v2.runtime.unmarshaller.Loader.reportError(Loader.java:253)
	at com.sun.xml.bind.v2.runtime.unmarshaller.Loader.reportUnexpectedChildElement(Loader.java:120)
	at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext$DefaultRootLoader.childElement(UnmarshallingContext.java:1063)
	at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:498)
	at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:480)
	at com.sun.xml.bind.v2.runtime.unmarshaller.SAXConnector.startElement(SAXConnector.java:150)
	at org.xml.sax.helpers.XMLFilterImpl.startElement(XMLFilterImpl.java:551)
	at org.xml.sax.helpers.XMLFilterImpl.startElement(XMLFilterImpl.java:551)
	at org.jpmml.model.filters.PMMLFilter.startElement(PMMLFilter.java:69)
	at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
	at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
	at org.apache.xerces.impl.XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:357)
	at org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:357)
	at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:258)
	at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:229)
	at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:140)
	at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:123)
	at org.jpmml.model.JAXBUtil.unmarshal(JAXBUtil.java:41)
	at org.jpmml.model.JAXBUtil.unmarshalPMML(JAXBUtil.java:29)
	at org.jpmml.model.PMMLUtil.unmarshal(PMMLUtil.java:31)
	.... 

Could one expect a fix for these older versions of Spark?

reading pmml from hdfs

Hi, I am trying to read my PMML file from HDFS, but the java.io.InputStream that I create looks for the file on local storage, so I get a FileNotFoundException. Is there a way to read a PMML file from HDFS?
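One possible approach, sketched below, is to open the file through the Hadoop FileSystem API instead of java.io directly (the same pattern appears in another issue further down). This assumes the Hadoop Configuration points at the cluster; inside a Spark job it can be taken from spark.sparkContext().hadoopConfiguration(). The class name is a placeholder:

import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.dmg.pmml.PMML;
import org.jpmml.model.PMMLUtil;

public class HdfsPmmlLoader {

	public static PMML load(String hdfsPath) throws Exception {
		Configuration conf = new Configuration();

		// FileSystem#open returns an FSDataInputStream, which is a regular java.io.InputStream
		try(FileSystem fs = FileSystem.get(conf); InputStream is = fs.open(new Path(hdfsPath))){
			return PMMLUtil.unmarshal(is);
		}
	}
}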

Invalid lambda deserialization at org.shaded.jpmml.evaluator.OutputFilters.$deserializeLambda$

Hi There,

I have been using this library in my project. I am getting the following error when I run a K-means clustering algorithm (or any other clustering algorithm) on a Hadoop data lake.

It works fine on a standalone machine, but fails on the data lake when run in YARN cluster mode. Interestingly, logistic regression, XGBoost and decision tree classification algorithms work fine both standalone and in yarn-cluster mode.

I am pasting the error stack trace, a snippet of the program and the pom.xml content below.
Note that it throws java.lang.IllegalArgumentException: Invalid lambda deserialization
at org.shaded.jpmml.evaluator.OutputFilters.$deserializeLambda$ (OutputFilters.java).

Error:

19/01/30 13:06:58 ERROR executor.Executor: Exception in task 0.2 in stage 1.0 (TID 3)
java.io.IOException: unexpected exception type
	at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1682)
	at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1254)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2076)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1973)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1565)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:479)
	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1248)
	... 78 more
Caused by: java.lang.IllegalArgumentException: Invalid lambda deserialization
	at org.shaded.jpmml.evaluator.OutputFilters.$deserializeLambda$(OutputFilters.java:21)
	... 88 more

My Program is:

public class AppNew {
    public static void main(String[] args) throws IOException, JAXBException, org.xml.sax.SAXException {
        // TODO Auto-generated method stub

        String fileName = args[0];
        String dataFileName = args[1];
        String writeLocation =args[2];

        SparkSession spark = SparkSession.builder().appName("jpmml").config("spark.master",args[3]).getOrCreate();

        Configuration conf = spark.sparkContext().hadoopConfiguration();
        FileSystem fs =  FileSystem.get(conf);


        EvaluatorBuilder evaluatorBuilder = new LoadingModelEvaluatorBuilder()
                .setLocatable(false)
                .setVisitors(new DefaultVisitorBattery())
                .load(fs.open(new Path(fileName)).getWrappedStream());
        Evaluator evaluator = evaluatorBuilder.build();
        evaluator.verify();

        TransformerBuilder pmmlTransformerBuilder = new TransformerBuilder(evaluator)
                .withTargetCols()
                .withOutputCols()
                .exploded(true);

        Transformer pmmlTransformer = pmmlTransformerBuilder.build();

        Random r= new Random();
        Dataset<Row> df = spark.read().option("header", "true").csv(dataFileName).toDF();
        Dataset<Row> tdf = pmmlTransformer.transform(df);

        tdf.printSchema();
        tdf.write().option("header","true").csv(String.format("%s_%s", writeLocation, r.nextLong()));
        spark.stop();
    }
}

My Pom file looks like this:

<dependencies>
    <dependency>
        <groupId>org.jpmml</groupId>
        <artifactId>jpmml-evaluator-spark</artifactId>
        <version>1.2.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.11</artifactId>
        <version>2.2.0</version>
        <exclusions>
            <exclusion>
                <groupId>org.jpmml</groupId>
                <artifactId>pmml-model</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>

                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/spring.handlers</resource>
                                </transformer>
                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.fxc.rpc.impl.member.MemberProvider</mainClass>
                                </transformer>
                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/spring.schemas</resource>
                                </transformer>
                            </transformers>
                            <relocations>
                                <relocation>
                                    <pattern>org.dmg.pmml</pattern>
                                    <shadedPattern>org.shaded.dmg.pmml</shadedPattern>
                                </relocation>
                                <relocation>
                                    <pattern>org.jpmml</pattern>
                                    <shadedPattern>org.shaded.jpmml</shadedPattern>
                                </relocation>
                                <relocation>
                                    <pattern>com.google.guava</pattern>
                                    <shadedPattern>com.shaded.google.guava</shadedPattern>
                                </relocation>
                                <relocation>
                                    <pattern>com.google.common</pattern>
                                    <shadedPattern>com.shaded.google.common</shadedPattern>
                                </relocation>
                            </relocations>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <compilerArgument>-XDignore.symbol.file</compilerArgument>
                </configuration>
            </plugin>
        </plugins>
    </build>

Appreciate your help in resolving this issue.

Thanks in advance.

Regards,
Ibrahim.

PMML version support upgrade needed

Hello,
I tried to load a PMML (version 4.3) model, which was exported from r2pmml, in jpmml-spark, but PMML version 4.3 is not supported.

Dots in column/feature-names and org.apache.spark.sql.AnalysisException: cannot resolve 'something.other'

I ran into this issue and was first reminded of a closed issue here with the same exception, but this one instead stems from the problem described at http://stackoverflow.com/questions/36000147/spark-1-6-apply-function-to-column-with-dot-in-name-how-to-properly-escape-coln
I don't think it's sensible to adjust the schema and eliminate the dot. Instead, dot-handling should be done inside the package, at least with a more explicit error or, ideally, by escaping dots in column names. A possible application-level workaround is sketched below.
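As an application-level workaround (not a library fix), Spark SQL can reference such columns if they are quoted with backticks, or the column can be renamed before the dataset is handed to the transformer; the column names here are taken from the example above:

// Rename the offending column up front
Dataset<Row> renamed = df.withColumnRenamed("something.other", "something_other");

// Or reference it literally by quoting it with backticks in a SQL expression
Dataset<Row> selected = df.selectExpr("`something.other`");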

How to improve my PMML model's accuracy rate

With the same data and arguments, when I use Spark ML my accuracy rate is 69%, but when I use PMML my accuracy rate is 47%. How can I improve it, and is there a reason that could cause this? Thank you very much for your help.

Rename transformer and transformer builder classes

Current names org.jpmml.evaluator.spark.Transformer and o.j.e.s.TransformerBuilder are too vague - there is no clear indication that they deal with the PMML data format (as implemented by the JPMML-Evaluator library). Perhaps adding a common prefix such as "PMML" (or "Evaluator") would be all that is needed.

Renaming classes is a breaking API change, so it would necessitate upgrading the JPMML-Evaluator-Spark base version from 1.2.X to 1.3.X.

not compatible with pmml-model-1.4.3

Hi, the package path of the ImportFilter class was changed in pmml-model 1.4.3, and the class no longer has the apply() method. I could only solve the problem by adding back the two original methods, apply() and createFilteredSource(). I hope you have a better solution.

Unable to resolve Maven Dependency

After including the given maven dependency:

                 <dependency>
			<groupId>org.jpmml</groupId>
			<artifactId>jpmml-evaluator-spark</artifactId>
			<version>1.1-SNAPSHOT</version>
		</dependency>

I am facing the following error while building the JAR:
The POM for org.jpmml:jpmml-evaluator-spark:jar:1.1-SNAPSHOT is missing, no dependency information available

Please provide a solution for this.

The period (.) in <output> creates problems

Classification Scenario

Context

Hello,
Sometimes the target variable in the dataset is pre-encoded as 0 and 1. But sometimes those 0 and 1 values are themselves encoded as floats, so they become 0.0 and 1.0.

Problem

Now, when we create the PMML for such data, the output section is generated like this:

<Output>
	<OutputField name="probability(0.0)" optype="continuous" dataType="double" feature="probability" value="0.0"/>
	<OutputField name="probability(1.0)" optype="continuous" dataType="double" feature="probability" value="1.0"/>
</Output>

So, when we try to predict on other data with the following statement:

Dataset<Row> result = pmmlTransformer.transform(DF);

The following error is generated:

Exception in thread "main" org.apache.spark.sql.AnalysisException: No such struct field `probability(0.0)` in income_>50K, probability(0.0), probability(1.0);
...

A Fix

So I tried manually removing the period from the .pmml file, and that worked correctly!
The updated output section that worked correctly:

<Output>
	<OutputField name="probability(0)" optype="continuous" dataType="double" feature="probability" value="0.0"/>
	<OutputField name="probability(1)" optype="continuous" dataType="double" feature="probability" value="1.0"/>
</Output>

I understand that this could be solved in the code that generates the PMML, but that might not always be possible. So, for convenience, I would ask this community to fix this at the JPMML level.

Thanking you 😄

Support for spark version 3.5.0?

Hey there! I am using the latest version 1.3.0 of jpmml-evaluator-spark, but after upgrading to the latest Spark version 3.5.0 I am getting this error:

untyped Scala UDF

ERROR org.apache.spark.ml.util.Instrumentation - org.apache.spark.sql.AnalysisException: [UNTYPED_SCALA_UDF] You're using untyped Scala UDF, which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument, and the closure will see the default value of the Java type for the null argument, e.g. `udf((x: Int) => x, IntegerType)`, the result is 0 for null input. To get rid of this error, you could:
1. use typed Scala UDF APIs(without return type parameter), e.g. `udf((x: Int) => x)`.
2. use Java UDF APIs, e.g. `udf(new UDF1[String, Integer] { override def call(s: String): Integer = s.length() }, IntegerType)`, if input types are all non primitive.
3. set "spark.sql.legacy.allowUntypedScalaUDF" to "true" and use this API with caution.
	at org.apache.spark.sql.errors.QueryCompilationErrors$.usingUntypedScalaUDFError(QueryCompilationErrors.scala:3157)
	at org.apache.spark.sql.functions$.udf(functions.scala:8299)
	at org.jpmml.evaluator.spark.PMMLTransformer.transform(PMMLTransformer.scala:99)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$4(Pipeline.scala:311)
	at org.apache.spark.ml.MLEvents.withTransformEvent(events.scala:146)
	at org.apache.spark.ml.MLEvents.withTransformEvent$(events.scala:139)
	at org.apache.spark.ml.util.Instrumentation.withTransformEvent(Instrumentation.scala:42)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$3(Pipeline.scala:311)
	at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
	at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:198)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$2(Pipeline.scala:310)
	at org.apache.spark.ml.MLEvents.withTransformEvent(events.scala:146)
	at org.apache.spark.ml.MLEvents.withTransformEvent$(events.scala:139)
	at org.apache.spark.ml.util.Instrumentation.withTransformEvent(Instrumentation.scala:42)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$1(Pipeline.scala:308)
	at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
	at scala.util.Try$.apply(Try.scala:213)
	at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
	at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:307)

After using "spark.sql.legacy.allowUntypedScalaUDF", "true" its working fine.

Will there be any update from your side to solve this?

I found this related closed issue: #43, for Spark version 3.1.1.

Model "data schema" exploration methods

The transformer class should provide methods for querying the model's "native" data schema (all fields as defined in the PMML document) and its "mapped" schema (as configured during transformer building). The return type of such methods could be Apache Spark's standard StructType, which can then be inspected and processed using built-in tools.
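A hypothetical sketch of what such methods could look like; the method names getNativeSchema and getMappedSchema are placeholders, not existing API, and pmmlTransformer stands for an already built transformer instance:

// Hypothetical methods - not part of the current API
StructType nativeSchema = pmmlTransformer.getNativeSchema();
StructType mappedSchema = pmmlTransformer.getMappedSchema();

// StructType can be inspected with Apache Spark's built-in tools
nativeSchema.printTreeString();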
