Comments (6)
Can you provide more technical context? For example, a stack trace of an exception?
Intuitively, I would suspect the following invocation of DataFrame#apply(String): https://github.com/jpmml/jpmml-spark/blob/master/pmml-spark/src/main/java/org/jpmml/spark/PMMLTransformer.java#L107
In the PMML specification there is no such thing as a reserved word or character. A field name can be literally anything (e.g. an empty string "" or the Java reserved word class).
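For illustration, a minimal (hypothetical, not taken from the reporter's actual model) PMML data dictionary entry with a dotted field name, which is perfectly legal per the PMML specification:

```xml
<DataDictionary numberOfFields="1">
  <!-- "Sepal.Length" is a legal PMML field name, but Spark SQL
       parses an unquoted dot as struct field access -->
  <DataField name="Sepal.Length" optype="continuous" dataType="double"/>
</DataDictionary>
```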
from jpmml-evaluator-spark.
See Issue #3 for the equivalent stack trace - I'm pretty sure it's identical (if for a different reason), and sadly not very helpful, because it mostly happens within the DataFrame/SparkSQL internals of Spark.
To reproduce, add a dot to a PMML attribute name and to the corresponding CSV header name. Running the example code will then trigger the exception.
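The usual workaround in Spark SQL is to wrap such a name in backticks (Spark's identifier-quoting syntax) before passing it to DataFrame#col(String), so the dot is not parsed as struct field access. A minimal sketch of such a (hypothetical) escaping helper — the class and method names are illustrative, not part of the jpmml-spark API:

```java
public class ColumnNames {

    // Backtick-quote a column name that contains a dot, doubling any
    // embedded backticks, so Spark SQL treats it as a single identifier
    // instead of nested struct field access.
    static String escape(String name) {
        if (name.indexOf('.') >= 0) {
            return "`" + name.replace("`", "``") + "`";
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(escape("Sepal.Length")); // prints `Sepal.Length`
        System.out.println(escape("Species"));      // prints Species
    }
}
```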
The original stack trace:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot resolve column name "Sepal.Length" among (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species);
at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:152)
at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:152)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:151)
at org.apache.spark.sql.DataFrame.col(DataFrame.scala:708)
at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:696)
at org.jpmml.spark.PMMLTransformer$1.apply(PMMLTransformer.java:107)
at org.jpmml.spark.PMMLTransformer$1.apply(PMMLTransformer.java:103)
at com.shaded.google.common.collect.Lists$TransformingRandomAccessList$1.transform(Lists.java:638)
at com.shaded.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
at java.util.AbstractCollection.toArray(AbstractCollection.java:141)
at java.util.ArrayList.<init>(ArrayList.java:164)
at com.shaded.google.common.collect.Lists.newArrayList(Lists.java:146)
at org.jpmml.spark.PMMLTransformer.transform(PMMLTransformer.java:113)
at org.apache.spark.ml.PipelineModel$$anonfun$transform$1.apply(Pipeline.scala:297)
at org.apache.spark.ml.PipelineModel$$anonfun$transform$1.apply(Pipeline.scala:297)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:108)
at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:297)
at org.jpmml.spark.EvaluationExample.main(EvaluationExample.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The transformer class org.jpmml.spark.PMMLTransformer is now able to deal with column names that contain special characters. However, these column names still cause problems for Apache Spark core classes/methods (e.g. the DataFrame#withColumn(String, Column) method).
Here's a "residual" stack trace:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'Sepal.Length' given input columns Sepal.Width, Species, pmml, Petal.Length, Sepal.Length, Petal.Width;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:319)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:319)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:53)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:318)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:107)
at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:117)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:121)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:121)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:125)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:125)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:57)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:105)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$withPlan(DataFrame.scala:2165)
at org.apache.spark.sql.DataFrame.select(DataFrame.scala:751)
at org.apache.spark.sql.DataFrame.withColumn(DataFrame.scala:1225)
at org.jpmml.spark.ColumnExploder.transform(ColumnExploder.java:77)
at org.apache.spark.ml.PipelineModel$$anonfun$transform$1.apply(Pipeline.scala:297)
at org.apache.spark.ml.PipelineModel$$anonfun$transform$1.apply(Pipeline.scala:297)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:108)
at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:297)
at org.jpmml.spark.EvaluationExample.main(EvaluationExample.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Thanks @vruusmann :)
The underlying problem is still annoying, but at least we can transform now :)
@RPCMoritz Currently, you cannot do TransformerBuilder#exploded(true), because the column explosion and pruning functionality depends on this still-broken Apache Spark functionality. Maybe there's a way to "flatten" the predictions struct column manually (e.g. via some low-level Scala APIs).
This problem will be solved after upgrading to Apache Spark 2.0(.1).
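One way to sketch such a manual flattening is to generate backtick-quoted SELECT expressions for every field of the struct column, and hand them to DataFrame#selectExpr(String...). The helper below only builds the expression strings; the class name, the "pmml" struct column name, and the field list are assumptions for illustration, not jpmml-evaluator-spark API:

```java
import java.util.ArrayList;
import java.util.List;

public class StructFlattener {

    // Build SELECT expressions that pull every field out of a struct
    // column, backtick-quoting each name (doubling embedded backticks)
    // so dots are not mistaken for further nested field access.
    static List<String> flattenExprs(String structCol, List<String> fields) {
        List<String> exprs = new ArrayList<>();
        for (String field : fields) {
            String quoted = "`" + field.replace("`", "``") + "`";
            exprs.add(structCol + "." + quoted + " AS " + quoted);
        }
        return exprs;
    }

    public static void main(String[] args) {
        // e.g. pass the result to DataFrame#selectExpr(String...)
        for (String expr : flattenExprs("pmml", List.of("Species", "Sepal.Length"))) {
            System.out.println(expr);
        }
    }
}
```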