Comments (17)

dlwh commented on September 24, 2024

I've seen this kind of problem a few times, and they're incredibly hard to
debug. It's usually a classloader problem, I think, and I'm unfortunately
not great at debugging those (you can guess my frustration level the last
time I debugged this, which is when I created nonstupidObjectInputStream...)

This is going to sound very hacky, but... could you try creating a new
class in epic's package explicitly before loading the model? Something as
simple as val x = new epic.features.BrownClusterFeature("foo")
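The idea, sketched generically (the `preload` helper below is illustrative, not part of epic's API):

```scala
// Illustrative sketch of the workaround: explicitly resolve a class in the
// current thread's context classloader before any deserialization runs, so
// readObject resolves classes against the same loader.
def preload(className: String): Class[_] =
  Class.forName(className, true, Thread.currentThread.getContextClassLoader)

// e.g. preload("epic.features.BrownClusterFeature") before loading the model
```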

You might also appeal to the spark user list. I'm happy to help with it as
best I can, but it isn't Epic-specific (I think!) and they have a lot more
expertise dealing with serialization problems caused by remoting and
classloaders.

-- David

On Wed, Nov 19, 2014 at 8:35 AM, Tim Croydon [email protected]
wrote:

Hi there,

I'm trying to use Epic in an Apache Spark Streaming environment, but I'm
having some difficulty loading the models. I'm not really sure whether this
is an Epic issue or a Spark issue, or how to solve it. I get the following
exception (for English NER):

Exception in thread "main" java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.HashMap$SerializationProxy to field epic.features.BrownClusterFeaturizer.epic$features$BrownClusterFeaturizer$$clusterFeatures of type scala.collection.immutable.Map in instance of epic.features.BrownClusterFeaturizer
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
... trimmed ...
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at breeze.util.package$.readObject(package.scala:21)
at epic.models.package$.deserialize(package.scala:54)
... trimmed calls from my code ...

I've tried running my code (compiled into uberjar using 'sbt assembly') in
a raw scala console and I can load the model and run it fine. However,
using Spark, I get the exception described. The ONLY difference as far as I
can tell is the way the model file is referenced. For the raw scala
environment, I can point directly at the model file on disk (e.g. new
File("mymodels/model.ser.gz")) and it loads. In Spark, I have to load the
file doing something similar to:

sc.addFile("model.ser.gz")
new File(SparkFiles.get("model.ser.gz"))

I've tried narrowing the code down, and whether I point at the model
extracted from the jar or at the jar itself I get the same result. It's
definitely loading the file (I think), as it fails in other ways if the
file doesn't exist. I even tried bypassing the Breeze
nonstupidObjectInputStream, to no avail.
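For reference, that bypass amounted to a plain stdlib deserialize along these lines (the helper name here is mine, just to illustrate):

```scala
import java.io._
import java.util.zip.GZIPInputStream

// Plain JDK deserialization of a gzipped, Java-serialized model file,
// i.e. roughly what the Breeze readObject path does minus the
// classloader tricks. Helper name is illustrative, not epic API.
def plainDeserialize[A](file: File): A = {
  val in = new ObjectInputStream(
    new GZIPInputStream(new BufferedInputStream(new FileInputStream(file))))
  try in.readObject().asInstanceOf[A]
  finally in.close()
}
```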

Any idea what's going on or how to test? For reference, my JVM is 1.7.0_51
and same in both scala and Spark environments.

Thanks.


Reply to this email directly or view it on GitHub
#17.

from epic.

timcroydon commented on September 24, 2024

I tried your suggestion and was able to create a BrownClusterFeature object with no trouble, so it doesn't look like a classloader issue (as far as I can tell). It feels more like the kind of problem you get when serialising with one version and deserialising with another, although given that the file can be deserialised in raw Scala, it's almost as if something is happening to the file stream.

I'll have a closer look at the Spark side to see if I can find similar issues there.

Thanks for the prompt response and for the library!

dlwh commented on September 24, 2024

Is there maybe something going on with different scala versions? (Or, less
likely, Breeze versions?)

timcroydon commented on September 24, 2024

I'm compiling against 2.10.4 and my installed Scala version matches. However, there is a Breeze dependency at a different version; it looks like nak pulls in an older version of breeze-natives:

'What depends on' Breeze 0.8:


[info] org.scalanlp:breeze_2.10:0.8 (evicted by: 0.9)
[info]   +-org.scalanlp:breeze-natives_2.10:0.8 [S]
[info]     +-org.scalanlp:nak_2.10:1.3 [S]
[info]       +-org.scalanlp:epic_2.10:0.2 [S]
[info]         +-my stuff
[info]         +-org.scalanlp:epic-ner-en-conll_2.10:2014.10.26 [S]
[info]         | +-my stuff
[info]         | 
[info]         +-org.scalanlp:epic-parser-en-span_2.10:2014.9.15 [S]
[info]           +-my stuff

And same for Breeze 0.9:


[info] org.scalanlp:breeze_2.10:0.9 [S]
[info]   +-org.scalanlp:breeze-natives_2.10:0.8 [S]
[info]   | +-org.scalanlp:nak_2.10:1.3 [S]
[info]   |   +-org.scalanlp:epic_2.10:0.2 [S]
[info]   |     +-my stuff
[info]   |     +-org.scalanlp:epic-ner-en-conll_2.10:2014.10.26 [S]
[info]   |     | +-my stuff
[info]   |     | 
[info]   |     +-org.scalanlp:epic-parser-en-span_2.10:2014.9.15 [S]
[info]   |       +-my stuff
[info]   |       
[info]   +-org.scalanlp:epic_2.10:0.2 [S]
[info]   | +-my stuff
[info]   | +-org.scalanlp:epic-ner-en-conll_2.10:2014.10.26 [S]
[info]   | | +-my stuff
[info]   | | 
[info]   | +-org.scalanlp:epic-parser-en-span_2.10:2014.9.15 [S]
[info]   |   +-my stuff
[info]   |   
[info]   +-org.scalanlp:nak_2.10:1.3 [S]
[info]     +-org.scalanlp:epic_2.10:0.2 [S]
[info]       +-my stuff
[info]       +-org.scalanlp:epic-ner-en-conll_2.10:2014.10.26 [S]
[info]       | +-kafkareader:kafkareader_2.10:0.1 [S]
[info]       | 
[info]       +-org.scalanlp:epic-parser-en-span_2.10:2014.9.15 [S]
[info]         +-my stuff

No idea if that might cause problems?

dlwh commented on September 24, 2024

nak is declared intransitive(), so that shouldn't be a problem. (Seems like a bug in the dependency-graph plugin...)

JSantosP commented on September 24, 2024

Hi there,

I just googled, looking for a solution to a similar problem in a project I'm working on, and we found and fixed the root cause (I'm not sure whether it fixes your problem as well).

We solved it by adding the missing classpath dependencies when creating the SparkContext (not only the direct dependencies):

  val sparkConf = new SparkConf().setJars(Seq(...)) // add every transitive dependency the Spark workers might need
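Enumerating the full transitive closure by hand is tedious; a stdlib-only helper along these lines (`jarsIn` is my name, not a Spark API) can feed `setJars`:

```scala
import java.io.File

// Gather the absolute paths of every jar in a directory, so the whole
// transitive closure can be handed to SparkConf.setJars in one go.
// Returns an empty Seq if the directory doesn't exist.
def jarsIn(dir: String): Seq[String] =
  Option(new File(dir).listFiles).getOrElse(Array.empty[File])
    .filter(_.getName.endsWith(".jar"))
    .map(_.getAbsolutePath)
    .toSeq

// val sparkConf = new SparkConf().setJars(jarsIn("lib/"))
```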

Hope this helps.

Regards!

acvogel commented on September 24, 2024

@timcroydon Any chance you found a solution to this problem? Running into the same issue.

reactormonk commented on September 24, 2024

@acvogel the solution @JSantosP provided doesn't work?

timcroydon commented on September 24, 2024

I don't recall now, I'm afraid. For various unrelated reasons, we ended up using a different library for similar functionality so I don't think I ever got round to investigating this fully - sorry!

acvogel commented on September 24, 2024

@reactormonk I haven't gotten it to work by that route, but perhaps I'm missing something. I assemble the project into a single jar, and also add dependent jars:

new SparkConf().setJars(Seq("/root/myBigJar.jar", "/root/epic-ner-en-conll_2.10-2015.1.25.jar", "/root/epic_2.10-0.3.jar"))

Perhaps I'm not following @JSantosP's suggestion correctly, as those jars should be included in myBigJar.jar anyway.

@timcroydon Thanks for your reply!

dlwh commented on September 24, 2024

There's a jar from February that works, I believe. I can't fix it at the moment.

briantopping commented on September 24, 2024

I've been using the 2015.2.19 data files combined with the sources from https://github.com/dlwh/epic/tree/e0238ceb16fc9adb9511240638357e8c44200a2f. The February files work, but I believe that tree is the last revision that does. I covered some of it in #24, IIRC.

I don't know whether this will solve your specific issue, but it is the latest version I believe will work. From there, maybe you could fix whatever ClassCastException is holding back usage under Spark.

https://gist.github.com/briantopping/369fb337735c1b726337 is the complete dependency closure from the subproject I am using.

lfernandez-stratio commented on September 24, 2024

I had the same problem and @JSantosP's solution worked for me. Thank you.

ltao80 commented on September 24, 2024

What is the final solution? I have the same problem: I build a single jar file and it works locally, but when I submit it to Spark it throws java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.HashMap$SerializationProxy to field epic.features.BrownClusterFeaturizer.epic$features$BrownClusterFeaturizer$$clusterFeatures of type scala.collection.immutable.Map in instance of epic.features.BrownClusterFeaturizer

Can anyone help? Thanks a lot.

acvogel commented on September 24, 2024

@ltao80 I never got it to work and gave up. I'd be curious to hear from anyone else with a detailed solution.

ltao80 commented on September 24, 2024

@acvogel Thank you for your reply. I gave up too and switched to Stanford NLP.

Tooa commented on September 24, 2024

I'm facing the same problem (see [1]). I've tried @JSantosP's suggestion and added several dependencies to the SparkConf:

val path = "/home/.../.../spark-fun/jars/"
val conf = new SparkConf().setAppName("wordCount").setJars(Seq(
  path + "epic_2.10-0.3.jar",
  path + "epic-ner-en-conll_2.10-2015.1.25.jar",
  path + "nak_2.10-1.3.jar",
  path + "scala-logging-api_2.10-2.1.2.jar",
  path + "scala-logging-slf4j_2.10-2.1.2.jar",
  path + "breeze_2.10-0.11-M0.jar",
  path + "spark-assembly-1.5.2-hadoop2.6.0.jar",
  path + "spark-fun-assembly-1.0.jar"
))

Do I need the path here? I also wonder why I should need to add these jars to the SparkConf at all; using a fat jar generated with sbt assembly should be enough, right? The project dependency tree looks like [2]. Do I really need to add all of these dependencies to the SparkConf?

[1] https://github.com/Tooa/spark-fun
[2] https://gist.github.com/Tooa/a2d364d7d457c64dd68f
