Comments (17)
I've seen this kind of problem a few times, and it is incredibly hard to
debug. It's usually a classloader problem, I think, and I'm unfortunately
not great at debugging it (you can guess my frustration level the last time
I debugged this, which is when I created nonStupidObjectInputStream...)
This is going to sound very hacky, but... could you try creating a new
class in epic's package explicitly before loading the model? Something as
simple as val x = new epic.features.BrownClusterFeature("foo")
You might also appeal to the spark user list. I'm happy to help with it as
best I can, but it isn't Epic-specific (I think!) and they have a lot more
expertise dealing with serialization problems caused by remoting and
classloaders.
-- David
On Wed, Nov 19, 2014 at 8:35 AM, Tim Croydon [email protected]
wrote:
Hi there,
I'm trying to use epic in an Apache Spark Streaming environment but I'm
experiencing some difficulty loading the models. I'm not really sure
whether this is an Epic issue, a Spark issue or where/how to solve this
now! I get the following exception (for English NER):

Exception in thread "main" java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.HashMap$SerializationProxy to field epic.features.BrownClusterFeaturizer.epic$features$BrownClusterFeaturizer$$clusterFeatures of type scala.collection.immutable.Map in instance of epic.features.BrownClusterFeaturizer
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
... trimmed ...
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at breeze.util.package$.readObject(package.scala:21)
at epic.models.package$.deserialize(package.scala:54)
... trimmed calls from my code ...

I've tried running my code (compiled into an uberjar using 'sbt assembly') in
a raw scala console and I can load the model and run it fine. However,
using Spark, I get the exception described. The ONLY difference as far as I
can tell is the way the model file is referenced. For the raw scala
environment, I can point directly at the model file on disk (e.g. new
File("mymodels/model.ser.gz")) and it loads. In Spark, I have to load the
file doing something similar to:

sc.addFile("model.ser.gz")
new File(SparkFiles.get("model.ser.gz"))

I've tried narrowing the code down, and whether I point at the model
extracted from the jar or at the jar itself, I get the same result. It's
definitely loading the file (I think) as it fails in other ways if the file
doesn't exist. I even tried bypassing the Breeze
nonStupidObjectInputStream, to no avail.

Any idea what's going on or how to test? For reference, my JVM is 1.7.0_51,
the same in both the scala and Spark environments.

Thanks.
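For context, the ClassCastException above is the classic symptom of Java deserialization resolving classes through a different classloader than the one that holds the application's classes. A minimal, Spark-free sketch of the usual workaround (the class name ContextObjectInputStream and the round-trip harness are my own, not Epic's or Spark's actual code): an ObjectInputStream that resolves classes via the thread's context classloader.

```scala
import java.io._

// Hypothetical helper: resolve classes through the current thread's context
// classloader instead of the default (the classloader of the calling code).
// Frameworks like Spark install their own classloaders on worker threads, so
// this is the usual knob when HashMap$SerializationProxy-style casts fail.
class ContextObjectInputStream(in: InputStream) extends ObjectInputStream(in) {
  override def resolveClass(desc: ObjectStreamClass): Class[_] =
    try Class.forName(desc.getName, false, Thread.currentThread.getContextClassLoader)
    catch { case _: ClassNotFoundException => super.resolveClass(desc) }
}

object RoundTrip {
  // Serialize a value and read it back through the context classloader.
  def roundTrip[A](value: A): A = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(value)
    oos.close()
    val ois = new ContextObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
    try ois.readObject().asInstanceOf[A] finally ois.close()
  }

  def main(args: Array[String]): Unit = {
    // An immutable HashMap, the collection named in the exception above.
    val m = scala.collection.immutable.HashMap("foo" -> 1, "bar" -> 2)
    println(roundTrip(m) == m) // prints true
  }
}
```

In the Spark setting the equivalent move is making sure that whatever stream reads the model resolves classes against the classloader that actually contains the epic classes on the executor.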
from epic.
I tried your suggestion and was able to create a BrownClusterFeature object with no trouble, so it doesn't look like a classloader issue (as far as I can tell). It feels more like the kind of problem you get when serialising with one version and deserialising with another, although given that the file can be deserialised in raw scala, it's almost as if something's happening to the file stream.
I'll have a closer look at the Spark side to see if I can find similar issues there.
Thanks for the prompt response and for the library!
Is there maybe something going on with different scala versions? (Or, less
likely, Breeze versions?)
I'm compiling to 2.10.4 and my installed scala version matches that. However, there is a Breeze dependency at a different version - looks like nak pulls in an older version of breeze_natives:
'What depends on' Breeze 0.8:
[info] org.scalanlp:breeze_2.10:0.8 (evicted by: 0.9)
[info] +-org.scalanlp:breeze-natives_2.10:0.8 [S]
[info] +-org.scalanlp:nak_2.10:1.3 [S]
[info] +-org.scalanlp:epic_2.10:0.2 [S]
[info] +-my stuff
[info] +-org.scalanlp:epic-ner-en-conll_2.10:2014.10.26 [S]
[info] | +-my stuff
[info] |
[info] +-org.scalanlp:epic-parser-en-span_2.10:2014.9.15 [S]
[info] +-my stuff
And same for Breeze 0.9:
[info] org.scalanlp:breeze_2.10:0.9 [S]
[info] +-org.scalanlp:breeze-natives_2.10:0.8 [S]
[info] | +-org.scalanlp:nak_2.10:1.3 [S]
[info] | +-org.scalanlp:epic_2.10:0.2 [S]
[info] | +-my stuff
[info] | +-org.scalanlp:epic-ner-en-conll_2.10:2014.10.26 [S]
[info] | | +-my stuff
[info] | |
[info] | +-org.scalanlp:epic-parser-en-span_2.10:2014.9.15 [S]
[info] | +-my stuff
[info] |
[info] +-org.scalanlp:epic_2.10:0.2 [S]
[info] | +-my stuff
[info] | +-org.scalanlp:epic-ner-en-conll_2.10:2014.10.26 [S]
[info] | | +-my stuff
[info] | |
[info] | +-org.scalanlp:epic-parser-en-span_2.10:2014.9.15 [S]
[info] | +-my stuff
[info] |
[info] +-org.scalanlp:nak_2.10:1.3 [S]
[info] +-org.scalanlp:epic_2.10:0.2 [S]
[info] +-my stuff
[info] +-org.scalanlp:epic-ner-en-conll_2.10:2014.10.26 [S]
[info] | +-kafkareader:kafkareader_2.10:0.1 [S]
[info] |
[info] +-org.scalanlp:epic-parser-en-span_2.10:2014.9.15 [S]
[info] +-my stuff
No idea if that might cause problems?
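If mixed Breeze versions were the culprit, sbt can force a single version so that no module is compiled against 0.8 but run against 0.9. A hedged build.sbt fragment (sbt 0.13-era syntax; the version numbers are copied from the tree above and not verified against epic's actual requirements):

```scala
// build.sbt fragment (sketch): pin breeze and breeze-natives to one version
// so the eviction shown in the dependency tree cannot mix 0.8 and 0.9.
dependencyOverrides ++= Set(
  "org.scalanlp" %% "breeze"         % "0.9",
  "org.scalanlp" %% "breeze-natives" % "0.9"
)
```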
nak is declared intransitive() so that shouldn't be a problem. (Seems like a bug in the dependency graph plugin...)
Hi there,
I just googled, looking for a solution to a similar problem in a project I'm working on, and we found and fixed the cause (I'm not sure if it fixes your current problem).
We solved it by adding the missing classpath dependencies when creating the SparkContext (not only the direct dependencies):
val sparkConf = new SparkConf().setJars(Seq("...")) // Add all transitive dependencies that Spark workers might need.
Hope this helps.
Regards!
@timcroydon Any chance you found a solution to this problem? Running into the same issue.
@acvogel the solution @JSantosP provided doesn't work?
I don't recall now, I'm afraid. For various unrelated reasons, we ended up using a different library for similar functionality so I don't think I ever got round to investigating this fully - sorry!
@reactormonk I haven't gotten it to work by that route, but perhaps I'm missing something. I assemble the project into a single jar, and also add dependent jars:
new SparkConf().setJars(Seq("/root/myBigJar.jar", "/root/epic-ner-en-conll_2.10-2015.1.25.jar", "/root/epic_2.10-0.3.jar"))
Perhaps I'm not following @JSantosP's suggestion correctly, as those jars should be included in myBigJar.jar anyway.
@timcroydon Thanks for your reply!
There's a jar from February that works, I believe. Can't fix atm.
I've been using the 2015.2.19 data files combined with the sources from https://github.com/dlwh/epic/tree/e0238ceb16fc9adb9511240638357e8c44200a2f. The files from February work, but I believe this tree is the last one that works. I covered some of it in #24 IIRC.
I don't know if this will solve your specific issue, but it is the latest version I believe will work. From there, maybe you could fix whatever CCE is holding back usage under Spark.
https://gist.github.com/briantopping/369fb337735c1b726337 is the complete dependency closure from the subproject I am using.
I had the same problem and @JSantosP's solution worked for me. Thank you.
What is the final solution? I have the same problem: I build a single jar file and it works locally, but when I submit to Spark it throws java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.HashMap$SerializationProxy to field epic.features.BrownClusterFeaturizer.epic$features$BrownClusterFeaturizer$$clusterFeatures of type scala.collection.immutable.Map in instance of epic.features.BrownClusterFeaturizer
Can anyone help? Thanks a lot.
@ltao80 I never got it to work and gave up. I'd be curious to hear from anyone else with a detailed solution.
@acvogel Thank you for your reply. I gave up too and switched to Stanford NLP instead.
I'm facing the same problem (see here [1]). I've tried @JSantosP's suggestion and added several dependencies to the SparkConf:

val path = "/home/.../.../spark-fun/jars/"
val conf = new SparkConf().setAppName("wordCount").setJars(Seq(
  path + "epic_2.10-0.3.jar",
  path + "epic-ner-en-conll_2.10-2015.1.25.jar",
  path + "nak_2.10-1.3.jar",
  path + "scala-logging-api_2.10-2.1.2.jar",
  path + "scala-logging-slf4j_2.10-2.1.2.jar",
  path + "breeze_2.10-0.11-M0.jar",
  path + "spark-assembly-1.5.2-hadoop2.6.0.jar",
  path + "spark-fun-assembly-1.0.jar"
))

Do I need the path here? I also wonder why I should add these jars to the SparkConf at all. Using a fat jar generated with sbt assembly should be enough, right? The project dependency tree looks like [2]. Do I really need to add all of these dependencies to the SparkConf?
[1] https://github.com/Tooa/spark-fun
[2] https://gist.github.com/Tooa/a2d364d7d457c64dd68f
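For what it's worth, an alternative to enumerating jars in the SparkConf is to let spark-submit distribute the assembly itself. A sketch of the invocation (the class name and master URL are placeholders, not taken from the project above):

```shell
# Sketch: submit the fat jar built by `sbt assembly`; spark-submit ships the
# application jar to the executors, so setJars should not need to repeat what
# the assembly already bundles. Class name and master URL are placeholders.
spark-submit \
  --class com.example.WordCount \
  --master spark://host:7077 \
  spark-fun-assembly-1.0.jar
```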