Comments (13)
Thanks for the detailed repro! From the error it looks like they have local vars with periods in them, which doesn't play well with the edn/read-string. I should be able to easily exclude those from the closure, but I'll make sure that they aren't required by the closure first.
from pigpen.
@mbossenbroek I'd love to know more about how you derived that (after staring at this code for an hour or two I'm thoroughly confused and suspect I might learn something).
Certainly! pigpen.code/trap rewrites your function like this:
=> (let [x 42]
     (pigpen.code/trap
       (fn [y] (+ x y))))
(pigpen.pig/with-ns pigpen-demo.core (clojure.core/let [x (quote 42)] (fn [y] (+ x y))))
What this returns is an expression which, when evaluated, will evaluate your user function within your namespace, with all of the lexical scope that was present at script generation time. Anything that's bound at that time ends up in that let that encloses your function. It's a way of freezing everything we know now and reviving it later on a Hadoop machine.
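To make the idea concrete, here is a minimal sketch of how a macro can capture its caller's locals and emit a let form like the one above. This is not pigpen's actual implementation: trap-sketch is a hypothetical name, it skips the with-ns wrapper, and it assumes every captured value can be quoted and printed.

```clojure
;; Hypothetical, simplified stand-in for pigpen.code/trap (not the real code).
;; &env is the implicit macro argument holding the locals visible at the call site.
(defmacro trap-sketch
  [f]
  (let [locals (vec (keys &env))]
    ;; Build, at runtime, a form like:
    ;;   (clojure.core/let [x (quote 42)] (fn [y] (+ x y)))
    `(list 'let
           (vec (mapcat (fn [[sym# v#]] [sym# (list 'quote v#)])
                        (map vector '~locals ~locals)))
           '~f)))

;; "Freeze" the function together with the local x.
(def frozen
  (let [x 42]
    (trap-sketch (fn [y] (+ x y)))))

;; frozen is plain data at this point; eval revives it into a function.
(def revived (eval frozen))

(println (revived 1)) ; prints 43
```

Because every local in scope is captured, anything a macro expansion happens to bind ends up in the let as well, whether or not the function uses it.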
I've seen in the past that macro expansion will often leave a bunch of junk in there that's not actually required by the user code. The for macro is a good example of that.
The java.lang.ClassNotFoundException is a classic example of Clojure interpreting any symbol with a period as a Java class and trying to load it:
=> (eval '(prn x.y))
CompilerException java.lang.ClassNotFoundException: x.y, compiling:(/private/var/folders/54/cllx6y1d0nz92rmz915fgc4mmjkfgm/T/form-init3678987450896995460.clj:1:8)
In pigpen, we take the result of pigpen.code/trap above, pr-str it, put it in the script, read it, eval it, and run it. If you're getting that error, that symbol is likely getting into the closure somehow and failing when we try to eval it.
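The round trip described here can be sketched with plain Clojure functions. The helper names ship-and-revive and root-cause are made up for illustration, and the real pipeline also involves the generated script and the with-ns wrapper, which this sketch leaves out.

```clojure
;; Hypothetical helper: walk to the innermost cause of an exception.
(defn root-cause [^Throwable t]
  (if-let [c (.getCause t)]
    (recur c)
    t))

;; Hypothetical helper: print a form to text, read it back, and eval it,
;; mimicking "pr-str it, put it in the script, read it, eval it".
(defn ship-and-revive [form]
  (-> form pr-str read-string eval))

;; A well-formed closure survives the trip.
(def ok-fn
  (ship-and-revive '(clojure.core/let [x (quote 42)] (fn [y] (+ x y)))))

;; A dotted symbol reads back fine, but eval treats x.y as a class name.
(def failure
  (try
    (ship-and-revive '(fn [] x.y))
    nil
    (catch Throwable t
      (root-cause t))))

(println (ok-fn 1))       ; prints 43
(println (class failure))
```

Note that reading the dotted symbol succeeds; the failure only appears at eval time, which is why such a symbol can travel all the way into the script before anything breaks.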
At least that's my guess at this point :)
oof, well the only thing I think I can offer at this point is that this dotted name that it can't find is probably needed. It's generated here https://github.com/Prismatic/plumbing/blob/master/src/plumbing/graph/positional.clj#L12-L30 and is building a record that is used in place of a map further on in the library for performance.
My worry initially is that I've seen very strange behavior in clojure with regards to records and file load ordering (which was in that case solved by AOT compiling certain namespaces). And injecting a record into the namespace at run time seems like a very easy thing to have break with the pigpen approach.
@mbossenbroek one other question: can you think of any workarounds for this in the short term? I have some uses for pigpen that will be blocked on a fix. A workaround would free me up to continue. Also, let me know if there's anything else I can do to help here.
None off the top of my head. Sorry I didn't get a chance to look at this yesterday - I'll have something for you today though.
HA! I have no expectation of you dropping everything and fixing my bugs ;-) I've been already very impressed with your responsiveness and mostly frustrated that I seem unable to fix this myself!
I'm trying to trace some of those trap calls to see if I can figure out what exactly is getting caught in there.
I found the problem & it wasn't what I thought it was. It's actually the serialization library that we use, nippy, that doesn't want to deserialize the record. What makes this even weirder is that I can only reproduce the problem if it's using the nippy jar that's AOT'ed into the pigpen jar.
I'll follow up with him & see what we can come up with.
Ooooooooh so records are serializable and nippy is happily serializing it, but when it goes to deserialize it can't find the reference and it blows up?
Maybe because this is gensym'ed, so it probably doesn't result in a class file?
It seems to happen for normal records too - it looks like the immediate problem is that pigpen uses AOT. When I turn AOT off, it works locally. If all else fails, I can disable AOT for pigpen; I was just using it to generate 32 nearly identical copies of a java class to work around a pig limitation.
The gensym might well be a problem down the road though, as one machine will serialize the record and another will deserialize it. If the generated records have different ids on different machines, it won't be able to deserialize the transported data. If those ids are locked in at jar compilation time (possibly via AOT), then this could work, but then we're back to the AOT problem.
Do you know if there's a way to disable record generation in prismatic's graph? Or at least have it generate stable ids?
It's possible to disable the record generation, but the cost to performance hurts (at least for us), where we're using this to process a very large stream of data.
That being said... we are AOT'ing our code before putting on the cluster so it's likely we'll be seeing this problem if we went that route anyway.
Generating stable ids is interesting, but goes beyond my understanding of Clojure. The way this is used though... it looks like you could just take a hash of the map that is used to generate the record and turn that into a symbol instead of using gensym.
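A rough sketch of that hashing idea, purely as an illustration: stable-record-name and the "graph-record-" prefix are hypothetical and not part of plumbing, and the sketch assumes the record is identified by its set of field keywords. It uses String.hashCode because Java specifies that value, so it should not vary between machines; a 32-bit hash can collide, so a real version would likely want a stronger digest.

```clojure
;; Hypothetical: derive a deterministic record name from the field keys,
;; instead of a gensym'ed name that differs per JVM session.
(defn stable-record-name
  [field-keys]
  (let [canonical (pr-str (vec (sort field-keys)))] ; order-independent text form
    (symbol (str "graph-record-"
                 (Integer/toHexString (.hashCode ^String canonical))))))

;; The same keys, in any order, give the same symbol.
(println (= (stable-record-name [:a :b :c])
            (stable-record-name [:c :a :b])))   ; prints true

;; gensym, by contrast, yields a fresh name on every call.
(println (= (gensym "graph-record") (gensym "graph-record"))) ; prints false
```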
IIRC, you said that disabling AOT resolved this, correct? Could I mark this as closed?
You may certainly mark it as closed, it is happily running now.
On Mon, May 11, 2015 at 10:39 AM, Matt Bossenbroek wrote:
IIRC, you said that disabling AOT resolved this, correct? Could I mark this as closed?