Comments (5)
Unfortunately I can't repro that one.
Does the problem happen locally, on the cluster, or both? What's the type of the key you're trying to join on?
This works for me:
(defn foo [x y]
  (prn x y)
  {:x x, :y y})

(deftest test-join
  (let [xs (pig/return [[1 "a"]
                        [1 "b"]
                        [2 "a"]])
        ys (pig/return [[1 "a"]
                        [2 "b"]
                        [2 "a"]])
        command (pig/join [(xs :on first)
                           (ys :on first)]
                          foo)]
    (is (= (pig/dump command)
           [{:x [2 "a"], :y [2 "b"]}
            {:x [2 "a"], :y [2 "a"]}
            {:x [1 "a"], :y [1 "a"]}
            {:x [1 "b"], :y [1 "a"]}]))))
I can also print from the function:
=> (test-join)
[2 "a"] [2 "b"]
[2 "a"] [2 "a"]
[1 "a"] [1 "a"]
[1 "b"] [1 "a"]
nil
Sometimes when running locally, code will execute on other threads. At least for CCW, this causes it to appear in the console instead of the REPL, which is kind of annoying. If you're using CCW, could you check the console output? If not, what editor are you using?
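For context, here is a minimal sketch of why output printed on another thread can bypass the REPL. It assumes only core Clojure behavior (the per-thread binding of `*out*`) and is not taken from PigPen's local runner; the name `captured` is illustrative.

```clojure
;; Sketch only: shows Clojure's per-thread *out* binding, not PigPen's local runner.
(def captured
  (with-out-str
    ;; Runs on the calling thread, so it is written to the rebound *out*.
    (print "same thread")
    ;; A raw java.lang.Thread does not inherit the binding, so this line goes
    ;; to the root *out* (the process's standard output) instead.
    (doto (Thread. #(println "other thread"))
      (.start)
      (.join))))

;; captured now holds only the text printed on the calling thread.
```

A REPL that rebinds `*out*` for its session behaves the same way, which would explain the output landing in the console rather than the REPL.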
To repro, what commands are you using before the join? Are you loading data from a file, doing any transformations, etc?
Thanks,
Matt
On Sunday, March 30, 2014 at 5:57 PM, Jeff Terrell wrote:
With PigPen 0.2.3, I was using join, but instead of specifying an anonymous function inline with the join call, I defn'd a function and just used the name of the function. In other words, instead of:
(join [(xs :on first) (ys :on first)] (fn [x y] ...))
...I was doing:
(defn foo [x y] ...)
(join [(xs :on first) (ys :on first)] foo)
The second version produced output with the same structure as the first version, except that there were nils in most places. My guess is that the macros aren't quite evaluating things properly, but I don't know this for sure.
I can't share more details, unfortunately, as they are proprietary. Although, if you're having difficulty reproducing this issue, I can try to reproduce it in a way that I can share.
Relatedly, is there a good reason why my print statements don't work in the join function? If that's easy to fix, that would be helpful for my debugging.
Thanks very much!
-Jeff T.
from pigpen.
I'm running locally, in a lein repl session. I'm using vim to edit the code.
I'm trying to join vectors. The key function for each vector is simply first.
I am doing a variety of transformations before the join, but I am not loading from a file.
I'll try to create a reproducible failure case tonight or tomorrow.
from pigpen.
Thanks. What's the data type of the join key?
Are you joining large maps or data structures? Or is it joining numbers, strings, keywords, or some other primitive?
The thread-switching happens when you're locally reading from a file, so that's the only reason I can think of for the printing not working.
The example I listed before prints when I run from a lein repl too.
Let me know what you can come up with for a repro case!
-Matt
On Sunday, March 30, 2014 at 6:50 PM, Jeff Terrell wrote:
I'm running locally, in a lein repl session. I'm using vim to edit the code.
I'm trying to join vectors. The key function for each vector is simply first.
I am doing a variety of transformations before the join, but I am not loading from a file.
I'll try to create a reproducible failure case tonight or tomorrow.
from pigpen.
I'm joining on strings, so yeah, it's a primitive.
Heh, I guess it's on me to reproduce this, then—you've certainly done your due diligence. Thanks!
from pigpen.
I followed up with Jeff on another thread & we found that the problem was a stale fn in the REPL. Restarting the REPL fixed the issue.
Right now I'm memoizing user functions based on what you pass to the pigpen operator. This has the unfortunate side effect of using stale versions of named functions. In your case this means that if you load foo, load the join, and then modify foo, it'll use the first version.
The reason for this is partly historical and partly performance. I never want to re-eval the same code on the cluster, and on the cluster you never change the code, hence the memoization. In the past, defining a function not-inline wasn't supported, so this wasn't a problem.
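A minimal sketch of how this kind of staleness can arise, assuming the cache is keyed on the code form that was passed in. `eval-memo` is a hypothetical stand-in for the memoized step, not PigPen's actual internals.

```clojure
;; Hypothetical stand-in for a memoized "evaluate user code" step.
;; This is an assumption for illustration, not PigPen's implementation.
(def eval-memo (memoize eval))

(defn foo [x y] {:x x, :y y})

;; First use: evaluates the symbol foo and caches the function it names now.
(def first-result ((eval-memo 'foo) 1 2))

;; Redefine foo, as you would while iterating at the REPL.
(defn foo [x y] {:x x, :y y, :version 2})

;; Cache hit on the same form: the original function is returned.
(def stale-result ((eval-memo 'foo) 1 2))

;; Without the cache, the current definition is picked up.
(def fresh-result ((eval 'foo) 1 2))
```

Here `stale-result` lacks the `:version` key while `fresh-result` has it. Restarting the REPL clears the cache, which matches the workaround described above.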
Fix coming soon...
-Matt
On Sunday, March 30, 2014 at 7:08 PM, Jeff Terrell wrote:
I'm joining on strings, so yeah, it's a primitive.
Heh, I guess it's on me to reproduce this, then. You've certainly done your due diligence. Thanks!
from pigpen.