Comments (5)

mbossenbroek commented on July 26, 2024

Unfortunately I can't repro that one.

Does the problem happen locally, on the cluster, or both? What's the type of the key you're trying to join on?

This works for me:

(defn foo [x y]
  (prn x y)
  {:x x, :y y})

(deftest test-join
  (let [xs (pig/return [[1 "a"]
                        [1 "b"]
                        [2 "a"]])
        ys (pig/return [[1 "a"]
                        [2 "b"]
                        [2 "a"]])
        command (pig/join [(xs :on first)
                           (ys :on first)]
                          foo)]
    (is (= (pig/dump command)
           [{:x [2 "a"], :y [2 "b"]}
            {:x [2 "a"], :y [2 "a"]}
            {:x [1 "a"], :y [1 "a"]}
            {:x [1 "b"], :y [1 "a"]}]))))

I can also print from the function:

=> (test-join)
[2 "a"] [2 "b"]
[2 "a"] [2 "a"]
[1 "a"] [1 "a"]
[1 "b"] [1 "a"]
nil

Sometimes when running locally, code will execute on other threads. At least for CCW, this causes it to appear in the console instead of the REPL, which is kind of annoying. If you're using CCW, could you check the console output? If not, what editor are you using?

To repro, what commands are you using before the join? Are you loading data from a file, doing any transformations, etc?

Thanks,
Matt

On Sunday, March 30, 2014 at 5:57 PM, Jeff Terrell wrote:

With PigPen 0.2.3, I was using join, but instead of specifying an anonymous function inline with the join call, I defn'd a function and just used the name of the function. In other words, instead of:
(join [(xs :on first) (ys :on first)] (fn [x y] ...))

...I was doing:
(defn foo [x y] ...)
(join [(xs :on first) (ys :on first)] foo)

The second version produced output with the same structure as the first version, except that there were nils in most places. My guess is that the macros aren't quite evaluating things properly, but I don't know this for sure.
I can't share more details, unfortunately, as they are proprietary. Although, if you're having difficulty reproducing this issue, I can try to reproduce it in a way that I can share.
Relatedly, is there a good reason why my print statements don't work in the join function? If that's easy to fix, that would be helpful for my debugging.
Thanks very much!
-Jeff T.



from pigpen.

kyptin commented on July 26, 2024

I'm running locally, in a lein repl session. I'm using vim to edit the code.

I'm trying to join vectors. The key function for each vector is simply first.

I am doing a variety of transformations before the join, but I am not loading from a file.

I'll try to create a reproducible failure case tonight or tomorrow.

mbossenbroek commented on July 26, 2024

Thanks. What's the data type of the join key?

Are you joining large maps or data structures? Or is it joining numbers, strings, keywords, or some other primitive?

The thread-switching happens when you're locally reading from a file, so that's the only reason I can think of for the printing not working.

The example I listed before prints when I run from a lein repl too.

Let me know what you can come up with for a repro case!

-Matt

On Sunday, March 30, 2014 at 6:50 PM, Jeff Terrell wrote:

I'm running locally, in a lein repl session. I'm using vim to edit the code.
I'm trying to join vectors. The key function for each vector is simply first.
I am doing a variety of transformations before the join, but I am not loading from a file.
I'll try to create a reproducible failure case tonight or tomorrow.



kyptin commented on July 26, 2024

I'm joining on strings, so yeah, it's a primitive.

Heh, I guess it's on me to reproduce this, then—you've certainly done your due diligence. Thanks!

mbossenbroek commented on July 26, 2024

I followed up with Jeff on another thread & we found that the problem was a stale fn in the REPL. Restarting the REPL fixed the issue.

Right now I'm memoizing user functions based on what you pass to the pigpen operator. This has the unfortunate side effect of using stale versions of named functions. In your case this means that if you load foo, load the join, and then modify foo, it'll use the first version.
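
To make the staleness concrete, here is a minimal sketch of the effect in plain Clojure. It is not PigPen's actual code: `eval-user-code` is a hypothetical stand-in for whatever the operators do internally, and it assumes the user code is captured as a quoted form and evaluated through a memoized function.

```clojure
;; Hypothetical stand-in for the operator internals (an assumption, not PigPen's API):
;; evaluate a quoted code form once and cache the result by that form.
(def eval-user-code
  (memoize (fn [code] (eval code))))

;; First version of the named function.
(defn foo [x y] {:x x, :y y})

;; The first call evaluates the symbol and caches the resulting function object.
(def first-result ((eval-user-code `foo) 1 2))   ; {:x 1, :y 2}

;; Redefine foo, as you would while editing code in a REPL session.
(defn foo [x y] {:sum (+ x y)})

;; The cache key (the symbol) is unchanged, so the old function is returned.
(def stale-result ((eval-user-code `foo) 1 2))   ; still {:x 1, :y 2}

;; Evaluating without the cache picks up the new definition.
(def fresh-result ((eval `foo) 1 2))             ; {:sum 3}
```

In this sketch an inline `(fn [x y] ...)` avoids the problem because editing its body changes the cache key, whereas a bare symbol looks identical before and after the redefinition.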

The reasons for this are historical and performance-related. I never want to re-eval the same code on the cluster, and the code never changes there, hence the memoization. In the past, defining a function outside the operator call wasn't supported, so this wasn't a problem.

Fix coming soon...

-Matt

On Sunday, March 30, 2014 at 7:08 PM, Jeff Terrell wrote:

I'm joining on strings, so yeah, it's a primitive.
Heh, I guess it's on me to reproduce this, then—you've certainly done your due diligence. Thanks!

