
Comments (5)

henridf commented on July 17, 2024

This project is all pretty raw and early on, but it already exposes (some of) the Spark APIs (see README.md).

So you could definitely access the Spark DataFrame APIs via this. I'll try to add some examples soon, but in the meantime the simplest way would be to mimic the relevant bits of bin/spark-node (in particular, create a sqlContext object by calling spark.sqlContext(..)).

from apache-spark-node.

bf commented on July 17, 2024

Thanks @henridf for your fast response.

Do you think it is feasible to implement the wordcount.py script using the current API (with some fiddling, of course)? I am currently calling spark-submit from Node.

It goes like this:

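    import sys
    from operator import add

    from pyspark import SparkContext

    sc = SparkContext(appName="PythonWordCount")
    # Read the input file (path given as the first CLI argument) into 1 partition.
    lines = sc.textFile(sys.argv[1], 1)
    # Split lines into words, pair each word with 1, then sum the counts per word.
    counts = lines.flatMap(lambda x: x.split(' ')) \
                  .map(lambda x: (x, 1)) \
                  .reduceByKey(add)
    output = counts.collect()
    for (word, count) in output:
        print("%s: %i" % (word, count))
    sc.stop()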
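
For readers unfamiliar with the RDD operations in that script, here is a rough plain-Python sketch of what the flatMap / map / reduceByKey chain computes. It runs on an in-memory list rather than on Spark, and the function name `word_count` is made up for illustration; it is not part of wordcount.py or of apache-spark-node.

```python
from collections import defaultdict
from operator import add


def word_count(lines):
    """Mimic flatMap -> map -> reduceByKey(add) on an in-memory list of lines.

    Hypothetical helper for illustration only; no Spark involved.
    """
    # flatMap: split every line on single spaces and flatten the results.
    words = [word for line in lines for word in line.split(' ')]
    # map: pair each word with an initial count of 1.
    pairs = [(word, 1) for word in words]
    # reduceByKey(add): combine the counts of identical words.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] = add(counts[word], n)
    return dict(counts)


print(word_count(["to be or", "not to be"]))
```

Note that splitting on a single space, as the script does, turns consecutive spaces into empty-string "words", so real input may need a more forgiving tokenizer.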


henridf commented on July 17, 2024

Yes! It is possible to do word count with the current API. Check out the example that I just added to the README.

You'll notice that it uses the DataFrame APIs rather than the RDD APIs that wordcount.py uses. The DataFrame APIs are the preferable choice wherever they can do the job.
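
To make the RDD-versus-DataFrame contrast concrete, here is a rough sketch of a DataFrame-style word count written in PySpark, so it can be compared line for line with wordcount.py. This is not the README example (that one uses the Node bindings, which are not reproduced here); the function name `dataframe_word_count` is hypothetical, and the sketch assumes a Spark version that provides `SparkSession`.

```python
def dataframe_word_count(spark, path):
    """Return a DataFrame of (word, count) rows for the text file at `path`.

    `spark` is assumed to be an existing pyspark.sql.SparkSession.
    Hypothetical helper for illustration only.
    """
    # Imported lazily so the function can be defined without PySpark installed.
    from pyspark.sql import functions as F

    # spark.read.text yields one row per line in a string column named "value".
    lines = spark.read.text(path)
    # Split each line on spaces and emit one row per word.
    words = lines.select(F.explode(F.split(F.col("value"), " ")).alias("word"))
    # Group identical words and count them; the result has columns "word" and "count".
    return words.groupBy("word").count()
```

With a session created via `SparkSession.builder.appName("WordCount").getOrCreate()`, calling `dataframe_word_count(spark, "input.txt").show()` would print the counts; the file name here is a placeholder.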


bf commented on July 17, 2024

Wow, thank you very much, this is great! I will try to contribute to this project over the next few weeks as I incorporate Spark into my current project.


henridf commented on July 17, 2024

Cool! Keep the questions/feedback coming as you make progress.

