This project is all pretty raw and early on, but it already exposes (some of) the Spark APIs (see README.md).
So you could definitely access the Spark DataFrame APIs through it. I'll try to add some examples soon, but in the meantime the simplest way would be to mimic the relevant bits of bin/spark-node (in particular, create a sqlContext object by calling spark.sqlContext(..)).
from apache-spark-node.
Thanks @henridf for your fast response.
Do you think it is feasible to implement the wordcount.py script using the current API (with some fiddling, of course)? I am currently calling spark-submit from Node. It goes like this:
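```python
import sys
from operator import add
from pyspark import SparkContext

sc = SparkContext(appName="PythonWordCount")
# Read the input file (path given on the command line) as an RDD of lines
lines = sc.textFile(sys.argv[1], 1)
# Split lines into words, pair each word with 1, then sum the 1s per word
counts = lines.flatMap(lambda x: x.split(' ')) \
              .map(lambda x: (x, 1)) \
              .reduceByKey(add)
# Bring the (word, count) pairs back to the driver and print them
output = counts.collect()
for (word, count) in output:
    print("%s: %i" % (word, count))
sc.stop()
```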
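To make the semantics of the RDD pipeline in wordcount.py concrete without a cluster, here is a plain-Python sketch that mimics each step (flatMap, then map to (word, 1), then reduceByKey) on an in-memory list of lines. This is not PySpark and not the apache-spark-node API; the helper names flat_map, reduce_by_key and word_count are hypothetical and only illustrate what each transformation computes.

```python
from functools import reduce
from itertools import chain
from operator import add


def flat_map(func, items):
    # Like RDD.flatMap: apply func to each item and flatten the results
    return list(chain.from_iterable(func(x) for x in items))


def reduce_by_key(func, pairs):
    # Like RDD.reduceByKey: combine all values sharing a key using func
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return [(key, reduce(func, values)) for key, values in grouped.items()]


def word_count(lines):
    words = flat_map(lambda x: x.split(' '), lines)
    pairs = [(w, 1) for w in words]  # like .map(lambda x: (x, 1))
    return reduce_by_key(add, pairs)


if __name__ == "__main__":
    for word, count in word_count(["to be or", "not to be"]):
        print("%s: %i" % (word, count))
```

Note that, like the original script, splitting on a single space yields empty-string "words" when a line contains consecutive spaces.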
Yes! It is possible to do word count with the current API. Check out the example that I just added to the README.
You'll notice that it uses the DataFrame APIs rather than the RDD APIs used by wordcount.py. Using DataFrames is preferable wherever they can express the job.
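Roughly, a DataFrame word count is expressed as a query rather than a chain of lambdas: split each line into words, explode that into one row per word, then group by word and count. In PySpark those steps correspond to split and explode from pyspark.sql.functions followed by DataFrame.groupBy(...).count(). The sketch below only mimics the shape of that plan in plain Python, using a list of row dictionaries; it is not the README example or the apache-spark-node API, and the function name is hypothetical.

```python
from collections import Counter


def dataframe_style_word_count(lines):
    # Analogous to select(explode(split(value, " ")).alias("word")):
    # one row per word
    word_rows = [{"word": w} for line in lines for w in line.split(" ")]
    # Analogous to groupBy("word").count(): one row per distinct word
    counts = Counter(row["word"] for row in word_rows)
    return [{"word": w, "count": c} for w, c in counts.items()]


if __name__ == "__main__":
    for row in dataframe_style_word_count(["to be or", "not to be"]):
        print("%s: %i" % (row["word"], row["count"]))
```

One reason the declarative form tends to be preferable is that built-in column operations like these can be planned by Spark's optimizer and run inside the JVM, whereas RDD lambdas written in a guest language (Python in PySpark, for example) have to be executed outside it.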
Wow, thank you very much, this is great! I will try to contribute to this project over the next few weeks as I incorporate Spark into my current project.
Cool! Keep the questions/feedback coming as you make progress.
Related Issues
- Write up instructions for using with a notebook
- Failure at installation
- How to connect to s Spark Standalone Cluster?
- Question: Is it somehow possible to use a Context and execute statements in parallel?
- Async APIs
- Support concurrent jobs
- Add promise version of async functions
- Make spark-node ES7-aware
- unionAll not working?
- Problem with spark-node
- Failed to start spark-node with No Java runtime present error.
- Querying a cassandra DB via spark
- Would you like the "spark" module name on npm?
- Streaming support from kafka ?
- unable to install
- Read data from mysql
- Unable to install module with node version of 8.1.4
- node-gyp binding error on windows10
- Error with ASSEMBLY_JAR
- Let's reactivate the project?