Build and publish docs (apache-spark-node, open issue)

henridf commented on August 16, 2024
Build and publish docs

from apache-spark-node.

Comments (4)

tobilg commented on August 16, 2024

Could you also add some "developer docs" containing some concepts you're following? For example, I'd like to understand why you chose to wrap the functions etc. as JS. I thought that this would be usable OOTB (with some additional, "ugly" JS code, yes), but maybe I didn't understand this correctly.

Some insights would be great! Thanks in advance...

henridf commented on August 16, 2024

@tobilg yes, some sort of high-level docs explaining the design choices and tradeoffs is a good idea. It's probably still too early to write those because all of these concepts are still very much in flux, but in the meantime, here are some points related to your question.

I initially started out figuring that it would indeed be possible to use the imported objects without a wrapper. But pretty quickly I ran into issues that required at least some level of wrapping.

One example is dealing with defaults (two sample occurrences: https://github.com/henridf/apache-spark-node/blob/master/lib/DataFrame.js#L116 and https://github.com/henridf/apache-spark-node/blob/master/lib/sqlContext.js#L56)
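
To make the point about defaults concrete, here is a rough sketch of the kind of wrapping involved. It is not the code at the links above: `FakeJavaDataFrame`, its `showString(numRows, truncate)` method, and the default values are stand-ins invented for illustration. The idea is only that a Java method reached through a bridge wants every argument spelled out, so the JS wrapper fills in whatever the caller leaves off.

```javascript
// Hypothetical stand-in for a Java object reached through a bridge.
// Like a Java method signature, it insists on receiving every argument.
class FakeJavaDataFrame {
  showString(numRows, truncate) {
    if (typeof numRows !== "number" || typeof truncate !== "boolean") {
      throw new TypeError("showString(int, boolean) needs both arguments");
    }
    return `rows=${numRows} truncate=${truncate}`;
  }
}

// JS-side wrapper: callers may omit arguments and the wrapper supplies defaults.
class DataFrame {
  constructor(jvmDataFrame) {
    this.jvmObj = jvmDataFrame;
  }

  // The defaults (20 rows, truncation on) are assumptions for this sketch.
  show(numRows = 20, truncate = true) {
    return this.jvmObj.showString(numRows, truncate);
  }
}

const df = new DataFrame(new FakeJavaDataFrame());
console.log(df.show());
console.log(df.show(5));
```

Without the wrapper, every call site would have to pass both arguments itself, or hit the type error from the Java side.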

Another is converting to and from the native representation (for example, https://github.com/henridf/apache-spark-node/blob/master/lib/DataFrame.js#L402). When using the Java objects directly, head would return a bunch of opaque Java row objects, which the user would have no idea how to use.
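
As a sketch of what that conversion might look like (again with made-up stand-ins: `FakeJavaRow` and the helper names below are hypothetical, not the actual code in lib/DataFrame.js), the wrapper walks each opaque row and copies its values into a plain JS array:

```javascript
// Hypothetical stand-in for an opaque Java Row: values are only reachable via methods.
class FakeJavaRow {
  constructor(values) {
    this._values = values;
  }
  length() {
    return this._values.length;
  }
  get(i) {
    return this._values[i];
  }
}

// Convert one opaque row into a plain JS array by reading each field in turn.
function rowToArray(jvmRow) {
  const out = [];
  for (let i = 0; i < jvmRow.length(); i++) {
    out.push(jvmRow.get(i));
  }
  return out;
}

// Wrapper-side head(): take the raw rows and hand back native arrays.
function head(jvmRows) {
  return jvmRows.map(rowToArray);
}

const rawRows = [new FakeJavaRow(["Alice", 34]), new FakeJavaRow(["Bob", 45])];
const rows = head(rawRows);
console.log(rows);
```

The caller then works with ordinary arrays and never has to know the row objects came from the JVM.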

And of course, there's the issue of documenting what functions are available and how to use them (pointing people to the Scala documentation wouldn't work very well...).

I anticipate that over time, the wrappers will do more of this "convenience" work so that Spark can be used in a way that feels "node-like". This is similar to what the Python wrapper does.

tobilg commented on August 16, 2024

Thanks a lot for taking the time to explain this. I understand your design choices much better now. As I mostly use dataframe.toJSON(), I don't really struggle with the native data types, and therefore probably didn't notice the hurdles with the native representations.

I like the idea of providing convenience wrappers, but one downside I can think of is that once Spark changes its APIs, you'll potentially have to rewrite a lot of code. Or are you using some kind of code generation/transpilation (from the Scala sources)?

henridf commented on August 16, 2024

Once things are in decent shape, I'm hoping that keeping up with Spark API
changes won't be a huge burden. Generally they tend to avoid making breaking
API changes; it's more a case of the API surface expanding. We'll see
how that plays out in practice, but it seems to have worked reasonably well
for pyspark.

I've done the first pass manually from the Scala sources, with heavy use of
emacs/regexes, followed by hand-tweaking. Not an ideal process, but again it's
more of a one-time thing (I hope!).
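
For a flavour of what a regex-driven first pass can look like, here is a minimal sketch. It is not the process actually used (that was emacs plus hand edits): the Scala snippet, the pattern, and `generateStubs` are all invented for illustration, and real Spark sources are much messier than this.

```javascript
// Hypothetical Scala snippet; the method list is illustrative only.
const scalaSource = `
  def limit(n: Int): DataFrame = ...
  def select(col: String, cols: String*): DataFrame = ...
  def count(): Long = ...
`;

// Match "def name(params): ReturnType" when it sits on a single line.
const defPattern = /def\s+(\w+)\(([^)]*)\)\s*:\s*(\w+)/g;

function generateStubs(source) {
  const stubs = [];
  for (const match of source.matchAll(defPattern)) {
    const [, name, params] = match;
    // Keep only the parameter names; drop the Scala type annotations.
    const names = params
      .split(",")
      .map((p) => p.split(":")[0].trim())
      .filter((p) => p.length > 0);
    stubs.push(`${name}(${names.join(", ")}) { /* TODO: call through to the JVM object */ }`);
  }
  return stubs;
}

const stubs = generateStubs(scalaSource);
console.log(stubs.join("\n"));
```

Anything the pattern doesn't cover (multi-line signatures, implicit parameters, overloads) would still need the hand-tweaking step.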

(Sent by email in reply to tobilg's comment of 9 December 2015.)
