
Enhance configuration mechanics about josimtext (open)

fmarten commented on July 26, 2024
Enhance configuration mechanics

from josimtext.

Comments (3)

fmarten commented on July 26, 2024

This project might be good for inspiration:
https://github.com/apache/systemml


fmarten commented on July 26, 2024

There seem to be three interesting places for learning how to create scripts that provide a good entry point into Spark.

I am not yet sure how they fit together, though.

But you can see that they favor your solution of explicitly passing Spark configuration, such as --driver-memory. What I do not like is that they hard-code the defaults.
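To make the contrast concrete, here is a minimal sketch of a launcher that passes Spark settings as explicit spark-submit flags rather than through environment variables, while keeping the defaults in one overridable dictionary instead of hard-coding them into the command. The flags --class, --master, --driver-memory and --executor-memory are standard spark-submit options; the default values, the main class and the jar path are hypothetical placeholders.

```python
import shlex

# Hypothetical defaults, kept in one overridable place instead of being
# hard-coded into each launcher script.
SPARK_DEFAULTS = {
    "master": "yarn",
    "driver-memory": "8g",
    "executor-memory": "4g",
}


def build_spark_submit(main_class, jar, app_args, overrides=None):
    """Return a spark-submit argv that passes every Spark setting as an explicit flag."""
    settings = {**SPARK_DEFAULTS, **(overrides or {})}
    argv = ["spark-submit", "--class", main_class]
    for key, value in settings.items():
        argv += [f"--{key}", str(value)]
    # spark-submit expects its own options before the application jar;
    # everything after the jar is passed to the application's main method.
    argv.append(jar)
    argv += [str(arg) for arg in app_args]
    return argv


if __name__ == "__main__":
    cmd = build_spark_submit(
        "org.example.SomeJob",  # hypothetical main class
        "target/app.jar",       # hypothetical jar path
        ["input/", "output/"],
        overrides={"driver-memory": "16g"},
    )
    # Prints the full command so the settings of a run are visible.
    print(shlex.join(cmd))
```

Run as a script, this prints spark-submit --class org.example.SomeJob --master yarn --driver-memory 16g --executor-memory 4g target/app.jar input/ output/. The same list could then be executed with subprocess.run(cmd, check=True), so the printed command doubles as a record of exactly which settings a job used.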

What I like is that they have a single entry point.

I also have a suggestion for how this could work in our situation, even with the concerns you mentioned (giving researchers a fast starting point with an overview of all model parameters, without having to write them down manually). We could extract the model parameters into a separate key-value file and then provide only this key-value file for each "method". The nice part of this idea is that we can later regenerate such a file and include it in the output folder. (This is, by the way, similar to what Spark does in MLlib's model persistence.)
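A minimal sketch of that idea, assuming a plain key=value file per method. The file format, the output file name params.txt and all function names here are hypothetical, not something josimtext currently provides: one function reads the parameters, another writes the exact set that was used into the output folder, so each run documents itself.

```python
from pathlib import Path


def parse_params(text):
    """Parse 'key=value' lines; blank lines and '#' comments are ignored."""
    params = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected key=value, got {raw!r}")
        params[key.strip()] = value.strip()
    return params


def format_params(params):
    """Serialize parameters back to 'key=value' lines, sorted for stable diffs."""
    return "".join(f"{key}={value}\n" for key, value in sorted(params.items()))


def load_params(path):
    """Read a method's parameter file (hypothetical path, e.g. config/wsd.params)."""
    return parse_params(Path(path).read_text(encoding="utf-8"))


def save_params(params, output_dir):
    """Store the parameters actually used next to the results of the run."""
    target = Path(output_dir) / "params.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(format_params(params), encoding="utf-8")
    return target
```

Because format_params emits exactly what parse_params accepts, a params.txt saved in an output folder could be fed back in to reproduce that run.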

My main point is that a single entry point would reduce boilerplate and make it easier to resolve issues in the scripts.

  • Take, for example, the last argument of any of the 20 scripts, which is <config.sh>. If you open any of those scripts, you see that <config.sh> is sourced and then some variables are used that are never defined before. A reader might assume that <config.sh> contains those variables and that it lives in the config folder. But it takes enough reasoning to get there that it is questionable how explicit this really is.
  • Another problem is that the model parameters are not named on the command line.
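Both problems could be addressed by one entry point that takes the method name plus named parameters and refuses to start when a parameter is missing, instead of letting an undefined shell variable silently expand to an empty string. Below is a minimal sketch using Python's standard argparse module; the program name josimtext-run, the --param flag and the REQUIRED list are all hypothetical.

```python
import argparse
import sys

# Hypothetical parameters every method needs. In a sourced config.sh a
# missing one would simply expand to an empty string.
REQUIRED = ("input", "output", "window")


def parse_cli(argv, defaults=None):
    """Return (method, params) from e.g. ['wsd', '--param', 'window=5']."""
    parser = argparse.ArgumentParser(prog="josimtext-run")  # hypothetical name
    parser.add_argument("method", help="pipeline step to run")
    parser.add_argument(
        "--param", action="append", metavar="KEY=VALUE",
        help="set or override a named model parameter (repeatable)",
    )
    args = parser.parse_args(argv)

    # Start from the defaults (e.g. loaded from a key-value file), then
    # apply command-line overrides by name.
    params = dict(defaults or {})
    for item in args.param or []:
        key, sep, value = item.partition("=")
        if not sep or not key:
            parser.error(f"--param expects KEY=VALUE, got {item!r}")
        params[key] = value

    missing = [key for key in REQUIRED if key not in params]
    if missing:
        # argparse prints the usage plus this message and exits with status 2.
        parser.error("missing parameters: " + ", ".join(missing))
    return args.method, params


if __name__ == "__main__":
    method, params = parse_cli(sys.argv[1:], defaults={"window": "3"})
    print(method, params)
```

With these hypothetical defaults, josimtext-run wsd --param input=in/ --param output=out/ would run with window=3, while adding --param window=5 overrides it by name, and leaving out output stops the run with an error naming the missing parameter.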


alexanderpanchenko commented on July 26, 2024

> But you can see that they favor your solution with explicitly passing Spark configuration, such as --driver-memory.

I thought about this again and am ready to say that I am very much in favor of setting the Spark options explicitly like this, not via environment variables.

> What I do not like is how they have hard-coded the defaults.

Yeah, what we do now seems to be even more advanced.

> create scripts that provide a good entry point into Spark.

My main bias is to make the scripts as simple as possible, which is not really the case in this project. Ideally I want them to have no while or for loops, no functions, and as few ifs as possible, so that even a kid (= a researcher) can read such a bash script. In this project, the scripts are quite complex.

> The nice part about this idea is that we can later regenerate such a file and include it into the output folder. (That is, by the way, similar to what Spark does within MLlib's model persistence.)

I strongly oppose the "later" part. If it is a benefit, then we need to do it now or not consider it at all. Actually, it is not clear to me how you would do it. Through reflection?

Please answer: what problem are you trying to solve by changing the configuration? Please answer this question very clearly and in as much detail as possible. For now, I cannot really understand it, and this is very important.

