Comments (3)
This project might be good for inspiration:
https://github.com/apache/systemml
from josimtext.
There seem to be three interesting places showing how to create scripts that provide a good entry point into Spark:
- https://github.com/apache/systemml/blob/master/bin/systemml
- https://github.com/apache/systemml/blob/master/src/main/resources/scripts/sparkDML.sh
- https://github.com/apache/systemml/tree/master/src/main/resources/scripts
I am not yet sure how they fit together, though.
But you can see that they favor your solution of explicitly passing Spark configuration, such as --driver-memory. What I do not like is how they have hard-coded the defaults.
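For concreteness, explicitly passing Spark settings through a single script could look roughly like this. This is only a sketch: the jar path, class name, and memory values are placeholders, not taken from either project, and the command is printed rather than executed so the sketch runs anywhere.

```shell
#!/usr/bin/env bash
# Sketch of a single entry point that passes Spark settings explicitly
# on the command line instead of via environment variables.
# All paths, names, and values below are illustrative placeholders.
set -euo pipefail

jar="target/josimtext.jar"   # hypothetical artifact path
main_class="Main"            # hypothetical main class

# Build the spark-submit invocation as an array, one option per line,
# so every setting is visible and overridable in one place.
cmd=(spark-submit
  --master yarn
  --driver-memory 4g
  --executor-memory 8g
  --class "$main_class"
  "$jar" "$@")

# Print the command instead of running it, so the sketch is runnable
# without a Spark installation.
echo "${cmd[*]}"
```

The point of the array form is that nothing is hidden: a researcher reading the script sees every Spark option in one block.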
What I like is that they have a single entry point.
And I have a suggestion for how this would be possible in our situation, even with the concerns you have mentioned (having a fast starting point for researchers, with an overview of all model parameters and no need to write them down manually). The solution could be to extract the model parameters into a separate key-value file and then provide only this key-value file for each "method". The nice part about this idea is that we can later regenerate such a file and include it in the output folder. (That is, by the way, similar to what Spark does in MLlib's model persistence.)
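As a sketch of this idea (the file name, keys, and values below are invented, not the project's real parameters): one key-value file per method, loaded by the entry point and then copied into the output folder so every run records the exact parameters that produced it.

```shell
#!/usr/bin/env bash
# Sketch: keep model parameters in a key=value file, load them in the
# entry point, and reproduce the file in the output folder.
# File name, keys, and values are invented for illustration.
set -euo pipefail

workdir="$(mktemp -d)"
params_file="$workdir/method.params"
output_dir="$workdir/output"

# An example parameter file, one key=value per line:
cat > "$params_file" <<'EOF'
num_clusters=200
min_word_count=5
similarity=cosine
EOF

# Each key becomes a shell variable.
source "$params_file"

# Record the parameters next to the results, so a run is reproducible.
mkdir -p "$output_dir"
cp "$params_file" "$output_dir/"

echo "num_clusters=$num_clusters similarity=$similarity"
```

Because the file is plain key=value pairs, regenerating it from a run is just writing the same pairs back out, no reflection needed.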
My main point is that a single entry point would reduce boilerplate and make it easier to resolve issues in the scripts.
- Take, for example, the last argument of any of the 20 scripts, which is <config.sh>. If you open any of those scripts, you see that this <config.sh> is sourced and then some variables are used that are never defined beforehand. A reader might assume that this <config.sh> contains those variables and that it lives in the config folder, but it takes enough reasoning to question how explicit this is.
- Another problem is that the model parameters are not named on the command line.
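One way to address the naming problem, sketched here with invented parameter names and defaults, is for the entry point to accept named options, so a call reads `run.sh --num-clusters 200` instead of a row of bare positional values:

```shell
#!/usr/bin/env bash
# Sketch: named command-line parameters instead of positional ones.
# Parameter names and defaults are invented for illustration.
set -euo pipefail

num_clusters=100     # defaults are visible right here in the script
min_word_count=3

while [[ $# -gt 0 ]]; do
  case "$1" in
    --num-clusters)   num_clusters="$2";   shift 2 ;;
    --min-word-count) min_word_count="$2"; shift 2 ;;
    *) echo "unknown argument: $1" >&2; exit 1 ;;
  esac
done

echo "num_clusters=$num_clusters min_word_count=$min_word_count"
```

The trade-off is a small parsing loop in the script; in exchange, every invocation documents itself.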
> But you can see that they favor your solution of explicitly passing Spark configuration, such as --driver-memory.
I thought about this again, and I am ready to say that I am very much in favor of setting Spark configuration explicitly, not via environment variables.
> What I do not like is how they have hard-coded the defaults.
Yeah, what we do now seems to be even more advanced.
> create scripts that provide a good entry point into Spark.
My main bias is to make the scripts as simple as possible, which is not really the case in this project. I want them, ideally, to have no while or for loops, no functions, and as few ifs as possible, so that even a kid (= a researcher) can read such a bash script. In this project, the scripts are quite complex.
> The nice part about this idea is that we can later regenerate such a file and include it in the output folder. (That is, by the way, similar to what Spark does in MLlib's model persistence.)
I strongly oppose the "later" part. If it is a benefit, then we need to do it now, or not consider it at all. Actually, it is not clear to me how you would do it. Through reflection?
Please answer: which problem are you trying to solve by changing the configuration? Please answer this question very clearly and in as much detail as possible. For now, I cannot really understand it, and this is very important.
from josimtext.
Related Issues (12)
- dt_spark.sh error
- CoNLL processing exception
- Support for not only enhanced dependencies
- Large scale conll tests
- From trigrams to n-grams
- For trigrams, make possible to turn off lowercasing
- Support of multiword expressions and named entities
- Support of multiword expressions for the trigrams
- Trigrams: remove "." after the tokens
- Alternative feature extraction approach based on positional unigrams
- List of stop dependency types