Comments (3)

piccolbo commented on July 26, 2024

These are two orthogonal issues, in my opinion. The first is: how do we
replace the dangerous backend.parameters without repeating the Hadoop manual
inside rmr, while still disallowing things like -input and -output? The other
is: which options should be set at the package level and which at the job
level? For the former, I put in a deprecation warning to hear what people
actually do with it, in the hope of formalizing the meaningful uses, but it
looks like the list is long and ever growing, so I am kind of stalled on that
front. Maybe the right thing to do is just to disallow the options that are
dangerous or already used by rmr2 itself and allow anything else until
proven dangerous. For the latter, I have to admit I went for case-by-case
decisions. With keyval.length I probably made the wrong decision, but
instead of backtracking I hope to go in a different direction. I would love
to have a simple criterion that tells me whether a parameter should be
package level or job level. Or should we just allow everything to be set at
the job level, with the package level as a default?
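
To make the last two ideas concrete, here is a minimal sketch of "job level overrides a package-level default, with a short list of reserved options rejected". It is written in Python purely for illustration and is not rmr2 code: `RESERVED` and `resolve_options` are hypothetical names, and the option values are made up.

```python
# Hypothetical sketch: none of these names exist in rmr2.
# Options the backend must control itself, so users may not set them.
RESERVED = {"input", "output"}


def resolve_options(package_defaults, job_overrides, reserved=RESERVED):
    """Merge package-level defaults with job-level overrides.

    Job-level values win. Any reserved option, at either level,
    is rejected instead of being silently passed through.
    """
    for source in (package_defaults, job_overrides):
        blocked = reserved.intersection(source)
        if blocked:
            raise ValueError(f"reserved option(s) not allowed: {sorted(blocked)}")
    merged = dict(package_defaults)  # start from the package-level defaults
    merged.update(job_overrides)     # job-level settings take precedence
    return merged


# Illustrative values only.
package_level = {"mapred.child.java.opts": "-Xmx128m"}
job_level = {"mapred.child.java.opts": "-Xmx256m"}
resolved = resolve_options(package_level, job_level)
```

With this shape, everything not on the reserved list is allowed until proven dangerous, and adding an option to the list is a one-line change.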

On Mon, Jun 3, 2013 at 10:12 PM, Jamie F Olson wrote:

If backend.parameters is considered deprecated, what is the best practice
for things like rmr-wide options? For example, a multi-purpose Hadoop
cluster is unlikely to have its configuration optimized for R tasks. In
particular, settings like mapred.child.java.opts are likely to be tuned for
Java jobs, allocating a large amount of memory to the JVM. mapreduce jobs
need to either reduce the maximum heap space allocated to the JVM (128 MB is
as low as I could go) or increase mapred.job.map.memory.mb, which is likely
to make everyone else angry at you, since it is probably configured for
your cluster's specific hardware to allow efficient distribution of tasks.

Is there, or should there be, another way to set rmr-wide Hadoop parameters
(as opposed to job-specific parameters)?



from rmr2.

jamiefolson commented on July 26, 2024

I agree that they're orthogonal. Frankly, I was just curious whether the status had changed since the last comment on backend.parameters that I could find.

I think the case-by-case approach is pretty reasonable, since it encourages you to think about each option actively and fairly regularly, and it is less likely to lead to surprises caused by conflicting D= arguments. It's only things like D="mapred.child.java.opts=-server -Xmx128m -Djava.net.preferIPv4Stack=true" that I expect to use in every case for a particular Hadoop configuration. That is obscure enough that there probably wouldn't be a good argument for incorporating it into rmr; it could also change, and there seem to be similar parameters that vary with the Hadoop distribution. On the flip side, it feels like a Hadoop configuration parameter, not a job configuration parameter.


piccolbo commented on July 26, 2024

We now have backend params in rmr.options.

