Coder Social home page Coder Social logo

deprecate LastError? about biocparallel HOT 8 CLOSED

bioconductor avatar bioconductor commented on August 20, 2024
deprecate LastError?

from biocparallel.

Comments (8)

lawremi avatar lawremi commented on August 20, 2024

I have no opinion, as long as an error is still thrown when a computation
fails. Not throwing an error means that code needs to explicitly check for
errors in the result before proceeding, and that's annoying. The other key
feature is ease of gathering results that did succeed, even after the
computation fails. And resuming those that failed. As long as those
features still exist, I don't care about the actual API.

Honestly, we have struggled to use the BiocParallel BatchJobs backend at
Genentech, probably due to issues with NFS. So we don't have much practice
with it.

Somewhat tangential: synchronous execution is more restrictive than useful
when it comes to execution of batch jobs that take hours. Gabe Becker has
implemented an asynchronous mode as a patch to BiocParallel. Not sure how
mature it is.

On Fri, May 15, 2015 at 6:23 PM, vobencha [email protected] wrote:

Hi Michel and others,

I've been re-working the error handling and logging in BiocParallel. I'd
like to deprecate a few pieces of older error code and wanted to get some
feedback.

(1) Remove LastError infrastructure:

Currently BatchJobsParam is the only param that still uses LastError. I'd
like to deprecate LastError, bplasterror() and bpresume() unless others
find them useful. It looks like pure BatchJobs does not have this
functionality so I'm wondering about motivation and utility. Were they
added as more of an experiment and haven't proven useful enough to be added
to BatchJobs?

In the current BiocParallel, .try() returns errors as conditions. This
tames the output and allows traceback from each worker to be accessed with
attr():

res <- bplapply(list(1, "2", 3), sqrt)
res
[[1]]
[1] 1

[[2]]
[1] "non-numeric argument to mathematical function"
traceback() available as 'attr(x, "traceback")'

[[3]]
[1] 1.732051

tail(attr(res[[2]], "traceback"))
[1] call <- sapply(sys.calls(), deparse)
[2] e <- structure(e, class = c("remote-error", "condition"),
[3] traceback = capture.output(traceback(call)))
[4] invokeRestart("abort", e)
[5] }, "non-numeric argument to mathematical function", quote(FUN(...)))
[6] 1: h(simpleError(msg, call))

Given this behavior, my thoughts were that LastError is no longer
necessary.

As for bpresume(), partial results are now returned with the error
messages so successful computations are not lost. I think BatchJobs offers
a resume-type mechanism through resetJobs?Knowing which results were
successful and resubmitting unfinished jobs is useful in the scheduled
cluster setting but I'm not sure bpresume() saw much use interactively.

Opinions?

(2) catch.error and stop.on.error fields not mutually exclusive:

There is a note in the BatchJobsParam code that says these flags are
mutually exclusive. I'm not sure why they need to be. 'stop.on.error' seems
more like a special case of 'catch.errors'. Ideally we'd have only one
field for errors with multiple options, maybe something like,

errors = c("all", "none", "stop.on.error")

Is anyone opposed to consolidating error fields into a single character vs
the two logicals?

(3) remove cache.warnings variable from .try():

Prior to Bioconductor 3.1, .try() did not catch warnings. Maybe that was
the intention for 'cache.warnings' but it wasn't fully implemented? I've
added a warning handler to .try() that does catch and return the warnings.

Is it ok if I remove cache.warnings?

Thanks.
Valerie


Reply to this email directly or view it on GitHub
#40.

from biocparallel.

mllg avatar mllg commented on August 20, 2024

Hi Valerie,

(1) Remove LastError infrastructure:

Currently BatchJobsParam is the only param that still uses LastError. I'd like to deprecate LastError, bplasterror() and bpresume() unless others find them useful. It looks like pure BatchJobs does not have this functionality so I'm wondering about motivation and utility. Were they added as more of an experiment and haven't proven useful enough to be added to BatchJobs?

On batch systems the execution is asynchronous and therefore BatchJobs supports partial results out of the box: Instead of applying a function and directly getting the results, you have to wait for completion and then collect them from the file system. If some jobs failed with an exception, you can fix the problem, resubmit only the failed jobs and then collect the results of all jobs in one go.
The LastError class was my attempt to allow for a similar procedure (a) while using the interface of BiocParallel and (b) with other backends. This turned out uglier than I thought because some backends are quite limited in regards of error handling, e.g. you sometimes have to compose the user function with a try() and catch the errors yourself. LastError also allows to switch backends , e.g. to take jobs which failed with BatchJobsParam() and resume computation using MulticoreParam(). If none of the other backends uses LastError anymore, I think the code base could be drastically simplified here.

(2) catch.error and stop.on.error fields not mutually exclusive:

There is a note in the BatchJobsParam code that says these flags are mutually exclusive. I'm not sure why they need to be. 'stop.on.error' seems more like a special case of 'catch.errors'. Ideally we'd have only one field for errors with multiple options, maybe something like,

errors = c("all", "none", "stop.on.error")

Is anyone opposed to consolidating error fields into a single character vs the two logicals?

Fine for me.

(3) remove cache.warnings variable from .try():

Prior to Bioconductor 3.1, .try() did not catch warnings. Maybe that was the intention for 'cache.warnings' but it wasn't fully implemented? I've added a warning handler to .try() that does catch and return the warnings.

Is it ok if I remove cache.warnings?

I think you're right, it was never really finished. Also fine for me.

from biocparallel.

vobencha avatar vobencha commented on August 20, 2024

Thanks guys.

It sounds like we can get rid of LastError. All back-ends are now capable of returning partial results along with error messages.

We should probably keep bpresume(). BatchJobs already has the ability to identify and re-submit failures but I need to do a little work on the other 3 to unify things.

As far as the asynchronous approach, I've implemented this for SnowParam and MulticoreParam in BiocParallel::bpdynamicClusterApply(). Is Gabe's patch for BatchJobsParam only or was it intended for other back-ends as well?

I take it the NFS problems apply to using BatchJobs in general, not BiocParallel::BatchJobsParam, right? Just making sure I haven't done something to make BiocParallel::BatchJobsParam less functional.

FYI, I've added a vignette on error handling and logging. SnowParam and MulticoreParam can use futile.logger syntax to capture messages, set a threshold etc.

SnowParam(log = TRUE, threshold = "WARN")

http://www.bioconductor.org/packages/3.2/bioc/vignettes/BiocParallel/inst/doc/Errors_Logs_And_Debugging.pdf

from biocparallel.

lawremi avatar lawremi commented on August 20, 2024

The NFS issues seem to present more often when BatchJobs is used via
BiocParallel. Probably due to the order of the operations, and how they are
executed in quick succession. It's a long-standing issue.

What does bpdynamicClusterApply() do exactly? I think Gabe modified only
the BatchJobsParam, but we wanted to generalize it. Please coordinate with
Gabe directly, so we do not duplicate efforts.

Thanks,
Michael

On Mon, May 18, 2015 at 8:26 AM, vobencha [email protected] wrote:

Thanks guys.

It sounds like we can get rid of LastError. All back-ends are now capable
of returning partial results along with error messages.

We should probably keep bpresume(). BatchJobs already has the ability to
identify and re-submit failures but I need to do a little work on the other
3 to unify things.

As far as the asynchronous approach, I've implemented this for SnowParam
and MulticoreParam in BiocParallel::bpdynamicClusterApply(). Is Gabe's
patch for BatchJobsParam only or was it intended for other back-ends as
well?

I take it the NFS problems apply to using BatchJobs in general, not
BiocParallel::BatchJobsParam, right? Just making sure I haven't done
something to make BiocParallel::BatchJobsParam less functional.

FYI, I've added a vignette on error handling and logging. SnowParam and
MulticoreParam can use futile.logger syntax to capture messages, set a
threshold etc.

SnowParam(log = TRUE, threshold = "WARN")

http://www.bioconductor.org/packages/3.2/bioc/vignettes/BiocParallel/inst/doc/Errors_Logs_And_Debugging.pdf


Reply to this email directly or view it on GitHub
#40 (comment)
.

from biocparallel.

vobencha avatar vobencha commented on August 20, 2024

bpdynamicClusterApply() loads up all workers then checks for a completed result with parallel:::recvOneData(). The completed result is processed (ie, logs written out, results saved) and a new job is loaded. This continues until all jobs are finished. Results are collected as they complete, not in the order they were loaded.

I'll write Gabe with this info.

from biocparallel.

mtmorgan avatar mtmorgan commented on August 20, 2024

On 05/18/2015 09:52 AM, vobencha wrote:

bpdynamicClusterApply() loads up all workers then checks for a completed result
with parallel:::recvOneData(). The completed result is processed (ie, logs
written out, results saved) and a new job is loaded. This continues until all
jobs are finished. Results are collected as they complete, not in the order they
were loaded.

I'll write Gabe with this info.

just guessing but maybe 'async' is meant to be submit job, go away and do other
things, check on job progress and maybe retrieve some results, go away and do
more things (I think this is how BatchJobs works...)

Martin


Reply to this email directly or view it on GitHub
#40 (comment).

Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

from biocparallel.

lawremi avatar lawremi commented on August 20, 2024

That's right Martin. Gabe has a fairly elegant (sounding) object-oriented
solution based on the concept of "promises" (a better term might be
"future"). He hacked it together in a morning a few weeks ago and hasn't
had time to revisit/generalize it.

On Mon, May 18, 2015 at 9:57 AM, Martin Morgan [email protected]
wrote:

On 05/18/2015 09:52 AM, vobencha wrote:

bpdynamicClusterApply() loads up all workers then checks for a completed
result
with parallel:::recvOneData(). The completed result is processed (ie,
logs
written out, results saved) and a new job is loaded. This continues
until all
jobs are finished. Results are collected as they complete, not in the
order they
were loaded.

I'll write Gabe with this info.

just guessing but maybe 'async' is meant to be submit job, go away and do
other
things, check on job progress and maybe retrieve some results, go away and
do
more things (I think this is how BatchJobs works...)

Martin


Reply to this email directly or view it on GitHub
<
#40 (comment)
.

Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


Reply to this email directly or view it on GitHub
#40 (comment)
.

from biocparallel.

vobencha avatar vobencha commented on August 20, 2024

OK, thanks for clarifying. That's (obviously) not what I've implemented. Gabe has written me offline and I'll follow up with him. Closing this now since I think all error-related questions have been answered.

from biocparallel.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.