ropensci-books / targets

User manual of the targets R package

Home Page: https://books.ropensci.org/targets

License: Other

data-science high-performance-computing pipeline r reporoducible-research reproducibility rstats statistics targets

targets's People

Contributors

aguynamedryan, billdenney, bisaloo, danlooo, gvelasq, jameslairdsmith, jmbuhr, johnwilshire, kaiaragaki, kkmann, liutiming, llrs, luciorq, maciejmotyka, mikemahoney218, psychobas, robitalec, samkimhis, svraka, wlandau, wlandau-lilly, yyzeng

targets's Issues

Add link to targetopia post in the intro?

Prework

  • I understand and agree to this repository's code of conduct.
  • I understand and agree to this repository's contributing guidelines.
  • New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

Although the targetopia is mentioned twice in the book, I'd find it useful to see a sentence about it in the introduction, especially to put tarchetypes on readers' radar early on.

Chapter on configuration and sub-projects

Proposal

The issue of managing sub-projects comes up a lot, and I think I should write a chapter to explain best practices. This chapter would also cover YAML-based configuration, e.g. _targets.yaml/tar_config_get()/tar_config_set(). I plan to write it sometime after I address these issues:

targets-minimal repo doesn't exist

Prework

  • I understand and agree to this repository's code of conduct.
  • I understand and agree to this repository's contributing guidelines.
  • I am reasonably sure this is a genuine bug in this repository's code and most likely not a user error. (If you run into an error and do not know the cause, please submit a "Trouble" issue instead.)

Description

Chapter 2 mentions a targets-minimal repo that contains the source code and data for the walkthrough example, but the link fails.

Discuss new patterns and dynamic branching emulation

Proposal

Consider including `make` arguments in drake comparison?

Proposal

The current comparison with drake does not include the many arguments of drake::make that may or may not have been implemented elsewhere in targets. For example, I recently learned about tar_option_set(storage = "remote", retrieval = "remote"). The proposal is to include the arguments of make in the "What about drake?" section.

Clarify documentation about `findGlobals()` edge cases in a central place, and add comment on S3 methods

I propose adding a new "Edge cases that fail" subsection to the walkthrough chapter (or elsewhere) and collecting all failure cases of code dependency detection there.

This could include:

  1. Namespaced function calls pkg::func() breaking things
  2. The fact that targets does not correctly invalidate targets generated by S3 methods when those methods get updated.

For the second, consider

library(targets)

my_generic <- function(x) {
  UseMethod("my_generic")
}

my_generic.default <- function(x) {
  sum(x^3)
}

list(
  tar_target(
    vec,
    rnorm(100)
  ),
  tar_target(
    result,
    my_generic(vec)
  )
)

Run tar_make(), change my_generic.default() to anything you want, and note that running tar_make() again does not re-run the result target.
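
The behavior can be illustrated without running a pipeline. The sketch below assumes, as the issue title suggests, that dependency detection is based on codetools::findGlobals(). It inspects the generic from the example above and shows that the method name never appears among the detected globals, so static analysis has no way to link the method to the result target.

```r
library(codetools)

my_generic <- function(x) {
  UseMethod("my_generic")
}

my_generic.default <- function(x) {
  sum(x^3)
}

# Global symbols referenced in the body of the generic.
# "my_generic" inside UseMethod() is a string constant, not a symbol.
globals <- findGlobals(my_generic)
print(globals)

# The method is dispatched at run time and is never named in the generic,
# so it is absent from the detected globals.
method_detected <- "my_generic.default" %in% globals
print(method_detected)
```

Because the method is only resolved when the generic is called, a change to my_generic.default() is invisible to this kind of analysis.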

Document changes to configuration

Prework

  • Read and agree to the code of conduct and contributing guidelines.
  • If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
  • For any problems you identify, post a minimal reproducible example like this one so the maintainer can troubleshoot. A reproducible example is:
    • Runnable: post enough R code and data so any onlooker can create the error on their own computer.
    • Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
    • Readable: format your code according to the tidyverse style guide.

Description

Maintainer out of office: September 13-20, 2021

Prework

Description

I will be on vacation from September 13 through 20, and I will not have internet access most of the time. If you have experience with targets, it would be great if you would help answer new questions posted to issues and discussions. There are few new posts in a typical week. Thanks in advance.

Small typos in drake chapter

[`targets`](https://github.com/ropensci/targets) is the successor of [`drake`](https://github.com/ropensci/drake), an older pipeline tool. [`drake`](https://github.com/ropensci/drake) is [superseded](https://lifecycle.r-lib.org/articles/stages.html#superseded), which means there are no plans for new features or discretionary enhancements, but basic maintenance and support and will continue indefinitely. Existing projects that use [`drake`](https://github.com/ropensci/drake) can safely continue to use [`drake`](https://github.com/ropensci/drake), and there is no need to retrofit [`targets`](https://github.com/ropensci/targets). New projects should use [`targets`](https://github.com/ropensci/targets) because it friendlier and more robust.

Suggested text, removing the extra "and" before "will continue" and adding "is" before "friendlier":

targets is the successor of drake, an older pipeline tool. drake is superseded, which means there are no plans for new features or discretionary enhancements, but basic maintenance and support will continue indefinitely. Existing projects that use drake can safely continue to use drake, and there is no need to retrofit targets. New projects should use targets because it is friendlier and more robust.

Load packages in consistent manner in document

Description

The section "Target script file" from Ch. 2 suggests loading packages other than targets via tar_option_set instead of library.

https://books.ropensci.org/targets/walkthrough.html#target-script-file

However, the section "Workspace" from Ch. 4 is using library(tidyverse) in _targets.R.

https://books.ropensci.org/targets/debugging.html#workspaces

I'd suggest the change below.

atusy@04512f5
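
The linked commit is not reproduced here. As a rough sketch of the convention described in the walkthrough chapter, a _targets.R file could declare packages through tar_option_set() rather than library(). The target names raw and doubled are hypothetical and only serve the illustration.

```r
# _targets.R (illustrative sketch; target names are hypothetical)
library(targets)

# Packages listed here are loaded before each target runs,
# so library(tidyverse) is not needed in this file.
tar_option_set(packages = c("tidyverse"))

list(
  tar_target(raw, data.frame(x = c(1, 2, 3))),
  tar_target(doubled, mutate(raw, y = 2 * x))
)
```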

Explain that directories as file targets should be nonempty

Description

Should this be `callr_function = NULL`?

Description

Looks like there's just a typo here:

https://github.com/wlandau/targets-manual/blob/fbcfc9a3b1b7d9761a491b41cc3ae3672c32a34e/practice.Rmd#L65

I would even just say targets::tar_make(callr_function = NULL) there

Batched replication and static branching

Proposal

tar_rep(), tar_map(), and tar_combine() in tarchetypes.

Explain tar_option_set(error = "null") in the debugging chapter

Cloud computing guide

Proposal

There is a community misconception that targets (and drake) do not have HPC capabilities beyond parallel computing over the cores of a single local machine. On the contrary, both tools support distributed computing on clusters (guides here and here) and the workers do not necessarily need access to the file system of the master process. (In fact, I designed targets with an efficient dynamic branching model to go beyond the inherent limitations of map-reduce-like scheduling algorithms and conserve computing resources.) However, I do realize that data scientists from smaller institutions do not always have access to clusters, and an increasing number of folks use AWS. AWS ParallelCluster could be a way to deploy pipelines to the cloud without any need to modify targets itself. If it works, we should probably write a tutorial either in the existing HPC chapter or a chapter of its own.

ropensci/tarchetypes#8 could be an alternative way to deploy to AWS. The advantage of ropensci/tarchetypes#8 is that we should also get the data versioning capabilities of Metaflow for free, and Metaflow may take care of a lot of the AWS setup. However, each new tar_metaflow() will require its own local R worker in order to avoid blocking the master process, which is not ideal.

Add more mentions of tarchetypes custom invalidation rules?

Proposal

I am getting started with targets (such a cool package / ecosystem of packages!) and started wondering how to create a target with a cue that'd be different from the ones in targets, then I discovered https://docs.ropensci.org/tarchetypes/reference/index.html#section-targets-with-custom-invalidation-rules which is exactly what I need. Should these functions be mentioned from

Faulty Hyperlink in User Manual Chapter 2.4

Prework

Description

The hyperlink is faulty in Chapter 2.4 due to a missing "s" in the hyperlink.

Both graphing functions above visualize the underlying directed acyclic graph (DAG) and tell you how targets are connected. This DAG is indifferent to the order of targets in your pipeline. You will still get the same graph even if you rearrange them. This is because `targets` uses static code analysis to detect the dependencies of each target, and this process does not depend on target order. For details, visit the [dependency detection section of the best practices guide](https://books.ropensci.org/targets/practice.html#dependencies).

Small typos in Advanced topic of HPC chapter

Prework

Description

The same phrase should be fixed in both lines: change "workers do no have access" to "workers do not have access".

targets/hpc.Rmd

Line 322 in efdb2d4

* `storage`: Choose whether the parallel workers or the main process is responsible for saving the target's value. For slow network file systems on clusters, `storage = "main"` is often faster for small numbers of targets. For large numbers of targets or low-bandwidth connections between the main and workers, `storage = "worker"` is often faster. Always choose `storage = "main"` if the workers do no have access to the file system with the `_targets/` data store.

targets/hpc.Rmd

Line 323 in efdb2d4

* `retrieval`: Choose whether the parallel workers or the main process is responsible for reading dependency targets from disk. Should usually be set to whatever you choose for `storage` (default). Always choose `retrieval = "main"` if the workers do no have access to the file system with the `_targets/` data store.
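
For reference, the two options quoted above are set globally in _targets.R. This is a minimal sketch of the case where the workers cannot reach the file system that holds the `_targets/` data store. It assumes a version of targets in which "main" is an accepted value for both options, as in the quoted text.

```r
library(targets)

# The main process saves and loads every target,
# so workers never touch the _targets/ data store directly.
tar_option_set(storage = "main", retrieval = "main")
```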

Discuss shortcutting and branch selection in the debugging and dynamic branching chapters

Prework

  • Read and agree to the code of conduct and contributing guidelines.
  • If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
  • [n/a] For any problems you identify, post a minimal reproducible example like this one so the maintainer can troubleshoot. A reproducible example is:
    • [n/a] Runnable: post enough R code and data so any onlooker can create the error on their own computer.
    • [n/a] Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
    • [n/a] Readable: format your code according to the tidyverse style guide.

Description

Debugging stochastic pipelines

Description

Explain how to debug tar_rep() and friends, especially after ropensci/tarchetypes#111. Also explain how to get help: GitHub discussions, reprexes, etc.

Clarify when (not) to use tar_read()

Proposal

In https://books.ropensci.org/targets/walkthrough.html#read-your-data it seems tar_read() is only for exploratory analysis, but it's actually recommended for R Markdown reports. Maybe this section should have a footnote/link to the section about literate programming?

I've also learnt (in the rOpenSci Slack) that using tar_read() in a helper function is an anti-pattern: if a function uses a target, the target should be an argument of that function. Would this tip/warning have its place somewhere? (Maybe in best practices, as well as in https://books.ropensci.org/targets/practices.html#dependencies?)
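
A small sketch of the difference, with hypothetical function and target names (summarize_hidden, summarize_explicit, raw_data, summary_stat). In the first helper the dependency on raw_data is hidden inside the function body. In the second, the target is passed as an argument, so the command of summary_stat names raw_data explicitly.

```r
library(targets)

# Anti-pattern: the helper reads the target from the data store itself,
# so the command that calls it never mentions raw_data.
summarize_hidden <- function() {
  data <- tar_read(raw_data)
  mean(data$x)
}

# Preferred: the target is an argument of the function.
summarize_explicit <- function(data) {
  mean(data$x)
}

list(
  tar_target(raw_data, data.frame(x = c(1, 2, 3))),
  # raw_data appears in the command, so the dependency is visible.
  tar_target(summary_stat, summarize_explicit(raw_data))
)
```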

Typo repeated word in Chapter 3 on Debugging

Description

The user manual is a great resource. Thanks for putting it together. Noticed this minor repeated word typo in the debugging chapter the other day.

Expected result

  • "This chapter describes solutions to these challenges in terms of both best practices and features in targets."
