
theislab / cellrank_reproducibility_preprint

Repository size: 233.72 MB

Code to reproduce results from the CellRank preprint

Home Page: https://cellrank.org

License: BSD 3-Clause "New" or "Revised" License

Languages: Jupyter Notebook 99.95%, Python 0.05%, Shell 0.01%, R 0.01%
Topics: fate, machine-learning, mapping, reproducibility, reproducible-research, reproducible-science, scrna-seq

Contributors: marius1311, michalk8

cellrank_reproducibility_preprint's Issues

Clean up and add the pancreas notebooks

This concerns main Figures 2 and 3 as well as several supplementary figures. Add the Palantir pseudotime to the dataset on figshare, and add the MAGIC-imputed data to figshare as an extra array.

CoC

My remarks:

  • for the paths, instead of a get_paths function that returns a dictionary, let's create one file where the paths are defined as constants. I think this is more readable, and we only import the paths we need, e.g.
from path import CACHE_DIR, FIG_DIR
  • If we go with the approach above, let's also define a naming convention for these constants (e.g. directories end with _DIR, data-related constants are prefixed with DATA_, caching-related ones with CACHE_, etc.)
  • environment.yml or requirements.txt: I'd provide a conda environment.yml named cellrank-reproducibility with all the correct package versions (if something requires a different version, a separate YAML file will live in that directory)
  • I'd also create a small skeleton .ipynb as the basis for all notebooks. It should contain e.g. the imports of the default packages (like scanpy/cellrank/etc.), a cell that prints the versions, and the sections you mention (sections that aren't needed, like Plot results in preprocessing notebooks, will be removed when filling in the notebook)
  • initials in the notebooks: it's sometimes hard for me to distinguish ml and mk. Maybe we could use different aliases, capitalize them, or move them to the front, before the date
  • same for the dates: YYYYMMDD is not a very readable format (at least for me). I'd include dashes, i.e. YYYY-MM-DD.
  • relative paths: do you mean relative to this repo's root or relative to the location of the file/notebook? I assume you mean the latter.
  • I'd also open one issue per figure (or per shared dependency) and do regular PRs
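A minimal sketch of the proposed constants file follows. It assumes the file is saved as path.py at the repository root, matching the `from path import CACHE_DIR, FIG_DIR` example, and it uses hypothetical folder names (data/, data/raw/, cache/, figures/). Resolving every path from the module's own `__file__` makes the paths independent of where a notebook is launched. Notebooks have no `__file__` of their own, so this approach gives one answer to the relative-paths question.

```python
"""path.py: shared path constants for all notebooks (hypothetical layout).

Naming convention proposed above:
  *_DIR   directories
  DATA_*  data-related paths
  CACHE_* caching-related paths
"""
from pathlib import Path

# Repository root, assuming this file sits at the top level of the repo.
ROOT_DIR = Path(__file__).resolve().parent

# Data-related locations (hypothetical folder names).
DATA_DIR = ROOT_DIR / "data"
DATA_RAW_DIR = DATA_DIR / "raw"

# Cache that mirrors the structure of DATA_DIR (see the "Caching" issue).
CACHE_DIR = ROOT_DIR / "cache"

# Figure output.
FIG_DIR = ROOT_DIR / "figures"
```

A notebook under notebooks/ can then run `from path import CACHE_DIR, FIG_DIR`, as long as the repository root is on `sys.path`. One option is to add it in the skeleton notebook's first cell. Note that a local module named path shadows the unrelated `path` package on PyPI if that package happens to be installed.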

Concept figure

Move the notebook that computes the concept figure, clean it up, and prepare it.

Print all relevant versions

@michalk8 let's remember to print the versions of important packages like FateID, STEMNET or Palantir in the benchmark notebooks, as these are not included in cr.logging.print_versions()
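A small helper like the one below could cover the packages that `cr.logging.print_versions()` misses. It is a sketch with several assumptions. Palantir is assumed to be installed as the Python distribution `palantir`. FateID and STEMNET are assumed to be R packages reachable through `Rscript` on the `PATH`. The helper names are hypothetical. A missing package is reported as "not found" instead of raising an error.

```python
import shutil
import subprocess
from importlib import metadata
from typing import Dict, Iterable, Optional


def python_version_of(dist: str) -> Optional[str]:
    """Installed version of a Python distribution, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def r_version_of(pkg: str) -> Optional[str]:
    """Version of an installed R package queried via Rscript, or None."""
    rscript = shutil.which("Rscript")
    if rscript is None:  # R is not available in this environment
        return None
    expr = f'cat(as.character(packageVersion("{pkg}")))'
    result = subprocess.run([rscript, "-e", expr], capture_output=True, text=True)
    if result.returncode != 0:  # packageVersion() errors if pkg is not installed
        return None
    return result.stdout.strip() or None


def print_extra_versions(
    python_pkgs: Iterable[str], r_pkgs: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Print and return versions of packages outside CellRank's own report."""
    versions = {name: python_version_of(name) for name in python_pkgs}
    versions.update({name: r_version_of(name) for name in r_pkgs})
    for name, version in versions.items():
        print(f"{name}=={version if version else 'not found'}")
    return versions


if __name__ == "__main__":
    # In a benchmark notebook, this would follow cr.logging.print_versions().
    print_extra_versions(["palantir"], ["FateID", "STEMNET"])
```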

Test if the pipeline works

TODOs:

  • add .gitkeep files to the directories!
  • test downloading the Morris data
  • merge the pickles/CSVs into one file
  • test loading/preprocessing of the Morris data
  • test runtime benchmark
  • test memory benchmark
  • test robustness benchmark
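For the "merge the pickles/CSVs" item, here is a stdlib-only sketch of the CSV half. It assumes the per-run benchmark CSVs share one header, and it records the originating file name in an extra column. The function name and the `source` column are hypothetical. Pickled results would need to be loaded and converted to rows first.

```python
import csv
from pathlib import Path
from typing import Iterable


def merge_csvs(inputs: Iterable[Path], output: Path, source_col: str = "source") -> int:
    """Concatenate CSV files that share a header into one CSV.

    Adds `source_col` holding each row's file stem; returns the rows written.
    """
    paths = sorted(Path(p) for p in inputs)  # deterministic order
    header = None
    writer = None
    n_rows = 0
    with open(output, "w", newline="") as out:
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.DictReader(f)
                fields = list(reader.fieldnames or [])
                if header is None:
                    header = fields
                    writer = csv.DictWriter(out, fieldnames=header + [source_col])
                    writer.writeheader()
                elif fields != header:
                    raise ValueError(f"{path} has header {fields}, expected {header}")
                for row in reader:
                    row[source_col] = path.stem
                    writer.writerow(row)
                    n_rows += 1
    return n_rows
```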

Caching

I haven't added this to the README yet. I still need scachepy in my notebooks because I don't want to recompute velocities and my stochastic kernel every time I regenerate a figure. I suggest we have a caching directory that mirrors the structure of the data directory. We won't share the actual cached files because they are too large, but I will place .gitkeep files so that we have the same folder structure. What are your thoughts on this?
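The mirrored cache layout could be generated rather than maintained by hand. This sketch assumes the hypothetical folders data/ and cache/ at the repository root. It recreates every sub-directory of the data directory under the cache directory and puts an empty .gitkeep file in each one. The folder structure can then be committed while the large cached files stay local. The sketch does not interact with scachepy, and the cached files themselves would still need to be excluded via .gitignore.

```python
from pathlib import Path
from typing import List


def mirror_with_gitkeep(src_dir: Path, dst_dir: Path) -> List[Path]:
    """Recreate every sub-directory of src_dir under dst_dir and place an
    empty .gitkeep in each, so git tracks the otherwise empty folder layout.

    Returns the .gitkeep paths (created now or already present).
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    # Build the full target list first: dst_dir plus each mapped sub-directory.
    targets = [dst_dir] + [
        dst_dir / sub.relative_to(src_dir)
        for sub in sorted(src_dir.rglob("*"))
        if sub.is_dir()  # files (e.g. .h5ad) are not mirrored
    ]
    keep_files = []
    for target in targets:
        target.mkdir(parents=True, exist_ok=True)
        keep = target / ".gitkeep"
        keep.touch(exist_ok=True)
        keep_files.append(keep)
    return keep_files
```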

Scattered to-dos

I'm collecting to-dos from various notebooks here:

  • in the main uncertainty notebook, insert links to the stochastic MC notebook and to the robustness notebook
  • make sure supplemental gene trends are saved to the same directory.
  • remove the code of conduct again.

Repo size

For some reason, it's huge... Did either of us commit any data?
Inspecting this, most of it is git objects (245M ./.git/objects), plus 99M in ./notebooks.

I suggest we prune this once everything is done. I can do a test run on my private fork to see if we can prune the git objects.
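The sizes quoted above look like `du -sh` output. The rough Python equivalent below also lists the largest files, which can help spot committed data before pruning. This is a sketch with two caveats. First, it sums apparent file sizes, whereas `du` reports allocated disk blocks and rounds differently, so the figures may not match exactly. Second, it only inspects files on disk, including .git/objects. Removing objects that are still referenced by history requires rewriting that history, not just running `git gc`.

```python
import os
from pathlib import Path
from typing import Iterator, List, Tuple


def _iter_files(path: Path) -> Iterator[Tuple[int, str]]:
    """Yield (size_in_bytes, file_path) for non-symlink files below `path`."""
    for root, _dirs, files in os.walk(path):  # does not follow symlinked dirs
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                yield os.path.getsize(fp), fp


def apparent_size(path: Path) -> int:
    """Total apparent size in bytes, similar in spirit to `du -s`."""
    return sum(size for size, _ in _iter_files(path))


def largest_files(path: Path, n: int = 10) -> List[Tuple[int, str]]:
    """The n largest files below `path`, biggest first."""
    return sorted(_iter_files(path), reverse=True)[:n]


def human(nbytes: int) -> str:
    """Format a byte count with binary units, roughly like `du -h`."""
    size = float(nbytes)
    for unit in ["B", "K", "M", "G"]:
        if size < 1024:
            return f"{size:.0f}{unit}"
        size /= 1024
    return f"{size:.0f}T"


if __name__ == "__main__":
    for d in [".git/objects", "notebooks"]:
        print(human(apparent_size(Path(d))), d)
    for size, fp in largest_files(Path("notebooks"), n=5):
        print(human(size), fp)
```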
