The cellrank_reproducibility_preprint from theislab

Clean up and add the pancreas notebooks

This concerns main figures 2 and 3 as well as a number of supplemental figures. Add the Palantir pseudotime to the dataset on figshare, and add the MAGIC-imputed data there as an extra array.

Share the final conda environment.

Had to install a couple of new packages; let's share the final version once we're done.

Untitled Notebook in the root directory

That probably shouldn't be there...

CoC

My remarks:

For the paths: instead of a get_paths function which returns a dictionary, let's create one file where the paths are defined as constants. I think this will be more readable, and we only import the paths that we need, e.g.

from path import CACHE_DIR, FIG_DIR

If we go for the approach above, let's also define some naming convention for these constants (e.g. directories ending with _DIR, data-related stuff prefixed with DATA_, caching with CACHE_, etc.)
environment.yml or requirements.txt - I'd provide a conda environment.yaml named cellrank-reproduciblity with all the correct package versions (if something requires a different version, then a separate yaml file will be within that directory)
I'd also create a small skeleton .ipynb as a basis for all notebooks - this should contain e.g. importing the default packages (like scanpy/cellrank/etc.), printing the versions, and the sections you mention (sections that are not needed, like Plot results in preprocessing notebooks, will be removed when filling the notebook up)
initials in the notebooks: it's sometimes hard for me to distinguish ml and mk; maybe we could use different aliases, capitalize them, or move them to the front before the date
same for the dates: YYYYMMDD is not a friendly format (at least for me) to read, I'd include dashes as YYYY-MM-DD.
relative paths: do you mean relative to this repo's root or relative to the position of the file/notebook? I assume you mean the latter
I'd also make 1 issue for 1 figure or their dependency and do regular PRs
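
A minimal sketch of what such a constants file could look like, following the naming convention proposed above. The file name `paths.py`, the directory layout, and every constant other than `CACHE_DIR` and `FIG_DIR` are assumptions for illustration, not the repository's actual structure.

```python
from pathlib import Path

# Assumption: this file sits in the repository root. When the code is pasted
# into a notebook cell, __file__ is undefined, so fall back to the working dir.
try:
    ROOT_DIR = Path(__file__).resolve().parent
except NameError:
    ROOT_DIR = Path.cwd()

# Directories end with _DIR (hypothetical folder names).
DATA_DIR = ROOT_DIR / "data"
CACHE_DIR = ROOT_DIR / "cache"
FIG_DIR = ROOT_DIR / "figures"

# Data-related and caching-related entries carry a DATA_ / CACHE_ prefix.
DATA_PANCREAS_DIR = DATA_DIR / "pancreas"
CACHE_PANCREAS_DIR = CACHE_DIR / "pancreas"
```

A notebook would then import only what it needs, e.g. `from paths import CACHE_DIR, FIG_DIR`.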

Clean up memory performance

Clean up Palantir comparison benchmarking

Clean pancreas main notebook for figure 2

Update comparison benchmarks

Clean up

Palantir notebook (I think you should do this one @Marius1311 )
STEMNET notebook
FateID notebook (almost done)

Clean up the lung analysis

Add skeleton notebook

This should serve as a starting point for the restructuring of notebooks.

Concept figure

Move the notebook to compute the concept figure, clean it up and prep.

Print all relevant versions

@michalk8 let's keep in mind to print important package versions like FateID, STEMNET or Palantir in the benchmark notebooks, as these are not included in cr.logging.print_versions()
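
A minimal sketch of how the extra versions could be printed for Python packages, using only the standard library. The helper name `print_extra_versions` and the distribution name `"palantir"` are assumptions. FateID and STEMNET are R packages, so their versions would have to be queried on the R side instead (e.g. with `packageVersion("FateID")`).

```python
from importlib.metadata import PackageNotFoundError, version


def print_extra_versions(packages):
    """Print the installed version of each Python distribution in `packages`.

    Hypothetical helper; returns the printed lines so they can be reused.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Report missing packages instead of failing the whole notebook.
            lines.append(f"{name}: not installed")
    print("\n".join(lines))
    return lines


# Assumed distribution name; adjust to whatever the environment uses.
print_extra_versions(["palantir"])
```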

Clean up robustness analysis

Clean up FateID comparison benchmarking

Clean the main uncertainty notebook

Make sure to add .gitkeep files later on for the new cache directory etc.

Clean up STEMNET comparison benchmarking

Clean up runtime performance

Fix date formatting

@michalk8, I think your date formatting does not follow our guidelines, see https://github.com/theislab/cellrank_reproducibility/wiki/Usage-guidelines

It seems that you went for YYYY-DD-MM, whereas we agreed on YYYY-MM-DD. Can you fix this please? Thanks!

Also in the links please...
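
A minimal sketch of how a date written as YYYY-DD-MM could be converted to the agreed YYYY-MM-DD. The function name `swap_day_month` is hypothetical. Note that a string whose day is 12 or less parses under either format, so the conversion should only be applied to dates known to be in the wrong order.

```python
from datetime import datetime


def swap_day_month(date_str: str) -> str:
    """Read `date_str` as YYYY-DD-MM and return it as YYYY-MM-DD.

    Raises ValueError if the string is not a valid YYYY-DD-MM date.
    """
    parsed = datetime.strptime(date_str, "%Y-%d-%m")
    return parsed.strftime("%Y-%m-%d")
```

For example, `swap_day_month("2020-25-10")` returns `"2020-10-25"`, while passing `"2020-10-25"` raises a `ValueError` because 25 is not a valid month.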

Clean the GPCCA toy example notebook

Supplemental figure where we illustrate the idea behind the GPCCA algorithm.

Test if the pipeline works

TODOs:

Caching

I haven't yet added this to the README. I'm still going to need scachepy in my notebooks because I don't want to re-compute velocities and my stochastic kernel each time I have to re-generate a figure. I suggest we have a caching directory that mirrors the structure of the data directory. We won't share the actual cached files because they are too large but I will place .gitkeep files so that we have the same folder structure. What are your thoughts on this?
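
A minimal sketch of the mirroring idea, assuming plain `data/` and `cache/` folders. The function name `mirror_with_gitkeep` and the directory names are hypothetical. It copies only the folder structure, never the files, and drops a `.gitkeep` into each cache folder so that git tracks the otherwise empty directories.

```python
from pathlib import Path


def mirror_with_gitkeep(data_dir, cache_dir):
    """Recreate the folder tree of `data_dir` under `cache_dir`.

    Only directories are mirrored; each one receives an empty .gitkeep file.
    Returns the list of cache directories that were created or updated.
    """
    data_dir, cache_dir = Path(data_dir), Path(cache_dir)
    # The data root itself plus all of its subdirectories.
    sources = [data_dir, *sorted(p for p in data_dir.rglob("*") if p.is_dir())]
    mirrored = []
    for src in sources:
        target = cache_dir / src.relative_to(data_dir)
        target.mkdir(parents=True, exist_ok=True)
        (target / ".gitkeep").touch()
        mirrored.append(target)
    return mirrored
```

The cached files themselves would still need to be excluded from version control, for instance via ignore rules that keep only the `.gitkeep` files.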

Prettify the table in the README a bit

The repo is public now, so we can...

add links to nbviewer

Package requirements

Informally:

R.utils
peakMEM
SparseMM
destiny
FateID
RaceID
STEMNET

Scattered to-dos

I'm collecting to-dos from various notebooks here:

in the main uncertainty notebook, insert links to the stochastic MC notebook and also to the robustness notebook
make sure supplemental gene trends are saved to the same directory.
remove the code of conduct again.

Repo size

For some reason, it's huge... Did any of us commit any data inside?
Inspecting this, it's git objects (245M ./.git/objects).
And 99M ./notebooks

I suggest we prune this once everything is done - I can do a test run on my private fork to see if we can prune the git objects.

Clean up delta cell differentiation, Fig. 3

I will include this in the main pancreas notebook, that's much easier to maintain, single source of truth etc.
