
cellbox's Introduction

Introduction

This repository contains the code for https://sanderlab.org

How to Edit the sanderlab.org Website

Edit Text

  1. Edit this data file: https://github.com/sanderlab/sanderlab/edit/master/docs/sanderlabdata.json
  2. Make sure the result is valid JSON: https://jsonformatter.curiousconcept.com/#
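If you prefer a quick local check instead of the online validator, the standard library is enough (a minimal sketch; it assumes you have the repository checked out so docs/sanderlabdata.json exists locally):

import json

# Raises json.JSONDecodeError with a line/column number if the edit broke the file.
with open("docs/sanderlabdata.json") as f:
    json.load(f)
print("Valid JSON.")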

Edit Images

Images may be added or removed here: https://github.com/sanderlab/sanderlab/tree/master/docs/images NOTE: Ensure images of people are placed in the people folder, separate from images for research activities.

Deployment

Wait 5-10 minutes for the website to be deployed automatically to sanderlab.org with the new changes via the GitHub Pages system; if it does not update, contact the site administrators. NOTE: Only changes in the docs/ folder will trigger re-deployment.
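As an optional sanity check, you can poll the deployed site until your change appears (a sketch; the URL assumes GitHub Pages serves the docs/ folder at the site root, and NEW_TEXT is a placeholder for a string you just added):

import time
import requests

NEW_TEXT = "NEW_TEXT"  # placeholder: a string you just added to sanderlabdata.json
for _ in range(20):  # roughly 10 minutes at 30-second intervals
    page = requests.get("https://sanderlab.org/sanderlabdata.json")
    if NEW_TEXT in page.text:
        print("Deployed.")
        break
    time.sleep(30)
else:
    print("Not deployed yet; contact the site administrators.")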

Projects

Linode-Hosted Project List

This is a list of projects available on the Linode server. They are accessible via the following:

Java-Based

Shiny

Javascript (ALL DEPRECATED)

  • rcytoscapejs: Unpublished
  • alignmentviewer: Version unpublished

cellbox's People

Contributors

cannin, danieldritter, debanitrkl, desmondyuan, judyueshen, mustardburger


cellbox's Issues

expr_index

Hi,
A naive question here. What do columns 2 and 3 in expr_index.txt indicate?
Thanks,
Xiao

Questions about train.py

Issue type

Need help

Summary

Some functions in /cellbox/train.py are ambiguous in what task they perform. Understanding them is crucial for reproducing similar results in the PyTorch version of CellBox, so this issue is for resolving the ambiguity.

Details

  • Lines 76 to 79 in train.py: are loss_valid_i and loss_valid_mse_i evaluated on one random batch fetched from args.feed_dicts['valid_set'], or on the whole validation set? (See the sketch after this list for the distinction.)
  • The eval_model function returns different values with different calls. At lines 101 to 103, it returns both the total and MSE loss for args.n_batches_eval batches on the validation set. At lines 109 to 111, it returns only the MSE loss for args.n_batches_eval batches on the test set. And at line 262 it returns the expression predictions y_hat for the whole test set. Are all of these statements correct?
  • The record_eval.csv file generated after training, using the default training arguments and config file as specified in the README (python scripts/main.py -config=configs/Example.random_partition.json), has the test_mse column set to None. Is this the expected behaviour of the code?
  • random_pos.csv, generated after training, stores the indices of the perturbation conditions. Does it indicate how the conditions for training, validation, and testing are split?
  • After each substage, say substage 6, the code generates 6_best.y_hat.loss.csv, containing the expression predictions for the perturbation conditions in the test set for all nodes, but it does not indicate which row in this file corresponds to which perturbation condition. How are this file and random_pos.csv related?
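For concreteness, here is what the two readings of the first bullet would look like in a PyTorch port (a sketch to pin down the question, not CellBox code):

import torch

def loss_one_batch(model, loader, loss_fn):
    # Reading 1: evaluate on a single random batch from the validation loader.
    x, y = next(iter(loader))
    return loss_fn(model(x), y).item()

@torch.no_grad()
def loss_full_set(model, loader, loss_fn):
    # Reading 2: average the loss over every batch in the validation set.
    total, n = 0.0, 0
    for x, y in loader:
        total += loss_fn(model(x), y).item() * len(x)
        n += len(x)
    return total / n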

[ URGENT GSOC INQUIRY ] - CellBox installation

Hey there! Just a small issue I have been facing; I have tried several iterations to diagnose it.

This is specific to the macOS environment.

[Screenshot: build error output, 2021-04-09]

The error message specifically mentions disabling the use of optimized BLAS and LAPACK by setting their variables to the null string.

I am genuinely curious where I can perform this operation so that I can build this package.
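If the message is NumPy's standard source-build error, the variables it refers to are likely NPY_BLAS_ORDER and NPY_LAPACK_ORDER, which can be set to the empty string before installing (a sketch under that assumption; the exact variable names depend on your NumPy version):

import os
import subprocess
import sys

# Assumption: the build message refers to NumPy's BLAS/LAPACK search order.
# Setting these to the empty string makes the build fall back to NumPy's
# bundled, unoptimized routines.
os.environ["NPY_BLAS_ORDER"] = ""
os.environ["NPY_LAPACK_ORDER"] = ""
subprocess.check_call([sys.executable, "-m", "pip", "install", "."])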

Usage discussion

[Screenshot: command output on Binder, 2021-02-18]

I tried to run the command on Binder and got the output shown in the screenshot above.

Apologies if this is a naive question, but I do not understand how we are able to train a model without giving it any precise outputs.

All input is welcome.

Thank you :)

@cannin @DesmondYuan @judyueshen

Numerical instability between TensorFlow and PyTorch

Issue type

Bug or help needed

Relevant package versions

numpy == 1.24.1
tensorflow == 2.11.0
torch == 2.0.1

Python version

3.8.0

Current behaviour

The envelope forms in TensorFlow and PyTorch (defined here) yield very similar results (the difference between the two outputs is on the order of 1e-8). However, these differences accumulate over the time steps of the ODE solver and become very noticeable after around 150 to 200 steps.

Code to reproduce

The recommended envelope form for CellBox is tanh. The code below compares the outputs of TensorFlow's and PyTorch's isolated envelope forms, set to tanh (defined in KernelConfig). No ODE solving is involved yet.

import numpy as np
import tensorflow.compat.v1 as tf
import torch
import torch.nn as nn  # needed by the 'hill' envelope below
tf.disable_v2_behavior()

class KernelConfig(object):
    def __init__(self):
        
        self.n_x = 5
        self.envelope_form = "tanh" # options: tanh, polynomial, hill, linear, clip linear
        self.envelope_fn = None
        self.polynomial_k = 2 # larger than 1
        self.ode_degree = 1
        self.envelope = 0
        self.ode_solver = "heun" # options: euler, heun, rk4, midpoint
        self.dT = 0.1
        self.n_T = 1000
        self.gradient_zero_from = None

args = KernelConfig()
W = np.random.normal(loc=0.01, size=(args.n_x, args.n_x))
eps = np.ones((args.n_x, 1), dtype=np.float32)
alpha = np.ones((args.n_x, 1), dtype=np.float32)
y0_np = np.zeros((args.n_x, 1))

# Test the envelope
def tensorflow_envelope():
    from cellbox.kernel import get_envelope
    envelope_fn = get_envelope(args)

    params = {}
    W_copy = np.copy(W)
    params["W"] = tf.convert_to_tensor(W_copy, dtype=tf.float32)
    if args.ode_degree == 1:
        def weighted_sum(x):
            return tf.matmul(params['W'], x)
    
    return envelope_fn(weighted_sum(tf.convert_to_tensor(params["W"], dtype=tf.float32))).eval(session=tf.compat.v1.Session())

def pytorch_get_envelope(args):
    """get the envelope form based on the given argument"""
    if args.envelope_form == 'tanh':
        args.envelope_fn = torch.tanh
    elif args.envelope_form == 'polynomial':
        k = args.polynomial_k
        assert k > 1, "Hill coefficient has to be k>=2."
        if k % 2 == 1:  # odd order polynomial equation
            args.envelope_fn = lambda x: x ** k / (1 + torch.abs(x) ** k)
        else:  # even order polynomial equation
            args.envelope_fn = lambda x: x**k/(1+x**k)*torch.sign(x)
    elif args.envelope_form == 'hill':
        k = args.polynomial_k
        assert k > 1, "Hill coefficient has to be k>=2."
        args.envelope_fn = lambda x: 2*(1-1/(1+nn.functional.relu(torch.tensor(x+1)).numpy()**k))-1
    elif args.envelope_form == 'linear':
        args.envelope_fn = lambda x: x
    elif args.envelope_form == 'clip linear':
        args.envelope_fn = lambda x: torch.clamp(x, min=-1, max=1)
    else:
        raise Exception("Illegal envelope function. Choose from [tanh, polynomial/hill]")
    return args.envelope_fn

def pytorch_envelope():
    envelope_fn = pytorch_get_envelope(args)
    params = {}
    W_copy = np.copy(W)
    params["W"] = torch.tensor(W_copy, dtype=torch.float32)
    if args.ode_degree == 1:
        def weighted_sum(x):
            return torch.matmul(params['W'], x)

    return envelope_fn(weighted_sum(torch.tensor(params["W"], dtype=torch.float32))).numpy()

tf_out = tensorflow_envelope()
torch_out = pytorch_envelope()
print(np.abs(tf_out - torch_out))

The output is:

[[0.0000000e+00 1.4901161e-08 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [5.9604645e-08 0.0000000e+00 5.9604645e-08 2.9802322e-08 5.9604645e-08]
 [1.1920929e-07 0.0000000e+00 9.3132257e-10 0.0000000e+00 2.9802322e-08]
 [2.9802322e-08 1.4901161e-08 5.9604645e-08 1.8626451e-09 5.9604645e-08]
 [5.9604645e-08 5.9604645e-08 5.9604645e-08 0.0000000e+00 0.0000000e+00]]

If using polynomial with args.polynomial_k = 2:

args.envelope_form = "polynomial"
args.polynomial_k = 2
tf_out = tensorflow_envelope()
torch_out = pytorch_envelope()
print(np.abs(tf_out - torch_out))

The output is:

[[0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 5.9604645e-08 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 1.4551915e-11 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00]]

However, if changing the envelope form to clip linear:

args.envelope_form = "clip linear"
tf_out = tensorflow_envelope()
torch_out = pytorch_envelope()
print(np.abs(tf_out - torch_out))

The output is:

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]

This difference might be small, but it adds up within the ODE solver and causes the final results of the TensorFlow and PyTorch ODE solvers to differ significantly. The same issue persists when args.envelope_form is set to hill or polynomial. However, when args.envelope_form is set to linear or clip linear, the difference between the TensorFlow and PyTorch ODE solvers is exactly 0, which leads me to believe that the numerical discrepancy in the other envelope functions causes this behaviour.
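To see why such a small discrepancy can become significant, consider the amplification along an expanding direction of the dynamics (a toy sketch, not CellBox code; whether the learned system actually has expanding directions depends on W):

import numpy as np

dT, n_T = 0.1, 200  # same step size as above; a step count in the reported range

def heun(f, x, dT, n_T):
    # Heun's method, one of the solvers listed in KernelConfig above.
    for _ in range(n_T):
        k1 = f(x)
        x = x + dT * (k1 + f(x + dT * k1)) / 2
    return x

f = lambda x: x                    # simplest expanding vector field, dx/dt = x
xa = heun(f, 1.0, dT, n_T)
xb = heun(f, 1.0 + 1e-8, dT, n_T)  # perturbed at the scale reported above
print(xb - xa)                     # ~1e-8 * exp(20), i.e. about 5: no longer tiny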

Solution

Is there a way around this? If two ODE solutions are very different, which one is the correct solution?

Typo in Readme.md

jovyan@jupyter-dfci-2dcellbox-2d7985pt7y:~$ python scripts/main.py -config=configs/Example.random_partition.json
WARNING: Logging before flag parsing goes to stderr.
W0206 06:36:49.652899 139714245154624 __init__.py:329] Limited tf.compat.v2.summary API due to missing TensorBoard installation.

        version 0.0.2
        -- Jan 20, 2019 --
        * Huge Bug fixed with gradually increasing nT
        * Reorganize utils.py

        version 0.0.3
        -- Jan 21, 2019 --
        * Adding test of convergece

        version 0.0.3.1
        -- Jan 23, 2019 --
        * Roll back to x_0 = 1

        version 0.0.3
        -- Jan 21, 2019 --
        * Roll back 0.0.3

        version 0.0.3.2
        -- Jan 26, 2019 --
        * Adding outputs for test_convergence()

        version 0.0.3.3
        -- Jan 30, 2019 --
        * use last 20 time step for test_convergence()

        version 0.0.3.4
        -- Jan 31, 2019 --
        * use 0.1 as initial values for alpha and eps variable

        version 0.0.4
        -- Feb 11, 2019 --
        * Roll back 0.0.3.3
        * Add constraints on direct regulation from drug nodes to phenotypic nodes

        version 0.0.5
        -- Feb 21, 2019 --
        * Add function to normalize mse loss to different nodes.

        version 0.1.0
        -- Aug 21, 2019 --
        * Re-structure codes for publish.

        version 0.1.1
        -- Oct 4, 2019 --
        * Add new kinetics
        * Add new ODE solvers
        * Add new envelop forms

        version 0.2.0
        -- Feb 26, 2020 --
        * Add support of matrix operation rather than function mapping
        * Roughly 5x faster

        version 0.2.1
        -- Apr 5, 2020 --
        * Reformat for better code style
        * Revise docs

        version 0.2.2
        -- Apr 23, 2020 --
        * Add support to tf.Datasets
        * Add support to tf.sparse
        * Prepare for sparse single-cell data

        version 0.2.3
        -- June 8, 2020 --
        * Add support to L2 loss (alone or together with L1, i.e. elastic net)
        * Clean the example configs folder


Traceback (most recent call last):
  File "scripts/main.py", line 64, in <module>
    cfg = pertbio.config.Config(master_args.experiment_config_path)
  File "/srv/conda/envs/notebook/lib/python3.6/site-packages/pertbio/config.py", line 12, in __init__
    with open(config_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'configs/Example.random_partition.json'

Error in python scripts/main.py -config=configs/Example.random_partition.json

Suggested change: python scripts/main.py -config=configs/example.random_partition.json

installing required modules from requirements.txt

After creating a new environment and running cellbox/setup.py, I had to manually install the h5py module. It would be better to mention this in setup.py, or to install all modules listed in requirements.txt when running setup.py.
If you agree, I will send a pull request. (One possible shape for the change is sketched below.)
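A common way to do this (a sketch of the proposed change, assuming requirements.txt sits next to setup.py; the real setup.py carries more metadata):

from setuptools import setup, find_packages

# Read requirements.txt so `pip install .` pulls in h5py and the rest automatically.
with open("requirements.txt") as f:
    requirements = [line.strip() for line in f
                    if line.strip() and not line.startswith("#")]

setup(
    name="cellbox",
    packages=find_packages(),
    install_requires=requirements,
)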

Update CellBox README

This issue is for updating the README file for future users to better install and use CellBox.

Best practices for using CellBox on different datasets

For external users wanting to use CellBox on their own dataset, what is the best practice for training the model? How many models in total, differing by the seed (--working_index), should be trained before the collection of models achieves statistical power? This question follows the Network Interpretation part of the Methods section in the original CellBox paper, where 1000 models were trained for downstream analysis. CellBox and its ODE solver are susceptible to suboptimal weight initialization: choosing the wrong random seed (--working_index) while keeping all other configs and arguments the same can lead to very different results. Therefore, should new users with a new dataset train only one model, or multiple models with different random seeds, to get the best performance?
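In script form, the multi-model protocol from the paper would look roughly like this (a sketch assuming --working_index is what varies the seed, as described above; the ensemble size is illustrative):

import subprocess
import sys

CONFIG = "configs/Example.random_partition.json"
N_MODELS = 100  # the paper trained 1000 models; scale to your compute budget

for working_index in range(N_MODELS):
    # Each run differs only by its seed; downstream analysis aggregates the ensemble.
    subprocess.check_call([
        sys.executable, "scripts/main.py",
        f"-config={CONFIG}",
        "--working_index", str(working_index),
    ])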

Inconsistent drug indexing in loo_label.csv, expr_index.txt, and --drug_index argument

Can you provide more information about what each row and index in loo_label.csv and expr_index.txt represent? I believe they are the labels of the drug perturbations, because each row in loo_label.csv corresponds to a row in pert.csv and expr.csv, but I cannot tell what the numeric indices in loo_label.csv represent.

From the paper, there are 12 drugs being tested. The --drug_index argument therefore refers to the drug that is left out during training. I would assume that, for example, when I run python scripts/main.py -config=configs/Example.leave_one_out.json --drug_index 12, all the rows in pert.csv that belong to the drug at index 12 (indicated in loo_label.csv) are left out of the training set. However, on closer inspection, I see that testidx (defined in dataset.py) contains indices that point to rows in loo_label.csv that have the number 9. Similarly, setting --drug_index 11 points to rows with the number 8, and so on. But setting --drug_index from 0 to 7 points correctly to rows in loo_label.csv with that number.

Can you confirm whether this is expected behaviour? This is important for testing my PyTorch dataloader, to confirm it fetches the same rows in pert.csv as the current TensorFlow dataloader.
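The check above can be made mechanical (a sketch; the file layout is an assumption based on this issue, and testidx is the index list produced by dataset.py for one leave-one-out run):

import pandas as pd

loo = pd.read_csv("loo_label.csv", header=None)

def held_out_labels(testidx):
    """Return the set of loo_label.csv labels among the held-out rows."""
    return set(loo.iloc[testidx].values.ravel())

# Expected: held_out_labels(testidx) == {N} for --drug_index N.
# Observed above: {9} for N = 12 and {8} for N = 11, but {N} for N in 0..7.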

self.train_y0 not found when setting model to LinReg

When the model option in the JSON config file is set to LinReg, the program throws an error that self.train_y0 is not defined. This is because self.train_y0 is only defined when the model is set to CellBox, since the build function in CellBox instantiates self.train_y0; in the other models it is never instantiated. The same applies to self.monitor_y0 and self.eval_y0.
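One possible fix (a sketch; only the attribute names come from this issue, the surrounding class structure is assumed): give every model class safe defaults so the training loop does not crash on models that never build the y0 tensors.

class ModelBase:
    def __init__(self):
        # Defaults for models whose build() never sets these (e.g. LinReg).
        self.train_y0 = None    # set by CellBox.build()
        self.monitor_y0 = None
        self.eval_y0 = None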
