
damienarnol / svca


License: Apache License 2.0

Python 14.12% R 0.09% CMake 0.17% C++ 64.07% C 3.79% Shell 0.15% Makefile 0.09% Roff 0.53% M4 0.13% Fortran 2.88% MATLAB 0.20% PowerShell 0.02% Batchfile 0.01% TeX 0.21% HTML 12.99% Jupyter Notebook 0.55%

svca's Introduction

Spatial Variance Components Analysis (SVCA)

Dependencies

Python

  • numpy
  • scipy
  • pandas
  • rpy2 (for the notebooks)
  • limix (local version in this repository)

R (for plotting)

  • ggplot2
  • reshape2
  • gplots
  • plyr
  • pheatmap

Others

  • gcc / g++

Installation

Installing limix

SVCA relies on a specific version of limix, which is bundled in the svca_limix directory of this repository. Install this package first, using the setup file in svca_limix.

NB: If you are already a limix user, we recommend installing svca_limix and svca in a dedicated conda environment to avoid interference between your limix versions.

cd svca_limix
python setup.py develop

Installing svca

Then install svca:

cd ..
cd SVCA
python setup.py develop

Basic usage

Computing spatial variance signatures for single images

Running SVCA on a single image and a single protein is illustrated in the bash script SVCA/svca/run/call_run_indiv.sh. The script calls run_indiv.py with the following inputs:

  1. data_dir='../../examples/data/IMC_example/Cy1x7/' directory with IMC input data
  2. output_dir='./test_svca' the output of the analysis is saved here
  3. protein_index=23 select the protein to be modelled
  4. normalisation='quantile' select the normalisation method.

For the analysis of all the images and proteins, we recommend using a cluster, as explained in the next section.

Computing spatial variance signatures for multiple images

NB: For the data format, see the example in the data/IMC_example directory, whose layout should correspond to that of your analysis_dir folder.

We recommend using a cluster for this.

  1. Adapt the file SVCA/svca/run_cluster/run_all_cluster.py to the queuing system used by your cluster.
  2. Your analysis directory analysis_dir should contain one directory per image on which you are fitting SVCA.
  3. Each image folder should contain a positions.txt and an expressions.txt. In both files, rows are cells. The columns of positions.txt are the (x, y) coordinates, with no header. The columns of expressions.txt are genes, with the gene names as the header.
  4. Run python run_all_cluster.py in the run_cluster directory.
  5. Results are written to a results directory inside each image directory.
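
Before submitting jobs, it can help to check that each image folder follows the layout described above. The following is a minimal sketch, not part of SVCA: the helper names check_image_dir and find_image_dirs are hypothetical, and the separators (comma in positions.txt, whitespace in expressions.txt) are an assumption taken from a user report in the issue tracker rather than from the SVCA code.

```python
import csv
from pathlib import Path


def check_image_dir(image_dir):
    """Return (n_cells, gene_names) for one image folder, or raise ValueError."""
    image_dir = Path(image_dir)

    # positions.txt: no header, one "x,y" row per cell (comma separator is an assumption).
    with open(image_dir / "positions.txt", newline="") as handle:
        positions = [row for row in csv.reader(handle) if row]
    for row in positions:
        if len(row) != 2:
            raise ValueError(f"expected 2 coordinates per cell, got {row!r}")
        float(row[0]), float(row[1])  # raises ValueError on non-numeric input

    # expressions.txt: header of gene names, then one row per cell
    # (whitespace separator is an assumption).
    with open(image_dir / "expressions.txt") as handle:
        lines = [line.split() for line in handle if line.strip()]
    if not lines:
        raise ValueError("expressions.txt is empty")
    genes, expressions = lines[0], lines[1:]
    for row in expressions:
        if len(row) != len(genes):
            raise ValueError("expression row length does not match the header")

    if len(positions) != len(expressions):
        raise ValueError("positions.txt and expressions.txt disagree on cell count")
    return len(positions), genes


def find_image_dirs(analysis_dir):
    """List sub-directories of analysis_dir that contain both input files."""
    return sorted(
        d for d in Path(analysis_dir).iterdir()
        if d.is_dir()
        and (d / "positions.txt").is_file()
        and (d / "expressions.txt").is_file()
    )
```

Calling check_image_dir on every folder returned by find_image_dirs(analysis_dir) reports the first malformed image before any cluster time is spent.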

Visualising variance signatures

  1. Adapt the file SVCA/svca/plot_scripts/call_plot_signatures.sh: in_dir should be your analysis directory and plot_dir the directory in which you want to save your plots.
  2. Run the file.

Cross validation

We recommend using a cluster for this. The procedure is the same as for computing variance signatures, but the file to use is SVCA/svca/run_cluster/run_cross_validation_cluster.py.

Visualising cross validation results

  1. Adapt the bottom of the file SVCA/svca/plot_scripts/plot_r2_cross_validation.R: working_dir should be your analysis directory and plot_dir the directory in which you want to save your plots.
  2. Run the file plot_r2_cross_validation.R.

svca's People

Contributors

damienarnol, gabora


svca's Issues

Plot script is looking for non-existent filenames

If you check these two lines of the plotting script:

tmp = tmp[grep('effect', tmp)]
tmp = tmp[grep('interactions', tmp)]

it seems you are looking for filenames in the results folder that contain both words, "effect" and "interactions".
But if I check the directory where you have the results, I only see file names that contain
intrinsic_effects or
env_effects or
local_effects.

Maybe it is a typo, interactions --> intrinsic?
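
To make the point concrete, here is a small Python sketch of what the two chained filters do. It mirrors the R logic for plain-word patterns only, and the file names are hypothetical, built from the three suffixes listed above.

```python
def keep_matching(names, word):
    """Rough Python counterpart of R's tmp[grep(word, tmp)] for a plain-word pattern."""
    return [name for name in names if word in name]


# Hypothetical result file names; only the suffixes come from the results folder.
result_files = [
    "protein_intrinsic_effects.txt",
    "protein_env_effects.txt",
    "protein_local_effects.txt",
]

after_first = keep_matching(result_files, "effect")
after_second = keep_matching(after_first, "interactions")

print(after_first)
print(after_second)
```

The second filter is applied to the output of the first, so a file has to contain both words to survive. None of the three names does, and the list ends up empty.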

Issues with running 'run_all_cluster.py'

I currently have svca installed, and my data (MIBI) is formatted into the correct expressions.txt and positions.txt files. When I run the altered 'run_all_cluster.py' script, I keep getting the error below:

workComputer:run_cluster id$ python run_all_cluster_fig6.py

Traceback (most recent call last):
  File "run_all_cluster_fig6.py", line 1, in <module>
    from svca.util_functions import cluster_utils, util_functions
  File "/anaconda3/lib/python3.7/site-packages/svca-0.0.1-py3.7.egg/svca/util_functions/cluster_utils.py", line 6, in <module>
ModuleNotFoundError: No module named 'util_functions'

I'm running everything from terminal on a Mac, not sure if that matters.
Thanks!
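
One possible explanation, offered as an assumption since the traceback does not show the content of line 6 of cluster_utils.py: the package may use a Python 2 style implicit relative import (import util_functions from inside svca/util_functions), which Python 3 no longer resolves. The sketch below reproduces that behaviour with two throwaway packages; demo_implicit, demo_explicit and sibling_utils are made-up names, not part of svca.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root, package, import_line):
    """Create package/sibling_utils.py and package/caller.py; caller.py runs import_line."""
    pkg = Path(root) / package
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "sibling_utils.py").write_text("VALUE = 42\n")
    (pkg / "caller.py").write_text(import_line + "\n")


root = tempfile.mkdtemp()
sys.path.insert(0, root)

# Same package layout twice; only the import statement in caller.py differs.
build_demo_package(root, "demo_implicit", "import sibling_utils")
build_demo_package(root, "demo_explicit", "from . import sibling_utils")
importlib.invalidate_caches()

try:
    importlib.import_module("demo_implicit.caller")
    implicit_ok = True
except ModuleNotFoundError as err:
    implicit_ok = False
    print("implicit import failed:", err)

explicit = importlib.import_module("demo_explicit.caller")
print("explicit import worked:", explicit.sibling_utils.VALUE)
```

If this is indeed the cause, running svca under Python 2.7 (the version that appears in the other tracebacks here) or changing the import to an explicit relative one might resolve it. Neither has been verified against the svca code.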

run_indiv.py fails

I managed to submit the jobs with run_all_cluster.py, but got an error:

from the log file:
ag252714@cluster:~/svca/SVCA/svca/run_cluster[659]$ vim tmp_log

Traceback (most recent call last):
  File "../run/run_indiv.py", line 59, in <module>
    run(data_dir, protein_index, output_dir, normalisation)
  File "../run/run_indiv.py", line 36, in run
    model = Model1(phenotype, X, norm=normalisation, oos_predictions=0., cov_terms=cterms, kin_from=kin_from, cv_ix=0)
  File "/home/ag252714/.local/lib/python2.7/site-packages/svca/models/model1.py", line 49, in __init__
    self.init_model(cov_terms)
  File "/home/ag252714/.local/lib/python2.7/site-packages/svca/models/model_base.py", line 84, in init_model
    self.build_cov(cov_terms)
  File "/home/ag252714/.local/lib/python2.7/site-packages/svca/models/model1.py", line 81, in build_cov
    self.covar = apply(SumCov, self.covar_terms.values())
  File "/home/ag252714/.local/lib/python2.7/site-packages/limix/core/covar/combinators.py", line 22, in __init__
    self.addCovariance(covar)
  File "/home/ag252714/.local/lib/python2.7/site-packages/limix/core/covar/acombinators.py", line 22, in addCovariance
    assert covar.dim==self.dim, 'Dimension mismatch.'
  File "/home/ag252714/.local/lib/python2.7/site-packages/limix/core/covar/zkz.py", line 46, in dim
    return self.K().shape[0]
  File "/home/ag252714/.local/lib/python2.7/site-packages/limix/hcache/_hcache.py", line 173, in method_wrapper
    result = method(self, *args, **kwargs)
  File "/home/ag252714/.local/lib/python2.7/site-packages/limix/core/covar/zkz.py", line 180, in K
    K = self._K()
  File "/home/ag252714/.local/lib/python2.7/site-packages/limix/hcache/_hcache.py", line 173, in method_wrapper
    result = method(self, *args, **kwargs)
  File "/home/ag252714/.local/lib/python2.7/site-packages/limix/core/covar/zkz.py", line 224, in _K
    z = self.se.K()
  File "/home/ag252714/.local/lib/python2.7/site-packages/limix/hcache/_hcache.py", line 173, in method_wrapper
    result = method(self, *args, **kwargs)
  File "/home/ag252714/.local/lib/python2.7/site-packages/limix/core/covar/sqexp.py", line 194, in K
    return self.scale * sp.exp(-self.E()/(2*self.length))
  File "/home/ag252714/.local/lib/python2.7/site-packages/limix/hcache/_hcache.py", line 173, in method_wrapper
    result = method(self, *args, **kwargs)
  File "/home/ag252714/.local/lib/python2.7/site-packages/limix/core/covar/sqexp.py", line 183, in E
    rv = SS.distance.pdist(self.X,'euclidean')**2.
  File "/usr/lib64/python2.7/site-packages/scipy/spatial/distance.py", line 1176, in pdist
    dm = np.zeros((m * (m - 1) / 2,), dtype=np.double)
TypeError: 'float' object cannot be interpreted as an index
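
The last frame suggests the array size m * (m - 1) / 2 reached np.zeros as a float. That would happen if true division is active in that SciPy file, which is an assumption here, and NumPy rejects non-integer shapes. A minimal sketch of the mechanism, written for Python 3 with a recent NumPy and not touching SVCA or limix:

```python
import numpy as np

m = 4  # number of cells, i.e. rows of the position matrix

float_size = m * (m - 1) / 2   # true division gives a float
int_size = m * (m - 1) // 2    # floor division gives an int

try:
    np.zeros((float_size,), dtype=np.double)
    float_shape_ok = True
except TypeError as err:
    float_shape_ok = False
    print("float shape rejected:", err)

# With an integer size, the condensed distance vector can be allocated.
condensed = np.zeros((int_size,), dtype=np.double)
print(condensed.shape)
```

If this reading is right, the system-wide SciPy under /usr/lib64 is older than the NumPy it is paired with, and installing a matching SciPy in the same environment may help. That remedy is a guess and has not been tested on this setup.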

How to run on local windows machine?

Hi there,

I have SVCA installed, but I can't figure out how to execute the "call_run_indiv.sh" bash script from Anaconda Prompt on a Windows machine.

Any help would be greatly appreciated!
Steve

Predictions/R^2

I want to obtain the R^2 values of the predictions, but the function write_r2 is outputting NaN values.

When I look at the predictions array, it comes out empty, even with model.gp.predict(). I am unsure how I can obtain the predictions and thus the R^2 values.
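
For reference, a sketch of why an empty predictions array cannot yield a number. It assumes R^2 is the usual coefficient of determination, 1 - SS_res / SS_tot. Whether write_r2 uses exactly this definition is not known, and r_squared is a hypothetical helper, not an svca function.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot; NaN when it is undefined."""
    if len(observed) != len(predicted) or len(observed) == 0:
        return math.nan  # nothing to compare, e.g. an empty predictions array
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        return math.nan  # constant observations: the ratio is undefined
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


print(r_squared([], []))
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

So the NaN values are probably a symptom of the empty predictions rather than a separate problem.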

readme/examples

Just some comments:

Right now, the setup is for a personal computer, while the example is supposed to run on the cluster. Consider making them consistent.

You could include a script for a subset of proteins/images that could be run on a local machine in a few minutes, so that a reviewer could try your script and then believe that the pipeline also works for the other images and proteins.

I think if you have this script, then you don't need to give instructions on how to install your packages on a cluster. People can figure that out themselves.
Anyway, the current installation instructions do not work on the cluster because of permission issues. As you said, conda environments would probably solve the issue, but then you might include a link explaining how to set one up.

The way it worked for me on the RWTH cluster was by using pip with the --user option (credit to @Nic-Nic):

  1. git clone https://github.com/damienArnol/svca.git
  2. cd svca/svca_limix
  3. pip install --user ./ (one issue: I could not pass the develop argument to the setup.py script, but pip was able to install it anyway)
  4. cd ../SVCA
  5. pip install --user ./

Format of positions.txt

Why do you use the whitespace as the sep in expressions.txt but use the comma in positions.txt?
Is that some joke to you?
Have fun developing no-one-want-to-use methods!
