
fburic / candia


Canonical Decomposition of Data-Independent-Acquired Spectra

License: Other

Python 93.87% Shell 3.59% R 2.54%
big-data deconvolution gpu hpc mass-spectrometry parafac parallel-factor-analysis proteomics tensor-decomposition

candia's People

Contributors: fburic

candia's Issues

More comprehensive user documentation

The current README aims to provide enough information for a fairly savvy user to get CANDIA running. Simpler documentation is needed for everyday use; how much simpler it can be depends on the functionality of the main script and the pipeline configuration process.

Depends on: #2

DIA-NN wrapper Snakefile fails on missing irrelevant parameter

The scripts/quantification/diann.Snakefile script tries to read the diann_library config parameter even when run in normal mode (the parameter is only required for a DIA-NN library-free search). The script crashes if this parameter is not specified in the config file.

Workaround:

Add diann_library: "results/diann/dummy.tsv" to the config file to prevent the script from crashing. (The dummy file won't be created.)
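
A more permanent fix could make the parameter optional inside the Snakefile itself. A minimal sketch, assuming the wrapper distinguishes the two search modes through a config key (the library_free key below is a hypothetical mode switch, for illustration only):

# Sketch of a defensive read in scripts/quantification/diann.Snakefile.
# `config` is the dict Snakemake builds from the YAML config file.
diann_library = config.get("diann_library")

if config.get("library_free", False) and diann_library is None:
    raise ValueError(
        "diann_library must be set for a DIA-NN library-free search"
    )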

Split decomposed mzXML output into a separate step

To allow flexibility in using the output mzXML file (containing decomposed spectra) with various search engines, this export should be split out of the current identification script, which is built to use either Crux or MS-GF+.
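
A standalone export step could then be a small Snakemake rule of its own. A minimal sketch, in which the script name and file paths are hypothetical placeholders rather than the actual CANDIA layout:

# Hypothetical standalone rule: write the decomposed spectra to mzXML,
# independent of any particular search engine.
rule export_decomposed_mzxml:
    input:
        "results/parafac/best_models"
    output:
        "results/decomposed_spectra.mzXML"
    shell:
        "python scripts/export_mzxml.py --input {input} --output {output}"

Search-engine-specific rules (Crux, MS-GF+, or others) would then simply consume the mzXML file as input.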

Add unit tests

To support future development, a set of unit tests should be available, at the very least for sanity checking.
The repo already provides a toy dataset and a script to process it, but this should be improved.
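
Even a minimal pytest sanity check over the bundled toy dataset would be a start. A sketch, assuming only the scan CSV path used by the test experiment:

# Minimal pytest sanity check against the toy dataset.
# Assumes the test experiment layout referenced elsewhere in the pipeline.
import pandas as pd

def test_toy_scan_csv_is_nonempty():
    df = pd.read_csv("test/test_experiment/samples/scans_csv/scan1.csv")
    assert not df.empty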

Improve logging and error reporting

Logs should clearly highlight the current stage and the results of the pipeline.
Inconsequential warnings should not be shown, to reduce clutter.

Messages should make it clear which stage failed and, if possible, the cause of the failure.
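
For instance, a shared logging setup could announce each stage in the timestamp/PID format the pipeline logs already use, while filtering known-noisy warnings. A sketch; the suppressed warning category is illustrative:

import logging
import warnings

# Silence inconsequential third-party warnings (category is illustrative).
warnings.filterwarnings("ignore", category=FutureWarning)

# Mirror the [timestamp] [PID] format already used in the CANDIA logs.
logging.basicConfig(
    format="[%(asctime)s] [PID %(process)d] %(levelname)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=logging.INFO,
)

logging.info("Stage started: PARAFAC decomposition")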

Error running PARAFAC decomposition

Hello Filip,

We managed to install the CANDIA Singularity container on a DENBI Ubuntu server with 2 CUDA-capable GPUs.

We are still not able to make the test command ./candia test/test_experiment/config/candia.yaml run to completion.

It throws an error at the PARAFAC decomposition stage. The previous processing steps seem to run through. The error persists even if I execute the commands separately for each stage.

Something like:

Running PARAFAC decomposition...
CANDIA: 2 GPUs found. Dividing input slices into 2 partitions.
CANDIA: Output saved to test/test_experiment/logs/decompose_partition_0_20210302172404.log
CANDIA: Output saved to test/test_experiment/logs/decompose_partition_1_20210302172404.log
done.
Indexing all PARAFAC models and components...
scripts/parafac/models.py:123: YAMLLoadWarning:

calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.

[2021-03-02 17:24:07] [PID 93949] INFO: models.py:main():54:    Wrote model index
[2021-03-02 17:24:07] [PID 93949] INFO: models.py:main():58:    Wrote spectrum index
done.
Selecting best models
[2021-03-02 17:24:12] [PID 94478] WARNING:      collect_time_mode_values.py:get_model_time_mode_peak_counts():60:      Could not load model test/test_experiment/samples/scans_csv_slices/swath_lower_adjusted=623.00/rt_window=0.0/parafac_model_F12.pt
[2021-03-02 17:24:12] [PID 94477] WARNING:      collect_time_mode_values.py:get_model_time_mode_peak_counts():60:      Could not load model test/test_experiment/samples/scans_csv_slices/swath_lower_adjusted=623.00/rt_window=0.0/parafac_model_F10.pt
...
Traceback (most recent call last):
  File "scripts/parafac/collect_time_mode_values.py", line 113, in <module>
    main()
  File "scripts/parafac/collect_time_mode_values.py", line 45, in main
    model_peak_count = pd.concat(model_peak_count, ignore_index=True)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 284, in concat
    sort=sort,
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 331, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

I am attaching the execution log, including the results of the previous steps and the complete error.

error_report_CANDIA.txt

Do you think there's something we might be missing in the installation? What would you suggest to troubleshoot this?

Many thanks in advance for taking a look at this.

Best wishes,
Miguel

Identification and quantitation per sample

Hello Filip,

I wanted to open a different issue for this question.

We are very happy with candia's capabilities so far and would probably be testing it soon with a bigger cohort of samples.

What I find interesting about this spectral decomposition is the ability to run 'classical' searches on the decomposed spectra.

One of the things I am interested in is the detection of sequence variants, either by database search or by detecting non-annotated point mutations via xtandem or similar.

Since CANDIA's output is a single mzXML file, how do you think it would be possible to assign the identification (and therefore the potential quantitation) of a peptide to a particular sample/condition based on this single mzXML file?

I understand that the quantitation can be performed via DIA-NN, but probably my ignorance regarding its use (I have not used it extensively) is keeping me from understanding how to differentiate identifications between samples once the spectra have been decomposed.

Would you have any ideas on how to go from CANDIA's decomposed spectra into xtandem for the identification of point mutations not found in the FASTA file, and then use this information for differential quantitation between samples?

As always, many thanks for your input.

Best wishes,
Miguel

Installation problems and execution errors

Hello,

We are very excited about CANDIA's capabilities and are interested in testing it in our data analysis pipelines. I wanted to share my experience so far with the installation and execution tests, and ask for some advice on how to overcome the issues we found.

  1. We managed to install it on our Linux server (Ubuntu 20.04.1 LTS) after installing Singularity, but none of the commands suggested for this in the README file actually worked. We needed to use:
singularity pull shub://fburic/candia:def

instead of the suggested

singularity pull shub://fburic/candia
  2. It wasn't clear what CANDIA's top-level directory is for running the candia shell script to test for proper installation... So we cloned the GitHub repository into a new folder candia and transferred the candia_def.sif file into it. The candia_def.sif is the one created in the folder where Singularity was installed.

Is there a better way to get into the container directory? I might be missing something.

  3. From the top level of this directory, it was possible to run the command
./candia test/test_experiment/config/candia.yaml

but the pipeline only runs until the second step. Then it throws this error, which looks like a missing dependency:

~/software/candia$ ./candia test/test_experiment/config/candia.yaml
Converting DIA scan files from mzML to CSV...
Building DAG of jobs...
Nothing to be done.
Complete log: /home/schilling/software/candia/.snakemake/log/2021-01-26T171607.531930.snakemake.log
done.
Adjusting precursor isolation windows...
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
       count   jobs
       2       adjust_file
       1       all
       3

[Tue Jan 26 17:16:08 2021]
rule adjust_file:
   input: test/test_experiment/samples/scans_csv/scan1.csv
   output: test/test_experiment/samples/scans_csv_adjusted/scan1_adjusted.csv
   jobid: 2
   wildcards: sample=scan1

Job counts:
       count   jobs
       1       adjust_file
       1
Rscript scripts/util/adjust_swaths.R -i test/test_experiment/samples/scans_csv/scan1.csv -o test/test_experiment/samples/scans_csv_adjusted/scan1_adjusted.csv
/bin/bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by /bin/bash)
Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
namespace ‘tidyselect’ 0.2.5 is already loaded, but >= 1.1.0 is required
In addition: Warning message:
package ‘tidyverse’ was built under R version 3.6.3
Execution halted
[Tue Jan 26 17:16:10 2021]
Error in rule adjust_file:
   jobid: 0
   output: test/test_experiment/samples/scans_csv_adjusted/scan1_adjusted.csv

RuleException:
CalledProcessError in line 25 of /home/schilling/software/candia/scripts/util/adjust_swaths.Snakefile:
Command 'set -euo pipefail;  Rscript scripts/util/adjust_swaths.R -i test/test_experiment/samples/scans_csv/scan1.csv -o test/test_experiment/samples/scans_csv_adjusted/scan1_adjusted.csv' returned non-zero exit status 1.
 File "/home/schilling/software/candia/scripts/util/adjust_swaths.Snakefile", line 25, in __rule_adjust_file
 File "/opt/conda/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/schilling/software/candia/.snakemake/log/2021-01-26T171608.586104.snakemake.log

Many thanks in advance for taking the time to read this report. I would be very glad to receive some input on how to get CANDIA up and running.

Best wishes,
Miguel

Create development documentation

To support extending or adapting CANDIA functionality, the code base needs to be documented for developers. Ideally, most technical information should live here rather than in the user docs.

Implement better main script

The pipeline was developed as a collection of small, separate workflows to allow inspecting intermediate results and iterative/modular development. A natural way to run the pipeline is through a main control script, but this is currently missing.
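
A minimal sketch of such a driver, assuming the per-stage workflows stay as separate Snakefiles (adjust_swaths.Snakefile appears in the logs above; the mzML-to-CSV Snakefile path is a hypothetical placeholder):

# Hypothetical main control script: run the existing per-stage Snakemake
# workflows in order and stop at the first failure.
import subprocess
import sys

STAGES = [
    ("Converting DIA scan files from mzML to CSV",
     "scripts/util/mzml_to_csv.Snakefile"),
    ("Adjusting precursor isolation windows",
     "scripts/util/adjust_swaths.Snakefile"),
]

def main(config_path):
    for description, snakefile in STAGES:
        print(f"{description}...")
        result = subprocess.run(
            ["snakemake", "--snakefile", snakefile, "--configfile", config_path]
        )
        if result.returncode != 0:
            sys.exit(f"Stage failed: {description}")
        print("done.")

if __name__ == "__main__":
    main(sys.argv[1])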
