
fburic / candia


Canonical Decomposition of Data-Independent-Acquired Spectra

License: Other

Python 93.87% Shell 3.59% R 2.54%
big-data deconvolution gpu hpc mass-spectrometry parafac parallel-factor-analysis proteomics tensor-decomposition

candia's People

Contributors: fburic

candia's Issues

More comprehensive user documentation

The current README aims to provide enough information for a fairly savvy user to get CANDIA running. Simpler documentation is needed for everyday use; how much simpler it can be depends on the functionality of the main script and the pipeline configuration process.

Depends on: #2

DIA-NN wrapper Snakefile fails on missing irrelevant parameter

The scripts/quantification/diann.Snakefile script tries to read the diann_library config parameter even when run in normal mode (the parameter is only required for a DIA-NN library-free search). The script crashes if this parameter is not specified in the config file.

Workaround:

Add diann_library: "results/diann/dummy.tsv" to the config file to prevent the script from crashing. (The dummy file won't be created.)
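
A more permanent fix could make the parameter optional inside the Snakefile itself. A minimal sketch, assuming the wrapper distinguishes the two search modes through a config key (the library_free key below is a hypothetical mode switch, for illustration only):

# Sketch of a defensive read in scripts/quantification/diann.Snakefile.
# `config` is the dict Snakemake builds from the YAML config file.
diann_library = config.get("diann_library")

if config.get("library_free", False) and diann_library is None:
    raise ValueError(
        "diann_library must be set for a DIA-NN library-free search"
    )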

Split decomposed mzXML output into a separate step

To allow flexibility in using the output mzXML file (containing decomposed spectra) with various search engines, this export should be split out of the current identification script, which is built to use either Crux or MS-GF+.
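
A standalone export step could then be a small Snakemake rule of its own. A minimal sketch, in which the script name and file paths are hypothetical placeholders rather than the actual CANDIA layout:

# Hypothetical standalone rule: write the decomposed spectra to mzXML,
# independent of any particular search engine.
rule export_decomposed_mzxml:
    input:
        "results/parafac/best_models"
    output:
        "results/decomposed_spectra.mzXML"
    shell:
        "python scripts/export_mzxml.py --input {input} --output {output}"

Search-engine-specific rules (Crux, MS-GF+, or others) would then simply consume the mzXML file as input.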

Add unit tests

To support future development, a set of unit tests should be available, at the very least for sanity checking.
The repo already provides a toy dataset and a script to process it, but this should be improved.
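
Even a minimal pytest sanity check over the bundled toy dataset would be a start. A sketch, assuming only the scan CSV path used by the test experiment:

# Minimal pytest sanity check against the toy dataset.
# Assumes the test experiment layout referenced elsewhere in the pipeline.
import pandas as pd

def test_toy_scan_csv_is_nonempty():
    df = pd.read_csv("test/test_experiment/samples/scans_csv/scan1.csv")
    assert not df.empty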

Improve logging and error reporting

Logs should clearly highlight the current stage and the results of the pipeline.
Inconsequential warnings should not be shown, to reduce clutter.

Messages should make it clear which stage failed and, if possible, the cause of the failure.
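
For instance, a shared logging setup could announce each stage in the timestamp/PID format the pipeline logs already use, while filtering known-noisy warnings. A sketch; the suppressed warning category is illustrative:

import logging
import warnings

# Silence inconsequential third-party warnings (category is illustrative).
warnings.filterwarnings("ignore", category=FutureWarning)

# Mirror the [timestamp] [PID] format already used in the CANDIA logs.
logging.basicConfig(
    format="[%(asctime)s] [PID %(process)d] %(levelname)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=logging.INFO,
)

logging.info("Stage started: PARAFAC decomposition")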

Error running PARAFAC decomposition

Hello Filip,

We managed to install the CANDIA Singularity container on a DENBI Ubuntu server with 2 CUDA-capable GPUs.

We are still not able to make the test command ./candia test/test_experiment/config/candia.yaml run to completion.

It throws an error at the PARAFAC decomposition stage. The previous processing steps seem to run through. The error persists even if I execute the commands separately for each stage.

Something like:

Running PARAFAC decomposition...
CANDIA: 2 GPUs found. Dividing input slices into 2 partitions.
CANDIA: Output saved to test/test_experiment/logs/decompose_partition_0_20210302172404.log
CANDIA: Output saved to test/test_experiment/logs/decompose_partition_1_20210302172404.log
done.
Indexing all PARAFAC models and components...
scripts/parafac/models.py:123: YAMLLoadWarning:

calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.

[2021-03-02 17:24:07] [PID 93949] INFO: models.py:main():54:    Wrote model index
[2021-03-02 17:24:07] [PID 93949] INFO: models.py:main():58:    Wrote spectrum index
done.
Selecting best models
[2021-03-02 17:24:12] [PID 94478] WARNING:      collect_time_mode_values.py:get_model_time_mode_peak_counts():60:      Could not load model test/test_experiment/samples/scans_csv_slices/swath_lower_adjusted=623.00/rt_window=0.0/parafac_model_F12.pt
[2021-03-02 17:24:12] [PID 94477] WARNING:      collect_time_mode_values.py:get_model_time_mode_peak_counts():60:      Could not load model test/test_experiment/samples/scans_csv_slices/swath_lower_adjusted=623.00/rt_window=0.0/parafac_model_F10.pt
...
Traceback (most recent call last):
  File "scripts/parafac/collect_time_mode_values.py", line 113, in <module>
    main()
  File "scripts/parafac/collect_time_mode_values.py", line 45, in main
    model_peak_count = pd.concat(model_peak_count, ignore_index=True)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 284, in concat
    sort=sort,
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 331, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

I am attaching the execution log, including the results of the previous steps and the complete error.

error_report_CANDIA.txt

Do you think there's something we might be missing in the installation? What would you suggest to troubleshoot this?

Many thanks in advance for taking a look at this.

Best wishes,
Miguel

Identification and quantitation per sample

Hello Filip,

I wanted to open a different issue for this question.

We are very happy with candia's capabilities so far and would probably be testing it soon with a bigger cohort of samples.

What I find interesting about this spectral decomposition is the ability to run 'classical' searches on the decomposed spectra.

One of the things I am interested in is the detection of sequence variants, either by database search or by detecting non-annotated point mutations via xtandem or similar.

Since CANDIA's output is a single mzXML file, how do you think it would be possible to assign the identification (and therefore the potential quantitation) of a peptide to a particular sample/condition based on this single mzXML file?

I understand that the quantitation can be performed via DIA-NN, but probably my ignorance regarding its use (I have not used it extensively) is keeping me from understanding how to differentiate identifications between samples once the spectra have been decomposed.

Would you have any ideas on how to go from CANDIA's decomposed spectra into xtandem for the identification of point mutations not found in the FASTA file, and then use this information for differential quantitation between samples?

As always, many thanks for your input.

Best wishes,
Miguel

Installation problems and execution errors

Hello,

We are very excited about CANDIA's capabilities and are interested in testing it in our data analysis pipelines. I wanted to share my experience so far with the installation and execution tests, and ask for some advice on how to overcome the issues we found.

  1. We managed to install it on our Linux server (Ubuntu 20.04.1 LTS) after installing Singularity, but none of the commands suggested for this in the README file actually worked. We needed to use:
singularity pull shub://fburic/candia:def

instead of the suggested

singularity pull shub://fburic/candia
  2. It wasn't clear what CANDIA's top-level directory is for running the candia shell script to test for proper installation... So we cloned the GitHub repository into a new folder candia and transferred the candia_def.sif file into it. The candia_def.sif is the one created in the folder where Singularity was installed.

Is there a better way to get into the container directory? I might be missing something.

  3. From the top level of this directory, it was possible to run the command
./candia test/test_experiment/config/candia.yaml

but the pipeline only runs until the second step. Then it throws this error, which looks like a missing dependency:

~/software/candia$ ./candia test/test_experiment/config/candia.yaml
Converting DIA scan files from mzML to CSV...
Building DAG of jobs...
Nothing to be done.
Complete log: /home/schilling/software/candia/.snakemake/log/2021-01-26T171607.531930.snakemake.log
done.
Adjusting precursor isolation windows...
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
       count   jobs
       2       adjust_file
       1       all
       3

[Tue Jan 26 17:16:08 2021]
rule adjust_file:
   input: test/test_experiment/samples/scans_csv/scan1.csv
   output: test/test_experiment/samples/scans_csv_adjusted/scan1_adjusted.csv
   jobid: 2
   wildcards: sample=scan1

Job counts:
       count   jobs
       1       adjust_file
       1
Rscript scripts/util/adjust_swaths.R -i test/test_experiment/samples/scans_csv/scan1.csv -o test/test_experiment/samples/scans_csv_adjusted/scan1_adjusted.csv
/bin/bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by /bin/bash)
Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
namespace ‘tidyselect’ 0.2.5 is already loaded, but >= 1.1.0 is required
In addition: Warning message:
package ‘tidyverse’ was built under R version 3.6.3
Execution halted
[Tue Jan 26 17:16:10 2021]
Error in rule adjust_file:
   jobid: 0
   output: test/test_experiment/samples/scans_csv_adjusted/scan1_adjusted.csv

RuleException:
CalledProcessError in line 25 of /home/schilling/software/candia/scripts/util/adjust_swaths.Snakefile:
Command 'set -euo pipefail;  Rscript scripts/util/adjust_swaths.R -i test/test_experiment/samples/scans_csv/scan1.csv -o test/test_experiment/samples/scans_csv_adjusted/scan1_adjusted.csv' returned non-zero exit status 1.
 File "/home/schilling/software/candia/scripts/util/adjust_swaths.Snakefile", line 25, in __rule_adjust_file
 File "/opt/conda/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/schilling/software/candia/.snakemake/log/2021-01-26T171608.586104.snakemake.log

Many thanks in advance for taking the time to read this report. I would be very glad to receive some input on how to get CANDIA up and running.

Best wishes,
Miguel

Create development documentation

To support extending or adapting CANDIA functionality, the code base needs to be documented for developers. Ideally, most technical information should live here rather than in the user docs.

Implement better main script

The pipeline was developed as a collection of small, separate workflows to allow inspecting intermediate results and iterative/modular development. A natural way to run the pipeline is through a main control script, but this is currently missing.
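
A minimal sketch of such a driver, assuming the per-stage workflows stay as separate Snakefiles (adjust_swaths.Snakefile appears in the logs above; the mzML-to-CSV Snakefile path is a hypothetical placeholder):

# Hypothetical main control script: run the existing per-stage Snakemake
# workflows in order and stop at the first failure.
import subprocess
import sys

STAGES = [
    ("Converting DIA scan files from mzML to CSV",
     "scripts/util/mzml_to_csv.Snakefile"),
    ("Adjusting precursor isolation windows",
     "scripts/util/adjust_swaths.Snakefile"),
]

def main(config_path):
    for description, snakefile in STAGES:
        print(f"{description}...")
        result = subprocess.run(
            ["snakemake", "--snakefile", snakefile, "--configfile", config_path]
        )
        if result.returncode != 0:
            sys.exit(f"Stage failed: {description}")
        print("done.")

if __name__ == "__main__":
    main(sys.argv[1])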
