fburic / candia
Canonical Decomposition of Data-Independent-Acquired Spectra
License: Other
Supporting all R dependencies adds a lot of complexity to the Singularity specification.
The two existing R scripts should be reimplemented in Python.
The current README aims to provide enough information for a fairly savvy user to get CANDIA running. Simpler documentation for everyday use would be preferable; how much simpler it can be depends on the functionality of the main script and the pipeline configuration process.
Depends on: #2
The scripts/quantification/diann.Snakefile script tries to read the diann_library config parameter even when run in normal mode (the parameter is only required for a DIA-NN library-free search). The script crashes if this parameter is not specified in the config file.
Workaround: add diann_library: "results/diann/dummy.tsv" to the config to prevent the script from crashing. (The dummy file won't be created.)
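In the experiment's YAML config, the workaround is a single extra key; the surrounding keys shown here are placeholders for whatever the config already contains, and the dummy path value comes straight from the workaround above:

```yaml
# candia.yaml (fragment)
# Workaround: diann.Snakefile reads this key unconditionally,
# even in normal (non-library-free) mode. The file is never created;
# the key just needs to exist.
diann_library: "results/diann/dummy.tsv"
```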
The current test data is too small and the workflow fails after the decomposition stage.
To allow flexibility in using the output mzXML file (containing decomposed spectra) with various search engines, this export should be split out of the current identification script, which is built to use either Crux or MS-GF+.
To support future development, a set of unit tests should be available, at the very least for sanity checking.
The repo already provides a toy dataset and a script to process it, but these should be improved.
Logs should clearly highlight the current stage and results of the pipeline.
Inconsequential warnings should not be shown to reduce clutter.
It should be made clear what stage failed through messages and, if possible, the cause of the failure.
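One possible shape for these logging requirements, sketched with Python's standard logging module (the stage-runner helper and stage names are illustrative, not CANDIA's actual API):

```python
import logging
import warnings

# Illustrative sketch: highlight the current stage, suppress
# inconsequential warnings, and make the failing stage explicit.
logging.basicConfig(format="[%(levelname)s] %(message)s", level=logging.INFO)
warnings.filterwarnings("ignore", category=DeprecationWarning)

log = logging.getLogger("candia")

def run_stage(name, func):
    """Run one pipeline stage, logging its start, outcome, and failures."""
    log.info("=== Stage: %s ===", name)
    try:
        result = func()
    except Exception:
        # logging.exception records the traceback, so the cause
        # of the failure appears next to the stage name.
        log.exception("Stage '%s' failed", name)
        raise
    log.info("Stage '%s' done", name)
    return result
```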
Create a script to save the correspondence between decomposed spectrum IDs (<scan> in the mzXML file) and sample_num in their corresponding sample (abundance) modes.
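A minimal sketch of such a script is below. The scan IDs are pulled from the mzXML with a regular expression; the pairing rule (document order against a pre-extracted list of sample numbers) is an assumption for illustration, since the real mapping must come from the PARAFAC sample (abundance) modes:

```python
import csv
import re

def scan_sample_correspondence(mzxml_text, sample_nums):
    """Pair each <scan num="..."> (in document order) with a sample_num.

    Hypothetical sketch: sample_nums is assumed to be a ready list,
    extracted elsewhere from the model sample modes.
    """
    scan_ids = re.findall(r'<scan\s+num="(\d+)"', mzxml_text)
    return list(zip(scan_ids, sample_nums))

def write_correspondence(pairs, out_path):
    """Save the (scan, sample_num) pairs as a tab-separated table."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["scan", "sample_num"])
        writer.writerows(pairs)
```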
Hello Filip,
We managed to install the CANDIA Singularity container on a DENBI Ubuntu server with two CUDA-capable GPUs.
We are still not able to make the test command ./candia test/test_experiment/config/candia.yaml
run through completely.
It throws an error at the PARAFAC decomposition stage; the previous processing steps seem to run through fine. The error persists even if I execute the commands separately for each stage.
Something like:
Running PARAFAC decomposition...
CANDIA: 2 GPUs found. Dividing input slices into 2 partitions.
CANDIA: Output saved to test/test_experiment/logs/decompose_partition_0_20210302172404.log
CANDIA: Output saved to test/test_experiment/logs/decompose_partition_1_20210302172404.log
done.
Indexing all PARAFAC models and components...
scripts/parafac/models.py:123: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
[2021-03-02 17:24:07] [PID 93949] INFO: models.py:main():54: Wrote model index
[2021-03-02 17:24:07] [PID 93949] INFO: models.py:main():58: Wrote spectrum index
done.
Selecting best models
[2021-03-02 17:24:12] [PID 94478] WARNING: collect_time_mode_values.py:get_model_time_mode_peak_counts():60: Could not load model test/test_experiment/samples/scans_csv_slices/swath_lower_adjusted=623.00/rt_window=0.0/parafac_model_F12.pt
[2021-03-02 17:24:12] [PID 94477] WARNING: collect_time_mode_values.py:get_model_time_mode_peak_counts():60: Could not load model test/test_experiment/samples/scans_csv_slices/swath_lower_adjusted=623.00/rt_window=0.0/parafac_model_F10.pt
...
Traceback (most recent call last):
File "scripts/parafac/collect_time_mode_values.py", line 113, in <module>
main()
File "scripts/parafac/collect_time_mode_values.py", line 45, in main
model_peak_count = pd.concat(model_peak_count, ignore_index=True)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 284, in concat
sort=sort,
File "/opt/conda/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 331, in __init__
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
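For what it's worth, the final error is exactly what pandas raises when handed an empty list, so it looks like no model files could be loaded at all (consistent with the "Could not load model" warnings above):

```python
import pandas as pd

# When every model fails to load, the list passed to pd.concat is
# empty, which reproduces the traceback's ValueError:
try:
    pd.concat([], ignore_index=True)
except ValueError as e:
    print(e)  # -> No objects to concatenate
```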
I am attaching the execution log, including the results of the previous steps and the complete error.
Do you think there's something we might be missing in the installation? What would you suggest to troubleshoot this?
Many thanks in advance for taking a look at this.
Best wishes,
Miguel
Hello Filip,
I wanted to open a different issue for this question.
We are very happy with CANDIA's capabilities so far and will probably test it soon on a bigger cohort of samples.
What I find interesting about this spectral decomposition is the ability to run 'classical' searches on the decomposed spectra.
One of the things I am interested in is the detection of sequence variants, either by database search or by detecting non-annotated point mutations via xtandem or similar.
Since CANDIA's output is a single mzXML file, how do you think one could assign the identification (and therefore, potential quantitation) of a peptide to a particular sample/condition based on this single mzXML file?
I understand that quantitation can be performed via DIA-NN, but my inexperience with it (I have not used it extensively) keeps me from understanding how to differentiate identifications between samples once the spectra are decomposed.
Would you have any ideas on how to go from CANDIA's decomposed spectra into xtandem to identify point mutations not found in the FASTA file, and then use this information for differential quantitation between samples?
As always, many thanks for your input.
Best wishes,
Miguel
Hello,
We are very excited about CANDIA's capabilities and are interested in testing it in our data analysis pipelines. I wanted to share my experience so far with the installation and execution tests, and ask for advice on how to overcome some issues we found.
We had to pull the container with
singularity pull shub://fburic/candia:def
instead of the suggested
singularity pull shub://fburic/candia
We also needed the repository contents to run ./candia and test for proper installation... So we cloned the GitHub repository into a new folder candia, and transferred the candia_def.sif file into it (candia_def.sif being the file created in the folder where singularity was installed). Is there a better way to get into the container directory? I might be missing something.
We then ran
./candia test/test_experiment/config/candia.yaml
but the pipeline only runs until the second step, then throws an error that looks like a missing dependency.
~/software/candia$ ./candia test/test_experiment/config/candia.yaml
Converting DIA scan files from mzML to CSV...
Building DAG of jobs...
Nothing to be done.
Complete log: /home/schilling/software/candia/.snakemake/log/2021-01-26T171607.531930.snakemake.log
done.
Adjusting precursor isolation windows...
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
2 adjust_file
1 all
3
[Tue Jan 26 17:16:08 2021]
rule adjust_file:
input: test/test_experiment/samples/scans_csv/scan1.csv
output: test/test_experiment/samples/scans_csv_adjusted/scan1_adjusted.csv
jobid: 2
wildcards: sample=scan1
Job counts:
count jobs
1 adjust_file
1
Rscript scripts/util/adjust_swaths.R -i test/test_experiment/samples/scans_csv/scan1.csv -o test/test_experiment/samples/scans_csv_adjusted/scan1_adjusted.csv
/bin/bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by /bin/bash)
Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
namespace ‘tidyselect’ 0.2.5 is already loaded, but >= 1.1.0 is required
In addition: Warning message:
package ‘tidyverse’ was built under R version 3.6.3
Execution halted
[Tue Jan 26 17:16:10 2021]
Error in rule adjust_file:
jobid: 0
output: test/test_experiment/samples/scans_csv_adjusted/scan1_adjusted.csv
RuleException:
CalledProcessError in line 25 of /home/schilling/software/candia/scripts/util/adjust_swaths.Snakefile:
Command 'set -euo pipefail; Rscript scripts/util/adjust_swaths.R -i test/test_experiment/samples/scans_csv/scan1.csv -o test/test_experiment/samples/scans_csv_adjusted/scan1_adjusted.csv' returned non-zero exit status 1.
File "/home/schilling/software/candia/scripts/util/adjust_swaths.Snakefile", line 25, in __rule_adjust_file
File "/opt/conda/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/schilling/software/candia/.snakemake/log/2021-01-26T171608.586104.snakemake.log
Many thanks in advance for taking the time to read this report. I would be very glad to receive some input on how to get CANDIA up and running.
Best wishes,
Miguel
To support extending or adapting CANDIA functionality, the code base needs to be documented for developers. Ideally, most technical information should be covered here rather than in the user docs.
The pipeline was developed as a collection of small, separate workflows to allow inspecting intermediate results and iterative/modular development. The natural way to run the pipeline would be through a main control script, but this is currently missing.
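A minimal control script could simply chain the per-stage Snakemake workflows in order. The sketch below is a guess at such a driver, not CANDIA's actual design: the stage list is illustrative (only adjust_swaths.Snakefile and quantification/diann.Snakefile are named in the issues above), and the real pipeline has more stages:

```python
import subprocess
import sys

# Hypothetical stage order; the real CANDIA stage Snakefiles
# and their sequence may differ.
STAGES = [
    "scripts/util/adjust_swaths.Snakefile",
    "scripts/quantification/diann.Snakefile",
]

def run_pipeline(config_path, stages=STAGES):
    """Run each stage workflow in turn, stopping at the first failure."""
    for snakefile in stages:
        print(f"CANDIA: running stage {snakefile}")
        result = subprocess.run(
            ["snakemake", "--snakefile", snakefile,
             "--configfile", config_path],
        )
        if result.returncode != 0:
            sys.exit(f"CANDIA: stage {snakefile} failed "
                     f"(exit {result.returncode})")

if __name__ == "__main__" and len(sys.argv) > 1:
    run_pipeline(sys.argv[1])
```

Keeping the stages as standalone Snakefiles preserves the current ability to run and inspect each one separately; the driver only adds sequencing and fail-fast reporting on top.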