vdemichev / diann

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

License: Other

C++ 64.15% C 35.56% CMake 0.01% C# 0.27% Assembly 0.01% Shell 0.01% Makefile 0.01%

diann's People

Contributors

ewail, vdemichev

diann's Issues

How to generate a .tsv library in silico?

Dear Vadim

I really like DIA-NN; it is quite impressive what you managed to create!

I have an MSX dataset, which unfortunately is not yet supported in DIA-NN, according to the error message.
Therefore, I tried to use DIA-NN to generate an in silico library based on the mouse FASTA proteome, which I obtained from Uniprot. I would then try to use this library in e.g. Skyline, to search my data.
I did manage to create a ".speclib" file, but DIA-NN does not output a ".tsv" library.
I've already tried quite a few variations on the following, without success:

.\diann.exe --lib "D:\Documents\output_DIA-NN_2\lib.predicted.speclib" --threads 4 --out-lib "D:\Documents\output_DIA-NN_3\lib.tsv"

What can I do to create a ".tsv" library output?

Many thanks!

Best regards

Ludger

Cannot read raw file?

DIA-NN 1.7.11 (Data Independent Acquisition by Neural Networks)
Compiled on May 9 2020 18:21:12
Current date and time: Mon Jun 1 01:38:29 2020
CPU: GenuineIntel Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
SIMD instructions: AVX AVX2 FMA SSE4.1 SSE4.2
Logical CPU cores: 48
diann.exe --f D:\CX\human_plasma\microLC\2005143_HXQ_DIA\2005143_HXQ_DIA_2016-7936.raw --f D:\CX\human_plasma\microLC\2005143_HXQ_DIA\2005143_HXQ_DIA_2016-9790.raw --f D:\CX\human_plasma\microLC\2005143_HXQ_DIA\2005143_HXQ_DIA_2016-9809.raw --lib D:\CX\human_plasma\microLC\2005143_HXQ_DIA\profile_mzXML_64\output_file_irt_con.sptxt --threads 24 --verbose 1 --out D:\CX\human_plasma\microLC\2005143_HXQ_DIA\DIANN\results.tsv --out-gene D:\CX\human_plasma\microLC\2005143_HXQ_DIA\DIANN\results.genes.tsv --qvalue 0.01 --sptxt-acc 5ppm

Thread number set to 24
Output will be filtered at 0.01 FDR
Fragment filtering for .sptxt/.msp spectral libraries set to 5 ppm

3 files will be processed
[0:00] Loading spectral library D:\CX\human_plasma\microLC\2005143_HXQ_DIA\profile_mzXML_64\output_file_irt_con.sptxt
WARNING: support for .sptxt/.msp spectral libraries is experimental; fragment ions must be annotated; 5 ppm mass accuracy filtering will be used; use --sptxt-acc to change this setting
[0:00] Assembling elution groups
[0:00] Spectral library loaded: 0 protein isoforms, 0 protein groups and 5024 precursors in 3870 elution groups.
[0:00] Initialising library
[0:00] Saving the library to D:\CX\human_plasma\microLC\2005143_HXQ_DIA\profile_mzXML_64\output_file_irt_con.sptxt.speclib

[0:00] File #1/3
[0:00] Loading run D:\CX\human_plasma\microLC\2005143_HXQ_DIA\2005143_HXQ_DIA_2016-7936.raw
No MS2 spectra: aborting
ERROR: cannot load the file, skipping
[0:00] 0 library precursors are potentially detectable
[0:00] Processing...
[0:00] Removing interfering precursors
[0:00] Too few confident identifications, neural network will not be used
[0:00] Number of IDs at 0.01 FDR: 0
[0:00] No protein annotation, skipping protein q-value calculation
[0:00] Quantification
[0:00] Quantification information saved to D:\CX\human_plasma\microLC\2005143_HXQ_DIA\2005143_HXQ_DIA_2016-7936.raw.quant.

[0:00] File #2/3
[0:00] Loading run D:\CX\human_plasma\microLC\2005143_HXQ_DIA\2005143_HXQ_DIA_2016-9790.raw
No MS2 spectra: aborting
ERROR: cannot load the file, skipping
[0:00] 0 library precursors are potentially detectable
[0:00] Processing...
[0:00] Removing interfering precursors
[0:00] Too few confident identifications, neural network will not be used
[0:00] Number of IDs at 0.01 FDR: 0
[0:00] No protein annotation, skipping protein q-value calculation
[0:00] Quantification
[0:00] Quantification information saved to D:\CX\human_plasma\microLC\2005143_HXQ_DIA\2005143_HXQ_DIA_2016-9790.raw.quant.

[0:00] File #3/3
[0:00] Loading run D:\CX\human_plasma\microLC\2005143_HXQ_DIA\2005143_HXQ_DIA_2016-9809.raw
No MS2 spectra: aborting
ERROR: cannot load the file, skipping
[0:00] 0 library precursors are potentially detectable
[0:00] Processing...
[0:00] Removing interfering precursors
[0:00] Too few confident identifications, neural network will not be used
[0:00] Number of IDs at 0.01 FDR: 0
[0:00] No protein annotation, skipping protein q-value calculation
[0:00] Quantification
[0:00] Quantification information saved to D:\CX\human_plasma\microLC\2005143_HXQ_DIA\2005143_HXQ_DIA_2016-9809.raw.quant.

ERROR: DIA-NN tried but failed to load the following files: D:\CX\human_plasma\microLC\2005143_HXQ_DIA\2005143_HXQ_DIA_2016-7936.raw, D:\CX\human_plasma\microLC\2005143_HXQ_DIA\2005143_HXQ_DIA_2016-9790.raw, D:\CX\human_plasma\microLC\2005143_HXQ_DIA\2005143_HXQ_DIA_2016-9809.raw
[0:00] Cross-run analysis
[0:00] Reading quantification information: 3 files
[0:00] Quantifying peptides
WARNING: not enough peptides for normalisation
[0:00] Quantifying proteins
[0:00] Calculating q-values for protein and gene groups
[0:00] Writing report
[0:00] Report saved to D:\CX\human_plasma\microLC\2005143_HXQ_DIA\DIANN\results.tsv.
[0:00] Stats report saved to D:\CX\human_plasma\microLC\2005143_HXQ_DIA\DIANN\results.stats.tsv
[0:00] Writing gene report
[0:00] Gene report saved to D:\CX\human_plasma\microLC\2005143_HXQ_DIA\DIANN\results.genes.tsv.

0 precursors added to the library

I want to generate a library with DIA-NN (1.7.4) in library-free mode on Linux, but there are two problems. 1: For many DIA files, DIA-NN reports 'Cannot perform a mass calibration, too few rarely identify precursors'. 2: Following your method, I ran the analysis on many computers and finally recalculated from the .quant files. There are 237 files, but every run reports '0 precursors added to the library' and the generated library file is empty, even though the quantitative result is produced successfully. When I tested with two selected files, library generation succeeded.

/soft/diann-linux-1.7.4 --f A20181013sunyt_shaoyk_CRCA_DIA_Rp_b3_10.mzML --f A20181114sunyt_shaoyk_CRCA_DIA_b13_9.mzML --lib "" --threads 24 --verbose 1 --out "reportsum.tsv" --out-gene "reportsum.genes.tsv" --qvalue 0.01 --temp "./" --out-lib "libsum.tsv" --gen-spec-lib --fasta "swissprot_human_20180209_target_IRT_contaminant.fasta" --fasta-search --min-fr-mz 100 --max-fr-mz 1500 --met-excision --cut-after KR --missed-cleavages 2 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 400 --max-pr-mz 1200 --unimod4 --var-mods 1 --unimod35 --use-quant --rt-profiling --pg-level 1

I removed many suspicious files, but as I built the library, fewer and fewer precursors were added to it, until after a few dozen files the count reached 0.

SpectraST libraries

Hi, I noticed that DIA-NN 1.7.8 supports NIST .msp and SpectraST .sptxt libraries. However, I have never been able to add libraries in the .splib format (generated by FragPipe-MSFragger) in DIA-NN 1.7.8 or 1.7.7, with or without your suggestion (adding "--library-headers ,,,,,,Protein ID,Entry Name" to the Additional options text box). I always get the warning: WARNING: cannot find column ModifiedPeptide LabeledSequence FullUniModPeptideName "FullUniModPeptideName" modification_sequence. Do you have any other suggestions?

Phosphorylation modification DIA

Hi Vadim,

I wonder whether DIA-NN supports the analysis of DIA data with phosphorylation. If so, how should I use it? For unsupported modifications, is there any way to add them as extra command-line parameters?

We have compared the results of DIA-NN and Spectronaut on two datasets. The identifications overlap well in one dataset and hardly at all in the other. Do you have any explanation for this discrepancy?

Regards,
Da

About False Positive

Dear Vadim,
I would like to ask how to filter out false-positive proteins in library-based and library-free searches, respectively. With the library-based search, my filtering gives the same number of proteins as the DIA-NN report, but with the library-free search my result differs from the report (q-value 0.01). How should I filter out false-positive proteins in library-free searches?

Best wishes,

August-j

Segmentation fault on Linux compiled version

Dear Vadim,

I have compiled DIA-NN on CentOS, and I keep getting crashes at the end of a run: after sample analysis, during the stage of assembling protein groups and writing output tables.
Crashes can be avoided by removing certain samples from the analysis, but the underlying cause is unknown. Is this something you have heard of?

The compilation method seems to play a role, at least partly. Compiling without the "-static" flag gives the best executable so far, which works in most cases but still crashes with some samples. Keeping the "-static" flag produces an executable that crashes in all tests, always at the final stage of building protein groups. Lastly, for completeness, the Windows version from the repository does not crash with any of these samples.

Fingers crossed !
Marcelo

SILAC implementation & quantification issues

Dear Vadim,

we are currently setting up new DIA-based workflows using version 1.7.10. One aim is to also use DIA-NN for SILAC experiments. I went through the manual and, as far as I understand, recognition of SILAC samples should be automatic? However, I get no proper results either when using the --mod command with UniMod IDs or when using no command-line arguments at all.
In both cases, DIA-NN identifies a number of proteins similar to our alternative pipelines, but no SILAC ratios are present in any of the generated *.tsv files (the Label.Ratio column always reads 0).
I am not sure at which point the modifications have to be added: at library creation, at sample processing, or both (or neither). I tested all three variants and always got the same outcome.
We use standard SILAC mass shifts (R10, K8), so nothing fancy there, and I will happily provide any additional information I forgot to mention.

Thanks a lot and keep up your great work, really enjoying DIA-NN,
Jan

Questions about hardware configuration when running DIA-NN

Hi,
Today I used my laptop to process DIA data in DIA-NN (1.7.10). Unfortunately, after about 8 hours, the laptop crashed. I wonder whether this is due to my low-end hardware. Are there any hardware requirements?
I would be very grateful for a reply.
Best wishes!
Clover

Non-unimod modifications

Hey,

Is there any way to specify modifications that are not listed in UniMod, e.g. by giving the delta mass? Or does --mod not require the 'UniMod:xx,' prefix?

Thanks in advance

Library RTs in minutes seem to be interpreted as iRT values in the output TSV report

Hello,

I have used DIA-NN 1.7.11 on Windows via the command line to run a DIA analysis. The library I supplied was in TSV format (as produced by OpenSwathAssayGenerator), and its NormalizedRetentionTime column contained predicted retention times in minutes. However, the output TSV file from DIA-NN suggests that these library retention times are interpreted as iRT values in the report. This is not the case, unless iRT values happen by chance to be roughly equal to RT in minutes. Is there any way to tell DIA-NN that the library retention times are in minutes?

By the way, I really like DIA-NN, it is impressively fast and it can handle large libraries without any problems.

Best,

Marc

Creating full reports in DIA-NN

Hi Vadim,

First of all, congratulations on creating this software; it really works very well! I have two questions (I am using version 1.7.6). I would like to create full reports (no 0s or NaNs), and for this I set both the precursor and protein q-value thresholds to 1. The idea is to get a full matrix in which all missing values are 'imputed' (similar to the q-value filtering option in Spectronaut). However, the output still contains some 0s, and I cannot explain why. I also tried disabling protein inference, but that did not help either. Any idea why this is the case? Is there any solution?
The second question relates to the reported number of protein groups. I ran some tests in which I analysed DIA raw files either with a spectral library (created with SN) or with your library-free search option (deep learning enabled, human Swiss-Prot FASTA with isoforms). In both cases, I enabled protein inference, and for the library-based search I used the option '--library-headers ,,,,,,ProteinGroups' to get gene name information in the output report. For protein grouping, I kept the default 'genes' option. The number of reported protein groups from the library-based search appears credible, but in the library-free search it is heavily inflated: 12,000 or more protein groups are reported, and it is unclear to me why. I would assume that the protein grouping algorithm is the same for library-based and library-free searches when protein inference is enabled? Thanks for clarifying!

best,
Martin

Directory Parameter

It'd be convenient if there was a -d/--directory option which would process all mzML files in the specified directory. Perhaps --f could become -f and the long parameter name could be --file.

Semi Tryptic Support

Hi,

Does DIA-NN support semi-tryptic peptides?

I have a library constructed (using iProphet and SpectraST) from DDA runs that contains semi-tryptic peptides.
When reannotating with the FASTA for protein inference, --cut-after KR is set. My understanding is that this setting will prevent semi-tryptic peptides from being associated with any protein groups?

If so, is there any setting that allows semi-tryptic peptides to be used in protein grouping?

Thanks

Error loading a previously saved .pipeline

Hi Vadim,

I'm contacting you about a problem that I have encountered with the .pipeline format from DiaNN.
I could save and re-open my pipeline without problem, but when I click on one of the steps (for example to verify the settings that were saved), I get the following error : "StartIndex cannot be less than zero" (please see the screenshot below).

Where could this error come from?
Additionally, when I click on "Continue", DiaNN stays open, but the settings shown in the interface are not the ones I saved in my pipeline.

Thank you in advance for your response.

Best regards,
Tuvana

DiaNN_error_pipeline

DIA PASEF data analysis on DIA-NN

Dear Vadim,

I tried to analyse diaPASEF data with DIA-NN 1.7.11 via the CentOS 6.8 command line, but all my attempts failed.
First, I directly passed the analysis.tdf file from the .d folder to DIA-NN. The command was diann --f FILE --lib SPECLIB --threads 20 --verbose 1 --out "*.tsv" --out-gene "*.genes.tsv" --qvalue 0.01 --out-lib "*.lib.tsv" --gen-spec-lib --fasta FASTA --met-excision --individual-mass-acc --individual-windows --rt-profiling --pg-level 1, but the process stopped with a warning: invalid raw MS data format.

Then I tried converting the .d data to .mzML using Mobi-DIK (an extension of the OpenSWATH workflow). With that .mzML, DIA-NN stopped with another error: No index list offset found. File will not be read.

Finally, I used msConvert (from ProteoWizard) to convert the raw file to .mzML format, but DIA-NN produced yet another error: No MS2 spectra: aborting.

Is the command I entered correct? Can the .tdf format be loaded directly into DIA-NN now, as in #27? If not, which format works, and which software or script can convert the raw files?

Best wishes,
Hao

Fragment Data

Is it possible to work out which fragments are listed in the Fragment.Quant.Raw/Corrected and Fragment.Correlations columns?

Also, for identical precursors identified during the search, are the same fragments listed in each row (if the library has more than 6 transitions per precursor)?

Thanks a lot for the great software

linux error: terminate called after throwing an instance of 'std::invalid_argument' what(): stoi Aborted (core dumped)

Hi,
I am trying to run DIA-NN on Ubuntu 18.04, but I get the error below:

Compiled on Aug 25 2020 13:05:41
Logical CPU cores: 12
terminate called after throwing an instance of 'std::invalid_argument'
what(): stoi
Aborted (core dumped)

Below is the command used:

$diann --f /home/naadir/Mouse_Infections/linux-diann/010720_NG_43min_SWATH_MouseInfections_MPI-0hrs_1_2.mzML.dia --lib $SpecLib --threads --verbose 1 --out $MainOut --out-gene $GeneOut --qvalue 0.01 --matrices --temp $tempDir --out-lib $SpecLibOut --gen-spec-lib --prosit --reannotate --fasta $fasta --met-excision --cut-after KR --missed-cleavages 2 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 350 --max-pr-mz 1050 --unimod4 --var-mods 2 --unimod35 --individual-mass-acc --individual-windows --rt-profiling --pg-level 0 --report-lib-info

thanks

about

Hi,
Recently I tried the software pFind3, and I wonder whether DIA-NN supports DDA libraries produced by pFind3. Have you tried this?

Merging of spectral libraries

Dear Vadim,

Is there any way to merge different spectral libraries generated by DIA-NN within DIA-NN?
Thanks for your response!
Martin

cores used

Hi Vadim,

we noticed that on one of our servers with 96 logical processors (2 sockets, 48 cores), we are only allowed to select 48 threads in DiaNN. We had the same "issue" with Spectronaut, and they explained to us that it is a bug: even though it shows only 48, it runs on 96 cores. Is it the same kind of bug in DiaNN?
On another server with 56 cores, we are allowed to select all 56, so apparently the bug only arises on servers with more than 64 cores.

Thanks for your help,

Is the library predicted from FASTA (predicted.speclib) the same as adding the FASTA?

Hi,
About a month ago, I compared the library predicted from FASTA (predicted.speclib) with simply adding the FASTA files, and found that the latter identified more proteins than the former. But today, when I repeated the same operation on the same files, DIANN gave the same result for both. Has DIANN been changed?
Best Wishes!
Clover

mzML msconvert option

We tested two types of mzML files with DIANN. The one converted without "Write index" selected does not work ("No MS2 spectra: aborting"). The sentence in the manual, 'vendor centroiding enabled and all other options except “Write index” disabled', confused us; it actually means that "Write index" should be selected.

Here we would like to confirm whether this is the recommended msconvert setting. Thanks


Support for ion mobility (FAIMS or DIA-PASEF)?

Dear Vadim,

I have a quick question regarding the compatibility of DIA raw files that contain ion mobility information (e.g. acquired on a Bruker timsTOF Pro system using DIA PASEF technology, or on a Thermo Exploris 480 system using the FAIMS device). Does DIA-NN currently support these files, or do you plan to support them in the future?

For the analysis of our "regular" DIA files (acquired on a QExactive HF-X) DIA-NN is a great tool we're about to explore more and more and we highly appreciate its performance and the great contribution you've made for the whole community. Thank you very much, best wishes, Uli

About DDA library building

Hi Vadim,
If there are no iRT peptides spiked into the DDA runs, but I still want to use Proteome Discoverer or MaxQuant to build a DDA library, does that mean I can't use Skyline to export a DIANN-compatible format? (I tried to import the PD result without iRT, but it failed to run.) How can I get a format that DIANN can use?
Thank you !
Clover

Error when installing diann-rpackage?

install_github("https://github.com/vdemichev/diann-rpackage")
WARNING: Rtools is required to build R packages, but the version of Rtools previously installed in C:/rtools40 has been deleted.

Please download and install Rtools custom from http://cran.r-project.org/bin/windows/Rtools/.
Downloading GitHub repo vdemichev/diann-rpackage@master
WARNING: Rtools is required to build R packages, but the version of Rtools previously installed in C:/rtools40 has been deleted.

Please download and install Rtools custom from http://cran.r-project.org/bin/windows/Rtools/.
Failed to install 'diann' from GitHub:
Could not find tools necessary to compile a package
Call pkgbuild::check_build_tools(debug = TRUE) to diagnose the problem.

library(diann)
Error in library(diann) :

Can't create a report from C:/

Hi,
I'm trying to use DIA-NN to process my DIA data.
I tried the demo data from GitHub and followed the instructions here, but the run reports that it cannot create a report file in C:/. Today I tried to analyse other DIA data, and the same error occurred, as shown below.
Could you help me check it out?Thanks a million.
Best regards,
Clover

Cmd-line snakemake?

Hello,

Is it possible to use DiaNN in a Snakemake workflow? If not, is it planned?

cheers

Chris

crosses initialization of 'int ni' error when compiling on Ubuntu 20.04 and GCC 9.3.0

I was trying to compile on Ubuntu 20.04 with GCC 9.3.0 and received the error below. The newer GCC compiler may be stricter. I was able to compile successfully on Ubuntu 18.04.

gcc -O3 -static -I. -I./include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DGCC -DHAVE_EXPAT_CONFIG_H ./src/expat-2.2.0/xmlparse.c -c
gcc -O3 -static -I. -I./include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DGCC -DHAVE_EXPAT_CONFIG_H ./src/expat-2.2.0/xmlrole.c -c
gcc -O3 -static -I. -I./include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DGCC -DHAVE_EXPAT_CONFIG_H ./src/expat-2.2.0/xmltok.c -c
ar rcs libmstoolkitlite.a adler32.o compress.o crc32.o deflate.o inffast.o inflate.o infback.o inftrees.o trees.o uncompr.o zutil.o xmlparse.o xmlrole.o xmltok.o mzp.MSNumpress.o mzp.mzp_base64_lite.o mzp.BasicSpectrum_lite.o mzp.mzParser_lite.o mzp.RAMPface_lite.o mzp.saxhandler_lite.o mzp.saxmzmlhandler_lite.o mzp.saxmzxmlhandler_lite.o mzp.Czran_lite.o mzp.mz5handler_lite.o mzp.mzpMz5Config_lite.o mzp.mzpMz5Structs_lite.o mzp.BasicChromatogram_lite.o mzp.PWIZface_lite.o Spectrum.o MSObject.o mzMLWriter.o pepXMLWriter.o MSReaderLite.o
g++ -o ../diann-linux ../src/diann.cpp -static -Wl,--whole-archive -lpthread -Wl,--no-whole-archive -std=c++17 -lstdc++fs -fpermissive -O3 -mfpmath=sse -msse2 -march=core2 -w -L. -lmstoolkitlite -I. -I./include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DGCC -DHAVE_EXPAT_CONFIG_H
../src/diann.cpp: In member function 'void Library::max_lfq_quant(std::vector, std::vector, double, bool)':
../src/diann.cpp:5113:4: error: jump to label 'save'
5113 | save:
| ^~~~
../src/diann.cpp:5066:10: note: from here
5066 | goto save;
| ^~~~
../src/diann.cpp:5069:8: note: crosses initialization of 'int ni'
5069 | int ni = indices.size();
| ^~
make: *** [Makefile:30: all] Error 1
root@2931130497d9:/diann/mstoolkit# gcc --version
gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Uniprot IDs appear in multiple protein groups per RAW file

Hello,

Thank you very much for the great work! On both the algorithmic side (runtime, variance, number of IDs) and the software management side (open source, CC license, issues on GitHub), you have taken academic proteomics software to a new level.

Problem description

We noticed that in the DiaNN output, a UniProt accession can be shared between different protein groups. This differs from what we would typically expect from other tools (like Spectronaut or MaxQuant). To our knowledge, those tools would merge such overlapping protein groups further, e.g. by retaining the minimum set of accessions required to explain all observed peptides/precursors.

Questions

  1. Is there a specific reason why you decided against such further aggregation? At least in our data, the number of distinct protein groups obtained this way tends to overestimate the actual number of proteins, which might negatively affect downstream analyses such as significance tests.
  2. How do you handle the relationship between UniProt accessions and protein groups in your own data? Or do you perhaps use further aggregated data in your downstream analyses instead of the "Protein.Group" column from the "report.tsv" file?

We are currently using DiaNN "v1.7.6".

Best,
Daniel

Deep learning functionality for Linux version?

Hi Vadim,

Regarding library-free search for library construction, is it on your roadmap to provide the deep learning functionality in the Linux version?

Marcelo

ps. Apologies for filing so many issues at once, this all comes from a period of testing and observations.

Protein group FDR control while generating DIA-based DiaNN library

Hi Vadim,

I'm wondering whether protein group FDR control is applied when we generate a DIA-based library in DiaNN, as in MaxQuant, for example. If this is not done by default, is there an additional command-line option to specify when running DiaNN?

Thanks in advance.

Best,
Tuvana

Wrong spectral library headers

Hi,
I'm trying to run DIA-NN using a spectral library generated from SpectraST. I renamed all columns according to the docs

rn = { 'PrecursorMz': 'Precursor m/z', 'ProductMz': 'Fragment ion m/z', 'PrecursorCharge' : 'Precursor charge', 'ModifiedPeptideSequence' : 'Modified peptide sequences', 'NormalizedRetentionTime': 'Reference retention time', 'LibraryIntensity': 'Relative intensity of the fragment ion', 'UniprotId' : 'UniProt identifiers', 'ProteinId' : 'Protein names', 'GeneName': 'Gene name', 'ProductCharge' : 'Fragment ion charge', 'PeptideGroupLabel' : 'Elution group', 'FragmentType' : 'Fragment ion type' }
However, the error raised is
WARNING: cannot find column ModifiedPeptide LabeledSequence FullUniModPeptideName "FullUniModPeptideName"
There is no mention of this column in the manual
https://github.com/vdemichev/DiaNN/blob/master/DIA-NN%20GUI%20manual.pdf

After checking the source code, it is clear that 'Modified peptide sequences' (from the manual) is not among the recognised headers, and the same is true for most of the other headers. Is the documentation outdated?

std::vector<std::string> library_headers = {
    " ModifiedPeptide LabeledSequence FullUniModPeptideName \"FullUniModPeptideName\" modification_sequence ",
    " PrecursorCharge Charge \"PrecursorCharge\" prec_z ",
    " PrecursorMz \"PrecursorMz\" Q1 ",
    " iRT iRT RetentionTime NormalizedRetentionTime Tr_recalibrated \"Tr_recalibrated\" RT_detected ",
    " FragmentMz ProductMz \"ProductMz\" Q3 ",
    " RelativeIntensity RelativeFragmentIntensity RelativeFragmentIonIntensity LibraryIntensity \"LibraryIntensity\" relative_intensity ",
    " UniprotId UniProtIds UniprotID \"UniprotID\" uniprot_id ",
    " Protein Name Protein.Name ProteinName \"ProteinName\" ",
    " Genes Gene Genes \"Genes\" ",
    " IsProteotypic Is.Proteotypic Proteotypic ",
    " Decoy decoy \"Decoy\" \"decoy\" ",
    " FragmentCharge frg_z \"FragmentCharge\" ",
    " FragmentType frg_type \"FragmentType\" ",
    " FragmentNumber frg_nr FragmentSeriesNumber \"FragmentSeriesNumber\" ",
    " FragmentLossType \"FragmentLossType\" ",
    " Q.Value QValue Qvalue qvalue \"Q.Value\" \"QValue\" \"Qvalue\" \"qvalue\" ",
    " ElutionGroup ModifiedSequence ",
};

Questions for diaPASEF data analysis by DIA-NN

Dear Vadim,

we are building a timsTOF peptide library using Spectronaut 13.200430. However, the mass-spec acquisition and library generation with Pulsar are time-consuming. We wonder whether we can analyse diaPASEF data in DIA-NN without a library. Have you compared library-based and library-free searches for diaPASEF data? If a library is recommended, which library format should be imported to analyse DIA data acquired on the timsTOF? Could you send the recommended parameters you applied when analysing diaPASEF data to my email address: [email protected]

Thanks a lot
Liang

Calculation of protein level FDR starting from report.tsv

Dear Vadim, many thanks for DiaNN which has really opened a new chapter for DIA processing.

I would like to ask for some guidance on how to calculate protein-level FDR from DiaNN PSM scores for targets and decoys:

  • Firstly, I assume that the two score columns plus the peptide sequence are all you need for this calculation. Please let me know otherwise.

  • Regarding the actual calculation, how are PSMs aggregated (e.g. choice of aggregation method, perhaps PSM selection)? If there is a reference you followed closely, that might be an easy way forward.

  • Lastly, is there a simple way to deduce the sequences of the decoy peptides? Software like Percolator needs target and decoy sequences, but they don't seem essential if you have the scores.

Many thanks !

Decoy method

I have used DIANN version 1.6.0, and I have a question about decoy generation. In your paper just published in Nature Methods, you used mutation, but in the preprint posted on bioRxiv on Mar. 15, 2018, you used sequence inversion. Which method is used?

Why does the same command have differently output on Linux and Windows?

Hi, vdemichev. When I run the same command on Linux and on Windows, the output differs. Why is that? Does it affect the results? Thank you.

##Windows:
DiaNN.exe --f A20190303.mzML.dia --lib lib.tsv --threads 4 --verbose 1 --qvalue 0.01 --fasta swissprot_human_20180209_target.fasta --met-excision
Output:
DIA-NN (Data Independent Acquisition by Neural Networks)
Compiled on Dec 20 2019 18:29:40
Thread number set to 4
Output will be filtered at 0.01 FDR
N-terminal methionine excision enabled

1 files will be processed
[0:00] Loading spectral library lib.tsv
[0:05] Spectral library loaded: 6034 protein isoforms, 5932 protein groups and 77055 precursors in 63976 elution groups.
[0:05] Loading protein annotations from FASTA swissprot_human_20180209_target.fasta
[0:06] Annotating library proteins with information from the FASTA database
[0:06] Protein names missing for some isoforms
[0:06] Gene names missing for some isoforms
[0:06] Library contains 5951 proteins, and 5930 genes
[0:06] Initialising library
[0:06] Saving the library to lib.tsv.speclib

[0:06] File #1/1
[0:06] Loading run A20190303.mzML.dia
[0:07] 77055 library precursors are potentially detectable
[0:07] Processing...
[0:11] RT window set to 2.13686
[0:11] Peak width: 3.464
[0:11] Scan window radius set to 7
[0:20] Optimised mass accuracy: 22.462 ppm
[0:50] Removing interfering precursors
[0:52] Training the neural network: 58088 targets, 73767 decoys
[1:04] Number of IDs at 0.01 FDR: 41243
[1:05] Calculating protein q-values
[1:05] Number of genes identified at 1% FDR: 4490 (precursor-level), 4072 (protein-level) (inference performed using proteotypic peptides only)
[1:05] Quantification
[1:06] Quantification information saved to A20190303.mzML.dia.quant.

[1:06] Cross-run analysis
[1:06] Reading quantification information: 1 files
[1:06] Quantifying peptides
[1:06] Assembling protein groups
[1:06] Quantifying proteins
[1:06] Writing report
[1:07] Report saved to report.tsv.
[1:07] Stats report saved to report.stats.tsv
[1:07] Writing gene report
[1:07] Gene report saved to report.genes.tsv.
Finished

Linux:
./diann-linux --f A20190303.mzML.dia --lib lib.tsv --fasta swissprot_human_20180209_target.fasta --verbose 1 --qvalue 0.01 --met-excision
Output:
DIA-NN (Data Independent Acquisition by Neural Networks)
Compiled on Dec 27 2019 02:07:56
Output will be filtered at 0.01 FDR
N-terminal methionine excision enabled

1 files will be processed
[0:00] Loading spectral library lib.tsv
[0:04] Spectral library loaded: 6034 protein isoforms, 5932 protein groups and 77055 precursors in 63976 elution groups.
[0:04] Loading protein annotations from FASTA swissprot_human_20180209_target.fasta
[0:04] Annotating library proteins with information from the FASTA database
[0:04] Protein names missing for some isoforms
[0:04] Gene names missing for some isoforms
[0:04] Library contains 5951 proteins, and 5930 genes
[0:04] Initialising library
[0:04] Saving the library to lib.tsv.speclib

[0:04] File #1/1
[0:04] Loading run A20190303.mzML.dia
[0:05] 77055 library precursors are potentially detectable
[0:05] Processing...
[0:26] RT window set to 2.32759
[0:26] Peak width: 3.336
[0:26] Scan window radius set to 7
[1:03] Optimised mass accuracy: 14.3699 ppm
[2:29] Removing interfering precursors
[2:34] Training the neural network: 59694 targets, 71662 decoys
[3:02] Number of IDs at 0.01 FDR: 41744
[3:03] Calculating protein q-values
[3:03] Number of genes identified at 1% FDR: 4497 (precursor-level), 4113 (protein-level) (inference performed using proteotypic peptides only)
[3:03] Quantification
[3:07] Quantification information saved to A20190303.mzML.dia.quant.

[3:07] Cross-run analysis
[3:07] Reading quantification information: 1 files
[3:07] Quantifying peptides
[3:07] Assembling protein groups
[3:07] Quantifying proteins
[3:07] Writing report
[3:08] Report saved to report.tsv.
[3:08] Stats report saved to report.stats.tsv
[3:08] Writing gene report
[3:08] Gene report saved to report.genes.tsv.
Finished
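The two logs above differ in several headline numbers (RT window, mass accuracy, neural network training set, IDs). A minimal sketch that pulls these values out of a saved console log for side-by-side comparison; the regular expressions assume the log line formats shown above and are not part of DIA-NN.

```python
import re

# Regular expressions for a few headline lines of the DIA-NN console log shown above.
# These are assumptions based on that log, not a documented format.
PATTERNS = {
    "rt_window": r"RT window set to ([\d.]+)",
    "peak_width": r"Peak width: ([\d.]+)",
    "mass_accuracy_ppm": r"Optimised mass accuracy: ([\d.]+) ppm",
    "nn_targets": r"Training the neural network: (\d+) targets",
    "nn_decoys": r"Training the neural network: \d+ targets, (\d+) decoys",
    "precursor_ids": r"Number of IDs at [\d.]+ FDR: (\d+)",
    "genes_precursor_level": r"(\d+) \(precursor-level\)",
    "genes_protein_level": r"(\d+) \(protein-level\)",
}


def extract_stats(log_text: str) -> dict:
    """Return {metric: value} for every pattern found in the log text."""
    stats = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, log_text)
        if match:
            stats[key] = float(match.group(1))
    return stats


def compare(windows_log: str, linux_log: str) -> None:
    """Print each metric found in both logs with the Linux minus Windows difference."""
    win, lin = extract_stats(windows_log), extract_stats(linux_log)
    for key in PATTERNS:
        if key in win and key in lin:
            diff = lin[key] - win[key]
            print(f"{key:<22} {win[key]:>10g} {lin[key]:>10g} {diff:>+10g}")
```

Usage, assuming the two console outputs were saved to the hypothetical files windows.log and linux.log: read both files and pass their contents to compare(windows_text, linux_text).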

Does specLib not store which raw file(s) each entry came from?

I'm working on a reader for BiblioSpec to convert the specLib format to the blib format. Looking at the data structures in diann.cpp, it appears that Library (and thus the specLib format) does not keep track of which raw runs provided evidence for which precursors. Have I missed something, or was that linkage simply not important for DIA-NN?

warning: unrecognized modification

Hi Vadim,
I have a question about the deep learning feature in DIA-NN. I am analyzing PTM data and specify the corresponding UniMod modification with the --var-mod option. When running a library-free search with deep learning-based prediction, I get the following warning in the log:
Encoding peptides for spectra and RTs prediction
WARNING: skipping 3738390 precursors; unrecognised modifications?
Is this related to the variable modification I specified? Does it have any negative impact on the analysis in general?
Thanks for clarifying!

best wishes,
Martin
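One way to see which modifications the warning might refer to is to count, per UniMod ID, how many precursors carry a modification outside a given set. A minimal sketch, assuming modified sequences written as PEPT(UniMod:21)IDEK; the recognised_ids set is a hypothetical placeholder, not the list of modifications DIA-NN's predictor actually supports.

```python
import re
from collections import Counter

# Matches modification tags written as "(UniMod:<id>)" inside a modified sequence.
UNIMOD_TAG = re.compile(r"\(UniMod:(\d+)\)")


def count_unrecognised(modified_sequences, recognised_ids):
    """Count, per UniMod ID outside recognised_ids, how many precursors carry it."""
    counts = Counter()
    for seq in modified_sequences:
        # Each ID is counted once per precursor, even if it occurs at several sites.
        found = {int(mod_id) for mod_id in UNIMOD_TAG.findall(seq)}
        for mod_id in found - set(recognised_ids):
            counts[mod_id] += 1
    return counts


# Hypothetical example: the recognised set is a placeholder, not DIA-NN's list.
example = ["PEPT(UniMod:21)IDEK", "M(UniMod:35)PEPTIDEK", "AC(UniMod:4)DK(UniMod:121)R"]
print(count_unrecognised(example, recognised_ids={4, 21, 35}))  # Counter({121: 1})
```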

Precursor.normalised = 0 with Q.Value < 1%

Hi Vadim,

I noticed that in my dataset processed with DIA-NN, some precursors have a very small Q.Value (< 1%) but a quantity of 0 in Precursor.normalised (and/or Precursor.Quantity). With such low q-values I would expect these precursors to be well quantified, so I'm confused by the 0 value.
This does not happen often, but I can still flag around 1000 such precursors.
Could you help me understand these cases?

Thank you in advance.

Best regards,
Tuvana
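To list the affected precursors, a minimal sketch that scans the main report for rows passing the q-value cutoff but with a zero quantity. It assumes the tab-separated report header uses the column names Q.Value, Precursor.Quantity and Precursor.Normalised; adjust q_col or quant_cols if your version names them differently.

```python
import csv


def flag_zero_quantity(report_lines, q_cutoff=0.01, q_col="Q.Value",
                       quant_cols=("Precursor.Quantity", "Precursor.Normalised")):
    """Return report rows with a q-value below q_cutoff but a zero value in any quant column.

    report_lines: any iterable of text lines, e.g. an open report.tsv file handle.
    """
    flagged = []
    for row in csv.DictReader(report_lines, delimiter="\t"):
        if float(row[q_col]) >= q_cutoff:
            continue
        # Skip columns that are absent or empty in this report version.
        if any(row.get(col) not in (None, "") and float(row[col]) == 0.0 for col in quant_cols):
            flagged.append(row)
    return flagged
```

Usage: open report.tsv with newline="" and pass the file handle to flag_zero_quantity; the length of the returned list is the number of flagged precursors.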

Linux: Segmentation fault (core dumped)

I have 6,000 DIA runs to process, so I need to run DIA-NN on multiple Linux computers. So far I have tested two samples, and DIA-NN reported the error “Segmentation fault (core dumped)” without generating a result file. The test command was as follows:
diann-linux --f A20181202sunyt_shaoyk_CRCA_DIA_b21_8.mzML --f A20181202sunyt_shaoyk_CRCA_DIA_b21_9.mzML --lib G20190417chenlb_38IDA_spn.xls --threads 20 --verbose 1 --out "report.tsv" --out-gene "report.genes.tsv" --qvalue 0.01 --fasta swissprot_human_20180209_target_IRT_contaminant.fasta --individual-mass-acc --individual-windows --rt-profiling --pg-level 1
In addition, I have two questions:
1. Could you provide a list of all the parameters of the Linux version?
2. Can the results obtained on multiple computers be combined directly?
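For splitting the runs across several machines, a minimal sketch that divides the file list into contiguous batches and builds one diann-linux command per batch, reusing only options from the command above. The per-batch output names are hypothetical, and whether such per-batch reports can be merged is exactly question 2.

```python
import shlex


def batch_commands(run_files, n_machines, lib, fasta, threads=20):
    """Split run_files into up to n_machines contiguous batches; build one command per batch."""
    batch_size = -(-len(run_files) // n_machines)  # ceiling division
    commands = []
    for i in range(n_machines):
        batch = run_files[i * batch_size:(i + 1) * batch_size]
        if not batch:
            break
        args = ["diann-linux"]
        for path in batch:
            args += ["--f", path]
        args += ["--lib", lib, "--fasta", fasta, "--threads", str(threads),
                 "--qvalue", "0.01",
                 # Hypothetical per-batch output names so batches do not overwrite each other.
                 "--out", f"report_batch{i}.tsv", "--out-gene", f"report_batch{i}.genes.tsv"]
        commands.append(shlex.join(args))
    return commands


for cmd in batch_commands([f"run{k}.mzML" for k in range(5)], n_machines=2,
                          lib="G20190417chenlb_38IDA_spn.xls",
                          fasta="swissprot_human_20180209_target_IRT_contaminant.fasta"):
    print(cmd)
```

Further options from the original command (--individual-mass-acc, --individual-windows, --rt-profiling, --pg-level 1) can be appended to args in the same way.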

Bug report.

When two tabs are open (two pipelines running) with the same output folder, DIA-NN seems to write only the result files of whichever pipeline finishes first; the results of the other pipeline are not written.
