hupo-psi / usi
Universal Spectrum Identifier for Mass Spectrometry
License: Apache License 2.0
Here is a file with all the USIs corresponding to the reanalysis performed using the quantms and DIANN workflow.
At least the proteomicsdb.org example in the Methods section of https://doi.org/10.1038/s41592-021-01184-6 is not working (both PDF and online version). See the screenshot below.
ACTION: Everyone: Consider the following TripleTOF spectrum and see how to support it
mzspec:PXD013210:TTB20160722_ISBHJOMXX001879_r01:scan:19809:SITS[phospho]PTTLYDR/2
https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/ShowObservedSpectrum?usi=mzspec:PXD013210:TTB20160722_ISBHJOMXX001879_r01:scan:19809:SITS[phospho]PTTLYDR/2
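For reference while discussing this spectrum, here is a minimal sketch of how the USI above breaks into its colon-separated fields. The `ParsedUSI` class and `parse_usi` function are hypothetical names used only for illustration, and the sketch assumes that neither the run name nor the index contains a colon.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedUSI:
    collection: str
    run: str
    index_type: str
    index: str
    interpretation: Optional[str] = None
    charge: Optional[int] = None


def parse_usi(usi: str) -> ParsedUSI:
    """Split a USI into its fields (simplified: no colons in run name or index)."""
    # maxsplit=5 keeps colons inside the interpretation (e.g. "UNIMOD:1290") intact.
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    _, collection, run, index_type, index = parts[:5]

    interpretation, charge = None, None
    if len(parts) == 6:
        interpretation = parts[5]
        # A trailing "/<digits>" is treated as the precursor charge.
        head, sep, tail = interpretation.rpartition("/")
        if sep and tail.isdigit():
            interpretation, charge = head, int(tail)
    return ParsedUSI(collection, run, index_type, index, interpretation, charge)


usi = "mzspec:PXD013210:TTB20160722_ISBHJOMXX001879_r01:scan:19809:SITS[phospho]PTTLYDR/2"
parsed = parse_usi(usi)
print(parsed.run, parsed.index_type, parsed.index, parsed.interpretation, parsed.charge)
```

The point relevant to this issue is that the index field carries a single value ("19809") tagged only as "scan", which is all the USI says about where the spectrum lives inside the TripleTOF file.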
Scan numbers should absolutely not be used to identify WIFF spectra. There's simply no reliable way to get back from a scan number to the <sample, period, cycle, experiment> tuple that is necessary to actually pinpoint a spectrum in a WIFF file. Limiting the id to a single number makes the "universal" modifier rather inaccurate. :) The same goes for Waters spectra, where function and scan are orthogonal and both are needed to pinpoint a spectrum in the .raw data.
Index is also unsuitable for maintaining a link back to the native spectrum, especially for multi-dimension formats (WIFF and Waters .raw), because the enumeration order of the dimensions is not guaranteed, nor is it clear that the indexes used for any format are based on a completely unfiltered enumeration of the data. In other words, someone generating USIs from a DDA mzML that has been filtered to only MS1s will get different indices than someone looking them up in an unfiltered file. It's simply not worth the potential for confusion!
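A toy illustration of the index problem, using a made-up five-spectrum run (the scan numbers, MS levels, and the `index_of` helper are all hypothetical):

```python
# Hypothetical run: (scan number, MS level) pairs in acquisition order.
spectra = [(1, 1), (2, 2), (3, 2), (4, 1), (5, 2)]


def index_of(scan, spectrum_list):
    """Return the 0-based position of a scan within a spectrum list."""
    return [s for s, _ in spectrum_list].index(scan)


# The same run after filtering to MS1 spectra only.
ms1_only = [(scan, level) for scan, level in spectra if level == 1]

idx_unfiltered = index_of(4, spectra)   # position of scan 4 in the full file
idx_filtered = index_of(4, ms1_only)    # position of scan 4 in the filtered file
print(idx_unfiltered, idx_filtered)
```

The same spectrum ends up at two different indices, so a USI built from one file does not resolve to the same spectrum in the other.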
We already solved this problem a decade ago with mzML and nativeIDs. Since they can be a bit verbose in a USI which is already quite long, I suggest we use an abbreviated format. Instead of "controllerType=0 controllerNumber=1 scan=123" we can put "MS:1000768:0.1.123" which is the combination of the Thermo nativeID accession and the abbreviated nativeID. Likewise:
MS:1000770:1.1.123.2
MS:1000769:1.0.123
MS:1000772:123
MS:1000773:_x0031_00_x0020_fmol_x0020_BSA_x002f_0_B1_x002f_1_x002f_1SRef_x002f_fid (this is an encoded version of "100 fmol BSA/0_B1/1/1SRef/fid", because IDREF is the datatype)
MS:1000774:123
MS:1000776:123
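To make the proposed abbreviation concrete, here is a rough sketch of how it could be generated and how the encoded Bruker FID value above decodes. Both function names are hypothetical, and the sketch assumes every whitespace-separated token of the nativeID is a "key=value" pair whose value contains no whitespace.

```python
import re


def abbreviate_native_id(accession: str, native_id: str) -> str:
    """Prefix the nativeID format accession and join the nativeID values with dots."""
    values = []
    for pair in native_id.split():
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        values.append(value)
    return f"{accession}:{'.'.join(values)}"


def decode_xml_escapes(text: str) -> str:
    """Turn _xHHHH_ escapes back into the characters they stand for."""
    return re.sub(r"_x([0-9A-Fa-f]{4})_", lambda m: chr(int(m.group(1), 16)), text)


thermo = abbreviate_native_id("MS:1000768", "controllerType=0 controllerNumber=1 scan=123")
wiff = abbreviate_native_id("MS:1000770", "sample=1 period=1 cycle=123 experiment=2")
fid = decode_xml_escapes(
    "_x0031_00_x0020_fmol_x0020_BSA_x002f_0_B1_x002f_1_x002f_1SRef_x002f_fid"
)
print(thermo)
print(wiff)
print(fid)
```

Going the other way (abbreviated form back to a full nativeID) would need the key names for each accession, which a reader can take from the nativeID format definition in the CV.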
The WIFF nativeID also solves another problem described here: a WIFF file can contain multiple samples, which are NOT necessarily named uniquely, so the sample index is needed to tell them apart. For a WIFF file, the "run name" part of the USI should refer ONLY to the WIFF filename, not the sample name.
However, there is an unresolved discussion about nativeIDs in the soon-to-be-recommended 3-array representation for ion mobility spectra in mzML. That discussion should apply to USIs as well, probably even more urgently because USIs may be paired with a spectrum interpretation. A single 3-array diaPASEF (or Agilent/Waters full IM frame) spectrum may correspond with multiple peptides. When the peptides are separated in the IM dimension, then creating a combined spectrum actually combines evidence that could otherwise be kept separate and combined for each peptide individually (using a unique range of mobility scans).
For example, let's say there is a Waters IM frame, which has 200 mobility scans (they all have the same retention time but cover a range of drift times). One peptide at drift time 5ms is supported by scans 50-60, and another peptide at drift time 10ms is supported by scans 120-130. If the combined spectrum was the entire frame of 200 scans (as @edeutsch suggested in email), then that evidence would all be combined in the same spectrum, and USIs to the spectrum would be ambiguous (kind of like a chimeric spectrum). When reading/converting the raw data, there's no interpretation of course, so a reader/converter can't know that the spectra should be separated by drift time. I was going to suggest that the raw spectra be given the full range of drift scans explicitly, like "frame=123 scanStart=1 scanEnd=200", and the interpreting software can make a USI with a subset of the start/end range to refer to a specific subset of mobility scans. But I feel that's too complex if accessing the full combined spectrum in mzML. I think it makes more sense to make sure the USIs for ion mobility identifications include the IM window so reader code can do its own filtering (similar to using the peptide sequence to infer the precursor and product m/zs). The same logic would apply for diaPASEF, but not ddaPASEF. The latter can be easily separated into combined spectra with just the subset of the mobility range relevant to a specific precursor (e.g. "frame=123 scanStart=456 scanEnd=567" for precursor 678.9). It's worth noting that ddaPASEF spectra are usually further merged (between frames) for searching purposes, and I think representing that is outside the scope of nativeIDs. So those spectra, if searched, could only be tracked back to the mzML or MGF file (a "merged=123" spectrum).
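A small sketch of the reader-side filtering described above, using invented peaks for the Waters example (scans 50-60 for one peptide, 120-130 for the other). The peak tuples and the `combine_scans` helper are hypothetical, and summing intensities at exactly equal m/z values is a simplification; real code would bin or match within a tolerance.

```python
from collections import defaultdict

# Hypothetical frame: each peak is (mobility_scan, mz, intensity).
frame_peaks = [
    (55, 500.25, 1000.0),
    (58, 500.25, 800.0),
    (90, 700.10, 50.0),
    (125, 612.30, 400.0),
    (128, 612.30, 600.0),
]


def combine_scans(peaks, scan_start, scan_end):
    """Sum intensities per m/z over mobility scans in [scan_start, scan_end]."""
    combined = defaultdict(float)
    for scan, mz, intensity in peaks:
        if scan_start <= scan <= scan_end:
            combined[mz] += intensity
    return dict(sorted(combined.items()))


whole_frame = combine_scans(frame_peaks, 1, 200)   # everything mixed together
peptide_a = combine_scans(frame_peaks, 50, 60)     # IM window for the first peptide
peptide_b = combine_scans(frame_peaks, 120, 130)   # IM window for the second peptide
print(whole_frame)
print(peptide_a)
print(peptide_b)
```

With the IM window carried alongside the identification, each peptide gets a spectrum containing only its own evidence, while the whole-frame spectrum mixes both.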
Dear Professor
Sorry for disturbing you. I'd like to ask you a question.
The Universal Spectrum Identifier is a very professional tool. I recently used it to look up PSMs, but clicking "look up USI" kept saying "USI not found at any of the repositories!" (Universal Spectrum Identifier // ProteomeXchange). I have checked the relevant information and found no mistake, and the other USIs I tried under this project do not work either.
I found that for USIs that can be looked up successfully, the DATASET HISTORY column on the information page of the corresponding project shows "Updated project metadata" in 2014.
For example:
mzspec:PXD000966:CPTAC_CompRef_00_iTRAQ_05_2Feb12_Cougar_11-10-09.mzML:scan:12298:[iTRAQ4plex]-LHFFM[Oxidation]PGFAPLTSR/3
For projects whose USIs cannot be looked up successfully, the DATASET HISTORY column does not show "Updated project metadata" in 2014.
For example: mzspec:PXD006512:CNHPP_HCC_LC_profiling_L006_P_F1:scan:64442:VADALTNAVAHVDDMPNALSALSDLHAHK/3
mzspec:PXD006201:20150804SL_Qe2_HEP2_UBISITE_rep1_A_15_HpH_6:scan:15223:TLSDYNIQK[UNIMOD:1290]/2
My guess is that some PSMs cannot be displayed through the USI lookup because their projects have no "Updated project metadata" entry. Will the tool be able to update project metadata for all projects in the future?
Can you help me with this problem? Relevant information is provided below
Thanks for your attention and time.