hupo-psi / proforma
HUPO-PSI Standardized peptidoform notation
There are some minor inaccuracies in some of the examples in the specification draft 12:
EM[R: Methionine sulfone]EVEES[O-phospho-L-serine]PEK -> This term doesn't appear in RESID. Note the leading space, but even without that the name is incorrect. Probably it should be L-methionine sulfone (RESID:AA0251)?
EM[UNIMOD:15]EVEES[UNIMOD:56]PEK -> Accession UNIMOD:15 does not exist. If consistency with the previous examples is desired, UNIMOD:35 corresponds to Oxidation. The same applies to the invalid example with U:15 just underneath.
EVTSEKC[half-cystine]LEMSC[half-cystine]EFD -> half-cystine should be half cystine (no hyphen).
242.0096 as the mass with four decimals.
More conceptual questions:
Q: page 14: Parsing glycan compositions is somewhat non-trivial because some labels overlap. It would be easier if spaces between monosaccharides are used (split on space) or cardinality is always specified (split on [a-zA-Z]+\d+). Maybe this can be a bit more strongly recommended in section 4.2.8?
A: Parsing is possible without enforcing spaces or cardinality by checking for only defined monosaccharides rather than any string.
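The approach in this answer can be sketched as follows. This is only an illustration, not the reference implementation: the function names are hypothetical, and the monosaccharide labels listed here are an assumption that should be checked against the list in the specification. The tokenizer accepts only defined labels and backtracks when a longer label leads to a dead end, which is what resolves the overlap between labels such as Hex, HexP and Pen.

```python
import re

# Assumed label set; verify against the monosaccharide list in the specification.
MONOSACCHARIDES = [
    "Hex", "HexNAc", "HexS", "HexP", "HexNAcS",
    "dHex", "NeuAc", "NeuGc", "Pen", "Fuc",
]

# Try longer labels first so "HexNAcS" is preferred over "HexNAc" and "Hex".
_LABELS = sorted(MONOSACCHARIDES, key=len, reverse=True)
# Optional cardinality followed by optional whitespace; this always matches.
_COUNT = re.compile(r"(\d*)\s*")


def _tokenize(text, pos=0):
    """Return a list of (label, count) pairs, or None if no valid split exists."""
    if pos == len(text):
        return []
    for label in _LABELS:
        if text.startswith(label, pos):
            match = _COUNT.match(text, pos + len(label))
            rest = _tokenize(text, match.end())
            if rest is not None:
                count = int(match.group(1)) if match.group(1) else 1
                return [(label, count)] + rest
    return None  # no defined label fits here; the caller backtracks


def parse_glycan_composition(text):
    """Parse a composition such as 'HexNAc2Hex3' into {label: count}."""
    tokens = _tokenize(text.strip())
    if not tokens:
        raise ValueError(f"not a valid glycan composition: {text!r}")
    counts = {}
    for label, count in tokens:
        counts[label] = counts.get(label, 0) + count
    return counts
```

With this sketch, both "HexNAc2Hex3" and "Hex HexNAc" parse without requiring spaces or cardinalities, while a string containing an undefined label raises a ValueError.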
Q: page 18: I'm a bit confused how parsers should interpret that global modifications are isotopes? The examples (13C, 15N, D) don't seem to be specified using a controlled vocabulary, whereas this is the case throughout the rest of the document. Is it that when no @ is used in the global modification part, as specified in section 4.6.2, it should always be considered an isotope instead?
A: Yes, I currently interpret global modifications of the form INT* LETTER+ SIGNED_INT* as an isotope and global modifications of the form "[" mod "]@" (AA ",")* AA as global amino acid modifications (so square brackets and "@" sign).
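A minimal sketch of this interpretation is shown below. The regular expressions are one possible reading of the two forms given in the answer (an optional integer, one or more letters and an optional signed integer for isotopes; a bracketed modification followed by "@" and a comma-separated residue list otherwise). The function name and return shape are hypothetical, and nested square brackets inside the modification are not handled.

```python
import re

# Assumed reading of INT* LETTER+ SIGNED_INT*, e.g. "13C", "15N", "D".
ISOTOPE = re.compile(r"\d*[A-Za-z]+(?:[+-]\d+)?")
# Assumed reading of "[" mod "]@" (AA ",")* AA, e.g. "[Oxidation]@C,M".
FIXED = re.compile(r"\[(?P<mod>[^\]]+)\]@(?P<targets>[A-Z](?:,[A-Z])*)")


def classify_global_modification(body):
    """Classify the text found between '<' and '>' in a global modification."""
    fixed = FIXED.fullmatch(body)
    if fixed:
        return ("fixed", fixed.group("mod"), fixed.group("targets").split(","))
    if ISOTOPE.fullmatch(body):
        return ("isotope", body, [])
    raise ValueError(f"unrecognised global modification: {body!r}")


print(classify_global_modification("13C"))
print(classify_global_modification("[Carbamidomethyl]@C"))
```

The fixed-modification form is tested first because it is the more specific of the two; anything without square brackets and "@" falls through to the isotope check.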
Q: page 19: How should multiple global modifications on different amino acids be specified? I guess the following example, with a comma separating the global modifications within the angular brackets, would be in line with the spec, but this is not explicitly detailed: <[Carbamidomethyl]@C,[Oxidation]@M>MTPEILTCNSIGCLK.
A: Multiple global modifications are each specified in their own block between angled brackets.
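Following this answer, the example from the question would presumably be written with one block per modification, i.e. <[Carbamidomethyl]@C><[Oxidation]@M>MTPEILTCNSIGCLK. A parser could peel off the leading blocks roughly as below. This is a sketch with a hypothetical function name, and it assumes that a block body never contains "<" or ">" itself.

```python
import re

# One global modification block: '<', anything except angle brackets, '>'.
_BLOCK = re.compile(r"<([^<>]*)>")


def split_global_modifications(proforma):
    """Return (list of block bodies, remaining sequence string)."""
    blocks = []
    pos = 0
    while True:
        match = _BLOCK.match(proforma, pos)
        if match is None:
            break  # no further leading block; the rest is the sequence
        blocks.append(match.group(1))
        pos = match.end()
    return blocks, proforma[pos:]


blocks, sequence = split_global_modifications(
    "<[Carbamidomethyl]@C><[Oxidation]@M>MTPEILTCNSIGCLK"
)
print(blocks)    # ['[Carbamidomethyl]@C', '[Oxidation]@M']
print(sequence)  # MTPEILTCNSIGCLK
```

Each block body can then be interpreted on its own, independently of the others.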
I would like to see a paragraph in the specification indicating how proteoform sequence truncations are to be specified. N-terminal truncations may be biological, as in the removal of the initial Met (perhaps with PTM), the cleavage of a signal peptide, or the action of a viral protease. The truncations may instead be related to sample treatment, such as a rare cutter like CNBr for middle-down proteomics, or due to a "hot" ion source. I believe ProForma should specify how a proteoform sequence compares to the sequence described by the accession, such as indicating the position of the first and last amino acids in the accession's sequence. Are amino acids preceding and succeeding the proteoform sequence expected to be included?