Minor inconsistencies in spec about proforma HOT 9 OPEN

bittremieux commented on July 2, 2024

Minor inconsistencies in spec

from proforma.

Comments (9)

javizca commented on July 2, 2024

In which position should labile modifications be specified? Section 4.3.2 does not explicitly mention this, although the examples all place the labile modification in the beginning. However, how does it relate to modifications with an unknown position (section 4.4.1) and global modifications (section 4.6)? Section 4.6 specifies that global modifications should be written before ambiguous modifications and N-terminal modifications, but the position of labile modifications is not mentioned.

A: I have added in the specification document (Section 4.3.2): "Labile modification MUST be located before the first amino acid sequence and before N-terminal modifications, if applicable".
In Section 4.6.2: "Fixed modifications MUST be written prior to ambiguous and labile modifications".


mobiusklein commented on July 2, 2024

Clearly I didn't submit my note about Neu5Ac last night. Neu5Ac is synonymous with NeuAc. Mincing the monosaccharide apart to determine where the acetyl group is attached is also impossible with the current dissociation methods available. You can find NeuAc with additional O-acetyl groups (though they are pretty fragile and are easily lost in sample processing), but GNOme doesn't index them.

The OBO and the generated JSON file list all the synonyms for each monosaccharide, though most monosaccharides aren't listed in the ProForma spec, and a very restricted subset are actually indexed in GNOme.

My parser isn't handling this properly either. I just wrote the common names from memory.


mobiusklein commented on July 2, 2024

RE Glycan formula parsing, I thought that spaces were required already. Otherwise, without constructing an unambiguous longest-to-shortest testing order, it wouldn't be possible to solve in the general case without extreme look-ahead. It's still doable with a fixed list of monosaccharides.
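The longest-to-shortest testing order mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the symbol list is a small hypothetical excerpt, and the names `MONOSACCHARIDES` and `tokenize_glycan` are invented for this example rather than taken from any existing parser.

```python
import re

# Hypothetical, deliberately small symbol list; a real parser would load the
# full monosaccharide list from the specification or the OBO file.
MONOSACCHARIDES = ["Hex", "HexNAc", "NeuAc", "Neu5Ac", "Fuc"]


def tokenize_glycan(composition):
    """Split a glycan composition such as 'HexNAc2Hex3' into (name, count) pairs."""
    # Try longer names first so that 'HexNAc' is never read as 'Hex' plus leftovers.
    names = sorted(MONOSACCHARIDES, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(map(re.escape, names)) + r")(\d*)\s*")
    pairs, pos = [], 0
    while pos < len(composition):
        match = pattern.match(composition, pos)
        if match is None:
            raise ValueError(f"unrecognized symbol at offset {pos}: {composition[pos:]!r}")
        # A missing count means a single unit.
        pairs.append((match.group(1), int(match.group(2) or 1)))
        pos = match.end()
    return pairs
```

With a fixed list like this, spaces between pairs are accepted but not needed. The approach stops being sufficient once a symbol can end in a digit that is also a plausible count, which is the kind of ambiguity that mandatory spaces would avoid.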

For multiple global modifications, they should be in separate angle brackets, following the example in 4.6.1?

Both Carbon 13 and Nitrogen 15: <13C><15N>ATPEILTVNSIGQLK

I think this fits similarly to how curly-brace syntax specifies one labile modification, though in that case it takes the place of the square braces. It would make the angle bracket section really laborious to parse if we had to overload "," to be a possible state transition.
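A minimal sketch of why one tag per pair of angle brackets is easy to parse: each tag is consumed with the same simple pattern, and no separator inside the brackets has to be interpreted. The function name `split_global_mods` is hypothetical, and the sketch assumes that tags never contain nested angle brackets.

```python
import re

# One global modification per pair of angle brackets (assumed not to nest).
_GLOBAL_TAG = re.compile(r"<([^<>]*)>")


def split_global_mods(proforma):
    """Return (list of leading global modification tags, remaining string)."""
    mods, pos = [], 0
    while True:
        match = _GLOBAL_TAG.match(proforma, pos)
        if match is None:
            break
        mods.append(match.group(1))
        pos = match.end()
    return mods, proforma[pos:]


mods, rest = split_global_mods("<13C><15N>ATPEILTVNSIGQLK")
```

Here `mods` holds the two isotope labels as separate entries and `rest` is the unmodified sequence.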


bittremieux commented on July 2, 2024

Glycans: No, the spaces are optional, with the possibility to make this mandatory mentioned in section 4.2.8:

If glycan symbols conflict with themselves or element symbols in such a way that ambiguities occur, we will consider requiring spaces between 'atoms' (see Formula Rule #1).

And formula rule 1 includes:

Pairs SHOULD be separated by spaces but are not required to be.

Maybe this should be revisited?

Global modifications: Ok, makes sense, thanks. I glossed too quickly over the example in 4.6.1.


bittremieux commented on July 2, 2024

Additionally, I have the following comments about the specification draft 13:

Minor comments:

The long example at the top of page 13 should use "//" instead of "\\" to represent the inter-chain crosslink.
Example (b) of branched peptides in section 4.2.4 page 13 uses non-existing modification MOD:000134. This should probably be MOD:00134 (one fewer 0).
Example {Glycan:Hex}{Glycan:NeuAc}EMEVNESPEK contains an invalid glycan. NeuAc should probably be Neu5Ac?
Example MPGLVDSNPAPPESQEKKPLK(PCCACPETKKARDACIIEKGEEHCGHLIEAHKECMRALGFKI)[disulfide][Oxidation][Oxidation] in section 4.5 on page 21 includes the non-existing modification disulfide (in UNIMOD or PSI-MOD).
In section 4.9, page 23, the reference to section 4.2.5 should become 4.2.6.
On page 32, the example [U:iTRAQ4plex]EM[U:Oxidation]EVNES[U:Phospho]PEK[U:iTRAQ4plex]-[U:Methyl]/3 should probably have the first iTRAQ4plex as an N-terminal modification? The "-" is missing in that case.
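A typo such as MOD:000134 can be caught mechanically. The sketch below assumes that PSI-MOD accessions consist of "MOD:" followed by exactly five digits, as in MOD:00134; the function name and the sample string are hypothetical, and the check says nothing about whether a well-formed accession actually exists in the ontology.

```python
import re

# Assumption: a well-formed PSI-MOD accession is "MOD:" plus exactly five digits.
_WELL_FORMED = re.compile(r"MOD:\d{5}")


def malformed_psimod_accessions(proforma):
    """Return every 'MOD:<digits>' token whose digit count is not five."""
    candidates = re.findall(r"MOD:\d+", proforma)
    return [acc for acc in candidates if not _WELL_FORMED.fullmatch(acc)]


# Hypothetical peptidoform string used only to exercise the check.
bad = malformed_psimod_accessions("PEPTK[MOD:000134]IDE")
```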

Suggestions / questions:

In which position should labile modifications be specified? Section 4.3.2 does not explicitly mention this, although the examples all place the labile modification in the beginning. However, how does it relate to modifications with an unknown position (section 4.4.1) and global modifications (section 4.6)? Section 4.6 specifies that global modifications should be written before ambiguous modifications and N-terminal modifications, but the position of labile modifications is not mentioned.
I don't fully understand section 4.7 on amino acid sequence ambiguity. What does it mean if a single or multiple amino acids are specified to be ambiguous? What is the position where this should be specified w.r.t. other tags that are included at the start of the string?
If a pipe character is used to list multiple options for a modification (section 4.9), can each option have an associated label, specified with #, or should there only be a single label after all options have been listed?


javizca commented on July 2, 2024

Thanks a lot Wout for all your minor corrections. I think all of them are correct apart from the NeuAc, which, as far as I can see, is a valid glycan? I also considered your previous comments on draft 12.


bittremieux commented on July 2, 2024

I think all of them are correct apart from the NeuAc, which, as far as I can see, is a valid glycan?

Right, this does seem to be a glycan (shows that I don't know much about it). It failed my validation though because apparently it's listed as a synonym of Neu5Ac in the monosaccharides OBO and I was only considering the default names.
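A validation step that also accepts synonyms could look roughly like this. The table is a hypothetical excerpt containing only the Neu5Ac/NeuAc pair discussed here plus Hex; a real implementation would build it from the monosaccharides OBO or the generated JSON file, whose exact layout is not assumed.

```python
# Hypothetical excerpt: default name -> list of synonyms.
SYNONYMS = {
    "Neu5Ac": ["NeuAc"],
    "Hex": [],
}


def build_lookup(synonyms):
    """Map every default name and every synonym to its default name."""
    lookup = {}
    for name, alternatives in synonyms.items():
        lookup[name] = name
        for alternative in alternatives:
            lookup[alternative] = name
    return lookup


LOOKUP = build_lookup(SYNONYMS)


def canonical_name(symbol):
    """Return the default name for a symbol, or None if it is unknown."""
    return LOOKUP.get(symbol)
```

Checking only the keys of `SYNONYMS` reproduces the failure described above, whereas `canonical_name("NeuAc")` resolves to the default name.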


bittremieux commented on July 2, 2024

All right, so the proper order is like this?

<GLOBAL_MOD>[UNKNOWN_POS]?{LABILE_MOD}[N_TERM]-PEPTIDE-[C_TERM]
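The order proposed in this comment can be written down as a regular expression. This is a simplified sketch, not the specification's grammar: it assumes tags do not nest, ignores counts, charge states and cross-links, and rejects labile tags inside the sequence. The group names, the function name and the first example string are invented for illustration.

```python
import re

# Components in the proposed order; every prefix component is optional.
_ORDER = re.compile(
    r"(?P<global_mods>(?:<[^<>]*>)*)"          # <GLOBAL_MOD>, possibly repeated
    r"(?P<unknown_pos>(?:\[[^\[\]]*\])+\?)?"   # [UNKNOWN_POS]?
    r"(?P<labile>(?:\{[^{}]*\})*)"             # {LABILE_MOD}, possibly repeated
    r"(?:(?P<n_term>\[[^\[\]]*\])-)?"          # [N_TERM]-
    r"(?P<peptide>[^<>{}?]*?)"                 # residues with their [tags]
    r"(?:-(?P<c_term>\[[^\[\]]*\]))?"          # -[C_TERM]
)


def split_in_proposed_order(proforma):
    """Return the named components, or None if the string breaks the order."""
    match = _ORDER.fullmatch(proforma)
    return match.groupdict() if match else None


parts = split_in_proposed_order("<13C>[Phospho]?{Glycan:Hex}[Acetyl]-PEPTIDE-[Amidated]")
```

A string that places a labile tag before a global tag, such as `{Glycan:Hex}<13C>PEPTIDE`, yields `None` under these assumptions.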


edeutsch commented on July 2, 2024

In which position should labile modifications be specified? Section 4.3.2 does not explicitly mention this, although the examples all place the labile modification in the beginning. However, how does it relate to modifications with an unknown position (section 4.4.1) and global modifications (section 4.6)? Section 4.6 specifies that global modifications should be written before ambiguous modifications and N-terminal modifications, but the position of labile modifications is not mentioned.

A: I have added in the specification document (Section 4.3.2): "Labile modification MUST be located before the first amino acid sequence and before N-terminal modifications, if applicable".
In Section 4.6.2: "Fixed modifications MUST be written prior to ambiguous and labile modifications".

It is my recollection that a {labile} modification can appear anywhere that a [non-labile modification] can appear. The only difference is that the writer is making the statement that there is not (or there is not expected to be) any evidence of the mod in a particular location because it is completely labile. So the peptidoform SMALLS{Sulfo}NACK simply means that the writer believes that the sulfo is on the second S, but there is no trace of that in the associated evidence because the mod is (or is expected to be) completely labile.

And thus it counts when computing the precursor m/z, but it can be ignored when computing abcxyz ions because it is labile.
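To make that distinction concrete, here is a rough sketch for SMALLS{Sulfo}NACK under this reading of labile modifications. The masses are standard monoisotopic values rounded to five decimals, Sulfo is taken as SO3, and the function names are invented for this example. Only b ions are shown; the other series would ignore the labile mass in the same way.

```python
# Monoisotopic masses in Da, rounded to five decimals.
RESIDUE = {
    "S": 87.03203, "M": 131.04049, "A": 71.03711, "L": 113.08406,
    "N": 114.04293, "C": 103.00919, "K": 128.09496,
}
WATER = 18.01056
PROTON = 1.00728
SULFO = 79.95682  # SO3, assumed here as the mass of the Sulfo modification


def precursor_mz(sequence, labile_masses, charge):
    """Precursor m/z: labile modifications contribute to the intact mass."""
    neutral = sum(RESIDUE[aa] for aa in sequence) + WATER + sum(labile_masses)
    return (neutral + charge * PROTON) / charge


def b_ions(sequence):
    """Singly charged b ions: labile modifications are ignored entirely."""
    total, ions = 0.0, []
    for aa in sequence[:-1]:
        total += RESIDUE[aa]
        ions.append(total + PROTON)
    return ions


with_sulfo = precursor_mz("SMALLSNACK", [SULFO], charge=2)
without_sulfo = precursor_mz("SMALLSNACK", [], charge=2)
fragments = b_ions("SMALLSNACK")
```

The two precursor values differ by the Sulfo mass divided by the charge, while `fragments` is the same list whether or not the labile modification is present.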

Therefore I don't think it is confined to a specific location. {} is equivalent to [] but with a "labile" meaning. Does anyone else remember that or am I confused?

