Comments (9)

javizca commented on July 2, 2024

In which position should labile modifications be specified? Section 4.3.2 does not explicitly mention this, although the examples all place the labile modification in the beginning. However, how does it relate to modifications with an unknown position (section 4.4.1) and global modifications (section 4.6)? Section 4.6 specifies that global modifications should be written before ambiguous modifications and N-terminal modifications, but the position of labile modifications is not mentioned.

A: I have added in the specification document (Section 4.3.2): "Labile modification MUST be located before the first amino acid sequence and before N-terminal modifications, if applicable".
In Section 4.6.2: "Fixed modifications MUST be written prior to ambiguous and labile modifications".

from proforma.

mobiusklein commented on July 2, 2024

Clearly I didn't submit my note about Neu5Ac last night. Neu5Ac is synonymous with NeuAc. Mincing the monosaccharide apart to determine where the acetyl group is attached is also impossible with the current dissociation methods available. You can find NeuAc with additional O-acetyl groups (though they are pretty fragile and are easily lost in sample processing), but GNOme doesn't index them.

The OBO and the generated JSON file list all the synonyms for each monosaccharide, though most monosaccharides aren't listed in the ProForma spec, and a very restricted subset are actually indexed in GNOme.

My parser isn't handling this properly either. I just wrote the common names from memory.

mobiusklein commented on July 2, 2024

RE Glycan formula parsing, I thought that spaces were required already. Otherwise, without constructing an unambiguous longest-to-shortest testing order, it wouldn't be possible to solve in the general case without extreme look-ahead. It's still doable with a fixed list of monosaccharides.
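The longest-to-shortest testing order mentioned above can be sketched as follows. This is a minimal illustration, not the spec's algorithm: the monosaccharide list is a short, hypothetical excerpt, and the function name `parse_glycan_composition` is made up for this example. It takes only the composition part of a glycan tag (the text after `Glycan:`).

```python
import re

# Hypothetical, deliberately short list of monosaccharide names; a real
# parser would load the full list it intends to support.
MONOSACCHARIDES = ["Hex", "HexNAc", "dHex", "NeuAc", "NeuGc", "Fuc", "Pen"]

# Python tries regex alternatives left to right, so longer names go first.
# Otherwise "HexNAc" would be read as "Hex" followed by unparseable text.
_NAMES = sorted(MONOSACCHARIDES, key=len, reverse=True)
_TOKEN = re.compile(r"\s*(" + "|".join(map(re.escape, _NAMES)) + r")(\d*)")


def parse_glycan_composition(text):
    """Return {name: count} for a composition such as 'HexNAc2Hex5'.

    Spaces between name/count pairs are accepted but not required.
    """
    text = text.strip()
    counts = {}
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse glycan composition at: {text[pos:]!r}")
        name, number = match.groups()
        # A missing count is treated as 1 (an assumption of this sketch).
        counts[name] = counts.get(name, 0) + (int(number) if number else 1)
        pos = match.end()
    return counts
```

With a fixed name list this works with or without spaces. It does not address the general case raised above, where names could collide with element symbols.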

For multiple global modifications, they should be in separate angle brackets, following the example in 4.6.1?

Both Carbon 13 and Nitrogen 15: <13C><15N>ATPEILTVNSIGQLK

I think this fits with how curly-brace syntax specifies one labile modification per pair of braces, though in that case the curly braces take the place of the square brackets. It would make the angle-bracket section really laborious to parse if we had to overload "," as a possible state transition.
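As a rough illustration of why separate angle brackets are easy to handle, here is a minimal sketch that peels leading `<...>` blocks off a string. The function name `split_global_mods` is hypothetical, and the sketch assumes no `<` or `>` appears inside a modification, which would not hold for names such as "Gln->pyro-Glu".

```python
import re

# One global modification block: "<", anything without angle brackets, ">".
_GLOBAL = re.compile(r"<([^<>]*)>")


def split_global_mods(proforma):
    """Return (list of global modification bodies, remaining string)."""
    mods = []
    pos = 0
    while True:
        match = _GLOBAL.match(proforma, pos)
        if match is None:
            break
        mods.append(match.group(1))
        pos = match.end()
    return mods, proforma[pos:]


mods, rest = split_global_mods("<13C><15N>ATPEILTVNSIGQLK")
print(mods, rest)  # ['13C', '15N'] ATPEILTVNSIGQLK
```

Each block is consumed by the same simple loop, so no extra parser state is needed for a separator inside the brackets.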

bittremieux commented on July 2, 2024
  • Glycans: No, the spaces are optional, with the possibility to make this mandatory mentioned in section 4.2.8:

"If glycan symbols conflict with themselves or element symbols in such a way that ambiguities occur, we will consider requiring spaces between 'atoms' (see Formula Rule #1)."

And formula rule 1 includes:

"Pairs SHOULD be separated by spaces but are not required to be."

Maybe this should be revisited?

  • Global modifications: Ok, makes sense, thanks. I glossed too quickly over the example in 4.6.1.

bittremieux commented on July 2, 2024

Additionally, I have the following comments about the specification draft 13:

Minor comments:

  • The long example at the top of page 13 should use "//" instead of "\\" to represent the inter-chain crosslink.
  • Example (b) of branched peptides in section 4.2.4 on page 13 uses the nonexistent modification MOD:000134. This should probably be MOD:00134 (one fewer 0).
  • Example {Glycan:Hex}{Glycan:NeuAc}EMEVNESPEK contains an invalid glycan. NeuAc should probably be Neu5Ac?
  • Example MPGLVDSNPAPPESQEKKPLK(PCCACPETKKARDACIIEKGEEHCGHLIEAHKECMRALGFKI)[disulfide][Oxidation][Oxidation] in section 4.5 on page 21 includes the modification disulfide, which does not exist in UNIMOD or PSI-MOD.
  • In section 4.9, page 23, the reference to section 4.2.5 should become 4.2.6.
  • On page 32, the example [U:iTRAQ4plex]EM[U:Oxidation]EVNES[U:Phospho]PEK[U:iTRAQ4plex]-[U:Methyl]/3 should probably have the first iTRAQ4plex as an N-terminal modification? The "-" is missing in that case.

Suggestions / questions:

  • In which position should labile modifications be specified? Section 4.3.2 does not explicitly mention this, although the examples all place the labile modification in the beginning. However, how does it relate to modifications with an unknown position (section 4.4.1) and global modifications (section 4.6)? Section 4.6 specifies that global modifications should be written before ambiguous modifications and N-terminal modifications, but the position of labile modifications is not mentioned.
  • I don't fully understand section 4.7 on amino acid sequence ambiguity. What does it mean if a single or multiple amino acids are specified to be ambiguous? What is the position where this should be specified w.r.t. other tags that are included at the start of the string?
  • If a pipe character is used to list multiple options for a modification (section 4.9), can each option have an associated label, specified with #, or should there only be a single label after all options have been listed?

javizca commented on July 2, 2024

Thanks a lot, Wout, for all your minor corrections. I think all of them are correct apart from the NeuAc, which, as far as I can see, is a valid glycan? I also considered your previous comments on draft 12.

bittremieux commented on July 2, 2024

"I think all of them are correct apart from the NeuAc, which, as far as I can see, is a valid glycan?"

Right, this does seem to be a glycan (shows that I don't know much about it). It failed my validation though because apparently it's listed as a synonym of Neu5Ac in the monosaccharides OBO and I was only considering the default names.
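The validation gap described here (checking only default names and missing synonyms) can be sketched as below. The `TERMS` dictionary is a hypothetical two-entry excerpt standing in for whatever is loaded from the monosaccharides OBO or its generated JSON; the real file layout is not shown in this thread, so the structure is an assumption.

```python
# Hypothetical excerpt: default name -> entry with its synonyms.
# A real validator would build this from the monosaccharide OBO file.
TERMS = {
    "Neu5Ac": {"synonyms": ["NeuAc"]},
    "Hex": {"synonyms": []},
}


def build_name_index(terms):
    """Map every default name and every synonym to its default name."""
    index = {}
    for default_name, entry in terms.items():
        index[default_name] = default_name
        for synonym in entry.get("synonyms", []):
            # Keep the first mapping if a synonym happens to be listed twice.
            index.setdefault(synonym, default_name)
    return index


def resolve(name, index):
    """Return the default name for a monosaccharide, or None if unknown."""
    return index.get(name)


index = build_name_index(TERMS)
print(resolve("NeuAc", index))   # Neu5Ac
print(resolve("Neu5Ac", index))  # Neu5Ac
```

Looking names up through such an index accepts both NeuAc and Neu5Ac while still rejecting names that appear nowhere in the term list.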

bittremieux commented on July 2, 2024

All right, so the proper order is like this?

<GLOBAL_MOD>[UNKNOWN_POS]?{LABILE_MOD}[N_TERM]-PEPTIDE-[C_TERM]
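To make the proposed order concrete, here is a minimal sketch of a regular expression that splits a string into those parts in exactly that sequence. It only illustrates the ordering asked about in this comment, not the final specification; the name `split_prefix` is hypothetical, and nested brackets, counts such as `^2`, and `>` inside modification names are not handled.

```python
import re

# Each part is optional; the order of the groups mirrors
# <GLOBAL_MOD>[UNKNOWN_POS]?{LABILE_MOD}[N_TERM]-PEPTIDE-[C_TERM]
_ORDER = re.compile(
    r"""
    (?P<global_mods>(?:<[^<>]*>)*)            # zero or more <...> blocks
    (?P<unknown_pos>(?:\[[^\[\]]*\])*\?)?     # [mod]...? block ending in "?"
    (?P<labile>(?:\{[^{}]*\})*)               # zero or more {...} blocks
    (?:(?P<n_term>\[[^\[\]]*\])-)?            # [mod]- at the N-terminus
    (?P<peptide>.*?)                          # sequence with inline mods
    (?:-(?P<c_term>\[[^\[\]]*\]))?            # -[mod] at the C-terminus
    $
    """,
    re.VERBOSE,
)


def split_prefix(proforma):
    """Return a dict of the named parts; parts that are absent are None or ''."""
    match = _ORDER.match(proforma)
    if match is None:
        raise ValueError(f"cannot split: {proforma!r}")
    return match.groupdict()


parts = split_prefix("<13C>[Phospho]?{Glycan:Hex}[iTRAQ4plex]-EMEVNESPEK-[Methyl]")
print(parts["labile"], parts["n_term"], parts["peptide"])
```

A string that places the labile block before the unknown-position block would not be split as intended by this pattern, which is the practical consequence of fixing one order.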

edeutsch commented on July 2, 2024

In which position should labile modifications be specified? Section 4.3.2 does not explicitly mention this, although the examples all place the labile modification in the beginning. However, how does it relate to modifications with an unknown position (section 4.4.1) and global modifications (section 4.6)? Section 4.6 specifies that global modifications should be written before ambiguous modifications and N-terminal modifications, but the position of labile modifications is not mentioned.

A: I have added in the specification document (Section 4.3.2): "Labile modification MUST be located before the first amino acid sequence and before N-terminal modifications, if applicable".
In Section 4.6.2: "Fixed modifications MUST be written prior to ambiguous and labile modifications".

It is my recollection that a {labile} modification can appear anywhere that a [non-labile modification] can appear. The only difference is that the writer is making the statement that there is not (or there is not expected to be) any evidence of the mod in a particular location because it is completely labile. So the peptidoform SMALLS{Sulfo}NACK simply means that the writer believes that the sulfo is on the second S, but there is no trace of that in the associated evidence because the mod is (or is expected to be) completely labile.

And thus it counts when computing the precursor m/z, but it can be ignored when computing a/b/c/x/y/z fragment ions because it is labile.
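Under the reading in this comment (which the rest of the thread does not settle), the mass bookkeeping could look like the sketch below. The tiny parser and the function names are hypothetical and only handle a single-letter residue followed by `[Name]` or `{Name}`; the mass tables cover just the residues and the one modification used in the SMALLS{Sulfo}NACK example.

```python
import re

# Monoisotopic residue masses (Da) for the residues in the example.
RESIDUE = {"S": 87.03203, "M": 131.04049, "A": 71.03711, "L": 113.08406,
           "N": 114.04293, "C": 103.00919, "K": 128.09496}
MOD = {"Sulfo": 79.95682}  # monoisotopic delta mass of SO3 (Da)
WATER = 18.01056
PROTON = 1.00728

_TOKEN = re.compile(r"([A-Z])((?:\[[^\]]*\]|\{[^}]*\})*)")
_MOD = re.compile(r"\[([^\]]*)\]|\{([^}]*)\}")


def parse(seq):
    """Return a list of (residue, non_labile_mod_mass, labile_mod_mass)."""
    parsed = []
    for residue, mods in _TOKEN.findall(seq):
        fixed = labile = 0.0
        for square, curly in _MOD.findall(mods):
            if square:
                fixed += MOD[square]
            elif curly:
                labile += MOD[curly]
        parsed.append((residue, fixed, labile))
    return parsed


def precursor_mass(seq):
    """Neutral monoisotopic mass; labile modifications are included."""
    return WATER + sum(RESIDUE[r] + f + l for r, f, l in parse(seq))


def b_ions(seq):
    """Singly charged b-ion m/z values; labile modifications are ignored."""
    values, running = [], 0.0
    for residue, fixed, _labile in parse(seq)[:-1]:
        running += RESIDUE[residue] + fixed
        values.append(running + PROTON)
    return values
```

With these definitions, `precursor_mass("SMALLS{Sulfo}NACK")` exceeds `precursor_mass("SMALLSNACK")` by the Sulfo delta, while `b_ions` returns the same values for both strings. Writing the modification as `[Sulfo]` instead shifts the b ions from the modified residue onward.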

Therefore I don't think it is confined to a specific location. {} is equivalent to [] but with a "labile" meaning. Does anyone else remember that or am I confused?
