Explicit support for global terminal modifications about proforma HOT 6 OPEN

RalfG commented on August 22, 2024

Explicit support for global terminal modifications

from proforma.

Comments (6)

edeutsch commented on August 22, 2024

This is currently legal:
[TMT6plex]-ATPEILTCNSIGCLK[TMT6plex]
<[TMT6plex]@k>[TMT6plex]-ATPEILTCNSIGCLK

Options to extend:
<[TMT6plex]@k,N-term>ATPEILTCNSIGCLK (use 'N-term' and 'C-term')
<[TMT6plex]@k,n>ATPEILTCNSIGCLK (use lower case n and c)
<[TMT6plex]@k,^>ATPEILTCNSIGCLK (use ^ for N-term and $ for C-term)

ProForma currently allows amino acids to be lowercase, so the second option is not a good idea.
It seems the preferred format would be:
<[TMT6plex]@k,N-term>ATPEILTCNSIGCLK

If we wanted to support N-term amino acids:
<[TMT6plex]@k,N-term,N-term A>ATPEILTCASIGCLK
<[TMT6plex]@k,N-term,N-term(A)>ATPEILTCASIGCLK

<[TMT6plex]@k,N-term,N-term:AS>ATPEILTCASIGCLK
<[TMT6plex]@k,N-term,N-term(AS)>ATPEILTCASIGCLK

After discussion, we ended up with:
<[TMT6plex]@k,N-term>ATPEILTCASIGCLK
<[TMT6plex]@k,N-term:A,N-term:S>ATPEILD[U:Cation:Fe[III]]CASIGCLK
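
A minimal Python sketch of parsing the unpacked syntax agreed above, assuming a single leading <...> group whose targets are either a plain residue, a bare N-term/C-term, or N-term:X/C-term:X with exactly one residue. The names Target, parse_target and parse_global_mod are hypothetical, and upper-casing residues (so @k is read as @K) is an assumption based on ProForma accepting lowercase amino acids.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TERMINALS = ("N-term", "C-term")


@dataclass(frozen=True)
class Target:
    amino_acid: Optional[str]  # single residue, or None for "any residue"
    terminal: Optional[str]    # "N-term", "C-term", or None for "anywhere"


def _closing_bracket(text: str, start: int) -> int:
    """Index of the ']' closing the '[' at text[start]; handles nesting like Fe[III]."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "[":
            depth += 1
        elif text[i] == "]":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced brackets in modification")


def parse_target(token: str) -> Target:
    """Parse one comma-separated target in the unpacked syntax."""
    if token in TERMINALS:
        return Target(None, token)
    for terminal in TERMINALS:
        if token.startswith(terminal + ":"):
            residue = token[len(terminal) + 1:]
            # Unpacked rule: exactly one residue per N-term:/C-term: target.
            if len(residue) != 1 or not residue.isalpha():
                raise ValueError(f"expected one residue in {token!r}")
            return Target(residue.upper(), terminal)
    if len(token) == 1 and token.isalpha():
        return Target(token.upper(), None)  # assumption: @k means @K
    raise ValueError(f"unrecognised target {token!r}")


def parse_global_mod(text: str) -> Tuple[str, List[Target], str]:
    """Split '<[Mod]@t1,t2,...>REST' into (Mod, targets, REST)."""
    if not text.startswith("<["):
        raise ValueError("expected '<[' at the start")
    close = _closing_bracket(text, 1)
    if not text.startswith("@", close + 1):
        raise ValueError("expected '@' after the modification")
    end = text.index(">", close)
    targets = [parse_target(t) for t in text[close + 2:end].split(",")]
    return text[2:close], targets, text[end + 1:]
```

For example, parse_global_mod("<[TMT6plex]@k,N-term>ATPEILTCASIGCLK") returns "TMT6plex", a K target anywhere plus an N-terminal target on any residue, and the remaining sequence, while a packed target such as N-term:AS raises ValueError.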

Discuss again with other ProForma 2.0 stakeholders

Other potential things to change:

Clearly specify the order of <>{}[] at the front
This is not clearly defined in the text of the spec. Update to specify more clearly.

douweschulte commented on August 22, 2024

As a follow-up thought after the meeting: I would argue for @N-term:ABC as a valid representation of a modification on the N-terminus of alanine, B (the ambiguous aspartic acid/asparagine code), or cysteine. The idea in the meeting itself was to disallow this form and instead use @N-term:A,N-term:B,N-term:C, which has a slightly simpler grammar.

My argument for allowing the first form is that it is easier to type. It makes the grammar slightly more complex, but since no separator character is used, the rule is simply that every character following the colon (alphabetic characters only) is a residue at which this modification can be placed. For the parser this is not much more complex: it already has to check whether the character following the initial amino acid is a comma, and with this addition it just keeps consuming input until the next comma.

As for the intermediate representation used by any program that handles ProForma notation, I would argue there is no difference between the two syntaxes. Any program able to handle the N-term:A,N-term:B notation can therefore handle the N-term:AB notation without any changes to its code (except, of course, the parser).
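
To illustrate the packed alternative, here is a hypothetical Python sketch (not Douwe's actual parser) in which every letter after N-term: or C-term:, up to the next comma, becomes its own target. Targets are modelled as plain (amino_acid, terminal) tuples, an assumption made for brevity. Both spellings produce the same list, which is the intermediate-representation point made above.

```python
from typing import List, Optional, Tuple

# (amino_acid, terminal); None means unrestricted for that field.
Target = Tuple[Optional[str], Optional[str]]
TERMINALS = ("N-term", "C-term")


def parse_targets_packed(spec: str) -> List[Target]:
    """Parse a target list, also accepting the packed 'N-term:ABC' form."""
    targets: List[Target] = []
    for token in spec.split(","):
        if token in TERMINALS:
            targets.append((None, token))
            continue
        terminal, sep, residues = token.partition(":")
        if sep:
            if terminal not in TERMINALS or not residues.isalpha():
                raise ValueError(f"invalid terminal target {token!r}")
            # Keep consuming letters until the comma: one target per residue.
            targets.extend((aa, terminal) for aa in residues)
        elif len(token) == 1 and token.isalpha():
            targets.append((token, None))
        else:
            raise ValueError(f"invalid target {token!r}")
    return targets
```

With this sketch, parse_targets_packed("K,N-term:ABC") and parse_targets_packed("K,N-term:A,N-term:B,N-term:C") return the same four targets.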

That said, I am quite interested to hear how feasible this is for other people writing ProForma parsers. My view mostly reflects how my own parser is written, and it might be harder if you are using other libraries or parser generators.

mobiusklein commented on August 22, 2024

My argument against the N-term:ABC notation, or "packing" for ease of reference, is twofold: it introduces an extra layer of complexity, and it introduces a second way of specifying a list of amino acid targets. The first point is colored by my own implementation choices, but suppose we have the following abstract types:

class ModificationRule {
  modification: Modification
  targets: List<ModificationTarget>
}

class ModificationTarget {
  amino_acid: String | null
  terminal: String | null
}

This fully covers the existing usage, where each amino acid is a separate ModificationTarget. If we allow packing, we either need to allow a ModificationTarget to cover multiple amino acids, or we need to add an extra step after parsing that splits those overloaded targets into separate entries. If we allow variadic ModificationTargets, we break the implicit contract that a target refers to a single amino acid. If we introduce an intermediate splitting step, we break the 1:1 correspondence between syntax and representation, and unless you implement rule merging, N-term:ABC may then be rendered as N-term:A,N-term:B,N-term:C. ProForma explicitly doesn't advocate standard canonicalization rules, but round-tripping is nice to have.
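
The abstract types above can be written as Python dataclasses to make the round-tripping concern concrete. In this sketch, Modification is simplified to a plain string, and split_packed and render are hypothetical helpers: once a packed N-term:ABC has been split into three targets, rendering without rule merging produces the unpacked form, and a variadic target is rejected by the one-residue check.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ModificationTarget:
    amino_acid: Optional[str]  # exactly one residue, or None
    terminal: Optional[str]    # "N-term", "C-term", or None

    def __post_init__(self) -> None:
        if self.amino_acid is None and self.terminal is None:
            raise ValueError("a target needs a residue, a terminus, or both")
        if self.amino_acid is not None and len(self.amino_acid) != 1:
            raise ValueError("implicit contract: one target, one amino acid")


@dataclass
class ModificationRule:
    modification: str  # stand-in for the abstract Modification type
    targets: List[ModificationTarget] = field(default_factory=list)


def split_packed(terminal: str, residues: str) -> List[ModificationTarget]:
    """Post-parse step: one packed 'N-term:ABC' becomes one target per residue."""
    return [ModificationTarget(aa, terminal) for aa in residues]


def render(rule: ModificationRule) -> str:
    """Serialize one syntax element per target; no rule merging is attempted."""
    parts = []
    for t in rule.targets:
        if t.terminal is not None and t.amino_acid is not None:
            parts.append(f"{t.terminal}:{t.amino_acid}")
        else:
            parts.append(t.terminal if t.terminal is not None else t.amino_acid)
    return f"<[{rule.modification}]@{','.join(parts)}>"
```

Here ModificationRule("TMT6plex", [ModificationTarget("K", None)] + split_packed("N-term", "ABC")) renders as <[TMT6plex]@K,N-term:A,N-term:B,N-term:C>, not as the packed input.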

The second is a syntax-to-semantics concern. Suppose I write N-term:ABC and then say, "Ah, but I also need this rule to target Z, X and Q anywhere, not just at the N-terminus." Under the proposal I should then write Z,X,Q,N-term:ABC. But I just packed ABC together, so why can't I write ZXQ,N-term:ABC? Or I may write Z,X,Q,N-term:A,B,C because I think I have a list of targets, in which case B and C silently become targets anywhere in the peptide rather than N-terminal ones.
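
A small hypothetical check of this concern, assuming a packed grammar in which letters may be packed only after an N-term:/C-term: prefix. The function parse_packed is illustrative only. The two strings a user might reasonably write describe different target sets, and ZXQ is rejected even though N-term:ABC is accepted.

```python
from typing import Optional, Set, Tuple

Target = Tuple[Optional[str], Optional[str]]  # (amino_acid, terminal)
TERMINALS = {"N-term", "C-term"}


def parse_packed(spec: str) -> Set[Target]:
    """Hypothetical packed grammar: packing is allowed only after a terminal prefix."""
    targets: Set[Target] = set()
    for token in spec.split(","):
        if token in TERMINALS:
            targets.add((None, token))  # whole terminus, any residue
            continue
        terminal, sep, residues = token.partition(":")
        if sep:
            if terminal not in TERMINALS or not residues.isalpha():
                raise ValueError(f"invalid terminal target {token!r}")
            targets.update((aa, terminal) for aa in residues)
        elif len(token) == 1 and token.isalpha():
            targets.add((token, None))  # residue anywhere in the peptide
        else:
            raise ValueError(f"packing is not allowed here: {token!r}")
    return targets


intended = parse_packed("Z,X,Q,N-term:ABC")
misread = parse_packed("Z,X,Q,N-term:A,B,C")
# B and C are N-terminal in the first string but unrestricted in the second.
print(sorted(intended - misread), sorted(misread - intended, key=str))
```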

Neither problem is intractable, and others may implement things in such a way that this is not an issue.

douweschulte commented on August 22, 2024

I already do the grouping internally, so on the parser side there is no problem for me. But I fully agree with your second argument about semantics, so that leaves me in favour of the unpacked syntax.

edeutsch commented on August 22, 2024

Original intent:
AC[Carbamidomethyl]AHC[Carbamidomethyl]HAC[Carbamidomethyl]FC[Carbamidomethyl]AC[Carbamidomethyl]
<[Carbamidomethyl]@C>ACAHCHACFCAC

<[Carbamidomethyl]@C>AAHHAFA
(should this be legal? It is according to the current spec, but does it violate the spirit of what ProForma was trying to do?)

Do we want to amend the specification to ProForma 2.1 to clarify these things?
Or should we have an addendum document that clarifies the things in ProForma 2.0 that were not clearly specified?

Douwe's code can read a ProForma string with all the fixed modifications given as a prefix and normalize it to what is actually in the peptide.
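
As a rough illustration of this kind of normalization (a hypothetical sketch, not Douwe's implementation), the following expands leading global fixed modifications such as <[Carbamidomethyl]@C> into explicit per-residue annotations. It assumes only single-residue targets (no N-term/C-term), skips letters inside [...] and {...}, and simply appends the annotation after every matching residue. Under this reading, <[Carbamidomethyl]@C>AAHHAFA expands to AAHHAFA unchanged.

```python
import re
from typing import Dict, List

# One leading global fixed modification, e.g. <[Carbamidomethyl]@C> or <[TMT6plex]@k>.
# Assumption: the name has no ']' and only single-residue targets are listed.
_GLOBAL = re.compile(r"<\[([^\]]+)\]@([A-Za-z](?:,[A-Za-z])*)>")


def expand_fixed_mods(proforma: str) -> str:
    """Hypothetical normalization: rewrite <[Mod]@X> prefixes as explicit X[Mod]."""
    fixed: Dict[str, str] = {}
    match = _GLOBAL.match(proforma)
    while match:
        for aa in match.group(2).split(","):
            fixed[aa.upper()] = match.group(1)
        proforma = proforma[match.end():]
        match = _GLOBAL.match(proforma)

    out: List[str] = []
    depth = 0  # > 0 while inside [...] or {...}, whose letters are not residues
    for ch in proforma:
        out.append(ch)
        if ch in "[{":
            depth += 1
        elif ch in "]}":
            depth -= 1
        elif depth == 0 and ch.upper() in fixed:
            out.append(f"[{fixed[ch.upper()]}]")
    return "".join(out)
```

With this sketch, <[Carbamidomethyl]@C>ACAHCHACFCAC expands to the explicit string given as the original intent, and <[TMT6plex]@k>[TMT6plex]-ATPEILTCNSIGCLK expands to [TMT6plex]-ATPEILTCNSIGCLK[TMT6plex], matching the two equivalent forms at the top of the thread.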

If we added N-term support, it would be a breaking change and would therefore be ProForma 2.1.

TODO: Start a Google doc in which we start documenting and resolving these various open issues, including #8 and #9
TODO: Juan will put ProForma 2.0 into an editable Google doc
TODO: Douwe will create a Google doc that is an addendum/clarification of 2.0

bittremieux commented on August 22, 2024

<[Carbamidomethyl]@C>AAHHAFA
(should this be legal? It is according to the current spec, but does it violate the spirit of what ProForma was trying to do?)

This is fine imo.
