Comments (6)

edeutsch commented on August 22, 2024

This is currently legal:
[TMT6plex]-ATPEILTCNSIGCLK[TMT6plex]
<[TMT6plex]@k>[TMT6plex]-ATPEILTCNSIGCLK

Options to extend:
<[TMT6plex]@k,N-term>ATPEILTCNSIGCLK (use 'N-term' and 'C-term')
<[TMT6plex]@k,n>ATPEILTCNSIGCLK (use lower case n and c)
<[TMT6plex]@k,^>ATPEILTCNSIGCLK (use ^ for N-term and $ for C-term)

ProForma currently allows amino acids to be lower case, so the second option is not a good idea.
Seems like the preferred format would be:
<[TMT6plex]@k,N-term>ATPEILTCNSIGCLK

If we wanted to support N-term amino acids:
<[TMT6plex]@k,N-term,N-term A>ATPEILTCASIGCLK
<[TMT6plex]@k,N-term,N-term(A)>ATPEILTCASIGCLK

<[TMT6plex]@k,N-term,N-term:AS>ATPEILTCASIGCLK
<[TMT6plex]@k,N-term,N-term(AS)>ATPEILTCASIGCLK

After discussion, we ended up with:
<[TMT6plex]@k,N-term>ATPEILTCASIGCLK
<[TMT6plex]@k,N-term:A,N-term:S>ATPEILD[U:Cation:Fe[III]]CASIGCLK

Discuss again with other ProForma 2.0 stakeholders

Other potential things to change:

  • Clearly specify the order of <>{}[] at the front
  • This is not clearly defined in the text of the spec. Update to specify more clearly.

from proforma.

douweschulte commented on August 22, 2024

As a follow-up thought after the meeting: I would argue for @N-term:ABC as a valid representation of a modification on the N-terminus of alanine, the ambiguous residue B (aspartic acid or asparagine), or cysteine. The idea in the meeting itself was to disallow this form and instead use @N-term:A,N-term:B,N-term:C, which is a slightly simpler grammar.

My argument for allowing the first form is that it is easier to type out. It makes the grammar slightly more complex, but because no dividing character is used, the rule is simply that anything (alphabetic characters only) following the colon is a location where the modification can be placed. The parser logic is not much more complex either: it already has to check whether the character following the initial amino acid is a comma, and with this addition it just has to keep taking input until the next comma.

As for the complexity of the intermediate representation used by any program that handles ProForma notation, I would argue there is no difference between the two syntaxes. That means any program able to handle the N-term:A,N-term:B notation can handle the N-term:AB notation without any changes to its code (except for the parser, of course).
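To make the comparison concrete, here is a minimal Python sketch of a location-list parser that accepts the unpacked form and, optionally, the packed form. The function name parse_targets, the allow_packed flag, and the tuple representation are all hypothetical and are not taken from any existing ProForma parser; the point is only that both spellings can yield the same intermediate representation.

```python
from typing import List, Optional, Tuple

# Hypothetical intermediate representation: (terminal or None, amino acid or None).
Target = Tuple[Optional[str], Optional[str]]

TERMINALS = ("N-term", "C-term")


def parse_targets(spec: str, allow_packed: bool = False) -> List[Target]:
    """Parse the comma-separated location list that follows '@'."""
    targets: List[Target] = []
    for item in spec.split(","):
        item = item.strip()
        if item in TERMINALS:
            # A bare terminal such as 'N-term' targets any residue there.
            targets.append((item, None))
            continue
        terminal, sep, residues = item.partition(":")
        if not sep:
            # No colon: the item is a plain residue such as 'K'.
            terminal, residues = None, item
        elif terminal not in TERMINALS:
            raise ValueError(f"unknown terminal in {item!r}")
        if not residues.isalpha():
            raise ValueError(f"expected amino acid letters in {item!r}")
        if len(residues) > 1 and not (allow_packed and terminal is not None):
            raise ValueError(f"packed residues are not allowed in {item!r}")
        # Packed input is expanded to one target per residue.
        targets.extend((terminal, aa) for aa in residues)
    return targets


packed = parse_targets("N-term:ABC", allow_packed=True)
unpacked = parse_targets("N-term:A,N-term:B,N-term:C")
print(packed == unpacked)
```

Under these assumptions only the parser differs between the two syntaxes; everything downstream sees the same list of targets.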

But I am quite interested to hear about the feasibility from the other people writing ProForma parsers. This mostly reflects how my parser is written and it might be harder if you are using other libraries or parser generators.

mobiusklein commented on August 22, 2024

My argument against the N-term:ABC notation, or packing for ease of reference, is that it introduces an extra layer of complexity and it introduces a second way of specifying a list of amino acid targets. The first is colored by my own implementation choices, but suppose we have the following abstract types:

class ModificationRule {
  modification: Modification
  targets: List<ModificationTarget>
}

class ModificationTarget {
  amino_acid: String | null
  terminal: String | null
}

This fully covers the first existing usage, where each amino acid is a separate ModificationTarget. If we allow packing we now need to allow a ModificationTarget to cover multiple amino acids, or we need to add an extra step after parsing where we split those overloaded targets into separate entries. If we allow variadic ModificationTargets, then we break an implicit contract that a target is about a single amino acid. If we do introduce an intermediate splitting step, we break the 1:1 assumption between syntax and representation, and unless you implement rule merging, N-term:ABC may then be rendered N-term:A,N-term:B,N-term:C. ProForma explicitly doesn't advocate standard canonicalization rules, but round-tripping is nice to have.
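The round-tripping point can be sketched in Python as follows. The dataclass mirrors the abstract ModificationTarget type above; the render and render_merged functions are hypothetical helpers, not part of any existing library, and they assume every target has at least one of its two fields set.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional


@dataclass(frozen=True)
class ModificationTarget:
    amino_acid: Optional[str]
    terminal: Optional[str]


def render(targets: List[ModificationTarget]) -> str:
    """Render one item per target (the unpacked syntax)."""
    parts = []
    for t in targets:
        if t.terminal and t.amino_acid:
            parts.append(f"{t.terminal}:{t.amino_acid}")
        else:
            parts.append(t.terminal or t.amino_acid)
    return ",".join(parts)


def render_merged(targets: List[ModificationTarget]) -> str:
    """Rule merging: pack consecutive targets that share a terminal."""
    parts = []
    for terminal, group in groupby(targets, key=lambda t: t.terminal):
        members = list(group)
        if terminal is None or any(t.amino_acid is None for t in members):
            # Plain residues and bare terminals are never packed.
            parts.extend(render([t]) for t in members)
        else:
            parts.append(f"{terminal}:" + "".join(t.amino_acid for t in members))
    return ",".join(parts)


# What a splitting step would produce after reading 'N-term:ABC'.
split = [ModificationTarget(aa, "N-term") for aa in "ABC"]
print(render(split))         # N-term:A,N-term:B,N-term:C
print(render_merged(split))  # N-term:ABC
```

Note that the merged renderer cannot tell which spelling the author originally typed, so even with merging the output is a normalization rather than a faithful round trip.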

The second concern is a syntax to semantics concern. Suppose I write N-term:ABC, and then say "Ah but I also need this rule to target Z, X and Q not on the N-terminal". The spec says I should then write Z,X,Q,N-term:ABC, but I just packed ABC together, so why can't I write ZXQ,N-term:ABC, or I may write Z,X,Q,N-term:A,B,C because I think I have a list of targets.
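As an illustration of this concern, the sketch below implements a strict, unpacked-only grammar with a regular expression and shows how it treats the two tempting spellings. The ITEM pattern and the parse_unpacked function are hypothetical names chosen for this example, not an excerpt from the specification.

```python
import re
from typing import List, Optional, Tuple

# One item is 'N-term', 'C-term', a terminal plus ':X', or a single residue letter.
ITEM = re.compile(r"(?:(N-term|C-term)(?::([A-Za-z]))?|([A-Za-z]))")


def parse_unpacked(spec: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Parse a location list where every item names at most one residue."""
    targets = []
    for item in spec.split(","):
        match = ITEM.fullmatch(item)
        if match is None:
            raise ValueError(f"invalid target {item!r}")
        terminal, terminal_aa, bare_aa = match.groups()
        targets.append((terminal, terminal_aa or bare_aa))
    return targets


# Parses without error, but B and C are read as residues anywhere in the
# sequence, not as N-terminal residues.
print(parse_unpacked("Z,X,Q,N-term:A,B,C"))

# The packed spelling is rejected outright.
try:
    parse_unpacked("ZXQ,N-term:ABC")
except ValueError as error:
    print(error)
```

The first case is the more worrying one: the string is accepted, yet it means something different from what the author intended.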

Neither problem is intractable, and others may implement things in such a way that this is not an issue.

douweschulte commented on August 22, 2024

I do the grouping internally already, so for me on the parser side there is no problem. But your second argument on semantics I fully agree with. So that leaves me in favour of the unpacked syntax.

edeutsch commented on August 22, 2024

Original intent:
AC[Carbamidomethyl]AHC[Carbamidomethyl]HAC[Carbamidomethyl]FC[Carbamidomethyl]AC[Carbamidomethyl]
<[Carbamidomethyl]@C>ACAHCHACFCAC

<[Carbamidomethyl]@C>AAHHAFA
(should this be legal? It is according to the current spec, but does it violate the spirit of what ProForma was trying to do?)
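The equivalence between the two "original intent" forms, and the case where the target residue never occurs, can be sketched in Python as below. This is a deliberately narrow illustration: the GLOBAL_MOD pattern and the expand_fixed_modification function are hypothetical, they handle a single global modification with single-letter residue targets, and they assume the sequence itself contains no other modifications or nested brackets.

```python
import re

# Matches '<[Name]@X,Y,...>SEQUENCE' for a single global modification.
GLOBAL_MOD = re.compile(r"^<\[([^\]]+)\]@([A-Za-z](?:,[A-Za-z])*)>(.*)$")


def expand_fixed_modification(proforma: str) -> str:
    """Rewrite a global fixed modification as explicit per-residue tags."""
    match = GLOBAL_MOD.match(proforma)
    if match is None:
        # No global modification prefix: return the input unchanged.
        return proforma
    name, targets, sequence = match.groups()
    target_set = set(targets.split(","))
    return "".join(
        f"{aa}[{name}]" if aa in target_set else aa for aa in sequence
    )


print(expand_fixed_modification("<[Carbamidomethyl]@C>ACAHCHACFCAC"))
print(expand_fixed_modification("<[Carbamidomethyl]@C>AAHHAFA"))
```

With no C in the sequence, the second call simply returns AAHHAFA, which is one way to read the rule as harmless when it matches nothing.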

Do we want to amend the specification to ProForma 2.1 to clarify these things?
Or should we have an addendum document that clarifies things in ProForma 2.0 that were not clearly specified?

Douwe's code can read a ProForma string that has all the fixed modifications prefixed and normalize it to what is actually in the peptide.

If we added the N-term support, it would be a breaking change, and would be ProForma 2.1

TODO: Start a Google doc in which we start documenting and resolving these various open issues, including #8 and #9
TODO: Juan will put ProForma 2.0 into an editable Google doc
TODO: Douwe will create a Google doc that is an addendum/clarification of 2.0

bittremieux commented on August 22, 2024

<[Carbamidomethyl]@C>AAHHAFA
(should this be legal? It is according to the current spec, but does it violate the spirit of what ProForma was trying to do?)

This is fine imo.
