mzml's People

Contributors

bittremieux, dependabot[bot], edeutsch, germa, hechth, mailaender, mobiusklein, samsonjm, ypriverol

mzml's Issues

`xs:ID` type of `id` attribute of `RunType` makes many mzml files starting with digits as file or sample name invalid.

We're currently implementing a validator tool in Galaxy that takes an mzML file and validates it against the XSD schema using pyxml or xmllinter. We found that the <run> ... </run> element has an id attribute of type xs:ID, which means its value can't start with a digit. However, ProteoWizard seems to fill this attribute with the sample name, which can start with a digit (and often does: position, order in the study, timestamp, etc.). As a result, many mzML files whose sample names start with a digit are technically invalid.

I personally don't see a reason why the id of the RunType can't be an xs:string. Was there a specific reason for the decision?

I'd therefore like to propose to change the id attribute of the RunType from xs:ID to xs:string.

If you agree with the change I can open up a PR with the requested changes to the XSD file. Is there anything else that has to be adapted to make this change?
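
To illustrate the constraint: xs:ID values must be NCNames, which cannot begin with a digit or contain a colon. Below is a minimal sketch of such a check, assuming an ASCII-only simplification of the NCName grammar (the real grammar also permits many non-ASCII characters). The function names `is_ascii_ncname` and `make_id_safe` are hypothetical and not part of any mzML tool or of ProteoWizard.

```python
import re

# ASCII-only approximation of the XML NCName production (an assumption):
# the first character is a letter or underscore; later characters may
# also be digits, '.', or '-'. Colons are never allowed.
_ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(value: str) -> bool:
    """Return True if value fits the simplified (ASCII-only) NCName pattern."""
    return _ASCII_NCNAME.fullmatch(value) is not None


def make_id_safe(value: str) -> str:
    """Hypothetical workaround: turn an arbitrary sample name into an ID-safe string."""
    # Replace every character outside the simplified NCName alphabet.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", value)
    # Prefix an underscore if the result is empty or starts with a digit, '.', or '-'.
    if not cleaned or not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


if __name__ == "__main__":
    for name in ["sample_01", "01_sample", "2021-05-03_run"]:
        print(name, is_ascii_ncname(name), make_id_safe(name))
```

A writer could apply something like `make_id_safe` before emitting the id, but that changes the identifier, which is why relaxing the schema type is the alternative proposed here.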

Mapping for the ion mobility terms

ProteoWizard has for years been writing the individual scan ion mobility CV terms in the wrong place (according to the mapping file), but I think it's the CV terms that should be changed. The CV has them as "ion selection attributes", which the mapping file only allows as a child of "selectedIon". MS1 spectra won't have a "selectedIon" (or a precursor), but MS1s are frequently still separated by ion mobility. ProteoWizard has been writing these terms in the <scan> element. Can we update these terms to be scan attributes, or perhaps both (for backward compatibility)? Or at least put them somewhere that's not limited to MSn scans? I think my last message about this was sent to the vocab mailing list back in 2016, so obviously it isn't urgent, but it also shouldn't be hard to fix if a fix is warranted.

We should also consider putting "collisional cross section" in an attribute type that can go in the mzML somewhere. I'm not sure exactly where. It's a molecular property, but it's also possible to convert an instrument's mobility value to CCS if it's calibrated and you know the charge and gas and such. So theoretically a CCS could go anywhere a raw ion mobility value could go?

These terms would be affected:

[Term]
id: MS:1002815
name: inverse reduced ion mobility
def: "Ion mobility measurement for an ion or spectrum of ions as measured in an ion mobility mass spectrometer. This might refer to the central value of a bin into which all ions within a narrow range of mobilities have been aggregated." [PSI:MS]
is_a: MS:1000455 ! ion selection attribute
is_a: MS:1002892 ! ion mobility attribute
is_a: MS:1003254 ! peak attribute
relationship: has_units MS:1002814 ! volt-second per square centimeter
relationship: has_value_type xsd:float ! The allowed value-type for this CV term

[Term]
id: MS:1003371
name: SelexION compensation voltage
def: "The voltage applied in the SelexION device to allow certain ions to transmit through to the mass spectrometer." [PSI:MS]
is_a: MS:1002892 ! ion mobility attribute
is_a: MS:1000455 ! ion selection attribute
is_a: MS:1003254 ! peak attribute
relationship: has_units UO:0000218 ! volt
relationship: has_value_type xsd:float ! The allowed value-type for this CV term

[Term]
id: MS:1003394
name: SelexION separation voltage
def: "RF voltage applied in the SelexION device to separate ions in trajectory based on the difference in their mobility between the high field and low field portions of the applied RF." [PSI:MS]
is_a: MS:1002892 ! ion mobility attribute
is_a: MS:1000455 ! ion selection attribute
is_a: MS:1003254 ! peak attribute
relationship: has_units UO:0000218 ! volt
relationship: has_value_type xsd:float ! The allowed value-type for this CV term

[Term]
id: MS:1001581
name: FAIMS compensation voltage
def: "The DC potential applied to the asymmetric waveform in FAIMS that compensates for the difference between high and low field mobility of an ion." [PSI:MS]
synonym: "FAIMS CV" EXACT []
is_a: MS:1002892 ! ion mobility attribute
is_a: MS:1000455 ! ion selection attribute
is_a: MS:1003254 ! peak attribute
relationship: has_units UO:0000218 ! volt
relationship: has_value_type xsd:double ! The allowed value-type for this CV term

[Term]
id: MS:1002476
name: ion mobility drift time
def: "Drift time of an ion or spectrum of ions as measured in an ion mobility mass spectrometer. This time might refer to the central value of a bin into which all ions within a narrow range of drift time have been aggregated." [PSI:MS]
is_a: MS:1000455 ! ion selection attribute
is_a: MS:1002892 ! ion mobility attribute
is_a: MS:1003254 ! peak attribute
relationship: has_units UO:0000028 ! millisecond
relationship: has_value_type xsd:float ! The allowed value-type for this CV term
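
To find such terms programmatically, one could scan the OBO file for stanzas that list both parents. The following is a rough sketch only: it assumes a simplified OBO layout (one `tag: value` pair per line, `[Term]` stanza headers), uses an abbreviated inline snippet instead of the real psi-ms.obo, and the helper names `parse_obo_terms` and `terms_with_parents` are hypothetical.

```python
from collections import defaultdict

# Abbreviated stand-in for psi-ms.obo, trimmed to the tags used below.
OBO_SNIPPET = """\
[Term]
id: MS:1002476
name: ion mobility drift time
is_a: MS:1000455 ! ion selection attribute
is_a: MS:1002892 ! ion mobility attribute
is_a: MS:1003254 ! peak attribute

[Term]
id: MS:1000455
name: ion selection attribute
"""


def parse_obo_terms(text):
    """Parse [Term] stanzas into a list of dicts mapping tag -> list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = defaultdict(list)
            terms.append(current)
        elif line.startswith("["):
            current = None  # some other stanza type, e.g. [Typedef]
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            # Drop the trailing "! comment" that carries the human-readable name.
            current[tag].append(value.split(" ! ", 1)[0].strip())
    return terms


def terms_with_parents(terms, required_parents):
    """Return the ids of terms whose is_a list contains all required parents."""
    required = set(required_parents)
    return [t["id"][0] for t in terms if t["id"] and required <= set(t["is_a"])]


if __name__ == "__main__":
    parsed = parse_obo_terms(OBO_SNIPPET)
    print(terms_with_parents(parsed, ["MS:1000455", "MS:1002892"]))
```

Only direct `is_a` parents are checked; terms that inherit from "ion selection attribute" through an intermediate term would need a transitive lookup.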

List of mzML parsers in different programming languages

As discussed at the HUPO Busan Bioinformatics Hub, it would be great to compile a list of all the available mzML (and mzMLb?) parsers for different programming languages.

Off the top of my head, and after a quick Google search, I came up with the following:

Python:

C++:

C#:

Java:

R:

Ruby:

Rust:

[mzML] sourceFileList: `count=0` is valid, but violates `minOccurs="1"`

Hi, we're currently discussing a test failure in our mzR package in sneumann/mzR#192

      <sourceFileList count="0">
      </sourceFileList>

xmlSchemaValidate(mzML_xsd_idx, out_file):
 "Element '{http://psi.hupo.org/ms/mzml}sourceFileList': Missing child element(s). 
Expected is ( {http://psi.hupo.org/ms/mzml}sourceFile )."

This is triggered by a minOccurs in the XSD:

  <xs:complexType name="SourceFileListType">
    <xs:annotation>
      <xs:documentation>List and descriptions of the source files this mzML document was generated or derived from</xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element minOccurs="1" maxOccurs="unbounded" name="sourceFile" type="dx:SourceFileType"/>
    </xs:sequence>
    <xs:attribute name="count" type="xs:nonNegativeInteger" use="required">
      <xs:annotation>
        <xs:documentation>Number of source files used in generating the instance document.</xs:documentation>
      </xs:annotation>
    </xs:attribute>
  </xs:complexType>

I am wondering whether that is actually a schema error:
if there are no sourceFiles, then count="0" sounds right,
and count="0" is a valid value for the attribute, but the empty list violates minOccurs="1".

So name="count" type="xs:nonNegativeInteger" should be constrained by a <minInclusive value='1'/> (https://www.w3.org/TR/xmlschema-2/#rf-minInclusive),
or the sourceFileList would have to be optional. If sourceFileList became optional,
existing files would remain valid under such an updated schema.

Similar issues might be lurking in other *ListTypes, but I haven't checked yet.
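
A quick way to look for the same pattern in other list elements is to scan a file for elements that carry a count attribute but have no children. This is a minimal sketch using only the Python standard library. The sample document is a made-up fragment rather than a complete, valid mzML file, and the helper name `find_empty_lists` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the validation error message quoted above.
NS = "{http://psi.hupo.org/ms/mzml}"

# Made-up fragment for illustration; not a complete mzML document.
SAMPLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <fileDescription>
    <fileContent/>
    <sourceFileList count="0">
    </sourceFileList>
  </fileDescription>
  <softwareList count="1">
    <software id="pwiz" version="3.0.22242"/>
  </softwareList>
</mzML>"""


def find_empty_lists(root):
    """Return (tag, declared_count) for each element that has a count attribute but no child elements."""
    findings = []
    for elem in root.iter():
        declared = elem.get("count")
        if declared is not None and len(elem) == 0:
            findings.append((elem.tag.replace(NS, ""), int(declared)))
    return findings


if __name__ == "__main__":
    for tag, declared in find_empty_lists(ET.fromstring(SAMPLE)):
        print(f"{tag}: count={declared} but no child elements")
```

Whether each empty list actually violates the schema still depends on the minOccurs of its child element, so the output is a list of candidates to check against the XSD, not a list of confirmed errors.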

Yours, Steffen

mzML validator

Hi,
I wonder where I can find the mzML validator now.
Thanks
Da

No example files since version 1.0.0

I'm writing an mzML parser and was sitting down to write unit tests when I noticed that all provided example files are from versions before 1.0.0. The first issue I've noticed is that I can't test the cv "id" attribute, as the files seem to use the (what I believe to be deprecated) "cvLabel" attribute.

As I don't believe I have an example that covers all use cases, it would be beneficial if someone could provide [an] example file[s] for at the very least version 1.1.0.

Moving CV ontologies to another repo

Hi all,

One thing I'd like to bring up as an idea:
since most CV consumers have to adapt to the GitHub move anyway,
might it not also be good to create a separate repository for the CV?
- It is not only used in the mzML domain (though predominantly), so the mzML repository is not the most intuitive place to look for it.
- We could have a readme.md for the repository, or even a GitHub page.
- CV issues would not clog the mzML repository.
- It would make it much easier to propose changes (fork & branch + change -> pull request: GitHub shows the diff nicely rendered on the pull request page, where additional comments and discussion can be made).

Furthermore, a small remark: if you want to access the file directly over the web from your programs,
https://raw.githubusercontent.com/HUPO-PSI/mzML/master/cv/psi-ms.obo
might be the more robust choice. Or https://raw.githubusercontent.com/HUPO-PSI/ControlledVocabulary/master/... if we dare this additional step.

In any case, thanks for the work on the CV, Gerhard!

best,
@mwalzer

Validation error in mzML with Thermo RAW with UV/PDA data from Orbitrap Elite

Hi,

We have an issue with an mzML file converted by msconvert from Orbitrap Elite data (software versions below).
In the mzML there are three <instrumentConfiguration> elements: one for the Orbitrap, one for the ion trap, and one for the UV detector. There are <spectrum> elements with MS data and others with UV/PDA data, each referencing the respective instrument configuration.

OpenMS FileInfo validation complains with

Validating mzML file against XML schema version 1.1.0
Validation error in file 'LAA_MM8_nFS.mzML' line 78 column 25: 
     element 'detector' is not allowed for content model '(source+,analyzer+,detector+)'
Failed - errors are listed above!

which is because the UV/PDA does not really come with a source and analyser component.
=> That looks like an issue with the mzML specification (or documentation) to me.

I'd like to start discussing possible solutions:

  1. Have msconvert drop the instrument information: what isn't there can't fail validation. Poor choice, because we lose that information.
  2. Relax the mzML schema definition, and don't enforce all three of '(source+,analyzer+,detector+)'.
  3. Use empty/null values for 'source+,analyzer+', but require their presence to make validation happy.
  4. Something else.

Ideas?

Yours,
Steffen

mzML XSD with componentList definition:

<xs:complexType name="ComponentListType">

The instrument components contain 0:n cvParams, so they could be left empty:

<xs:complexType name="ParamGroupType">

    <softwareList count="2">
      <software id="Xcalibur" version="2.7.0 SP1">
        <cvParam cvRef="MS" accession="MS:1000532" name="Xcalibur" value=""/>
      </software>
      <software id="pwiz" version="3.0.22242">
        <cvParam cvRef="MS" accession="MS:1000615" name="ProteoWizard software" value=""/>
      </software>
    </softwareList>
...
      <instrumentConfiguration id="IC3">
        <referenceableParamGroupRef ref="CommonInstrumentParams"/>
        <componentList count="1">
          <detector order="1">
            <cvParam cvRef="MS" accession="MS:1000621" name="photodiode array detector" value=""/>
          </detector>
        </componentList>

Raw data available at https://drive.google.com/file/d/1EEUO_F1X1PLYe10qsKYTNwRO5FnHkNqu/view?usp=sharing
mzML at https://drive.google.com/file/d/1ogA7lIfeYAZKQ7vdwA3L-jeu9Gbd8W9v/view?usp=sharing

Software/Version: Xcalibur 2.2 - Qual Browser Thermo Xcalibur 2.2 SP1.48 (Analysis), Orbitrap Elite 2.7 - LTQ Tune Plus Version 2.7.0.1103 SP1 (Control)
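
To spot affected files without running a full schema validation, one could list which of the three component types each componentList lacks. This is a minimal sketch with the Python standard library. It assumes the content model quoted in the error message, '(source+,analyzer+,detector+)', and uses a cut-down, namespace-free fragment modelled on the IC3 configuration above. The helper names `local_name` and `missing_components` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Component types required by the content model quoted in the validation error.
REQUIRED = ("source", "analyzer", "detector")

# Cut-down fragment modelled on the IC3 configuration; not a complete mzML document.
SAMPLE = """<instrumentConfigurationList count="1">
  <instrumentConfiguration id="IC3">
    <componentList count="1">
      <detector order="1">
        <cvParam cvRef="MS" accession="MS:1000621" name="photodiode array detector" value=""/>
      </detector>
    </componentList>
  </instrumentConfiguration>
</instrumentConfigurationList>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def missing_components(root):
    """Map instrumentConfiguration id -> component types absent from its componentList."""
    result = {}
    for config in root.iter():
        if local_name(config.tag) != "instrumentConfiguration":
            continue
        # Only componentList elements that are present are inspected.
        for clist in config:
            if local_name(clist.tag) != "componentList":
                continue
            present = {local_name(child.tag) for child in clist}
            missing = [name for name in REQUIRED if name not in present]
            if missing:
                result[config.get("id")] = missing
    return result


if __name__ == "__main__":
    print(missing_components(ET.fromstring(SAMPLE)))
```

The check ignores element order and the order attribute, so it reports only which component types are absent, not every way a componentList could fail validation.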
