hupo-psi / mzml
Repository for mzML and the corresponding examples
We're currently implementing a validator tool in Galaxy that takes an mzML file and validates it against the XSD schema using pyxml or xmllint, and we found that the <run> ... </run> element has an id attribute of type xs:ID, which means its value can't start with a number. But ProteoWizard seems to fill this attribute with the sample name, which can start with a number (and often does, e.g. the position, the order in the study, or a timestamp). As a result, many mzML files whose run id starts with a number are technically invalid.
I personally don't see a reason why the id of the RunType can't be an xs:string. Was there a specific reason for the decision?
I'd therefore like to propose changing the id attribute of the RunType from xs:ID to xs:string.
If you agree with the change, I can open a PR with the required changes to the XSD file. Is there anything else that has to be adapted to make this change?
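A quick way to see which run ids are affected: xs:ID values must be NCNames, so they cannot start with a digit and cannot contain spaces or colons. Below is a minimal sketch that uses a simplified, ASCII-only approximation of the NCName rule. The function name and the sample ids are hypothetical, and a real schema validator such as xmllint run against the XSD remains the authoritative check.

```python
import re

# Simplified, ASCII-only approximation of the XML NCName production behind xs:ID:
# the first character must be a letter or underscore; later characters may also be
# digits, '.' or '-'. The full production additionally allows many non-ASCII letters.
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def is_valid_xs_id(value: str) -> bool:
    """Return True if value matches the simplified NCName pattern (hypothetical helper)."""
    return _NCNAME_ASCII.match(value) is not None


# Hypothetical sample names of the kind that end up in <run id="...">
for run_id in ["20190312_QC_01", "sample_A1", "A1-pos.3"]:
    print(run_id, is_valid_xs_id(run_id))
```

Because the first id starts with a digit, it fails even though it would be a perfectly reasonable xs:string.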
ProteoWizard has for years been writing the individual scan ion mobility CV terms in the wrong place (according to the mapping file), but I think it's the CV terms that should be changed. The CV classifies them as "ion selection attributes", which the mapping file only allows as children of "selectedIon". MS1 spectra won't have a "selectedIon" (or a precursor), but MS1s are frequently still separated by ion mobility. ProteoWizard has been writing these terms in the element. Can we update these terms to be scan attributes, or perhaps both (for backward compatibility)? Or at least put them somewhere that's not limited to MSn scans? I think my last message about this went to the vocab mailing list back in 2016, so obviously it isn't urgent, but it also shouldn't be hard to fix if a fix is warranted.
We should also consider giving "collisional cross section" an attribute type that allows it to appear somewhere in mzML. I'm not sure exactly where. It's a molecular property, but it's also possible to convert an instrument's mobility value to CCS if the instrument is calibrated and you know the charge, the drift gas, and so on. So theoretically, could a CCS go anywhere a raw ion mobility value could go?
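The mobility-to-CCS conversion mentioned above is commonly done with the Mason-Schamp equation, Ω = (3ze / 16N₀) · √(2π / (μ k_B T)) · (1/K₀), where μ is the ion-gas reduced mass and N₀ is the Loschmidt number density. Below is a minimal sketch that assumes nitrogen drift gas (28.006148 Da), a gas temperature of 305 K and an input 1/K₀ in V·s/cm² (the units of MS:1002815). These defaults are illustrative assumptions, the function name is hypothetical, and a vendor's calibrated conversion may differ.

```python
import math

# Physical constants in SI units
ELEMENTARY_CHARGE = 1.602176634e-19  # C
BOLTZMANN = 1.380649e-23             # J/K
DALTON = 1.66053906660e-27           # kg
LOSCHMIDT = 2.686780111e25           # m^-3 at 273.15 K and 101.325 kPa


def ccs_from_inverse_k0(inv_k0_vs_cm2: float, mz: float, charge: int,
                        gas_mass_da: float = 28.006148,   # assumed N2 drift gas
                        temperature_k: float = 305.0) -> float:  # assumed gas temperature
    """Mason-Schamp CCS in square angstroms from an inverse reduced mobility (hypothetical helper)."""
    ion_mass_da = mz * charge  # ion mass recovered from m/z and charge
    reduced_mass_kg = ion_mass_da * gas_mass_da / (ion_mass_da + gas_mass_da) * DALTON
    inv_k0_si = inv_k0_vs_cm2 * 1e4  # V*s/cm^2 -> V*s/m^2
    omega_m2 = ((3 * charge * ELEMENTARY_CHARGE) / (16 * LOSCHMIDT)
                * math.sqrt(2 * math.pi / (reduced_mass_kg * BOLTZMANN * temperature_k))
                * inv_k0_si)
    return omega_m2 * 1e20  # m^2 -> square angstroms
```

For a 2+ ion at m/z 500 with 1/K₀ = 1.0 V·s/cm², this gives a CCS on the order of 400 Å². Because the result depends on charge, gas and temperature, storing the raw mobility value alongside any derived CCS keeps the conversion reproducible.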
These terms would be affected:
[Term]
id: MS:1002815
name: inverse reduced ion mobility
def: "Ion mobility measurement for an ion or spectrum of ions as measured in an ion mobility mass spectrometer. This might refer to the central value of a bin into which all ions within a narrow range of mobilities have been aggregated." [PSI:MS]
is_a: MS:1000455 ! ion selection attribute
is_a: MS:1002892 ! ion mobility attribute
is_a: MS:1003254 ! peak attribute
relationship: has_units MS:1002814 ! volt-second per square centimeter
relationship: has_value_type xsd:float ! The allowed value-type for this CV term
[Term]
id: MS:1003371
name: SelexION compensation voltage
def: "The voltage applied in the SelexION device to allow certain ions to transmit through to the mass spectrometer." [PSI:MS]
is_a: MS:1002892 ! ion mobility attribute
is_a: MS:1000455 ! ion selection attribute
is_a: MS:1003254 ! peak attribute
relationship: has_units UO:0000218 ! volt
relationship: has_value_type xsd:float ! The allowed value-type for this CV term
[Term]
id: MS:1003394
name: SelexION separation voltage
def: "RF voltage applied in the SelexION device to separate ions in trajectory based on the difference in their mobility between the high field and low field portions of the applied RF." [PSI:MS]
is_a: MS:1002892 ! ion mobility attribute
is_a: MS:1000455 ! ion selection attribute
is_a: MS:1003254 ! peak attribute
relationship: has_units UO:0000218 ! volt
relationship: has_value_type xsd:float ! The allowed value-type for this CV term
[Term]
id: MS:1001581
name: FAIMS compensation voltage
def: "The DC potential applied to the asymmetric waveform in FAIMS that compensates for the difference between high and low field mobility of an ion." [PSI:MS]
synonym: "FAIMS CV" EXACT []
is_a: MS:1002892 ! ion mobility attribute
is_a: MS:1000455 ! ion selection attribute
is_a: MS:1003254 ! peak attribute
relationship: has_units UO:0000218 ! volt
relationship: has_value_type xsd:double ! The allowed value-type for this CV term
[Term]
id: MS:1002476
name: ion mobility drift time
def: "Drift time of an ion or spectrum of ions as measured in an ion mobility mass spectrometer. This time might refer to the central value of a bin into which all ions within a narrow range of drift time have been aggregated." [PSI:MS]
is_a: MS:1000455 ! ion selection attribute
is_a: MS:1002892 ! ion mobility attribute
is_a: MS:1003254 ! peak attribute
relationship: has_units UO:0000028 ! millisecond
relationship: has_value_type xsd:float ! The allowed value-type for this CV term
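To list every term that the mapping-file rule would confine to selectedIon, one can parse the stanzas and collect the terms whose is_a includes MS:1000455 ("ion selection attribute"). Below is a minimal sketch that assumes a simplified OBO reader: no escaped characters, no trailing modifiers, no multi-line values. The helper names are hypothetical. For real use, a dedicated OBO library such as pronto would be more robust.

```python
def parse_obo_terms(obo_text: str) -> list[dict[str, list[str]]]:
    """Collect [Term] stanzas as {tag: [values]} dicts, skipping other stanza types."""
    terms: list[dict[str, list[str]]] = []
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Start a new stanza; only [Term] stanzas are kept
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.split(" ! ", 1)[0].strip()  # drop trailing "! comment"
        current.setdefault(tag.strip(), []).append(value)
    return terms


def terms_with_parent(terms: list[dict[str, list[str]]], parent_id: str) -> list[str]:
    """Return the ids of terms that list parent_id among their is_a parents."""
    return [t["id"][0] for t in terms if parent_id in t.get("is_a", [])]


# Excerpt of two stanzas from above, plus a [Typedef] stanza to show it is skipped
SAMPLE_OBO = """
[Term]
id: MS:1002815
name: inverse reduced ion mobility
is_a: MS:1000455 ! ion selection attribute
is_a: MS:1002892 ! ion mobility attribute
relationship: has_units MS:1002814 ! volt-second per square centimeter
[Term]
id: MS:1001581
name: FAIMS compensation voltage
synonym: "FAIMS CV" EXACT []
is_a: MS:1002892 ! ion mobility attribute
is_a: MS:1000455 ! ion selection attribute
[Typedef]
id: has_units
name: has_units
"""

terms = parse_obo_terms(SAMPLE_OBO)
print(terms_with_parent(terms, "MS:1000455"))
```

Run against the full psi-ms.obo, the same filter would show which other terms share the selectedIon-only placement.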
As discussed at the HUPO Busan Bioinformatics Hub, it would be great to compile a list of all the available mzML (and mzMLb?) parsers for different programming languages.
Off the top of my head and from a quick Google search, I came up with the following:
Python:
C++:
C#:
Java:
R:
Ruby:
Rust:
Hi, we're currently discussing a test failure in our mzR package in sneumann/mzR#192
<sourceFileList count="0">
</sourceFileList>
xmlSchemaValidate(mzML_xsd_idx, out_file):
"Element '{http://psi.hupo.org/ms/mzml}sourceFileList': Missing child element(s).
Expected is ( {http://psi.hupo.org/ms/mzml}sourceFile )."
This is triggered by the minOccurs constraint in the XSD:
<xs:complexType name="SourceFileListType">
<xs:annotation>
<xs:documentation>List and descriptions of the source files this mzML document was generated or derived from</xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element minOccurs="1" maxOccurs="unbounded" name="sourceFile" type="dx:SourceFileType"/>
</xs:sequence>
<xs:attribute name="count" type="xs:nonNegativeInteger" use="required">
<xs:annotation>
<xs:documentation>Number of source files used in generating the instance document.</xs:documentation>
</xs:annotation>
</xs:attribute>
</xs:complexType>
I am wondering whether that is actually a schema error:
if there are no sourceFile elements, then count="0", which sounds right.
count="0" is valid for the attribute, but violates minOccurs="1".
So the count attribute (name="count" type="xs:nonNegativeInteger") should be constrained by a <minInclusive value='1'/> (https://www.w3.org/TR/xmlschema-2/#rf-minInclusive),
or the sourceFileList would have to be optional. If sourceFileList became optional, existing files would remain valid under the updated schema.
Similar issues might be lurking in other *ListTypes, but I haven't checked yet.
Yours, Steffen
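To check the other *ListTypes for the same mismatch, one could scan the XSD for complexTypes whose count attribute is xs:nonNegativeInteger (so 0 is allowed) while a child element still has minOccurs of 1 or more. Note that minOccurs defaults to 1 when it is omitted. Below is a minimal sketch using only the standard library. It assumes the count attribute is a direct child of the complexType, as in the snippet above, so types built via complexContent extension would need extra handling. The function name and the second sample type are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def find_count_mismatches(xsd_text: str) -> list[str]:
    """List 'TypeName/elementName' where count may be 0 but the child element is required."""
    root = ET.fromstring(xsd_text)
    mismatches = []
    for ctype in root.iter(f"{XS}complexType"):
        count_attr = ctype.find(f"{XS}attribute[@name='count']")
        if count_attr is None or count_attr.get("type") != "xs:nonNegativeInteger":
            continue
        for elem in ctype.iter(f"{XS}element"):
            # XSD default for minOccurs is 1
            if int(elem.get("minOccurs", "1")) >= 1:
                mismatches.append(f"{ctype.get('name')}/{elem.get('name')}")
    return mismatches


SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="SourceFileListType">
    <xs:sequence>
      <xs:element minOccurs="1" maxOccurs="unbounded" name="sourceFile" type="dx:SourceFileType"/>
    </xs:sequence>
    <xs:attribute name="count" type="xs:nonNegativeInteger" use="required"/>
  </xs:complexType>
  <xs:complexType name="HypotheticalOptionalListType">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" name="item" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="count" type="xs:nonNegativeInteger" use="required"/>
  </xs:complexType>
</xs:schema>"""

print(find_count_mismatches(SAMPLE_XSD))
```

Pointing the same function at the full mzML1.1.x XSD text would give a first list of candidates to review by hand.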
Hi,
I wonder where I can find the mzML validator now.
Thanks
Da
Please update:
http://psidev.info/files/ms/mzML/xsd/mzML1.1.1_idx.xsd
to point to
https://raw.githubusercontent.com/HUPO-PSI/mzML/master/schema/schema_1.1/mzML1.1.1_idx.xsd
and:
http://psidev.info/files/ms/mzML/xsd/mzML1.1.2_idx.xsd
to point to:
https://raw.githubusercontent.com/HUPO-PSI/mzML/master/schema/schema_1.1/mzML1.1.2_idx.xsd
(note the very subtle differences between the 1s and 2s)
I'm writing an mzML parser and sat down to write unit tests, and I noticed that all the provided example files are from versions before 1.0.0. The first issue I've noticed is that I therefore can't test the "id" attribute of the cv element, as these files seem to use the (what I believe to be deprecated) "cvLabel" attribute instead.
As I don't believe I have an example that covers all use cases, it would be helpful if someone could provide one or more example files for at least version 1.1.0.
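Until such files exist, a parser can at least accept both spellings. Below is a minimal sketch that reads the cv identifiers using the "id" attribute and falls back to the older "cvLabel" attribute. It ignores namespaces so that old and new files are handled the same way. The function name and the two inline fragments are hypothetical, trimmed illustrations, not real example files.

```python
import xml.etree.ElementTree as ET


def cv_identifiers(mzml_text: str) -> list[str]:
    """Return cv identifiers, preferring 'id' (1.x) and falling back to 'cvLabel' (pre-1.0)."""
    root = ET.fromstring(mzml_text)
    ids = []
    for elem in root.iter():
        # Strip any namespace: "{uri}cv" -> "cv"; cvParam and others are skipped
        if elem.tag.rsplit("}", 1)[-1] == "cv":
            ident = elem.get("id") or elem.get("cvLabel")
            if ident is not None:
                ids.append(ident)
    return ids


# Hypothetical, trimmed fragments in the two styles
NEW_STYLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <cvList count="1"><cv id="MS" fullName="PSI-MS"/></cvList>
  <fileDescription><fileContent>
    <cvParam cvRef="MS" accession="MS:1000579" name="MS1 spectrum" value=""/>
  </fileContent></fileDescription>
</mzML>"""
OLD_STYLE = """<mzML><cvList count="1"><cv cvLabel="MS" fullName="PSI-MS"/></cvList></mzML>"""

print(cv_identifiers(NEW_STYLE), cv_identifiers(OLD_STYLE))
```

Tests built on fragments like these cover the 1.1.0 code path even while the repository's example files are older.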
Hi all,
One thing I'd like to bring up as an idea:
since most CV consumers have to adapt to the GitHub move anyway, might it not also be a good idea to create a separate repository for the CV?
1- It is not only used in the mzML domain (though predominantly), so this is not the most intuitive place to look for it.
2- We could have a README.md for the repository, or even a GitHub page.
3- CV issues would not clog the mzML repository.
4- It would make it much easier to propose changes (fork & branch + change -> pull request: GitHub shows the diff nicely rendered on the pull request page, where additional comments and discussion can be added).
Furthermore, a small remark: if you want to access the file directly from your programs,
https://raw.githubusercontent.com/HUPO-PSI/mzML/master/cv/psi-ms.obo
might be the more robust choice, or https://raw.githubusercontent.com/HUPO-PSI/ControlledVocabulary/master/... if we dare to take this additional step.
In any case, thanks for the work on the CV, Gerhard!
best,
@mwalzer
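Direct access to the raw URL from a program could look like the minimal sketch below. It downloads psi-ms.obo with the standard library and reads the header tags (such as data-version) that precede the first stanza. The helper names are hypothetical, and the sample header values are illustrative, not the real current CV version.

```python
import urllib.request

PSI_MS_OBO_URL = "https://raw.githubusercontent.com/HUPO-PSI/mzML/master/cv/psi-ms.obo"


def fetch_obo(url: str = PSI_MS_OBO_URL, timeout: float = 30.0) -> str:
    """Download an OBO file as text (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def obo_header(obo_text: str) -> dict[str, str]:
    """Parse 'tag: value' header lines up to the first [Stanza]; keeps the first value per tag."""
    header: dict[str, str] = {}
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            break
        if ":" in line:
            tag, value = line.split(":", 1)
            header.setdefault(tag.strip(), value.strip())
    return header


# Illustrative header; values are hypothetical
SAMPLE_HEADER = "format-version: 1.2\ndata-version: 4.1.0\n\n[Term]\nid: MS:0000000\n"
```

With network access, obo_header(fetch_obo())["data-version"] would report which CV release a tool is using. If the file moves to a separate ControlledVocabulary repository, only the URL constant would need to change.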
Hi,
We have an issue with an msconvert-converted mzML file obtained from an Orbitrap Elite (software versions below).
In the mzML there are three <instrumentConfiguration> elements: one for the Orbitrap, one for the ion trap, and one for the UV detector. Some <spectrum> elements contain MS data and others contain UV/PDA data, each referencing the respective instrument configuration.
OpenMS FileInfo validation complains with
Validating mzML file against XML schema version 1.1.0
Validation error in file 'LAA_MM8_nFS.mzML' line 78 column 25:
element 'detector' is not allowed for content model '(source+,analyzer+,detector+)'
Failed - errors are listed above!
This is because the UV/PDA detector does not really come with a source or analyzer component.
=> That looks like an issue with the mzML specification (or documentation) to me.
I'd like to start discussing possible solutions:
(source+,analyzer+,detector+)'
source+,analyzer+'
but require their presence to make validation happy. Ideas?
Yours,
Steffen
mzML XSD with the componentList definition:
mzML/schema/schema_1.1/mzML1.1.1.xsd
Line 333 in 81e0145
The instrument components contain 0:n cvParams, so they could be left empty:
mzML/schema/schema_1.1/mzML1.1.1.xsd
Line 137 in 81e0145
<softwareList count="2">
<software id="Xcalibur" version="2.7.0 SP1">
<cvParam cvRef="MS" accession="MS:1000532" name="Xcalibur" value=""/>
</software>
<software id="pwiz" version="3.0.22242">
<cvParam cvRef="MS" accession="MS:1000615" name="ProteoWizard software" value=""/>
</software>
</softwareList>
...
<instrumentConfiguration id="IC3">
<referenceableParamGroupRef ref="CommonInstrumentParams"/>
<componentList count="1">
<detector order="1">
<cvParam cvRef="MS" accession="MS:1000621" name="photodiode array detector" value=""/>
</detector>
</componentList>
</instrumentConfiguration>
Raw data available at https://drive.google.com/file/d/1EEUO_F1X1PLYe10qsKYTNwRO5FnHkNqu/view?usp=sharing
mzML at https://drive.google.com/file/d/1ogA7lIfeYAZKQ7vdwA3L-jeu9Gbd8W9v/view?usp=sharing
Software/Version: Xcalibur 2.2 - Qual Browser Thermo Xcalibur 2.2 SP1.48 (Analysis), Orbitrap Elite 2.7 - LTQ Tune Plus Version 2.7.0.1103 SP1 (Control)
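To see which instrument configurations trip the validator, the content model quoted in the error, (source+,analyzer+,detector+), can be mirrored as a regular expression over the sequence of child tag names of a componentList. Below is a minimal sketch that approximates that XSD check; it does not replace real schema validation. The function name and the second fragment are hypothetical, while the first fragment is the PDA componentList from above.

```python
import re
import xml.etree.ElementTree as ET

# Regular-expression stand-in for the XSD content model (source+,analyzer+,detector+)
CONTENT_MODEL = re.compile(r"(source )+(analyzer )+(detector )+")


def component_list_matches(component_list_xml: str) -> bool:
    """True if the componentList children follow source+, analyzer+, detector+ in order."""
    root = ET.fromstring(component_list_xml)
    # Local tag names of the direct children, e.g. "source analyzer detector "
    sequence = "".join(child.tag.rsplit("}", 1)[-1] + " " for child in root)
    return CONTENT_MODEL.fullmatch(sequence) is not None


PDA_ONLY = """<componentList count="1">
  <detector order="1">
    <cvParam cvRef="MS" accession="MS:1000621" name="photodiode array detector" value=""/>
  </detector>
</componentList>"""
MS_CONFIG = """<componentList count="3">
  <source order="1"/><analyzer order="2"/><detector order="3"/>
</componentList>"""

print(component_list_matches(PDA_ONLY), component_list_matches(MS_CONFIG))
```

Any schema change, such as making source and analyzer optional (minOccurs="0"), would correspond to replacing the first two + quantifiers with *.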