hupo-psi / mzidentml-validator
mzidentml validator ui and command line tool
License: Apache License 2.0
Hi,
I've tried to rebuild the Validator (using Netbeans as the repository already contained project files for this IDE).
It failed with the following error:
Failed to execute goal on project mzIdentMLValidator:
Could not resolve dependencies for project psidev.psi.pi:mzIdentMLValidator:jar:1.4.35-SNAPSHOT:
Failed to collect dependencies at psidev.psi.tools:validator:jar:2.0.10
-> psidev.psi.tools:ontology-manager:jar:2.0.10
-> uk.ac.ebi.ols:ols-core:jar:1.19
-> proteomics:proteomics-common:jar:1.5:
Failed to read artifact descriptor for proteomics:proteomics-common:jar:1.5:
Could not transfer artifact proteomics:proteomics-common:pom:1.5 from/to maven-default-http-blocker (http://0.0.0.0/):
Blocked mirror for repositories:
[
nexus-ebi-repo-old (http://www.ebi.ac.uk/intact/maven/nexus/content/repositories/ebi-repo/, default, releases+snapshots),
nexus-ebi-release-repo (http://www.ebi.ac.uk/Tools/maven/repos/content/groups/ebi-repo/, default, releases+snapshots),
nexus-ebi-snapshot-repo (http://www.ebi.ac.uk/Tools/maven/repos/content/groups/ebi-snapshots/, default, releases+snapshots),
ebi-repo (http://www.ebi.ac.uk/~maven/m2repo, default, releases+snapshots),
ibiblio-repo (http://mirrors.ibiblio.org/pub/mirrors/maven2/, default, releases+snapshots),
java-repo (http://download.java.net/maven/2/, default, releases+snapshots)
]
-> [Help 1]
I'm not sure how to interpret this. Are some of the dependencies missing, or only accessible from inside EBI?
Can anyone give advice on how to make it compile?
Thanks,
Colin
The validator appears to complain about
And also about
The formula usually used for FDR calculation with cross-links is:
FDR = (TD-DD)/TT
So there are cases where the calculated FDR can be 0 (no decoys) or negative (DD > TD). Neither is really meaningful as an FDR, but both are still valid results of the calculation. So I think the validator should not report these as errors.
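To make the two edge cases concrete, here is a minimal sketch of the formula above. It assumes TT, TD and DD are the counts of target-target, target-decoy and decoy-decoy matches that pass the threshold; the function name `crosslink_fdr` is only illustrative and is not part of the validator.

```python
def crosslink_fdr(tt: int, td: int, dd: int) -> float:
    """Estimate the cross-link FDR as (TD - DD) / TT.

    Assumption: tt, td, dd are counts of target-target, target-decoy
    and decoy-decoy matches above the score threshold.
    """
    if tt <= 0:
        raise ValueError("TT must be positive to compute an FDR")
    return (td - dd) / tt


if __name__ == "__main__":
    # No decoys at all: the estimate is exactly zero.
    print(crosslink_fdr(tt=100, td=0, dd=0))
    # More decoy-decoy than target-decoy matches: the estimate is negative.
    print(crosslink_fdr(tt=100, td=2, dd=5))
```

Both values come straight out of the calculation, so a check that only accepts strictly positive FDR values would reject them.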
The validator flags residue-level scores as not paired, but this disregards ambiguity. For example, I have a link from peptide XXXXXKXR to PEPTIDEKR, and the protein actually contains the sequence ...PEPTIDEKRPEPTIDEKR..., so I get an ambiguous residue pair. The resulting mzIdentML contains:
<ProteinAmbiguityGroup id="PAG_13">
<ProteinDetectionHypothesis dBSequence_ref="dbseq_XXXXX_target" passThreshold="false" id="PAG_13_PDH_0">
...
<cvParam cvRef="PSI-MS" accession="MS:1002677" name="residue-pair-level global FDR" value="151808167.a:214:0.0:true"></cvParam>
<cvParam cvRef="PSI-MS" accession="MS:1002677" name="residue-pair-level global FDR" value="151808167.b:141:0.0:true"></cvParam>
<cvParam cvRef="PSI-MS" accession="MS:1002677" name="residue-pair-level global FDR" value="151808167.b:149:0.0:true"></cvParam>
But the validator comes back with:
Message 14497:
Level: ERROR
--> Interaction score is not paired for XL interaction ID 151810610 and score 0.0 (has only 1 entries for : PAG: PAG_13 and PDH: PAG_13_PDH_0
But it is paired; it is only that 151808167.b: turns up twice (as that side is the ambiguous one).
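A pairing check that tolerates this kind of ambiguity could look roughly like the sketch below. It assumes the cvParam value has the layout `ID.side:position:score:passThreshold`, as in the three values quoted above; the function names are hypothetical and this is not the validator's actual implementation.

```python
from collections import defaultdict


def parse_value(value: str):
    """Split 'ID.side:position:score:passThreshold' into its parts."""
    ident, position, score, passed = value.split(":")
    interaction_id, side = ident.rsplit(".", 1)
    return interaction_id, side, int(position), float(score), passed == "true"


def unpaired_interactions(values):
    """Return interaction IDs that lack an 'a' side or a 'b' side.

    Several entries on the same side (ambiguous positions) are allowed;
    only a completely missing side counts as unpaired.
    """
    sides = defaultdict(set)
    for value in values:
        interaction_id, side, _, _, _ = parse_value(value)
        sides[interaction_id].add(side)
    return [iid for iid, seen in sides.items() if not {"a", "b"} <= seen]


if __name__ == "__main__":
    values = [
        "151808167.a:214:0.0:true",
        "151808167.b:141:0.0:true",
        "151808167.b:149:0.0:true",
    ]
    print(unpaired_interactions(values))  # prints []
```

The design choice here is to check for the presence of both sides rather than for exactly two entries, so an ambiguous side with two positions still counts as paired.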
Mascot Server 2.6 and later support combined spectral library and FASTA searches. The software exports the type using a valid CV term:
<SpectrumIdentificationProtocol id="SIP" analysisSoftware_ref="AS_mascot_server">
<SearchType>
<cvParam accession="MS:1002755" name="combined ms-ms + spectral library search" cvRef="PSI-MS" value="" />
</SearchType>
Term definition in psi-ms.obo:
id: MS:1002755
name: combined ms-ms + spectral library search
def: "A combined MS2 (with fragment ions) and spectral library search." [PSI:PI]
is_a: MS:1001080 ! search type
The validator isn't happy and gives two errors:
Message 2:
Rule ID: SearchTypeObjectRule
Level: ERROR
Context(/MzIdentML/AnalysisProtocolCollection/SpectrumIdentificationProtocol )
--> At least one child term of 'search type' must occur in SearchType of the SpectrumIdentificationProtocol (id='SIP') element at /MzIdentML/AnalysisProtocolCollection/SpectrumIdentificationProtocol
Tip: The SearchType element of SpectrumIdentificationProtocol must contain a CV term.
Message 3:
Rule ID: SearchType_must_rule
Level: ERROR
Context(/searchType/cvParam/@accession ) in 2 locations
--> The result found at: /searchType/cvParam/@accession for which the values is ''MS:1002755'' didn't match any of the 6 specified CV terms:
- The sole term MS:1001010 (de novo search) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
- The sole term MS:1001031 (spectral library search) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
- The sole term MS:1001081 (pmf search) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
- The sole term MS:1001082 (tag search) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
- The sole term MS:1001083 (ms-ms search) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
- The sole term MS:1001584 (combined pmf + ms-ms search) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name
The validator is wrong. The mzIdentML 1.1 specification section 6.63 says "MUST supply a child term of MS:1001080 (search type)" and MS:1002755 is a child of MS:1001080. mzIdentML 1.2 specification section 6.66 says the same.
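The "child term of MS:1001080" requirement can be checked directly against the ontology rather than against a fixed list of terms. The sketch below parses only the `id:` and `is_a:` lines of an OBO text and walks the parent links; it uses the stanza quoted above as input, and the helper names are illustrative assumptions, not part of the validator.

```python
def parse_is_a(obo_text: str) -> dict:
    """Map each term id to the set of its direct is_a parents."""
    parents = {}
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("id: "):
            current = line[len("id: "):].strip()
            parents.setdefault(current, set())
        elif line.startswith("is_a: ") and current is not None:
            # Drop the trailing '! name' comment, keep the accession.
            parent = line[len("is_a: "):].split("!")[0].strip()
            parents[current].add(parent)
    return parents


def is_descendant(term: str, ancestor: str, parents: dict) -> bool:
    """True if 'ancestor' is reachable from 'term' via is_a links."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


OBO_STANZA = """
id: MS:1002755
name: combined ms-ms + spectral library search
def: "A combined MS2 (with fragment ions) and spectral library search." [PSI:PI]
is_a: MS:1001080 ! search type
"""

if __name__ == "__main__":
    parents = parse_is_a(OBO_STANZA)
    print(is_descendant("MS:1002755", "MS:1001080", parents))  # prints True
```

With a rule expressed this way, any term added under "search type" in a later psi-ms.obo release would be accepted without updating the rule.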
Mascot Server 2.7 and later export protein FDR using the CV term MS:1001214:
<Threshold>
<cvParam accession="MS:1001214" name="protein-level global FDR" cvRef="PSI-MS" value="0.0562" />
</Threshold>
</ProteinDetectionProtocol>
Definition in psi-ms.obo:
id: MS:1001214
name: protein-level global FDR
def: "Estimation of the global false discovery rate of proteins." [PSI:PI]
xref: value-type:xsd\:double "The allowed value-type for this CV term."
is_a: MS:1002705 ! protein-level result list statistic
mzIdentMLValidator-1.4.35-SNAPSHOT.jar doesn't accept this. The error is:
Message 1:
Rule ID: ProteinDetectionProtocolThreshold_must_rule
Level: ERROR
Context(/threshold/cvParam/@accession ) in 2 locations
--> The result found at: /threshold/cvParam/@accession for which the values is ''MS:1001214'' didn't match any of the 5 specified CV terms:
- Any children term of MS:1001153 (search engine specific score). The term can be repeated. The matching value has to be the identifier of the term, not its name.
- Any children term of MS:1001302 (search engine specific input parameter). The term can be repeated. The matching value has to be the identifier of the term, not its name.
- The sole term MS:1001494 (no threshold) or any of its children. The term can be repeated. The matching value has to be the identifier of the term, not its name.
- Any children term of MS:1002572 (protein detection statistical threshold). The term can be repeated. The matching value has to be the identifier of the term, not its name.
- Any children term of MS:1002706 (protein group-level result list statistic). The term can be repeated. The matching value has to be the identifier of the term, not its name.
The mzIdentML 1.1 specification doesn't mention protein FDR. Spec 1.2 mentions protein FDR in Threshold element in section 6.83, where one of the few examples is "MUST supply term MS:1001447 (prot:FDR threshold) only once". However, MS:1001447 is a child of MS:1002485 "protein-level statistical threshold", so maybe the validator would reject that too, and it's not appropriate for Mascot. Mascot doesn't apply a protein FDR threshold, it just reports what the FDR is.
Neither spec says anything about MS:1002705 "protein-level result list statistic". It's not clear to me if there is another place where MS:1001214 could be reported if it isn't intended to be under ProteinDetectionProtocol/Threshold.
I think I saw it doing this; just noting it here.
I thought it was the result of a mistake in the schema, but the schema actually does require it.
The requirement is inherited from ExternalDataType:
https://github.com/HUPO-PSI/mzIdentML/blob/master/schema/mzIdentML1.2.0.xsd#L1099
https://github.com/HUPO-PSI/mzIdentML/blob/master/schema/mzIdentML1.2.0.xsd#L1487-L1508
Most mzIdentML files, including the example files, have a schemaLocation attribute like:
xsi:schemaLocation="http://psidev.info/psi/pi/mzIdentML/1.1 http://www.psidev.info/files/mzIdentML1.1.0.xsd"
That second URL should actually resolve to the schema file. Right now it's a 404. Who could fix the website so those links aren't errors? The alternative is fixing all the existing mzIdentML files that just followed the lead of the mzIdentML examples.
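For anyone who wants to audit existing files, the namespace/URL pairs can be pulled out of the `xsi:schemaLocation` attribute with the standard library. This is a minimal sketch; the function name and the one-element sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(xml_text: str):
    """Return (namespace, schema URL) pairs from the root xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    # The attribute is a whitespace-separated list of namespace/URL pairs.
    return list(zip(tokens[0::2], tokens[1::2]))


SAMPLE = (
    '<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://psidev.info/psi/pi/mzIdentML/1.1 '
    'http://www.psidev.info/files/mzIdentML1.1.0.xsd"/>'
)

if __name__ == "__main__":
    for namespace, url in schema_locations(SAMPLE):
        print(namespace, "->", url)
```

Each returned URL could then be requested (for example with `urllib.request.urlopen`) to see whether it resolves or returns a 404.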
ERROR: cvParam anchor protein should have a value, but it does not!
ERROR: cvParam protein-pair-level global FDR has a value, but it should not!
ERROR: cvParam residue-pair-level global FDR has a value, but it should not!
WARNING: CV term MS:1002675 ('residue-pair-level global FDR') is not in the cv
WARNING: CV term MS:1002676 ('protein-pair-level global FDR') is not in the cv
WARNING: MS:1000563 should be 'Thermo RAW format' instead of 'Thermo Raw file'
WARNING: MS:1002404 should be 'count of identified proteins' instead of 'count of identified protein'
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'Cross-linked spectrum identification item.'
WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
WARNING: MS:1002545 should be 'xi:score' instead of 'The xi result 'Score'.'
Hi,
I tested this and it compiles and runs, which is great, thanks @ypriverol.
It runs much slower than the previous version. Is that just due to the more verbose debug logging? Sorry, I don't know how to change the logging level. Does someone know?
best wishes,
Colin
Hi,
When I switch on "show flaw errors" in the validator, it displays an error saying:
Message 5:
Rule ID: CrosslinkingPeptideModification_may_rule
Level: ERROR
Context(Flaw in the rule definition: CrosslinkingPeptideModification_may_rule )
--> Could not find property 'Peptide' of the xpath expression '/Peptide/modification/cvParam/@accession' (element position: 1) in the given object of: uk.ac.ebi.jmzidml.model.mzidml.SequenceCollection - Did you mean 'peptide' ?
What does it mean? Is something broken in the rule for CrosslinkingPeptideModification?
Best wishes,
Colin
When I run mzIdentMLValidator-1.4.35-SNAPSHOT.jar behind a Squid HTTP proxy server, the first error is always that the application is unable to download the XSD file. Works fine when I go to a network without proxy server. Please either add support for proxy servers or ship the XSD files with the validator and avoid downloading anything.