mzidentml-validator's People

Contributors

ypriverol

mzidentml-validator's Issues

how to recompile Validator?

Hi,

I've tried to rebuild the Validator (using Netbeans as the repository already contained project files for this IDE).

It failed with the following error:

Failed to execute goal on project mzIdentMLValidator: 
Could not resolve dependencies for project psidev.psi.pi:mzIdentMLValidator:jar:1.4.35-SNAPSHOT: 
Failed to collect dependencies at psidev.psi.tools:validator:jar:2.0.10 
-> psidev.psi.tools:ontology-manager:jar:2.0.10 
-> uk.ac.ebi.ols:ols-core:jar:1.19 
-> proteomics:proteomics-common:jar:1.5: 
Failed to read artifact descriptor for proteomics:proteomics-common:jar:1.5: 
Could not transfer artifact proteomics:proteomics-common:pom:1.5 from/to maven-default-http-blocker (http://0.0.0.0/): 

Blocked mirror for repositories: 
[
nexus-ebi-repo-old (http://www.ebi.ac.uk/intact/maven/nexus/content/repositories/ebi-repo/, default, releases+snapshots), 
nexus-ebi-release-repo (http://www.ebi.ac.uk/Tools/maven/repos/content/groups/ebi-repo/, default, releases+snapshots), 
nexus-ebi-snapshot-repo (http://www.ebi.ac.uk/Tools/maven/repos/content/groups/ebi-snapshots/, default, releases+snapshots), 
ebi-repo (http://www.ebi.ac.uk/~maven/m2repo, default, releases+snapshots), 
ibiblio-repo (http://mirrors.ibiblio.org/pub/mirrors/maven2/, default, releases+snapshots), 
java-repo (http://download.java.net/maven/2/, default, releases+snapshots)
] 
-> [Help 1]

I'm not sure how to interpret this. Are some of the dependencies missing, or are they only accessible from inside EBI?

Can anyone give advice on how to make it compile?
Thanks,
Colin

mzIdentML validator and zero or negative FDR

The validator appears to complain about


And also about

The formula usually used for FDR calculation with cross-links is:
FDR = (TD-DD)/TT

So there are cases where the calculated FDR can be 0 (no decoys) or negative (DD > TD). Neither is really meaningful as an FDR, but both are valid results of the calculation, so I think the validator should not report them as errors.
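
To make the two edge cases concrete, here is a minimal Python sketch of the formula above. It assumes TT, TD and DD are the counts of target-target, target-decoy and decoy-decoy matches above the score cut-off; the function name `crosslink_fdr` is made up for illustration and is not part of the validator.

```python
def crosslink_fdr(tt: int, td: int, dd: int) -> float:
    """Estimate the cross-link FDR as (TD - DD) / TT.

    tt, td, dd are assumed to be target-target, target-decoy and
    decoy-decoy match counts (hypothetical parameter names).
    """
    if tt <= 0:
        raise ValueError("TT must be positive to compute an FDR")
    return (td - dd) / tt


print(crosslink_fdr(tt=100, td=0, dd=0))  # no decoys: 0.0
print(crosslink_fdr(tt=100, td=2, dd=5))  # DD > TD: -0.03
```

Both results come straight out of the arithmetic, which is why a check that only accepts strictly positive values would reject them.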

mzIdentML validator: residue-pair-level score pairing and ambiguity

The validator flags residue-level scores as not paired, but this disregards ambiguity. For example, I have a link from peptide XXXXXKXR to PEPTIDEKR, and the protein actually contains the sequence ...PEPTIDEKRPEPTIDEKR..., so I get an ambiguous residue pair. The resulting mzIdentML contains:

  <ProteinAmbiguityGroup id="PAG_13">
    <ProteinDetectionHypothesis dBSequence_ref="dbseq_XXXXX_target" passThreshold="false" id="PAG_13_PDH_0">
      ...
      <cvParam cvRef="PSI-MS" accession="MS:1002677" name="residue-pair-level global FDR" value="151808167.a:214:0.0:true"></cvParam>
      <cvParam cvRef="PSI-MS" accession="MS:1002677" name="residue-pair-level global FDR" value="151808167.b:141:0.0:true"></cvParam>
      <cvParam cvRef="PSI-MS" accession="MS:1002677" name="residue-pair-level global FDR" value="151808167.b:149:0.0:true"></cvParam>

But the validator comes back with:

Message 14497:
    Level: ERROR
    --> Interaction score is not paired for XL interaction ID 151810610 and score 0.0 (has only 1 entries for : PAG: PAG_13 and PDH: PAG_13_PDH_0

But it is paired; it is just that 151808167.b turns up twice, because that side is the ambiguous one.
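
As an illustration of an ambiguity-tolerant pairing check, here is a rough Python sketch. It assumes the cvParam value has the form `ID.side:position:score:passThreshold`, as in the snippet above, and treats an interaction as paired when it has at least one `a` entry and at least one `b` entry. The function name is hypothetical and this is not how the validator is actually implemented.

```python
from collections import defaultdict


def unpaired_interactions(values):
    """Return interaction IDs that lack an 'a' side or a 'b' side.

    Each value is assumed to look like '151808167.a:214:0.0:true'.
    Repeated entries for one side (ambiguous positions) are allowed.
    """
    sides = defaultdict(set)
    for value in values:
        ref, _position, _score, _passes = value.split(":")
        interaction_id, side = ref.split(".")
        sides[interaction_id].add(side)
    return [iid for iid, seen in sides.items() if not {"a", "b"} <= seen]


values = [
    "151808167.a:214:0.0:true",
    "151808167.b:141:0.0:true",
    "151808167.b:149:0.0:true",
]
print(unpaired_interactions(values))  # []
```

Counting sides rather than expecting exactly two entries per interaction is what lets the second `b` position through.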

Validator - incorrect error about SearchType CV term

Mascot Server 2.6 and later support combined spectral library and FASTA searches. The software exports the type using a valid CV term:

    <SpectrumIdentificationProtocol id="SIP" analysisSoftware_ref="AS_mascot_server">
      <SearchType>
        <cvParam accession="MS:1002755" name="combined ms-ms + spectral library search" cvRef="PSI-MS" value="" />
      </SearchType>

Term definition in psi-ms.obo:

id: MS:1002755
name: combined ms-ms + spectral library search
def: "A combined MS2 (with fragment ions) and spectral library search." [PSI:PI]
is_a: MS:1001080 ! search type

The validator isn't happy and gives two errors:

Message 2:
    Rule ID: SearchTypeObjectRule
    Level: ERROR
    Context(/MzIdentML/AnalysisProtocolCollection/SpectrumIdentificationProtocol )
    --> At least one child term of 'search type' must occur in SearchType of the SpectrumIdentificationProtocol (id='SIP') element at /MzIdentML/AnalysisProtocolCollection/SpectrumIdentificationProtocol
    Tip: The SearchType element of SpectrumIdentificationProtocol must contain a CV term.

Message 3:
    Rule ID: SearchType_must_rule
    Level: ERROR
    Context(/searchType/cvParam/@accession ) in 2 locations
    --> The result found at: /searchType/cvParam/@accession for which the values is  ''MS:1002755'' didn't match any of the 6 specified CV terms:
  - The sole term MS:1001010 (de novo search) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
  - The sole term MS:1001031 (spectral library search) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
  - The sole term MS:1001081 (pmf search) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
  - The sole term MS:1001082 (tag search) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
  - The sole term MS:1001083 (ms-ms search) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
  - The sole term MS:1001584 (combined pmf + ms-ms search) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name

The validator is wrong. The mzIdentML 1.1 specification section 6.63 says "MUST supply a child term of MS:1001080 (search type)" and MS:1002755 is a child of MS:1001080. mzIdentML 1.2 specification section 6.66 says the same.
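
The "child term" check can be sketched in a few lines of Python. This is only an illustration, assuming a simplified reading of the OBO stanza quoted above (just `id:` and `is_a:` lines); `parse_is_a` and `is_descendant` are made-up names, not validator code.

```python
def parse_is_a(obo_text):
    """Map each term id to the set of its direct is_a parents."""
    parents = {}
    current = None
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("id: "):
            current = line[4:]
            parents.setdefault(current, set())
        elif line.startswith("is_a: ") and current:
            # Drop the trailing '! name' comment after the accession.
            parents[current].add(line[6:].split("!")[0].strip())
    return parents


def is_descendant(term, ancestor, parents):
    """Walk is_a links upwards from term, looking for ancestor."""
    stack, seen = [term], set()
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


OBO = """\
id: MS:1002755
name: combined ms-ms + spectral library search
is_a: MS:1001080 ! search type
"""
parents = parse_is_a(OBO)
print(is_descendant("MS:1002755", "MS:1001080", parents))  # True
```

A rule written as "any child of MS:1001080" would accept the term, whereas a fixed list of six accessions cannot.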

Validator - protein-level global FDR term

Mascot Server 2.7 and later export protein FDR using the CV term MS:1001214:

      <Threshold>
        <cvParam accession="MS:1001214" name="protein-level global FDR" cvRef="PSI-MS" value="0.0562" />
      </Threshold>
    </ProteinDetectionProtocol>

Definition in psi-ms.obo:

id: MS:1001214
name: protein-level global FDR
def: "Estimation of the global false discovery rate of proteins." [PSI:PI]
xref: value-type:xsd\:double "The allowed value-type for this CV term."
is_a: MS:1002705 ! protein-level result list statistic

mzIdentMLValidator-1.4.35-SNAPSHOT.jar doesn't accept this. The error is:

Message 1:
    Rule ID: ProteinDetectionProtocolThreshold_must_rule
    Level: ERROR
    Context(/threshold/cvParam/@accession ) in 2 locations
    --> The result found at: /threshold/cvParam/@accession for which the values is  ''MS:1001214'' didn't match any of the 5 specified CV terms:
  - Any children term of MS:1001153 (search engine specific score). The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1001302 (search engine specific input parameter). The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - The sole term MS:1001494 (no threshold) or any of its children. The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1002572 (protein detection statistical threshold). The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1002706 (protein group-level result list statistic). The term can be repeated. The matching value has to be the identifier of the term, not its name.

The mzIdentML 1.1 specification doesn't mention protein FDR. The 1.2 specification mentions protein FDR for the Threshold element in section 6.83, where one of the few examples is "MUST supply term MS:1001447 (prot:FDR threshold) only once". However, MS:1001447 is a child of MS:1002485 "protein-level statistical threshold", so the validator might reject that too. It's also not appropriate for Mascot, which doesn't apply a protein FDR threshold; it just reports what the FDR is.

Neither spec says anything about MS:1002705 "protein-level result list statistic". It's not clear to me if there is another place where MS:1001214 could be reported if it isn't intended to be under ProteinDetectionProtocol/Threshold.

Most files have invalid schema locations

Most mzIdentML files, including the example files, have a schemaLocation attribute like:
xsi:schemaLocation="http://psidev.info/psi/pi/mzIdentML/1.1 http://www.psidev.info/files/mzIdentML1.1.0.xsd"

That second URL should actually resolve to the schema file. Right now it's a 404. Who could fix the website so those links aren't errors? The alternative is fixing all the existing mzIdentML files that just followed the lead of the mzIdentML examples.
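
For anyone who wants to audit their own files, here is a small Python sketch that pulls the namespace/URL pairs out of xsi:schemaLocation so the URLs can then be checked. The sample document is a stripped-down stand-in built from the attribute quoted above, and `schema_locations` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(xml_text):
    """Return {namespace: schema URL} from the root's xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    # The attribute is a whitespace-separated list of namespace/URL pairs.
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    return dict(zip(tokens[0::2], tokens[1::2]))


doc = (
    '<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://psidev.info/psi/pi/mzIdentML/1.1 '
    'http://www.psidev.info/files/mzIdentML1.1.0.xsd"/>'
)
print(schema_locations(doc))
```

The sketch deliberately does no network access; fetching each returned URL and checking the HTTP status would be the next step.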

Validation issues for xiFDR-CrossLinkExample.mzid

ERROR: cvParam anchor protein should have a value, but it does not!
ERROR: cvParam protein-pair-level global FDR has a value, but it should not!
ERROR: cvParam residue-pair-level global FDR has a value, but it should not!
WARNING: CV term MS:1002675 ('residue-pair-level global FDR') is not in the cv
WARNING: CV term MS:1002676 ('protein-pair-level global FDR') is not in the cv
WARNING: MS:1000563 should be 'Thermo RAW format' instead of 'Thermo Raw file'
WARNING: MS:1002404 should be 'count of identified proteins' instead of 'count of identified protein'
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'Cross-linked spectrum identification item.'
WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
WARNING: MS:1002545 should be 'xi:score' instead of 'The xi result 'Score'.'
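
The name warnings above amount to comparing each cvParam name with the name the CV gives for the same accession. A rough Python sketch of that comparison, using two of the terms from the list as sample data (the function name and data layout are assumptions for illustration only):

```python
def name_mismatches(params, cv_names):
    """Return (accession, expected, found) for names that differ from the CV.

    Accessions missing from cv_names are skipped here; reporting them
    ("is not in the cv") would be a separate check.
    """
    mismatches = []
    for accession, found in params:
        expected = cv_names.get(accession)
        if expected is not None and expected != found:
            mismatches.append((accession, expected, found))
    return mismatches


cv_names = {"MS:1000563": "Thermo RAW format", "MS:1002544": "xi"}
params = [("MS:1000563", "Thermo Raw file"), ("MS:1002544", "xiFDR")]

for accession, expected, found in name_mismatches(params, cv_names):
    print(f"WARNING: {accession} should be {expected!r} instead of {found!r}")
```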

runs much slower - is it just the debug log?

Hi,
I tested this and it compiles and runs, which is great, thanks @ypriverol.

It runs much slower than the previous version. Is that just because of the more verbose debug logging? Sorry, I don't know how to change the logging level; does anyone know?

best wishes,
Colin

validator - Flaw in the rule definition: CrosslinkingPeptideModification_may_rule?

Hi,

When I switch on "show flaw errors" in the validator, it displays an error saying:

Message 5:
    Rule ID: CrosslinkingPeptideModification_may_rule
    Level: ERROR
    Context(Flaw in the rule definition: CrosslinkingPeptideModification_may_rule )
    --> Could not find property 'Peptide' of the xpath expression '/Peptide/modification/cvParam/@accession' (element position: 1) in the given object of: uk.ac.ebi.jmzidml.model.mzidml.SequenceCollection - Did you mean 'peptide' ?

What does it mean? Is something broken in the rule for CrosslinkingPeptideModification?
Best wishes,
Colin

Validator - add support for proxy servers

When I run mzIdentMLValidator-1.4.35-SNAPSHOT.jar behind a Squid HTTP proxy server, the first error is always that the application is unable to download the XSD file. It works fine on a network without a proxy server. Please either add support for proxy servers or ship the XSD files with the validator and avoid downloading anything.
