hupo-psi / mztab
mzTab: Reporting MS-based Proteomics and Metabolomics Results
Home Page: https://hupo-psi.github.io/mzTab
In Identification Complete mzTab files, the number of protein columns grows
very quickly because of the mandatory fields that reference each ms_run.
This situation is not very common, but it occurs when converting an
mzIdentML file generated with a tool like pep2pro to mzTab.
As a temporary solution, the file can be generated as an Identification Summary
file, because in that case these fields are not mandatory.
Example:
20 ms_run
2 protein_search_engine_score
Mandatory columns in an Identification Complete mzTab file:
- columns unrelated to ms_run/protein_search_engine_score = 10 columns
- best_search_engine_score[1-n] = number of protein_search_engine_score entries = 2 columns
- search_engine_score[1-n]_ms_run[1-n] = number of protein_search_engine_score entries x number of ms_run entries = 40 columns
- num_psms_ms_run[1-n] = number of ms_run entries = 20 columns
- num_peptides_distinct_ms_run[1-n] = number of ms_run entries = 20 columns
- num_peptides_unique_ms_run[1-n] = number of ms_run entries = 20 columns
Protein section total columns = 112 columns
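The arithmetic in this example can be reproduced with a short Python sketch. The function name and the default of 10 run-independent columns are taken from the example above for illustration only; they are not part of any mzTab library.

```python
def protein_column_count(n_ms_runs, n_scores, n_fixed=10):
    """Count protein-section columns of an Identification Complete file.

    n_fixed is the number of mandatory columns that do not depend on
    ms_run or protein_search_engine_score (10 in the example above).
    """
    best_scores = n_scores                 # best_search_engine_score[1-n]
    per_run_scores = n_scores * n_ms_runs  # search_engine_score[1-n]_ms_run[1-n]
    per_run_counts = 3 * n_ms_runs         # num_psms, num_peptides_distinct, num_peptides_unique
    return n_fixed + best_scores + per_run_scores + per_run_counts


print(protein_column_count(n_ms_runs=20, n_scores=2))  # 112
```

The per-run terms dominate: every additional ms_run adds 3 + n_scores columns.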
Original issue reported on code.google.com by noedelta
on 9 Oct 2014 at 3:13
For GC and HPLC, derivatization is often applied for two reasons: to target compounds that are otherwise hard to measure at all, because they are non-volatile or otherwise chemically or physically poorly suited to the separation method, and to increase ionization efficiency and selectivity for the subsequent MS analysis [1,2].
For GC, the primary derivatization methods are
For HPLC, some examples can be found for the following methods:
[1] Qi et al., Derivatization for liquid chromatography-mass spectrometry, TrAC Trends in Analytical Chemistry, Volume 59, https://doi.org/10.1016/j.trac.2014.03.013
[2] Halket et al., Chemical derivatization and mass spectral libraries in metabolic profiling by GC/MS and LC/MS/MS, Journal of Experimental Botany, Volume 56, Issue 410, 1 January 2005, Pages 219–243, https://doi.org/10.1093/jxb/eri069
It would be good to have examples showing how the metadata section of mzTab could link to an ISA-TAB file for a richer description of the experimental design. @rsalek
In the metabolomics 1.1-draft, there is a plan to use a prefix system before identifiers to mark which database the given ID has come from, where the prefix is explained in the header.
I propose the same change for "Database" and "Database version" in the protein and peptide tables.
The metadata would say the following:
database[1-n] "UniProt human"
database[1-n]-prefix "uh"
database[1-n]-version "v82"
... accession .....
PRT uh:P678435
At the moment, we have proposed e.g. small_molecule-database[1-n], but the model can be the same for both kinds of databases; there is no need to split it into two models.
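A minimal Python sketch of how a reader could resolve such prefixed accessions. It assumes tab-separated MTD lines and the database[n], database[n]-prefix and database[n]-version keys exactly as proposed above; the function names are hypothetical and none of this is an existing mzTab API.

```python
import re

# Example metadata lines following the proposal above (tab-separated).
MTD_LINES = [
    "MTD\tdatabase[1]\tUniProt human",
    "MTD\tdatabase[1]-prefix\tuh",
    "MTD\tdatabase[1]-version\tv82",
]

KEY = re.compile(r"^database\[(\d+)\](?:-(prefix|version))?$")


def read_databases(mtd_lines):
    """Return {prefix: {"name": ..., "prefix": ..., "version": ...}}."""
    by_index = {}
    for line in mtd_lines:
        _, key, value = line.split("\t", 2)
        match = KEY.match(key)
        if match is None:
            continue  # not a database entry
        entry = by_index.setdefault(int(match.group(1)), {})
        entry[match.group(2) or "name"] = value
    return {e["prefix"]: e for e in by_index.values() if "prefix" in e}


def resolve(accession, databases):
    """Split 'prefix:id' and look the prefix up in the metadata."""
    prefix, sep, local_id = accession.partition(":")
    if not sep or prefix not in databases:
        raise ValueError(f"unknown database prefix in {accession!r}")
    database = databases[prefix]
    return database["name"], database.get("version"), local_id


databases = read_databases(MTD_LINES)
print(resolve("uh:P678435", databases))
```

With this layout the per-row "Database" and "Database version" columns become redundant, because both values can be recovered from the prefix.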
Further to this point, taxid and species seem to be null in almost all examples, since this information may not be available to most export software (especially with customised databases). I propose that these either become optional columns, or that the possible species contained within the database are listed in the header.
What steps will reproduce the problem?
1. java -cp mzTabCLI.jar uk.ac.ebi.pride.jmztab.MZTabCommandLine -convert inFile=/some/absolute/path/input.mzid format=MZIDENTML -outFile /some/absolute/path/output.mzTab
What is the expected output? What do you see instead?
Expected = Successful conversion from mzid to mztab.
Actual =
Exception in thread "main" java.lang.IllegalStateException: XML File to index
does not exist: /current/working/path/./some/absolute/path/input.mzid
Apparently, the command-line converter attempts to resolve the argument file
path relative to the current working directory. Not only does this break the
use case of absolute paths, but it should not even be necessary in Java. It
should be sufficient to just literally pass the String argument value from the
command line to a File constructor, and then test for existence and readability
of the input file.
Original issue reported on code.google.com by [email protected]
on 28 Oct 2014 at 6:09
I have created a fork of the latest SVN repository version of jmztab in order
to make it more modular and to cut the dependencies down to the absolutely
necessary ones. This may be especially useful for developers who do not need
the peptide/proteomics parts of jmztab, but only the small molecules part,
as in my case.
I therefore separated the model from the utils packages, as well as providing
separate modules for the pride-converter, the cli, and the gui.
Building the cli or the gui distributions can now be triggered by using maven
profiles.
Would be great if someone else found this useful.
You can find the git repository here:
github.com/nilshoffmann/jmztab
Original issue reported on code.google.com by [email protected]
on 21 Feb 2014 at 4:48
This issue was raised in #58
What steps will reproduce the problem?
1. During validation, the order of columns does not seem to matter. For example,
when a spectra_ref column appears after the assay and study variable columns,
the parser does not report any error, but retrieving the data returns nothing.
The problem is probably related to the order of the mandatory columns.
Original issue reported on code.google.com by ypriverol
on 29 Sep 2014 at 2:04
Change:
protein_search_engine_score[1-n] SC SC
peptide_search_engine_score[1-n] SC SC
psm_search_engine_score[1-n] SC SC
smallmolecule_search_engine_score[1-n] SC SC
to:
protein_search_engine_score[1-n] SC (if protein section present) SC (if protein section present)
peptide_search_engine_score[1-n] SC (if peptide section present) SC (if peptide section present)
psm_search_engine_score[1-n] SC (if psm section present) SC (if psm section present)
smallmolecule_search_engine_score[1-n] SC (if smallmolecule section present) SC (if smallmolecule section present)
in the table and in the detailed specification.
Original issue reported on code.google.com by [email protected]
on 15 Jul 2014 at 12:20
It would be interesting to generate optional columns with null values by default.
Original issue reported on code.google.com by ypriverol
on 17 Oct 2014 at 3:06
Opening an issue here for discussion of Complete vs Summary and Quant vs ID as we want to encode it for mzTab 1.1
In brief: in metabolomics, there is no need for an ID-only workflow.
Complete vs Summary is much simpler in the 1.1-metabolomics draft, 3 tables (SML, SMF, SME) vs 1 table (SML).
Reading back the 1.0 specs, the proteomics Complete vs Summary and Quant vs ID split looks over-complicated to me. Also, if we want to release the metabolomics update, we may need to make breaking changes to 1.0.
As such, it feels like we need to decide whether we can/should revisit the proteomics part now, or just make 1.1 a metabolomics-only branch of the standard.
I've added a PowerPoint with some items for discussion here: https://github.com/HUPO-PSI/mzTab/blob/master/specification_document/1_1_draft_specs/Version11_design_considerations.pptx
Please take a look. Would be good if @javizca could take a look, since I know you're away for the workshop coming up
Currently, fraction support is rather limited.
One possibility for defining the fraction number would be to add additional metadata.
Following up on discussions at the PSI meeting.
The specification document seems to be wrong or unclear in this regard.
Having one ms_run linking to several files is problematic because linking to source spectra is not easily possible.
Having one assay linking to multiple ms_runs might help but is currently not supported in the standard.
Hi, in mzML and other formats, we have a mappingfile.xml, which specifies which CV terms are allowed in place of some cvParam.
We should be able to specify that e.g. a
https://github.com/HUPO-PSI/mzTab/blob/master/specification_document/1_1_draft_specs/mzTab_format_specification_1_1-M_draft.adoc#6217-quantification_method
must be a child of
https://www.ebi.ac.uk/ols/ontologies/ms/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FMS_1001833
This will require a new field in the spec doc, and the validator should check it.
Yours, Steffen
What steps will reproduce the problem?
1. Unknown modifications in MS experiments exported from mzIdentML should be
converted using the CHEMOD notation. The converter should be updated.
Original issue reported on code.google.com by ypriverol
on 29 Sep 2014 at 2:14
Mandatory reporting of study_variable columns prior to statistical downstream processing might not fit data processing workflows well.
Relaxing the requirement so that summary files may report assays only might yield cleaner files. Otherwise, assay information needs to be represented in study_variable columns that might not correspond to actual study variables (call it a hack).
While drafting metabolomics1.1, we added to the specs whether a given value was nullable. This seems really useful to implementers.
The 1.0 specs say the following
In general, “null” values SHOULD not be given within any column of a “Complete” file if the information is available.
But this is never really used in practice, since there are lots of cases where information is unknown to export software or lazy export writers don't want to locate something difficult (perfectly reasonably).
I vote for adding isNullable to each data type in the specs (and not separating by Complete or Summary - but that's another discussion).
Hi, I think we discussed this some time ago, but I couldn't find it, and it doesn't harm to document it here. I need advice on how to represent XCMS results in mzTab-1.1.
So, the SML has the summary of all grouped features after running group(xset) (i.e. a wide matrix). But where to put the peak picking results, which you get from peaks(xset) (i.e. the tall matrix)?
IIRC we suggested that for file/sample no. 1 we'd have SMF with abundance_assay[1]=value and abundance_assay[2..N]=NULL.
The issue I have with that approach is that we're encoding the fact that this is the tall matrix in the pattern of values. I'd prefer to either mention that in the MTD, or have just one column abundance_assay and another column assay_name referencing which assay the abundance belongs to.
Thoughts? Yours, Steffen
> head(peaks(xset)[])
mz mzmin mzmax rt rtmin rtmax into intf maxo maxf sample
[1,] 200.1000 200.1 200.1 2928.610 2912.961 2942.695 147887.53 290506.9 9687 15899.054 1
[2,] 201.0638 201.0 201.1 2531.112 2515.463 2549.892 204572.42 280386.0 7726 13300.725 1
[3,] 205.0000 205.0 205.0 2784.635 2770.550 2800.284 1778568.94 3610059.7 84280 195026.431 1
[4,] 205.9819 205.9 206.0 2786.200 2772.115 2800.284 237993.62 448580.0 10681 23860.099 1
[5,] 207.0821 207.0 207.1 2712.647 2698.562 2726.731 380873.05 730980.9 18800 40065.736 1
[6,] 208.0671 208.0 208.1 2640.659 2625.009 2656.308 96070.72 150033.4 4112 7560.078 1
> tail(peaks(xset)[])
mz mzmin mzmax rt rtmin rtmax into intf maxo maxf sample
[4771,] 596.3574 596.3 596.4 3825.328 3811.244 3839.413 511236.1 1106531.3 25928 59878.60 12
[4772,] 596.3193 596.3 596.4 3615.625 3601.540 3628.144 249717.7 507054.4 14174 28983.33 12
[4773,] 597.3714 597.3 597.4 3825.328 3809.679 3840.978 206925.5 388002.7 9424 19741.25 12
[4774,] 597.3132 597.3 597.4 2803.414 2789.329 2817.498 122272.2 288468.1 7136 16469.23 12
[4775,] 599.2920 599.2 599.3 3662.573 3651.618 3676.658 236041.1 377861.0 12721 22822.38 12
[4776,] 599.3033 599.3 599.4 3615.625 3601.540 3628.144 341495.2 604176.8 17448 35206.17 12
> head(peakTable(xset)[,c(2,3,5,6,10:21)])
mzmin mzmax rtmin rtmax ko15 ko16 ko18 ko19 ko21 ko22 wt15 wt16 wt18 wt19 wt21 wt22
1 200.1000 200.1000 2876.967 2931.740 147887.5 451600.7 65290.38 NA 91635.45 162012.4 175177.1 82619.48 NA 69198.22 153273.5 98144.28
2 205.0000 205.0000 2784.635 2795.591 1778568.9 1567038.1 1482796.38 1039129.8 1223132.35 1072037.7 1950287.5 1466780.60 1572679.16 1275312.76 1356014.3 1231442.16
3 205.9786 206.0023 2784.635 2795.591 237993.6 269714.0 201393.42 150107.3 176989.65 156797.0 276541.8 222366.15 211717.71 186850.88 188285.9 172348.76
4 207.0440 207.1000 2712.647 2726.731 380873.0 460629.7 351750.14 219288.0 286848.56 235022.6 417169.6 324892.46 277990.70 220972.35 252874.0 236728.16
5 219.0488 219.1000 2518.592 2529.547 235544.9 173623.4 NA NA 185792.43 174458.8 244584.5 161184.05 72029.38 NA 238194.4 173829.95
6 231.0000 231.0812 2509.202 2535.807 NA NA 222609.07 286232.1 435094.49 NA NA NA NA 240261.21 201316.2 179437.72
> tail(peakTable(xset)[,c(2,3,5,6,10:21)])
mzmin mzmax rtmin rtmax ko15 ko16 ko18 ko19 ko21 ko22 wt15 wt16 wt18 wt19 wt21 wt22
402 595.3000 595.3411 3603.105 3664.138 234262.30 339975.4 NA NA 276909.61 NA NA NA NA NA 256556.99 1041407.3
403 595.2000 595.2633 2994.338 3006.858 178436.86 257167.1 NA NA NA NA 857615.4 NA 195758.33 NA NA 493703.1
404 595.8645 596.0487 3707.957 3736.128 NA 381904.3 65769.92 61369.30 NA NA NA 94422.29 47342.49 NA 52713.48 NA
405 596.3000 596.3645 3797.160 3831.590 195519.46 4482351.4 273204.87 161920.11 171784.27 137865.6 NA 620204.70 293010.81 142623.14 473434.14 511236.1
406 597.3073 597.3714 3798.724 3831.590 67248.65 1671423.3 104149.97 66215.79 67707.45 NA NA 230637.90 124673.31 59829.39 180230.29 206925.5
407 598.3695 598.4778 3700.133 3726.738 NA 592732.1 51683.75 43868.52 NA NA NA 93591.09 NA NA 55230.02 NA
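The two encodings discussed above can be compared with a small Python sketch. It converts rows with one abundance column per assay (nulls where no peak was picked) into the alternative layout with a single abundance column plus an assay_name column. The dictionary keys and the function name are illustrative assumptions, not names defined in the mzTab draft; the values are taken from the first and sixth peakTable rows above.

```python
ASSAYS = ["ko15", "ko16", "ko18", "ko19"]

# One row per feature, one value per assay; None stands for NA/null.
WIDE_ROWS = [
    (1, [147887.5, 451600.7, 65290.38, None]),
    (6, [None, None, 222609.07, 286232.1]),
]


def wide_to_tall(rows, assay_names):
    """Emit one record per (feature, assay) pair that has a value."""
    tall = []
    for feature_id, abundances in rows:
        for assay_name, abundance in zip(assay_names, abundances):
            if abundance is None:
                continue  # no peak picked for this assay
            tall.append(
                {"feature": feature_id, "assay_name": assay_name, "abundance": abundance}
            )
    return tall


for record in wide_to_tall(WIDE_ROWS, ASSAYS):
    print(record)
```

In the tall layout the assay is stated explicitly in every row, so a reader does not have to infer it from which column happens to be non-null.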
What steps will reproduce the problem?
1. It would be interesting to define proper CV terms in PSI-MS for the
different reagents to be used by search engines. This Param should be used in
the metadata:
MTD assay[1]-quantification_reagent
We can use the PRIDE Terms like http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=PRIDE&termId=PRIDE:0000433&termName=Reagents%20used%20in%20Labeled%20Methods
but we should move some of them to PSI-MS.
Original issue reported on code.google.com by ypriverol
on 16 Sep 2014 at 2:51
Currently, the multiplicity for [UNIT_ID]-uri is set to 0..*
Does it make sense to have multiple URIs for a single unit? Personally, I think
it should be changed to 0..1.
Original issue reported on code.google.com by [email protected]
on 27 Jun 2011 at 3:37
We should have a way to represent Protein Ambiguity Groups in mzTab. My
suggestion is to add an optional column with the CV term MS:1001591, which is
the anchor protein. This way, we will know which protein is the anchor of the
group and to which group each protein belongs.
Best regards
Original issue reported on code.google.com by ypriverol
on 17 Oct 2014 at 3:13
There is some concern whether GO should be given an extra column, as it is just
one of several systems to classify proteins. For example, some people might
want to add Reactome pathway accessions instead.
Original issue reported on code.google.com by [email protected]
on 4 Jul 2011 at 2:27
We implicitly refer to the most commonly used CV in the spec. doc.
With:
MTD {UNIT_ID}-used_obo
Description: A Connection between the used CV token and the URI
Type: String
Multiplicity: 0 .. *
Example:
MTD PRIDE_1234-used_obo MS:http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo
MTD PRIDE_1234-used_obo PRIDE:http://ebi-pride.googlecode.com/svn/trunk/pride-core/schema/pride_cv.obo
Original issue reported on code.google.com by [email protected]
on 11 Feb 2013 at 5:05
One assay is reported for all fractions. This does not allow modelling a fractionated design with channel swaps between fractions (though I don't know how relevant this is).
Example: consider Assay 1 (A1). It is bound to one channel (e.g., 114 of an iTRAQ experiment)
A1 iTRAQ reagent 114
At the same time it is bound to 3 ms_runs (one for each fraction)
A1 F1,F2,F3
From a recent email exchange with Jürgen:
I have a question about the summary section. I am referring to the example MTBLS263.mztab. There, in the
first line of the summary section, Creatinine is found with the SML_ID 469. As far as I can see, this molecule
has been found by [M+H]+ and [M+Na]+ adducts, and has a theoretical mass of 113.0589. However,
there is a column “exp_mass_to_charge”, which I assume is the experimental mass to charge ratio. How is this value calculated when there is more than one adduct? Is it the mean of the neutral masses
of both adducts? Or is it a weighted mean according to the abundance of the found hits?
Quoting the draft 1.1 standard:
The experimental mass to charge of the small molecule’s primary adduct form (e.g. mean m/z across assays), assumed by default to be the protonated (positive mode) or de-protonated (negative mode), otherwise the first reported adduct under the adduct ions column. For GC-MS approaches, this MAY be the m/z of the ion used for quantification.
We could consider to also allow multiple masses in this field, each separated by '|', following the same order as in adduct_ions.
For the example MTBLS263.mztab, there seems to be an inconsistency regarding exp_mass_to_charge:
According to the definition you sent me, it looks like the protonated form
has to be reported in that column in the MTBLS263 example, which would be around 114 m/z.
However, a value of 113.0582 is reported, which can only be the neutral mass.
I would argue that exp_mass_to_charge should only report the actually measured mass (adducts, derivatized ...).
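The "around 114 m/z" figure can be checked with a small Python sketch. It assumes singly charged adducts and uses the theoretical neutral mass of 113.0589 quoted above; the two adduct mass constants (proton, and sodium minus one electron) are standard monoisotopic values rounded to six decimals, and the function name is hypothetical.

```python
PROTON = 1.007276       # mass added by [M+H]+, in Da
SODIUM_ION = 22.989218  # mass added by [M+Na]+ (Na minus one electron), in Da


def adduct_mz(neutral_mass, adduct_mass, charge=1):
    """m/z of a single-molecule adduct ion with the given absolute charge."""
    return (neutral_mass + adduct_mass) / charge


creatinine = 113.0589  # theoretical neutral mass quoted in the discussion

print(round(adduct_mz(creatinine, PROTON), 4))      # 114.0662
print(round(adduct_mz(creatinine, SODIUM_ION), 4))  # 136.0481
```

Neither adduct gives a value near 113.06, which supports the reading that the reported 113.0582 is a neutral mass rather than a measured m/z.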
Purpose of code changes on this branch:
Modularization of the monolithic maven project. The following modules were
added:
jmztab-model
jmztab-util
jmztab-cli
jmztab-gui
jmztab-converter-pride
The artifact Ids have been changed for now for disambiguation reasons, e.g. for
the jmztab-model module, the artifact id is "jmztab-modular-model", where
"jmztab-modular" is the artifact id of the parent pom.xml of the multi-module
project.
The artifact Ids are preliminary proposals.
When reviewing my code changes, please focus on:
I checked that all test cases still run, however, please stay alert for
possible issues. The main benefit of this modularization is that users do not
have to include all dependencies since all modules have clear and minimal
dependencies. This allows selective inclusion of the cli, gui, and
converter-pride modules, for those who need them. The model module now only
contains domain-dependent code. The util module contains the former
...jmztab.utils package code, except for the pride converter related code,
which now resides in its own module.
After the review, I'll merge this branch into:
jmztab/trunk
Original issue reported on code.google.com by [email protected]
on 28 Mar 2014 at 2:16
Hi, as part of the mzTab-M discussions we are using
https://www.ebi.ac.uk/metabolights/MTBLS263 as an example.
There are some aspects unclear to me:
Source only has one value -> Just one person? Everything pooled in one vial?
Preparation replicate s1: three injections were performed.
From my understanding of the ISA-Tab approach, there should be only four rows in the sample table, with four distinct sample names, and then six rows in the a_assay.txt, where three rows would have the same sample name but three different MS Assay Names.
Yours, Steffen
Hi,
I put together a RegEx for the way we could encode adducts, and several possibilities arise:
https://regex101.com/r/9gcJZG/2
if we want to encode: [kM+nAdduct]charge(+|-)
e.g.:
[4M+2NH4]4+
[M-H]1-
we could use:
\[\d*M(\+|\-)\d*(([A-Z]\d*)+)\]\d+(\+|\-)
if we also want to allow:
[M-H]-
we could use:
\[\d*M(\+|\-)\d*(([A-Z]\d*)+)\]((\+|\-)|\d+(\+|\-))
Open question: How to encode multiple adducts on the same molecule: [M+Na+CH3OH]+ ?
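The two patterns above can be exercised in Python as a quick check. The sketch also includes a third pattern, EXTENDED, which is purely a hypothetical suggestion (not from the discussion): it allows two-letter element symbols such as Na and repeated +/- groups, as one possible answer to the open question.

```python
import re

# Patterns copied from the discussion above.
STRICT = re.compile(r"\[\d*M(\+|\-)\d*(([A-Z]\d*)+)\]\d+(\+|\-)")
RELAXED = re.compile(r"\[\d*M(\+|\-)\d*(([A-Z]\d*)+)\]((\+|\-)|\d+(\+|\-))")

# Hypothetical extension: optional lowercase letter per element symbol,
# one or more signed groups inside the brackets, optional charge count.
EXTENDED = re.compile(r"\[\d*M([+-]\d*([A-Z][a-z]?\d*)+)+\]\d*[+-]")

EXAMPLES = ["[4M+2NH4]4+", "[M-H]1-", "[M-H]-", "[M+Na]+", "[M+Na+CH3OH]+"]


def accepts(pattern, text):
    """True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


for text in EXAMPLES:
    print(text, accepts(STRICT, text), accepts(RELAXED, text), accepts(EXTENDED, text))
```

Note that both original patterns reject [M+Na]+, because [A-Z] only matches single uppercase letters, so element symbols with a lowercase second letter need to be handled in any final pattern.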
The spec could be clearer (e.g., by adding more examples)
The spec is also a bit weak on the SME section, which seems to contain quite a few copy-and-paste artefacts from the SMF section. I think this needs to be discussed and reworked a bit.
In the protein section search_engine_score[1-n]_ms_run[1-n] is the protein score of a protein for an individual ms run.
This does not make sense if the inference (and probability score) is calculated based on IDs from several runs.
Given that this also leads to a ton of columns I would vote for removing it or to make it optional in subsequent versions of the format.
It doesn't make sense to offload this to the sample annotation. It would be correct to annotate this at the ms_run level.
MTD id_confidence_measure could be renamed.
For one, id is used several times in mzTab in different contexts, thus it would be better to not abbreviate it.
Why not use identification_score?
Higher confidence = better, but scores may also have the opposite direction, e.g. lower p_value = better.
We do not specify the direction of the score. Determining this manually may be very difficult for simple parsers (e.g., it may require additional fields in an OBO lookup, and if we are not even restricted to OBO it would be more difficult still). Note: in OpenMS we have had good experience with storing the score direction (higher is better / worse) along with the score type.
To which section is this score applied? This should be clear from the MTD entry.
Example needs update to metabolomics use case
The example file "PSM_SQ.mzTab" contains the PSM header "search_engine_score",
which is stored as a string. But section 6.5.8 of the specification version 1.0
indicates that it should be in the format "search_engine_score[1-n]" with type
double. This example file, and others, do not follow the specification. I
am assuming the specs are correct and these example files are
invalid/out of date.
Original issue reported on code.google.com by [email protected]
on 14 Jul 2014 at 4:58
In the small molecule section:
best_search_engine_score[1-n] and search_engine_score[1-n]_ms_run[1-n] still
have "Parameter List" as type.
This should be a double like in the other sections.
Original issue reported on code.google.com by [email protected]
on 18 Jan 2015 at 5:07
In mzML instruments can have multiple sources, analyzers and detectors.
Currently we've only defined one cvParam for these attributes.
I'd suggest to change that to any number of "|" delimited params.
Original issue reported on code.google.com by [email protected]
on 27 Jun 2011 at 10:46
In the current specification it's stated (page 26): "The protein's accession
the peptide is associated with. In case no protein section is present in the
file or the peptide was not assigned to a protein the field should be filled
with “NA”."
It's not clear from this description how peptides shared by several proteins
should be treated. Should the field be NA (but then the "unique" column doesn't
make sense, since it's true iff the accession is not NA), or should it be a
comma-separated list of the protein accession codes (in which case the "unique"
column also looks redundant; maybe it could be replaced by a column specifying
the number of proteins the peptide could be assigned to, "num_proteins_shared")?
Original issue reported on code.google.com by [email protected]
on 30 Nov 2012 at 3:04
Please post any issues you spot with example files here
updated
In the spec document, we use both protein-quantification-unit and protein-quantification_unit.
For consistency, it should be either protein_quantification_unit or protein-quantification_unit.
Both metagenomics, but also if you align files from samples from different species
Need some examples
What steps will reproduce the problem?
1. ms_run[1]-location in the specification allows null values, but the jmzTab
library fails the testing process.
2. Unknown modifications should be implemented in jmzTab using the CHEMOD
notation for converters.
Original issue reported on code.google.com by ypriverol
on 15 Sep 2014 at 3:48
Setting the PSM_ID seems not to work in 2.1.5.
When reading in a PSM line with getRecord, all PSM_IDs are null.
Also, setting the PSM_ID on a PSM created with "new PSM(metadata)" via
setPSM_ID does not change it.
Calling getPSM_ID() returns null every time.
Original issue reported on code.google.com by [email protected]
on 2 Apr 2014 at 3:34
There seems to be no example containing a PEP
section - would it be possible to add one?
In the specification document (1.0 rc 5, dated 20 June 2014) there are
inconsistencies in how modifications are listed.
In the text in section 6.2 (Metadata Section) the last bullet point references
"fixed_modification[1-n]" and "variable_modification[1-n]" but sections 6.2.24
and 6.2.27 abbreviate modification to mod.
I noticed this inconsistency as well in section 5.8.
Original issue reported on code.google.com by [email protected]
on 2 Jul 2014 at 4:25
TODO: Insert some text here about standard numerical encoding, e.g. US default style “x.x”, i.e. using a period as the decimal separator and no commas to separate thousands.
@jmrein to look into it
mzTab-M uses the abbreviations id and ID in many different contexts.
I first suggested renaming:
ms_run[1-n]-id_format to ms_run[1-n]-nativeID_format or just ms_run[1-n]-nativeID
which would break compatibility with mzTab 1.0.
Maybe we can still check, for the newly introduced fields, whether replacing the abbreviation makes them less ambiguous.
What steps will reproduce the problem?
1. Param param = new UserParam("Some parameter", "\"[...]\"");
2. System.out.println(param.getValue());
What is the expected output? What do you see instead?
Expected = [...]
Actual = ...
Please use labels and text to provide additional information.
CV parameters are normally encoded in string format using the standard square
bracket-enclosed, comma-delimited tuple format:
[<label>,<accession>,<name>,<value>]
Because this format makes use of square brackets ("[]") and commas (","), these
are generally reserved characters that should not appear in the element values.
However, some CV param names are known to contain commas, e.g.:
MOD:00648 - N,O-diacetylated L-serine
Therefore, it is well-documented that a Param's "name" argument, when
containing illegal characters, should be enclosed in quotation marks ("") to
inform the parsing engine that it should not treat those characters as
delimiters in the overall parameter tuple string.
However, the Param's "value" argument is not treated in this same manner, even
though it should be, since the semantics of this element can be arbitrary and
user-defined. Currently, even when the string "value" argument is explicitly
enclosed in quotation marks, these special characters are always just stripped
out of the stored string value. This should not happen when the argument value
is enclosed in quotation marks.
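The requested behaviour can be illustrated with a Python sketch. It is not the jmzTab implementation: it only shows one way to parse the bracketed tuple so that quoted fields, including the value, keep their commas and square brackets. The function name is hypothetical, and the parsing relies on Python's standard csv module.

```python
import csv


def parse_param(text):
    """Split '[label, accession, name, value]' into a 4-tuple of strings.

    Fields wrapped in double quotes may contain commas and square
    brackets; the quotes themselves are removed.
    """
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a bracketed parameter: {text!r}")
    inner = text[1:-1]
    # csv handles quoted fields; skipinitialspace ignores blanks after commas.
    fields = next(csv.reader([inner], skipinitialspace=True))
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {text!r}")
    return tuple(field.strip() for field in fields)


print(parse_param('[MOD, MOD:00648, "N,O-diacetylated L-serine", ]'))
print(parse_param('[, , Some parameter, "[...]"]'))
```

Treating the name and the value identically keeps the rule simple: any field may be quoted, and quoting always protects the reserved characters.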
Original issue reported on code.google.com by [email protected]
on 29 Sep 2014 at 11:13
Hi, in
https://github.com/HUPO-PSI/mzTab/blob/master/specification_document/1_1_draft_specs/mzTab_format_specification_1_1-M_draft.adoc#6225-study_variable_function1-n
the example has MTD small_molecule-quantification_unit [PRIDE, PRIDE:0000395, Ratio, ]
Figure 1 is currently a PNG graphics file with the figure caption being part of the graphics file.
@andrewrobertjones do you have the original source, is the schematic good as-is, or should we redo it?
Modifications in Small Molecules must have a different structure than used for
proteins / peptides.
Suggestion 1: support modifications without positional information.
Original issue reported on code.google.com by [email protected]
on 11 Nov 2011 at 11:45
If a peptide can be mapped to multiple proteins, the 1.0 specs recommend duplicating the rows, and just changing the accession. I have a strong preference to change this so that multiple accessions can be separated by semi-colons (or other second separator).
Otherwise this can cause problems for stats/visualisation or other software that wants to work with the quant data, since logic to work out duplicates would need to be encoded.
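The duplicate-handling logic that consumers currently need looks roughly like the following Python sketch. It assumes rows are dictionaries that are identical except for the accession column; the column names, the accessions and the function name are made up for illustration.

```python
ROWS = [
    {"sequence": "PEPTIDEK", "accession": "P12345", "peptide_abundance_assay[1]": 1500.0},
    {"sequence": "PEPTIDEK", "accession": "Q67890", "peptide_abundance_assay[1]": 1500.0},
    {"sequence": "SAMPLER", "accession": "P12345", "peptide_abundance_assay[1]": 320.5},
]


def merge_duplicate_peptides(rows, separator=";"):
    """Collapse rows that differ only in accession into a single row."""
    grouped = {}
    for row in rows:
        # Everything except the accession identifies the peptide row.
        key = tuple((name, value) for name, value in row.items() if name != "accession")
        grouped.setdefault(key, []).append(row["accession"])
    merged = []
    for key, accessions in grouped.items():
        combined = dict(key)
        combined["accession"] = separator.join(accessions)
        merged.append(combined)
    return merged


for row in merge_duplicate_peptides(ROWS):
    print(row)
```

If the format stored the accessions in one semicolon-separated field, this grouping step would be unnecessary and the quantitative values would appear exactly once per peptide.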