hupo-psi / proxi-schemas

ProXI: Schema definitions for the Proteomics eXpression Interface

JavaScript 100.00%

api proteomexchange proteomics proteomics-data schemas swagger


proxi-schemas's Issues

Served datasets can be PX/RPXD or internal accession datasets (MSVnn and PAenn)

In PROXI, when referring to served datasets at any resource, for any dataset with a PXD/RPXD, prefer to refer to it with the PXD/RPXD.

For any dataset without a PXD/RPXD, it is fine to use local identifiers such as MSVnnnn or PAennnnnn.

Validator for the endpoint of proxi

We need to have a validator for proxi endpoints. @ypriverol

In the PSM object the score should be plural

The searchEngineScore in psm should be plural because it is a list

searchEngineScore -> searchEngineScores

No attributes for spectra at PRIDE?

This URL returns a spectrum:
http://wwwdev.ebi.ac.uk/pride/proxi/archive/v0.1/spectra?resultType=full&usi=mzspec:PXD000966:CPTAC_CompRef_00_iTRAQ_12_5Feb12_Cougar_11-10-11.mzML:scan:11850:[UNIMOD:214]YYWGGLYSWDMSK[UNIMOD:214]/2

But there are no attributes.

They are not formally required, but it seems like some basic metadata should be available?

Spectra attributes. Verbose or compact?

According to the current schema, a Spectrum from /spectra has something like this:

{
  "attributes": [
    {
      "accession": "MS:1000744",
      "cv_param_group": null,
      "name": "selected ion m/z",
      "value": "473.1234"
    },
    {
      "accession": "MS:1000041",
      "cv_param_group": null,
      "name": "charge state",
      "value": "2"
    },
    ...
  ]
}

This is nice, but quite verbose. And what if the value is another CV term?
Over in PSI Spectra libraries format land:
http://proteomecentral.proteomexchange.org/cgi/spectra?usi=mzspec:PXL000001:05-29-2014:index:5001&output_format=json
I started using a more compact notation, e.g.:

{
"attributes": [
[
"MS:1000041|charge state",
"2"
],
[
"MS:1000744|selected ion m/z",
"847.417"
],
[
"MS:1009030|representative spectrum type",
"MS:1009032|consensus spectrum"
],
[
"MS:1009040|number of enzymatic termini",
2,
"1"
],
[
"MS:1001045|cleavage agent name",
"MS:1001251|Trypsin",
"1"
],

The first item in each sublist is the key (accession|name), the second item is the value, and the third optional item is the cv_param_group

More cryptic for sure. But a lot less verbose and a bit more graceful when the value is the cvParam.

What thinks we?
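To make the comparison concrete, here is a minimal sketch of converting between the two notations. The function names to_compact and to_verbose are made up for illustration, and the sketch assumes the compact form is always [key, value] with an optional third element for the cv_param_group, as described above.

```python
from typing import Any, Dict, List


def to_compact(attribute: Dict[str, Any]) -> List[Any]:
    """Convert a verbose attribute dict to the compact [key, value, group?] list."""
    key = f"{attribute['accession']}|{attribute['name']}"
    compact = [key, attribute.get("value")]
    group = attribute.get("cv_param_group")
    # The group is only appended when present, matching the optional third item.
    if group is not None:
        compact.append(group)
    return compact


def to_verbose(compact: List[Any]) -> Dict[str, Any]:
    """Convert a compact [key, value, group?] list back to a verbose dict."""
    # Split on the first "|" only, so a "|" inside the name would be preserved.
    accession, _, name = compact[0].partition("|")
    return {
        "accession": accession,
        "cv_param_group": compact[2] if len(compact) > 2 else None,
        "name": name,
        "value": compact[1],
    }
```

Both directions are lossless for the examples shown here, so a server could offer either notation from the same internal records.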

Spectra endpoint with no filter should be avoided

The spectra endpoint supports the following filters:
usi, accession, scan, file collection

However, we never specified that at least one of them must be supplied in a query. As a result, the following query is possible:

http://www.peptideatlas.org/api/proxi/v0.1/spectra?pageSize=100&resultType=compact&responseContentType=json

In practical terms we shouldn't allow this, because not many users will want to loop over the entire resource to get all the spectra.

Opinions?
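One way a server could enforce this is a small check before running the query. This is only a sketch: the filter names in SPECTRA_FILTERS are assumptions based on the list above (fileCollection in particular is a placeholder), and has_required_filter is a hypothetical helper, not part of any existing implementation.

```python
from typing import Mapping, Sequence

# Assumed filter parameter names; the real names in the schema may differ.
SPECTRA_FILTERS = ("usi", "accession", "scan", "fileCollection")


def has_required_filter(
    query: Mapping[str, str], filters: Sequence[str] = SPECTRA_FILTERS
) -> bool:
    """Return True if at least one filter parameter has a non-empty value."""
    return any(query.get(name) for name in filters)
```

A server could answer with a 400 whenever this returns False, so a query carrying only pageSize and resultType would be rejected.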

Develop a system to automatically generate the swagger docs

Validation of the swagger schema
Generation of the documentation in git pages.

Develop JSON Schema validator

As discussed during today's call, we need a validator for the JSON response that goes further than only checking whether there is a response or not.
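As a starting point for discussion, here is a rough sketch of what a content check could look like for a /psms response. It hand-codes a small subset of the Psm definition quoted elsewhere in these issues (peptideSequence required; usi and accession as strings); a real validator would presumably be driven by the swagger file itself. The name validate_psms is hypothetical.

```python
from typing import Any, List

# Hand-written subset of the Psm definition; an assumption for illustration.
PSM_REQUIRED = ["peptideSequence"]
PSM_TYPES = {"peptideSequence": str, "usi": str, "accession": str}


def validate_psms(payload: Any) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(payload, list):
        return ["response is not a JSON array"]
    problems = []
    for index, psm in enumerate(payload):
        if not isinstance(psm, dict):
            problems.append(f"item {index}: not an object")
            continue
        for field in PSM_REQUIRED:
            if field not in psm:
                problems.append(f"item {index}: missing required field '{field}'")
        for field, expected in PSM_TYPES.items():
            if field in psm and not isinstance(psm[field], expected):
                problems.append(f"item {index}: '{field}' is not a {expected.__name__}")
    return problems
```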

The data served will not only be ProteomeXchange data

The main goal here is for each resource to serve whatever data they have via the same exact API, but we should NOT just be limiting to ProteomeXchange data. Related to issue #7.

Responses are all arrays unless id?

In the current schema /datasets returns an array (except for the {identifier} form).
But /spectra and /psms and all the rest are not returning arrays. Shouldn't they all return arrays?

inconsistent count* in Peptidoform

In the definition of Peptidoform, countPSM is singular and countDatasets is plural. I suggest we make them consistent. Plural is probably best: change countPSM to countPSMs?

  countPSM: 
    type: integer
    description: Number of PSMs that support the current Peptidoform 
  countDatasets: 
    type: string 
    description: Number of datasets that support the current Peptidoform

accession input and output

/psms input has:

    - name: accession
      in: query 
      type: string 
      description: Dataset accession

But Psm output has:

  accession:
     type: string 
     description: Accession of the PSM

This is either confusing or an error.

I also suggest that "accession" is too vague. May I suggest datasetIdentifier?
I suppose we use "accession" everywhere else. But I think this is vague and confusing.
At bare minimum, we should not use "accession" for a PSM accession.

How should we resolve?

default pageNumber?

in /psms:

    - name: pageNumber
      in: query
      description: Current page to be shown paged psms (default page 1)
      required: false
      type: integer
      default: 0

Is the default 0 or 1? The schema says 0, but the description says 1.
This occurs in multiple other places in the schema.

Also, we should state clearly whether page numbering is 0-based or 1-based, i.e. whether the first page is page 0 or page 1.
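To show why the choice matters, here is a small sketch that turns pageNumber and pageSize into a row range under either convention. The first_page argument is an assumption introduced only to make the convention explicit; page_bounds is a hypothetical helper.

```python
from typing import Tuple


def page_bounds(page_number: int, page_size: int, first_page: int = 0) -> Tuple[int, int]:
    """Return the (start, end) row offsets for a page; end is exclusive.

    first_page is 0 for 0-based numbering and 1 for 1-based numbering.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    if page_number < first_page:
        raise ValueError("page_number is below the first page")
    start = (page_number - first_page) * page_size
    return start, start + page_size
```

With the same request, pageNumber=1 and pageSize=100, a 0-based server returns rows 100 to 199 while a 1-based server returns rows 0 to 99, so a client that guesses wrong silently skips or repeats a page.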

Swagger and OpenAPI define a method to query for multiple values

We should improve the specification of multi-value parameters in each endpoint, using the same notation as Swagger and OpenAPI:

https://swagger.io/docs/specification/2-0/describing-parameters/
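For reference, Swagger 2.0 describes array parameters with a collectionFormat of csv, ssv, tsv, pipes or multi. The sketch below shows how a server might read such a parameter from a query string; read_multi is a hypothetical helper, and the species parameter is only used as an example.

```python
from typing import List
from urllib.parse import parse_qs

# Separators for the Swagger 2.0 collectionFormat values other than "multi".
SEPARATORS = {"csv": ",", "ssv": " ", "tsv": "\t", "pipes": "|"}


def read_multi(query_string: str, name: str, collection_format: str = "csv") -> List[str]:
    """Return all values of a multi-value query parameter."""
    raw_values = parse_qs(query_string).get(name, [])
    if collection_format == "multi":
        # "multi" repeats the parameter: species=human&species=mouse
        return raw_values
    separator = SEPARATORS[collection_format]
    values: List[str] = []
    for raw in raw_values:
        values.extend(part for part in raw.split(separator) if part)
    return values
```

Under this notation, species=human,mouse would be csv and species=human&species=mouse would be multi.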

Framework error responses

Our schema nominally defines an error result as this:

  Error:
    required:
      - code
      - message
    properties:
      code:
        type: integer
        format: int32
      message:
        type: string

We can code that up. But what happens when our frameworks encounter an error like for a schema violation?

ProteomeCentral:
curl -i -X GET --header 'Accept: application/json' 'http://proteomecentral.proteomexchange.org/api/proxi/v0.1/datasets?pageSize=100&pageNumber=1&resultType=foo'

HTTP/1.1 400 BAD REQUEST
{
  "detail": "'foo' is not one of ['compact', 'full']\n\nFailed validating 'enum' in schema:\n    {'default': 'compact',\n     'description': 'Type of the object to be retrieve Compact or Full '\n                    'dataset',\n     'enum': ['compact', 'full'],\n     'in': 'query',\n     'name': 'resultType',\n     'type': 'string'}\n\nOn instance:\n    'foo'",
  "status": 400,
  "title": "Bad Request",
  "type": "about:blank"
}

After investigating the framework code, they are implementing this RFC:
https://tools.ietf.org/html/draft-ietf-appsawg-http-problem-00

PRIDE:
curl -i -X GET --header 'Accept: application/json' 'http://wwwdev.ebi.ac.uk/pride/proxi/archive/v0.1/datasets?pageSize=100&pageNumber=1&resultType=foo'

HTTP/1.1 400
{
"timestamp" : 1581580688155,
"status" : 400,
"error" : "Bad Request",
"message" : "Failed to convert value of type 'java.lang.String' to required type 'uk.ac.ebi.pride.ws.pride.utils.WsContastants$ResultType'; nested exception is org.springframework.core.convert.ConversionFailedException: Failed to convert from type [java.lang.String] to type [@org.springframework.web.bind.annotation.RequestParam uk.ac.ebi.pride.ws.pride.utils.WsContastants$ResultType] for value 'foo'; nested exception is java.lang.IllegalArgumentException: No enum constant uk.ac.ebi.pride.ws.pride.utils.WsContastants.ResultType.foo",
"path" : "/pride/proxi/archive/v0.1/datasets"
}

MassIVE:
curl -i -X GET --header 'Accept: application/json' 'ccms-internal.ucsd.edu/ProteoSAFe/proxi/v0.1/datasets?pageSize=100&pageNumber=1&resultType=foo'

HTTP/1.1 400 Bad Request

<title>Apache Tomcat/6.0.24 - Error report</title><style></style>

HTTP Status 400 - Unrecognized "resultType" parameter value [foo]

type Status report

message Unrecognized "resultType" parameter value [foo]

description The request sent by the client was syntactically incorrect (Unrecognized "resultType" parameter value [foo]).

Apache Tomcat/6.0.24

jPOST seems not to mind the schema violation:
curl -i -X GET --header 'Accept: application/json' 'https://repository.jpostdb.org/proxi/datasets?resultType=foo&accession=PXD005159'

HTTP/1.1 200 OK
[{"accession":[{"name":"jPOST dataset identifier","value":"JPST000200","accession":"MS:1002632","cvLabel":"MS"},{"name":"ProteomeXchange accession number","value":"PXD005159","accession":"MS:1001919","cvLabel":"MS"}],"title":"HeLa standard shotgun DDA analysis using a two-meter C18 monolithic silica column","publications":[{"name":"PubMed identifier","accession":"MS:1000879","value":"","cvLabel":"MS"},{"name":"Reference","accession":"MS:1002866","value":"","cvLabel":"MS"}],"contacts":[[{"name":"dataset submitter","accession":"MS:1002037","cvLabel":"MS"},{"name":"contact name","accession":"MS:1000586","value":"Saki Nambu","cvLabel":"MS"},{"name":"contact email","accession":"MS:1000589","value":"[email protected]","cvLabel":"MS"},{"name":"contact affiliation","accession":"MS:1000590","value":"Kyoto university","cvLabel":"MS"}],[{"name":"lab head","accession":"MS:1002332","cvLabel":"MS"},{"name":"contact name","accession":"MS:1000586","value":"N/A","cvLabel":"MS"},{"name":"contact affiliation","accession":"MS:1000590","value":"N/A","cvLabel":"MS"}]],"species":[[{"name":"taxonomy: scientific name","value":"Homo sapiens (Human)","accession":"MS:1001469","cvLabel":"MS"},{"name":"taxonomy: NCBI TaxID","value":"9606","accession":"MS:1001467","cvLabel":"MS"}]],"instruments":[[{"name":"Q Exactive","accession":"MS:1001911","cvLabel":"MS"}]]}]

How do we feel about these results?
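If we decide that every server should return the schema's Error object, one option is a thin adapter around whatever the framework produces. The sketch below maps the two JSON shapes shown above (the problem-details style from ProteomeCentral and the Spring style from PRIDE) onto code and message. The function name to_proxi_error and the fallback order of the fields are assumptions.

```python
from typing import Any, Dict


def to_proxi_error(body: Dict[str, Any], http_status: int) -> Dict[str, Any]:
    """Map a framework error body onto the schema's Error {code, message}."""
    # Both framework bodies above carry "status"; fall back to the HTTP status.
    code = body.get("status", http_status)
    message = (
        body.get("detail")      # problem-details style
        or body.get("message")  # Spring style
        or body.get("title")
        or body.get("error")
        or "Unknown error"
    )
    return {"code": int(code), "message": str(message)}
```

The HTML error page from Tomcat would still need separate handling, since there is no JSON body to adapt.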

Error status should contain only error code

Massive, PeptideAtlas, ProteomeCentral:

Hi all, the status code returned for this URL should not contain the words "Not Implemented". It should be only the numeric error code.

We think the message should go in the body instead: http://wwwdev.ebi.ac.uk/pride/proxi/archive/v0.1/proteins?resultType=compact

Let's discuss in the next meeting.

In Dataset, why summary?

In the schema definition of Dataset, it seems that we have an attribute "summary" that really is "Description" in PX XML? Can we preserve the name for clarity and call this attribute "description" instead?

Do we need Psm accession?

Psm is defined in the YAML as:

  Psm:
    required:
      - peptideSequence
    properties:
      accession:
        type: string
        description: Accession of the PSM
      usi:
        type: string
        description: The USI representation for the PSM
      ...

I like the usi. But what is the accession? Does anyone plan on filling in some other kind of accession for a PSM?

Related, the output does not have datasetIdentifier. All the other components needed to build a USI are part of the output. Except datasetIdentifier. Seems like we should have it. Maybe that's what accession was supposed to be?

Slack channel

@edeutsch @jjcarver Shin, Nuno and Juan: do we want to have an open Slack channel that enables us to talk about the project on a daily basis? It could also be open to other collaborators who want to interact with the group/project.

We have been using this strategy in other projects such as BioContainers, and people join to ask questions and propose features for the resource.

How should a spectrum status be used?

The current Spectrum class defines a required attribute:

      status:
        type: string 
        enum: [READABLE, PEAK UNAVAILABLE]
        description: Status of the Spectrum

Can we define these status entries?

What does READABLE mean? Does this mean that the spectrum exists and can be fetched and provided? I suppose this is fine, although a strange word, since the antonym is UNREADABLE. But what would UNREADABLE mean? And that isn't an option.

What does "PEAK UNAVAILABLE" mean exactly? Is that the first peak unavailable? or any one peak unavailable? All peaks unavailable? Some peaks unavailable? Or does it mean the spectrum is unavailable? How is this different from a 404?

How should this be used? At PeptideAtlas a spectrum is either available and provided or it is not available and just not in the returned list or is a 404. PeptideAtlas doesn't use "PEAK UNAVAILABLE" since I don't know what it should mean or how it should be used.

Should it be used if there is no such spectrum at the repository?
Should it be used if the spectrum is real and valid and should be available, but due to some technical glitch it cannot be fetched from the data store? So not 404. But closer to 500?

We should decide and document this.

What should /psms compact and full look like?

All endpoints have resultType=compact|full
What should compact for /psms be?
The YAML says:

  Psm:
    required:
      - peptideSequence
    properties:

But just a list of peptideSequences is useless. I thought just USIs would be a fine compact form, but peptideSequence is required. So here's a possibility:

http://www.peptideatlas.org/api/proxi/v0.1/psms?resultType=compact&accession=PXD005942
[
{
"peptideSequence": "LSSPATLNSR",
"usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:10:LSSPATLNSR/2"
},
{
"peptideSequence": "LSSPATLNSR",
"usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:13:LSSPATLNSR/2"
},
{
"peptideSequence": "LSSPATLNSR",
"usi": "mzspec:PXD005942:030219_ywt_sf-40:scan:15:LSSPATLNSR/2"
},
...

Do we like that?

Are startPosition and endPosition really needed?

  ProteinIdentification:
    required:
      - proteinAccession
      - startPosition
      - endPosition

Are startPosition and endPosition really required here?
We don't have them trivially available at the moment, so we are lying with -1 and -1.
We can get them and will, I guess.
But I'm questioning whether we really should have these required. Most proteomics data output doesn't normally capture this?

What should the compact form of /peptidoforms return?

I don't think we have specified what the compact form should return for /peptidoforms.

PeptideAtlas now implements
{
"countDatasets": 2,
"countPSM": 7,
"peptidoform": "[iTRAQ]-AAHEEIC[Carbamidomethyl]TTNEGVMYR"
}

Good? Changes? I suppose the only required field is peptidoform. Should the compact form contain just the required field peptidoform?

http://www.peptideatlas.org/api/proxi/v0.1/peptidoforms?resultType=compact&peptideSequence=AAHEEICTTNEGVMYR

Spectra endpoint should retrieve one spectrum at a time.

The spectra endpoint should serve only one spectrum at a time.

peptide vs peptidoform conflation

We are totally conflating the terms peptide and peptidoform. From the /peptides YAML doc:
http://www.peptideatlas.org/api/proxi/v0.1/ui/#/

"The peptide entry point returns global peptidoform statistics across an entire resource. Each peptide contains a summary of the statistics of the peptidoform across the entire resource."

We should aim to be clear and precise. If this endpoint is dealing in peptidoforms (and it does because there are ptms there), then I think we should call it:
/peptidoforms

Do we also want to have /peptides entry point that is scrubbed of all mass mods?
i.e. the /peptides entry point is agnostic to mass mods
the /peptidoforms endpoint requires full handling of mass mods

What do you think?
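To illustrate the distinction, here is a minimal sketch of how a mod-agnostic /peptides view could be derived from peptidoform strings like the ones used in these issues. It assumes modifications always appear in square brackets without nesting, and that terminal modifications are joined with a hyphen; strip_mass_mods is a hypothetical helper.

```python
import re

# Matches one bracketed modification such as [Carbamidomethyl] or [UNIMOD:214].
# Assumption: brackets are never nested.
_MOD_PATTERN = re.compile(r"\[[^\]]*\]")


def strip_mass_mods(peptidoform: str) -> str:
    """Return the bare peptide sequence for a peptidoform string."""
    bare = _MOD_PATTERN.sub("", peptidoform)
    # Terminal modifications leave a dangling hyphen, e.g. "[iTRAQ]-AAH...".
    return bare.strip("-")
```

Several peptidoforms then collapse onto the same peptide, which is exactly the difference between the two proposed endpoints.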

Error status for multiple providers

As we discussed last week, we will need to have a different definition of errors or status when querying all entry points. The broker will need to retrieve multiple statuses for multiple entry points. We have multiple options here:

I think @edeutsch mentioned this option: add a parameter called, for example, statuses, and then attach the errors as another object in the response, like (http://www.peptideatlas.org/api/proxi/v0.1/psms?resultType=compact&accession=PXD005942):

 [ 
     {
         "peptideSequence": "LSSPATLNSR",
         "usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:10:LSSPATLNSR/2"
     },
     {
        "peptideSequence": "APLVCLPVFVSR",
        "usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:120:APLVC[Carbamidomethyl]LPVFVSR/2"
     },
 ]
{
 errors: []
}

The second option is to encode the data into one part of the object and the errors in another Like:

{
   "data": [
     {
         "peptideSequence": "LSSPATLNSR",
         "usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:10:LSSPATLNSR/2"
     },
     {
        "peptideSequence": "APLVCLPVFVSR",
        "usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:120:APLVC[Carbamidomethyl]LPVFVSR/2"
     }
   ],
   "errors": []
}

The second approach defines a global object with two parts, data and errors.
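For the broker case, a sketch of how results from several providers could be merged into a single object with data and errors parts is shown below. The input shape (a dict keyed by provider name, each holding either "data" or "error") and the function name merge_provider_results are assumptions made up for this example.

```python
from typing import Any, Dict, List


def merge_provider_results(results: Dict[str, Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Merge per-provider results into one {"data": [...], "errors": [...]} object."""
    envelope: Dict[str, List[Any]] = {"data": [], "errors": []}
    for provider, result in results.items():
        if "error" in result:
            # Keep track of which provider failed alongside its code and message.
            envelope["errors"].append({"provider": provider, **result["error"]})
        else:
            envelope["data"].extend(result.get("data", []))
    return envelope
```

This lets a client receive the PSMs from the providers that answered while still seeing which providers failed and why.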

/datasets API endpoint JSON output format

Below is some sample JSON that we would tentatively output from the /datasets API endpoint. The dataset used in this example is live in both MassIVE and ProteomeCentral, and can be found at the following links:

Link	URL
MassIVE dataset	https://massive.ucsd.edu/ProteoSAFe/QueryMSV?id=MSV000081125
MassIVE FTP	ftp://massive.ucsd.edu/MSV000081125
ProteomeCentral dataset	http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=6629
ProteomeCentral dataset XML	http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=6629&outputMode=XML&test=no

This is a "full" record with all files listed out:

{
    "accession": "PXD006629",
    "title": "Mitochondrial H+-ATP synthase in human skeletal muscle: contribution to dyslipidemia and insulin resistance",
    "summary": "Mitochondrial H+-ATP synthase in human skeletal muscle: contribution to dyslipidemia and insulin resistance",
    "species": [
        {"accession": "MS:1001467", "name": "taxonomy: NCBI TaxID", "value": "9606", "cvLabel": "MS"}
    ],
    "instruments": [
        {"accession": "MS:1002416", "name": "Orbitrap Fusion", "cvLabel": "MS"}
    ],
    "modifications": [
        {"accession": "UNIMOD:737", "name": "TMT6plex", "cvLabel": "UNIMOD"},
        {"accession": "UNIMOD:35", "name": "Oxidation", "cvLabel": "UNIMOD"},
        {"accession": "UNIMOD:4", "name": "Carbamidomethyl", "cvLabel": "UNIMOD"}
    ],
    "contacts": [
        {"contactProperties":[
            {"accession": "MS:1002037", "name": "dataset submitter", "cvLabel": "MS"},
            {"accession": "MS:1000586", "name": "contact name", "value": "John Lapek", "cvLabel": "MS"},
            {"accession": "MS:1000589", "name": "contact email", "value": "[email protected]", "cvLabel": "MS"},
            {"accession": "MS:1000590", "name": "contact affiliation", "value": "UCSD", "cvLabel": "MS"}
        ]},
        {"contactProperties":[
            {"accession": "MS:1002332", "name": "lab head", "cvLabel": "MS"},
            {"accession": "MS:1000586", "name": "contact name", "value": "Laura Formentini", "cvLabel": "MS"},
            {"accession": "MS:1000589", "name": "contact email", "value": "[email protected]", "cvLabel": "MS"},
            {"accession": "MS:1000590", "name": "contact affiliation", "value": "UAM University Madrid", "cvLabel": "MS"}
        ]}
    ],
    "publications": [
        {"accession": "MS:1002853", "name": "Dataset with no associated published manuscript", "cvLabel": "MS"}
    ],
    "keywords": [
        {"accession": "MS:1001925", "name": "submitter keyword", "value": "mitochondria", "cvLabel": "MS"},
        {"accession": "MS:1001925", "name": "submitter keyword", "value": "insulin resistance", "cvLabel": "MS"},
        {"accession": "MS:1001925", "name": "submitter keyword", "value": "ATP synthase", "cvLabel": "MS"}
    ],
    "datasetLink": {"accession": "MS:1002488", "name": "MassIVE dataset URI", "value": "http://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=d6756ac742ed4f13811ddab2843e7d54", "cvLabel": "MS"},
    "dataFiles": [
        {"accession": "MS:1002846", "name": "Associated raw file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/raw/DG000895_Francisco_Normal_Mitos.raw", "cvLabel": "MS"},
        {"accession": "MS:1002850", "name": "Peak list file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/peak/DG000895_Francisco_Normal_Mitos.mzML", "cvLabel": "MS"},
        {"accession": "MS:1002845", "name": "Result file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/result/DG000895_Francisco_Normal_Mitos_PSMs.mzTab", "cvLabel": "MS"},
        {"accession": "MS:1002848", "name": "Result file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/ccms_result/DG000895_Francisco_Normal_Mitos_PSMs.mzTab", "cvLabel": "MS"},
        {"accession": "MS:1002851", "name": "Other type file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/other/DG000895_Francisco_Normal_Mitos.zip", "cvLabel": "MS"},
        {"accession": "MS:1002851", "name": "Other type file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/other/Francisco_Normal_Mitos.xlsx", "cvLabel": "MS"},
        {"accession": "MS:1002851", "name": "Other type file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/ccms_parameters/params.xml", "cvLabel": "MS"},
        {"accession": "MS:1002851", "name": "Other type file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/ccms_statistics/statistics.tsv", "cvLabel": "MS"}
    ],
    "links": [
        {"rel": "self", "href": "http://massive.ucsd.edu/ProteoSAFe/proxi/datasets/PXD006629"}
    ]
}

Please comment on any potential issues you see with this sample output format.

do we really stipulate a max pageSize?

Current yaml says:
    - name: pageSize
      in: query
      description: How many items to return at one time (default 100, max 100)
      required: false
      type: integer
      default: 100

I'm fine with a default 100 so that a naive query does not return a billion rows. But why should we stipulate a max of 100? If a client wants to pull all 10,000 PSMs from PXD123, why should they have to do it in chunks of 100? How irritating for them. And extra work for my machine too.

I propose we can keep default of 100, but let each implementing site choose what max or limits to impose. If PRIDE only wants to allow 100 at a time, fine. But I don't think we should prevent PeptideAtlas from returning 10,000 rows if the user asks for it? It's not enforceable anyway via the schema, so I propose we strike that.

Comments?
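A sketch of what this proposal could look like on the server side: the default of 100 is fixed by the spec, and any cap is a per-site setting. The site_max argument and the name effective_page_size are assumptions for illustration.

```python
from typing import Optional

DEFAULT_PAGE_SIZE = 100  # default stated in the current YAML


def effective_page_size(requested: Optional[int], site_max: Optional[int] = None) -> int:
    """Return the page size to use, applying a site-specific cap if one is set."""
    size = DEFAULT_PAGE_SIZE if requested is None else requested
    if size < 1:
        raise ValueError("pageSize must be at least 1")
    if site_max is not None:
        size = min(size, site_max)
    return size
```

A site that wants the old behavior would pass site_max=100, while a site with no cap would return all 10,000 rows in one request.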

/peptidoforms still labeled getPeptides

The operationId for /peptidoforms is still getPeptides, which leads to confusing autogenerated code and will cause a problem if we ever create a /peptides endpoint

  /peptidoforms:
    get:
      summary: Get a collection of peptidoforms
      operationId: getPeptides

Way of filtering datasets

It would be great if we can filter datasets by the following fields:

/datasets?pageSize=50&pageNumber=1&resultType=compact
/datasets?pageSize=50&pageNumber=1&resultType=full

Filters (filtering result by columns that are returned):
/datasets?species=human
/datasets?species=homo*
/datasets?species=*sapiens
/datasets?species=homo sapiens
/datasets?species=homo sapiens&pageSize=50&pageNumber=1&resultType=compact
/datasets?species=9606
/datasets?species=taxon:9606
/datasets?species=human;mouse # decide which of these to use or what is conventional
/datasets?species=[human,mouse]
accession=
instrument=
contact=
publication=
modification=
search=liver
search=P12345 # this might be honored by a service, but not mandatory. At some point in the future, we as a consortium might decide to add protein=

Searches: (selecting results based on terms that can apply to any part of the returned records)
/datasets?search=liver
/datasets?species=human&contact=Mann&search=liver

/datasets?species=human&pageSize=50&pageNumber=1&resultType=compact
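As a rough sketch of the wildcard style above (homo*, *sapiens), shell-style pattern matching may be enough. The example simplifies each dataset's species to a plain list of strings rather than CV terms, and it does no synonym mapping, so "human" would not match "Homo sapiens" without an extra lookup. The names matches_species and filter_datasets are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterable, List


def matches_species(dataset_species: Iterable[str], pattern: str) -> bool:
    """Case-insensitive wildcard match of a pattern against species names."""
    wanted = pattern.lower()
    return any(fnmatchcase(name.lower(), wanted) for name in dataset_species)


def filter_datasets(datasets: List[Dict[str, Any]], pattern: str) -> List[Dict[str, Any]]:
    """Keep only the datasets with at least one species matching the pattern."""
    return [d for d in datasets if matches_species(d.get("species", []), pattern)]
```

Note that fnmatch treats square brackets as a character class, so the species=[human,mouse] notation would need to be parsed separately before matching.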

Include in the specification the attributes that would be REQUIRED

We should have a list of REQUIRED fields and how we express them in spectra.

Move all the discussions from GoogleDoc to github.

species as query parameter?

Also in my implementation notes was the idea that "species" should be an input parameter to all of the endpoints. One can imagine wanting to constrain any of those queries to limit results to just one species.

What do you think?

Massive data files do not exist

@jjcarver I was testing the PROXI API today and I realized that the MassIVE endpoint is not returning the files associated with the dataset. Can you make an effort to return that information?

This is important because if we start implementing clients and tools associated with the API the users will expect as much information as possible.

In Dataset, datasetLink should be plural

All the other lists are plural except datasetLink. Should be datasetLinks

Comments on current form of Spectrum class

Regarding the current schema:
https://raw.githubusercontent.com/HUPO-PSI/proxi-schemas/master/specs/swagger.yaml

Here is a toy example of a Spectrum object as defined by the current schema:
http://www.peptideatlas.org/api/proxi/v0/spectra/238293
UI: http://www.peptideatlas.org/api/proxi/v0/ui/

Comments/questions:
- usi is great, but what is accession? Anything the repo wants? preferably in usi notation? (like an lsi {local spectrum identifier})
- charge: this already has a very specific meaning (assuming this means precursor_charge) Do we really need a CV term?
- mz: this already has a very specific meaning (assuming this means precursor_mz) Do we really need a CV term?
- If so, what is the proper term? MS:1000744 selected ion m/z is what mzML uses
- I suggest the very limited number of fixed slots don't need OntologyTerm since they are clearly defined
- What if there are multiple precursors in the selection window? rarely supported but a common occurrence..
- In addition to the standard set of attributes, how about a container for additional CV terms
  e.g. spectrumAttributes = [ OntologyTerm, OntologyTerm, ... ]
- In OntologyTerm, can we omit cvLabel? I don't think mzML et al. have it, but since our accessions are full CURIEs, the CURIE prefix is the cvLabel
  UNLESS we are defining a cvList in the document header somewhere, which we're not. So how about we remove cvLabel and always use the common CURIE prefixes.
  CURIE prefixes are not completely unambiguous (e.g. PMID: vs PUBMED:), but in our little MS world, effectively they are.
- What is the desired format of peakList? Array of peaks, yes, but what is each peak? 3-element array? dict of attributes?
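On the cvLabel point, the sketch below shows how a client could derive the label from the accession alone, assuming every accession is a CURIE of the form PREFIX:ID. The function name cv_label is hypothetical.

```python
def cv_label(accession: str) -> str:
    """Return the CURIE prefix of an accession, e.g. "MS" for "MS:1000744"."""
    prefix, separator, local_id = accession.partition(":")
    if not separator or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {accession!r}")
    return prefix
```

For the accessions used in these issues this returns MS and UNIMOD, which matches the cvLabel values currently being sent.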

What should "not implemented" look like?

What should it look like when a PROXI server does not implement a particular endpoint?

I suggest returning HTTP error code 450:

http://proteomecentral.proteomexchange.org/api/proxi/v0.1/proteins?resultType=compact

{
"detail": "Although this is an officially defined PROXI endpoint, it has not yet been implemented at this server",
"status": 450,
"title": "Endpoint not implemented",
"type": "about:blank"
}

What do you think?

PRIDE USI without interpretation not working?

This USI with an interpretation works:
http://wwwdev.ebi.ac.uk/pride/proxi/archive/v0.1/spectra?resultType=full&usi=mzspec:PXD000966:CPTAC_CompRef_00_iTRAQ_12_5Feb12_Cougar_11-10-11.mzML:scan:11850:[UNIMOD:214]YYWGGLYSWDMSK[UNIMOD:214]/2

but the same USI without an interpretation does not return a valid response:
http://wwwdev.ebi.ac.uk/pride/proxi/archive/v0.1/spectra?resultType=full&usi=mzspec:PXD000966:CPTAC_CompRef_00_iTRAQ_12_5Feb12_Cougar_11-10-11.mzML:scan:11850

Add 500 to the list?

Do we want to add 500 and 501 to the specification, e.g. here:

proxi-schemas/specs/swagger.yaml, line 105 in 154e8af:

    default:

500 means that the server has some internal fault that doesn't fall under 400 and 404.
501 means that the server does not implement this endpoint
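If 501 is adopted, a server could build the "not implemented" reply as sketched below, keeping only the numeric code in the status and putting the explanation in the body. The body layout follows the problem-details style shown in other issues here; the function name not_implemented_response and the wording of the detail text are assumptions.

```python
from http import HTTPStatus
from typing import Any, Dict, Tuple


def not_implemented_response(endpoint: str) -> Tuple[int, Dict[str, Any]]:
    """Return (status code, JSON body) for an endpoint this server lacks."""
    status = HTTPStatus.NOT_IMPLEMENTED  # 501
    body = {
        "detail": f"The PROXI endpoint {endpoint} has not been implemented at this server",
        "status": status.value,
        "title": status.phrase,
        "type": "about:blank",
    }
    return status.value, body
```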

msrun and filename

@edeutsch :

In the parameters of the PSM we use both msrun and filename. What is the difference between them?

PRIDE PROXI spectra returning mzs not in order

Fetching a spectrum from PRIDE via PROXI such as:
http://wwwdev.ebi.ac.uk/pride/proxi/archive/v0.1/spectra?resultType=full&usi=mzspec:PXD000966:CPTAC_CompRef_00_iTRAQ_12_5Feb12_Cougar_11-10-11.mzML:scan:11850:[UNIMOD:214]YYWGGLYSWDMSK[UNIMOD:214]/2

returns the mzs in random order. This is not against the current spec, which does not specify. But it is breaking the Lorikeet viewer at ProteomeCentral. Seems like many applications may assume mzs in order.

What should be the resolution?

Should we update the documentation to allow mzs in any order? Write some code to compensate for out of order mzs in ProteomeCentral and all other applications?
Should we update the documentation to require mzs in ascending order? And update the PRIDE implementation?

Does the random order of mzs at PRIDE match the same order of intensities? One risk of separate arrays is that they become unaligned.
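Whichever way we decide, a client or server can restore ascending m/z order without breaking the pairing by sorting the two arrays together. This is a minimal sketch that assumes the spectrum carries parallel mzs and intensities arrays of equal length; sort_peaks is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def sort_peaks(
    mzs: Sequence[float], intensities: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Sort peaks by ascending m/z while keeping each intensity with its m/z."""
    if len(mzs) != len(intensities):
        raise ValueError("mzs and intensities must have the same length")
    if not mzs:
        return [], []
    # Pair the arrays first so they cannot become unaligned during the sort.
    pairs = sorted(zip(mzs, intensities), key=lambda peak: peak[0])
    sorted_mzs, sorted_intensities = zip(*pairs)
    return list(sorted_mzs), list(sorted_intensities)
```

The length check also catches the misalignment risk mentioned above, at least in the case where one array has lost or gained entries.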