hupo-psi / proxi-schemas
ProXI: Schema definitions for the Proteomics eXpression Interface
In PROXI, when referring to served datasets at any resource, for any dataset with a PXD/RPXD, prefer to refer to it with the PXD/RPXD.
For any datasets without a PXD/RPXD, it is fine to use local identifiers such as MSVnnnn or PAennnnnn
We need to have a validator for proxi endpoints. @ypriverol
The searchEngineScore in psm should be plural because it is a list:
searchEngineScore -> searchEngineScores
This URL returns a spectrum:
http://wwwdev.ebi.ac.uk/pride/proxi/archive/v0.1/spectra?resultType=full&usi=mzspec:PXD000966:CPTAC_CompRef_00_iTRAQ_12_5Feb12_Cougar_11-10-11.mzML:scan:11850:[UNIMOD:214]YYWGGLYSWDMSK[UNIMOD:214]/2
But there are no attributes.
They are not formally required, but it seems like some basic metadata might be available?
According to the current schema, a Spectrum from /spectra has something like this:
{
  "attributes": [
    {
      "accession": "MS:1000744",
      "cv_param_group": null,
      "name": "selected ion m/z",
      "value": "473.1234"
    },
    {
      "accession": "MS:1000041",
      "cv_param_group": null,
      "name": "charge state",
      "value": "2"
    },
    ...
  ]
}
This is nice, but quite verbose. And what if the value is another CV term?
Over in PSI spectral library format land:
http://proteomecentral.proteomexchange.org/cgi/spectra?usi=mzspec:PXL000001:05-29-2014:index:5001&output_format=json
I started using a more compact notation, e.g.:
{
  "attributes": [
    ["MS:1000041|charge state", "2"],
    ["MS:1000744|selected ion m/z", "847.417"],
    ["MS:1009030|representative spectrum type", "MS:1009032|consensus spectrum"],
    ["MS:1009040|number of enzymatic termini", 2, "1"],
    ["MS:1001045|cleavage agent name", "MS:1001251|Trypsin", "1"],
    ...
  ]
}
The first item in each sublist is the key (accession|name), the second item is the value, and the optional third item is the cv_param_group.
More cryptic for sure. But a lot less verbose and a bit more graceful when the value is the cvParam.
What thinks we?
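To make the trade-off concrete, here is a minimal Python sketch of converting between the verbose and compact notations shown above. The function names `to_compact` and `to_verbose` are hypothetical and not part of any PROXI implementation; the sketch assumes the key is always `accession|name` and that a missing third item means no cv_param_group.

```python
from typing import Any, Dict, List


def to_compact(attribute: Dict[str, Any]) -> List[Any]:
    """Convert one verbose cvParam dict into the compact list form."""
    key = f"{attribute['accession']}|{attribute['name']}"
    compact = [key, attribute.get("value")]
    group = attribute.get("cv_param_group")
    if group is not None:
        # The group is only appended when present (optional third item).
        compact.append(group)
    return compact


def to_verbose(compact: List[Any]) -> Dict[str, Any]:
    """Convert one compact list back into the verbose cvParam dict."""
    accession, _, name = compact[0].partition("|")
    return {
        "accession": accession,
        "cv_param_group": compact[2] if len(compact) > 2 else None,
        "name": name,
        "value": compact[1],
    }
```

Note that a value which is itself a CV term stays a single `accession|name` string in the compact form, whereas the verbose form has no obvious place for the second accession.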
The spectra endpoint accepts the following filters: usi, accession, scan, file collection.
However, we never specified that at least one of them must be provided in a query, so the following query is possible:
http://www.peptideatlas.org/api/proxi/v0.1/spectra?pageSize=100&resultType=compact&responseContentType=json
In practical terms we shouldn't allow this, because not many users will want to loop over the entire resource to get all the spectra.
Opinions?
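One way a server could enforce this is sketched below in Python. The parameter names in `SPECTRA_FILTERS` are assumptions (in particular `fileCollection` is a guess at how "file collection" would be spelled in a query string), and `has_required_filter` is a hypothetical helper, not something in the current schema.

```python
from typing import Mapping

# Assumed query parameter names for the filters discussed above.
SPECTRA_FILTERS = ("usi", "accession", "scan", "fileCollection")


def has_required_filter(params: Mapping[str, str]) -> bool:
    """Return True if at least one non-empty filter parameter is present."""
    return any(params.get(name) for name in SPECTRA_FILTERS)
```

A server could answer a request that fails this check with a 400 rather than paging through every spectrum it holds.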
As discussed during today's call, we need a validator for the JSON response that goes further than only checking whether there is a response or not.
The main goal here is for each resource to serve whatever data they have via the same exact API, but we should NOT limit ourselves to ProteomeXchange data. Related to issue #7
In the current schema /datasets returns an array (except for the {identifier} form).
But /spectra and /psms and all the rest are not returning arrays. Shouldn't they all return arrays?
In the definition of Peptidoform, countPSM is singular and countDatasets is plural. I suggest we make them consistent. Plural is probably best. Change to countPSMs?
countPSM:
  type: integer
  description: Number of PSMs that support the current Peptidoform
countDatasets:
  type: string
  description: Number of datasets that support the current Peptidoform
/psms input has:
- name: accession
  in: query
  type: string
  description: Dataset accession
But Psm output has:
accession:
  type: string
  description: Accession of the PSM
This is either confusing or an error.
I also suggest that "accession" is too vague. May I suggest datasetIdentifier?
I suppose we use "accession" everywhere else. But I think this is vague and confusing.
At bare minimum, we should not use "accession" for a PSM accession.
How should we resolve?
In /psms:
- name: pageNumber
  in: query
  description: Current page to be shown paged psms (default page 1)
  required: false
  type: integer
  default: 0
Is the default 0 or 1? The schema says 0, but the description says 1.
This occurs in multiple other places in the schema.
Also, we should state clearly whether the first page is numbered 0 (0-based) or 1 (1-based).
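The practical difference between the two conventions can be shown with a small Python sketch. `page_bounds` and its `first_page` argument are hypothetical names used only for illustration; the schema does not define them.

```python
from typing import Tuple


def page_bounds(page_number: int, page_size: int, first_page: int = 1) -> Tuple[int, int]:
    """Return the (start, end) row offsets for a page.

    first_page is 0 for a 0-based API and 1 for a 1-based API.
    """
    if page_number < first_page:
        raise ValueError(f"pageNumber must be >= {first_page}")
    start = (page_number - first_page) * page_size
    return start, start + page_size
```

With `pageNumber=1` and `pageSize=100`, a 1-based server returns rows 0 to 99 while a 0-based server returns rows 100 to 199, so a client that guesses wrong silently skips or repeats a page.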
We should improve the specification of multi-value parameters in each entry point, using the same notation as Swagger and OpenAPI:
https://swagger.io/docs/specification/2-0/describing-parameters/
Our schema nominally defines an error result as this:
Error:
  required:
    - code
    - message
  properties:
    code:
      type: integer
      format: int32
    message:
      type: string
We can code that up. But what happens when our frameworks encounter an error such as a schema violation?
ProteomeCentral:
curl -i -X GET --header 'Accept: application/json' 'http://proteomecentral.proteomexchange.org/api/proxi/v0.1/datasets?pageSize=100&pageNumber=1&resultType=foo'
HTTP/1.1 400 BAD REQUEST
{
"detail": "'foo' is not one of ['compact', 'full']\n\nFailed validating 'enum' in schema:\n {'default': 'compact',\n 'description': 'Type of the object to be retrieve Compact or Full '\n 'dataset',\n 'enum': ['compact', 'full'],\n 'in': 'query',\n 'name': 'resultType',\n 'type': 'string'}\n\nOn instance:\n 'foo'",
"status": 400,
"title": "Bad Request",
"type": "about:blank"
}
Judging from the framework code, it implements this RFC draft:
https://tools.ietf.org/html/draft-ietf-appsawg-http-problem-00
PRIDE:
curl -i -X GET --header 'Accept: application/json' 'http://wwwdev.ebi.ac.uk/pride/proxi/archive/v0.1/datasets?pageSize=100&pageNumber=1&resultType=foo'
HTTP/1.1 400
{
"timestamp" : 1581580688155,
"status" : 400,
"error" : "Bad Request",
"message" : "Failed to convert value of type 'java.lang.String' to required type 'uk.ac.ebi.pride.ws.pride.utils.WsContastants$ResultType'; nested exception is org.springframework.core.convert.ConversionFailedException: Failed to convert from type [java.lang.String] to type [@org.springframework.web.bind.annotation.RequestParam uk.ac.ebi.pride.ws.pride.utils.WsContastants$ResultType] for value 'foo'; nested exception is java.lang.IllegalArgumentException: No enum constant uk.ac.ebi.pride.ws.pride.utils.WsContastants.ResultType.foo",
"path" : "/pride/proxi/archive/v0.1/datasets"
}
MassIVE:
curl -i -X GET --header 'Accept: application/json' 'ccms-internal.ucsd.edu/ProteoSAFe/proxi/v0.1/datasets?pageSize=100&pageNumber=1&resultType=foo'
HTTP/1.1 400 Bad Request
<title>Apache Tomcat/6.0.24 - Error report</title><style></style>type Status report
message Unrecognized "resultType" parameter value [foo]
description The request sent by the client was syntactically incorrect (Unrecognized "resultType" parameter value [foo]).
jPOST seems not to mind the schema violation:
curl -i -X GET --header 'Accept: application/json' 'https://repository.jpostdb.org/proxi/datasets?resultType=foo&accession=PXD005159'
HTTP/1.1 200 OK
[{"accession":[{"name":"jPOST dataset identifier","value":"JPST000200","accession":"MS:1002632","cvLabel":"MS"},{"name":"ProteomeXchange accession number","value":"PXD005159","accession":"MS:1001919","cvLabel":"MS"}],"title":"HeLa standard shotgun DDA analysis using a two-meter C18 monolithic silica column","publications":[{"name":"PubMed identifier","accession":"MS:1000879","value":"","cvLabel":"MS"},{"name":"Reference","accession":"MS:1002866","value":"","cvLabel":"MS"}],"contacts":[[{"name":"dataset submitter","accession":"MS:1002037","cvLabel":"MS"},{"name":"contact name","accession":"MS:1000586","value":"Saki Nambu","cvLabel":"MS"},{"name":"contact email","accession":"MS:1000589","value":"[email protected]","cvLabel":"MS"},{"name":"contact affiliation","accession":"MS:1000590","value":"Kyoto university","cvLabel":"MS"}],[{"name":"lab head","accession":"MS:1002332","cvLabel":"MS"},{"name":"contact name","accession":"MS:1000586","value":"N/A","cvLabel":"MS"},{"name":"contact affiliation","accession":"MS:1000590","value":"N/A","cvLabel":"MS"}]],"species":[[{"name":"taxonomy: scientific name","value":"Homo sapiens (Human)","accession":"MS:1001469","cvLabel":"MS"},{"name":"taxonomy: NCBI TaxID","value":"9606","accession":"MS:1001467","cvLabel":"MS"}]],"instruments":[[{"name":"Q Exactive","accession":"MS:1001911","cvLabel":"MS"}]]}]
How do we feel about these results?
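If we wanted every server to end up with the Error shape from the schema (code and message), a thin normalisation layer could look like the following Python sketch. `to_proxi_error` is a hypothetical helper; it assumes the framework body is either JSON with some of the keys seen above (message, detail, title, status) or an arbitrary non-JSON body such as the Tomcat HTML page.

```python
import json
from typing import Any, Dict


def to_proxi_error(http_status: int, body: str) -> Dict[str, Any]:
    """Map a framework-specific error body to {"code": ..., "message": ...}."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON (for example an HTML error page).
        payload = None

    if isinstance(payload, dict):
        message = payload.get("message") or payload.get("detail") or payload.get("title")
        code = payload.get("status", http_status)
    else:
        message, code = None, http_status

    # Fall back to the raw body when no message field was found.
    return {"code": int(code), "message": message or body.strip()}
```

This does not address the jPOST case, where the invalid value is accepted and a 200 is returned; that needs validation on the server before the query runs.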
Massive, PeptideAtlas, ProteomeCentral:
Hi all, the status code returned should not contain the words Not Implemented. It should be only the error code.
We think the message should be in the body: http://wwwdev.ebi.ac.uk/pride/proxi/archive/v0.1/proteins?resultType=compact
Let's discuss in the next meeting.
In the schema definition of Dataset, it seems that we have an attribute "summary" that really is "Description" in PX XML? Can we preserve the name for clarity and call this attribute "description" instead?
Psm is defined in the YAML as:
Psm:
  required:
    - peptideSequence
  properties:
    accession:
      type: string
      description: Accession of the PSM
    usi:
      type: string
      description: The USI representation for the PSM
...
I like the usi. But what is the accession? Does anyone plan on filling in some other kind of accession for a PSM?
Related, the output does not have datasetIdentifier. All the other components needed to build a USI are part of the output. Except datasetIdentifier. Seems like we should have it. Maybe that's what accession was supposed to be?
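To illustrate why the dataset identifier matters, here is a small Python sketch that assembles a USI from its parts, following the `mzspec:collection:msRun:indexType:index:interpretation` layout visible in the USIs quoted in these issues. `build_usi` and its argument names are hypothetical, not fields defined by the schema.

```python
from typing import Optional


def build_usi(
    dataset_identifier: str,
    ms_run: str,
    index_type: str,
    index: int,
    interpretation: Optional[str] = None,
) -> str:
    """Assemble a USI string; the interpretation part is optional."""
    parts = ["mzspec", dataset_identifier, ms_run, index_type, str(index)]
    if interpretation:
        parts.append(interpretation)
    return ":".join(parts)
```

Without the dataset identifier in the Psm output, a client holding only the other components cannot rebuild the first part of the USI.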
@edeutsch @jjcarver Shin, Nuno and Juan: do we want to have an open channel in Slack that enables us to talk about the project on a daily basis? It could also be open to other collaborators who want to interact with the group/project.
We have been using this strategy in other projects such as BioContainers, and people join to ask questions and propose features for the resource.
The current Spectrum class defines a required attribute:
status:
  type: string
  enum: [READABLE, PEAK UNAVAILABLE]
  description: Status of the Spectrum
Can we define these status entries?
What does READABLE mean? Does this mean that the spectrum exists and can be fetched and provided? I suppose this is fine, although it is a strange word, since the antonym is UNREADABLE. But what would UNREADABLE mean? And that isn't an option.
What does "PEAK UNAVAILABLE" mean exactly? Is that the first peak unavailable? or any one peak unavailable? All peaks unavailable? Some peaks unavailable? Or does it mean the spectrum is unavailable? How is this different from a 404?
How should this be used? At PeptideAtlas a spectrum is either available and provided or it is not available and just not in the returned list or is a 404. PeptideAtlas doesn't use "PEAK UNAVAILABLE" since I don't know what it should mean or how it should be used.
Should it be used if there is no such spectrum at the repository?
Should it be used if the spectrum is real and valid and should be available, but due to some technical glitch it cannot be fetched from the data store? So not 404. But closer to 500?
We should decide and document this.
All endpoints have resultType=compact|full
What should compact for /psms be?
The YAML says:
Psm:
  required:
    - peptideSequence
  properties:
But just a list of peptideSequences is useless. I thought just USIs would be a fine compact form, but peptideSequence is required. So here's a possibility:
http://www.peptideatlas.org/api/proxi/v0.1/psms?resultType=compact&accession=PXD005942
[
{
"peptideSequence": "LSSPATLNSR",
"usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:10:LSSPATLNSR/2"
},
{
"peptideSequence": "LSSPATLNSR",
"usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:13:LSSPATLNSR/2"
},
{
"peptideSequence": "LSSPATLNSR",
"usi": "mzspec:PXD005942:030219_ywt_sf-40:scan:15:LSSPATLNSR/2"
},
...
Do we like that?
ProteinIdentification:
  required:
    - proteinAccession
    - startPosition
    - endPosition
Are startPosition and endPosition really required here?
We don't have them trivially available at the moment, so we are lying with -1 and -1.
We can get them and will, I guess.
But I'm questioning whether we really should have these required. Most proteomics data output doesn't normally capture this?
I don't think we have specified what the compact form should return for /peptidoforms
PeptideAtlas now implements
{
"countDatasets": 2,
"countPSM": 7,
"peptidoform": "[iTRAQ]-AAHEEIC[Carbamidomethyl]TTNEGVMYR"
}
Good? Changes? I suppose the only required field is peptidoform. Should it just be the required field peptidoform only?
The spectra endpoint should serve only one spectrum at a time.
We are totally conflating the terms peptide and peptidoform. From the /peptides YAML doc:
http://www.peptideatlas.org/api/proxi/v0.1/ui/#/
"The peptide entry point returns global peptidoform statistics across an entire resource. Each peptide contains a summary of the statistics of the peptidoform across the entire resource."
We should aim to be clear and precise. If this endpoint is dealing in peptidoforms (and it does, because there are PTMs there), then I think we should call it:
/peptidoforms
Do we also want to have a /peptides entry point that is scrubbed of all mass mods?
i.e. the /peptides entry point is agnostic to mass mods
the /peptidoforms endpoint requires full handling of mass mods
What do you think?
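As an illustration of what "scrubbed of all mass mods" could mean, here is a rough Python sketch that reduces a peptidoform string to its bare peptide sequence. `strip_mass_mods` is a hypothetical helper; it assumes mods are written in square brackets without nesting and that an optional charge follows a slash, as in the examples quoted in these issues.

```python
import re

# Matches one bracketed modification such as [Carbamidomethyl] or [UNIMOD:214].
_BRACKETED = re.compile(r"\[[^\[\]]*\]")


def strip_mass_mods(peptidoform: str) -> str:
    """Return the bare amino acid sequence of a peptidoform string."""
    sequence = _BRACKETED.sub("", peptidoform)  # remove bracketed mods
    sequence = sequence.split("/", 1)[0]        # drop a trailing charge like /2
    return re.sub(r"[^A-Za-z]", "", sequence)   # drop terminal separators like '-'
```

Several peptidoforms collapse to the same peptide under this mapping, which is exactly why statistics for /peptides and /peptidoforms would differ.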
As we discussed last week, we will need to have a different definition of errors or status when querying all entry points. The broker will need to retrieve multiple statuses for multiple entry points. We have multiple options here:
statuses
and then we can attach the error as another object in the response. Like (http://www.peptideatlas.org/api/proxi/v0.1/psms?resultType=compact&accession=PXD005942): [
{
"peptideSequence": "LSSPATLNSR",
"usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:10:LSSPATLNSR/2"
},
{
"peptideSequence": "APLVCLPVFVSR",
"usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:120:APLVC[Carbamidomethyl]LPVFVSR/2"
},
]
{
errors: []
}
{
data: [
{
"peptideSequence": "LSSPATLNSR",
"usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:10:LSSPATLNSR/2"
},
{
"peptideSequence": "APLVCLPVFVSR",
"usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:120:APLVC[Carbamidomethyl]LPVFVSR/2"
},
],
errors: []
}
The second approach defines a global object with two parts, data and errors.
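A minimal Python sketch of the global-object approach, assuming each entry in errors reuses the Error shape from the schema (code and message). `make_envelope` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable, Optional


def make_envelope(
    data: Optional[Iterable[Any]] = None,
    errors: Optional[Iterable[Dict[str, Any]]] = None,
) -> Dict[str, list]:
    """Wrap results and errors in one object so both keys are always present."""
    return {"data": list(data or []), "errors": list(errors or [])}
```

Because both keys are always present, a broker aggregating several servers can merge the data lists and the errors lists separately without checking the response type first.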
Below is some sample JSON that we would tentatively output from the /datasets API endpoint. The dataset used in this example is live in both MassIVE and ProteomeCentral, and can be found at the following links:
Link | URL |
---|---|
MassIVE dataset | https://massive.ucsd.edu/ProteoSAFe/QueryMSV?id=MSV000081125 |
MassIVE FTP | ftp://massive.ucsd.edu/MSV000081125 |
ProteomeCentral dataset | http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=6629 |
ProteomeCentral dataset XML | http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=6629&outputMode=XML&test=no |
This is a "full" record with all files listed out:
{ "accession": "PXD006629", "title": "Mitochondrial H+-ATP synthase in human skeletal muscle: contribution to dyslipidemia and insulin resistance", "summary": "Mitochondrial H+-ATP synthase in human skeletal muscle: contribution to dyslipidemia and insulin resistance", "species": [ {"accession": "MS:1001467", "name": "taxonomy: NCBI TaxID", "value": "9606", "cvLabel": "MS"} ], "instruments": [ {"accession": "MS:1002416", "name": "Orbitrap Fusion", "cvLabel": "MS"} ], "modifications": [ {"accession": "UNIMOD:737", "name": "TMT6plex", "cvLabel": "UNIMOD"}, {"accession": "UNIMOD:35", "name": "Oxidation", "cvLabel": "UNIMOD"}, {"accession": "UNIMOD:4", "name": "Carbamidomethyl", "cvLabel": "UNIMOD"} ], "contacts": [ {"contactProperties":[ {"accession": "MS:1002037", "name": "dataset submitter", "cvLabel": "MS"}, {"accession": "MS:1000586", "name": "contact name", "value": "John Lapek", "cvLabel": "MS"}, {"accession": "MS:1000589", "name": "contact email", "value": "[email protected]", "cvLabel": "MS"}, {"accession": "MS:1000590", "name": "contact affiliation", "value": "UCSD", "cvLabel": "MS"} ]}, {"contactProperties":[ {"accession": "MS:1002332", "name": "lab head", "cvLabel": "MS"}, {"accession": "MS:1000586", "name": "contact name", "value": "Laura Formentini", "cvLabel": "MS"}, {"accession": "MS:1000589", "name": "contact email", "value": "[email protected]", "cvLabel": "MS"}, {"accession": "MS:1000590", "name": "contact affiliation", "value": "UAM University Madrid", "cvLabel": "MS"} ]} ], "publications": [ {"accession": "MS:1002853", "name": "Dataset with no associated published manuscript", "cvLabel": "MS"} ], "keywords": [ {"accession": "MS:1001925", "name": "submitter keyword", "value": "mitochondria", "cvLabel": "MS"}, {"accession": "MS:1001925", "name": "submitter keyword", "value": "insulin resistance", "cvLabel": "MS"}, {"accession": "MS:1001925", "name": "submitter keyword", "value": "ATP synthase", "cvLabel": "MS"} ], "datasetLink": {"accession": 
"MS:1002488", "name": "MassIVE dataset URI", "value": "http://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=d6756ac742ed4f13811ddab2843e7d54", "cvLabel": "MS"}, "dataFiles": [ {"accession": "MS:1002846", "name": "Associated raw file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/raw/DG000895_Francisco_Normal_Mitos.raw", "cvLabel": "MS"}, {"accession": "MS:1002850", "name": "Peak list file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/peak/DG000895_Francisco_Normal_Mitos.mzML", "cvLabel": "MS"}, {"accession": "MS:1002845", "name": "Result file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/result/DG000895_Francisco_Normal_Mitos_PSMs.mzTab", "cvLabel": "MS"}, {"accession": "MS:1002848", "name": "Result file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/ccms_result/DG000895_Francisco_Normal_Mitos_PSMs.mzTab", "cvLabel": "MS"}, {"accession": "MS:1002851", "name": "Other type file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/other/DG000895_Francisco_Normal_Mitos.zip", "cvLabel": "MS"}, {"accession": "MS:1002851", "name": "Other type file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/other/Francisco_Normal_Mitos.xlsx", "cvLabel": "MS"}, {"accession": "MS:1002851", "name": "Other type file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/ccms_parameters/params.xml", "cvLabel": "MS"}, {"accession": "MS:1002851", "name": "Other type file URI", "value": "ftp://massive.ucsd.edu/MSV000081125/ccms_statistics/statistics.tsv", "cvLabel": "MS"} ], "links": [ {"rel": "self", "href": "http://massive.ucsd.edu/ProteoSAFe/proxi/datasets/PXD006629"} ] }
Please comment on any potential issues you see with this sample output format.
Current yaml says:
- name: pageSize
  in: query
  description: How many items to return at one time (default 100, max 100)
  required: false
  type: integer
  default: 100
I'm fine with a default 100 so that a naive query does not return a billion rows. But why should we stipulate a max of 100? If a client wants to pull all 10,000 PSMs from PXD123, why should they have to do it in chunks of 100? How irritating for them. And extra work for my machine too.
I propose we can keep default of 100, but let each implementing site choose what max or limits to impose. If PRIDE only wants to allow 100 at a time, fine. But I don't think we should prevent PeptideAtlas from returning 10,000 rows if the user asks for it? It's not enforceable anyway via the schema, so I propose we strike that.
Comments?
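The proposal could be implemented per site along these lines. This is only a Python sketch: `effective_page_size` and `site_max` are hypothetical names, and the only value taken from the schema is the default of 100.

```python
from typing import Optional

DEFAULT_PAGE_SIZE = 100  # default stated in the current YAML


def effective_page_size(requested: Optional[int], site_max: Optional[int] = None) -> int:
    """Apply the shared default, then an optional site-specific maximum."""
    size = DEFAULT_PAGE_SIZE if requested is None else requested
    if size < 1:
        raise ValueError("pageSize must be a positive integer")
    if site_max is not None:
        size = min(size, site_max)
    return size
```

A site that wants the old behaviour would set `site_max=100`, while a site with no limit would leave it as `None` and return as many rows as requested.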
The operationId for /peptidoforms is still getPeptides, which leads to confusing autogenerated code and will cause a problem if we ever create a /peptides endpoint
/peptidoforms:
  get:
    summary: Get a collection of peptidoforms
    operationId: getPeptides
It would be great if we can filter datasets by the following fields:
/datasets?pageSize=50&pageNumber=1&resultType=compact
/datasets?pageSize=50&pageNumber=1&resultType=full
Searches: (selecting results based on terms that can apply to any part of the returned records)
/datasets?search=liver
/datasets?species=human&contact=Mann&search=liver
/datasets?species=human&pageSize=50&pageNumber=1&resultType=compact
We should have a list of REQUIRED fields and how we express them in spectra.
Also in my implementation notes was the idea that "species" should be an input parameter to all of the endpoints. One can imagine wanting to constrain any of those queries to limit results to just one species.
What do you think?
@jjcarver I was testing the PROXI API today and I realized that the MassIVE endpoint is not returning the files associated with the dataset. Could you make an effort to return that information?
This is important because if we start implementing clients and tools associated with the API the users will expect as much information as possible.
All the other lists are plural except datasetLink. Should be datasetLinks
Regarding the current schema:
https://raw.githubusercontent.com/HUPO-PSI/proxi-schemas/master/specs/swagger.yaml
Here is a toy example of a Spectrum object as defined by the current schema:
http://www.peptideatlas.org/api/proxi/v0/spectra/238293
UI: http://www.peptideatlas.org/api/proxi/v0/ui/
What should it look like when a PROXI server does not implement a particular endpoint?
I suggest returning HTTP error code 450:
http://proteomecentral.proteomexchange.org/api/proxi/v0.1/proteins?resultType=compact
{
"detail": "Although this is an officially defined PROXI endpoint, it has not yet been implemented at this server",
"status": 450,
"title": "Endpoint not implemented",
"type": "about:blank"
}
What do you think?
This USI with an interpretation works:
http://wwwdev.ebi.ac.uk/pride/proxi/archive/v0.1/spectra?resultType=full&usi=mzspec:PXD000966:CPTAC_CompRef_00_iTRAQ_12_5Feb12_Cougar_11-10-11.mzML:scan:11850:[UNIMOD:214]YYWGGLYSWDMSK[UNIMOD:214]/2
but the same USI without an interpretation does not return a valid response:
http://wwwdev.ebi.ac.uk/pride/proxi/archive/v0.1/spectra?resultType=full&usi=mzspec:PXD000966:CPTAC_CompRef_00_iTRAQ_12_5Feb12_Cougar_11-10-11.mzML:scan:11850
Do we want to add 500 and 501 to the specification, e.g. here:
proxi-schemas/specs/swagger.yaml
Line 105 in 154e8af
500 means that the server has some internal fault that doesn't fall under 400 and 404.
501 means that the server does not implement this endpoint
In the parameters of the psm we use msrun and filename. What is the difference between them?
Fetching a spectrum from PRIDE via PROXI such as:
http://wwwdev.ebi.ac.uk/pride/proxi/archive/v0.1/spectra?resultType=full&usi=mzspec:PXD000966:CPTAC_CompRef_00_iTRAQ_12_5Feb12_Cougar_11-10-11.mzML:scan:11850:[UNIMOD:214]YYWGGLYSWDMSK[UNIMOD:214]/2
returns the mzs in random order. This is not against the current spec, which does not specify. But it is breaking the Lorikeet viewer at ProteomeCentral. Seems like many applications may assume mzs in order.
What should be the resolution?
Does the random order of mzs at PRIDE match the same order of intensities? One risk of separate arrays is that they become unaligned.
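Whatever we decide for the spec, a client or server can restore m/z order without losing the pairing by sorting the two arrays together. The Python sketch below assumes the spectrum carries two parallel lists; `sort_peaks` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def sort_peaks(
    mzs: Sequence[float], intensities: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Sort peaks by ascending m/z while keeping each intensity with its m/z."""
    if len(mzs) != len(intensities):
        raise ValueError("mzs and intensities must have the same length")
    pairs = sorted(zip(mzs, intensities), key=lambda pair: pair[0])
    return [mz for mz, _ in pairs], [intensity for _, intensity in pairs]
```

This only works if the unordered arrays are still aligned index by index. If they are not, no amount of client-side sorting can recover the correct pairing, which is the risk raised above.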
Extended error codes should be defined for each entry point.
We need to start adding versioning to the API; @edeutsch suggested starting with v0.1.
We need to update the specification document with use cases, examples, etc. A PR will be done soon with the first template.
It would be great if we could implement HATEOAS for every entry point that lists collections.
We need to have a way to know which service implements which entry point.
Spectrum info https://github.com/HUPO-PSI/proxi-schemas/blob/master/specs/swagger.yaml#L268
The mandatory fields are:
Extra metadata:
proxi-schemas/api-swagger.yaml
Line 14 in 89f772b
Why should we have localhost as an endpoint?
We will use OpenAPI 2.0 because code autogeneration is supported by many tools.