o2r-project / api
Reproducibility service RESTful web API specification and documentation
Home Page: https://o2r.info/api
License: Other
It would be helpful to also publish the API as an HTML document, ideally as a GH-page.
Return the MIME type in the attribute type for each file.
Retrieve the public link for a compendium at ../api/v1/compendium/ABC12/link.
This only concerns editors/admins, so the current listing of all links is probably fine for now.
Let's not run the action on forks, so we do not create each PDF twice.
For testing, one can run the code from the action manually.
https://stackoverflow.com/questions/54259816/how-to-generate-a-pdf-or-markup-from-openapi-3-0
This is the old make target for PDF generation of the mkdocs-based site:
pdf: build
	wkhtmltopdf --version;
	# fix protocol relative URLs, see https://github.com/wkhtmltopdf/wkhtmltopdf/issues/2713
	find site/ -type f -name '*.html' | xargs sed -i 's|href="//|href="https://|g'
	find site/ -type f -name '*.html' | xargs sed -i 's|src="//|src="https://|g'
	wkhtmltopdf --margin-top 20mm --no-background --javascript-delay 5000 \
		file://$(shell pwd)/site/index.html \
		file://$(shell pwd)/site/compendium/view/index.html \
		file://$(shell pwd)/site/compendium/candidate/index.html \
		file://$(shell pwd)/site/compendium/files/index.html \
		file://$(shell pwd)/site/compendium/delete/index.html \
		file://$(shell pwd)/site/compendium/upload/index.html \
		file://$(shell pwd)/site/compendium/public_share/index.html \
		file://$(shell pwd)/site/compendium/download/index.html \
		file://$(shell pwd)/site/compendium/metadata/index.html \
		file://$(shell pwd)/site/compendium/substitute/index.html \
		file://$(shell pwd)/site/compendium/link/index.html \
		file://$(shell pwd)/site/job/index.html \
		file://$(shell pwd)/site/search/index.html \
		file://$(shell pwd)/site/shipment/index.html \
		file://$(shell pwd)/site/user/index.html \
		o2r-web-api.pdf
in 01-API.md
02-upload.md
- ../compendium, or do we want to distinguish URC, ERC, PERC? I am against modelling these states in the URLs
- /upload/urc/:id, but content with JSON?
- zip, but also tar.gz
- ./upload/remote with POST, and then we figure out from the provided JSON payload what it is (e.g. git URL)
03-execution.md
- execute_now is very limited, why not a delay_seconds which is 0 by default? Or easier, a job_start_time, which can be in the past (execute now) or in the future (the client has to do the math)
- FileDescriptor should be marked as a potential later feature
- /compendium/<compendium ID>/jobs ?
- create part in the URL confusing - isn't that what POST and PUT are for?
- ../run - should this not be POST to /jobs/:id ?
- /jobs/view/:id should simply be GET /jobs/:id
04-ERC.md
- /erc/view/:id > no need for /view
05-user.md
- /v1/upload/... etc.
Useful because jobs can be run even before publication, cf. o2r-project/o2r-UI#151
See https://stackoverflow.com/a/4073451/261210 for background.
In the examples for direct upload, we use curl -F, which points to multipart/form-data:
-F, --form CONTENT Specify HTTP multipart POST data (H)
See o2r-project/o2r-bouncer#14
Important: must update all microservices for this!
Therefore it might be preferable to switch to a non-cookie-based way to provide the token, for example with an Authorization: Bearer header.
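To make the header-based proposal concrete, here is a minimal client-side sketch using only the Python standard library. The endpoint URL and the token value are placeholders, and whether the o2r services would accept a bearer token in this form is an assumption, not something implemented yet.

```python
import urllib.request


def build_authorized_request(url: str, token: str) -> urllib.request.Request:
    """Create a request that carries the token in a header instead of a cookie."""
    request = urllib.request.Request(url, method="GET")
    # "Authorization: Bearer <token>" is the standard bearer token header form.
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Hypothetical endpoint and token, for illustration only; no request is sent.
request = build_authorized_request(
    "https://example.org/api/v1/compendium", "my-secret-token"
)
print(request.get_header("Authorization"))
```

A header like this is set explicitly by the client on each request, so every microservice would have to read it from the request headers rather than from the session cookie.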
Instead of ISO 8601, we should refer to (and use) only RFC 3339 (tools.ietf.org/html/rfc3339), which is open/free.
This only requires changing the name of the standard (search the project for "8601").
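For reference, a small sketch of how a client or service could write and read RFC 3339 timestamps with the Python standard library. The function names are made up for this example, and millisecond precision with a trailing "Z" is assumed because that is how the created field looks in the API responses.

```python
from datetime import datetime, timezone


def to_rfc3339(moment: datetime) -> str:
    """Format an aware datetime as an RFC 3339 UTC timestamp (milliseconds, "Z")."""
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def from_rfc3339(text: str) -> datetime:
    """Parse a timestamp such as 2016-12-15T08:22:27.029Z into an aware datetime."""
    # Older Python versions do not accept the trailing "Z" in fromisoformat,
    # so it is replaced by the equivalent numeric offset first.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


created = from_rfc3339("2016-12-15T08:22:27.029Z")
print(to_rfc3339(created))
```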
PDF document metadata should also include the number of pages, not just the file size. The page count would let the client decide whether to request a preview or the whole file, and is potentially a better basis for that decision than the mere file size.
As proposed by @nuest:
See https://restful-api-design.readthedocs.org/en/latest/urls.html#entry-point
Should contain version, list of resources etc.
@MarkusKonk you were asking recently what levels we have, see here: o2r-project/o2r-platform#73. The API docs would be a good place to keep the levels in one place across all microservices.
Where [FileDescriptor] allows overriding files from the ERC with files from a different execution Job or a different ERC.
[what? This functionality is new to me.]
[jk: O3,4, User Stories 53-55]
[MK: OK, if I take the user stories into account, this becomes clearer, but then you would have to be more precise here.]
See #60
A draft to support direct file upload during substitution, i.e. a user selects a file from the browser instead of from the overlay ERC.
@MarkusKonk Can you take an hour and update the API docs for the current state of the bindings, please?
Identify some APIs that do similar things (Amazon compute, Travis, Drone) and see what useful patterns can be re-used
When there is no job and consequently no image tarball that can be downloaded, the transportar returns an HTTP 500 with a JSON error message in the body. This is not documented in the API, so please add it.
Ideally an ERC is connected to a single publication, which has a DOI.
If that is a goal and often enough the case, we can try to support this in search or filter operations:
/api/v1/compendium?doi=doi%3A10.10.1038%2Fnphys1170
/api/v1/search?doi=doi%3A10.10.1038%2Fnphys1170
Should this go into filtering or into search?
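Whichever endpoint is chosen, the DOI has to be percent-encoded in the query string, as in the two example URLs. A minimal sketch of how a client could build such a URL; the helper name is made up, and the doi parameter itself is only the proposal from this issue, not an existing feature.

```python
from urllib.parse import parse_qs, urlencode


def doi_query(path: str, doi: str) -> str:
    """Append a percent-encoded doi parameter to an API path."""
    return f"{path}?{urlencode({'doi': doi})}"


# DOI string copied verbatim from the example URLs in this issue.
url = doi_query("/api/v1/compendium", "doi:10.10.1038/nphys1170")
print(url)

# The server side can decode the parameter back to the original string.
print(parse_qs(url.split("?", 1)[1])["doi"][0])
```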
Change the process so that users can Save an ERC without it being published, see https://o2r.info/api/compendium/candidate/#metadata-review-and-saving
Add API feature to publish ERC.
The orcid id has already been added to the metadata of a single compendium. It would be nice if the username could also be added. Maybe something like this?
"User": {
"name": "xy",
"orcid": "1234"
}
Hello,
according to the API, a file has the attributes path, name and size. Please also add the attribute type containing a string with the file's MIME type. Folders should not have this attribute.
Best,
Jan
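A rough sketch of what the requested behaviour could look like, guessing the MIME type from the file name with Python's mimetypes module. The function name, the folder handling and the fallback to application/octet-stream are assumptions for illustration; the actual service may detect types differently.

```python
import mimetypes


def describe_entry(path: str, size: int, is_folder: bool) -> dict:
    """Build the listing entry for one file or folder."""
    entry = {"path": path, "name": path.rstrip("/").split("/")[-1]}
    if is_folder:
        # Folders get no type attribute, as requested in this issue.
        return entry
    entry["size"] = size
    guessed, _encoding = mimetypes.guess_type(path)
    # Assumed fallback for unknown file extensions.
    entry["type"] = guessed or "application/octet-stream"
    return entry


print(describe_entry("/data/paper.pdf", 1024, is_folder=False))
print(describe_entry("/data/figures", 0, is_folder=True))
```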
As of now, the check is binary. This requires the outputs to be pixel-perfect, whereas reality might require a human reviewer to make that call. Therefore we should discuss loosening the binary nature of the check and...
check_overruled ?
The metadata part of the platform is still very much under development. Instead of putting intermediate documentation into the API, we will keep an updated version of the documentation in this issue, and also discuss it here.
Eventually this should go into a file docs/compendium-metadata.md
o2r metadata <- edit: see schema for full description
o2r provides separate MD for different purposes, making use of their translatability:
{
  "id":"XyZ19",
  "metadata":{
    "third_party": {},
    "o2r": {
      "license": {}
    },
    "raw": {}
  },
  "created":"2016-12-15T08:22:27.029Z",
  "user":"0000-0002-0024-5046",
  "files":{
  }
}
edit: need license in main for mappings!
Licensing information is provided separately for the main parts of a compendium, i.e. data, text, and code. Cases such as different licenses for different files or sub-projects are not covered directly.
License MD can contain free text or a list of licenses.
A license must be provided for each part of the compendium (code, data, text). The license might be identical. The license string is based on Open Licenses Service names.
edit: need to look at repository requirements, cf. trello card
- for code, the list of OSI licenses is recommended
{
"metadata":{
"license":{
"data":"Against-DRM",
"text":"CC-BY-4.0",
"code":"AAL"
}
}
}
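Since a license must be provided for each of the three parts, a check along these lines could run when metadata is saved. This is only a sketch: the function name is made up, it receives the inner metadata object from the example, and it does not verify the names against the Open Licenses Service.

```python
REQUIRED_PARTS = ("data", "text", "code")


def missing_license_parts(metadata: dict) -> list:
    """Return the compendium parts that have no license string yet."""
    licenses = metadata.get("license", {})
    return [part for part in REQUIRED_PARTS if not licenses.get(part)]


complete = {"license": {"data": "Against-DRM", "text": "CC-BY-4.0", "code": "AAL"}}
incomplete = {"license": {"text": "CC-BY-4.0"}}

print(missing_license_parts(complete))
print(missing_license_parts(incomplete))
```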
This subset contains a core set of metadata attributes. They are refined from automatic extraction ("raw MD") to comply with the o2r-schema. Within the workflow, the user of the o2r platform is to review the raw MD and provide additions or modifications. The MD broker will translate the corrected raw MD to o2r MD.
{
"metadata":{
"o2r":{ "title":"ERC title", ... }
}
}
edit: we don't need anything beyond this point, e.g. zenodo MD is one subset of the metadata json key.
This element contains recipient-specific metadata for shipments. It is derived from core metadata and updated after manual edits. It can directly be used for shipment purposes.
{
"metadata":{
"shipping":{
"zenodo":{
},
"orcid":{
},
"codemeta":{
},
"datacite":{
}
}
}
}
HAL standardizes links between API responses, see http://stateless.co/hal_specification.html
ERC creation roughly has these steps:
This process must be communicated clearly to the user, especially in the metadata edit form. The buttons should effectively convey the messages "Finish ERC creation" or "Abort ERC creation", so that it is clear that not doing anything in the metadata review (e.g. closing the browser) will actually not create the ERC.
@7048730 Imho this clarification of the process means that we do not need brokering during the first processing (i.e. in the loader), but only need to do it during metadata update, because the metadata update will always be done.
Also, step 2 might take quite a long time and therefore the upload must support asynchronous communication, which is crucial to integrate this into larger architectures. See also http://farazdagi.com/blog/2014/rest-long-running-jobs/ [outcome of discussion with publisher architect]
Here we have two approaches, both of which we should try out (see also this SO answer):
- GET /api/v1/workspace lists all uncompleted workspaces of the current user, or all (including completed ones) if the user has admin level
- GET /api/v1/workspace/<id> provides workspace information, most importantly status
  - status
    - processing
    - reviewable > processing is completed, metadata is ready for review
    - completed > makes a redirect to ERC with HTTP status 303 See Other
  - eta, the estimated time to finish. Default eta is the average of all completed workspaces in the database
  - cancel - a link where the workspace processing can be stopped (see below)
  - lastUpdate - the last update time = also completion time
  - compendium - if (and only if) the compendium is created
- DELETE /api/v1/workspace/<id> endpoint (only admins and creating user)
  - deleted: true in the compendium object. The API never returns deleted objects, so no need to document this. To retrieve them, direct database access is needed.
- POST /api/v1/compendium immediately returns with a response, status code HTTP 202 and a Location header field pointing to the respective workspace, see also on Location header
- POST /api/v1/compendium?callback=http://callback.org/endpoint would also reply with HTTP 202 and the Location header, but it would also register a callback which is called once the workspace processing is completed, see below
If a callback endpoint is provided on creating a new compendium, e.g. POST /api/v1/compendium?callback=http://publisher.com/publication/100/appendix/1, the endpoint is called with the following operation after the workspace processing is completed:
PUT publisher.com/publication/100/appendix/1
# content of GET /api/v1/workspace/<id>
{
"status": "reviewable",
"compendium": "https://o2r.uni-muenster.de/api/v1/compendium/1234",
"lastUpdate": "2017..."
}
Should we also notify the endpoint if the status changes from reviewable to completed?
It is currently unclear how we can also provide the notification via websockets... tbc
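For the polling approach, a client would follow the Location header and repeatedly fetch the workspace until its status leaves processing. A minimal sketch under the assumptions of this proposal: the workspace fields are the ones drafted in this issue, the function name and parameters are made up, and the HTTP call is replaced by an injected function so the example runs offline.

```python
import time
from typing import Callable, Optional


def wait_for_review(fetch_workspace: Callable[[], dict],
                    interval_seconds: float = 5.0,
                    max_polls: int = 60) -> Optional[str]:
    """Poll a workspace until it leaves "processing"; return the compendium link."""
    for _ in range(max_polls):
        workspace = fetch_workspace()
        if workspace.get("status") in ("reviewable", "completed"):
            return workspace.get("compendium")
        time.sleep(interval_seconds)
    return None  # still processing after max_polls attempts


# Simulated responses of GET /api/v1/workspace/<id> instead of real HTTP calls.
responses = iter([
    {"status": "processing"},
    {"status": "reviewable",
     "compendium": "https://o2r.uni-muenster.de/api/v1/compendium/1234"},
])
print(wait_for_review(lambda: next(responses), interval_seconds=0))
```

With the callback variant the client would not need this loop at all, which is the main argument for offering both.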
Right now, a user clicks "Run analysis" to execute a job for a compendium, and there is a "Currently running analysis" and a "Last finished analysis". In our original poster (2016) we used "one-click reproduce", and the Web API has a resource /job. One compendium can have multiple jobs, /compendium/<compendium id>/jobs.
IMHO it would make our API and tools easier to understand if we use the word "reproduction" instead of "job" in the API, and align the UI with this wording. We should add a short note to the API docs and user interface that this is "computational" or "methods reproduction".
@MarkusKonk @edzer @chriskray @7048730 What do you think?
Could be a build cache for image, a download cache, ... let's disable them all with one flag
cache: false
https://swagger.io/specification/#security-scheme-object
We use an apiKey that is stored as a cookie parameter.
It could even work to reference the OAuth2 endpoint or docs from ORCID?
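As a starting point, the security scheme object for a cookie-based apiKey could look roughly like the fragment below, written as a Python dict and dumped as JSON. The scheme name cookieAuth and the cookie name are placeholders, not the names actually used by the o2r services.

```python
import json

# Security scheme of type apiKey that is transported in a cookie.
security_schemes = {
    "cookieAuth": {  # hypothetical scheme name
        "type": "apiKey",
        "in": "cookie",
        "name": "session-cookie-name",  # placeholder for the real cookie name
    }
}

openapi_fragment = {
    "components": {"securitySchemes": security_schemes},
    # Apply the scheme globally; operations can override this individually.
    "security": [{"cookieAuth": []}],
}

print(json.dumps(openapi_fragment, indent=2))
```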
@Fmazin Is there really no way to get the content under ## User authentication into the "Authentication" headline right after "About"?
See branch https://github.com/o2r-project/api/tree/timestamps (last two commits).
If everything is in the OpenAPI spec, then tell @nuest the branch timestamps can be deleted.
With the new User authentication and authenticated sessions, a better user authentication has been implemented. This makes the current X-API-Key header obsolete. Endpoints requiring authenticated sessions should reflect this, which would (as of now) only be the POST /api/v1/compendium endpoint. Also, this needs further implementation in the o2r muncher service.
- X-API-Key from documentation
- /api/v1/compendium in o2r muncher - see o2r-project/o2r-muncher#21
- /api/v1/compendium
The files page should mention that it is not a separate API function, but a description of a subset of the response when viewing a single job/compendium.
Does it make sense to use http://tools.ietf.org/html/rfc6570 for describing the URLs?
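To illustrate what URI templates would look like for our URLs, here is a minimal sketch that expands only the simplest form of RFC 6570 expressions ({name}, level 1). It is not a full implementation of the RFC, and the template shown is just one of our paths rewritten in that notation.

```python
import re
from urllib.parse import quote


def expand_template(template: str, values: dict) -> str:
    """Expand simple {name} expressions (RFC 6570 level 1 only)."""
    def substitute(match):
        value = values.get(match.group(1))
        if value is None:
            return ""  # undefined variables expand to an empty string
        # Level 1 percent-encodes everything outside the unreserved set.
        return quote(str(value), safe="")
    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)


print(expand_template("/api/v1/compendium/{id}/jobs", {"id": "ABC12"}))
```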
After logging in, an author will be redirected to their landing page. There, all of their own publications will be listed. Therefore an API endpoint is needed to list all publications of one author (including metadata for each publication).
api/v1/user > list all orcids
api/v1/user/<orcid> > show orcid and name, if logged in as admin also show level
PATCH request
I am not sure Apache is a license that works for a specification, which essentially is a document, not software/code.
I'd suggest switching to http://creativecommons.org/licenses/by/4.0/ but am open to opinions and links to useful resources on this matter.
(Note to self: OpenSearch uses a CC license: http://www.opensearch.org/Specifications/License)
api/v1/environment returns
{
"architecture": [
"amd64"
],
"os": [
{
"name": "linux",
"version": "5.4.0-48-generic"
}
],
"container_runtimes": [
{
"name": "Docker Engine - Community",
"api_version": "1.40",
"version": "19.03.13"
}
],
"erc": {
"manifest": {
"capture_image": "o2rproject/containerit:geospatial-0.6.0.9003",
"base_image": "rocker/geospatial:3.6.2",
"memory": 2147483648
},
"execution": {
"memory": 4294967296
}
}
}
Implemented in o2r-project/o2r-muncher#123, but not yet documented here. Will catch up with that after switching to the OpenAPI spec.
Right now we don't check the content types of requests, because the whole API is all JSON. Nevertheless, we should use the correct content types and return errors if wrong content types are used.
This could be part of a "general" section, because it applies to all APIs.
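A sketch of what such a general check could look like, independent of any web framework. The function name is made up, and answering with HTTP 415 (Unsupported Media Type) is an assumption about which error we would want to return.

```python
def check_json_content_type(content_type_header):
    """Return (status_code, message) for the Content-Type header of a request."""
    if not content_type_header:
        return 415, "Content-Type header is missing"
    # Ignore parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    if media_type != "application/json":
        return 415, f"Unsupported content type: {media_type}"
    return 200, "OK"


print(check_json_content_type("application/json; charset=utf-8"))
print(check_json_content_type("text/plain"))
```

Endpoints that accept file uploads would need a different expected type, so the allowed media type should probably be configurable per endpoint.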
How does the RDA Collections WG relate to the "collections" our API provides?
https://github.com/RDACollectionsWG/specification
https://rdacollectionswg.github.io/apidocs/#/