bulk-data-server's People

Contributors

bhinebaugh, dependabot[bot], gotdan, jmandel, vlad-ignatov

bulk-data-server's Issues

Token with system/*.rs scope gets permission denied for downloading files

Hi, thanks for this super helpful resource.

We were getting permission denied trying to download data from the authenticated server, and tracked it down to this line https://github.com/smart-on-fhir/bulk-data-server/blob/master/lib.js#L375

Our token only had system/*.rs, which should (?) be a superset of system/*.read. When I changed our token to include the system/*.read scope, the downloads worked.

Easy to work around but wanted to file an issue for other people who might run into this.

I think this person in Zulip may have encountered the same issue https://chat.fhir.org/#narrow/stream/179170-smart/topic/SMART.20App.20Launcher.20HTTP.20401/near/305210084
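
A minimal sketch of a scope check that would accept both spellings. This is not the code at the linked line of lib.js; `scopeGrantsRead` is a hypothetical helper, and it assumes SMART v1 scopes end in `.read`, `.write` or `.*` while SMART v2 scopes end in a subset of the letters `cruds`. Query-string suffixes on v2 scopes are not handled.

```javascript
// Hypothetical helper: does a space-separated scope string grant read access
// to the given resource type? Only "system/..." scopes are considered.
function scopeGrantsRead(scopeString, resourceType) {
  return String(scopeString || "")
    .trim()
    .split(/\s+/)
    .some((scope) => {
      const match = /^system\/([A-Za-z]+|\*)\.([a-z*]+)$/.exec(scope);
      if (!match) return false;
      const [, type, access] = match;
      if (type !== "*" && type !== resourceType) return false;
      // v1 syntax: "read" and "*" grant read access, "write" does not
      if (access === "read" || access === "*") return true;
      if (access === "write") return false;
      // v2 syntax: a subset of "cruds" in that order; "r" means read
      return /^c?r?u?d?s?$/.test(access) && access.includes("r");
    });
}
```

With this check, a token carrying only `system/*.rs` would be allowed to download files, the same as one carrying `system/*.read`.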

If the initial request uses auth... all subsequent requests should, too

Right now the following works, and shouldn't:

  1. Kick off an export using an authorization header
  2. Fetch status and data without an authorization header

I had a bug in my client code where I forgot to include an authz header in my data fetch requests, and they worked anyway -- which made it hard to discover my bug :-)
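
One way the server could enforce this, sketched under assumptions: `authJobs`, `registerJob` and `followUpStatus` are hypothetical names, the registry is in memory only, and the token itself is not validated here (a real check would also verify it). Header names are lowercase, as Node.js delivers them.

```javascript
// Hypothetical in-memory registry of export jobs and whether they used auth.
const authJobs = new Map();

// Record at kick-off whether the request carried an Authorization header.
function registerJob(jobId, headers) {
  authJobs.set(jobId, { requiresAuth: Boolean(headers.authorization) });
}

// Decide the HTTP status for a later status or file request for that job.
function followUpStatus(jobId, headers) {
  const job = authJobs.get(jobId);
  if (!job) return 404;
  // An authorized kick-off means every follow-up must be authorized too
  if (job.requiresAuth && !headers.authorization) return 401;
  return 200;
}
```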

Implement streaming client for the imported files

  • Download stream computing the download progress
  • Estimate remaining time
  • Batch download tasks
  • Transform byte streams to JSON FHIR resources
  • Validate NDJSON
  • Count the number of resources
  • Error handling
  • Implement a task queue to control how many tasks are executed in parallel
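
A rough sketch of the "transform byte streams to JSON FHIR resources", "validate NDJSON" and "count the number of resources" items above. `createNdjsonParser` is a hypothetical name; it takes already-decoded text chunks and collects results in memory for simplicity, where a real streaming client would more likely hand each resource to a callback.

```javascript
// Hypothetical incremental NDJSON parser: feed it text chunks as they arrive.
function createNdjsonParser() {
  let buffer = "";
  let lineNumber = 0;
  const resources = [];
  const errors = [];

  function parseLine(line) {
    lineNumber += 1;
    if (line.trim() === "") return; // tolerate blank lines
    try {
      const resource = JSON.parse(line);
      if (!resource || typeof resource.resourceType !== "string") {
        throw new Error("missing resourceType");
      }
      resources.push(resource);
    } catch (error) {
      // Record the failure and keep going instead of aborting the download
      errors.push({ line: lineNumber, message: error.message });
    }
  }

  return {
    write(chunk) {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // keep the trailing partial line for the next chunk
      lines.forEach(parseLine);
    },
    end() {
      if (buffer !== "") parseLine(buffer);
      buffer = "";
      return { count: resources.length, resources, errors };
    },
  };
}
```

Chunks may split a line anywhere, so only complete lines are parsed on each `write` and the remainder is carried over.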

Build error on sqlite3

I am trying to set up the Bulk Data Server and I get build errors for sqlite3 when running the 'npm i' command. Here is one of the errors:

gyp ERR! stack Error: gyp failed with exit code: 1
gyp ERR! stack at ChildProcess.onCpExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/configure.js:345:16)
gyp ERR! stack at ChildProcess.emit (events.js:200:13)
gyp ERR! stack at Process.ChildProcess._handle.onexit (internal/child_process.js:272:12)
gyp ERR! System Darwin 19.3.0
gyp ERR! command "/usr/local/Cellar/node/12.5.0/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "configure" "--fallback-to-build" "--module=/Users/yunweiw/Documents/GitHub/bulk-data-server/node_modules/sqlite3/lib/binding/node-v72-darwin-x64/node_sqlite3.node" "--module_name=node_sqlite3" "--module_path=/Users/yunweiw/Documents/GitHub/bulk-data-server/node_modules/sqlite3/lib/binding/node-v72-darwin-x64"
gyp ERR! cwd /Users/yunweiw/Documents/GitHub/bulk-data-server/node_modules/sqlite3
gyp ERR! node -v v12.5.0
gyp ERR! node-gyp -v v3.8.0
gyp ERR! not ok

Any idea how I could fix that?

URL overloading/conflict

Patient/$everything is already an established operation.

How about an alternative of /$bulkdata (a system-level operation) or if you want the operation "compartmentalized" within established FHIR compartments: Group/[id]/$bulkdata

metadata endpoint not FHIR conformant

During the recent CMS Connectathon Bulk Data Track an issue was found with the metadata endpoint to retrieve the CapabilityStatement.

The issue is that the metadata endpoint does not accept the HTTP request Accept Header mime type with the ";charset=UTF-8" parameter. The server sends back a 400 Bad Request with a response payload message of "Only the JSON format is supported".

The metadata request with an HTTP request Accept Header without the ";charset=UTF-8" parameter works fine with either the FHIR JSON "application/fhir+json" or simple JSON "application/json" mime-type. Also, sending the HTTP request Accept-Charset header with a "utf-8" value works fine.

A related observation is that the HTTP response Content-Type mime type returned is the simple JSON with the charset parameter "application/json; charset=utf-8" when the expected value should be at a minimum the FHIR JSON mime-type.
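
A sketch of an Accept check that ignores media type parameters such as ";charset=UTF-8". `acceptsJson` and the list of accepted types are assumptions for illustration, not the server's actual code; q-values (including q=0) are ignored for brevity.

```javascript
// Media types treated as "JSON is acceptable" (an assumed list)
const JSON_TYPES = new Set([
  "application/fhir+json",
  "application/json",
  "application/*",
  "*/*",
]);

// Hypothetical helper: true if the Accept header allows a JSON response.
function acceptsJson(acceptHeader) {
  if (!acceptHeader) return true; // no preference stated
  return acceptHeader.split(",").some((range) => {
    // Drop parameters such as ";charset=UTF-8" or ";q=0.9" before comparing
    const mediaType = range.split(";")[0].trim().toLowerCase();
    return JSON_TYPES.has(mediaType);
  });
}
```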

Request Builder

Create a UI component that can be used to build the list of file URLs that should be imported

Implement the "patient" parameter

Not applicable to system level export requests. When provided, the server SHALL NOT return resources in the patient compartments belonging to patients outside of this list. If a client requests patients who are not present on the server (or in the case of a group level export, who are not members of the group), the server SHOULD return details via an OperationOutcome resource in an error response to the request.

Servers unable to support patient SHOULD return an error and OperationOutcome resource so clients can re-submit a request omitting the patient parameter.

Fetching large payloads goes slowly; node pegs the CPU

curl -o /dev/null http://localhost:9443/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMDAwLCJkdXIiOjAsInRsdCI6MTUsIm0iOjEwMDAwLCJyZXF1ZXN0U3RhcnQiOjE1MTcxNTIxMTY3MTEsIm9mZnNldCI6NDAwMDAwMDAsImxpbWl0IjoxMDAwMDAwfQ/fhir/bulkfiles/41.Observation.ndjson

Shows that the speed is ~80-200 kb/s on my Linux machine and on @gotdan's Mac. Node is pegged at 100% CPU. This is surprising! There may be an opportunity to load more data into memory and issue fewer sqlite calls :-)

Rewrite DocumentReference URLs to the export base url, not the default base URL

Attachment URLs currently point at the configured "base URL" of the FHIR server:

// Rewrite urls in DocumentReference resources. Only url props
// that begin with `/files/` will be converted to absolute HTTP
// URLs to allow the client to directly download bigger files
if (row.resource_json.resourceType == "DocumentReference") {
    const url = getPath(row.resource_json, "content.0.attachment.url");
    if (url && url.search(/\/attachments\/.*/) === 0) {
        row.resource_json.content[0].attachment.url = buildUrlPath(
            baseUrl,
            base64url.encode(JSON.stringify({
                err   : sim.err || "",
                secure: !!sim.secure
            })),
            "fhir",
            url
        );
    }
}

However, in practice the real Base URL for this server can change based on the parameters one inputs into https://bulk-data.smarthealthit.org/

This means that the attachment URLs are pointing at a different FHIR Base URL than the FHIR server that you initiate export from. For example, this is the URL generated from the website:
https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMDAsImR1ciI6MTAsInRsdCI6MTUsIm0iOjEsInN0dSI6MywiZGVsIjowfQ/fhir
and this is the URL that the attachments point at:
https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJzZWN1cmUiOnRydWV9/fhir/attachments/DICOM.jpg

Because these look, to a naive piece of code, like different FHIR servers, it's generally not safe to send your SMART authentication token to the attachment URL. Many FHIR servers host their images on external image servers like S3, and you wouldn't want to give external servers your token, so our code specifically does not send an authentication token to attachment URLs that aren't hosted on the same FHIR server.

Would it be possible to dynamically rewrite this URL to point to the same URL as the one the export came from, rather than the default configured one?

Support repeated kick-off parameters

According to the spec:

A client MAY repeat kick-off parameters that accept comma delimited values multiple times in a kick-off request. The server SHALL treat the values provided as if they were comma delimited values within a single instance of the parameter.
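
A minimal sketch of that rule using the standard `URLSearchParams` API. `getListParam` is a hypothetical helper name, not something from the server's code.

```javascript
// Hypothetical helper: collect every occurrence of a kick-off parameter and
// treat the result as one comma-delimited list.
function getListParam(queryString, name) {
  const params = new URLSearchParams(queryString);
  return params
    .getAll(name) // every instance of the parameter, in request order
    .flatMap((value) => value.split(","))
    .map((item) => item.trim())
    .filter((item) => item !== "");
}
```

With this, `_type=Patient,Observation&_type=Condition` and `_type=Patient,Observation,Condition` yield the same list.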

Bulk Export Delete request returns wrong status code

See the discussion on zulip https://chat.fhir.org/#narrow/stream/179250-bulk-data/topic/Bulk.20Data.20delete.20request

According to Bulk Data 2.0, the DELETE request (2.5.4):

"Following the delete request, when subsequent requests are made to the polling location, the server SHALL return a 404 Not Found error and an associated FHIR OperationOutcome in JSON format."

That means the following sequence must be supported:

Export Request - Response 202
Delete Request - Response 202
Status Polling Request - Response 404

During our testing, we found that the SmartHealthIT reference server returns 400 for the status polling request in this sequence, and returns 404 only when the subsequent request is another DELETE request.
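
The expected behavior can be modeled with a small in-memory sketch. The names `exportJobs`, `kickOff`, `deleteJob` and `pollStatus` are hypothetical and the job state is simplified; only the status codes follow the sequence quoted above.

```javascript
// Hypothetical in-memory model of export jobs, for illustration only.
const exportJobs = new Map();

// Build a minimal FHIR OperationOutcome for "not found" responses
function notFoundOutcome(jobId) {
  return {
    resourceType: "OperationOutcome",
    issue: [{
      severity: "error",
      code: "not-found",
      diagnostics: `Unknown export job "${jobId}"`,
    }],
  };
}

function kickOff(jobId) {
  exportJobs.set(jobId, { status: "in-progress" });
  return { status: 202 };
}

function deleteJob(jobId) {
  if (!exportJobs.has(jobId)) return { status: 404, body: notFoundOutcome(jobId) };
  exportJobs.delete(jobId);
  return { status: 202 };
}

function pollStatus(jobId) {
  const job = exportJobs.get(jobId);
  // After a DELETE the job is gone, so polling yields 404 rather than 400
  if (!job) return { status: 404, body: notFoundOutcome(jobId) };
  return { status: job.status === "complete" ? 200 : 202 };
}
```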

Bulk Data Delete Requests to delete exported files

After a bulk data request has been started, a client MAY send a DELETE request to the URL provided in the Content-Location header to cancel the request. If the request has been completed, a server MAY use the request as a signal that a client is done retrieving files and that it is safe for the server to remove those from storage. Following the delete request, when subsequent requests are made to the polling location, the server SHALL return a 404 error and an associated FHIR OperationOutcome in JSON format.

JWKS URL not working?

Authentication with a JWKS URL (https://demo.careevolution.com/CNjwtcareevolutioncom.json) fails with a 401 and the error: TypeError: Cannot read property 'keys' of undefined

client ID:

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6InJlZ2lzdHJhdGlvbi10b2tlbiJ9.eyJqd2tzX3VybCI6Imh0dHBzOi8vZGVtby5jYXJlZXZvbHV0aW9uLmNvbS9DTmp3dGNhcmVldm9sdXRpb25jb20uanNvbiIsImlzcyI6Imh0dHBzOi8vY2FyZWV2b2x1dGlvbi5jb20iLCJhY2Nlc3NUb2tlbnNFeHBpcmVJbiI6MTUsImlhdCI6MTUzNzE4NTY1OX0.670yVvoTgDMQd3TahrICfYYrVCLd-FQKo2PLQmUs_3Q

Create a progress bar that shows real import progress

  • The component should probably be hidden when no import is running
  • Show as 0% when the import begins
  • Use CSS transitions for animation
  • Use the X-Progress response header for the current status
  • Use the Retry-After header to decide when to make the next request

Missing referenced Encounter in the data

For example:

{"resourceType":"Condition","id":"o1-a02d1fb0-da2f-477b-87eb-08f83853cdbc","clinicalStatus":"resolved","verificationStatus":"confirmed","code":{"coding":[{"system":"http://snomed.info/sct","code":"65363002","display":"Otitis media"}],"text":"Otitis media"},"subject":{"reference":"Patient/163950fe-3224-4607-80e7-0ff35ab44b21"},"context":{"reference":"Encounter/0e787dd7-e5f5-497a-a702-09a7945fc92a"},"onsetDateTime":"2013-09-10T09:10:16+00:00","abatementDateTime":"2014-09-27T09:10:16+00:00","assertedDate":"2013-09-10T09:10:16+00:00"}

references Encounter/0e787dd7-e5f5-497a-a702-09a7945fc92a, but there is no encounter with that ID in the Encounter file. A number of Condition and Immunization resources appear to have this problem.
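
A small sketch for finding such cases across an export, assuming all resources have already been parsed into one array. `findDanglingReferences` is a hypothetical helper and only relative references of the form Type/id are checked.

```javascript
// Hypothetical helper: list references that point at resources not present
// in the given set of exported resources.
function findDanglingReferences(resources) {
  const known = new Set(resources.map((r) => `${r.resourceType}/${r.id}`));
  const dangling = [];

  function visit(node, source) {
    if (Array.isArray(node)) {
      node.forEach((child) => visit(child, source));
      return;
    }
    if (node && typeof node === "object") {
      if (typeof node.reference === "string" && !known.has(node.reference)) {
        dangling.push({ from: source, to: node.reference });
      }
      Object.values(node).forEach((child) => visit(child, source));
    }
  }

  resources.forEach((r) => visit(r, `${r.resourceType}/${r.id}`));
  return dangling;
}
```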

Support for `_elements` kick-off parameter

  • Optionality for Server: optional, experimental
  • Optionality for Client: optional
  • Type: string of comma-delimited FHIR Elements

Description

When provided, the server SHOULD omit unlisted, non-mandatory elements from the resources returned. Elements should be of the form [resource type].[element name] (e.g. Patient.id) or [element name] (e.g. id) and only root elements in a resource are permitted. If the resource type is omitted, the element should be returned for all resources in the response where it is applicable.

Servers are not obliged to return just the requested elements. Servers SHOULD always return mandatory elements whether they are requested or not. Servers SHOULD mark the resources with the tag SUBSETTED to ensure that the incomplete resource is not actually used to overwrite a complete resource.

Servers unable to support _elements SHOULD return an error and OperationOutcome resource so clients can re-submit a request omitting the _elements parameter.
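
A sketch of the filtering described above. `applyElements` is a hypothetical helper; it keeps only resourceType, id and meta as a stand-in for the per-resource mandatory elements, which are not modeled here, and the tag system shown is the R4 one (an assumption, since other FHIR versions use a different system URL).

```javascript
// Assumed R4 code system for the SUBSETTED tag
const SUBSETTED_TAG = {
  system: "http://terminology.hl7.org/CodeSystem/v3-ObservationValue",
  code: "SUBSETTED",
};

// Hypothetical helper: return a copy of the resource limited to the root
// elements requested via _elements, tagged as SUBSETTED.
function applyElements(resource, elementsParam) {
  const wanted = new Set(["resourceType", "id", "meta"]);
  for (const item of elementsParam.split(",")) {
    const [first, second] = item.trim().split(".");
    if (second === undefined) {
      if (first) wanted.add(first); // "[element name]" applies to all types
    } else if (first === resource.resourceType) {
      wanted.add(second); // "[resource type].[element name]"
    }
  }

  const subset = {};
  for (const key of Object.keys(resource)) {
    if (wanted.has(key)) subset[key] = resource[key];
  }

  // Mark the result so it is not used to overwrite a complete resource
  const meta = { ...(resource.meta || {}) };
  meta.tag = [...(meta.tag || []), SUBSETTED_TAG];
  subset.meta = meta;
  return subset;
}
```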

Problem with JWKS url

I am running the Inferno tests against the SMART bulk data server (https://bulk-data.smarthealthit.org) using a JWKS URL, and I get a 400 with this error:

{
  "error": "invalid_client",
  "error_description": "Requesting the remote JWKS returned an error.\nError: unable to verify the first certificate"
}

If I use a client ID registered with manually entered JWK keys (the same keys served by the JWKS URL), the test passes.

Wrong 'resource not supported' error message

The request:

GET https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MX0/fhir/Patient/$export?_type=Practitioner%2CLocation%2CPatient%2CEncounter%2CAllergyIntolerance%2CCondition%2CProcedure%2CImmunization%2CObservation%2CProcedureRequest%2CDiagnosticReport%2CCoverage%2CMedication%2CMedicationRequest%2CMedicationDispense%2CMedicationAdministration%2CExplanationOfBenefit%2CRelatedPerson%2CClaim

fails with:

The requested resource type "true" is not available on this server

I assume the problem is that one of the resource types we are requesting is not supported, but the message does not say which one.
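
A sketch of a check that names the offending types. `findUnsupportedTypes` and `unsupportedTypesMessage` are hypothetical helpers, and the list of supported types would come from the server's own data.

```javascript
// Hypothetical helper: which requested types are not in the supported list?
function findUnsupportedTypes(typeParam, supportedTypes) {
  const supported = new Set(supportedTypes);
  return typeParam
    .split(",")
    .map((type) => type.trim())
    .filter((type) => type !== "" && !supported.has(type));
}

// Build an error message that lists every unsupported type by name,
// or return null when everything requested is available.
function unsupportedTypesMessage(typeParam, supportedTypes) {
  const missing = findUnsupportedTypes(typeParam, supportedTypes);
  if (missing.length === 0) return null;
  const list = missing.map((type) => `"${type}"`).join(", ");
  return `The requested resource type(s) ${list} are not available on this server`;
}
```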

Build error on sqlite3

Related to #15

I downgraded Node to 8.17 and still get the same build error:

gyp ERR! stack Error: gyp failed with exit code: 1
gyp ERR! stack at ChildProcess.onCpExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/configure.js:351:16)
gyp ERR! stack at emitTwo (events.js:126:13)
gyp ERR! stack at ChildProcess.emit (events.js:214:7)
gyp ERR! stack at Process.ChildProcess._handle.onexit (internal/child_process.js:198:12)
gyp ERR! System Darwin 19.4.0
gyp ERR! command "/usr/local/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "configure" "--fallback-to-build" "--module=/Users/yunweiw/Documents/GitHub/bulk-data-server/node_modules/sqlite3/lib/binding/node-v57-darwin-x64/node_sqlite3.node" "--module_name=node_sqlite3" "--module_path=/Users/yunweiw/Documents/GitHub/bulk-data-server/node_modules/sqlite3/lib/binding/node-v57-darwin-x64"
gyp ERR! cwd /Users/yunweiw/Documents/GitHub/bulk-data-server/node_modules/sqlite3
gyp ERR! node -v v8.17.0
gyp ERR! node-gyp -v v5.0.5
gyp ERR! not ok
