The bulk-data-server from smart-on-fhir

Token with system/*.rs scope gets permission denied for downloading files

Hi, thanks for this super helpful resource.

We were getting permission denied when trying to download data from the authenticated server, and tracked it down to this line: https://github.com/smart-on-fhir/bulk-data-server/blob/master/lib.js#L375

Our token only had system/*.rs, which should (?) be a superset of system/*.read. When I changed our token to include the system/*.read scope, the downloads worked.

Easy to work around but wanted to file an issue for other people who might run into this.

I think this person in Zulip may have encountered the same issue https://chat.fhir.org/#narrow/stream/179170-smart/topic/SMART.20App.20Launcher.20HTTP.20401/near/305210084
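A minimal sketch of a scope check that accepts both the SMART v1 syntax (`read`, `write`, `*`) and the v2 syntax (a subset of `cruds`). The function name `grantsRead` is hypothetical and this is not the code in lib.js; it only illustrates treating `system/*.rs` as covering read access.

```typescript
// Returns true if any scope in the space-delimited list grants
// system-level read access to the given resource type.
// Hypothetical helper, not part of bulk-data-server.
function grantsRead(scopes: string, resourceType: string): boolean {
  return scopes.split(/\s+/).some((scope) => {
    const match = /^system\/([A-Za-z]+|\*)\.([a-z*]+)$/.exec(scope);
    if (!match) {
      return false;
    }
    const type = match[1];
    const access = match[2];
    if (type !== "*" && type !== resourceType) {
      return false;
    }
    // SMART v1 syntax: "read" or "*" both allow reading.
    if (access === "read" || access === "*") {
      return true;
    }
    // SMART v2 syntax: letters from "cruds" in that order; "r" means read.
    return /^c?r?u?d?s?$/.test(access) && access.indexOf("r") !== -1;
  });
}
```

With this check, a token carrying only `system/*.rs` would be allowed to download files, as would one carrying `system/*.read`.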

If the initial request uses auth... all subsequent requests should, too

Right now the following works, and shouldn't:

Kick off an export using an authorization header
Fetch status and data without an authorization header

I had a bug in my client code where I forgot to include an authz header in my data fetch requests, and they worked anyway -- which made it hard to discover my bug :-)
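One way to express the expected behavior, as a sketch: remember whether the kick-off was authorized and reject follow-up requests that lack a bearer token. The `ExportJob` shape and the `followUpStatus` name are assumptions for illustration, and the token itself is not validated here.

```typescript
// Hypothetical record of how an export job was started.
interface ExportJob {
  kickedOffWithAuth: boolean;
}

// Returns the HTTP status a status or file request should receive.
// Only checks that a bearer token is present; it does not verify it.
function followUpStatus(job: ExportJob, authorizationHeader?: string): number {
  const hasBearer =
    typeof authorizationHeader === "string" &&
    /^Bearer\s+\S+/i.test(authorizationHeader);
  if (job.kickedOffWithAuth && !hasBearer) {
    return 401;
  }
  return 200;
}
```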

Implement streaming client for the imported files

Download stream computing the download progress
Estimate remaining time
Batch download tasks
Transform byte streams to JSON FHIR resources
Validate NDJSON
Count the number of resources
Error handling
Implement a task queue to control how many tasks are executed in parallel
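As a sketch of the "Validate NDJSON" and "Count the number of resources" items only, the class below accepts text chunks as they arrive and handles lines that are split across chunks. The class name `NdjsonCounter` is hypothetical, and decoding bytes to text, progress reporting and the task queue are left out.

```typescript
// Incrementally validates NDJSON and counts FHIR resources.
class NdjsonCounter {
  private buffer = "";
  private lineNumber = 0;
  public count = 0;
  public errors: string[] = [];

  // Feed the next chunk of text from the download stream.
  push(chunk: string): void {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or an empty string).
    this.buffer = lines.pop() || "";
    for (const line of lines) {
      this.checkLine(line);
    }
  }

  // Call once the stream has ended to flush a final unterminated line.
  end(): void {
    if (this.buffer.length > 0) {
      this.checkLine(this.buffer);
    }
    this.buffer = "";
  }

  private checkLine(line: string): void {
    this.lineNumber++;
    if (line.trim() === "") {
      return; // empty lines are skipped, not counted
    }
    try {
      const resource = JSON.parse(line);
      if (resource && typeof resource.resourceType === "string") {
        this.count++;
      } else {
        this.errors.push("Line " + this.lineNumber + ": missing resourceType");
      }
    } catch (_err) {
      this.errors.push("Line " + this.lineNumber + ": invalid JSON");
    }
  }
}
```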

Support for POST based kick-off requests with Parameters resource

Build error on sqlite3

I am trying to set up the Bulk Data Server and got build errors for sqlite3 when running the 'npm i' command. Here is one of the errors:

gyp ERR! stack Error: gyp failed with exit code: 1
gyp ERR! stack at ChildProcess.onCpExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/configure.js:345:16)
gyp ERR! stack at ChildProcess.emit (events.js:200:13)
gyp ERR! stack at Process.ChildProcess._handle.onexit (internal/child_process.js:272:12)
gyp ERR! System Darwin 19.3.0
gyp ERR! command "/usr/local/Cellar/node/12.5.0/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "configure" "--fallback-to-build" "--module=/Users/yunweiw/Documents/GitHub/bulk-data-server/node_modules/sqlite3/lib/binding/node-v72-darwin-x64/node_sqlite3.node" "--module_name=node_sqlite3" "--module_path=/Users/yunweiw/Documents/GitHub/bulk-data-server/node_modules/sqlite3/lib/binding/node-v72-darwin-x64"
gyp ERR! cwd /Users/yunweiw/Documents/GitHub/bulk-data-server/node_modules/sqlite3
gyp ERR! node -v v12.5.0
gyp ERR! node-gyp -v v3.8.0
gyp ERR! not ok

Any idea how I could fix that?

Server requires kty in client_assertion header, but this isn't required in the IG

The server appears to be requiring a kty header:

bulk-data-server/token_handler.js

Line 200 in 0923507

// Filter the potential keys to retain only those where the kty and

However, this doesn't appear to be required in the IG:
https://hl7.org/fhir/uv/bulkdata/authorization/index.html#protocol-details and

Implement system-level export (`/$export`)

URL overloading/conflict

Patient/$everything is already an established operation.

How about an alternative of /$bulkdata (a system-level operation) or if you want the operation "compartmentalized" within established FHIR compartments: Group/[id]/$bulkdata

metadata endpoint not FHIR conformant

During the recent CMS Connectathon Bulk Data Track an issue was found with the metadata endpoint to retrieve the CapabilityStatement.

The issue is that the metadata endpoint does not accept the HTTP request Accept Header mime type with the ";charset=UTF-8" parameter. The server sends back a 400 Bad Request with a response payload message of "Only the JSON format is supported".

The metadata request with an HTTP request Accept Header without the ";charset=UTF-8" parameter works fine with either the FHIR JSON "application/fhir+json" or simple JSON "application/json" mime-type. Also, sending the HTTP request Accept-Charset header with a "utf-8" value works fine.

A related observation is that the HTTP response Content-Type mime type returned is the simple JSON with the charset parameter "application/json; charset=utf-8" when the expected value should be at a minimum the FHIR JSON mime-type.
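A sketch of an Accept header check that ignores media type parameters such as `;charset=UTF-8`. The function name `acceptsJson` and the list of accepted types are assumptions, not the server's actual implementation.

```typescript
// Returns true if the Accept header lists a JSON media type,
// ignoring parameters such as charset or q-values.
function acceptsJson(acceptHeader: string): boolean {
  const jsonTypes = ["application/fhir+json", "application/json", "*/*"];
  return acceptHeader.split(",").some((part) => {
    // Keep only the media type, dropping everything after the first ";".
    const mime = part.split(";")[0].trim().toLowerCase();
    return jsonTypes.indexOf(mime) !== -1;
  });
}
```

A handler built on this would accept `application/fhir+json;charset=UTF-8` and could then reply with a Content-Type of `application/fhir+json; charset=utf-8`.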

Request Builder

Create UI component that can be used to build a list of files (URL) that should be imported

Add UI checkbox to "require authentication"

Default to false, 401 error on requests when true

Implement the "patient" parameter

Not applicable to system level export requests. When provided, the server SHALL NOT return resources in the patient compartments belonging to patients outside of this list. If a client requests patients who are not present on the server (or in the case of a group level export, who are not members of the group), the server SHOULD return details via an OperationOutcome resource in an error response to the request.

Servers unable to support patient SHOULD return an error and OperationOutcome resource so clients can re-submit a request omitting the patient parameter.

Wrong authentication error responses

The authentication error responses are 401 with a plain-text body containing the error message, but according to the OAuth2 spec they should be 400 with a JSON body containing the error description.
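A sketch of the response shape described in RFC 6749, section 5.2: status 400 and a JSON body with `error` and `error_description`. The `HttpResponse` type and the `oauthError` name are hypothetical. The RFC also allows 401 for `invalid_client` in some cases, which this sketch does not cover.

```typescript
// Hypothetical, framework-independent response description.
interface HttpResponse {
  status: number;
  headers: { [name: string]: string };
  body: string;
}

// Builds an OAuth2 token endpoint error response.
function oauthError(error: string, description: string): HttpResponse {
  return {
    status: 400,
    headers: {
      "Content-Type": "application/json;charset=UTF-8",
      "Cache-Control": "no-store",
      "Pragma": "no-cache"
    },
    body: JSON.stringify({ error: error, error_description: description })
  };
}
```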

Fetching large payloads goes slowly; node pegs the CPU

curl -o /dev/null http://localhost:9443/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMDAwLCJkdXIiOjAsInRsdCI6MTUsIm0iOjEwMDAwLCJyZXF1ZXN0U3RhcnQiOjE1MTcxNTIxMTY3MTEsIm9mZnNldCI6NDAwMDAwMDAsImxpbWl0IjoxMDAwMDAwfQ/fhir/bulkfiles/41.Observation.ndjson

Shows that speed goes ~80-200 kb/s on my link machine and on @gotdan's Mac. Node is pegged to 100% CPU. This is surprising! There may be an opportunity to load more data into memory and issue fewer sqlite calls :-)

Group bundle not valid

Resource fields appear in root of entry item rather than nested in resource object

Update bulk data server to latest version of the auth spec

Support specification at https://github.com/smart-on-fhir/fhir-bulk-data-docs/blob/master/authorization.md . In particular the section "Registering a SMART Backend Service (communicating public keys)" has been added, and the section "Server Obligations for Signature Verification" has been expanded.

Add CORS support to token endpoint

Necessary to support browser based examples

Don't use relative URLs in DocumentReference attachments

We should only use absolute URLs or inline base-64 data

Implement the latest spec

Follow https://github.com/smart-on-fhir/fhir-bulk-data-docs/blob/master/README.md

Rewrite DocumentReference URLs to the export base url, not the default base URL

Attachment URLs currently point at the configured "base URL" of the FHIR server:

bulk-data-server/transforms/dbRowTranslator.ts

Lines 101 to 119 in 0cea993

    
    // Rewrite urls in DocumentReference resources. Only url props
    // that begin with `/files/` will be converted to absolute HTTP
    // URLs to allow the client to directly download bigger files
    if (row.resource_json.resourceType == "DocumentReference") {
        const url = getPath(row.resource_json, "content.0.attachment.url");
        if (url && url.search(/\/attachments\/.*/) === 0) {
            row.resource_json.content[0].attachment.url = buildUrlPath(
                baseUrl,
                base64url.encode(JSON.stringify({
                    err   : sim.err || "",
                    secure: !!sim.secure
                })),
                "fhir",
                url
            );
        }
    }
}

However, in practice the real Base URL for this server can change based on the parameters one inputs into https://bulk-data.smarthealthit.org/

This means that the attachment URLs are pointing at a different FHIR Base URL than the FHIR server that you initiate export from. For example, this is the URL generated from the website:
https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMDAsImR1ciI6MTAsInRsdCI6MTUsIm0iOjEsInN0dSI6MywiZGVsIjowfQ/fhir
and this is the URL that the attachments point at:
https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJzZWN1cmUiOnRydWV9/fhir/attachments/DICOM.jpg

Because these look like different FHIR servers to a naive piece of code, it's generally not safe to send your SMART authentication token to the attachment server. Many FHIR servers host their images on external image servers like S3, and you wouldn't want to give external servers your token, so our code specifically does not send an authentication token to attachment URLs that aren't hosted on the same FHIR server.

Would it be possible to dynamically rewrite this URL to point to the same URL as the one the export came from, rather than the default configured one?
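A sketch of the requested behavior, assuming the FHIR base URL of the incoming export request is available to the transform. Both function names are hypothetical; `isSameFhirServer` mirrors the naive client-side check described above.

```typescript
// Builds an absolute attachment URL under the base URL the export
// was requested from, instead of the default configured base URL.
function rewriteAttachmentUrl(exportBaseUrl: string, attachmentPath: string): string {
  const base = exportBaseUrl.replace(/\/+$/, "");
  const path = attachmentPath.replace(/^\/+/, "");
  return base + "/" + path;
}

// Naive client-side check: is this URL hosted under the same FHIR base URL?
function isSameFhirServer(url: string, fhirBaseUrl: string): boolean {
  const base = fhirBaseUrl.replace(/\/+$/, "");
  return url.indexOf(base + "/") === 0;
}
```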

Is the readme outdated for required Node version?

The readme states that the server works on Node 7.9, 8 and 9. However, I see there was a commit updating package.json to require Node >= 20.

Fix the conformance statement

The type of CapabilityStatement.rest[0].operation[n].definition changed from Reference(OperationDefinition) to Canonical(OperationDefinition) from STU3 to R4.
Patient and group export definitions need to go at resource level. The current CapabilityStatement in the IG is incorrect. The fixed version is at http://build.fhir.org/ig/HL7/bulk-data/branches/tech-correct/

Create an endpoint for dynamic OperationOutcome resources

This would be an endpoint (say /outcome) that relies on few query parameters and replies with an OperationOutcome as JSON.

For reference see:

Support repeated kick-off parameters

According to the spec:

A client MAY repeat kick-off parameters that accept comma delimited values multiple times in a kick-off request. The server SHALL treat the values provided as if they were comma delimited values within a single instance of the parameter.
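A sketch of the merging rule quoted above. Many HTTP frameworks expose a repeated query parameter as an array of strings, so the hypothetical helper `mergeKickOffParam` accepts either a single string or an array and returns one flat list.

```typescript
// Treats repeated parameter instances as if their values were
// comma-delimited within a single instance of the parameter.
function mergeKickOffParam(values: string | string[] | undefined): string[] {
  if (values === undefined) {
    return [];
  }
  const list = typeof values === "string" ? [values] : values;
  const merged: string[] = [];
  for (const value of list) {
    for (const item of value.split(",")) {
      const trimmed = item.trim();
      if (trimmed !== "") {
        merged.push(trimmed);
      }
    }
  }
  return merged;
}
```

For example, `_type=Patient,Observation&_type=Condition` would be handled the same way as `_type=Patient,Observation,Condition`.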

Bulk Export Delete request returns wrong status code

See the discussion on zulip https://chat.fhir.org/#narrow/stream/179250-bulk-data/topic/Bulk.20Data.20delete.20request

According to Bulk Data 2.0, the DELETE request (2.5.4):

"Following the delete request, when subsequent requests are made to the polling location, the server SHALL return a 404 Not Found error and an associated FHIR OperationOutcome in JSON format."

That means the following sequence shall be supported:

Export Request - Response 202
Delete Request - Response 202
Status Polling Request - Response 404

During our testing, we found that the SmartHealthIT reference server returns 400 for this sequence, and returns 404 only when the subsequent request is another DELETE request.
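The expected sequence can be sketched with a small in-memory job store. The class `ExportJobs` is hypothetical and simplified: polling an active job always answers 202 here, and the OperationOutcome body that should accompany the 404 is omitted.

```typescript
// Minimal in-memory model of export job lifecycle status codes.
class ExportJobs {
  private active: { [id: string]: boolean } = {};
  private nextId = 1;

  // Kick-off: accept the export and return a job id for polling.
  kickOff(): { status: number; id: string } {
    const id = String(this.nextId++);
    this.active[id] = true;
    return { status: 202, id };
  }

  // DELETE on the polling location: 202 if the job existed, 404 otherwise.
  remove(id: string): number {
    if (!this.active[id]) {
      return 404;
    }
    delete this.active[id];
    return 202;
  }

  // Status polling: 404 once the job has been deleted.
  poll(id: string): number {
    return this.active[id] ? 202 : 404;
  }
}
```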

Bulk Data Delete Requests to delete exported files

After a bulk data request has been started, a client MAY send a DELETE request to the URL provided in the Content-Location header to cancel the request. If the request has been completed, a server MAY use the request as a signal that a client is done retrieving files and that it is safe for the server to remove those from storage. Following the delete request, when subsequent requests are made to the polling location, the server SHALL return a 404 error and an associated FHIR OperationOutcome in JSON format.

Return error if XML version of conformance statement is requested

Add support for import from S3 buckets

Add a few ndjson files to S3 Bucket so that we can try that

JWKS URL not working?

Trying authentication with a JWKS url: https://demo.careevolution.com/CNjwtcareevolutioncom.json fails with 401 TypeError: Cannot read property 'keys' of undefined

client ID:

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6InJlZ2lzdHJhdGlvbi10b2tlbiJ9.eyJqd2tzX3VybCI6Imh0dHBzOi8vZGVtby5jYXJlZXZvbHV0aW9uLmNvbS9DTmp3dGNhcmVldm9sdXRpb25jb20uanNvbiIsImlzcyI6Imh0dHBzOi8vY2FyZWV2b2x1dGlvbi5jb20iLCJhY2Nlc3NUb2tlbnNFeHBpcmVJbiI6MTUsImlhdCI6MTUzNzE4NTY1OX0.670yVvoTgDMQd3TahrICfYYrVCLd-FQKo2PLQmUs_3Q

Create a component to generate CURL command from the UI

For now, this should be a function that collects inputs from UI, builds the command text and renders it into pre or readonly textarea. Later, if we support multiple targets it may have to be a tab-bar of some kind.
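A sketch of the text-building part, with the UI inputs modeled as a plain options object. The names `CurlOptions`, `shellQuote` and `buildCurlCommand` are hypothetical; rendering into a pre or textarea element is left out.

```typescript
// Hypothetical shape of the values collected from the UI.
interface CurlOptions {
  url: string;
  method?: string;
  headers?: { [name: string]: string };
}

// Wraps a value in single quotes for a POSIX shell.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Builds the command text to display to the user.
function buildCurlCommand(options: CurlOptions): string {
  const parts = ["curl", "-X", options.method || "GET"];
  const headers = options.headers || {};
  for (const name of Object.keys(headers)) {
    parts.push("-H", shellQuote(name + ": " + headers[name]));
  }
  parts.push(shellQuote(options.url));
  return parts.join(" ");
}
```

Single quotes keep the shell from expanding the `$export` part of a kick-off URL.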

Create a progress bar that shows real import progress

The component should probably be hidden when no import is running
Show as 0% when the import begins
Use CSS transitions for animation
Use the X-Progress response header for the current status
Use the Retry-After header to decide when to make the next request
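A sketch of the header handling behind the last two items. It assumes the X-Progress value contains a percentage such as "45% complete"; the header is free text, so `parseProgress` returns null when no percentage is found. Both function names are hypothetical.

```typescript
// Extracts a percentage (0-100) from an X-Progress header value,
// or returns null if the value contains no percentage.
function parseProgress(xProgress: string | null): number | null {
  if (!xProgress) {
    return null;
  }
  const match = /(\d+(?:\.\d+)?)\s*%/.exec(xProgress);
  if (!match) {
    return null;
  }
  return Math.max(0, Math.min(100, parseFloat(match[1])));
}

// Converts a Retry-After value (seconds or an HTTP date) into a delay
// in milliseconds, falling back to fallbackMs when it cannot be parsed.
function retryDelayMs(retryAfter: string | null, nowMs: number, fallbackMs: number): number {
  if (!retryAfter) {
    return fallbackMs;
  }
  if (/^\s*\d+\s*$/.test(retryAfter)) {
    return parseInt(retryAfter, 10) * 1000;
  }
  const dateMs = Date.parse(retryAfter);
  if (isNaN(dateMs)) {
    return fallbackMs;
  }
  return Math.max(0, dateMs - nowMs);
}
```

The progress bar width can then be set from `parseProgress`, and the next status request scheduled after `retryDelayMs`.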

Missing referenced Encounter in the data

For example:

{"resourceType":"Condition","id":"o1-a02d1fb0-da2f-477b-87eb-08f83853cdbc","clinicalStatus":"resolved","verificationStatus":"confirmed","code":{"coding":[{"system":"http://snomed.info/sct","code":"65363002","display":"Otitis media"}],"text":"Otitis media"},"subject":{"reference":"Patient/163950fe-3224-4607-80e7-0ff35ab44b21"},"context":{"reference":"Encounter/0e787dd7-e5f5-497a-a702-09a7945fc92a"},"onsetDateTime":"2013-09-10T09:10:16+00:00","abatementDateTime":"2014-09-27T09:10:16+00:00","assertedDate":"2013-09-10T09:10:16+00:00"}

references Encounter/0e787dd7-e5f5-497a-a702-09a7945fc92a but there is no encounter with such an ID in the Encounter file. There appear to be a bunch of Condition and Immunization with this problem
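A sketch of a check that could find such dangling references: it walks every resource in an NDJSON string, collects `reference` values of one target type, and reports those whose ID is not in a list of known IDs. The function names are hypothetical, and the known IDs would come from the Encounter file.

```typescript
// Recursively collects every string-valued "reference" property.
function collectReferences(node: unknown, found: string[]): void {
  if (Array.isArray(node)) {
    for (const item of node) {
      collectReferences(item, found);
    }
  } else if (node !== null && typeof node === "object") {
    const obj = node as { [key: string]: unknown };
    for (const key of Object.keys(obj)) {
      const value = obj[key];
      if (key === "reference" && typeof value === "string") {
        found.push(value);
      } else {
        collectReferences(value, found);
      }
    }
  }
}

// Returns the unique "<targetType>/<id>" references whose id is unknown.
function findMissingReferences(ndjson: string, targetType: string, knownIds: string[]): string[] {
  const missing: string[] = [];
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") {
      continue;
    }
    const refs: string[] = [];
    collectReferences(JSON.parse(line), refs);
    for (const ref of refs) {
      const parts = ref.split("/");
      const isTarget = parts.length === 2 && parts[0] === targetType;
      if (isTarget && knownIds.indexOf(parts[1]) === -1 && missing.indexOf(ref) === -1) {
        missing.push(ref);
      }
    }
  }
  return missing;
}
```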

Allow (in the UI) public keys that are not base64 encoded

Support for `_elements` kick-off parameter

Optionality for Server: optional, experimental
Optionality for Client: optional
Type: string of comma-delimited FHIR Elements

Description

When provided, the server SHOULD omit unlisted, non-mandatory elements from the resources returned. Elements should be of the form [resource type].[element name] (eg. Patient.id) or [element name] (eg. id) and only root elements in a resource are permitted. If the resource type is omitted, the element should be returned for all resources in the response where it is applicable.

Servers are not obliged to return just the requested elements. Servers SHOULD always return mandatory elements whether they are requested or not. Servers SHOULD mark the resources with the tag SUBSETTED to ensure that the incomplete resource is not actually used to overwrite a complete resource.

Servers unable to support _elements SHOULD return an error and OperationOutcome resource so clients can re-submit a request omitting the _elements parameter.
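A simplified sketch of the subsetting described above. It always keeps `resourceType`, `id` and `meta` as a stand-in for mandatory elements (a real implementation would need the per-resource definitions), and it assumes the R4 code system URL for the SUBSETTED tag. The names `applyElements` and `SUBSETTED_TAG` are hypothetical.

```typescript
type Resource = { [key: string]: any; resourceType: string };

// Assumed R4 coding for the SUBSETTED tag.
const SUBSETTED_TAG = {
  system: "http://terminology.hl7.org/CodeSystem/v3-ObservationValue",
  code: "SUBSETTED"
};

// Returns a copy of the resource with only the requested root elements.
function applyElements(resource: Resource, elements: string[]): Resource {
  const keep = ["resourceType", "id", "meta"];
  for (const element of elements) {
    const parts = element.trim().split(".");
    if (parts.length === 1) {
      keep.push(parts[0]); // applies to every resource type
    } else if (parts.length === 2 && parts[0] === resource.resourceType) {
      keep.push(parts[1]); // applies to this resource type only
    }
  }
  const result: Resource = { resourceType: resource.resourceType };
  for (const key of Object.keys(resource)) {
    if (keep.indexOf(key) !== -1) {
      result[key] = resource[key];
    }
  }
  // Copy meta so the input is not mutated, then add the tag.
  const meta = result.meta ? JSON.parse(JSON.stringify(result.meta)) : {};
  meta.tag = (meta.tag || []).concat([SUBSETTED_TAG]);
  result.meta = meta;
  return result;
}
```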

Problem with JWKS url

I am running the Inferno tests against the SMART bulk data server (https://bulk-data.smarthealthit.org) using a JWKS URL.
I got a 400 with this error:

{
  "error": "invalid_client",
  "error_description": "Requesting the remote JWKS returned an error.\nError: unable to verify the first certificate"
}

If I use ClientID registered with manually entered JWK keys (the same key used by JWKS url), the test passed.

Wrong references in the FHIR data

The reference URIs are currently urn:uuid:<guid>; they should be <type>/<guid> instead.

See smart-on-fhir/fhir-bulk-data-docs#69

Wrong 'resource not supported' error message

The request:

GET https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MX0/fhir/Patient/$export?_type=Practitioner%2CLocation%2CPatient%2CEncounter%2CAllergyIntolerance%2CCondition%2CProcedure%2CImmunization%2CObservation%2CProcedureRequest%2CDiagnosticReport%2CCoverage%2CMedication%2CMedicationRequest%2CMedicationDispense%2CMedicationAdministration%2CExplanationOfBenefit%2CRelatedPerson%2CClaim

fails with:

The requested resource type "true" is not available on this server

I assume the problem is that one of the requested resource types is not supported, but the message does not say which one.
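A sketch of how the `_type` value could be checked so that the message names the offending types. The function names and the exact message wording are assumptions; the list of supported types would come from the server's data.

```typescript
// Returns the requested types that are not in the supported list.
function unsupportedTypes(typeParam: string, supported: string[]): string[] {
  const result: string[] = [];
  for (const item of typeParam.split(",")) {
    const type = item.trim();
    if (type !== "" && supported.indexOf(type) === -1 && result.indexOf(type) === -1) {
      result.push(type);
    }
  }
  return result;
}

// Builds an error message naming every unsupported type,
// or returns null when all requested types are available.
function typeErrorMessage(typeParam: string, supported: string[]): string | null {
  const missing = unsupportedTypes(typeParam, supported);
  if (missing.length === 0) {
    return null;
  }
  const quoted = missing.map((type) => '"' + type + '"').join(", ");
  return "The requested resource type(s) " + quoted + " are not available on this server";
}
```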

Build error on sqlite3

Related to #15

I downgraded Node to 8.17 and still get the same build error:

gyp ERR! stack Error: gyp failed with exit code: 1
gyp ERR! stack at ChildProcess.onCpExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/configure.js:351:16)
gyp ERR! stack at emitTwo (events.js:126:13)
gyp ERR! stack at ChildProcess.emit (events.js:214:7)
gyp ERR! stack at Process.ChildProcess._handle.onexit (internal/child_process.js:198:12)
gyp ERR! System Darwin 19.4.0
gyp ERR! command "/usr/local/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "configure" "--fallback-to-build" "--module=/Users/yunweiw/Documents/GitHub/bulk-data-server/node_modules/sqlite3/lib/binding/node-v57-darwin-x64/node_sqlite3.node" "--module_name=node_sqlite3" "--module_path=/Users/yunweiw/Documents/GitHub/bulk-data-server/node_modules/sqlite3/lib/binding/node-v57-darwin-x64"
gyp ERR! cwd /Users/yunweiw/Documents/GitHub/bulk-data-server/node_modules/sqlite3
gyp ERR! node -v v8.17.0
gyp ERR! node-gyp -v v5.0.5
gyp ERR! not ok

