hl7 / bulk-data

Bulk Data Implementation Guide

License: Other

HTML 8.35% Batchfile 16.56% Shell 9.71% GLSL 65.38%

bulk-data's Issues

GF#21041 Consistent references

Resolution from https://gforge.hl7.org/gf/project/fhir/tracker/?action=TrackerItemEdit&tracker_item_id=21041&start=0 has not been applied

Non-consecutive section numbers in backend services section 10

GF#21798 We need to include an index of pages at the bottom of the main page

Below http://build.fhir.org/ig/HL7/bulk-data/#resources should be a bullet list of the 5 pages repeated from the titlebar, since some users did not find the titlebar

gforge #21923 - error status code only references 5XX, and should include 4XX also

Banner for the IG reads "This is the Version 0.1.0 Release" - should it be 1.0 now?

Where does this come from - is it a config somewhere? Will it fix itself when deployed to hl7.org?

Add file splitting behavior to server capability documentation section

Update descriptions in OperationDefinitions to match text of export operation page

OperationDefinition "official URL"s appear incorrect (has /us as part of path)

http://build.fhir.org/ig/HL7/bulk-data/OperationDefinition-export.html

The official URL for this operation definition is:
http://hl7.org/fhir/us/bulkdata/OperationDefinition/export

GF#21935 We need a table of parameters for response

Resolution from https://gforge.hl7.org/gf/project/fhir/tracker/?action=TrackerItemEdit&tracker_item_id=21935 has not been implemented.

Request url in completion manifest is less useful for POST requests

For POST requests, the url returned as the "request" key in the completion manifest does not contain enough information to retry the request. Need to decide if this is a problem in practice and if so, how to remediate.

Build fail due to publish box

Exception in thread "main" java.lang.Error: The auto-build infrastructure does not publish IGs that contain HTML pages without the publish-box present. For further information, see note at http://wiki.hl7.org/index.php?title=FHIR_Implementation_Guide_Publishing_Requirements#HL7_HTML_Standards_considerations

EC384 should say ES384

In two spots we have a typo where EC384 should say ES384.

Change "ValueSet" to "ValueSet Resources" in index, section 1.3

Resource Deletions

Clients can use the _since parameter to retrieve only resources that have been updated after a point in time. However, there is currently no way for a client to determine if a previously retrieved resource needs to be deleted without re-pulling all available data and performing a comparison (this could be across many GB of data). Examples: delete requests that need to be propagated to downstream systems or data ascribed to the wrong patient that needs to be removed.
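As a rough illustration of the comparison the issue describes, the sketch below diffs two complete exports by resource type and id to find candidates for deletion. The NDJSON snippets and the helper name `resource_keys` are hypothetical; a real client would stream files rather than hold them in memory.

```python
import json

def resource_keys(ndjson_text):
    """Collect (resourceType, id) pairs from one NDJSON export file."""
    keys = set()
    for line in ndjson_text.splitlines():
        if line.strip():
            resource = json.loads(line)
            keys.add((resource["resourceType"], resource["id"]))
    return keys

# Hypothetical contents of an earlier full export and the latest full export.
previous = '{"resourceType":"Patient","id":"1"}\n{"resourceType":"Patient","id":"2"}\n'
latest = '{"resourceType":"Patient","id":"1"}\n'

# Anything seen before but absent now is a deletion candidate.
deleted = sorted(resource_keys(previous) - resource_keys(latest))
print(deleted)  # [('Patient', '2')]
```

This only works when both exports are complete, which is exactly the cost the issue is raising: an export limited by _since cannot be used for the comparison.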

gforge #21682 requested addition of public health use case to overview page, but it was added to the backend services page

Suggest we leave the backend services page as is, but add the following to section 1 on overview page too:
"Public health surveillance systems that do not require real-time exchange of data."

Update OperationDefinitions to match new language for _type etc

On authorization page, should the public health use case be merged with the other examples?

GF#21847

Add https://tools.ietf.org/html/rfc7240 to the underlying standards section on export.html

Bulk export should be invocable via HTTP POST

http://hl7.org/fhir/uv/bulkdata/STU1/export/index.html#request-flow shows that the $export operation is invocable via HTTP GET, but I don't think it says anything about POST.

Meanwhile, the base FHIR specification indicates that operations can be invoked via HTTP POST: https://www.hl7.org/fhir/operations.html#executing

Additionally, it says that

Operations may be invoked using a GET, with parameters as HTTP URL parameters, if:

there are only simple input parameters - i.e. no complex datatypes like 'Identifier' or 'Reference', and

the operation does not affect the state of the server

Based on that, I think it's OK to continue supporting $export via GET, but I'd like to see the bulkdata specification indicate that POST must be supported as well (and maybe even should be preferred given that GETs should be idempotent/repeatable).

There was some discussion on the bulk data argonaut call on this front and, although no one was a big fan of the FHIR Parameters object, this is the wrapper that FHIR specifies for passing parameters while invoking extended operations like $export via POST.
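For illustration only, here is a minimal sketch of what a POST body wrapped in a FHIR Parameters resource might look like. The choice of value types (valueString, valueInstant) and of repeating _type once per resource type are assumptions made for this example, not something this issue settles, and the helper name is hypothetical.

```python
import json

def build_export_parameters(output_format=None, since=None, types=None):
    """Wrap $export inputs in a FHIR Parameters resource for a POST kick-off."""
    parameter = []
    if output_format is not None:
        parameter.append({"name": "_outputFormat", "valueString": output_format})
    if since is not None:
        # Assumption: _since is carried as a FHIR instant.
        parameter.append({"name": "_since", "valueInstant": since})
    for resource_type in types or []:
        # Assumption: one _type parameter per requested resource type.
        parameter.append({"name": "_type", "valueString": resource_type})
    return {"resourceType": "Parameters", "parameter": parameter}

body = build_export_parameters(
    output_format="application/fhir+ndjson",
    since="2019-01-01T00:00:00Z",
    types=["Patient", "Observation"],
)
print(json.dumps(body, indent=2))
```

The same inputs sent via GET would simply appear as URL query parameters, which is why the GET form only works while every input stays a simple type.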

On Export page, Patient compartment text uses SHALL in one place and SHOULD in another

I think both should be SHOULD?

For non-system-level requests, the Patient Compartment SHOULD be used as a point of reference for recommended resources to be returned.

Technical corrections to bulk data conformance requirements

FHIR-24434
http://hl7.org/fhir/uv/bulkdata/operations/index.html: "To declare conformance with this IG, a server should include the following URL in its own CapabilityStatement.instantiates: http://www.hl7.org/fhir/bulk-data/CapabilityStatement-bulk-data.html" should say "http://hl7.org/fhir/uv/bulkdata/CapabilityStatement/bulk-data" (see http://hl7.org/fhir/uv/bulkdata/CapabilityStatement-bulk-data.json.html for details).
Also for technical correction: https://gforge.hl7.org/gf/project/fhir/tracker/?action=TrackerItemEdit&tracker_item_id=23864

It would also be good to add a parenthetical on DELETE like "after an export is complete, a server MAY use DELETE as a signal that a client is done retrieving files and that it is safe for the server to remove those from storage."
Also for technical correction: https://gforge.hl7.org/gf/project/fhir/tracker/?action=TrackerItemEdit&tracker_item_id=24912

Move _typeFilter into parameters table

gforge #21957 - operation descriptions still don't reference async pattern

Update each operation to state as part of the description: "The FHIR server MUST support invocation of this operation using the FHIR Asynchronous Request Pattern."

Update each operation's parameters list. Continue to include the _outputFormat parameter inline to provide a holistic view, but condense the "Documentation" text, adding a reference to FHIR async.html: "The format for the requested bulk data files to be generated as per FHIR Asynchronous Request Pattern. Defaults to application/fhir+ndjson."

Header lists "BulkDataAccess IG: STU1 Ballot #1"

I don't know the HL7 mechanics here - should we change this to ballot #2?

Add optional parameter to specify a cloud bucket as an output location?

Open Questions:

Auth: Do we want to only target servers that have pre-configured write permissions to a bucket, or do we need a way to pass in auth credentials to the server? If so, what would this look like?

Path: Do we need information in addition to the bucket name, like a file prefix (e.g. to support a "folder" within the bucket that incorporates a timestamp), or a service provider (to support use cases where the server is writing to a bucket provided by a different cloud vendor)?

Completion: Should we require that the output manifest file be written to the bucket last so it could be used as an event to trigger followup actions (eg. a de-id or db load) or would we expect clients to use job polling to determine all files have been written?

Provide a list of "tunable parameters" that servers can control

What are the places where a server developer might make a choice that clients need to be aware of? Can we list them in one spot? Examples would include:

Does this server restrict responses to a specific "profile" like US Core or Blue Button?
Does this server support _since for new members in a group?
What outputFormats does this server support?
Does the server support system-wide (or all-patients, or Group-level) export? [This is captured in CapabilityStatement today]

...

Let's build out this list and create a section to track it.

Group Membership Additions

The _since parameter currently limits returned resources to those whose state has changed after the supplied timestamp. In the case of a query like GET [fhir base]/Group/[id]/$export?_since=[timestamp], results will not include resources for patients added to the group after timestamp if their resource modification date is earlier than timestamp, necessitating a pull of all available data for a group on each data request (this could be many GB of data for a large group).
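The gap can be reproduced with a toy model. In the sketch below, the resources, patient ids, and dates are all made up, and `export_since` is a hypothetical stand-in for a server that filters a Group export purely on resource modification time.

```python
from datetime import datetime, timezone

def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)

# Hypothetical resources; pat-C was added to the group on 2019-02-20,
# but its data was last modified before that.
resources = [
    {"id": "obs-1", "patient": "pat-A", "last_updated": utc(2019, 1, 10)},
    {"id": "obs-2", "patient": "pat-B", "last_updated": utc(2019, 3, 5)},
    {"id": "obs-3", "patient": "pat-C", "last_updated": utc(2018, 12, 1)},
]
group_members = {"pat-A", "pat-B", "pat-C"}

def export_since(resources, members, since):
    """Return ids of group resources modified after `since`."""
    return [
        r["id"]
        for r in resources
        if r["patient"] in members and r["last_updated"] > since
    ]

# The client last exported on 2019-02-01, before pat-C joined.
exported = export_since(resources, group_members, utc(2019, 2, 1))
print(exported)  # ['obs-2']
```

obs-1 is correctly skipped because the client already has it, but obs-3 is also skipped even though the client has never received any data for pat-C.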

Last paragraph in Backend Services section 6.0 is repetitive

Suggest we delete the last sentence of this paragraph:

To begin the exchange, the client SHALL use the Transport Layer Security (TLS) Protocol Version 1.2 (RFC5246) or a more recent version of TLS to authenticate the identity of the FHIR authorization server and to establish an encrypted, integrity-protected link for securing all exchanges between the client and the authorization server’s token endpoint. All exchanges described herein between the client and the FHIR server SHALL be secured using TLS V1.2 or a more recent version of TLS.

Content under "Bulk Data Status Request" is still disorganized

http://build.fhir.org/ig/HL7/bulk-data/export/index.html#bulk-data-status-request has a table which now repeats the information below it with some variations.

Issues

The 5xx error has an example in the table and an example below in http://build.fhir.org/ig/HL7/bulk-data/export/index.html#response---error-status-1 but they're different. Is there a reason to include both, and is there a reason they're different? The example below perhaps is supposed to be an example associated with the File Request (http://build.fhir.org/ig/HL7/bulk-data/export/index.html#file-request) section instead, since its error string mentions a deleted file
Intro note says "Servers SHOULD supply a Retry-After header with a http date or a delay time in seconds" but this doesn't appear in our examples table or in http://build.fhir.org/ig/HL7/bulk-data/export/index.html#response---in-progress-status (it should probably appear in both places)
"The choice of when to determine that an export job has failed in its entirety (error status) vs returning a partial success (complete status) is left up to the implementer." belongs in the "Note: even if some..." paragraph
Also #21 applies here to http://build.fhir.org/ig/HL7/bulk-data/export/index.html#response---complete-status
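On the Retry-After point in the list above, a client has to accept both forms the intro note allows. The following is a minimal sketch using only the Python standard library; the function name and the choice to clamp past dates to zero are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_delay_seconds(header_value, now=None):
    """Convert a Retry-After value (delay in seconds or HTTP date) to seconds."""
    now = now or datetime.now(timezone.utc)
    value = header_value.strip()
    if value.isdigit():
        # Delay form, e.g. "120".
        return float(value)
    # HTTP date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    retry_at = parsedate_to_datetime(value)
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    # Assumption: a date in the past means the client may poll immediately.
    return max(0.0, (retry_at - now).total_seconds())

reference = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_delay_seconds("120"))
print(retry_delay_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=reference))
```

A polling loop would sleep for the returned number of seconds before re-requesting the status URL.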

File Attachments

Use case:

Handling of binary resources is not directly addressed in the IG, leading to a lack of clarity on how servers and clients should handle NDJSON files with resources that contain attachments (e.g. DocumentReferences).

Proposed language:

If resources in an output file contain elements of the type Attachment, servers SHALL populate the Attachment.contentType code as well as either the data element or the url element. The url element SHALL be either an absolute url that can be de-referenced to the attachment's content or a relative url that identifies a Binary FHIR Resource included in the output of the export [*see Q1 below].

When the url element is populated with an absolute URL and the requiresAccessToken field in the Complete Status body is set to true, the url location must be accessible by a client with a valid access token, and SHALL NOT require the use of additional authentication credentials. When the url element is populated and the requiresAccessToken field in the Complete Status body is set to false, the url location must be accessible by a client without an access token.

Note that if a server copies files to the bulk data output endpoint or proxies requests to facilitate access from this endpoint, it may need to modify the Attachment.url element when generating the FHIR bulk data output files.
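To make the proposed rules concrete, here is a sketch of a check a client or test suite could run against each Attachment in an output file. It encodes only the proposed language above (contentType plus either data or url, with url being absolute or relative); the function names and sample attachments are hypothetical.

```python
from urllib.parse import urlsplit

def attachment_problems(attachment):
    """List ways an Attachment falls short of the proposed language."""
    problems = []
    if not attachment.get("contentType"):
        problems.append("contentType is missing")
    if not attachment.get("data") and not attachment.get("url"):
        problems.append("neither data nor url is populated")
    return problems

def url_kind(url):
    """Classify an Attachment.url as 'absolute' or 'relative'."""
    return "absolute" if urlsplit(url).scheme else "relative"

# Hypothetical attachments as they might appear inside a DocumentReference.
inline = {"contentType": "text/plain", "data": "SGVsbG8="}
linked = {"contentType": "application/pdf", "url": "Binary/123"}
broken = {"url": "https://example.org/files/report.pdf"}

print(attachment_problems(inline))
print(url_kind(linked["url"]))
print(attachment_problems(broken))
```

Whether a relative url such as the Binary reference above should be allowed at all is the first open question listed below.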

Open Questions:

Do we want to allow servers to represent attachments as relative references to Binary resources included in the export, or should we limit attachments to inline data and de-referenceable absolute URLs?
Server developers: Is the proposed language reasonably implementable?
Client developers: Does the proposed language address your needs with regard to file retrieval?

update_IgnoreWarningfile

change warning comment to

# update and merged PR publish stu2 bulk data #315 10/21/2021

Provide guidance on which Provenance resources to return

Potential behaviors for a patient level or group level bulk export request:

All Provenance resources associated with any resource in the patient compartment. Parallels behavior of other resources.
Most recent Provenance resource associated with any resource in the patient compartment.
All Provenance resources associated with any resource being returned. If _type is used to restrict the response set, a subset of Provenance resources would be returned. If _type is set to only Provenance, then no resources would be returned. This seems like it would align with the many common bulk data export use cases.
Most recent Provenance resource associated with any resource being returned. If _type is used to restrict the response set, a subset of Provenance resources would be returned. If _type is set to only Provenance, then no resources would be returned.

Potential behaviors for a system level bulk export request:

All Provenance resources associated with any resource in the system. Parallels behavior of other resources. This seems like it would align with the most common full system export use cases.
Most recent Provenance resource associated with any resource in the system.
All Provenance resources associated with any resource being returned. If _type is used to restrict the response set, a subset of Provenance resources would be returned. If _type is set to only Provenance, then no resources would be returned. John Moehrke pointed out on Zulip that this would not work for an export of legal medical records audit for safety and records retention compliance.
Most recent Provenance resource associated with any resource being returned. If _type is used to restrict the response set, a subset of Provenance resources would be returned. If _type is set to only Provenance, then no resources would be returned.
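The two "resource being returned" behaviors in the lists above can be sketched as a filter over Provenance.target. The sample data and function name below are hypothetical, and comparing `recorded` as strings assumes every timestamp uses the same UTC format.

```python
def provenance_for_returned(provenances, returned_refs, most_recent_only=False):
    """Select Provenance resources whose targets are in the returned set."""
    matching = [
        p for p in provenances
        if any(t["reference"] in returned_refs for t in p["target"])
    ]
    if not most_recent_only:
        return matching
    latest_by_target = {}
    for p in matching:
        for t in p["target"]:
            ref = t["reference"]
            current = latest_by_target.get(ref)
            if ref in returned_refs and (
                current is None or p["recorded"] > current["recorded"]
            ):
                latest_by_target[ref] = p
    # A Provenance may be the latest for several targets; return it once.
    unique = {p["id"]: p for p in latest_by_target.values()}
    return list(unique.values())

provenances = [
    {"id": "prov-1", "recorded": "2020-01-01T00:00:00Z",
     "target": [{"reference": "Observation/1"}]},
    {"id": "prov-2", "recorded": "2020-06-01T00:00:00Z",
     "target": [{"reference": "Observation/1"}]},
    {"id": "prov-3", "recorded": "2020-03-01T00:00:00Z",
     "target": [{"reference": "Condition/9"}]},
]

all_ids = [p["id"] for p in provenance_for_returned(provenances, {"Observation/1"})]
latest_ids = [p["id"] for p in provenance_for_returned(provenances, {"Observation/1"}, True)]
print(all_ids, latest_ids)
```

Passing an empty set of returned references yields nothing, which mirrors the observation that setting _type to only Provenance would return no resources under these behaviors.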

Open questions:

Should patient level queries have a different default than system level queries?
Should we define this behavior by data set (eg. servers returning USCDI would have one standard behavior and servers returning Blue Button could have another)?
Do we need an additional request parameter to indicate which of the above behaviors is desired or are the use cases common enough to have a single approach? In USCDI REST queries, _revinclude syntax is used for this.