
bulk-data's People

Contributors

brettmarquard, chris1uphealth, gotdan, grahamegrieve, healthedata1, jmandel, kyle1uphealth, lynnlaakso, ricky1uphealth

bulk-data's Issues

Resource Deletions

Clients can use the _since parameter to retrieve only resources that have been updated after a point in time. However, there is currently no way for a client to determine if a previously retrieved resource needs to be deleted without re-pulling all available data and performing a comparison (this could be across many GB of data). Examples: delete requests that need to be propagated to downstream systems or data ascribed to the wrong patient that needs to be removed.
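To make the cost concrete, here is a minimal sketch of the comparison a client has to run today. It assumes the client kept the NDJSON lines from an earlier full export and has just re-pulled everything; the function names are hypothetical and not part of the IG.

```python
import json
from typing import Iterable, Set, Tuple

ResourceKey = Tuple[str, str]  # (resourceType, id)


def resource_keys(ndjson_lines: Iterable[str]) -> Set[ResourceKey]:
    """Collect (resourceType, id) pairs from NDJSON lines, skipping blank lines."""
    keys = set()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        resource = json.loads(line)
        keys.add((resource["resourceType"], resource["id"]))
    return keys


def find_deleted(previous_export: Iterable[str], current_export: Iterable[str]) -> Set[ResourceKey]:
    """Resources present in the earlier full export but absent from the new one."""
    return resource_keys(previous_export) - resource_keys(current_export)


# Illustrative data only: the Observation is gone from the second full pull.
previous = [
    '{"resourceType": "Patient", "id": "p1"}',
    '{"resourceType": "Observation", "id": "o1"}',
]
current = ['{"resourceType": "Patient", "id": "p1"}']
deleted = find_deleted(previous, current)
print(deleted)
```

Both exports have to be complete for the set difference to mean anything, which is exactly the full re-pull this issue would like to avoid.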

Bulk export should be invocable via HTTP POST

http://hl7.org/fhir/uv/bulkdata/STU1/export/index.html#request-flow shows that the $export operation is invocable via HTTP GET, but I don't think it says anything about POST.

Meanwhile, the base FHIR specification indicates that operations can be invoked via HTTP POST: https://www.hl7.org/fhir/operations.html#executing

Additionally, it says that

Operations may be invoked using a GET, with parameters as HTTP URL parameters, if:

  • there are only simple input parameters - i.e. no complex datatypes like 'Identifier' or 'Reference', and
  • the operation does not affect the state of the server

Based on that, I think it's OK to continue supporting $export via GET, but I'd like to see the bulkdata specification indicate that POST must be supported as well (and maybe even should be preferred, given that GETs should be idempotent/repeatable).

There was some discussion on the bulk data argonaut call on this front and, although no one was a big fan of the FHIR Parameters object, this is the wrapper that FHIR specifies for passing parameters while invoking extended operations like $export via POST.
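For discussion, a rough sketch of what a POST kick-off body wrapped in a FHIR Parameters resource could look like. The value[x] choices here (valueString for _outputFormat and _type, valueInstant for _since) are assumptions, since nothing above settles them, and the helper name is hypothetical.

```python
import json
from typing import List, Optional


def build_export_parameters(
    output_format: str = "application/fhir+ndjson",
    since: Optional[str] = None,
    types: Optional[List[str]] = None,
) -> dict:
    """Build a Parameters resource carrying $export inputs for an HTTP POST."""
    parameter = [{"name": "_outputFormat", "valueString": output_format}]
    if since is not None:
        # Assumption: _since is carried as a FHIR instant.
        parameter.append({"name": "_since", "valueInstant": since})
    if types:
        # Assumption: _type keeps the comma-delimited form used on GET.
        parameter.append({"name": "_type", "valueString": ",".join(types)})
    return {"resourceType": "Parameters", "parameter": parameter}


body = build_export_parameters(
    since="2019-01-01T00:00:00Z", types=["Patient", "Observation"]
)
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the request body to [fhir base]/$export in place of the URL query string.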

Technical corrections to bulk data conformance requirements

FHIR-24434
http://hl7.org/fhir/uv/bulkdata/operations/index.html: "To declare conformance with this IG, a server should include the following URL in its own CapabilityStatement.instantiates: http://www.hl7.org/fhir/bulk-data/CapabilityStatement-bulk-data.html" should say "http://hl7.org/fhir/uv/bulkdata/CapabilityStatement/bulk-data" (see http://hl7.org/fhir/uv/bulkdata/CapabilityStatement-bulk-data.json.html for details).
Also for technical correction: https://gforge.hl7.org/gf/project/fhir/tracker/?action=TrackerItemEdit&tracker_item_id=23864

It would also be good to add a parenthetical on DELETE like "after an export is complete, a server MAY use DELETE as a signal that a client is done retrieving files and that it is safe for the server to remove those from storage."
Also for technical correction: https://gforge.hl7.org/gf/project/fhir/tracker/?action=TrackerItemEdit&tracker_item_id=24912

gforge #21957 - operation descriptions still don't reference async pattern

Update each operation to state as part of the description: "The FHIR server MUST support invocation of this operation using the FHIR Asynchronous Request Pattern."

Update each operation's parameters list. Continue to include the _outputFormat parameter inline to provide a holistic view, but condense the "Documentation" text and add a reference to FHIR async.html: "The format for the requested bulk data files to be generated as per FHIR Asynchronous Request Pattern. Defaults to application/fhir+ndjson."
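As a point of reference for the wording, a minimal sketch of a system-level GET kick-off that follows the asynchronous pattern and applies the _outputFormat default. The helper name and base URL are hypothetical; the Accept and Prefer header values are the ones the async pattern uses.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

DEFAULT_OUTPUT_FORMAT = "application/fhir+ndjson"


def build_kickoff_request(
    fhir_base: str, output_format: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Return the URL and headers for a system-level $export kick-off via GET."""
    # Fall back to the documented default when the caller gives no format.
    query = urlencode({"_outputFormat": output_format or DEFAULT_OUTPUT_FORMAT})
    url = f"{fhir_base.rstrip('/')}/$export?{query}"
    headers = {
        "Accept": "application/fhir+json",
        "Prefer": "respond-async",  # asks the server to process the request asynchronously
    }
    return url, headers


url, headers = build_kickoff_request("https://example.org/fhir/")
print(url)
print(headers)
```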

Add optional parameter to specify a cloud bucket as an output location?

Open Questions:

Auth: Do we want to only target servers that have pre-configured write permissions to a bucket, or do we need a way to pass in auth credentials to the server? If so, what would this look like?

Path: Do we need information beyond the bucket name, like a file prefix (e.g., to support a "folder" within the bucket that incorporates a timestamp) or a service provider (to support use cases where the server is writing to a bucket provided by a different cloud vendor)?

Completion: Should we require that the output manifest file be written to the bucket last, so it could be used as an event to trigger follow-up actions (e.g., a de-id or db load), or would we expect clients to use job polling to determine that all files have been written?
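To illustrate the "manifest last" option from the Completion question, here is a sketch that uses a local directory as a stand-in for a bucket. The prefix format, the file naming, and the manifest shape are all assumptions for illustration; no cloud vendor API is involved.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def write_export(bucket_dir: Path, prefix: str, files: Dict[str, List[dict]]) -> Path:
    """Write one NDJSON file per resource type, then the manifest as the final write."""
    target = bucket_dir / prefix
    target.mkdir(parents=True, exist_ok=True)
    output = []
    for resource_type, resources in files.items():
        path = target / f"{resource_type}.ndjson"
        path.write_text("".join(json.dumps(r) + "\n" for r in resources))
        output.append({"type": resource_type, "url": path.name})
    # The manifest is written only after every data file exists, so its
    # appearance can act as the "export complete" event.
    manifest_path = target / "manifest.json"
    manifest_path.write_text(json.dumps({"output": output}))
    return manifest_path


with tempfile.TemporaryDirectory() as tmp:
    manifest = write_export(
        Path(tmp),
        "export-2020-01-01",  # hypothetical timestamped "folder" prefix
        {"Patient": [{"resourceType": "Patient", "id": "p1"}]},
    )
    written = sorted(p.name for p in manifest.parent.iterdir())
    manifest_body = json.loads(manifest.read_text())
print(written, manifest_body)
```

With this ordering, a bucket notification on the manifest object would be enough to trigger downstream work; without it, clients would fall back to job polling.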

Provide a list of "tunable parameters" that servers can control

What are the places where a server developer might make a choice that clients need to be aware of? Can we list them in one spot? Examples would include:

  • Does this server restrict responses to a specific "profile" like US Core or Blue Button?
  • Does this server support _since for new members in a group?
  • What outputFormats does this server support?
  • Does the server support system-wide (or all-patients, or Group-level) export? [This is captured in CapabilityStatement today]

...

Let's build out this list and create a section to track it.

Group Membership Additions

The _since parameter currently limits returned resources to those whose state has changed after the supplied timestamp. In the case of a query like GET [fhir base]/Group/[id]/$export?_since=[timestamp], results will not include resources for patients added to the group after timestamp if their resource modification date is earlier than timestamp, necessitating a pull of all available data for a group on each data request (this could be many GB of data for a large group).
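A small simulation of the gap, assuming the server filters purely on each resource's last-modified time. The field names and dates are made up for illustration.

```python
from datetime import datetime, timezone
from typing import List


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


def export_since(resources: List[dict], since: datetime) -> List[dict]:
    """Keep only resources modified after `since`, as _since does today."""
    return [r for r in resources if r["last_updated"] > since]


since = utc(2020, 2, 1)
group_resources = [
    # Patient p1 was already in the group; the resource changed after `since`.
    {"id": "obs-existing-member", "patient": "p1", "last_updated": utc(2020, 2, 15)},
    # Patient p2 joined the group after `since`, but the resource itself is older.
    {"id": "obs-new-member", "patient": "p2", "last_updated": utc(2020, 1, 10)},
]
exported_ids = [r["id"] for r in export_since(group_resources, since)]
print(exported_ids)
```

The new member's data never shows up in the incremental pull, so the client cannot rely on _since alone after membership changes.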

Last paragraph in Backend Services section 6.0 is repetitive

Suggest we delete the last sentence of this paragraph:

To begin the exchange, the client SHALL use the Transport Layer Security (TLS) Protocol Version 1.2 (RFC5246) or a more recent version of TLS to authenticate the identity of the FHIR authorization server and to establish an encrypted, integrity-protected link for securing all exchanges between the client and the authorization server’s token endpoint. All exchanges described herein between the client and the FHIR server SHALL be secured using TLS V1.2 or a more recent version of TLS.

Content under "Bulk Data Status Request" is still disorganized

http://build.fhir.org/ig/HL7/bulk-data/export/index.html#bulk-data-status-request has a table which now repeats the information below it with some variations.

Issues

File Attachments

Use case:

Handling of binary resources is not directly addressed in the IG, leading to a lack of clarity on how servers and clients should handle NDJSON files with resources that contain attachments (e.g., DocumentReferences).

Proposed language:

If resources in an output file contain elements of the type Attachment, servers SHALL populate the Attachment.contentType code as well as either the data element or the url element. The url element SHALL be either an absolute url that can be de-referenced to the attachment's content or a relative url that identifies a Binary FHIR Resource included in the output of the export [*see Q1 below].
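For implementers weighing this, a minimal sketch of how a client or test tool might check the proposed rule on DocumentReference resources. The function names are hypothetical, and only the contentType and data-or-url requirements are checked; resolving the url is out of scope here.

```python
from typing import List


def attachment_problems(attachment: dict) -> List[str]:
    """List violations of the proposed Attachment requirements."""
    problems = []
    if not attachment.get("contentType"):
        problems.append("missing contentType")
    if not attachment.get("data") and not attachment.get("url"):
        problems.append("missing data or url")
    return problems


def document_reference_problems(resource: dict) -> List[str]:
    """Check every DocumentReference.content[].attachment in one resource."""
    problems = []
    for index, content in enumerate(resource.get("content", [])):
        for problem in attachment_problems(content.get("attachment", {})):
            problems.append(f"content[{index}]: {problem}")
    return problems


# Illustrative resource: the second attachment has neither data nor url.
doc = {
    "resourceType": "DocumentReference",
    "id": "d1",
    "content": [
        {"attachment": {"contentType": "application/pdf", "url": "Binary/b1"}},
        {"attachment": {"contentType": "text/plain"}},
    ],
}
problems = document_reference_problems(doc)
print(problems)
```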

When the url element is populated with an absolute URL and the requiresAccessToken field in the Complete Status body is set to true, the url location must be accessible by a client with a valid access token, and SHALL NOT require the use of additional authentication credentials. When the url element is populated and the requiresAccessToken field in the Complete Status body is set to false, the url location must be accessible by a client without an access token.
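On the client side, the rule above could reduce to something like the following sketch. It assumes a bearer token is the credential in use; the helper name is hypothetical.

```python
from typing import Dict, Optional


def download_headers(
    requires_access_token: bool, access_token: Optional[str] = None
) -> Dict[str, str]:
    """Choose the headers for fetching an absolute Attachment.url."""
    if not requires_access_token:
        # The location must be reachable without an access token.
        return {}
    if not access_token:
        raise ValueError("requiresAccessToken is true but no access token is available")
    # Assumption: the same bearer token used for the export files is sent here.
    return {"Authorization": f"Bearer {access_token}"}


with_token = download_headers(True, "example-token")
without_token = download_headers(False)
print(with_token, without_token)
```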

Note that if a server copies files to the bulk data output endpoint or proxies requests to facilitate access from this endpoint, it may need to modify the Attachment.url element when generating the FHIR bulk data output files.

Open Questions:

  1. Do we want to allow servers to represent attachments as relative references to Binary resources included in the export, or should we limit attachments to inline data and de-referenceable absolute URLs?
  2. Server developers: Is the proposed language reasonably implementable?
  3. Client developers: Does the proposed language address your needs with regards to file retrieval?

Provide guidance on which Provenance resources to return

Potential behaviors for a patient level or group level bulk export request:

  • All Provenance resources associated with any resource in the patient compartment. Parallels behavior of other resources.
  • Most recent Provenance resource associated with any resource in the patient compartment.
  • All Provenance resources associated with any resource being returned. If _type is used to restrict the response set, a subset of Provenance resources would be returned. If _type is set to only Provenance, then no resources would be returned. This seems like it would align with many common bulk data export use cases.
  • Most recent Provenance resource associated with any resource being returned. If _type is used to restrict the response set, a subset of Provenance resources would be returned. If _type is set to only Provenance, then no resources would be returned.

Potential behaviors for a system level bulk export request:

  • All Provenance resources associated with any resource in the system. Parallels behavior of other resources. This seems like it would align with the most common full system export use cases.
  • Most recent Provenance resource associated with any resource in the system.
  • All Provenance resources associated with any resource being returned. If _type is used to restrict the response set, a subset of Provenance resources would be returned. If _type is set to only Provenance, then no resources would be returned. John Moehrke pointed out on Zulip that this would not work for an export of legal medical records audit for safety and records retention compliance.
  • Most recent Provenance resource associated with any resource being returned. If _type is used to restrict the response set, a subset of Provenance resources would be returned. If _type is set to only Provenance, then no resources would be returned.
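To compare the "all" and "most recent" variants tied to returned resources, here is a rough sketch of the selection logic. It assumes Provenance.target references are literal strings like "Observation/o1" and that recorded timestamps share one format so they sort as strings; the function name is hypothetical.

```python
from typing import Dict, List, Set


def provenance_for_returned(
    provenances: List[dict], returned_refs: Set[str], most_recent_only: bool = False
) -> List[dict]:
    """Select Provenance resources whose targets are part of the export output."""

    def targets(provenance: dict) -> List[str]:
        return [
            t["reference"]
            for t in provenance.get("target", [])
            if t["reference"] in returned_refs
        ]

    matching = [p for p in provenances if targets(p)]
    if not most_recent_only:
        return matching

    # Track the latest Provenance per returned resource.
    latest: Dict[str, dict] = {}
    for p in matching:
        for ref in targets(p):
            if ref not in latest or p["recorded"] > latest[ref]["recorded"]:
                latest[ref] = p
    kept_ids = {p["id"] for p in latest.values()}
    return [p for p in matching if p["id"] in kept_ids]


provenances = [
    {"id": "prov-1", "recorded": "2020-01-01T00:00:00Z", "target": [{"reference": "Observation/o1"}]},
    {"id": "prov-2", "recorded": "2020-03-01T00:00:00Z", "target": [{"reference": "Observation/o1"}]},
    {"id": "prov-3", "recorded": "2020-02-01T00:00:00Z", "target": [{"reference": "Condition/c1"}]},
]
returned = {"Observation/o1"}  # e.g. the export was restricted with _type=Observation
all_ids = [p["id"] for p in provenance_for_returned(provenances, returned)]
recent_ids = [p["id"] for p in provenance_for_returned(provenances, returned, most_recent_only=True)]
print(all_ids, recent_ids)
```

With an empty returned set, as when _type is only Provenance, both variants select nothing, which matches the behavior described in the bullets.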

Open questions:

  • Should patient level queries have a different default than system level queries?
  • Should we define this behavior by data set (eg. servers returning USCDI would have one standard behavior and servers returning Blue Button could have another)?
  • Do we need an additional request parameter to indicate which of the above behaviors is desired or are the use cases common enough to have a single approach? In USCDI REST queries, _revinclude syntax is used for this.
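For reference, the REST-side _revinclude form mentioned in the last question looks roughly like this. The base URL, resource type, and patient id are placeholders, and the helper is hypothetical.

```python
from urllib.parse import urlencode


def provenance_revinclude_url(fhir_base: str, resource_type: str, patient_id: str) -> str:
    """Build a search URL that also returns Provenance resources targeting the matches."""
    query = urlencode(
        {"patient": patient_id, "_revinclude": "Provenance:target"},
        safe=":",  # keep the colon readable in Provenance:target
    )
    return f"{fhir_base.rstrip('/')}/{resource_type}?{query}"


search_url = provenance_revinclude_url("https://example.org/fhir", "Observation", "123")
print(search_url)
```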

Broken links

I'll keep adding broken or problematic links to this issue as I read through. FYI @gotdan
