fda / openfda

openFDA is a research project to provide open APIs, raw data downloads, documentation and examples, and a developer community for an important collection of FDA public datasets.

Home Page: https://open.fda.gov

License: Creative Commons Zero v1.0 Universal

JavaScript 6.78% Shell 1.67% Python 74.07% HTML 17.35% Dockerfile 0.12%

openfda's Introduction

openFDA

Please note: Do not rely on openFDA to make decisions regarding medical care. Always speak to your health provider about the risks and benefits of FDA-regulated products. We may limit or otherwise restrict your access to the API in line with our Terms of Service.

Contents

This repository contains the code which powers all of the api.fda.gov end points:

  • Python pipelines written with Luigi for processing public FDA data sets (drugs, foods, medical devices, and others) into a JSON format that can be loaded into Elasticsearch.

  • Elasticsearch schemas for the available data sets.

  • A Node.js API Server written with Express, Elasticsearch.js and Elastic.js that communicates with Elasticsearch and provides the api.fda.gov JSON interface (documented in detail at https://open.fda.gov).

Prerequisites

  • Elasticsearch 7
  • Python 3.6 or above
  • Node 14 or above

Packaging

Run bootstrap.sh to download and set up a virtualenv for the openfda python package and to download and set up the openfda-api node package.

Running in Docker

If you intend to try running openFDA yourself, we have put together a docker-compose.yml configuration that can help you get started. docker-compose up will:

  1. Start an Elasticsearch container
  2. Start an API container, which will expose port 8000 for queries.
  3. Start a Python 3 container that will run the NSDE, CAERS, Substance Data, Device Clearance, Device PMA and Device Event pipelines and create corresponding indices in Elasticsearch.

Note: even though the API container starts right away, it will not serve any data until some or all of the pipelines above have finished running. To see which endpoints have become available as the pipelines progress or after they have completed, run:

curl http://localhost:8000/status

Once an endpoint becomes available, it can be queried using the standard openFDA query syntax. For example:

curl -g 'http://localhost:8000/food/event.json?search=products.industry_name:"Soft+Drink/Water"+AND+reactions.exact:DEHYDRATION&limit=10'
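The example query can also be assembled programmatically. The following is a minimal sketch, assuming the API container from docker-compose.yml is listening on localhost:8000; build_query_url is a hypothetical helper written for illustration and is not part of the openFDA code base.

```python
from urllib.parse import quote


def build_query_url(endpoint: str, search: str, limit: int = 10,
                    base: str = "http://localhost:8000") -> str:
    """Return a query URL for a locally running openFDA API container.

    Hypothetical helper: the base URL is an assumption taken from the
    docker-compose setup described above.
    """
    # Percent-encode the search expression, keeping the characters the
    # query syntax relies on (':', '"', '/', '.', '[' and ']') readable,
    # then write spaces as '+' as in the curl example.
    encoded = quote(search, safe=':"/.[]').replace("%20", "+")
    return f"{base}/{endpoint}.json?search={encoded}&limit={limit}"


url = build_query_url(
    "food/event",
    'products.industry_name:"Soft Drink/Water" AND reactions.exact:DEHYDRATION',
)
print(url)
```

The printed URL can be passed to curl -g or to any HTTP client once the corresponding pipeline has finished.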

At this point the Python container only runs the NSDE, CAERS, Substance Data, Device Clearance, Device PMA, and Device Event pipelines because most of those are relatively lightweight (except Device Event) and require no access to internal FDA networks. We will add more pipelines if there is substantial interest from the community. However, the pipelines above provide a good starting point for understanding openFDA internals and/or customizing openFDA.

Linux Users

vm.max_map_count needs to be increased as follows before Elasticsearch can start successfully:

sudo sysctl -w vm.max_map_count=262144
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf

Windows Users

Clone the repository with git clone https://github.com/FDA/openfda.git --config core.autocrlf=input in order to circumvent Docker issues with building images on a Windows computer.

Running unit tests

docker-compose --profile test up test will run Python unit tests.

openfda's People

Contributors

cerdman, dependabot[bot], dkrylovsb, hansnelsen, mattmo, merges, phiat, rjpower, violetcrestedwren

openfda's Issues

Reports missing patient.drug.openfda object

I believe we have come across several thousand reports that do not have an object at patient.drug.openfda which is unexpected.

For example, the following query counts the medicinalproduct values where there is no patient.drug.openfda object.

 https://api.fda.gov/drug/event.json?search=_missing_:patient.drug.openfda&count=patient.drug.medicinalproduct.exact

Two example specific reports are as follows:

  • https://api.fda.gov/drug/event.json?search=safetyreportid:4597138-5
  • https://api.fda.gov/drug/event.json?search=safetyreportid:8275257-X

It seems like something is happening during the annotation process that maps medicinalproducts to SPLs.

Thanks for addressing this when you get a chance.

Question: Dealing with arrays and primary keys

I am trying to use your API to pull down drug recall data for analysis. I'm hoping someone can help me with a few questions. Since this isn't related to a singular issue or feature request, I'll put all my questions in this one Issue:

1 - The results of API calls and a review of Enforcement Reports on the FDA website suggest there is a one-to-many relationship between the group of fields in each result outside of the openfda field and the contents of openfda. However, outside of event_id (which wasn't in use prior to mid-2012), I don't see a reliable primary key for the non-openfda fields and would need to create my own key based on a composite of all these fields. Do you have another recommendation?

2 - Same question for the openfda fields: Is there a reliable primary key? I see this would be more problematic, since this data is pulled from, I believe, four different sources. I have submitted an issue requesting that you add Recall Number to the API. I'm not sure whether this would be a reliable key.

3 - Finally, your schema says all openfda fields are arrays, but are any of these fields reliably limited to no more than 1 value? For each array, I'll need to identify/create a primary key, then build a relationship between that key and each value of each array. That won't be necessary in cases of one-to-one relationships.
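One way to build the composite key mentioned in question 1 is to hash a fixed selection of the non-openfda fields. This is only a sketch: the field selection in KEY_FIELDS is a hypothetical choice based on field names that appear in enforcement results, and whether those fields are actually unique together is an assumption that would need to be checked against the data.

```python
import hashlib
import json

# Hypothetical selection of non-openfda fields; adjust after checking
# which combination is actually unique in your data.
KEY_FIELDS = (
    "recalling_firm",
    "product_description",
    "code_info",
    "recall_initiation_date",
    "report_date",
)


def composite_key(record: dict, fields=KEY_FIELDS) -> str:
    """Hash the selected fields into a stable surrogate key."""
    # Missing fields become None so the key is still well defined.
    payload = [record.get(name) for name in fields]
    canonical = json.dumps(payload, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = {
    "recalling_firm": "Spartan Central Kitchen",
    "report_date": "20121024",
    "openfda": {},  # ignored: only KEY_FIELDS contribute to the key
}
print(composite_key(record))
```

Each value inside an openfda array can then be stored in a child table that references this key.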

Only gets 1 unii / active ingredient from a drug label with multiple actives

https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:"nonsteroidal+anti-inflammatory+drug"&limit=1

The generic name is "ASPIRIN AND DIPYRIDAMOLE", which corresponds to two UNIIs, 64ALC7F90C and R16CO5Y76E, but only one of the UNIIs is returned. However, both preferred terms are returned:

{
    "unii": [
        "R16CO5Y76E"
    ],
    "substance_name": [
        "DIPYRIDAMOLE",
        "ASPIRIN"
    ]
}

Please add the UNII for every preferred term, as there will always be a 1-to-1 mapping between them in SPL.
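Until this is fixed, affected records can be flagged on the client side by comparing the lengths of the two arrays. A minimal sketch; unii_mismatch is a hypothetical helper, and the sample is the openfda fragment quoted above.

```python
def unii_mismatch(openfda: dict) -> bool:
    """Return True when the unii and substance_name arrays differ in length."""
    uniis = openfda.get("unii", [])
    names = openfda.get("substance_name", [])
    return len(uniis) != len(names)


# Fragment quoted in this issue: two substance names but only one UNII.
sample = {
    "unii": ["R16CO5Y76E"],
    "substance_name": ["DIPYRIDAMOLE", "ASPIRIN"],
}
print(unii_mismatch(sample))  # True
```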

Using drug ingredients in different order or without space between words gets different results

I am researching hormonal birth control and am finding different results depending on the order in which the drug ingredients are listed, and on whether a space is used between "Ethinyl" and "Estradiol".
Ethinyl Estradiol+Norelgestromin = 83,341 records
EthinylEstradiol+Norelgestromin = 15,106 records
EE+Norelgestromin = 14,260 records

Ethinyl Estradiol+Etonogestrel = 90,482 records
EthinylEstradiol+Etonogestrel = 15,972 records
EE+etonogestrel = 15,117 records
EE+eton = 217 records

Other results that were interesting: NuvaRing is the only product that I know of (although I might be wrong) that uses Ethinyl Estradiol AND Etonogestrel, and it was interesting to see the various results
Etonogestrel+Ethinyl+Estradiol+nuvaring = 90,495 records
Nuvaring = 7,741 records
Nuva = 15 records

It looks like the same ingredient is recorded in different ways (Ethinyl Estradiol, EthinylEstradiol, or EE) and each spelling is counted separately, so the total count is not correct. Other variations in how the drug name is entered appear to be affecting the data as well.
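If the spellings really are stored as separate values, one client-side workaround is to fold them into a single canonical name before counting. The sketch below is illustrative only: ALIASES is a hypothetical table, and treating "EE" as ethinyl estradiol is an assumption that may not hold for every record.

```python
import re
from collections import Counter

# Hypothetical alias table; extend it with the spellings seen in your data.
ALIASES = {"ee": "ethinylestradiol"}


def canonical_name(raw: str) -> str:
    """Lower-case a drug name, drop spaces and punctuation, apply aliases."""
    key = re.sub(r"[^a-z0-9]", "", raw.lower())
    return ALIASES.get(key, key)


# The three spellings mentioned in this issue collapse to one name.
spellings = ["Ethinyl Estradiol", "EthinylEstradiol", "EE"]
tally = Counter(canonical_name(s) for s in spellings)
print(tally)
```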

run_annotation_table_pipeline.sh error

I keep getting an error running the scripts.
....

INFO: [pid 19277] Running   ExtractZipRXNorm()
1401847956.439143 worker.py:243 [_run_task] [pid 19277] Running   ExtractZipRXNorm()
ERROR: [pid 19277] Error while running ExtractZipRXNorm()
Traceback (most recent call last):
  File "/home/ubuntu/openfda/_python-env/local/lib/python2.7/site-packages/luigi-1.0.16-py2.7.egg/luigi/worker.py", line 254, in _run_task
    raise RuntimeError('Unfulfilled dependency %r at run time!\nPrevious tasks: %r' % (missing_dep.task_id, self._previous_tasks))
RuntimeError: Unfulfilled dependency 'DownloadRXNorm()' at run time!
Previous tasks: ['DownloadRXNorm()']

...

I've tracked the DownloadRXNorm() to a missing annotation_table/set_data.sh ... Any suggestions?

What namespace should we use for drug classes?

I am Yifan Ning, a programmer in DBMI, University of Pittsburgh. I am working on creating mappings in an active moiety RDF graph.

Would you please advise me on which namespaces to use for drug classes such as Mechanism of Action (MoA), Physiologic Effect (PE), and Established Pharmacologic Class (EPC)? I found the documentation for drug classes at "https://open.fda.gov/api/reference/#spl", but it does not provide any namespaces.

I appreciate any help.

Best wishes,
Yifan

count queries not returning full list of results

Originally from here:

https://opendata.stackexchange.com/questions/2163/how-to-get-total-count-of-adverse-effects-events-by-manufacturer-in-openfda/

I was under the impression that count queries would return the full result set, and thus limit and skip wouldn't be relevant for them. It appears that this is not the case, e.g.:

https://api.fda.gov/drug/event.json?count=patient.drug.openfda.manufacturer_name.exact

Only returns the top 100 entries by default, and we can only get up to the top 1000 via the limit param. Since skip is disabled for count queries, we can't access the full result set.

We should either have count queries return the full result always, or enable skip.
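In the meantime, a client can at least detect when a count response was probably cut off by checking whether the number of returned terms has reached the requested limit. A minimal sketch; summarize_counts is a hypothetical helper and the sample response is made up to show the shape of a count result (a list of term/count pairs), not real openFDA data.

```python
def summarize_counts(response: dict, limit: int) -> dict:
    """Summarize a count-query response and flag likely truncation."""
    rows = response.get("results", [])
    return {
        "terms": len(rows),
        "events": sum(row["count"] for row in rows),
        # If as many terms came back as were requested, more may exist.
        "possibly_truncated": len(rows) >= limit,
    }


# Made-up response in the shape of a count query; not real data.
sample = {
    "results": [
        {"term": "FIRM A", "count": 3},
        {"term": "FIRM B", "count": 2},
    ]
}
print(summarize_counts(sample, limit=2))
```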

Request: Preserve formatting from SPL XML

The SPL XML files are neatly formatted, but when that data is parsed by openfda, it loses its formatting and therefore usability. It would be nice if the xml paragraphs, for instance in the indications_and_usage section, could be converted to \n newlines.
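The requested conversion could look roughly like the sketch below, which joins the paragraph elements of a section with \n. It assumes the section text is marked up with paragraph elements in the HL7 v3 namespace (urn:hl7-org:v3); SAMPLE is a made-up, heavily simplified fragment, and section_text is a hypothetical helper rather than code from the openFDA pipelines.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified SPL-like fragment used only for illustration.
SAMPLE = """<section xmlns="urn:hl7-org:v3">
  <text>
    <paragraph>First paragraph.</paragraph>
    <paragraph>Second paragraph.</paragraph>
  </text>
</section>"""

NS = {"hl7": "urn:hl7-org:v3"}


def section_text(xml_source: str) -> str:
    """Join the paragraphs of a section with newlines."""
    root = ET.fromstring(xml_source)
    paragraphs = root.findall(".//hl7:paragraph", NS)
    # Collapse internal whitespace in each paragraph, keep one per line.
    lines = [" ".join("".join(p.itertext()).split()) for p in paragraphs]
    return "\n".join(lines)


print(section_text(SAMPLE))
```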

Drug label - drugs missing

I’m interested in pulling FDA labeling information for various oncology specialty pharmacy products. I’ve noticed that a small but significant number of products do not have labeling information available (they yield an error: NOT FOUND, “No matches found!”). I’ve attached a sample of drugs (brand name & generic) for which I was unable to pull the labeling. I was just hoping you could provide some insight on what’s going wrong, whether it’s an error in my query or the information just isn’t there yet. For all of the drugs in the list, I’ve tried querying with both the brand name, e.g.:
https://api.fda.gov/drug/label.json?search=brand_name:Zarxio
and the generic name, ex:
https://api.fda.gov/drug/label.json?search=generic_name:Pegfilgrastim

Enhancement request: Recall number

I've been playing with your API to pull down drug enforcement reports, and I noticed that the API does not return the Recall Number field. Any chance of getting this added?

keep it up!

The continued expansion of what's available through this api is important. Please keep it up.

How to get a review page link or review files for a certain drug using API?

I want to get the review page link for a given drug via the API. For example, for the drug "AUBAGIO" it is:

"http://www.accessdata.fda.gov/scripts/cder/drugsatfda/index.cfm?fuseaction=Search.Set_Current_Drug&ApplNo=202992&DrugName=AUBAGIO&ActiveIngred=TERIFLUNOMIDE&SponsorApplicant=SANOFI%20AVENTIS%20US&ProductMktStatus=1&goto=Search.Label_ApprovalHistory"

That page can be reached by searching for "AUBAGIO" and then, on the drug page, clicking "Approval History, Letters, Reviews, and Related Documents".

That is the review link I need. Is there any way to obtain it through the API or otherwise programmatically? If not, how can I obtain the review PDF files, or links to them, for each drug? I want to fetch the review PDFs for every drug approved in a given year. My plan is to first locate a unique link for each drug; getting the review files from that page is then not hard. A way to locate the review files directly would be even better.

Thank you!

Server error returned on invalid date search

curl -XGET 'https://api.fda.gov/food/enforcement.json?limit=1&search=report_date:[2000-10-01+TO+2013-09-31]'
{
  "error": {
    "code": "SERVER_ERROR",
    "message": "Check your request and try again"
  }
}

Probably better to return a more helpful response.
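The request above fails because 2013-09-31 is not a real calendar date (September has 30 days). Until the server reports this more clearly, a client can validate dates before sending the query. A minimal sketch using only the standard library; valid_date is a hypothetical helper.

```python
from datetime import datetime


def valid_date(text: str) -> bool:
    """Return True if text is a real calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# The two endpoints of the range used in the failing request.
print(valid_date("2000-10-01"))  # True
print(valid_date("2013-09-31"))  # False
```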

Don't return 404 on request with no results

404 is a confusing response here. I've hit the right endpoint and my parameters are valid. It's OK that there are no results. I suggest that you send along the regular response with an empty result set rather than sending a status code that indicates I made a mistake.

https://api.fda.gov/food/enforcement.json?search=report_date:[2015-06-01+TO+2016-03-31]&limit=0
{
  "error": {
    "code": "NOT_FOUND",
    "message": "No matches found!"
  }
}
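As a client-side workaround, the NOT_FOUND error can be mapped to an empty result list while other errors are still raised. This is a sketch of that idea, not a description of how the API server behaves beyond the response quoted above; results_or_empty is a hypothetical helper that takes an already-decoded status code and JSON body.

```python
def results_or_empty(status_code: int, body: dict) -> list:
    """Treat a 404 NOT_FOUND answer as an empty result set."""
    error = body.get("error", {})
    if status_code == 404 and error.get("code") == "NOT_FOUND":
        return []
    if "error" in body:
        # Any other error is a real problem and should surface.
        raise RuntimeError(f"{error.get('code')}: {error.get('message')}")
    return body.get("results", [])


not_found = {"error": {"code": "NOT_FOUND", "message": "No matches found!"}}
print(results_or_empty(404, not_found))  # []
```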

483 database

Coming from a quality perspective, it would be great to have a data set added to openFDA with 483s written after inspections. Break it down by the type of business (medical device, pharma, etc.) where the 483 was identified, maybe even which kind of product was being inspected (class 1, 2, 3). It would also be great to see the name of the auditor who wrote the 483.

Add substance information to openFDA annotations on drug queries

The SPL substance indexing dataset contains chemical structure and chemical identifier (InChI) data for more than 40,000 drug product ingredients. These data could be served as annotations alongside other drug identifiers in the openFDA sections of drug API results.

The substance indexing files are keyed on the substance UNII, and each chemical is represented by its own structured product labeling document (XML) with its own unique set ID and document ID.

Labels API, improve formatting of result.dosage_and_administration

In the XML files, the formatting clearly marks which parts of the text are headers and which relate to dosage and administration. In the JSON output, all of this formatting has been removed, producing text like ['Directions Apply liberally and evenly 15 minutes before sun exposure. Reapply at least every 2 years Use a water resistant sunscreen if swimming or sweating. Children under 6 months of age: Ask a doctor.']
or ['Dosage and Administration: Take up about 1.2g - 1.5g of it and smooth over the entire face.'].

With access to the XML files, you can see that the first example could pretty easily be split up into
['Apply liberally and evenly 15 minutes before sun exposure. Reapply at least every 2 years.', 'Use a water resistant sunscreen if swimming or sweating.', 'Children under 6 months of age: Ask a doctor.']

That would make it orders of magnitude easier to do any kind of language processing on this data.

BLD: Dockerfiles for testing

Where is the “product problem” field in the device AE endpoint?

Please see the full question by @GeekNurse and me on SE at https://opendata.stackexchange.com/questions/3543/where-is-the-product-problem-field-in-the-openfda-api

In the MAUDE search tool on FDA.gov there is a standard list of problems in a product problem field. This is not available in the API.

For example:

compare http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfMAUDE/detail.cfm?mdrfoi__id=1693964 where there is a "Device Problem" of "Absorption" to the same record from the openFDA API: https://api.fda.gov/device/event.json?search=BM2HCPM0

ENH: Adverse Event Count / 'Use' Count Heatmap

  • A matrix with rows listing treatments, and columns listing adverse event types (# of events)
    • Shade each cell w/ a log-scale
    • All treatments would require high resolution (but might be cool)
    • A select number of treatment alternatives could be immensely useful
  • Where can we find Prescription Counts (or OTC Sales Numbers)?
    • [edit]
      • denominators for relative risk statistic comparisons
        • manufacturing count
        • sales count
        • usage count

I feel that such charts could represent the primary interests of Public Health in the United States.

NotFilter (boolean 'not' queries)

Can we add a way to negate a query (boolean NOT)? elastic.js provides a NotFilter.
Maybe use '!' for boolean NOT. Is there currently a way to do this? (For example: return all events where patient.drug.openfda.manufacturer_name != 'ABC company'.)

Limit=0 returns one result

Sometimes I just want to get a simple count of search results. Maybe there's a different way to do this but I see no reason why limit=0 should not be respected.

curl -XGET 'https://api.fda.gov/food/enforcement.json?search=report_date:[2000-06-01+TO+2014-03-31]&limit=0'
{
  "meta": {
    "disclaimer": "openFDA is a beta research project and not for clinical use. While we make every effort to ensure that data is accurate, you should assume all results are unvalidated.",
    "license": "http://open.fda.gov/license",
    "last_updated": "2015-05-31",
    "results": {
      "skip": 0,
      "limit": 1,
      "total": 4874
    }
  },
  "results": [
    {
      "recall_number": "F-0283-2013",
      "reason_for_recall": "During an FDA inspection, microbiological swabs were collected and the results found that 21 sub samples in zones 1, 2 & 3 are positive for Listeria Monocytogenes (L.M.), Listeria innocua (L.I.) or Listeria seeligeri (L.S.).  The firm is voluntarily recalling all products manufactured from August 20th to September 10th 2012 due to the possible contamination.  All products with sell by dates on or before 11-OCT. No illnesses have been reported.",
      "status": "Ongoing",
      "distribution_pattern": "MI and OH only.",
      "product_quantity": "520",
      "recall_initiation_date": "20120910",
      "state": "MI",
      "event_id": "63159",
      "product_type": "Food",
      "product_description": "#011 Zucchini Stir,Fry      0.75 pounds",
      "country": "US",
      "city": "Grand Rapids",
      "recalling_firm": "Spartan Central Kitchen",
      "report_date": "20121024",
      "@epoch": 1424553174.836488,
      "voluntary_mandated": "Voluntary: Firm Initiated",
      "classification": "Class II",
      "code_info": "All with sell by dates on or before 15-Sep with UPC 0-11213-90380",
      "@id": "00028a950de0ef32fc01dc3963e6fdae7073912c0083faf0a1d1bcdf7a03c44c",
      "openfda": {},
      "initial_firm_notification": "E-Mail"
    }
  ]
}
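As the response above shows, the total is already present in meta.results.total even though one record is returned, so a simple count can be read from there and the results array ignored. A minimal sketch; total_matches is a hypothetical helper and the sample is trimmed from the response quoted in this issue.

```python
def total_matches(response: dict) -> int:
    """Read the number of matching records from the meta section."""
    return response["meta"]["results"]["total"]


# Trimmed from the response quoted above.
sample = {
    "meta": {"results": {"skip": 0, "limit": 1, "total": 4874}},
    "results": [{"recall_number": "F-0283-2013"}],
}
print(total_matches(sample))  # 4874
```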

Add capacity to search by drug approval status

The drug product labeling API returns mostly approved, and some unapproved, drug products. One example is the drug therafeldamine, which DailyMed indicates is an unapproved drug product.

https://api.fda.gov/drug/label.json?search=THERAFELDAMINE
http://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=403af9f8-63bf-41a2-bf3b-20e5b92c389a

DailyMed presumably uses these structured data from the labeling document XML:

<subjectOf>
  <approval>
    <code code="C73627" codeSystem="2.16.840.1.113883.3.26.1.1" displayName="unapproved drug other"/>
    <author>
      <territorialAuthority>
        <territory>
          <code code="USA" codeSystem="2.16.840.1.113883.5.28"/>
        </territory>
      </territorialAuthority>
    </author>
  </approval>
</subjectOf>

Currently, there is no way to identify this drug as approved or unapproved using the openFDA API.
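For reference, extracting this value from the labeling XML is straightforward. The sketch below parses the snippet exactly as quoted above, which has no XML namespace; in a full SPL document the elements would normally sit in the HL7 v3 namespace and the paths would need to be qualified accordingly. approval_category is a hypothetical helper, not part of the openFDA pipelines.

```python
import xml.etree.ElementTree as ET

# The snippet quoted in this issue, reproduced without a namespace.
SNIPPET = """<subjectOf>
  <approval>
    <code code="C73627" codeSystem="2.16.840.1.113883.3.26.1.1" displayName="unapproved drug other"/>
    <author>
      <territorialAuthority>
        <territory>
          <code code="USA" codeSystem="2.16.840.1.113883.5.28"/>
        </territory>
      </territorialAuthority>
    </author>
  </approval>
</subjectOf>"""


def approval_category(xml_source: str):
    """Return the displayName of the approval code, or None if absent."""
    root = ET.fromstring(xml_source)
    code = root.find("./approval/code")
    return None if code is None else code.get("displayName")


category = approval_category(SNIPPET)
print(category)
```

A value like this could then be exposed as a searchable field, which is what this issue asks for.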

Add a sort by parameter

As a user, I want to query for the 10 most recently initiated food recalls without any other parameters.
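Without a sort parameter, the ordering has to be done on the client after fetching records. A minimal sketch, assuming each record carries recall_initiation_date as a YYYYMMDD string (as in enforcement results), so that string order matches date order; most_recent is a hypothetical helper and all but the first sample date are made up.

```python
def most_recent(records: list, n: int = 10) -> list:
    """Return the n records with the latest recall_initiation_date."""
    # YYYYMMDD strings sort chronologically when compared as text.
    return sorted(
        records,
        key=lambda r: r["recall_initiation_date"],
        reverse=True,
    )[:n]


records = [
    {"event_id": "63159", "recall_initiation_date": "20120910"},
    {"event_id": "A", "recall_initiation_date": "20140301"},  # made up
    {"event_id": "B", "recall_initiation_date": "20130115"},  # made up
]
print([r["event_id"] for r in most_recent(records, n=2)])
```

This only orders what was already downloaded, so it does not replace server-side sorting across the whole data set.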
