thomastu / pyeia

An Energy Information Administration (EIA) API python client for researchers who just need data.

pyeia's Issues

Status Quo

Currently eia.browser is typically used in the following way:

During __init__, two Query objects are created, one for category queries and one for series queries. A query is then made to EIA's top-level category, which lists the available datasets.
The "current page" at a given time is defined by a category_id set by eia.browser.browse(), i.e. navigating the EIA API is the same as navigating through categories and their related categories.
The user keeps following the category tree until they reach a branch or leaf where the current page contains child series, and then flags some series for export.
When eia.browser.parse is called, a series query is invoked, and post_data is built FROM the eia.browser (not the query). The raw response is parsed within parse and stored as a pandas dataframe.

This is clunky for several reasons. Chiefly, it requires repeated calls to eia.browser.browse that could be condensed into a single line. It also can make it difficult to

Proposed Changes:

There should instead be a way to give a full path name rather than requiring repeated calls to browse.
Since browse changes the nature of the class from point A to point B, it really should be done in one step regardless. This is a common operation for datasets such as aeo where you often want the same data from different scenarios.
There should be some interface for interacting with individual series data, i.e. the SeriesQuery should do more than just move data to a pandas dataframe.
When exporting or "parsing", there should be more options than a pandas dataframe. A good start would be exporting the raw response and JSON.
An interface for interacting with the Search API; see Issue #4
An interface for generating header information either from the browse-path from the top-level request, or from the series ID. See Issue #5
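
As a rough illustration of the first proposal, here is a minimal sketch of resolving a full path in one call. The in-memory tree stands in for the EIA category API, and the names `TREE`, `resolve_path` and the " > " separator are all assumptions made for this example, not part of pyeia.

```python
# Hypothetical stand-in for the category tree; names and ids are invented.
TREE = {
    "name": "root",
    "category_id": 0,
    "children": [
        {
            "name": "Dataset A",
            "category_id": 1,
            "children": [
                {"name": "Scenario 1", "category_id": 11, "children": []},
                {"name": "Scenario 2", "category_id": 12, "children": []},
            ],
        },
    ],
}


def resolve_path(tree, path, sep=" > "):
    """Follow a full path name from the root and return the category_id reached."""
    node = tree
    for name in path.split(sep):
        matches = [c for c in node["children"] if c["name"] == name]
        if not matches:
            raise KeyError(f"no child category named {name!r} under {node['name']!r}")
        node = matches[0]
    return node["category_id"]


# The same lookup repeated across scenarios becomes a one-liner.
scenario_ids = [resolve_path(TREE, f"Dataset A > Scenario {n}") for n in (1, 2)]
```

A real implementation would fetch children lazily from the category endpoint rather than from a dictionary, but the path-walking logic would look much the same.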

The v1 API will be deprecated in November 2022. This package should be updated (to the extent possible) to port existing interfaces to their API v2 equivalents. For more information see https://www.eia.gov/opendata/documentation.php

Bad URL handling

Each query class needs to check that the response it receives has the expected shape. If we make a bad request, e.g. http://api.eia.gov/category/?api_key=YOUR_API_KEY_HERE&category_id=badcategoryid&out=json, the query should alert the user that this is a bad URL and should not cache the result.

This will need to be done for each query.

Pseudo-code:

import logging

def get(self, identifier):
    url = self.make_url(identifier)
    r = self.make_get_request(url)
    # A valid category response carries 'childcategories' under 'category'.
    if 'childcategories' not in r.get('category', {}):
        logging.warning('bad response from %s', url)
        handle_bad_response()
        return None
    return r
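
A self-contained variant of the same idea might look like the sketch below. It validates the payload first and writes to the cache only afterwards, so a bad response is never stored. `BadResponseError`, `check_category_response` and `cached_get` are hypothetical names, and `fetch` is any callable that returns the decoded JSON.

```python
class BadResponseError(ValueError):
    """Hypothetical error raised when a payload does not look like a category."""


def check_category_response(payload):
    """Return the 'category' object, or raise if the expected keys are missing."""
    category = payload.get("category") if isinstance(payload, dict) else None
    if not isinstance(category, dict):
        raise BadResponseError("response has no 'category' object")
    if "childcategories" not in category and "childseries" not in category:
        raise BadResponseError("category has neither 'childcategories' nor 'childseries'")
    return category


def cached_get(cache, identifier, fetch):
    """Fetch and validate; only validated responses are written to the cache."""
    if identifier in cache:
        return cache[identifier]
    category = check_category_response(fetch(identifier))
    cache[identifier] = category
    return category
```

Raising an exception instead of returning None forces the caller to deal with the failure explicitly; either approach keeps the bad result out of the cache.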

setup.py

We should create a setup.py file once a working version is up.

Query Tests

Base Query Tests

There should be tests that check that every call to requests.get() or requests.post() returns a 200 response.
There should be a test to make sure that results are cached in self.history, and that a get request always checks QueryHistory before trying to make a request.
We should test that bad URLs are handled in the query class. It's possible that we get a 200 response even if the URL is bad. This means checking that the response matches the example responses given by the EIA. For example, if we make a query with the Category Query, we need to get back either 'childcategories' or 'childseries' in the json response.
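
The first two tests could be sketched with unittest and unittest.mock as below. `FakeQuery` is a hypothetical stand-in for the real query classes, and the HTTP call is injected so that the tests never touch the network.

```python
import unittest
from unittest import mock


class FakeQuery:
    """Hypothetical stand-in for a query class with a history cache."""

    def __init__(self, http_get):
        self.http_get = http_get  # injected so tests never touch the network
        self.history = {}

    def get(self, identifier):
        # Always consult the history before making a request.
        if identifier in self.history:
            return self.history[identifier]
        response = self.http_get(identifier)
        if response.status_code != 200:
            raise RuntimeError(f"unexpected status {response.status_code}")
        payload = response.json()
        self.history[identifier] = payload
        return payload


class QueryHistoryTest(unittest.TestCase):
    def setUp(self):
        response = mock.Mock(status_code=200)
        response.json.return_value = {"category": {"childcategories": []}}
        self.http_get = mock.Mock(return_value=response)
        self.query = FakeQuery(self.http_get)

    def test_result_is_cached(self):
        self.query.get("abc")
        self.assertIn("abc", self.query.history)

    def test_repeated_get_uses_history(self):
        self.query.get("abc")
        self.query.get("abc")
        self.http_get.assert_called_once_with("abc")

    def test_non_200_raises_and_is_not_cached(self):
        self.http_get.return_value = mock.Mock(status_code=500)
        with self.assertRaises(RuntimeError):
            self.query.get("abc")
        self.assertNotIn("abc", self.query.history)
```

Against the real classes, the same structure should work by patching requests.get and requests.post with mock.patch instead of injecting a callable.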

Automatically generate header information

Open for suggestions

Add "history" to baseQuery class

Repeated queries should not make new requests. Responses should be saved either in an instance variable or to a file. This should be configurable.

One option is to have a dictionary with identifier-response key-values. If the user knows a lot of requests will be made such that there might be memory issues, then we should keep a list of only the identifiers, and save responses to a json-file, with the same key-value pairs.

That way, bulk downloads can also be easily parsed with a "history-only" mode or something.
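
A minimal sketch of such a configurable history follows. The class name `QueryHistory` matches the name used in the test notes, but its interface here (`path`, `add`, `get`) is an assumption. With no path it keeps responses in a dictionary; with a path it keeps only the identifiers in memory and stores the responses in a JSON file.

```python
import json
import os
import tempfile


class QueryHistory:
    """Hypothetical history store: keeps responses in memory or in a JSON file."""

    def __init__(self, path=None):
        self.path = path        # None means keep responses in memory
        self.identifiers = []   # identifiers are always kept in memory
        self._memory = {}

    def _load(self):
        if self.path is None:
            return self._memory
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def add(self, identifier, response):
        data = self._load()
        data[identifier] = response
        if identifier not in self.identifiers:
            self.identifiers.append(identifier)
        if self.path is not None:
            # Simplification: the whole file is rewritten on every add.
            with open(self.path, "w") as f:
                json.dump(data, f)

    def get(self, identifier):
        if identifier not in self.identifiers:
            return None
        return self._load()[identifier]


# File-backed usage; the payload here is an empty placeholder.
history = QueryHistory(os.path.join(tempfile.mkdtemp(), "history.json"))
history.add("ELEC.SALES.AK-IND.M", {"series": []})
```

Rewriting the file on each add is fine for a sketch but would be slow for bulk downloads; an append-friendly format or a small database would scale better.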

Starter issue

Write a script that extracts residential electricity price trends from the reference scenario to a pandas dataframe.

Some steps to help you get started:

Register for an API key: http://www.eia.gov/beta/api/register.cfm
Get the API endpoint: http://www.eia.gov/beta/api/qb.cfm?category=1372924&sdid=AEO.2015.REF2015.PRCE_REAL_RES_NA_ELC_NA_NA_Y13DLRPMMBTU.A

Programmatically, this is done by making a request to that endpoint, saving the JSON response to a variable, parsing that response, and pulling it into a dataframe.

The output should look similar to GSL LCC / modules / aeo_price_trends.py
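
The parsing step could be sketched as follows. The sample payload assumes that the series endpoint returns a "series" list whose entries hold "data" as [period, value] pairs. The numbers are placeholders rather than real AEO values, and `series_to_rows` is a hypothetical helper.

```python
import json

# Made-up sample in the shape the series endpoint is assumed to return;
# the numbers are placeholders, not real AEO values.
raw = """
{"series": [{"series_id": "AEO.2015.REF2015.PRCE_REAL_RES_NA_ELC_NA_NA_Y13DLRPMMBTU.A",
             "data": [["2014", 1.0], ["2013", 2.0]]}]}
"""


def series_to_rows(payload):
    """Flatten every series in the payload into (series_id, period, value) rows."""
    rows = []
    for series in payload["series"]:
        for period, value in series["data"]:
            rows.append({"series_id": series["series_id"], "period": period, "value": value})
    return rows


# Sort chronologically so the trend reads from oldest to newest.
rows = sorted(series_to_rows(json.loads(raw)), key=lambda row: row["period"])
```

Since `rows` is a list of dictionaries, passing it to pandas.DataFrame(rows) yields the dataframe the issue asks for.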

Complete Refactoring from Current Code in EETD/EES repo

We need to clean up and remove any internal references in docstrings, etc. in the current version of the API tool in the EES/EETD repository. This involves moving the methods that make queries and build query URLs into their respective Query classes. Some docstrings also mention scott by name; those should be cleaned up as well.

proposal for categoryBrowser.browse method

Should do something like the following:

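def browse(self, *args, **kwargs):
    # Visit each category in the order given.
    for arg in args:
        self.goto(arg)
    cb = kwargs.get('callback')
    if cb is None:
        # No callback supplied: just navigate.
        return None
    cbargs = kwargs.get('callback_args', ())
    cbkwargs = kwargs.get('callback_kwargs', {})
    return cb(*cbargs, **cbkwargs)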

That way, navigating to the desired child category and flagging or exporting can be done in one line.
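
To show what the one-line call might look like, here is a small sketch using a stand-in class. `StubBrowser` and its `export` method are invented for this example. The stub only records where it has been, so it can run without an API key.

```python
class StubBrowser:
    """Hypothetical stand-in that records navigation instead of calling the API."""

    def __init__(self):
        self.visited = []

    def goto(self, category):
        self.visited.append(category)

    def export(self, fmt="json"):
        return {"path": list(self.visited), "format": fmt}

    def browse(self, *args, callback=None, callback_args=(), callback_kwargs=None):
        for arg in args:
            self.goto(arg)
        if callback is None:
            return None
        return callback(*callback_args, **(callback_kwargs or {}))


browser = StubBrowser()
# Navigate two levels down and export in a single call.
result = browser.browse(
    "first", "second",
    callback=browser.export,
    callback_kwargs={"fmt": "json"},
)
```

Using keyword-only parameters instead of reading from **kwargs makes the accepted options visible in the signature, at the cost of rejecting unknown keywords.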

Suppress debug output

Running the code automatically prints a debug message to the command line. How can I suppress it?

Example: 2020-11-04 21:01:25.993 | DEBUG | eia.api.base:_post:63 - made POST to https://api.eia.gov/series/ with parameters {'api_key': 'SECRET_APIKEY', 'out': 'json'} and payload {'series_id': 'ELEC.SALES.AK-IND.M'}

Looks like it is in the _get_data method when you use Post?

Thank you!

Metadata can't be accessed

Hiya! Thanks so much for writing this, very useful and makes doing bulk downloads much more seamless than otherwise. Unfortunately at least on my machine it takes some fiddling to get working. Context, I've installed the package into a venv, using python 3.10 and pip 21.3.1. Both of these are installed in the right places in my venv.

When I call from eia import api, this error occurs: importlib.metadata.PackageNotFoundError. It is raised in eia/__init__.py at __version__ = metadata.version("eia"). Manually changing this line to reference "pyeia" instead makes it work smoothly, at least for me. I'm not sure whether this issue occurs for other people, but I wanted to flag it.

Thanks!

Query Class for Search API Endpoints

Relevant Documentation

Cmd Line interface for Search API

Need to add in some search methods that interact with the browser class. This involves deciding what fields to display from the search API, and then using the Series-Categories Query to get related categories.

"Hidden" Categories in EIA Category Traversal

Note: should flesh this out later, but here's a brief overview of this issue:

The EIA API apparently can have "hidden" child categories that are not returned in childcategories. Example: when traversing US ESOD > Day Ahead Demand Forecast, childcategories actually returns all nodes two links down, skipping the "Regions" and "Balancing Authorities" levels. However, each of the grandchild nodes lists its parent correctly. This breaks the way the mass export browser traverses the category tree given by the API.
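
Because the grandchildren report their parents correctly, one way to detect a skipped level is to compare each returned child's parent with the node that was queried. The sketch below assumes each child has already been fetched and exposes a 'parent_category_id' field. The function name and the ids are invented for illustration.

```python
def find_skipped_parents(current_id, children):
    """Return parent ids reported by children that differ from the node queried.

    `children` is a list of dicts with 'category_id' and 'parent_category_id';
    both the shape and the ids used below are assumptions for illustration.
    """
    skipped = []
    for child in children:
        parent_id = child["parent_category_id"]
        if parent_id != current_id and parent_id not in skipped:
            skipped.append(parent_id)
    return skipped


# Node 1 is queried, but its listed children claim 20 and 30 as parents,
# so 20 and 30 are intermediate categories the listing skipped.
children_of_1 = [
    {"category_id": 201, "parent_category_id": 20},
    {"category_id": 202, "parent_category_id": 20},
    {"category_id": 301, "parent_category_id": 30},
]
hidden = find_skipped_parents(1, children_of_1)
```

The traversal could then queue the ids in `hidden` as the real children of the current node and visit the returned nodes from there.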

Query Class for Updates API Endpoints

Relevant documentation