pyeia's Introduction

Hi there 👋

I write software to understand and deploy energy efficiency. Let's chat about the intersection of climate/energy and software!

pyeia's People

Contributors

dependabot[bot], thomastu

pyeia's Issues

Browser Refactoring Proposals

Status Quo

Currently eia.browser is typically used in the following way:

  1. During __init__, two Query objects are created, one for category queries and one for series queries. A query is then made to EIA's top-level category, which lists the available datasets.
  2. The "current page" at any given time is defined by a category_id set by eia.browser.browse(), i.e. navigating the EIA API is the same as navigating through categories and their related categories.
  3. The user continues following the category tree until reaching a branch or leaf where the current page contains child series, and then flags some series for export.
  4. When eia.browser.parse is called, a SeriesQuery is invoked, and post_data is built from the eia.browser (not the query). The raw response is parsed within parse and stored as a pandas DataFrame.

This is clunky for several reasons. Namely, it requires repeated calls to eia.browser.browse that could be condensed into a single line. It also can make it difficult to

Proposed Changes:

  1. There should instead be a way to give a full path name rather than requiring repeated calls to browse.
    Since browse moves the browser from point A to point B, it should be done in one step regardless. This is a common operation for datasets such as AEO, where you often want the same data from different scenarios.
  2. There should be some interface for interacting with individual series data, i.e. the SeriesQuery should do more than just move data into a pandas DataFrame.
  3. When exporting or "parsing", there should be more options than a pandas DataFrame. A good start would be exporting the raw response and JSON.
  4. An interface for interacting with the search API (see Issue #4).
  5. An interface for generating header information, either from the browse path of the top-level request or from the series ID (see Issue #5).
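A minimal sketch of proposed change 1, assuming the category tree has already been fetched into a nested dict keyed by child-category name. The node shape and the `resolve_path` helper are hypothetical, not part of pyeia's current API; the point is that one path string replaces a chain of `browse` calls.

```python
from typing import Any, Dict, List

# Hypothetical node shape: {"category_id": int, "children": {name: node, ...}}
Node = Dict[str, Any]


def resolve_path(tree: Node, path: str, sep: str = "/") -> List[int]:
    """Follow `path` from the root and return the category_ids visited.

    Each path segment stands in for one browse() call.
    """
    node = tree
    visited: List[int] = []
    for name in (part.strip() for part in path.split(sep)):
        if not name:
            continue  # tolerate leading/trailing or doubled separators
        children = node.get("children", {})
        if name not in children:
            raise KeyError(f"no child category named {name!r}")
        node = children[name]
        visited.append(node["category_id"])
    return visited


# Placeholder tree: names and ids are illustrative, not real EIA categories.
tree: Node = {"category_id": 0, "children": {
    "Dataset A": {"category_id": 1, "children": {
        "Scenario X": {"category_id": 2, "children": {}},
    }},
}}
print(resolve_path(tree, "Dataset A/Scenario X"))  # [1, 2]
```

For AEO-style work, the same helper could be called once per scenario path instead of re-browsing from the top each time.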

Bad URL handling

Each query class needs to check the response to make sure that it is an expected response. If we request a bad URL, e.g. http://api.eia.gov/category/?api_key=YOUR_API_KEY_HERE&category_id=badcategoryid&out=json, the query should alert the user that this is a bad URL and not cache the result.

This will need to be done for each query.

Pseudo-code:

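import logging

logger = logging.getLogger(__name__)

def get(self, identifier):
    url = self.make_url(identifier)
    r = self.make_get_request(url)
    # A valid category response lists child categories and/or child series.
    category = r.get('category', {})
    if 'childcategories' not in category and 'childseries' not in category:
        logger.warning('bad response from %s', url)  # log instead of printing
        handle_bad_response()  # e.g. alert the user; do not cache the result
        return None
    return r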
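Since this check has to be repeated for each query, one option is a small table mapping each query type to a predicate on the parsed JSON. This is a sketch: the category rule follows the Query Tests issue (a category response must contain `childcategories` or `childseries`), while the series rule (a non-empty `series` list) and the `{"data": {"error": ...}}` shape assumed for EIA error payloads should be verified against real responses. The query-type names are hypothetical.

```python
from typing import Any, Callable, Dict

Response = Dict[str, Any]


def _has_error(resp: Response) -> bool:
    # Assumed shape of an EIA error payload: {"data": {"error": "..."}}
    data = resp.get("data")
    return isinstance(data, dict) and "error" in data


def _valid_category(resp: Response) -> bool:
    category = resp.get("category")
    return isinstance(category, dict) and (
        "childcategories" in category or "childseries" in category
    )


def _valid_series(resp: Response) -> bool:
    series = resp.get("series")
    return isinstance(series, list) and len(series) > 0


# One predicate per query type (names are hypothetical).
VALIDATORS: Dict[str, Callable[[Response], bool]] = {
    "category": _valid_category,
    "series": _valid_series,
}


def is_valid_response(query_type: str, resp: Response) -> bool:
    """True if `resp` looks like a usable response for `query_type`."""
    if _has_error(resp):
        return False
    return VALIDATORS[query_type](resp)
```

Adding a new query class then means adding one entry to `VALIDATORS` rather than another hand-written `if` block.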

setup.py

We should create a setup.py file once a working version is up.

Query Tests

Base Query Tests

  • There should be tests that check that any time we call requests.get() or requests.post(), it returns a 200 response.
  • There should be a test to make sure that results are cached in self.history, and that a get request always checks QueryHistory before trying to make a request.
  • We should test that bad URLs are handled in the query class. It's possible to get a 200 response even if the URL is bad, so this means checking that the response matches the example responses given by the EIA. For example, a Category Query must get back either 'childcategories' or 'childseries' in the JSON response.
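A sketch of tests for these bullets using only the standard library's `unittest` and `unittest.mock`. The HTTP call is injected as a `fetch` callable so no network request is made, which means the 200 check is exercised through a fake response rather than a live one. `BaseQuery` here is a hypothetical stand-in, not pyeia's class; real tests would target the actual query classes.

```python
import unittest
from typing import Any, Callable, Dict, Optional
from unittest import mock


class BaseQuery:
    """Hypothetical stand-in for a base query class, for illustration only."""

    def __init__(self, fetch: Callable[[str], Any]):
        self.fetch = fetch  # performs the HTTP request (wraps requests.get in real code)
        self.history: Dict[str, Any] = {}

    def is_valid(self, payload: Dict[str, Any]) -> bool:
        category = payload.get("category", {})
        return "childcategories" in category or "childseries" in category

    def get(self, identifier: str) -> Optional[Dict[str, Any]]:
        if identifier in self.history:  # always check history first
            return self.history[identifier]
        response = self.fetch(identifier)
        if response.status_code != 200:
            return None
        payload = response.json()
        if not self.is_valid(payload):  # 200 but not the expected body
            return None  # do not cache bad responses
        self.history[identifier] = payload
        return payload


def fake_response(status: int, payload: Dict[str, Any]) -> mock.Mock:
    return mock.Mock(status_code=status, json=mock.Mock(return_value=payload))


class BaseQueryTests(unittest.TestCase):
    def test_repeated_get_uses_history(self):
        ok = fake_response(200, {"category": {"childseries": []}})
        fetch = mock.Mock(return_value=ok)
        q = BaseQuery(fetch)
        q.get("123")
        q.get("123")
        fetch.assert_called_once_with("123")

    def test_bad_response_is_not_cached(self):
        bad = fake_response(200, {"data": {"error": "bad id"}})
        q = BaseQuery(mock.Mock(return_value=bad))
        self.assertIsNone(q.get("badcategoryid"))
        self.assertNotIn("badcategoryid", q.history)

    def test_non_200_returns_none(self):
        q = BaseQuery(mock.Mock(return_value=fake_response(500, {})))
        self.assertIsNone(q.get("123"))
```

Saved as a `test_*.py` file, this runs with `python -m unittest`.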

Add "history" to baseQuery class

Repeated queries should not make new requests. Responses should be saved either in an instance variable or to a file, and this should be configurable.

One option is a dictionary with identifier-response key-value pairs. If the user knows that enough requests will be made to cause memory issues, we should instead keep only a list of the identifiers and save the responses to a JSON file with the same key-value pairs.

That way, bulk downloads can also be easily parsed with a "history-only" mode or something.
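A minimal sketch of the configurable history described above. `QueryHistory` and its `mode` argument are hypothetical names: `"memory"` keeps every response in a dict, while `"file"` keeps only the identifiers in memory and stores the responses in a JSON file. For simplicity, each save rewrites the whole file.

```python
import json
import os
from typing import Any, Dict, Optional, Set


class QueryHistory:
    """Hypothetical identifier -> response cache, in memory or on disk."""

    def __init__(self, mode: str = "memory", path: Optional[str] = None):
        if mode not in ("memory", "file"):
            raise ValueError("mode must be 'memory' or 'file'")
        if mode == "file" and path is None:
            raise ValueError("file mode needs a path")
        self.mode = mode
        self.path = path
        self._responses: Dict[str, Any] = {}
        self._ids: Set[str] = set()
        if mode == "file" and os.path.exists(path):
            self._ids = set(self._load())  # reuse an earlier bulk download

    def _load(self) -> Dict[str, Any]:
        with open(self.path) as fh:
            return json.load(fh)

    def __contains__(self, identifier: str) -> bool:
        known = self._responses if self.mode == "memory" else self._ids
        return identifier in known

    def get(self, identifier: str) -> Any:
        if self.mode == "memory":
            return self._responses.get(identifier)
        return self._load().get(identifier) if identifier in self._ids else None

    def save(self, identifier: str, response: Any) -> None:
        if self.mode == "memory":
            self._responses[identifier] = response
            return
        data = self._load() if os.path.exists(self.path) else {}
        data[identifier] = response
        with open(self.path, "w") as fh:
            json.dump(data, fh)
        self._ids.add(identifier)
```

A "history-only" mode could then answer lookups from an existing file through `get` without ever making a request.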

Starter issue

Write a script that extracts residential electricity price trends from the reference scenario to a pandas dataframe.

Some steps to help you get started:

  1. Register for an API key: http://www.eia.gov/beta/api/register.cfm
  2. Get the API endpoint: http://www.eia.gov/beta/api/qb.cfm?category=1372924&sdid=AEO.2015.REF2015.PRCE_REAL_RES_NA_ELC_NA_NA_Y13DLRPMMBTU.A

Programmatically, this is done by making a request to that endpoint, saving the JSON response to a variable, parsing that JSON response, and pulling it into a DataFrame.
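A minimal sketch of those steps using the standard library plus pandas. It assumes the legacy series endpoint `https://api.eia.gov/series/` (the URL seen in the debug log quoted in a later issue) accepts `api_key`, `series_id`, and `out` as query parameters and returns `{"series": [{"data": [[period, value], ...]}]}`; verify both against a live response. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

import pandas as pd

SERIES_URL = "https://api.eia.gov/series/"  # legacy endpoint (assumption)
SERIES_ID = "AEO.2015.REF2015.PRCE_REAL_RES_NA_ELC_NA_NA_Y13DLRPMMBTU.A"


def fetch_series(api_key: str, series_id: str = SERIES_ID) -> Dict[str, Any]:
    """GET one series and return the parsed JSON response."""
    query = urllib.parse.urlencode(
        {"api_key": api_key, "series_id": series_id, "out": "json"}
    )
    with urllib.request.urlopen(f"{SERIES_URL}?{query}") as resp:
        return json.load(resp)


def series_to_frame(payload: Dict[str, Any]) -> pd.DataFrame:
    """Turn the first series' [period, value] pairs into a sorted DataFrame."""
    series = payload["series"][0]
    df = pd.DataFrame(series["data"], columns=["period", "value"])
    df["period"] = df["period"].astype(int)  # annual (".A") series use years
    return df.sort_values("period").set_index("period")
```

Usage: `series_to_frame(fetch_series("YOUR_API_KEY_HERE"))`, with a real key in place of the placeholder. Keeping the fetch and the parse separate lets the parsing be tested on a saved response without network access.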

The output should look similar to GSL LCC / modules / aeo_price_trends.py

Complete Refactoring from Current Code in EETD/EES repo

We need to clean up and remove any internal references in docstrings, etc., in the current version of the API tool in the EES/EETD repository. This involves moving the methods that make queries and build query URLs into their respective Query classes. Some docstrings also mention Scott by name; those should be cleaned up as well.

proposal for categoryBrowser.browse method

Should do something like the following:

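def browse(self, *args, **kwargs):
    # Walk down the category tree one step per positional argument
    for arg in args:
        self.goto(arg)
    cb = kwargs.get('callback')
    if cb is None:
        return None  # no callback: just leave the browser at the new category
    # Default to no extra arguments so cb(*None) cannot raise a TypeError
    cbargs = kwargs.get('callback_args') or ()
    cbkwargs = kwargs.get('callback_kwargs') or {}
    return cb(*cbargs, **cbkwargs)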

That way, navigating to the desired child category and flagging/exporting can be done in one line.

Suppress debug output

Running the code automatically prints a DEBUG message to the command line. How can I suppress it?

Example: 2020-11-04 21:01:25.993 | DEBUG | eia.api.base:_post:63 - made POST to https://api.eia.gov/series/ with parameters {'api_key': 'SECRET_APIKEY', 'out': 'json'} and payload
{'series_id': 'ELEC.SALES.AK-IND.M'}

It looks like it comes from the _get_data method when it uses POST?

Thank you!
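The log line format (`time | LEVEL | module:function:line - message`) matches loguru's default format, so this sketch assumes the package logs through loguru and that its messages come from modules under `eia` (as `eia.api.base` in the example suggests). The fallback branch covers the case where the standard `logging` module is used instead. `quiet_eia_logs` is a hypothetical helper.

```python
import logging
import sys


def quiet_eia_logs(level: str = "INFO") -> str:
    """Hide DEBUG output; return which logging backend was adjusted."""
    try:
        from loguru import logger
    except ImportError:
        # Fallback in case the package uses the standard logging module.
        logging.getLogger("eia").setLevel(getattr(logging, level))
        return "logging"
    # Remove all loguru sinks (including the default DEBUG-level stderr one)
    # and re-add stderr so only messages at `level` or above are shown.
    logger.remove()
    logger.add(sys.stderr, level=level)
    return "loguru"
```

With loguru, `logger.disable("eia")` is an alternative that silences only the library's own messages while leaving the application's logging untouched.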

Metadata can't be accessed

Hiya! Thanks so much for writing this, it's very useful and makes doing bulk downloads much more seamless than otherwise. Unfortunately, at least on my machine, it takes some fiddling to get working. For context, I've installed the package into a venv using Python 3.10 and pip 21.3.1. Both of these are installed in the right places in my venv.

When I call from eia import api, this error occurs: importlib.metadata.PackageNotFoundError. It is raised in eia/__init__.py by __version__ = metadata.version("eia"). Manually changing this line to reference "pyeia" instead makes it work smoothly, at least for me. Not sure if this issue occurs for other people, but I wanted to flag it.

Thanks!

Cmd Line interface for Search API

Need to add in some search methods that interact with the browser class. This involves deciding what fields to display from the search API, and then using the Series-Categories Query to get related categories.

"Hidden" Categories in EIA Category Traversal

Note: should flesh this out later, but here's a brief overview of this issue:

The EIA API can apparently have "hidden" child categories that are not returned by the childcategories API. Example: when traversing US ESOD > Day Ahead Demand Forecast, the childcategories API actually returns all nodes two links down, skipping the "Regions" and "Balancing Authorities" categories. However, each of the grandchild nodes correctly lists its parent. This breaks how the mass-export browser traverses the category tree given by the API.
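Because each grandchild still reports its true parent, the skipped levels can be detected by fetching every child a category lists and comparing the child's parent with the category being traversed. This sketch assumes a hypothetical `fetch_category(category_id)` callable that returns a dict with `parent_category_id` and a `childcategories` list of `{"category_id": ...}` entries; confirm that shape against real responses.

```python
from typing import Any, Callable, Dict, List, Tuple

Category = Dict[str, Any]


def find_hidden_parents(
    root_id: int, fetch_category: Callable[[int], Category]
) -> List[Tuple[int, int, Any]]:
    """Walk the tree from root_id and return (listed_under, child, true_parent)
    for each child whose own parent_category_id differs from the category
    that listed it, i.e. where an intermediate "hidden" category was skipped.
    """
    mismatches = []
    stack = [root_id]
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for child in fetch_category(current).get("childcategories", []):
            child_id = child["category_id"]
            parent_id = fetch_category(child_id).get("parent_category_id")
            if parent_id != current:
                mismatches.append((current, child_id, parent_id))
            stack.append(child_id)
    return mismatches
```

The browser could then fetch each reported `true_parent` to recover the missing "Regions" or "Balancing Authorities" level before exporting.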