
kleptes's Introduction

κλέπτης

Some world data mining goodness.

  • WHO - pretty good, not super complete, very easy to browse.
  • OECD - a complete nightmare. Takes minutes to figure out if a dataset has any data.
  • World Bank - lots of stuff, consistent structure, a bit tough to search (will improve on this).
  • Eurostat - datasets are complete but the format is horrible. The two functions provided help a lot.
  • Yahoo Finance - the usual.

Requirements

  • Python 3
  • pandas
  • Redis (to cache stuff; defaults to localhost, hardcoded)
  • requests
  • bs4 (used only in one function, don't worry)

It's on PyPI so you can simply pip3 install kleptes.

How do I use this?

It's meant to be used in a Jupyter notebook (or anything interactive, so a Jupyter notebook).

from kleptes import *   # also imports pandas as pd. you'll thank me

who_dims()
who_dims("*indic*")  # supports unix searching because who the fuck knows regexps?

who_dataset("GHO")                # wow results much stuff
who_dataset("GHO", "*suicide*")   # fewer

who_dataset("GHO", "MH_12").head()   # the thing you were looking for


# same-ish stuff for the world bank
wb_inds("*hdi*")        # get indicators like '*hdi*', find one ...
wb_dataset("UNDP.HDI.XD", countries=["mz", "za"])


# and again for eurostat
eus_inds("*empl*")                     # loads of stuff!
eus_inds("*part*time*empl*")           # same
eus_dataset("lfsq_eppga")              # and a dataset


# OECD is a bit more complicated, as the dimension is so complex it deserves
# an object of its own.
oecd_inds("*quarterly*")   # search among indicators (so much shit there)

oecd_dims("QNA")                      # get the dimension of said indicator
oecd_dims("QNA").subject("*gross*")   # figure out possible values of each field
oecd_dataset("qna",
             country=["italy", "fra", "germany", "norw*"],
             subject=["B1_GE"],
             measure="cqr",
             frequency="Q")  # as many kwargs as the dims.


# Yahoo finance is the usual
yf_search("Google")                 # search for a symbol ...
yf_get("GOOGL", days=400)           # get some data

Remember: everything should be searchable.
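
To make the wildcard searching above concrete, here is a minimal sketch of Unix-style matching built on the standard library's fnmatch module. The search helper and the sample indicator names are made up for illustration; how kleptes actually implements its search is not shown here.

```python
from fnmatch import fnmatchcase


def search(names, pattern=None):
    """Return the names matching a Unix-style wildcard pattern, ignoring case.

    Hypothetical helper for illustration; not part of kleptes.
    """
    if pattern is None:
        return list(names)
    pattern = pattern.lower()
    # fnmatchcase compares exactly as given, so lowercase both sides first.
    return [n for n in names if fnmatchcase(n.lower(), pattern)]


# Made-up indicator names, just to exercise the helper.
indicators = ["Suicide rate", "Part-time employment", "Employment rate", "HDI"]

print(search(indicators, "*empl*"))
print(search(indicators, "*part*time*empl*"))
```

Calling search(indicators) with no pattern returns everything, which mirrors how who_dims() and who_dims("*indic*") are used above.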

The exposed functions for now are:

  • WHO: who_dataset, who_dims and who_get (very low level).
  • OECD: oecd_dataset, oecd_dims, oecd_inds.
  • World Bank: wb_inds, wb_dataset and wb_get.
  • Eurostat: eus_inds and eus_dataset.
  • Yahoo Finance: yf_search and yf_get.

The services differ, so similarly named functions don't do exactly the same thing (e.g. who_dims and oecd_dims behave differently).

Everything is cached for a while so you won't be flooding the servers unless you want to.

kleptes's People

Contributors

myyc

kleptes's Issues

Caching

Hi, I've been playing with this for a few minutes and have two questions:

  1. Why does Redis need to be running? For local development it would be useful to skip caching when a connection cannot be made.
  2. I tried to get some stats using a country name instead of the code (e.g. y = wb_dataset("NY.GDP.MKTP.CD", countries=["Brazil"])) and got the following error:
ValueError                                Traceback (most recent call last)
<ipython-input-7-e8212522ad5c> in <module>()
----> 1 y = wb_dataset("NY.GDP.MKTP.CD", countries=["Brazil"])

/Users/thiago/miniconda2/envs/factutils/lib/python3.5/site-packages/kleptes/wb.py in wb_dataset(ind, countries, d1, d2, frequency, expire, force, raw)
    110                                                          ind=ind,
    111                                                          params=params)
--> 112     ds = wb_get(k, force=force, expire=expire)
    113     return ds if raw else pd.DataFrame(ds)

/Users/thiago/miniconda2/envs/factutils/lib/python3.5/site-packages/kleptes/wb.py in wb_get(key, expire, force, raw)
     62         else:
     63             raise ValueError(
---> 64                 "Not sure how to handle the response: {}".format(j))
     65
     66         v = v or []

ValueError: Not sure how to handle the response: [{'message': [{'key': "Parameter 'country' has an invalid value", 'value': 'The provided parameter value is not valid', 'id': '120'}]}]

That is fair enough, but when I submit the query again using the country code I get a different error:

In [9]: y = wb_dataset("NY.GDP.MKTP.CD", countries=["br"])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-9-dca2b150df8b> in <module>()
----> 1 y = wb_dataset("NY.GDP.MKTP.CD", countries=["br"])

/Users/thiago/miniconda2/envs/factutils/lib/python3.5/site-packages/kleptes/wb.py in wb_dataset(ind, countries, d1, d2, frequency, expire, force, raw)
    110                                                          ind=ind,
    111                                                          params=params)
--> 112     ds = wb_get(k, force=force, expire=expire)
    113     return ds if raw else pd.DataFrame(ds)

/Users/thiago/miniconda2/envs/factutils/lib/python3.5/site-packages/kleptes/wb.py in wb_get(key, expire, force, raw)
     36             del r[rkey]
     37         else:
---> 38             return json.loads(r[key].decode("utf-8"))
     39
     40     l = []

/Users/thiago/miniconda2/envs/factutils/lib/python3.5/site-packages/redis/client.py in __getitem__(self, name)
    888         if value:
    889             return value
--> 890         raise KeyError(name)
    891
    892     def getbit(self, name, offset):

KeyError: 'countries/br/indicators/NY.GDP.MKTP.CD?date=1960:2016&frequency=Y'

Maybe the invalid response from the first call was cached, or the cache lookup is going wrong?
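
One way the two points above could be handled is sketched below. It is not the actual wb.py code: read_cache, fetch_cached and is_valid are hypothetical names, and client stands for any object with redis-py style get and setex methods. The idea is to read with get(), which returns None on a miss instead of raising KeyError, to store only responses that pass validation, and to treat client=None as "caching disabled".

```python
import json


def read_cache(client, key):
    """Return the decoded cached value, or None on a miss or without a client."""
    if client is None:       # caching disabled, e.g. no Redis running locally
        return None
    raw = client.get(key)    # redis-py's get() returns None for a missing key
    return None if raw is None else json.loads(raw.decode("utf-8"))


def fetch_cached(client, key, fetch, is_valid, expire=3600):
    """Serve from the cache when possible; store only valid responses."""
    value = read_cache(client, key)
    if value is not None:
        return value
    value = fetch()
    if not is_valid(value):
        # Error payloads are reported but never written to the cache.
        raise ValueError("Not sure how to handle the response: {}".format(value))
    if client is not None:
        # Argument order (name, time, value) as in redis-py 3.x and later.
        client.setex(key, expire, json.dumps(value))
    return value
```

For the World Bank case, is_valid could reject payloads whose first element carries a 'message' key, as in the first traceback.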
