
awesome-data's Introduction

Awesome collections on DataHub

The awesome section presents collections of high quality datasets organized by topic.

The home page for awesome collections is located in the frontend repo and should be modified there. See the live page here:

Collections

awesome-data's People

Contributors

acckiygerman, anuveyatsu, branko-dj, lauragift21, liyubov, loleg, mikanebu, olayway, popovayoana, rufuspollock, svetozarstojkovic, tanvirchahal, todrobbins


awesome-data's Issues

[super] Inflation / Price Index / PPP / Deflator data

Granularity

  • Spatial:
    • "World" i.e. per country (probably annual time series)
    • Per country: probably want a sub-selection and greater granularity
  • Temporal
    • Year, month, day (?)
    • Span: historical data is a special case. For uniform values, probably more recent data only.

Initial shortlist of datasets to include

  • ISO 2-letter country codes
  • ISO 3-letter country codes
  • Currency Codes
  • Country & World GDP
  • Country and World Populations
  • Language codes
  • Exchange Rates
  • File formats / Mimetypes
  • Unicode codes
  • COFOG - classifications of functions of government
  • CIPFA-budget-headings
  • LGA Metric Types (??)
  • Administrative boundaries e.g. EU NUTS
  • Inflation / Deflators
  • PPP (purchasing power parity)

The original list is in this Google spreadsheet.

Please add new suggestions as a new issue in this issue tracker.

Crunchbase (?)

Does this merit inclusion?

Getting the data

http://www.crunchbase.com/ - stats as of Aug 2013

  • ~175k companies, 193k people

Where to get bulk ...

License

CC-BY according to http://info.crunchbase.com/docs/licensing-policy/, with a number of specific attribution requirements.

Oil prices

The US EIA has a variety of prices: https://www.eia.gov/dnav/pet/pet_pri_spt_s1_d.htm (US EIA data is great: high quality and, as a US federal government source, public domain).

There are various types of oil for which we could get prices.

I propose we store:

  • Brent crude
  • WTI

For granularity, I'd say it is worth storing daily, weekly, monthly and annual data, but prioritising daily (note the naming conventions: http://data.okfn.org/doc/publish-faq#data-package-name).

Question: should this be one data package, or one data package per oil type? (And if one data package, do we store Brent and WTI in the same file or in separate files? Answer: separate files.)

All in one:

  • Convenient to prepare, as the data all comes from the same source, so the scraper is easy to run (that said, we already have natural gas prices separately ...)

Separate:

  • Follows the one-data-package-per-dataset approach.
  • Each data package is small and lightweight.

My instinct here is all in one, so the data package will look like:

data/wti-day.csv
data/wti-year.csv
data/wti-month.csv
# etc
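A minimal sketch of how the monthly and annual files could be derived from the daily series, so only the daily data has to be scraped. It assumes ISO dates, a plain mean of daily prices as the monthly/annual value (an assumption; the source may publish its own aggregates, which would be preferable), and hypothetical `Date`/`Price` column names. The prices are made up.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def aggregate(daily_rows, period):
    """Average (ISO date, price) rows into month (YYYY-MM) or year (YYYY) buckets.

    Assumption: a plain mean of the daily values is an acceptable monthly or
    annual figure; check this against how the source publishes its own series.
    """
    prefix_len = {"month": 7, "year": 4}[period]  # "2013-01-02"[:7] == "2013-01"
    buckets = defaultdict(list)
    for date, price in daily_rows:
        buckets[date[:prefix_len]].append(price)
    return [(key, round(mean(vals), 2)) for key, vals in sorted(buckets.items())]


def to_csv(rows, header=("Date", "Price")):
    """Serialise rows to CSV text; the column names are hypothetical."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Made-up prices, for illustration only.
daily = [("2013-01-02", 93.12), ("2013-01-03", 92.92), ("2013-02-01", 97.49)]
files = {
    "data/wti-day.csv": to_csv(daily),
    "data/wti-month.csv": to_csv(aggregate(daily, "month")),
    "data/wti-year.csv": to_csv(aggregate(daily, "year")),
}
for path, text in files.items():
    print(path)
    print(text)
```

Weekly files are left out here because grouping by week needs a decision on week boundaries (e.g. ISO weeks), which the discussion above does not settle.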

S&P 500

  • The index value and associated info (as per Shiller). Ideally this should be historical.
  • Constituents

List of all public health insurances in Germany with contact details

There is no open repository of contact details for health insurers in Germany apart from one PDF listing URLs. With the help of web scraping, we have compiled a complete list with email, address and telephone number. This should be helpful for healthcare system researchers trying to access policies or data from all insurers. There are 137 of them! Does this belong in the registry?

Country Boundaries (vector)

This would be country polygons at the crudest scale (e.g. 1:110m). Suggest packaging Natural Earth data (public domain, etc.).

package name: geo-boundaries-world-110m

Long term, the best way would be to get the core Natural Earth maintainers to add "packaging" themselves; they are already on GitHub (see https://github.com/nvkelso/natural-earth-vector). But we need an exemplar ...

What format should we use?

/cc @jalbertbowden @amercader - thoughts here very welcome :-)

Data

Euribor

http://www.euribor-rates.eu/euribor-rates-by-year.asp

We probably don't need all 15 rates they used to publish, which they are now reducing:

Until November 1st 2013, Euribor-EBF published 15 Euribor rates (1-3 weeks and 1-12 months) daily (working days only). As of November 1st 2013, the number of Euribor rates is reduced to 8 (1-2 weeks, 1, 2, 3, 6, 9 and 12 months). This adjustment is a consequence of the problems which arose over the last couple of years when determining the Euribor rates. An EBA/ESMA report published in January 2013 recommends calculating and publishing only those Euribor rates which banks use frequently. The rationale is that it is easier to calculate a reliable rate if there are many transactions for a specific rate (maturity).

I suggest we record the following rates at monthly intervals (which is what you get from the historical data):

  • 1-week
  • 1-month
  • 3-month
  • 1-year

Though it may turn out that getting all 8 is the same effort, in which case we may as well.
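A small sketch of one way to store the rates so that the shortlist-versus-all-8 choice stays cheap to revisit: a long (date, maturity, rate) table instead of one column per maturity. The maturity labels (`1w`, `12m`, ...) and the dict-shaped input rows are hypothetical, and the numbers are placeholders, not real fixings.

```python
# Maturities published from 1 November 2013 onwards, as listed above.
CURRENT_MATURITIES = ["1w", "2w", "1m", "2m", "3m", "6m", "9m", "12m"]
# The proposed shortlist; "1-year" is written as "12m" here.
SHORTLIST = ["1w", "1m", "3m", "12m"]


def to_long(wide_rows, maturities):
    """Turn wide rows into (date, maturity, rate) tuples.

    Each row is a dict such as {"date": "2013-11-01", "1w": ..., "3m": ...}; the
    key names are hypothetical. Maturities missing from a row (for example,
    rates discontinued in November 2013) are skipped rather than written as blanks.
    """
    out = []
    for row in wide_rows:
        for maturity in maturities:
            rate = row.get(maturity)
            if rate is not None:
                out.append((row["date"], maturity, rate))
    return out


# Placeholder values, not real Euribor fixings.
wide = [
    {"date": "2013-10-01", "1w": 0.10, "3w": 0.11, "1m": 0.12, "3m": 0.20, "12m": 0.50},
    {"date": "2013-11-01", "1w": 0.10, "1m": 0.12, "3m": 0.21},
]
print(to_long(wide, SHORTLIST))
```

Switching from the shortlist to all 8 rates is then just a matter of passing `CURRENT_MATURITIES`; the file schema does not change.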

CO2 "Price" (Emission trading permits)

Where can we get CO2 price and emission trading scheme info? Which regions run emissions trading schemes?

EU data

http://www.eea.europa.eu/data-and-maps/data/european-union-emissions-trading-scheme-eu-ets-data-from-citl-7

Data about the EU emission trading system (ETS). The EU ETS data viewer provides aggregated data on emissions and allowances, by country, sector and year. The data mainly comes from the EU Transaction Log (EUTL). Additional information on auctioning and scope corrections is included.

Airport Codes

Does this merit being included as a reference dataset?

Currency Codes

The links need to be updated (coincidentally, I commented on this on DataHub at http://datahub.io/dataset/iso-4217-currency-codes a couple of hours ago).

This table is not really a table of currency codes; it is country/currency codes, so it is denormalized and USD appears in several places. The table is misnamed and less useful as a result.

Oddly, too, the reference to a country is by name, not by ISO 3166 code. Do you have a policy around linking/foreign keys?

Of course, some folk would use the XML 'package' directly http://www.currency-iso.org/dam/downloads/dl_iso_table_a1.xml :)
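A rough sketch of splitting the denormalized country/currency table into a proper currency table (one row per code) plus a country-to-currency link table. The element names (`CcyNtry`, `CtryNm`, `CcyNm`, `Ccy`, `CcyNbr`, `CcyMnrUnts`) are our assumption about the layout of the ISO XML file and should be checked against the real download. The inline sample is a trimmed, hypothetical excerpt. Countries are still keyed by name, since that is all the source provides; mapping them to ISO 3166 codes would need a separate lookup that is not shown.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical excerpt in the shape we assume the ISO 4217 XML uses.
SAMPLE = """<ISO_4217>
  <CcyTbl>
    <CcyNtry><CtryNm>ECUADOR</CtryNm><CcyNm>US Dollar</CcyNm><Ccy>USD</Ccy><CcyNbr>840</CcyNbr><CcyMnrUnts>2</CcyMnrUnts></CcyNtry>
    <CcyNtry><CtryNm>UNITED STATES OF AMERICA (THE)</CtryNm><CcyNm>US Dollar</CcyNm><Ccy>USD</Ccy><CcyNbr>840</CcyNbr><CcyMnrUnts>2</CcyMnrUnts></CcyNtry>
    <CcyNtry><CtryNm>ANTARCTICA</CtryNm><CcyNm>No universal currency</CcyNm></CcyNtry>
  </CcyTbl>
</ISO_4217>"""


def normalize(xml_text):
    """Return (currencies, links) from the denormalized country/currency list."""
    root = ET.fromstring(xml_text)
    currencies = {}  # code -> (name, numeric code, minor units): one row per currency
    links = []       # (country name, currency code): the country/currency link table
    for entry in root.iter("CcyNtry"):
        code = entry.findtext("Ccy")
        if code is None:
            continue  # entries such as ANTARCTICA carry no currency code
        currencies[code] = (
            entry.findtext("CcyNm"),
            entry.findtext("CcyNbr"),
            entry.findtext("CcyMnrUnts"),
        )
        links.append((entry.findtext("CtryNm"), code))
    return currencies, links


currencies, links = normalize(SAMPLE)
print(currencies)
print(links)
```

With this split, USD appears once in the currency table however many countries use it.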

CO2 emissions (by country)

Time series of CO2 emissions (globally and by country).

Time range: as long as possible and as up to date as possible

Datasets

I think we have multiple:

Sources

http://cdiac.esd.ornl.gov/

Data Should Look Like

Global:

Year, Emissions, ... (could have other columns for a more fine-grained breakdown)

Country:

Year, Country, Emissions, Per Capita Emissions
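A minimal sketch of producing the per-country layout, assuming the emissions totals and population figures come from separate sources keyed by (year, country). The per-capita column is simply emissions divided by population. The country codes `AA`/`BB` and all numbers are hypothetical placeholders.

```python
import csv
import io


def country_rows(emissions, population):
    """Join on (year, country) and add a per-capita column.

    emissions and population are dicts keyed by (year, country); units are
    whatever the source uses. Per-capita is left blank when population is unknown.
    """
    rows = []
    for (year, country), total in sorted(emissions.items()):
        pop = population.get((year, country))
        per_capita = total / pop if pop else None
        rows.append((year, country, total, per_capita))
    return rows


def to_csv(rows):
    """Serialise rows with the header proposed above."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Year", "Country", "Emissions", "Per Capita Emissions"])
    writer.writerows(rows)  # None is written as an empty field
    return buf.getvalue()


# Made-up figures for two hypothetical countries.
emissions = {(2010, "AA"): 100.0, (2010, "BB"): 30.0}
population = {(2010, "AA"): 50}
print(to_csv(country_rows(emissions, population)))
```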

[meta] Naming Conventions

Establish various naming conventions both for datasets / repos and also for files.

Datasets

For country specific datasets:

{topic}                 # e.g. gdp
{topic}-{2-letter-iso}  # e.g. gdp-us
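A small helper sketch for the dataset naming convention. It assumes names are lowercase words joined by hyphens and that the country suffix is an ISO 3166-1 alpha-2 code, as in the `gdp-us` example. The function name `dataset_name` is hypothetical.

```python
import re

_TOPIC = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
_ISO2 = re.compile(r"[a-z]{2}")


def dataset_name(topic, iso2=None):
    """Return "{topic}" or "{topic}-{iso2}", e.g. "gdp" or "gdp-us".

    Assumes lowercase, hyphen-separated names and an ISO 3166-1 alpha-2
    country code.
    """
    topic = topic.strip().lower()
    if not _TOPIC.fullmatch(topic):
        raise ValueError(f"topic should be lowercase words joined by hyphens: {topic!r}")
    if iso2 is None:
        return topic
    code = iso2.strip().lower()
    if not _ISO2.fullmatch(code):
        raise ValueError(f"expected a two-letter country code: {iso2!r}")
    return f"{topic}-{code}"


print(dataset_name("gdp"), dataset_name("GDP", "US"))
```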

For Data Files

Temporal granularity

[...-]year.csv
[...-]quarter.csv
[...-]month.csv
[...-]day.csv
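A matching sketch for data file names, where the prefix before the granularity is optional. The function name `data_file_name` is hypothetical. Note that the oil discussion also mentions weekly data; `week` is not in the list above, so this sketch rejects it until the convention is extended.

```python
GRANULARITIES = ("year", "quarter", "month", "day")


def data_file_name(granularity, prefix=None):
    """Return "[prefix-]{granularity}.csv", e.g. "year.csv" or "wti-day.csv"."""
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity {granularity!r}; expected one of {GRANULARITIES}")
    return f"{prefix}-{granularity}.csv" if prefix else f"{granularity}.csv"


print(data_file_name("year"), data_file_name("day", prefix="wti"))
```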

For README

Intro summary paragraph

Headings (all h2)

  • Data - about the data
  • Wrangling - how we had to process the data (maybe we should call it Processing)
  • License - about the license
