The awesome section presents collections of high quality datasets organized by topic.
The home page for the awesome collections is located in the frontend repo and should be modified there. See the live page here:
Curated list of quality open datasets
Home Page: https://datahub.io/collections
This should probably be broken up more:
Monthly prices for a wide range of commodities from the IMF
Granularity
Source data series would be one or more of:
Original in this Google spreadsheet
Please add new suggestions as a new issue in this issue tracker.
The WHO maintains a listing of known diseases at http://www.who.int/classifications/icd/en/, but the data download is only available upon registration and under a non-commercial (NC) license. Is there an open version of this somewhere?
Does this merit inclusion?
http://www.crunchbase.com/ - stats as of Aug 2013
Where to get bulk ...
CC-BY according to http://info.crunchbase.com/docs/licensing-policy/, with a bunch of specific attribution requirements
US EIA has a variety of prices: https://www.eia.gov/dnav/pet/pet_pri_spt_s1_d.htm (the US EIA is great: its data is high quality and, as a US federal government source, public domain)
There are various types of oil for which we could get prices:
I propose we store:
For granularity I'd say it is worth storing all of daily, weekly, monthly and annual but prioritise daily. (note naming conventions: http://data.okfn.org/doc/publish-faq#data-package-name)
Question: Do this as one data package or one data package per oil type? (And if one data package, do we store Brent and WTI in the same file or in separate files? Answer: separate files.)
All in one:
Separate:
My instinct here is all in one, so the data package will look like:
data/wti-day.csv
data/wti-year.csv
data/wti-month.csv
# etc
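Since the proposal is to prioritise daily data, the coarser files in the layout above could in principle be derived from the daily series. The sketch below assumes each daily row is an (ISO date, price) pair and that a month's or year's value is the plain mean of its daily prices; that averaging rule is an assumption, and the source's own published monthly/annual figures should win where they exist. Weekly aggregation (ISO weeks) is not covered.

```python
from collections import defaultdict
from statistics import mean

# Length of the ISO-8601 date prefix (YYYY-MM-DD) that identifies each period.
PERIOD_PREFIX = {"month": 7, "year": 4}


def aggregate(daily_rows, period):
    """Collapse daily (date, price) rows into monthly or annual averages.

    Assumption: a period's value is the arithmetic mean of its daily prices.
    Returns (period_key, price) tuples in chronological order.
    """
    prefix_len = PERIOD_PREFIX[period]
    buckets = defaultdict(list)
    for date, price in daily_rows:
        buckets[date[:prefix_len]].append(price)
    # Sorting the ISO keys gives chronological order.
    return [(key, round(mean(prices), 4)) for key, prices in sorted(buckets.items())]


if __name__ == "__main__":
    # Made-up sample values, not real WTI prices.
    sample = [("2013-01-02", 10.0), ("2013-01-03", 20.0), ("2013-02-01", 30.0)]
    print(aggregate(sample, "month"))  # rows for data/wti-month.csv
    print(aggregate(sample, "year"))   # rows for data/wti-year.csv
```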
List of mimetypes / mediatypes / file formats.
There is no open repository of contact details for health insurers in Germany apart from one PDF listing URLs. With the help of web scraping we have compiled a complete list with email, address and telephone number. This should be helpful for healthcare system researchers trying to access policies or data from all insurers. There are 137 of them! Does this belong in the registry?
This would be country polygons at the crudest scale (e.g. 1:110m). I suggest packaging Natural Earth data (public domain etc.).
package name: geo-boundaries-world-110m
Long-term: the best way would be to get the core Natural Earth folks to add in "packaging"; they are already on GitHub, see https://github.com/nvkelso/natural-earth-vector. But we need an exemplar ...
What format should we use?
/cc @jalbertbowden @amercader - thoughts here very welcome :-)
http://www.euribor-rates.eu/euribor-rates-by-year.asp
We probably don't need all 15 rates they used to have and which they are now reducing:
Until November 1st 2013 Euribor-EBF published 15 Euribor rates (1-3 weeks and 1-12 months) daily (working days only). As of November 1st 2013 the number of Euribor rates is reduced to 8 (1-2 weeks, 1, 2, 3, 6, 9 and 12 months). This adjustment is a consequence of the problems which arose over the last couple of years when determining the Euribor rates. An EBA/ESMA report published in January 2013 recommends calculating and publishing only those Euribor rates which are used by banks on a frequent basis. The rationale is that it is easier to calculate a reliable rate if there are many transactions for a specific rate (maturity).
I suggest we record the following rates at monthly intervals (which is what you get from historical data)
Though it may turn out that getting all 8 is the same effort, in which case we may as well.
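If the historical data comes with all 15 maturities, keeping only the 8 that are still published is a simple filter. This is a sketch only: the tenor labels ("1w", "3m", ...) and the (date, tenor, rate) row shape are hypothetical, not the format used by euribor-rates.eu.

```python
# Hypothetical tenor labels ("1w" = 1 week, "3m" = 3 months); the real
# source files may label maturities differently.
# 15 rates until 1 Nov 2013: 1-3 weeks plus 1-12 months.
OLD_TENORS = [f"{n}w" for n in range(1, 4)] + [f"{n}m" for n in range(1, 13)]
# 8 rates from 1 Nov 2013: 1-2 weeks plus 1, 2, 3, 6, 9 and 12 months.
CURRENT_TENORS = ["1w", "2w", "1m", "2m", "3m", "6m", "9m", "12m"]


def keep_current(rows):
    """Keep only (date, tenor, rate) rows whose tenor is still published."""
    current = set(CURRENT_TENORS)
    return [row for row in rows if row[1] in current]


if __name__ == "__main__":
    print(len(OLD_TENORS), len(CURRENT_TENORS))  # 15 8
```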
The current bus network of the Dakar Dem Dikk public transport service.
http://data.un.org/Data.aspx?d=POP&f=tableCode:240
Download is via some JavaScript-y thing, but some analysis with browser dev tools reveals the source as:
A complete list of all NYSE stock symbols (plus company name).
TODO: work out what symbol list(s) we want.
Note that EDGAR also has a symbol list: http://okfnlabs.org/blog/2014/03/04/sec-edgar-database.html, plus see the Bloomberg list in #25
Where can we get CO2 price and emission trading scheme info? Which regions run emissions trading schemes?
Data about the EU emission trading system (ETS). The EU ETS data viewer provides aggregated data on emissions and allowances, by country, sector and year. The data mainly comes from the EU Transaction Log (EUTL). Additional information on auctioning and scope corrections is included.
See http://www.unece.org/cefact/locode/service/location.html - looks like we would have to scrape (and not sure what the license is ...)
See also #30 (city population time series): this provides a nice CSV file, so maybe we can extract from that ...
Does this merit being included as a reference dataset?
Some initial work in this project including a DB: https://github.com/okfn/publicbodies.org
If you are interested in getting involved and helping out creating and maintaining datasets, then just add your GitHub username in a comment below, plus any relevant info on skills / interests
Hi, can I have access to add this data package https://github.com/aliounedia/senegal-companies to the registry?
Official ISO registrar http://swiftref.swift.com/
There's a list here: http://www.sec.gov/edgar/NYU/cik.coleft.c
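Turning that list into a name-to-CIK lookup could look like the sketch below. It assumes each line has the form NAME:0000000000: (company name, CIK, trailing colon); that layout is an assumption about the file, so check it against an actual download. The example lines are made up.

```python
def parse_cik_list(lines):
    """Map company names to CIKs from lines assumed to look like 'NAME:0000000000:'.

    Lines that do not match the assumed layout are skipped.
    If a name repeats, the last CIK seen wins.
    """
    mapping = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Split from the right so any colons inside the company name survive.
        parts = line.rsplit(":", 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue
        name, cik, _ = parts
        mapping[name] = cik
    return mapping


if __name__ == "__main__":
    # Made-up example lines, not real EDGAR entries.
    sample = ["EXAMPLE CORP:0000000001:", "A:B HOLDINGS:0000000002:", "not a record"]
    print(parse_cik_list(sample))
```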
It would also be nice to have a ticker-to-CIK mapping (which EDGAR must have, as they use it in their search).
To do this you probably need to search by ticker on EDGAR's standard search and request Atom output, e.g.
Then parse the Atom feed to grab the CIK. (If you prefer HTML output, just omit output=atom.)
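The parsing step could be sketched with the standard library's ElementTree. This assumes the feed carries the CIK in an element whose local name is cik (for example inside a company-info block); that is an assumption about EDGAR's Atom output, so verify it against a real response. Fetching the feed is left out, and the sample feed is made up.

```python
import xml.etree.ElementTree as ET


def find_cik(atom_xml):
    """Return the text of the first element named 'cik' (any namespace), or None."""
    root = ET.fromstring(atom_xml)
    for elem in root.iter():
        # Namespaced tags look like '{http://www.w3.org/2005/Atom}cik';
        # compare only the local name after the closing brace.
        if elem.tag.rsplit("}", 1)[-1] == "cik" and elem.text:
            return elem.text.strip()
    return None


if __name__ == "__main__":
    # Made-up feed fragment for illustration only.
    sample = (
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        "<company-info><cik>0000000001</cik></company-info>"
        "</feed>"
    )
    print(find_cik(sample))
```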
Similar to #38 (country boundaries)
Name: geo-boundaries-us-10m
http://www.jodidata.org/database/access-database.aspx
Not quite sure what is in there, but it seems to be oil reserves etc.
http://www.iana.org/assignments/media-types
http://svn.apache.org/viewvc/httpd/httpd/branches/2.2.x/docs/conf/mime.types?view=annotate
Suggested Schema
{
  id: # mimetype identifier
  fileextensions: # space-separated list (?)
  link: # link to authoritative mimetype?
}
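One way to fill the suggested schema is to parse the Apache mime.types file linked above, where each non-comment line is a media type followed by zero or more extensions. This is a sketch: the link field uses an assumed IANA URL pattern (base plus the type name), which will not resolve for unregistered types such as x- types.

```python
# Assumed URL pattern for IANA media type pages; not valid for unregistered types.
IANA_BASE = "https://www.iana.org/assignments/media-types/"


def parse_mime_types(text):
    """Turn mime.types-style text into records matching the suggested schema."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments (including commented-out types).
        if not line or line.startswith("#"):
            continue
        mimetype, *extensions = line.split()
        records.append({
            "id": mimetype,
            "fileextensions": " ".join(extensions),  # space-separated, per the schema
            "link": IANA_BASE + mimetype,
        })
    return records


if __name__ == "__main__":
    sample = "# comment\napplication/json\t\t\tjson\n\ntext/html html htm\n"
    for record in parse_mime_types(sample):
        print(record)
```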
I had a Python script kicking around for fetching up-to-date country code standards and putting them all together.
I love the work you are doing on dataprotocols.org so I reorganized it as a datapackage.
This probably duplicates some of the data already included in the registry, so feel free to ignore.
The links need to be updated (coincidentally, I commented on this on the datahub at http://datahub.io/dataset/iso-4217-currency-codes a couple of hours ago).
This table is not really currency codes; it's country/currency codes, so it is denormalized and USD appears in several places as a result. The table is misnamed and less useful because of this.
Oddly too, the reference to a country is by name not by ISO 3166 code. Do you have a policy around linking/foreign keys?
Of course, some folk would use the XML 'package' directly http://www.currency-iso.org/dam/downloads/dl_iso_table_a1.xml :)
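The denormalization and foreign-key points above can be illustrated by splitting the table into a currency table (one entry per code) and a country-to-currency link table keyed by ISO 3166 code. This is a minimal sketch: the (country name, currency code, currency name) row layout and the name-to-code lookup passed in are hypothetical.

```python
def normalize(rows, country_codes):
    """Split denormalized (country name, currency code, currency name) rows.

    country_codes: hypothetical mapping from country name to ISO 3166 alpha-2 code.
    Returns (currencies, links): a code -> name dict with each currency once,
    and (country code, currency code) pairs; the country code is None when
    the name has no entry in the lookup.
    """
    currencies = {}
    links = []
    for country_name, currency_code, currency_name in rows:
        currencies.setdefault(currency_code, currency_name)  # store each currency once
        links.append((country_codes.get(country_name), currency_code))
    return currencies, links


if __name__ == "__main__":
    rows = [
        ("UNITED STATES", "USD", "US Dollar"),
        ("ECUADOR", "USD", "US Dollar"),
        ("FRANCE", "EUR", "Euro"),
    ]
    codes = {"UNITED STATES": "US", "ECUADOR": "EC", "FRANCE": "FR"}
    print(normalize(rows, codes))
```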
Standard chart of accounts for government
https://github.com/datasets/crime-uk
Does this merit inclusion?
Time series of CO2 emissions (globally and by country).
Time range: as long as possible and as up to date as possible
I think we have multiple:
co2-{geo}, where {geo} is one of global | national
co2-fossil-{geo}, where {geo} is one of global | national | regional
co2-fossil-gridded
- http://cdiac.esd.ornl.gov/epubs/ndp/ndp058/ndp058_v2013.html
Global:
Year, Emissions, ... (could have other columns for a more fine-grained breakdown)
Country:
Year, Country, Emissions, Per Capita Emissions
Establish various naming conventions both for datasets / repos and also for files.
For country specific datasets:
{topic} # e.g. gdp
{topic}-{2-letter-iso} # e.g. gdp-us
Temporal granularity
[...-]year.csv
[...-]quarter.csv
[...-]month.csv
[...-]day.csv
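The naming conventions above can be captured in two small helpers. This is a sketch: lower-casing is inferred from the gdp-us example rather than stated, and the helper names are hypothetical.

```python
GRANULARITIES = ("year", "quarter", "month", "day")


def dataset_name(topic, country=None):
    """Build {topic} or {topic}-{iso}, e.g. 'gdp' or 'gdp-us'."""
    if country is None:
        return topic.lower()
    if len(country) != 2 or not country.isalpha():
        raise ValueError("expected a 2-letter ISO 3166 country code")
    return f"{topic.lower()}-{country.lower()}"


def data_file_name(granularity, prefix=None):
    """Build [prefix-]{granularity}.csv, e.g. 'year.csv' or 'wti-day.csv'."""
    if granularity not in GRANULARITIES:
        raise ValueError(f"granularity must be one of {GRANULARITIES}")
    return f"{prefix}-{granularity}.csv" if prefix else f"{granularity}.csv"


if __name__ == "__main__":
    print(dataset_name("gdp", "US"), data_file_name("day", "wti"))
```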
Intro summary paragraph
Headings (all h2)
See also http://openleis.com/ - seems to be a dump at http://openleis.com/legal_entities.json and http://openleis.com/legal_entities.xml (not sure about license)