lookup

A repository of lookup tables for journalists. Designed for programmatic access using tools such as agate-lookup.

Anyone may contribute a lookup table by sending a pull request to this repository.

Structure of files

Each folder is a key that can be used for a lookup. Within that folder are CSV files. The name of each CSV file is the name of the value that the key maps to. The CSV itself contains two columns, one with the key and another with the value. For example, usps/state.csv is a CSV file that looks like this:

usps,state
AL,Alabama
AK,Alaska
AZ,Arizona
...
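
The folder and file convention above can be sketched in Python. This is a minimal illustration using only the standard library. The helper names `table_path` and `load_lookup` are hypothetical and are not part of agate-lookup, and the sample text holds only the three rows shown in the excerpt.

```python
import csv
import io


def table_path(key, value):
    """Build the relative path of a lookup table: <key>/<value>.csv."""
    return f"{key}/{value}.csv"


def load_lookup(csv_text, key, value):
    """Parse two-column CSV text into a dict mapping key -> value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row[value] for row in reader}


# Rows copied from the usps/state.csv excerpt above (not the full table).
SAMPLE = "usps,state\nAL,Alabama\nAK,Alaska\nAZ,Arizona\n"

states = load_lookup(SAMPLE, "usps", "state")
print(table_path("usps", "state"))  # usps/state.csv
print(states["AK"])                 # Alaska
```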

Sometimes the mapping from a key to a value varies over time. For example, NAICS codes change every five years. In this case, a version specifier may be included in the filename. For example, naics/description.2007.csv is the 2007 version of the code mapping and naics/description.2012.csv is the 2012 version.
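
As a rough sketch of the versioned naming scheme, the two hypothetical helpers below build and split such filenames. They assume the version is the only period-separated part between the value name and the .csv extension, as in naics/description.2007.csv. Neither function is taken from agate-lookup.

```python
def versioned_table_path(key, value, version=None):
    """Return <key>/<value>.csv, or <key>/<value>.<version>.csv if versioned."""
    if version is None:
        return f"{key}/{value}.csv"
    return f"{key}/{value}.{version}.csv"


def parse_table_filename(filename):
    """Split 'description.2007.csv' into ('description', '2007').

    Unversioned names such as 'state.csv' yield ('state', None).
    """
    stem = filename[:-len(".csv")] if filename.endswith(".csv") else filename
    value, _, version = stem.partition(".")
    return value, (version or None)


print(versioned_table_path("naics", "description", "2007"))
print(parse_table_filename("description.2007.csv"))
```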

It may also be useful to map two keys to a single value. For example, you might want to look up population by state and year. In those cases, key folders can be nested and the CSV can contain more than one key column. For example, usps/year/population.csv is a CSV that looks like this:

usps,year,population
AL,2015,4858979
AL,2014,4846411
AL,2013,4830533
...
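
A multi-key table can be read into a dictionary keyed by tuples. The sketch below assumes the key columns come first and the value column last, with the value column named after the file (population), following the naming convention described earlier. `load_multi_key_lookup` is a hypothetical helper, and values are kept as strings because no Agate column types are applied here.

```python
import csv
import io


def load_multi_key_lookup(csv_text, n_keys):
    """Map a tuple of the first n_keys columns to the column that follows."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    return {tuple(row[:n_keys]): row[n_keys] for row in reader}


# Rows copied from the usps/year/population.csv excerpt above.
SAMPLE = (
    "usps,year,population\n"
    "AL,2015,4858979\n"
    "AL,2014,4846411\n"
    "AL,2013,4830533\n"
)

population = load_multi_key_lookup(SAMPLE, n_keys=2)
print(population[("AL", "2014")])  # 4846411
```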

Metadata format

Each CSV table must be accompanied by a YAML file. That file must have an identical filename, plus the .yml extension. For example, the table fips/state.csv must be accompanied by fips/state.csv.yml. This file should contain the following metadata:

data: A description of the data, including any notes necessary to use it correctly.
version: A description of the specific version of the data.
sources:
  - A list of sources for the data, such as "United States Census Bureau", including URLs whenever possible
contributors:
  - The name <and email> of anyone who has contributed to this table
columns:
  key_column_name: Agate column type, such as "Text" or "Number"
  value_column_name: Agate column type, such as "Text" or "Number"

See naics/description.2007.csv.yml for an example of a complete metadata file.
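
A contributor might check a metadata file along these lines. This is a sketch under several assumptions: the YAML has already been parsed into a plain dict by whatever YAML library you use, all five fields listed above are treated as expected, and the helper names are hypothetical. The metadata values in the example are placeholders, not the contents of any real file in this repository.

```python
EXPECTED_FIELDS = ("data", "version", "sources", "contributors", "columns")


def metadata_path(csv_path):
    """Metadata lives next to the table: same filename plus '.yml'."""
    return csv_path + ".yml"


def missing_metadata_fields(metadata):
    """Return the expected top-level fields absent from a parsed metadata dict."""
    return [field for field in EXPECTED_FIELDS if field not in metadata]


def undocumented_columns(metadata, csv_header):
    """Return CSV header names that have no entry under 'columns'."""
    documented = metadata.get("columns", {})
    return [name for name in csv_header if name not in documented]


# Placeholder metadata for illustration only.
example = {
    "data": "placeholder description",
    "version": "placeholder version",
    "sources": ["placeholder source"],
    "columns": {"fips": "Text", "state": "Text"},
}

print(metadata_path("fips/state.csv"))
print(missing_metadata_fields(example))
print(undocumented_columns(example, ["fips", "state"]))
```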

Rules for including data

Anyone may submit a pull request to add a table to this repository. However, the following rules guide the inclusion of any data:

  • The data must have journalistic value.
  • The data must be from an authoritative source.
  • The CSV must be in "standardized" CSV format. (Run through in2csv.)
  • All keys must be unique. (No split/combine crosswalks.)
  • All keys must be durable identifiers, not names.
  • All filenames and keys must use snake_case.
  • Periods must not be used in filenames or keys except as defined above.
  • Four digit years must be used everywhere.
  • Each CSV must be 250KB or less.
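
Some of these rules can be checked mechanically before opening a pull request. The sketch below covers three of them: snake_case names, unique keys, and the size limit. It is not an official validator for this repository. The function names are hypothetical, "250KB" is read as 250 * 1024 bytes, and the snake_case check is applied to folder and column names rather than to key values such as AL.

```python
import csv
import io
import re

SNAKE_CASE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")
MAX_BYTES = 250 * 1024  # assumption: "250KB" read as 250 * 1024 bytes


def is_snake_case(name):
    """True if name is lowercase words/digits joined by single underscores."""
    return SNAKE_CASE.fullmatch(name) is not None


def duplicate_keys(csv_text, n_keys=1):
    """Return key tuples that appear more than once in the key columns."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    seen, duplicates = set(), []
    for row in reader:
        key = tuple(row[:n_keys])
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates


def within_size_limit(csv_text):
    """True if the UTF-8 encoded CSV fits within the assumed size limit."""
    return len(csv_text.encode("utf-8")) <= MAX_BYTES


print(is_snake_case("state"))
print(duplicate_keys("usps,state\nAL,Alabama\nAL,Alabama\n"))
print(within_size_limit("usps,state\nAL,Alabama\n"))
```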

I found an error!

If people are going to rely on the tables in this dataset, then there must be a log of every error. If you find an error in any data, please send a pull request with a correction. That same pull request must also add an entry to ERRORS.md describing the precise nature of the error.
