
GTFS

Public transportation schedules and associated geographic information for South-East Queensland.

The data is a snapshot and not planned to be kept up-to-date. The main purpose of this repository is to develop a data package and schemas for this dataset.

Data

This data is the General Transit Feed Specification (GTFS) - South East Queensland dataset published by Transport and Main Roads, Queensland Government. It is licensed under Creative Commons Attribution and was sourced on 7 September 2016.

The data follows the GTFS specification and some of its extensions, which together define a common format for public transportation schedules and associated geographic information. The specification makes some files optional, and some columns within files optional. As a result, the datapackage.json file and schemas in this repository may not work for other GTFS feeds.

The data is made up of a number of files.

Each data file is defined by a schema. The schemas follow the json table schema specification.

These schemas will be combined into a datapackage.json file to fully describe the data collection. The datapackage.json file will follow the data package specification.
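As a rough illustration of how per-file schemas could be combined, the sketch below builds a minimal data package descriptor in Python. The stops.txt field names come from the GTFS specification, but the types, constraints, and the package name "gtfs-seq" are hypothetical and are not this repository's actual schemas.

```python
import json

# Hypothetical JSON Table Schema for stops.txt. The field names are GTFS
# fields; the types and constraints are illustrative assumptions.
stops_schema = {
    "fields": [
        {"name": "stop_id", "type": "string", "constraints": {"required": True, "unique": True}},
        {"name": "stop_name", "type": "string", "constraints": {"required": True}},
        {"name": "stop_lat", "type": "number", "constraints": {"minimum": -90, "maximum": 90}},
        {"name": "stop_lon", "type": "number", "constraints": {"minimum": -180, "maximum": 180}},
    ],
    "primaryKey": "stop_id",
}


def build_datapackage(name, resources):
    """Combine {file path: table schema} pairs into one data package descriptor."""
    return {
        "name": name,
        "resources": [
            {
                "name": path.rsplit(".", 1)[0],  # "stops.txt" -> "stops"
                "path": path,
                "format": "csv",
                "schema": schema,
            }
            for path, schema in resources.items()
        ],
    }


# "gtfs-seq" is a hypothetical package name.
package = build_datapackage("gtfs-seq", {"stops.txt": stops_schema})
print(json.dumps(package, indent=2))
```

Each additional GTFS file would add one entry to the resources mapping.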

Preparation

The data was downloaded, unzipped, and then uploaded to GitHub.

Two data files (shapes.txt and trips.txt) were too large to upload to GitHub, so they were truncated before uploading. The truncated files are adequate for testing valid data.
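The README does not say how the files were truncated. One way to do it while keeping the file valid CSV is to copy the header and the first N data rows, as in this sketch (the function name and row limit are assumptions).

```python
import csv
import io


def truncate_csv(source, dest, max_rows):
    """Copy the header row plus at most max_rows data rows from source to dest.

    source and dest are text file objects (opened with newline="").
    """
    reader = csv.reader(source)
    writer = csv.writer(dest, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to copy
        return
    writer.writerow(header)
    for count, row in enumerate(reader):
        if count >= max_rows:
            break
        writer.writerow(row)


# In-memory demonstration; with real files you would pass open file objects instead.
src = io.StringIO("trip_id,route_id\nT1,R1\nT2,R1\nT3,R2\n")
dst = io.StringIO()
truncate_csv(src, dst, 2)
print(dst.getvalue())
```

Copying whole rows through the csv module, rather than cutting the file at a byte offset, avoids leaving a partial final row or an unclosed quote.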

Tests

The focus of the tests is to ensure the schemas are correct. There are already GTFS data validation tools to test the data in more powerful ways than json table schemas allow.

The tests consist of invalid data, which is used to confirm that each schema detects every error (e.g. incorrect types and violated constraints).
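To show what "the schema detects the error" means for a single cell, here is a simplified checker for a subset of JSON Table Schema rules (type, required, minimum/maximum, pattern, enum). It is a hypothetical sketch, not Good Tables' implementation. The route_type descriptor is also an assumption, with its enum limited to the original GTFS route types 0-7.

```python
import re


def check_value(field, raw):
    """Return error messages for one CSV cell, using a simplified subset of
    JSON Table Schema rules: type, required, minimum/maximum, pattern, enum."""
    name = field["name"]
    constraints = field.get("constraints", {})
    if raw == "":
        # Treat an empty cell as a missing value.
        return [f"{name}: required value missing"] if constraints.get("required") else []
    ftype = field.get("type", "string")
    value = raw
    errors = []
    if ftype in ("integer", "number"):
        try:
            value = int(raw) if ftype == "integer" else float(raw)
        except ValueError:
            return [f"{name}: {raw!r} is not a valid {ftype}"]
        # Range checks only make sense for numeric values.
        if "minimum" in constraints and value < constraints["minimum"]:
            errors.append(f"{name}: {value!r} is below the minimum")
        if "maximum" in constraints and value > constraints["maximum"]:
            errors.append(f"{name}: {value!r} is above the maximum")
    if "pattern" in constraints and re.fullmatch(constraints["pattern"], raw) is None:
        errors.append(f"{name}: {raw!r} does not match the pattern")
    if "enum" in constraints and value not in constraints["enum"]:
        errors.append(f"{name}: {value!r} is not one of the allowed values")
    return errors


# Hypothetical field descriptor for routes.txt route_type (illustrative only).
ROUTE_TYPE = {
    "name": "route_type",
    "type": "integer",
    "constraints": {"required": True, "enum": list(range(8))},
}

# The valid value should pass; each invalid test value should produce an error.
for raw in ["3", "bus", "9", ""]:
    print(repr(raw), check_value(ROUTE_TYPE, raw))
```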

Results

The results can be verified using links to Good Tables. Tests include:

  • testing the valid data without a schema
  • testing the valid data with a schema
  • testing the invalid data with a schema

Good Tables doesn't check all types of errors (yet). Some things it does not check include:

  • Foreign keys. (See Good Tables #17, #8)
  • Some constraints (See Good Tables#55)
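Until Good Tables checks foreign keys, a simple check could be run separately. The sketch below reports stop_times.txt stop_id values with no matching stops.txt stop_id. The function name and the tiny sample tables are made up for illustration.

```python
import csv
import io


def missing_foreign_keys(child_rows, child_field, parent_rows, parent_field):
    """Return child values that have no matching key in the parent table."""
    parent_keys = {row[parent_field] for row in parent_rows}
    return [row[child_field] for row in child_rows if row[child_field] not in parent_keys]


# Made-up sample data: stop_id 3 in stop_times has no row in stops.
STOPS_CSV = "stop_id,stop_name\n1,Stop A\n2,Stop B\n"
STOP_TIMES_CSV = "trip_id,stop_id\nT1,1\nT1,3\n"

stops = csv.DictReader(io.StringIO(STOPS_CSV))
stop_times = csv.DictReader(io.StringIO(STOP_TIMES_CSV))
print(missing_foreign_keys(stop_times, "stop_id", stops, "stop_id"))
```

Building a set of parent keys first keeps the check to one pass over each file, which matters for large files such as stop_times.txt.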

Automatic Testing

The scripts and .travis.yml file are used to automatically test the data defined in datapackage.json. Any change to this repository triggers Travis to validate the data.

The result of the last automatic test is shown by the repository's "datapackage validation" status badge.

Schemas

The schemas were created using Data Packagist. Using Data Packagist:

  • add some basic information about the data file (name, description, license, etc.)
  • upload the data file

Data Packagist will create a datapackage.json file for you. Download this file.

Good Tables can only use a JSON Table Schema for validation (see goodtables-web #65). You can extract the table schema from the datapackage.json file: it is the {"fields": [...]} part. Save it as a separate file.
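The extraction step can be scripted. This sketch looks up a resource by name and returns its schema. The function name and the inline sample descriptor are assumptions, and it expects the schema to sit under each resource's "schema" key, as the data package specification describes.

```python
import json


def extract_schema(datapackage, resource_name):
    """Return the table schema ({"fields": [...]}) of the named resource."""
    for resource in datapackage["resources"]:
        if resource.get("name") == resource_name:
            return resource["schema"]
    raise KeyError(f"no resource named {resource_name!r}")


# Inline sample descriptor standing in for a downloaded datapackage.json.
SAMPLE = (
    '{"name": "example", "resources": [{"name": "stops", "path": "stops.txt", '
    '"schema": {"fields": [{"name": "stop_id", "type": "string"}]}}]}'
)
schema = extract_schema(json.loads(SAMPLE), "stops")
print(json.dumps(schema, indent=2))
```

With real files, you would load datapackage.json with json.load and write the result to its own file (for example a hypothetical stops.schema.json) with json.dump.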

Edit the schema file with a text editor (e.g. Atom, jsoneditoronline.org) to add constraints, refine types and formats, and so on. You may like to use the JSON Table Schema schema to improve your editing experience.

Some constraints use regular expressions to define a pattern. Use an online tool such as regexr.com or regex101 to help create and test a regular expression.
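Patterns can also be tested locally. The sketch below checks an illustrative pattern for GTFS arrival and departure times, which use H:MM:SS or HH:MM:SS and allow hours of 24 or more for trips running past midnight. The pattern itself is an assumption and not necessarily the one in this repository's schemas. It uses re.fullmatch so the whole value must match.

```python
import re

# Illustrative pattern for stop_times.txt arrival_time/departure_time:
# one or two hour digits (hours may exceed 23), then MM:SS with 00-59.
TIME_PATTERN = r"[0-9]{1,2}:[0-5][0-9]:[0-5][0-9]"


def matches(pattern, value):
    """True if the entire value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


for sample in ["08:15:00", "8:15:00", "25:10:00", "08:75:00", "0815"]:
    print(sample, matches(TIME_PATTERN, sample))
```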

View the Data Package

Data packages are about providing machine-readable metadata for your data. You can view a human-readable version of the data package metadata, data, and README files using the Data Package Viewer. The viewer has a couple of issues, including an incorrect link to the metadata (see data.okfn.org-new #9).

License

All items in this repository, apart from the data, are licensed under Creative Commons Attribution 4.0.


gtfs's Issues

Split test data into smaller sets

Good Tables only returns a limited number of errors, so the test data will need to be split into smaller sets.

Consider a naming standard for test types:

  • type and format
  • constraints
    • required
    • unique
    • minimum, maximum
    • minLength, maxLength
    • pattern
    • enum
  • missing values
  • primary and foreign keys, duplicates
  • structural errors
    • Undeclared header: the file does not state, in a machine-readable way, whether the CSV has a header row
    • Ragged rows: not every row in the file has the same number of columns
    • Blank rows: the file contains blank rows
    • Stray/unclosed quote: the file contains unclosed quotes
    • Whitespace: there is whitespace between commas and the double quotes around fields
    • Empty column name: not every column has a name
    • Duplicate column name: the column names are not unique
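Several of the structural errors listed above can be detected without a schema. This hypothetical sketch covers empty and duplicate column names, blank rows, and ragged rows. It does not detect stray quotes, because Python's csv module tolerates them by default. Row numbers count CSV records, with the header as row 1.

```python
import csv
import io


def structural_errors(text):
    """Report empty/duplicate column names, blank rows, and ragged rows."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    errors = []
    header = rows[0]
    seen = set()
    for col, name in enumerate(header, start=1):
        if name.strip() == "":
            errors.append(f"column {col}: empty column name")
        elif name in seen:
            errors.append(f"column {col}: duplicate column name {name!r}")
        seen.add(name)
    for row_no, row in enumerate(rows[1:], start=2):
        if not row or all(cell.strip() == "" for cell in row):
            errors.append(f"row {row_no}: blank row")
        elif len(row) != len(header):
            errors.append(f"row {row_no}: expected {len(header)} columns, found {len(row)}")
    return errors


# Made-up test file containing one example of each detected error.
SAMPLE = "stop_id,stop_name,stop_id,\n1,A,1,x\n\n2,B\n"
for error in structural_errors(SAMPLE):
    print(error)
```

Keeping each kind of error in its own small test file, as the issue suggests, would also keep the output below Good Tables' error limit.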
