
bdp2ckan's Introduction

BDP2CKAN

A proof-of-concept command-line script that submits a Budget Data Package to a CKAN instance. The CKAN instance must be able to store metadata extras. Ideally this would use the Python datapackage library, but that library currently doesn't support budget data packages (the work done here could be used to improve the datapackage library). Anyway, onwards to the important stuff.

Installation

As this is a proof of concept, it does not come with a nice Python setup. You have to install it manually, but it's not the end of the world; the steps are fairly standard. Create a virtualenv (recommended) and pip install the requirements.

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

Usage

To see how it works type:

python bdp2ckan.py --help

But it's very simple. You basically just provide a schema file (we provide it for you because we're super nice), the CKAN host, your API key for that CKAN host, the organization (id or name) to publish under, and a link to the datapackage.json descriptor file.

python bdp2ckan.py --schema schemas/0.3.0.json --host 'http://localhost:5000' --apikey <my-awesome-api-key> --organization <organization-id-or-name> https://raw.githubusercontent.com/os-data/boost-armenia/master/datapackage.json
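
For orientation, here is a rough sketch of the kind of request such a submission could boil down to. It is not taken from bdp2ckan.py: the function name build_package_request and the choice of which descriptor keys become extras are assumptions made for illustration. The only parts relied on are CKAN's package_create action, the Authorization header carrying the API key, and extras being a list of key/value pairs. The request is built but not sent.

```python
import json
import urllib.request


def build_package_request(host, apikey, organization, descriptor):
    """Build (without sending) a CKAN package_create request.

    Hypothetical helper: which descriptor keys end up as extras is an
    assumption, not the behaviour of bdp2ckan.py.
    """
    payload = {
        "name": descriptor["name"],
        "title": descriptor.get("title", descriptor["name"]),
        "owner_org": organization,
        # CKAN stores free-form metadata as a list of key/value extras;
        # values are serialised to JSON strings here.
        "extras": [
            {"key": key, "value": json.dumps(value)}
            for key, value in sorted(descriptor.items())
            if key not in ("name", "title", "resources")
        ],
    }
    return urllib.request.Request(
        host.rstrip("/") + "/api/3/action/package_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": apikey, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the actual submission against a running CKAN instance.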

License

bdp2ckan is available under the GNU General Public License, version 3. See LICENCE for more details.


bdp2ckan's Issues

Currencies not handled very well

The currency is a property of each measure. Right now the script loops over all measures, collects their currencies and submits them as metadata on the dataset level. Instead, the script should look at which resources use each currency and put the currency as metadata on the resource, not the dataset. That might not be straightforward either, because a single resource might have multiple currencies. We need to figure out whether this is something we need to worry about.
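
A minimal sketch of the per-resource grouping, assuming each measure is a dict with a currency key and a resource key naming the resource it belongs to. The resource key and the function name currencies_by_resource are assumptions made for illustration, not something confirmed by the specification or the script.

```python
from collections import defaultdict


def currencies_by_resource(measures, default_resource=None):
    """Group measure currencies by the resource each measure refers to.

    `measures` maps measure names to dicts. The "resource" key is a
    hypothetical way of linking a measure to a resource; measures
    without it are grouped under `default_resource`.
    """
    grouped = defaultdict(set)
    for measure in measures.values():
        currency = measure.get("currency")
        if currency:
            grouped[measure.get("resource", default_resource)].add(currency)
    # A resource may legitimately end up with more than one currency.
    return {resource: sorted(codes) for resource, codes in grouped.items()}
```

A resource that maps to more than one currency is exactly the case that still needs a decision.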

Schema always assumes source has an object value

The examples in the budget data package specification make it clear that the source can be defined with a string, not only with an object.

The schema can only evaluate:

{
    "year": {
        "source": "source field name"
    }
}

but not

{
    "year": "source field name"
}
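
One possible workaround outside the schema is to normalise the descriptor before validating it, so the string shorthand is expanded into the object form. The sketch below assumes the mapping is a plain dict of field names to either a string or an object; normalize_source is a hypothetical helper, not part of the script.

```python
def normalize_source(mapping):
    """Expand the string shorthand into the object form.

    {"year": "col"} becomes {"year": {"source": "col"}}; entries that
    are already objects are copied unchanged.
    """
    normalized = {}
    for field, value in mapping.items():
        if isinstance(value, str):
            normalized[field] = {"source": value}
        else:
            normalized[field] = dict(value)
    return normalized
```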

License identifiers

Licenses in CKAN are defined by a license.json file which may or may not be compatible with the licenses supported in Budget Data Package (the OD list). We need to figure out how to handle license identifiers properly.

Budget Data Package schema relies on common pattern for source

The budget data package schema assumes that the specification uses source in the mapping to say where the source column is, but that is only a common pattern used in budget data packages. It's not clear what the official way would be (although it's very likely to be source).

CKAN API keys

This is not an immediate issue for this command-line script, but it may be an issue for how the script will be used. The OpenSpending CKAN instance does not make it easy to figure out one's API key, and it may end up not using API keys at all but an external service instead. This can be made to work for a proof of concept but needs to be addressed soon.

Multiple licenses

CKAN does not support multiple licenses, whereas Budget Data Package does. We need to figure out how to properly support the licenses property.

The schema can't do "license" xor "licenses"

The data package specification (and budget data package as well) says that the descriptor can use either license OR licenses, but not both. This can't be captured in the schema, and it is really odd for the data package as well because data packages allow additional properties. So this is an example of what we are trying to capture:

You can have any additional properties EXCEPT license IF you already have licenses, OR licenses IF you already have license, but you don't have to have either of those because they are optional fields.
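
Until the schema can express this, the rule could be enforced with a small check in the script itself. This is only a sketch; check_license_fields is a hypothetical helper and is not part of bdp2ckan.py.

```python
def check_license_fields(descriptor):
    """Enforce "license" xor "licenses", with both being optional.

    Returns the name of the field in use ("license" or "licenses"),
    or None when neither is present. Raises ValueError when both are.
    """
    has_license = "license" in descriptor
    has_licenses = "licenses" in descriptor
    if has_license and has_licenses:
        raise ValueError("descriptor must not define both 'license' and 'licenses'")
    if has_license:
        return "license"
    if has_licenses:
        return "licenses"
    return None
```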

DimensionType is not properly defined

The Budget Data Package specification only mentions DimensionType in an example, not as part of the spec itself, and the comment in that example names "entity", "classification", "program" and "etc." as possible values. The schema obviously ignores "etc." and restricts itself to the other three values, which are clearly not the definitive list.

Having to use patternProperties in schema is messy

Because dimension and measure keys are unknown, the schema has to rely on patternProperties, which is very messy, especially with the mix between measures and dimensions, which needs a negative lookahead. It's a non-issue, but still very messy.
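
To illustrate the kind of pattern involved, here is a sketch of a negative lookahead that accepts any key except a reserved one. The reserved key "measures" and the names DIMENSION_KEY and split_keys are hypothetical and do not reproduce the pattern used in the actual schema. Python's re module is used only to show how such a pattern behaves.

```python
import re

# Hypothetical pattern: match every key except the exact key "measures".
# The (?!measures$) lookahead rejects that one name but still allows
# keys that merely start with it, such as "measures_total".
DIMENSION_KEY = re.compile(r"^(?!measures$).+$")


def split_keys(keys):
    """Split keys into those matching the pattern and the rest."""
    matching = [key for key in keys if DIMENSION_KEY.search(key)]
    reserved = [key for key in keys if not DIMENSION_KEY.search(key)]
    return matching, reserved
```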
