alphagov / backdrop

Storing and querying data for the Performance Platform

License: MIT License

Python 88.51% CSS 0.16% Shell 1.71% Gherkin 9.34% JavaScript 0.17% Makefile 0.11%
performance-platform

backdrop's Introduction

Backdrop

What is it?

Backdrop is a datastore built with Python and MongoDB. It is made up of two separately deployable APIs for reading and writing data over HTTP. The plan is to be able to gather data from a variety of sources and then aggregate and compare this data in useful ways.

  • Data is grouped into data sets.
  • Data is stored by posting JSON to the write API.
  • Certain types of data are identified by reserved keys, e.g. events are objects containing a timestamp.
  • Reserved keys start with an underscore, e.g. { "_timestamp": "2013-01-01T00:00:00Z" }
  • Data is retrieved using HTTP GET requests against the read API.
  • Data can be manipulated in a few useful ways with HTTP query strings, e.g. /$DATA_GROUP/$DATA_TYPE?period=month for monthly grouped data.
  • Backdrop is in constant development, so the best place to find examples and features is the feature tests.
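
As a rough illustration of the reserved-key convention, the Python sketch below builds an event record and encodes it as JSON, ready to be posted to the write API. The helper name make_event and the service and count fields are hypothetical; only the _timestamp key and its format come from the list above.

```python
import json
from datetime import datetime, timezone


def make_event(when, **fields):
    """Return a record dict with the reserved _timestamp key set.

    `when` must be a timezone-aware datetime; it is normalised to UTC.
    """
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    record = {"_timestamp": stamp}
    record.update(fields)  # hypothetical, caller-chosen data fields
    return record


# Example record; the field names other than _timestamp are made up.
event = make_event(datetime(2013, 1, 1, tzinfo=timezone.utc), service="example", count=3)
payload = json.dumps(event)
```

The feature tests remain the better reference for which payload shapes the write API actually accepts.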

Getting set up

This assumes you are using the Performance Platform development environment and so have Python and MongoDB installed.

  1. Check that you have virtualenv installed; if not, run sudo apt-get install python-virtualenv.
  2. If you don't have virtualenvwrapper installed, create a virtualenv using virtualenv venv and run source venv/bin/activate to enable it.
  3. Navigate to the top level backdrop directory and run ./run_tests.sh. This will create a new virtualenv (if virtualenvwrapper is installed), install all dependencies and run the tests.
  4. Run source venv/bin/activate to enable the virtualenv if you didn't do this in step 2.
  5. Copy backdrop/write/config/development_environment_sample.py to development_environment.py (if you want to) and edit as needed.

Starting the app

  1. ./run_development.sh will start backdrop read and write on ports 3038 and 3039 respectively
  2. Confirm you're up and running by requesting http://www.development.performance.service.gov.uk/_status

To start just the read or write applications:

  1. ./start-app.sh takes two arguments: app (read or write) and port
  2. ./start-app.sh read 3038 and/or ./start-app.sh write 3039

Testing

Run tests with ./run_tests.sh

Splinter tests are not run in Travis or Jenkins due to their instability.

Requesting data

Requests return a JSON object containing a data array.

GET /data/$DATA_GROUP/$DATA_TYPE will return an array of data. Each element is an object.

GET /data/$DATA_GROUP/$DATA_TYPE?collect=score&group_by=name will return an array. In this case, each element of the array is an object containing a name value, a score array with the scores for that name and a _count value with the number of scores.
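
To make the shape of that grouped response concrete, here is a small local Python sketch that builds the same structure from a list of records. It is an illustration only, not Backdrop's implementation: group_and_collect and the sample records are hypothetical, and the real API may order groups or type _count differently.

```python
def group_and_collect(records, group_by, collect):
    """Group records by one field and collect another field's values per group."""
    groups = {}
    for record in records:
        # dicts keep insertion order, so groups appear in first-seen order
        groups.setdefault(record[group_by], []).append(record[collect])
    return [
        {group_by: key, collect: values, "_count": len(values)}
        for key, values in groups.items()
    ]


# Hypothetical records, as they might be stored in a data set.
sample = [
    {"name": "Foo", "score": 1},
    {"name": "Bar", "score": 2},
    {"name": "Foo", "score": 3},
]
grouped = group_and_collect(sample, group_by="name", collect="score")
```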

GET /data/$DATA_GROUP/$DATA_TYPE?filter_by=name:Foo returns all elements with name equal to "Foo".

GET /data/$DATA_GROUP/$DATA_TYPE?filter_by_prefix=name:Foo returns all elements with name beginning with "Foo".

Other parameters:

  • start_at (YYYY-MM-DDTHH:MM:SS+HH:MM) and end_at (YYYY-MM-DDTHH:MM:SS+HH:MM)
  • period ("week", "month")
  • sort_by (FIELD:ascending)
  • limit (integer)
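
The parameters above can be combined into a single query string. The Python sketch below assembles such a URL with the standard library; build_read_url, the data group and data type names, and the localhost base URL (using the read port mentioned under "Starting the app") are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_read_url(base_url, data_group, data_type, **params):
    """Build a read API URL, percent-encoding the query parameters."""
    path = "{}/data/{}/{}".format(base_url.rstrip("/"), data_group, data_type)
    # Drop unset parameters and sort the rest so the output is deterministic.
    pairs = sorted((key, value) for key, value in params.items() if value is not None)
    query = urlencode(pairs)
    return path + ("?" + query if query else "")


url = build_read_url(
    "http://localhost:3038",  # assumed local read app
    "example-group",          # hypothetical data group
    "example-type",           # hypothetical data type
    period="week",
    filter_by="name:Foo",
    limit=10,
)
```

Note that urlencode escapes the colon in name:Foo as %3A, matching the encoded form used in request URLs elsewhere on this page.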

Useful tools

Sync data from environment

Copy data from an environment to the local Backdrop database (should be run on your host machine): bash tools/replicate-db.sh performance-mongo-1.integration

You may need to set up your SSH config correctly for this to work.

To sync to the govuk dev vm, you can pass govuk_dev as the second argument to this script:

bash tools/replicate-db.sh performance-mongo-1.integration govuk_dev

Emptying a dataset

To empty a dataset, get its token from Stagecraft. Then run the following curl command:

curl -X PUT -d "[]" https://{backdrop_url}/data/<data-group>/<data-type> -H 'Authorization: Bearer <token-from-stagecraft>' -H 'Content-Type: application/json'
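
For scripts, the same request can be prepared in Python with the standard library. This is a minimal sketch mirroring the curl command above; build_empty_request and the example host, names and token are placeholders, and the request is only constructed here, not sent.

```python
import urllib.request


def build_empty_request(backdrop_url, data_group, data_type, token):
    """Prepare a PUT of an empty JSON array, which empties the data set."""
    url = "{}/data/{}/{}".format(backdrop_url.rstrip("/"), data_group, data_type)
    return urllib.request.Request(
        url,
        data=b"[]",
        method="PUT",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )


# Placeholder values; substitute the real host and the token from Stagecraft.
request = build_empty_request(
    "https://backdrop.example", "example-group", "example-type", "example-token"
)
# To send it: urllib.request.urlopen(request)
```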

Remove single entry from dataset

curl -X DELETE https://{backdrop_url}/data/<data-group>/<data-type>/<data_set_id> -H 'Authorization: Bearer <token-from-stagecraft>'

Transformers

Transformers run as part of the backdrop-transformer-procfile-worker service.

sudo service backdrop-transformer-procfile-worker status

Triggering a transform manually

A transform occurs when data is written to Backdrop. The transform applies calculations to the data and writes the results to a second data set.

You may wish to trigger a transform manually if data is missing from an output data set.

Transforms are configured in Stagecraft via the API or the Django admin application.

  1. Log in to the Stagecraft Django admin application to obtain a bearer token for the source data set:

    a. Select 'Data sets' from the 'Datasets' section in the main menu.
    b. Search for the source data set.
    c. Make a note of the data group and data type for the data set you wish to transform.
    d. Click on the name of the data set.
    e. Copy the bearer token from the form field.

  2. Run the following command, replacing the fields in capitals:

    curl -H 'Authorization: Bearer <INSERT BEARER TOKEN HERE>' -H 'content-type: application/json' -d '{"_start_at": "2012-01-01T00:00:00Z", "_end_at": "2015-03-20T00:00:00Z"}' https://www.performance.service.gov.uk/data/<DATA GROUP>/<DATA TYPE>/transform
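
The command in step 2 can also be prepared from Python. The sketch below builds an equivalent request with the standard library without sending it; build_transform_request and the example group, type and token values are hypothetical, while the URL shape and the _start_at and _end_at keys are taken from the curl command.

```python
import json
import urllib.request


def build_transform_request(base_url, data_group, data_type, token, start_at, end_at):
    """Prepare a request that asks for a transform over a time range."""
    url = "{}/data/{}/{}/transform".format(base_url.rstrip("/"), data_group, data_type)
    body = json.dumps({"_start_at": start_at, "_end_at": end_at}).encode("utf-8")
    # Supplying data without an explicit method makes urllib use POST,
    # which is also what curl does when given -d.
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )


transform_request = build_transform_request(
    "https://www.performance.service.gov.uk",
    "example-group",   # hypothetical data group
    "example-type",    # hypothetical data type
    "example-token",   # placeholder for the bearer token from Stagecraft
    "2012-01-01T00:00:00Z",
    "2015-03-20T00:00:00Z",
)
# To send it: urllib.request.urlopen(transform_request)
```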
    

Celery worker

Backdrop uses Celery to run tasks on data after it has been written. These tasks can be found in backdrop/transformers/tasks/

To process these tasks, you must run the worker. This can be done with the following command:

celery worker -A backdrop.transformers.worker -l debug

Troubleshooting

The logs for RabbitMQ can be found in /var/log/rabbitmq/. If there are any problems running transforms, this should be the first place to look.

backdrop's People

Contributors

abersager, alexmuller, ambrozic, annapowellsmith, blairboy362, easternbloc, fawkesley, frabcus, gtrogers, guykoth, henrytk, jabley, jcbashdown, leelongmore, leenagupte, mattbostock, mattrco, maxfliri, nick-gravgaard, nickgravgaard, norm, pbadenski, phss, robyoung, roc, rossjones, timmow, tlwr, tombooth, yolinas

backdrop's Issues

What does this repo do?

I don't know what backdrop is or its role in the performance platform. Please could the Readme be updated with a short overview?

Support filtering based on numeric values

Given a request like this:

/data/data-group/data-type?flatten=true&collect=costs%3Asum&group_by=dept&start_at=2014-04-01T00%3A00%3A00Z&period=quarter&end_at=2014-07-01T00%3A00%3A00Z

Which returns something like this:

{
  "data": [
    {
      "_count": 1.0,
      "_end_at": "2014-07-01T00:00:00+00:00",
      "_start_at": "2014-04-01T00:00:00+00:00",
      "costs:sum": 765432,
      "dept": "bat"
    },
    {
      "_count": 1.0,
      "_end_at": "2014-07-01T00:00:00+00:00",
      "_start_at": "2014-04-01T00:00:00+00:00",
      "costs:sum": 0.0,
      "dept": "baz"
    },
    {
      "_count": 1.0,
      "_end_at": "2014-07-01T00:00:00+00:00",
      "_start_at": "2014-04-01T00:00:00+00:00",
      "costs:sum": 123456,
      "dept": "bar"
    },
    {
      "_count": 1.0,
      "_end_at": "2014-07-01T00:00:00+00:00",
      "_start_at": "2014-04-01T00:00:00+00:00",
      "costs:sum": 7896,
      "dept": "foo"
    }
  ]
}

It would be useful to apply a filter parameter of costs:sum > 0 to exclude the 0 value from the response.
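
In the meantime, a client can apply the same condition after fetching the response. The Python sketch below is one way to do that; filter_greater_than is a hypothetical helper, and the sample is a trimmed copy of the response above.

```python
def filter_greater_than(response, field, threshold):
    """Keep only the rows whose numeric `field` is strictly above `threshold`."""
    return [
        row for row in response["data"]
        if field in row and row[field] > threshold
    ]


# Trimmed copy of the example response, keeping only the relevant keys.
response = {
    "data": [
        {"costs:sum": 765432, "dept": "bat"},
        {"costs:sum": 0.0, "dept": "baz"},
        {"costs:sum": 123456, "dept": "bar"},
        {"costs:sum": 7896, "dept": "foo"},
    ]
}
non_zero = filter_greater_than(response, "costs:sum", 0)
```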

Casting dates backwards and forwards is like ping pong

When a user uploads an Excel file, we return the date as a string (parse_excel.py#27). When we then apply an Excel filter to the date, we parse the string to a date (evl_upload_filters.py#48).

If we really need to store in Mongo as a string rather than a date, could we do the conversion once (just before we store it)?

(Possibly a question for @robyoung)

Rotate log files

It's important that Backdrop (whilst running on PaaS) is able to rotate its log files so that it doesn't fill the available space and lock up the application.
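
One possible approach, sketched here with Python's standard logging module rather than taken from Backdrop's actual logging setup, is a size-based rotating handler. The function name, format string and default limits below are assumptions chosen for illustration.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_rotating_logger(name, path, max_bytes=10 * 1024 * 1024, backup_count=5):
    """Return a logger that rotates `path` once it reaches roughly max_bytes.

    At most `backup_count` old files (path.1, path.2, ...) are kept, so disk
    usage stays bounded at about (backup_count + 1) * max_bytes.
    """
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Whether file-based rotation suits a PaaS deployment depends on how that platform collects logs, so treat this as a starting point for the discussion.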

Missing requirements?

I had to add these to requirements.txt to get it to build:

amqp==1.4.6
anyjson==0.3.3
markupsafe==0.23

I would create a pull-request, but the versions were just guesses so I may not be correct.
