duct's Introduction

DUCT: Data Universal Conversion Tool

License: AGPLv3

DUCT is a Django application that lets users convert CSV files into a harmonised datastore modelled on the SDMX standard, with standardised output. It provides two APIs for interfacing with the data: one to convert and load data (PUT) and one to extract data into your bespoke UI (GET). DUCT uses the Django REST framework as its base API and, on top of it, provides GraphQL for connecting to your datastore for data modelling, data conversion, data integration and data interfacing.
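As an illustration of the kind of transformation DUCT performs, the sketch below maps CSV rows onto SDMX-style observations (a set of dimension key/values plus a value). The column names and helper are hypothetical; in DUCT the actual mapping is configured through the mapping API.

```python
import csv
import io

def csv_to_observations(csv_text, dimension_cols, value_col):
    """Convert CSV rows into SDMX-style observations:
    each observation is a dict of dimension key/values plus a numeric value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    observations = []
    for row in reader:
        observations.append({
            "dimensions": {col: row[col] for col in dimension_cols},
            "value": float(row[value_col]),
        })
    return observations

raw = "country,year,indicator,value\nNL,2017,new_infections,120\nKE,2017,new_infections,300\n"
obs = csv_to_observations(raw, ["country", "year", "indicator"], "value")
# obs[0] == {"dimensions": {"country": "NL", "year": "2017",
#            "indicator": "new_infections"}, "value": 120.0}
```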

DUCT was built as part of Zoom, a data platform for data-informed strategy in combating the AIDS epidemic, in cooperation with Aidsfonds, which works towards ending AIDS in a world where all people affected by HIV/AIDS have access to prevention, treatment, care and support, and HumanityX, which supports organisations in the peace, justice and humanitarian sectors in adopting digital innovations to increase their impact on society.

Auth0

DUCT uses Auth0 for authenticated access to certain endpoints and certain data.
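A minimal sketch of the kind of claim check a protected endpoint performs on a decoded Auth0 access token. Signature and expiry verification are omitted here; a real check must validate the token signature against Auth0's JWKS (e.g. with python-jose). The domain and identifier values are placeholders matching the AUTH0_DOMAIN and API_IDENTIFIER settings described in the setup.

```python
def claims_are_valid(claims, auth0_domain, api_identifier):
    """Check issuer and audience claims of a decoded Auth0 access token.
    NOTE: a real check must ALSO verify the token signature and expiry."""
    issuer_ok = claims.get("iss") == f"https://{auth0_domain}/"
    aud = claims.get("aud", [])
    if isinstance(aud, str):  # 'aud' may be a single string or a list
        aud = [aud]
    return issuer_ok and api_identifier in aud

claims = {"iss": "https://example.eu.auth0.com/", "aud": "https://duct.example/api"}
claims_are_valid(claims, "example.eu.auth0.com", "https://duct.example/api")  # True
```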

Requirements

Name Recommended version
Python 3.6.5
PostgreSQL 10.5
virtualenv 16.1
pip 8.1
python-dev --
python3.6-dev --
libpython-dev --
libpython3.6-dev --
rabbitmq 3.7
libsqlite3-dev --
tippecanoe 1.34.6
Supervisor (for deployment) 3.2
nginx (for deployment) 1.14
PostGIS See: installing PostGIS
Ubuntu (documentation covers Ubuntu only) 16.04
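Before installing, it can be useful to confirm the interpreter meets the recommended version from the table above. A small sketch (the helper is illustrative, not part of DUCT):

```python
import sys

RECOMMENDED_PYTHON = (3, 6, 5)  # from the requirements table above

def meets_recommendation(current=None, recommended=RECOMMENDED_PYTHON):
    """Compare (major, minor, micro) version tuples lexicographically."""
    current = current or sys.version_info[:3]
    return tuple(current) >= recommended

meets_recommendation((3, 6, 5))  # True
meets_recommendation((3, 5, 2))  # False
```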

Set up


  • git clone https://github.com/zimmerman-zimmerman/DUCT.git
  • cd DUCT

If you have Docker installed:

  • Create a file called 'docker_settings.py' in the folder 'DUCT/ZOOM/ZOOM' and add the variables below (note: the email settings are used to send a notification once data mapping is done):

      from ZOOM.settings import *
    
      DATABASES = {
          'default': {
              'ENGINE': 'django.contrib.gis.db.backends.postgis',
              'NAME': 'zoom',
              'USER': 'zoom',
              'PASSWORD': 'zoom',
              'HOST': 'db',
          },
      }
    
      # SEND EMAIL CONFIG
    
      EMAIL_HOST = 'your_email_host'
      EMAIL_PORT = 'your_email_host_port'
      EMAIL_HOST_USER = 'your_email_host_user'
      EMAIL_HOST_PASSWORD = 'your_email_host_password'
      EMAIL_USE_TLS = True
    
      # TASKS
    
      ZOOM_TASK_EMAIL_CONFIRMATION_ENABLE = True
      ZOOM_TASK_EMAIL_SENDER = 'your_email_sender'
      ZOOM_TASK_EMAIL_RECEIVER = 'your_default_email_receiver'
    
      # DOCKER RABBIT MQ
    
      CELERY_BROKER_URL = 'amqp://rabbitmq'
      CELERY_RESULT_BACKEND = 'amqp://rabbitmq'
    

then run:

docker-compose build
docker-compose up

or install manually:

  • sudo sh bin/setup/install_dependencies.sh

  • Run virtualenv env -p python3 to create a virtual environment

  • Run source env/bin/activate to activate the virtual environment

  • pip install -r ZOOM/requirements.txt

  • sudo sh bin/setup/sync_db.sh

  • sudo sh bin/setup/create_django_user.sh

  • cd ZOOM/scripts

  • If you want geolocations for Netherlands PC4 postcode areas in your DUCT, please download the PC4 geo json here and add it to your 'DUCT/ZOOM/geodata/data_backup' folder. Note: the setup_project script may take up to 30 minutes longer to finish if this file is present.

  • If you want geolocations for Netherlands PC6 postcode areas in your DUCT, please download the PC6 geo json here and add it to your 'DUCT/ZOOM/geodata/data_backup' folder. Note: the setup_project script may take a day or more to finish if this file is present.

  • ./setup_project.sh

  • cd ..

  • Create a file called 'local_settings.py' in the folder 'DUCT/ZOOM/ZOOM' and add the variables below (note: these are used to send an email notification once data mapping is done):

      # SEND EMAIL CONFIG
    
      EMAIL_HOST = 'your_email_host'
      EMAIL_PORT = 'your_email_host_port'
      EMAIL_HOST_USER = 'your_email_host_user'
      EMAIL_HOST_PASSWORD = 'your_email_host_password'
      EMAIL_USE_TLS = True
    
      # TASKS
    
      ZOOM_TASK_EMAIL_CONFIRMATION_ENABLE = True
      ZOOM_TASK_EMAIL_SENDER = 'your_email_sender'
      ZOOM_TASK_EMAIL_RECEIVER = 'your_default_email_receiver'
    
  • In your local_settings.py you can also set the variable 'POCESS_WORKER_AMOUNT' to the desired number of process workers. This is used when processing big geodata, i.e. data containing more than 40,000 data points. The right number depends very much on your machine: it should never exceed the number of cores, and using too many workers (say 20) can be slower than using fewer (say 4) because of per-process initiation overhead. The default is 2.

  • In the 'DUCT/ZOOM' folder create a file called '.env' and add these variables to it (mainly used for specific DUCT endpoints that can only be accessed by a user signed in via the Auth0 API):

     AUTH0_DOMAIN=your_auth_domain
     API_IDENTIFIER=your_auth_api_identifier
    
  • In the 'DUCT/ZOOM' folder create a folder called 'media' and inside it a folder called 'tmpfiles' (if these were not already created)

  • Start your rabbitmq service

  • python manage.py runserver

  • Reactivate your virtual environment if it was deactivated, then in the folder 'DUCT/ZOOM' run the Celery worker: celery -A ZOOM worker -l info

  • Reactivate your virtual environment if it was deactivated, then in the folder 'DUCT/ZOOM' run Celery beat: celery -A ZOOM beat -l info

...and visit 0.0.0.0:8000.

This will start a development environment (using Django's development server) for DUCT.

Note that when using Docker, the DUCT image will be pulled from Docker Hub rather than built locally.
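The process-worker trade-off mentioned in the setup steps above (POCESS_WORKER_AMOUNT) can be sketched with Python's multiprocessing: the data is split into one chunk per worker, and past a certain worker count the per-process startup overhead outweighs the gain. The chunking helper below is illustrative, not DUCT's actual implementation.

```python
from multiprocessing import Pool

def chunk(points, n_workers):
    """Split a list of data points into roughly equal chunks, one per worker."""
    size = -(-len(points) // n_workers)  # ceiling division
    return [points[i:i + size] for i in range(0, len(points), size)]

def process_chunk(points):
    # Stand-in for real geodata processing of one chunk.
    return sum(points)

if __name__ == "__main__":
    data = list(range(100))
    n_workers = 2  # matches the documented default
    with Pool(n_workers) as pool:
        results = pool.map(process_chunk, chunk(data, n_workers))
    print(sum(results))  # 4950
```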

Extra Info

  • Make sure your tippecanoe executable is at '/usr/local/bin/tippecanoe', OR, if it lives elsewhere, add the variable 'TIPPECANOE_DIR' to your local_settings.py pointing to the directory that contains the tippecanoe executable. For the recommended/default case this would be TIPPECANOE_DIR = '/usr/local/bin/'
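The lookup described above amounts to joining a configurable directory with the executable name; a minimal sketch (the helper is illustrative, TIPPECANOE_DIR is the settings variable named above):

```python
import os

DEFAULT_TIPPECANOE_DIR = "/usr/local/bin/"  # recommended location from above

def tippecanoe_path(settings_dir=None):
    """Resolve the tippecanoe executable from TIPPECANOE_DIR or the default."""
    directory = settings_dir or DEFAULT_TIPPECANOE_DIR
    return os.path.join(directory, "tippecanoe")

tippecanoe_path()               # '/usr/local/bin/tippecanoe'
tippecanoe_path("/opt/tiles/")  # '/opt/tiles/tippecanoe'
```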

Documentation


Clone the project

sudo apt-get install git
git clone https://github.com/zimmerman-zimmerman/DUCT.git;
cd DUCT;

Install dependencies

Install all the dependencies via the bin/setup/install_dependencies.sh script.

sudo sh bin/setup/install_dependencies.sh

Install a python virtual environment

sudo apt-get install python-pip;
sudo pip install virtualenvwrapper;
export WORKON_HOME=~/envs;
source /usr/local/bin/virtualenvwrapper.sh;
mkvirtualenv zoom;
workon zoom;

Install pip packages

cd ZOOM
pip install --upgrade pip;
pip install -r requirements.txt;

Configuration

Create a database

sudo -u postgres bash -c "psql -c \"CREATE USER zoom WITH PASSWORD 'zoom';\""
sudo -u postgres bash -c "psql -c \"ALTER ROLE zoom SUPERUSER;\""
sudo -u postgres bash -c "psql -c \"CREATE DATABASE zoom;\""

Migrate the database, create a superuser, and run the server (for production, we use nginx/gunicorn).

cd ZOOM/scripts
sh setup_project.sh
cd ../
python manage.py createsuperuser
python manage.py runserver

Optionally, you can add your own modifications to the Django configuration in a new file at ZOOM/local_settings.py.

Endpoints Overview

Rest endpoints

URL Code Loc
/api/indicators/ api.indicator.views.IndicatorList
/api/mapping/ api.mapping.views.MappingJob
/api/mapping/get_data api.mapping.views.get_data
/api/mapping/status api.mapping.views.MappingJobResult
/api/metadata/ api.metadata.views.FileListView
/api/metadata/pk/ api.metadata.views.FileDetailView
/api/metadata/sources/ api.metadata.views.FileSourceListView
/api/metadata/sources/pk/ api.metadata.views.FileSourceDetailView
/api/metadata/upload/ api.metadata.views.FileUploadView
/api/validate/ api.validate.views.Validate
/api/validate/check_file_valid/ api.validate.views.check_file_valid
/api/error-correction/ api.error_correction.views.ErrorCorrectionView
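With the development server from the setup above, the endpoints in this table resolve against the server's base URL. A small sketch (the base URL is assumed from the runserver step; the helper is illustrative):

```python
from urllib.parse import urljoin

BASE = "http://0.0.0.0:8000"  # Django development server from the setup above

ENDPOINTS = {
    "indicators": "/api/indicators/",
    "mapping": "/api/mapping/",
    "mapping_status": "/api/mapping/status",
    "metadata_upload": "/api/metadata/upload/",
    "validate": "/api/validate/",
}

def endpoint_url(name):
    """Build the full URL for a named REST endpoint from the table above."""
    return urljoin(BASE, ENDPOINTS[name])

endpoint_url("indicators")  # 'http://0.0.0.0:8000/api/indicators/'
```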

GraphQL

URL Code Loc
/graphql graphene_django.views.GraphQLView

Query Code Loc
allMappings gql.mapping.schema.Query
allIndicators gql.indicator.schema.Query
datapointsAggregation gql.indicator.schema.Query
fileSource gql.metadata.schema.Query
allFileSources gql.metadata.schema.Query
file gql.metadata.schema.Query
allFiles gql.metadata.schema.Query
country gql.geodata.schema.Query
allCountries gql.geodata.schema.Query
geolocation gql.geodata.schema.Query
allGeolocations gql.geodata.schema.Query

Mutation Code Loc
mapping gql.mapping.mutation.Mutation
indicator gql.indicator.mutation.Mutation
fileSource gql.metadata.mutation.Mutation
file gql.metadata.mutation.Mutation
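A GraphQL query against /graphql is an HTTP POST with a JSON body. The sketch below builds such a payload for the allIndicators query listed above; the selected fields ('id', 'name') are assumptions, not the confirmed schema — inspect the schema via the GraphQL endpoint for the real field names.

```python
import json

def graphql_payload(query, variables=None):
    """Build the JSON body for a POST to the /graphql endpoint."""
    return json.dumps({"query": query, "variables": variables or {}})

# 'allIndicators' comes from the query table above; the selected
# fields ('id', 'name') are assumptions, not the confirmed schema.
query = """
query {
  allIndicators {
    id
    name
  }
}
"""
body = graphql_payload(query)
```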

About the project


Can I contribute?


Yes please! We are mainly looking for coders to help on the project. If you are a coder, feel free to fork the repository and send us pull requests!

Running the tests


Django Rest API

The rest API endpoints can be tested by:

python manage.py test api.<Test Choice>

Below is an example of a test that can be run

python manage.py test api.mapping.tests.test_file_manual_mapping

GraphQL

The GraphQL endpoints can also be tested by:

python manage.py test gql.<Test Choice>

Below is an example of a test that can be run

python manage.py test gql.tests.test_mapping

duct's People

Contributors

bryanph, dependabot[bot], eimis, evilurge, hatimkh, kjod, luminhan-zz, martizs, stephanoshadjipetrou, taufik-hidayat, vincentvw


duct's Issues

Add geodata app from OIPA as separate app

It will be used in this application as look up lists for geographic data. In the first instance just for Country lookups.

Todo:
-Add the OIPA/geodata app to this application.
-Add RegionVocabulary as a model (in OIPA it's in a different app).
-Please create managements and make sure the libraries that are not important for this zoom csv mapper are not imported (the only to-do is removing the autocomplete_light.py file, I think).
-Add custom django-admin management tasks to run the scripts in geodata/importer. See OIPA/task_queue/tasks.py for what the tasks should do, and these docs for how to create them in case you never have: https://docs.djangoproject.com/en/1.10/howto/custom-management-commands/

IATI data for Scatterplot demo


As per our discussion:

We need to be able to select some IATI data we would like to add to the scatterplot demonstration.

Specifications

IATI Sectors selection

STD control including HIV/AIDS
Social mitigation of HIV/AIDS

Currency: USD

Then:

Grouped by:

Disbursement
Commitment
Expenditure
Incoming Fund

Then: grouped by year

For the comparison in the scatterplot, we would also need separate country data. In the X or Y selection dropdown we should only be able to select the following:

Disbursement on STD control including HIV/AIDS
Commitment on STD control including HIV/AIDS
Expenditure on STD control including HIV/AIDS
Incoming Fund on STD control including HIV/AIDS

Disbursement on Social mitigation of HIV/AIDS
Commitment on Social mitigation of HIV/AIDS
Expenditure on Social mitigation of HIV/AIDS
Incoming Fund on Social mitigation of HIV/AIDS

As per the JSON blob structure currently used in the demonstrator.
