rienafairefr / backend

This project forked from matchid-project/backend

Backend (Docker & API) for matchID project

Home Page: https://matchID-project.github.io

License: GNU Lesser General Public License v3.0

Python 88.00% Makefile 11.23% Dockerfile 0.77%

Introduction

This project aims to offer a backend to the matchID project.

Full documentation is available at https://matchid-project.github.io.

The main objective is to process one or more civil-status datasets and identify multiple matches (at least two!) of the same person.

The backend lets you cook a dataset with a recipe, producing a new dataset. The recipe can be cooked "live" or in the background, which makes live re-ranking (with or without machine learning) possible on top of Elasticsearch.

A recipe book for preparing names and birth locations, fuzzy matching and rescoring is included and can be fully customized for your use case.

It is designed API-first (no CLI!) and based on Flask RESTPlus. Scalability relies on single-server multiprocessing for the Pandas and scikit-learn Python part, and on the cloud scalability of Elasticsearch for large fuzzy-match use cases. It aims to match two datasets of tens of millions of records within a day on a 1U server. Further development is still needed for full cloud scalability.

For now, the code is considered to be in "beta" and still needs some refactoring and documentation to reach production readiness.

This package includes a single-page web application built with Vue to help you develop your use case (it is designed for a single user, not to serve a lab of data scientists), and a Docker configuration to speed up your use-case design.

Main use cases

  • live search and bulk search of identities in a dataset (takes advantage of Elasticsearch and offers the possibility of re-ranking)
  • find common identities between two datasets
  • deduplicate identities within one dataset

Running it

Automation (and thus documentation) is still a future goal for now.

First, clone the backend:

git clone https://github.com/matchID-project/backend

matchID uses make and Docker to accelerate installation of dependencies. You'll first have to install Docker and docker-compose.
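
Before starting, it can help to confirm that the tools mentioned above are on your PATH. The following is a minimal sketch and not part of matchID: `check_prereqs` is a hypothetical helper name, and it only checks that each command exists, not which version is installed.

```shell
# Report which of the given tools are missing from PATH.
# check_prereqs is a hypothetical helper, not shipped with matchID.
check_prereqs() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}

# Usage: check_prereqs make docker docker-compose
```

The function returns a non-zero status when something is missing, so it can guard a later command with `&&`.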

Now you just have to start matchID:

cd backend
make start

This should:

  • download the frontend
  • build it
  • start the backend
  • start elasticsearch (required)

Open your browser to check that everything works.

Because many components are started, your computer may run low on memory (if it has less than 8 GB). See the next section for how to run matchID anyway.

You can now add your own data, but we strongly recommend following the tutorial and downloading the sample use case:

make download-example

We now recommend that you follow the Tutorial.

Frequent running problems

stop matchID

make stop

supported components

The supported components are:

  • backend: the API and the engine
  • frontend: the single-page application in Vue.js used to develop recipes
  • elasticsearch: the famous search engine used for fuzzy matching
  • postgres: not required, but useful for lower-memory configurations
  • kibana: useful for elasticsearch data analysis

You can start all the components like this:

make start-all

Each component can be started or stopped on its own. For example, this stops postgres:

make postgres-stop

And this starts kibana:

make kibana

For example, make start is equivalent to make backend frontend elasticsearch.

check health of components

Each Docker component logs its actions to log/docker-component.log, so you can easily check the health of all components like this:

tail -f log/docker-*.log
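
For a one-shot snapshot instead of a continuous follow, the last line of each component log can be printed next to its file name. This is only a sketch: `last_log_lines` is a hypothetical helper, and it assumes the log files match the log/docker-*.log pattern used above.

```shell
# Print "<file name>: <last line>" for every log matching docker-*.log.
# last_log_lines is a hypothetical helper; the directory defaults to "log".
last_log_lines() {
  log_dir="${1:-log}"
  for f in "$log_dir"/docker-*.log; do
    # Skip the unexpanded pattern when no log file exists yet.
    [ -f "$f" ] || continue
    printf '%s: %s\n' "$(basename "$f")" "$(tail -n 1 "$f")"
  done
}

# Usage: last_log_lines log
```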

Nginx didn't launch

matchID uses port 80 by default. If another web service is already listening on that port, this may cause a conflict.

Just edit docker-components/docker-compose-run-frontend.yml and change the Docker ports entry "80:80" to "8080:80" to expose the frontend on port 8080 instead.

Then restart the frontend like this:

make frontend-stop frontend
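
The port edit described above can also be scripted. The sketch below assumes the mapping appears literally as "80:80" (with double quotes) in the compose file; `set_frontend_port` is a hypothetical helper name, not a matchID command.

```shell
# Rewrite the host side of the "80:80" port mapping in a compose file.
# set_frontend_port is a hypothetical helper; it assumes the mapping is
# written literally as "80:80", including the double quotes.
set_frontend_port() {
  compose_file="$1"
  new_port="$2"
  # -i.bak edits in place and keeps a backup; accepted by GNU and BSD sed.
  sed -i.bak "s/\"80:80\"/\"${new_port}:80\"/" "$compose_file"
}

# Usage, followed by the frontend restart shown above:
# set_frontend_port docker-components/docker-compose-run-frontend.yml 8080
```

The original file is kept next to the edited one with a .bak suffix, so the change is easy to revert.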

Clean all and retry

To clean everything:

make clean docker-clean

Development mode

If you want to contribute to the development, you can fork the repo on GitHub and launch dev mode (you may have to run make docker-clean first):

make start-dev

Contributors

benoitguigal, cristianpb, rhanka
