
ebi-metagenomics / genome-search


Microservice API for searching fragments against indexed genomes, using COBS. Provides the "search by gene" feature on MGnify's MAG catalogues.

Home Page: https://www.ebi.ac.uk/metagenomics

License: Apache License 2.0

Python 91.63% Dockerfile 8.37%
cobs genome-sequence-comparison hug-api metagenomics podman

genome-search's Introduction

COBS metagenomics genome search

Web and API built on top of COBS.

Purpose

Search gene fragments against MGnify’s Genome Catalogues (MAGs and Isolates).

Tech stack

Based on:

  • COBS
  • Hug as the HTTP API provider and CLI
  • Docker for containerisation

Both the API (for handling queries) and the CLI (for generating indices) run in a Docker container. This is mostly so that we can build COBS with a suitable version of CMake, and because the conda and pip packages for COBS do not contain all of the latest updates.

Usage

docker pull quay.io/microbiome-informatics/genome-search
docker run -it quay.io/microbiome-informatics/genome-search

Note that the Quay.io-built image will probably not run on macOS. You can, however, build it locally, as described below.

Dev setup

Requirements

You must have Docker, Podman, or Singularity installed, as well as Python 3.6+.

Install development tools (including pre-commit hooks to run Black code formatting).

pip install -r requirements-dev.txt
pre-commit install

Docker build

Build the Docker image

docker build -t mgnify-cobs-genome-search .

Docker run to use the CLI

Invoke the CLI to build an index (e.g. to rebuild the test fixtures index)

docker run -v "$(pwd)/tests/fixtures":/opt/local/data -it mgnify-cobs-genome-search -c index create /opt/local/data/catalogues/marine1.0 /opt/local/data/indices/marine1.0 --clobber True

And to run a search:

docker run -v "$(pwd)/tests/fixtures/indices":/opt/local/data -it --env COBS_CONFIG=config/local.yaml mgnify-cobs-genome-search -c search search --seq CATTTAACGCAACCTATGCAGTGTTTTTCTTCAATTAAGGCAAGTCGAGGCACT --catalogues_filter marine1.0
# 
# {'query': 'CATTTAACGCAACCTATGCAGTGTTTTTCTTCAATTAAGGCAAGTCGAGGCACT', 'threshold': 0.4, 'results': [{'genome': 'MGYG000296002', 'score': 24}]}
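The README does not say how the score and threshold relate. The sketch below works through one plausible reading, under two assumptions that are not stated in the source: that the score counts the query k-mers found in a genome, and that the index uses a k-mer size of 31 (the COBS default). Under those assumptions the 54-base query above contains 24 k-mers, so a score of 24 would mean every k-mer matched. The function names are hypothetical.

```python
def kmer_count(seq: str, k: int = 31) -> int:
    """Number of overlapping k-mers in seq (0 if seq is shorter than k)."""
    return max(len(seq) - k + 1, 0)


def matched_fraction(score: int, seq: str, k: int = 31) -> float:
    """Fraction of the query's k-mers that matched, assuming score is a k-mer count."""
    total = kmer_count(seq, k)
    return score / total if total else 0.0


def passes_threshold(score: int, seq: str, threshold: float, k: int = 31) -> bool:
    """True if the matched fraction reaches the threshold (assumed semantics)."""
    return kmer_count(seq, k) > 0 and matched_fraction(score, seq, k) >= threshold


# The query and result values from the search example above.
query = "CATTTAACGCAACCTATGCAGTGTTTTTCTTCAATTAAGGCAAGTCGAGGCACT"
print(kmer_count(query))                 # k-mers in the query when k = 31
print(matched_fraction(24, query))       # fraction matched for score = 24
print(passes_threshold(24, query, 0.4))  # compared against threshold = 0.4
```

If the index was built with a different k-mer size, pass that value as k; the arithmetic is otherwise unchanged.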

Docker run to serve the API

docker run -v "$(pwd)/tests/fixtures/indices":/opt/local/data -v "$(pwd)/config/local.yaml":/opt/local/config/local.yaml -p 8000:8000 -it --env COBS_CONFIG=config/local.yaml mgnify-cobs-genome-search

If you’re running something else on port 8000 (like the MGnify API), use -p 8001:8000 instead to expose the COBS API on port 8001 of your machine.

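Call the API with a request like the following (this example assumes the API is exposed on port 8001; use 8000 if you kept the default mapping):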

curl --location --request POST 'http://127.0.0.1:8001/search' \
--header 'Content-Type: application/json' \
--data-raw '{
    "seq": "CATTTAACGCAACCTATGCAGTGTTTTTCTTCAATTAAGGCAAGTCGAGGCACTATGTAT",
    "catalogues_filter": "marine1.0"
    }'
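The same request can be made from Python. This is a minimal sketch using only the standard library; it assumes the API is exposed on port 8001 as in the curl example and that the response body is JSON, which the README does not confirm. The function names and the base_url default are illustrative, not part of the service.

```python
import json
import urllib.request


def build_search_request(seq, catalogues_filter, base_url="http://127.0.0.1:8001"):
    """Build a POST request for /search with the same JSON body as the curl call."""
    payload = json.dumps({"seq": seq, "catalogues_filter": catalogues_filter})
    return urllib.request.Request(
        url=f"{base_url}/search",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(seq, catalogues_filter, base_url="http://127.0.0.1:8001", timeout=30.0):
    """Send the request and decode the response, assuming it is JSON."""
    request = build_search_request(seq, catalogues_filter, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, search("CATTTAACGCAACCTATGCAGTGTTTTTCTTCAATTAAGGCAAGTCGAGGCACTATGTAT", "marine1.0") sends the same payload as the curl command.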

Running tests

There is a small test suite which runs inside the Docker container. A separate tests/Dockerfile exists for this purpose.

docker build -t mgnify-cobs-genome-search-tests -f tests/Dockerfile .
docker run -t mgnify-cobs-genome-search-tests

During development or debugging, it is usually convenient to bind-mount the src/ directory into the Docker container, e.g. by adding -v "$(pwd)/src/":"/usr/src/app/src" to any of the above docker run commands. This means you do not need to rebuild the Docker image every time you change a source file.

Running in production

This service can be deployed on a webserver using Nginx, Podman, and certbot. E.g. to set up an Ubuntu 20 VM on Embassy:

# Check out the repo using git or the gh cli

# Create a config at /home/ubuntu/cobs/cobs.yaml

# Use podman to run the container
sudo apt update
sudo apt install podman
sudo podman pull quay.io/microbiome-informatics/genome-search:cobs
sudo podman run -e COBS_CONFIG=/home/ubuntu/cobs/cobs.yaml --mount type=bind,source=/home/ubuntu/cobs/,destination=/home/ubuntu/cobs -p 8000:8000 --name cobs --detach quay.io/microbiome-informatics/genome-search:cobs
#  The reason you sudo these commands is that the root user owns a different set of containers to ubuntu.
#  If you pull the image as ubuntu, it won't update the image used by root (systemd) 

# Generate a systemd service to keep podman up
# (run with sudo, because the container above was created by root)
sudo podman generate systemd --new --name cobs > cobs.service
sudo cp cobs.service /etc/systemd/system/
sudo systemctl enable cobs
sudo systemctl start cobs

# Set up nginx
sudo apt install nginx

# Set up certbot for SSL
sudo snap install core; sudo snap refresh core
sudo snap install --classic certbot
sudo ln -s /snap/bin/certbot /usr/bin/certbot
sudo certbot --nginx -d cobs-genome-search-01.mgnify.org

# Set up the nginx conf
sudo cp /home/ubuntu/this/repo/path/webserver_configs/nginx.conf /etc/nginx/sites-enabled/default

# Start nginx
sudo service nginx start

A health-check command might be needed, because the web service sometimes terminates inside the container, for a variety of reasons. To enable one, add --healthcheck-command 'CMD-SHELL curl http://localhost:8000/search || exit 1' --healthcheck-interval=10s to the podman run ... command.
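For checking the service from outside the container (e.g. from a monitoring script), the same probe can be sketched in Python. Like curl without --fail, this hypothetical is_healthy helper treats any HTTP response, including an error status, as proof that the server process is up, and reports failure only when no response arrives. It bypasses any proxy settings in the environment, on the assumption that the check targets a local address.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server at url answers at all, False if it cannot be reached."""
    # An empty ProxyHandler ignores http_proxy/https_proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with a 4xx/5xx status, so the process is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False
```

A call such as is_healthy("http://localhost:8000/search") mirrors the curl probe above; a stricter check could treat HTTP error statuses as unhealthy instead.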

Certificate renewal

If certbot fails to automatically renew the SSL certificate (e.g. because HTTP ingress is limited to a certain CIDR), you can briefly open the firewall and then run: sudo certbot renew --cert-name cobs-genome-search-01.mgnify.org

genome-search's People

Contributors

mberacochea, sandyrogers
