mg-rast / mg-rast


The MG-RAST Backend -- the API server

Home Page: http://www.mg-rast.org

License: BSD 2-Clause "Simplified" License

Makefile 0.30% Perl 74.46% Shell 0.14% JavaScript 19.24% Python 2.99% CSS 1.21% PLpgSQL 0.26% R 1.38% Dockerfile 0.04%

metagenomics metagenomic-analysis metagenome-statistics

mg-rast's Introduction

MG-RAST source code

This is the repository for the MG-RAST metagenome analysis system. Take a look at MG-RAST.

WARNING

Don't try this at home.

LICENSE

MG-RAST is made available under a BSD-type license; see the LICENSE file for details.

Please note: The MG-RAST team is dedicated to supporting the server at http://www.mg-rast.org. We are not resourced to support local installations, so as much as we'd like to, we can't help with local installations of this software.

REQUIREMENTS

Hardware

MG-RAST is a pipeline, an archive, a complex web interface, and several other tools. The entire system was designed for Linux/Unix. We run the server infrastructure on a small dedicated cluster and rely heavily on cloud computing resources.

Systems-Software

MySQL
Cassandra
Perl
Python
R
Apache
NGINX

For the bioinformatics software and databases used in MG-RAST please see our manual: http://help.mg-rast.org

INSTRUCTIONS

Type make.

API server

Build image and push to dockerhub:

git clone https://github.com/MG-RAST/MG-RAST.git
cd MG-RAST
docker build -t mgrast/api-server:dev .

docker push mgrast/api-server:dev

Get config: (private mcs git repo, for details see fleet unit)

if cd /home/core/mgrast-config; then git pull; else cd /home/core/ ; git clone [email protected]:mgrast-config.git ; fi

Download data

docker run -t -i --name api -v /media/ephemeral/api-server-data:/m5nr mgrast/api /MG-RAST/bin/download_m5nr_blast.sh
docker rm api

Start container:

docker run -t -i --name api  -v /home/core/mgrast-config/services/api-server:/api-server-conf -v /media/ephemeral/api-server-data:/m5nr -p 80:80 mgrast/api-server /usr/local/apache2/bin/httpd -DFOREGROUND -f /MG-RAST/conf/httpd.conf

mg-rast's Issues

Broken link

In the README.md file, it's written: "For the bioinformatics software and databases used in MG-RAST please see our manual: ftp://ftp.mg-rast.org/manual.pdf"
However, the link ftp://ftp.mg-rast.org/manual.pdf no longer works.

metadata/update corrupts unicode fields

I've made some test fixtures in the form of valid metadata spreadsheets with and without unicode.

Unicode in the spreadsheets will pass the validator, but will be corrupted when calling metadata/update with [email protected]

The tests can be run with

py.test test_metadata_update.py

No Documentation

No documentation is provided on how to build/run this project.

/search update needs admin override

The /search resource will update the elasticsearch database (and indexes) and is used when metadata is updated or when jobs are created or destroyed.

/search requires consistent job data, however, and fails to update some jobs. Jobs that have corrupt project_id or were deleted but still have elasticsearch records cannot be updated.

The API needs a forced /search update for admins, perhaps via an HTTP DELETE request, that removes elasticsearch records without requiring consistent job data.

Cryptic error if no memcached server running

py.test -k get_proj_metadata

{"ERROR":"resource request failed\nBad arg length for Socket::pack_sockaddr_in, length is 0, should be 4 at /usr/lib/x86_64-linux-gnu/perl/5.24/Socket.pm line 157.\n\n"}

This error is caused by memcached not running, and originates from metagenome.pm:

line 266 $self->return_cached();

The caching code looks as if it was intended to be optional and to fall back gracefully rather than croak, but that is not what it does.

Some calls can be rescued by setting nocache=1, but running a memcached server on port 11211 fixes this problem.
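
A pre-flight check along the following lines could make this failure mode explicit before the tests run. It is a minimal sketch and not part of MG-RAST: the function name memcached_reachable and the default host localhost are assumptions made for illustration, and only the port 11211 comes from the report above.

```python
import socket


def memcached_reachable(host="localhost", port=11211, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened.

    This only checks that something is listening on the port; it does
    not verify that the listener speaks the memcached protocol.
    """
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A test session could call memcached_reachable() once at start-up and, if it returns False, either skip the cache-dependent tests or report clearly that memcached is missing instead of surfacing the Socket::pack_sockaddr_in message.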

project.pm corrupts unicode

The project editor submits edits to the project metadata via project/updatemetadata. It does not correctly update projects when the project metadata contains unicode. This seems to be the fault of the API: updatemetadata in src/MGRAST/lib/resources/project.pm is being called, but the multipart form data is being interpreted by Perl as latin-1.

According to
https://stackoverflow.com/questions/26634469/how-to-use-utf-8-in-cgi-scripts?rq=1
and
https://stackoverflow.com/questions/25981812/how-to-use-utf-8-in-a-perl-cgi-bin-script
it is the "-utf8" pragma in CGI that causes inputs to be interpreted as UTF-8:

use CGI '-utf8';

The API-testing has a test for this problem:

py.test test_project_updatemetadata.py
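
The symptom described above, UTF-8 bytes being read as latin-1, can be reproduced outside the API. The following is a minimal Python sketch of the corruption only, not of the Perl code path in project.pm; the helper name misdecode_as_latin1 and the sample string are made up for illustration.

```python
def misdecode_as_latin1(text):
    """Simulate UTF-8 encoded bytes being interpreted as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


original = "Universität Zürich"  # hypothetical metadata value
corrupted = misdecode_as_latin1(original)
print(corrupted)  # each non-ASCII character turns into two characters

# As long as nothing else has altered the text, the damage is reversible:
repaired = corrupted.encode("latin-1").decode("utf-8")
print(repaired == original)
```

Every non-ASCII character becomes a pair of latin-1 characters, which is the typical fingerprint of this kind of bug in stored metadata.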

Questions about the JSON data vs Graphed data in the online server (Plus Shannon discrepancy)

Hi,

I am writing my thesis partly with MG-RAST (and getting a taste of metagenomic analysis, as stressful as it has been), and I noticed an inconsistency between the "hits" in the JSON and the "hits" in the graphed data. After re-reading the documentation, I concluded that the values must be divided (or reduced in some way) by the abundance. My issue is that I cannot find the abundance in any file, so I cannot correlate one with the other. The documentation mentions an abundance column in the tables, but I don't have any table on the server, and the tables made from the API JSON do not show abundance.

The discrepancies are obvious here, and even the relevance is different. I tried checking with the CSV file that can be downloaded per statistic, but the information is not particularly understandable.

Another issue: assuming that the species list in the JSON is not modified later, there is something odd about the Shannon index calculation. I calculated the Shannon index independently and got the same value MG-RAST gives me if I use ln instead of log10 (and the documentation explains that the alpha diversity is given by 10^Shannon).

I am probably just mixing it all up, in which case I am sorry, but I can't find my answer anywhere else.

I kindly beg for your help and thank you for your time,

Rebeca H
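
On the Shannon index question, the choice of logarithm base changes the index only by a constant factor: H computed with ln equals H computed with log10 multiplied by ln(10). The sketch below illustrates that relationship with made-up counts. It is not the MG-RAST implementation, and the function name shannon_index is an assumption.

```python
import math


def shannon_index(counts, base=math.e):
    """Shannon index H = -sum(p_i * log(p_i)) for the given log base."""
    total = sum(counts)
    h = 0.0
    for count in counts:
        if count > 0:  # zero-abundance taxa contribute nothing
            p = count / total
            h -= p * math.log(p, base)
    return h


counts = [50, 30, 15, 5]  # hypothetical species abundances

h_ln = shannon_index(counts)              # natural logarithm
h_log10 = shannon_index(counts, base=10)  # base-10 logarithm

# The two indices differ by the constant factor ln(10).
print(h_ln, h_log10 * math.log(10))

# Exponentiating with the matching base gives the same effective
# number of species in both cases.
print(math.exp(h_ln), 10 ** h_log10)
```

Raising 10 to an index that was computed with ln gives a different number than either of the matched pairs above, so it matters that the exponent base and the logarithm base agree.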

excel support needs maintenance

validate_metadata looks like it uses the openpyxl library for xlsx and xlrd for older xls formatted spreadsheets. This is not the case. In fact, the MG-RAST openpyxl parser never works, and xlsx files are read by xlrd as a fallback.

An update to xlrd in December 2020 dropped support for the newer xlsx format
https://xlrd.readthedocs.io/en/latest/changes.html
and that broke all the validate_metadata tests.

PR #1449 pins the version of xlrd to 1.2.0 to prevent xlsx support from breaking immediately, but the code to handle xlsx needs to be fixed.
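
One way to make the choice of parser explicit, rather than relying on a fallback, is to inspect the file itself. The sketch below is an illustration under stated assumptions, not the validate_metadata code: the function name detect_spreadsheet_format is hypothetical, legacy xls files are recognized by the OLE2 compound-file signature, and xlsx files are recognized as zip archives that contain xl/workbook.xml, the conventional location of the workbook part.

```python
import zipfile

# Signature at the start of OLE2 compound files, the container
# used by legacy .xls workbooks.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def detect_spreadsheet_format(path):
    """Return "xls", "xlsx", or "unknown" based on file contents."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    if header == OLE2_MAGIC:
        return "xls"
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            # Assumption: the workbook part sits at its usual path.
            if "xl/workbook.xml" in archive.namelist():
                return "xlsx"
    return "unknown"
```

A caller could then route "xlsx" files to openpyxl and "xls" files to xlrd, and reject anything else with a clear message, independent of the file extension.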

MG-RAST site search shows many more results than the API

MG-RAST site search shows > 1000 results
http://www.mg-rast.org/mgmain.html?mgpage=search&search=antarctica

while api search shows only 22
http://api.mg-rast.org/metagenome?country=Antarctica&limit=30

What am I doing wrong?

Thank you in advance.

Cryptic error if no elasticsearch running

py.test -k test_apix_metagenome

{"ERROR":"resource request failed\nUnsupported object class 'Project' in database 'WebAppBackend'. at /MG-RAST/site/lib/PPO/DBMaster.pm line 444.\n\n"}

This error originates from lines 840-843 of metadata.pm, updating elasticsearch when no elasticsearch server is running. The database object is a red herring.

840 # update elasticsearch
841 foreach my $mgid (@$added) {
842 $self->upsert_to_elasticsearch_metadata($mgid);
843 }

test bug

this is an empty bug, fix it

MG-RAST database

I wonder if it's possible to retrieve the total length of every protein that has returned a match. So far, I have only found info on where the hit starts and where it ends, but not the total protein length.

Also, is it possible to use only a subset of the actual database as the protein database, so that some proteins are prevented from returning a match because the search only looks for hits in the smaller database?

Thank you.