
The MG-RAST Backend -- the API server

Home Page: http://www.mg-rast.org

License: BSD 2-Clause "Simplified" License


mg-rast's Introduction

MG-RAST source code

This is the repository for the MG-RAST metagenome analysis system. Take a look at MG-RAST.

WARNING

Don't try this at home.

LICENSE

MG-RAST is made available under a BSD-type license; see the LICENSE file for details.

Please note: The MG-RAST team is dedicated to supporting the server at http://www.mg-rast.org and is not resourced to support local installations. As much as we'd like to, we can't help with local installations of this software.

REQUIREMENTS

Hardware

MG-RAST is a pipeline, an archive, a complex web interface and several other tools. The entire system was designed for Linux/Unix. We run the server infrastructure on a small dedicated cluster and rely heavily on cloud computing resources.

Systems-Software

  1. MySQL
  2. Cassandra
  3. Perl
  4. Python
  5. R
  6. Apache
  7. NGINX

For the bioinformatics software and databases used in MG-RAST please see our manual: http://help.mg-rast.org

INSTRUCTIONS

type make

API server

Build image and push to dockerhub:

git clone https://github.com/MG-RAST/MG-RAST.git
cd MG-RAST
docker build -t mgrast/api-server:dev .

docker push mgrast/api-server:dev

Get config: (private mcs git repo, for details see fleet unit)

if cd /home/core/mgrast-config; then git pull; else cd /home/core/ ; git clone [email protected]:mgrast-config.git ; fi

Download data

docker run -t -i --name api -v /media/ephemeral/api-server-data:/m5nr mgrast/api /MG-RAST/bin/download_m5nr_blast.sh
docker rm api

Start container:

docker run -t -i --name api  -v /home/core/mgrast-config/services/api-server:/api-server-conf -v /media/ephemeral/api-server-data:/m5nr -p 80:80 mgrast/api-server /usr/local/apache2/bin/httpd -DFOREGROUND -f /MG-RAST/conf/httpd.conf

mg-rast's People

Contributors

danielolson5, droppenheimer, folker, jaredbischof, jaredwilkening, sage-service-user, teharrison, wgerlach, wilke, wltrimbl

mg-rast's Issues

Broken link

In the README.md file, it's written: "For the bioinformatics software and databases used in MG-RAST please see our manual: ftp://ftp.mg-rast.org/manual.pdf"
However, the link ftp://ftp.mg-rast.org/manual.pdf no longer works.

metadata/update corrupts unicode fields

I've made some test fixtures in the form of valid metadata spreadsheets with and without unicode.

Unicode in the spreadsheets will pass the validator, but will be corrupted when calling metadata/update with [email protected]

The tests can be run with

py.test test_metadata_update.py 

No Documentation

No documentation is provided on how to build/run this project.

/search update needs admin override

The /search resource will update the elasticsearch database (and indexes) and is used when metadata is updated or when jobs are created or destroyed.

/search requires consistent job data, however, and fails to update some jobs. Jobs that have corrupt project_id or were deleted but still have elasticsearch records cannot be updated.

The API needs a /search force update for admins, perhaps a DELETE HTTP request, that removes elasticsearch records without requiring consistent job data.

Cryptic error if no memcached server running

 py.test -k get_proj_metadata 

{"ERROR":"resource request failed\nBad arg length for Socket::pack_sockaddr_in, length is 0, should be 4 at /usr/lib/x86_64-linux-gnu/perl/5.24/Socket.pm line 157.\n\n"}

This error is caused by memcached not running, and originates from metagenome.pm:

line 266 $self->return_cached();

The code that implements it looks like it was intended to be optional and fall back to not croaking, but this is not what it does.

Some calls can be rescued by setting nocache=1, but running a memcached server on port 11211 fixes this problem.
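
Because the failure surfaces as an unrelated socket error, it can help to confirm that memcached is reachable before running the tests. The following is a minimal sketch, assuming memcached is expected on 127.0.0.1 at port 11211 (the port comes from the note above; the host is an assumption). The helper name port_open is hypothetical and not part of MG-RAST.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 11211 is the memcached port named in the issue; the host is an assumption.
    if not port_open("127.0.0.1", 11211):
        print("memcached does not appear to be running on 127.0.0.1:11211")
```

The same check could be pointed at whichever host and port the API configuration actually uses.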

project.pm corrupts unicode

The project editor submits edits to the project metadata via project/updatemetadata. It does not correctly update projects when the project metadata contains unicode. This seems to be the fault of the API in src/MGRAST/lib/resources/project.pm: updatemetadata is being called, but the multipart form data is being interpreted by Perl as latin-1.

According to
https://stackoverflow.com/questions/26634469/how-to-use-utf-8-in-cgi-scripts?rq=1
and
https://stackoverflow.com/questions/25981812/how-to-use-utf-8-in-a-perl-cgi-bin-script
it is the "-utf8" pragma in CGI that will cause inputs to be interpreted as UTF-8:

use CGI '-utf8';

The API-testing has a test for this problem:

py.test test_project_updatemetadata.py
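
The kind of corruption described in this issue can be reproduced outside Perl. The sketch below is an illustration only, not the code path in project.pm: it encodes a string as UTF-8 and then decodes the bytes as Latin-1, which is the misinterpretation the issue attributes to the API. The function names and the sample string are hypothetical.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as Latin-1 (the bug)."""
    return text.encode("utf-8").decode("latin-1")


def read_as_utf8(text: str) -> str:
    """Encode text as UTF-8 and decode it as UTF-8 (the intended behaviour)."""
    return text.encode("utf-8").decode("utf-8")


original = "café"
# Under the misreading, each two-byte UTF-8 character turns into two characters.
print(misread_as_latin1(original))
print(read_as_utf8(original))
```

In this sketch the damage is reversible (re-encoding the corrupted string as Latin-1 and decoding it as UTF-8 restores the original), which can be useful when checking whether stored metadata was affected in this way.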

Questions about the JSON data vs Graphed data in the online server (Plus Shannon discrepancy)

Hi,

I am writing my thesis partly with MG-RAST (and getting a taste of metagenomic analysis, as stressful as it has been), and I noticed an inconsistency between the "hits" in the JSON and the "hits" in the graphed data. After re-reading the documentation, I concluded that the hits must be divided (or reduced in some way) by the abundance. My issue is that I cannot find the abundance in any file, so I cannot relate one to the other. The documentation mentions an abundance column in the tables, but I don't have any table on the server, and the tables made from the API JSON do not show abundance.

image

Here the discrepancies are obvious, and even the relevance is different. I tried checking with the CSV file that one can download per statistic, but the information is not particularly understandable.
image

Another issue: assuming that the species list in the JSON is not modified later, there is something odd about the Shannon index calculation. I calculated the Shannon index independently and got the same value MG-RAST gives me if I use ln instead of log10 (and the documentation explains that the alpha diversity is given by 10^Shannon).

I am probably just mixing it all up, in which case I am sorry, but I can't find my answer anywhere else.

I kindly ask for your help and thank you for your time,

Rebeca H
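
The relationship between the two logarithm bases raised in this question can be checked independently. The sketch below uses invented counts and makes no claim about how MG-RAST computes its values. It only shows that a Shannon index computed with ln and one computed with log10 differ by a factor of ln(10), and that exp(H_ln) equals 10^(H_log10), so mixing the bases (for example, 10 raised to an ln-based index) gives a different number.

```python
import math


def shannon(counts, base=math.e):
    """Shannon index H = -sum(p * log(p)) over non-zero counts, in the given base."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


counts = [50, 30, 20]  # made-up abundances, for illustration only

h_ln = shannon(counts)              # natural-log Shannon index
h_log10 = shannon(counts, base=10)  # base-10 Shannon index

print("H (ln):", h_ln, "exp(H):", math.exp(h_ln))
print("H (log10):", h_log10, "10^H:", 10 ** h_log10)
print("10^H with the ln-based index:", 10 ** h_ln)
```

The first two printed diversity values agree with each other, while the third is larger; comparing a reported value against all three is one way to tell which base was used.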

excel support needs maintenance

validate_metadata looks like it uses the openpyxl library for xlsx and xlrd for older xls formatted spreadsheets. This is not the case. In fact, the MG-RAST openpyxl parser never works, and xlsx files are read by xlrd as a fallback.

An update to xlrd in December 2020 dropped support for the newer xlsx format
https://xlrd.readthedocs.io/en/latest/changes.html
and that broke all the validate_metadata tests.

PR #1449 pins the version of xlrd to 1.2.0 to prevent xlsx support from breaking immediately, but the code to handle xlsx needs to be fixed.
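
One possible direction for the fix is to choose the parser explicitly by file extension rather than relying on a fallback. The sketch below is an assumption about how that could look, not the validate_metadata implementation: the function names reader_for and sheet_names are hypothetical, and openpyxl and xlrd are third-party packages that would need to be installed.

```python
from pathlib import Path


def reader_for(path: str) -> str:
    """Name the library to use for a spreadsheet, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".xlsx":
        return "openpyxl"
    if suffix == ".xls":
        return "xlrd"
    raise ValueError(f"unsupported spreadsheet extension: {suffix!r}")


def sheet_names(path: str) -> list:
    """Return the worksheet names, importing the chosen library only when needed."""
    if reader_for(path) == "openpyxl":
        import openpyxl  # third-party; handles the xlsx format

        workbook = openpyxl.load_workbook(path, read_only=True)
        return workbook.sheetnames
    import xlrd  # third-party; handles the older xls format

    return xlrd.open_workbook(path).sheet_names()
```

With this split, an xlrd upgrade would only affect xls files, and xlsx handling would not depend on the pinned version.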

Cryptic error if no elasticsearch running

py.test -k test_apix_metagenome

{"ERROR":"resource request failed\nUnsupported object class 'Project' in database 'WebAppBackend'. at /MG-RAST/site/lib/PPO/DBMaster.pm line 444.\n\n"}

This error originates from lines 840-843 of metadata.pm, updating elasticsearch when no elasticsearch server is running. The database object is a red herring.

840 # update elasticsearch
841 foreach my $mgid (@$added) {
842 $self->upsert_to_elasticsearch_metadata($mgid);
843 }

MG-RAST database

I wonder if it's possible to retrieve the total length of every protein that has returned a match. So far, I have only found info on where the hit starts and where it ends, but not the total protein length.

Also, is it possible to use only a subset of the protein database, so that the search looks for hits in a smaller database and the excluded proteins cannot return a match?

Thank you.
