
gmrepoprogrammableaccess's People

Contributors

cshine0907, evolgeniusteam, zhujiaying1998, zoexfq


gmrepoprogrammableaccess's Issues

mapping microbiome->phenotype labels by run/sample id

Hi, I have downloaded the microbial abundances and the phenotype information, but I can't find a data dictionary or any way to map the run/sample IDs to the microbiome and phenotype data. I want to link the microbial abundances to the phenotype of each run/sample so that I have labeled data for testing a machine learning classifier. How can I do that using the programmable access tool?
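One possible approach, sketched under assumptions: fetch each run's details with getRunDetailsByRunID (the endpoint used in another issue on this page) and join that run's species abundances with its phenotype terms, keyed by run_id. The field names below ("run", "run_id", "phenotypes", "term", "species", "scientific_name", "relative_abundance") are taken from responses shown elsewhere on this page, but the exact shape of the "species" list is an assumption, and build_labeled_table is a hypothetical helper, not part of the GMrepo API.

```python
from typing import Any


def build_labeled_table(responses: list[dict[str, Any]]):
    """Turn run-level API replies into a feature matrix plus labels.

    Assumes each reply looks like the getRunDetailsByRunID output shown on
    this page: a "run" dict with "run_id", a "phenotypes" list of
    {"disease", "term"} dicts, and (assumed) a "species" list of
    {"scientific_name", "relative_abundance"} records.
    """
    rows: dict[str, dict[str, float]] = {}
    labels: dict[str, str] = {}
    taxa: set[str] = set()
    for resp in responses:
        run_id = resp["run"]["run_id"]
        # Join all phenotype terms of the run into one label, e.g. "Health"
        labels[run_id] = ";".join(p["term"] for p in resp.get("phenotypes", []))
        # A missing or empty "species" list yields an empty profile
        profile = {
            rec["scientific_name"]: float(rec["relative_abundance"])
            for rec in resp.get("species") or []
        }
        rows[run_id] = profile
        taxa.update(profile)
    columns = sorted(taxa)
    run_ids = sorted(rows)
    # Taxa not reported for a run are filled with 0.0
    matrix = [[rows[r].get(t, 0.0) for t in columns] for r in run_ids]
    return run_ids, columns, matrix, [labels[r] for r in run_ids]
```

Each row of `matrix` lines up with the label at the same index, so the pair can be passed to most classifier libraries. Filling unreported taxa with 0.0 treats "not reported" as "absent", which is a modeling choice worth revisiting.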

Cannot access GMrepo on 10 November 2022

Hi,

Is GMrepo undergoing maintenance? I have been trying to open it all day from different browsers, but every time the session times out and the GMrepo main page never loads. I also cannot access specific projects.

Is anyone else having the same problem?

I searched online for a reported outage but did not find anything, so I am writing here. Apologies.

Thanks,

HTTP 500 error when fetching abundance through API

Hi, I get an HTTP 500 error when fetching data through the Python API for certain projects. The following code reproduces the error.

import json
import requests

mesh_id = "D008103"
project_id = "PRJNA431746"
query = {"mesh_id": mesh_id, "project_id": project_id}

# Query data
url = 'https://gmrepo.humangut.info/api/getMicrobeAbundancesByPhenotypeMeshIDAndProjectID'
post_result = requests.post(url, data=json.dumps(query))

print(post_result)
# <Response [500]>
print(post_result.text == "")
# True
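An HTTP 500 is a server-side error, so the client cannot fix it, but it can fail with a clear message instead of calling `.json()` on an empty body (which raises a JSON decoding error that hides the real cause). The sketch below is a hypothetical helper, not part of the GMrepo tooling; it takes the status code and text that `requests` exposes as `Response.status_code` and `Response.text` and does no network I/O itself.

```python
import json


def decode_api_response(status_code: int, body: str):
    """Parse a GMrepo API reply, or raise an error naming the failure."""
    if status_code >= 500:
        # Server-side failure; in the report above the body was empty
        raise RuntimeError(
            f"server error HTTP {status_code}; body: {body.strip() or '<empty>'}"
        )
    if status_code >= 400:
        # Client-side problem, e.g. a malformed query
        raise ValueError(f"request rejected with HTTP {status_code}: {body.strip()}")
    if not body.strip():
        raise ValueError(f"HTTP {status_code} but the response body is empty")
    return json.loads(body)


# Usage with the query from the report (requires network access):
# post_result = requests.post(url, data=json.dumps(query))
# data = decode_api_response(post_result.status_code, post_result.text)
```

Logging the failing `mesh_id`/`project_id` pairs this way makes it easier to report exactly which projects trigger the server error.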

Curated projects

Is there a way to get a list of curated projects through the API?

relative abundance to counts

Hello, and thanks for your repository.
I have been looking at the data, and all the abundances I can access are relative abundances. I haven't found any way to download count data. Is count data available?
Since I am interested in count abundances, I looked for a way to calculate them from the relative abundances and found a field called "nr_reads_sequenced". What does that number represent? For example, for metagenomic data, does nr_reads_sequenced refer to all the sequenced oligonucleotides (reads) or to bins?
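If nr_reads_sequenced turns out to be the number of reads that the relative abundances were computed from, an approximate count for taxon i would be count_i ≈ (relative_abundance_i / 100) × nr_reads_sequenced. The sketch below assumes exactly that, plus that relative_abundance is a percentage (the profiles shown in another issue on this page appear to be on a 0 to 100 scale); both are unconfirmed assumptions, and estimate_counts is a hypothetical helper. The result is an estimate, not the original count data.

```python
from typing import Optional


def estimate_counts(relative_abundances: dict[str, float],
                    nr_reads: Optional[int]) -> dict[str, int]:
    """Approximate per-taxon read counts from percentage abundances.

    Assumes each value is a percentage (0-100) and that nr_reads is the
    number of reads the percentages were computed from.
    """
    if nr_reads is None:
        # Some runs report nr_reads_sequenced as None, so no estimate is possible
        raise ValueError("nr_reads_sequenced is missing for this run")
    return {taxon: round(pct / 100.0 * nr_reads)
            for taxon, pct in relative_abundances.items()}
```

Rounding drops very rare taxa to 0, and if "Unknown" or "Others" are present their estimates are aggregated counts rather than counts for a single taxon.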

Query full abundance of a certain run

Hi, I followed the API docs for "Get relative species/genus abundances for a sample/run", but I only retrieved the run information.

Code:

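import json

import pandas as pd
import requests

query = {"run_id": "ERR475468"}
url = 'https://gmrepo.humangut.info/api/getRunDetailsByRunID'
data = requests.post(url, data=json.dumps(query)).json()

## --get run metadata (a dict)
run = data.get("run")

## --get species/genus abundance tables as DataFrames
## (pd.DataFrame(None) is empty if the key is missing from the reply)
species = pd.DataFrame(data.get("species"))
genus = pd.DataFrame(data.get("genus"))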

Response data:

{'run': {'project_id': 'PRJEB6070',
  'original_sample_description': 'Potential of fecal microbiota for early stage detection of colorectal cancer',
  'run_id': 'ERR475468',
  'experiment_type': 'Amplicon',
  'instrument_model': 'Illumina',
  'nr_reads_sequenced': None,
  'host_age': 74,
  'sex': None,
  'BMI': 27,
  'country': 'France',
  'longitude': None,
  'latitude': None,
  'loaded_uid': 54204,
  'QCStatus': 0,
  'QCMessage': 'a single taxon  unknown  account for 100 percent of abundance, which is too much!!',
  'Original_Project_description': 'Several bacterial species have been implicated in the development of colorectal carcinoma (CRC), but CRC-associated changes of fecal microbiota and their potential for cancer screening remain to be explored. Here we used metagenomic sequencing of fecal samples to identify taxonomic markers that distinguished CRC patients from tumor-free controls in a study population of 156 participants. Accuracy of metagenomic CRC detection was similar to the standard fecal occult blood test (FOBT) and when both approaches were combined, sensitivity improved >45% relative to the FOBT while maintaining its specificity. Accuracy of metagenomic CRC detection did not differ significantly between early and late-stage cancer and could be validated in independent patient and control populations (N=335) from different countries. CRC-associated changes in the fecal microbiome at least partially reflected microbial community composition at the tumor itself, indicating that observed gene pool differences may reveal tumor-related host-microbe interactions. Indeed, we deduced a metabolic shift from fiber degradation in controls to utilization of host carbohydrates and amino acids in CRC patients accompanied by an increase of lipopolysaccharide metabolism. '},
 'phenotypes': [{'disease': 'D006262', 'term': 'Health'}],
 'phenotypes_exist': True}

Could you please help me retrieve full info?
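The reply above contains no "species" or "genus" keys at all, and its run record has QCStatus 0 with a QC message saying that a single "unknown" taxon accounts for 100 percent of the abundance. It is possible, though not confirmed on this page, that runs failing QC are returned without abundance tables. The hypothetical helper below only inspects a reply to make that situation visible when querying many runs; it does not retrieve any missing data.

```python
def describe_run_response(data: dict) -> dict:
    """Report which abundance tables a getRunDetailsByRunID reply contains."""
    run = data.get("run") or {}
    return {
        "run_id": run.get("run_id"),
        # True only if the key exists and holds a non-empty list
        "has_species": bool(data.get("species")),
        "has_genus": bool(data.get("genus")),
        # In the reply above, QCStatus 0 came with a QC failure message
        "qc_status": run.get("QCStatus"),
        "qc_message": run.get("QCMessage"),
    }
```

Running it over a batch of run IDs and comparing has_species against qc_status would show whether missing abundances line up with failed QC.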

API and website report different abundance profiles

Retrieving the abundance profile of the same sample through the API and through the website returns different results.

When using getRunDetailsByRunID as described in the documentation, the abundance profile for sample ERR475468 is as follows:

scientific_name             relative_abundance
Others                      33.1793295
Unknown                     30.255
Ruminococcus bromii         11.7594
Faecalibacterium sp. MC_41  5.4017
Bacteroides vulgatus        3.96235
[Eubacterium] eligens       3.09785
Bacteroides uniformis       2.79168
Escherichia coli            2.6621
Sphingomonas sanguinis      2.39532
Dialister invisus           2.33688
Sphingomonas paucimobilis   2.15839

However, when downloading the relative species abundance table as a TSV from the website, the abundance profile for sample ERR475468 is as follows:

relative_abundance  scientific_name
30.255              Unknown
11.7594             Ruminococcus bromii
5.4017              Faecalibacterium sp. MC_41
3.96235             Bacteroides vulgatus
3.09785             [Eubacterium] eligens
2.79168             Bacteroides uniformis
2.6621              Escherichia coli
2.39532             Sphingomonas sanguinis
2.33688             Dialister invisus
2.15839             Sphingomonas paucimobilis
2.04215             Oscillibacter valericigenes
1.88589             Blautia obeum
1.88018             Streptococcus salivarius
1.09825             Methanobrevibacter smithii
0.971213            Streptococcus mutans
...                 (further rows omitted)

When downloading data from the website, the taxonomic breakdown of the "Others" group is reported, whereas the API returns only the aggregated "Others" value. I am interested in hundreds of samples and don't want to download their profiles manually.

How can I retrieve the full taxonomic profile programmatically?
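The comparison above suggests that the API's "Others" row is the sum of the taxa it leaves out. A small check can confirm that for a downloaded sample. The sketch assumes the website TSV is tab-separated, holds a single sample, and has the two columns shown above (relative_abundance, scientific_name); read_website_tsv and explain_others are hypothetical helpers. This does not retrieve the hidden taxa through the API; it only verifies where the difference comes from.

```python
import csv
import io


def read_website_tsv(text: str) -> dict[str, float]:
    """Parse a downloaded TSV with relative_abundance and scientific_name columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["scientific_name"]: float(row["relative_abundance"]) for row in reader}


def explain_others(api_profile: dict[str, float],
                   website_profile: dict[str, float],
                   tol: float = 1e-3):
    """Return the taxa missing from the API profile and whether they sum to "Others"."""
    hidden = {t: v for t, v in website_profile.items() if t not in api_profile}
    # Compare the total of the omitted taxa with the API's aggregated row
    matches = abs(sum(hidden.values()) - api_profile.get("Others", 0.0)) <= tol
    return hidden, matches
```

A tolerance is used because the two sources round abundances to different numbers of digits.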

Incomplete curated projects

Hi there,

This is an extension of #8, in which the getCuratedProjectsList method was added to the API.

Here's the code I used to fetch the list of curated project IDs.

import json

import requests


def get_curated_project_ids():
    query = {}
    url = 'https://gmrepo.humangut.info/api/getCuratedProjectsList'
    content = requests.post(url, data=json.dumps(query))

    # Collect the "project_id" field of every returned project record
    project_id_set = {x["project_id"] for x in content.json()}
    return project_id_set

After running this code, I manually checked whether known curated project IDs are included in the output. For example, PRJEB1775 is a project with metagenomic samples associated with diarrhea. However:

pid_set = get_curated_project_ids()
"PRJEB1775" in pid_set
# False

Is it possible that getCuratedProjectsList returns an incomplete list of project IDs?
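To check more than one project at a time, the returned set can be compared against a list of IDs expected to be curated. The hypothetical helper below also normalizes case and surrounding whitespace first, as a precaution so that formatting differences are not mistaken for missing projects; if an ID is still reported missing after normalization, the list is more likely to be genuinely incomplete.

```python
def missing_project_ids(expected: set[str], returned: set[str]) -> list[str]:
    """List expected project IDs that are absent from the API's curated list."""
    # Normalize case and whitespace before comparing
    normalized = {pid.strip().upper() for pid in returned}
    return sorted(pid for pid in expected if pid.strip().upper() not in normalized)
```

For example, `missing_project_ids({"PRJEB1775"}, get_curated_project_ids())` would return `["PRJEB1775"]` if the behavior reported above persists.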
