
complexphenotypes's Introduction

dbGaPdb

A searchable database of sequencing and phenotype data

Hackathon Team: David McGaughey, Filip Cvetkovski, Michelle Miron, Robert Butler, Sean King, Luning Hoa, Sean Davis and Ben Busby

Intro

The Complex Phenotypes database is a relational database that lets users find which data sets are available for download, based on phenotype and data type, from NCBI's Sequence Read Archive (SRA) and database of Genotypes and Phenotypes (dbGaP). These are the largest public repositories of phenotype and sequencing data, but finding data of interest by phenotype is currently challenging. Searchable Complex Phenotypes makes this metadata more easily accessible.

This repository contains an R package that gives you access to all public metadata so you can explore what data is available. You can do this in two ways: query the database directly in R, or use a Shiny app to query it.
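As a rough illustration of the kind of lookup described above, here is a minimal base R sketch that filters a metadata table by phenotype keyword and data type. It does not use this package's actual functions: the `metadata` table, its column names, its rows, and the `find_studies` helper are all hypothetical and invented for the example.

```r
# Hypothetical metadata table; column names and rows are invented for illustration
# and do not come from SRA or dbGaP.
metadata <- data.frame(
  study_id  = c("study_A", "study_B", "study_C"),
  phenotype = c("Type 2 diabetes", "Macular degeneration", "Diabetic retinopathy"),
  data_type = c("RNA-Seq", "WGS", "RNA-Seq"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose phenotype matches a case-insensitive
# pattern and, optionally, whose data type equals `data_type`.
find_studies <- function(metadata, phenotype_pattern, data_type = NULL) {
  keep <- grepl(phenotype_pattern, metadata$phenotype, ignore.case = TRUE)
  if (!is.null(data_type)) {
    keep <- keep & metadata$data_type == data_type
  }
  metadata[keep, , drop = FALSE]
}

hits <- find_studies(metadata, "diabet", data_type = "RNA-Seq")
print(hits)
```

A pattern match on the phenotype column is used here because phenotype descriptions are free text; an exact match would miss related terms.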

Example Query

Quick start

  1. Query examples in R
  2. Shiny app via RStudio or web

Web Query

Installation

Dependencies

Further Use

complexphenotypes's People

Contributors

davemcg, dcgenomics, fcve, michellemiron, seandavi, seanking94


complexphenotypes's Issues

Migrating R package to "clean" repository?

Does anyone mind if I move the R package to a new repo? This will reduce file size (important for fast install) and put the R package as the top-level directory making development a bit easier.

dbGaP study info dump json file is available on ftp

The following is the email sent to David. Posting it here for the record.
James


From: James L. Hao
Date: Wed, Aug 16, 2017 at 8:04 PM
Subject: Re: empty files ftpDownload dbgapr
To: "McGaughey, David (NIH/NEI) [E]"

Hi David,

  1. The dbGaP study info dump JSON is now available at the following FTP location:
    ftp://ftp.ncbi.nlm.nih.gov/dbgap/r-tool/public_datadump/

You may go through the sample file again. It includes 4 studies. 2 of them are root studies, another 2 are sub-studies.
sample_dbgap_study_info_dump_pretty.json

The fields 'is_root', 'has_child', and 'has_parent' can help to identify parent-child relationship.

The sample and subject count of chip info will be added later.

  2. I looked into the empty FTP files issue for phs000803.v1.p1. It turns out that the respective database tables are not loaded. This happens occasionally for a variety of reasons. You may simply ignore them in this case. In most cases they should be populated sometime later.

If you search for phs000803 through the Advanced Search (URL below), you will see 0 variables returned, which confirms that the variable-related table is indeed empty.
https://www.ncbi.nlm.nih.gov/projects/gapsolr/facets.html

Please do not hesitate to write back if you have any questions.

Keep in touch.

Cheers,
Luning
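
The email above says that the 'is_root', 'has_child', and 'has_parent' fields identify parent-child relationships between studies. A minimal base R sketch of how those three flags could be combined into a study role follows. The accessions are placeholders, and treating the fields as logical values is an assumption; the real dump may encode them differently. The `classify_study` helper is hypothetical.

```r
# Hypothetical study records. The accessions are placeholders and the logical
# encoding of the three fields is an assumption, not taken from the real dump.
studies <- data.frame(
  accession  = c("root_A", "root_B", "sub_A1", "sub_A2"),
  is_root    = c(TRUE, TRUE, FALSE, FALSE),
  has_child  = c(TRUE, FALSE, FALSE, FALSE),
  has_parent = c(FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: derive a role label from the three flags (vectorized).
classify_study <- function(is_root, has_child, has_parent) {
  ifelse(has_parent, "sub-study",
         ifelse(is_root & has_child, "root study with sub-studies",
                ifelse(is_root, "stand-alone root study", "unclassified")))
}

studies$role <- classify_study(studies$is_root, studies$has_child, studies$has_parent)
print(studies[, c("accession", "role")])
```

The "unclassified" label is a fallback for flag combinations the sketch does not anticipate, so unexpected records stay visible instead of being silently dropped.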

Writing dbGap metadata includes a newline, which splits a record

phs000007/phs000007.v13/supplemental_data/phs000007.v13_study_variable_code_value.txt.gz

readLines('phs000007/phs000007.v13/supplemental_data/phs000007.v13_study_variable_code_value.txt.gz')[14002:14005]

Note how the last record is split into two lines. This breaks reading the record as a tsv file. Records may need to be written as csv with quotes or have newlines stripped before writing. Alternatively (and this might be better), if dbGaPR pulls data directly into R without file creation, just use that rather than writing files.

[1] "7\t13\tphs000007.v13\t4117\t1\tphv00004117.v1\t1\tNORMAL\t2"
[2] "7\t13\tphs000007.v13\t4117\t1\tphv00004117.v1\t2\tPOSSIBLE DEMENTIA\t1"
[3] "7\t13\tphs000007.v13\t4117\t1\tphv00004117.v1\t3\tFACTORS SUCH AS ILLITERACY, NOT"
[4] " FLUENT IN ENGLISH, OR DEPRESSION THAT CAUSES POOR TESTING\t4"
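
One possible workaround for the split record shown above, sketched in base R: rejoin any line that has too few tab-separated fields with the line that follows it before parsing. The expected count of 9 fields is inferred from the intact records in this sample, and `rejoin_split_records` is a hypothetical helper, not part of dbgapr.

```r
# Rejoin records that an embedded newline split across several lines.
# `n_fields` is the expected number of tab-separated fields per record.
rejoin_split_records <- function(lines, n_fields) {
  out <- character(0)
  buffer <- NULL
  for (line in lines) {
    # If an incomplete record is pending, append this line to it
    # (the embedded newline is simply dropped).
    candidate <- if (is.null(buffer)) line else paste0(buffer, line)
    n_tabs <- lengths(regmatches(candidate, gregexpr("\t", candidate, fixed = TRUE)))
    if (n_tabs < n_fields - 1) {
      buffer <- candidate        # still incomplete, keep accumulating
    } else {
      out <- c(out, candidate)   # complete record
      buffer <- NULL
    }
  }
  # Keep a trailing incomplete record rather than losing it silently.
  if (!is.null(buffer)) out <- c(out, buffer)
  out
}

# The four lines from the output above (the last record is split in two).
raw <- c(
  "7\t13\tphs000007.v13\t4117\t1\tphv00004117.v1\t1\tNORMAL\t2",
  "7\t13\tphs000007.v13\t4117\t1\tphv00004117.v1\t2\tPOSSIBLE DEMENTIA\t1",
  "7\t13\tphs000007.v13\t4117\t1\tphv00004117.v1\t3\tFACTORS SUCH AS ILLITERACY, NOT",
  " FLUENT IN ENGLISH, OR DEPRESSION THAT CAUSES POOR TESTING\t4"
)

fixed <- rejoin_split_records(raw, n_fields = 9)
records <- read.delim(text = fixed, header = FALSE, sep = "\t",
                      quote = "", stringsAsFactors = FALSE)
print(dim(records))
```

This approach assumes that the newline only ever appears inside a field and that no field contains a tab; writing quoted CSV at the source, as suggested above, would avoid the need for the repair step.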
