Chordomics

Chordomics is a tool to visualize and interpret linked data, such as data from metagenomics or metaproteomics experiments in which both taxonomic and functional data are obtained.

Installation

To install from the command line, download and run the install script:

curl -o run_chordomics.R  https://raw.githubusercontent.com/KevinMcDonnell6/chordomics/master/run_chordomics.R && Rscript run_chordomics.R

To install from an R session, first install the devtools package:

install.packages("devtools")

Then install the chordomics package from GitHub with the following command:

devtools::install_github('KevinMcDonnell6/chordomics')

Running Chordomics

Preparing your metaproteomics data

The input data must be in .csv format, with column names. The output from programs like MPA^[https://github.com/compomics/meta-proteome-analyzer] is ideal. Chordomics looks for a "Proteins" column containing one or more UniProt accessions.

Preparing your metagenomics data

You will need a run ID from MG-RAST^[http://www.mg-rast.org/].

Using the Chordomics viewer

Next, launch the app!

chordomics::launchApp()

Once the app is running, follow the steps to preprocess, load, and view your data!

For detailed step-by-step walkthroughs, see the "Walkthroughs" folder in the repository.

Data Processing

The app can handle both MG-RAST data and MetaProteomeAnalyzer (MPA) files.

For metagenomic or metatranscriptomic data, first upload your samples to MG-RAST. Then enter your MG-RAST ID into Chordomics.

For metaproteomics data, upload your MPA files one at a time for processing.

Visualising the data

The Plot tab is where the user is able to view the data they have loaded into the App.

  • Clicking Load Example Data shows already processed data for the user to experiment with.
  • The datasets can be viewed together (default) or individually by selecting the name of the dataset on the left panel.
  • Selecting a taxonomic rank from the panel changes the rank shown on the plot.
  • Selecting a taxonomic group on the chord diagram (e.g. "Bacteria" for the example data) selects only that taxon. Changing the rank now allows the user to view the subtaxa of their selection.
  • Similarly, the functions can have a hierarchical structure. The example data is labelled with functional categories ("COG_Category") and their COGs ("COG_Name"). This can be applied to other annotations such as KEGG, if the user gives them the appropriate headings.
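The drill-down described above can be pictured as filtering a table to one taxon and then counting taxon-to-function links at the next rank down. The sketch below uses base R only and made-up records; it illustrates the idea and is not Chordomics's own code.

```r
# Made-up records for illustration; column names follow the headings above.
records <- data.frame(
  Superkingdom = c("Bacteria", "Bacteria", "Bacteria", "Archaea"),
  Phylum       = c("Firmicutes", "Firmicutes", "Proteobacteria", "Euryarchaeota"),
  COG_Category = c("C", "J", "C", "C"),
  stringsAsFactors = FALSE
)

# "Select" one taxon, as when clicking "Bacteria" on the diagram.
bacteria <- records[records$Superkingdom == "Bacteria", ]

# Count links between subtaxa (Phylum) and functional categories.
# A taxon-by-function matrix like this is the usual input to a chord diagram.
links <- table(bacteria$Phylum, bacteria$COG_Category)
print(links)
```

Swapping "Phylum" for another rank column, or "COG_Category" for "COG_Name", gives the same view at a different level of either hierarchy.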

What it is doing

with metaproteomics data

Given a metaproteomics csv file, Chordomics gets functional data from UniProt, which is then saved to a chordomics folder in your home directory. This makes it easier to re-run analyses. Certain checks are performed to make sure all the required fields have data, and the cleaned data with the COG annotations is returned to be downloaded.

with metagenomic/metatranscriptomic data
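To show why saving results locally speeds up re-runs, here is a minimal caching sketch in base R. The helper name `cached_lookup`, the one-file-per-accession layout, and the `.rds` format are all assumptions made for illustration; the actual layout of the chordomics folder may differ.

```r
# Hypothetical helper: return a cached result if present, otherwise fetch and store it.
# The default directory mirrors the "chordomics folder in your home directory"
# described above; the file naming scheme is an assumption.
cached_lookup <- function(key, fetch,
                          cache_dir = file.path(path.expand("~"), "chordomics")) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  cache_file <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))  # cache hit: no remote request needed
  }
  result <- fetch(key)           # cache miss: do the slow lookup once
  saveRDS(result, cache_file)
  result
}

# Stand-in for a remote lookup, so the sketch runs offline.
calls <- 0
fake_fetch <- function(key) {
  calls <<- calls + 1
  paste("annotation for", key)
}

# Use a temporary directory here so the demo does not touch the home directory.
demo_dir <- tempfile("chordomics_demo")
first  <- cached_lookup("P00330", fake_fetch, cache_dir = demo_dir)
second <- cached_lookup("P00330", fake_fetch, cache_dir = demo_dir)
```

The second call reads the stored file instead of calling the fetch function again.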

Given an MG-RAST ID (usually starting with "mgm"), the taxonomy and function annotations are downloaded. Be warned -- this can take a long time. For now, please only use datasets from assembled metagenomes, rather than just reads. The datasets are combined -- retaining only the sequences for which both functional and taxonomic annotations are available. The COGs are assigned, NCBI taxids are linked, and the data is returned to be downloaded.
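Combining the two downloads while keeping only sequences present in both amounts to an inner join. The sketch below shows that step with base R's `merge`; the `sequence_id` key and the example values are hypothetical, and the real MG-RAST files use their own identifiers and columns.

```r
# Made-up taxonomy and function tables sharing a hypothetical key column.
taxonomy <- data.frame(
  sequence_id = c("seq1", "seq2", "seq3"),
  Phylum      = c("Proteobacteria", "Firmicutes", "Bacteroidetes"),
  stringsAsFactors = FALSE
)
functions <- data.frame(
  sequence_id = c("seq2", "seq3", "seq4"),
  COG_Name    = c("COG0001", "COG0002", "COG0003"),
  stringsAsFactors = FALSE
)

# merge() defaults to all = FALSE, i.e. an inner join:
# seq1 (no function) and seq4 (no taxonomy) are dropped.
combined <- merge(taxonomy, functions, by = "sequence_id")
print(combined)
```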

Troubleshooting

Chordomics input data

Chordomics currently requires the input to have at least one taxonomic column ("Superkingdom", "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"), and one or more of the following: "COG_Category", "COG_Name". This consistency allows us to handle the hierarchical nature of both the functional and taxonomic data. If you wish to display different types of data, we have incorporated SVG downloading via Crowbar^[http://nytimes.github.io/svg-crowbar/].
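If a file fails to load, it can help to check its headers against this rule before opening the app. The function below is a hypothetical pre-check written in base R, not part of the chordomics package.

```r
taxonomic_ranks <- c("Superkingdom", "Kingdom", "Phylum", "Class",
                     "Order", "Family", "Genus", "Species")
functional_cols <- c("COG_Category", "COG_Name")

# Hypothetical helper: TRUE when a data frame has at least one taxonomic
# column and at least one functional column.
has_required_columns <- function(df) {
  any(taxonomic_ranks %in% names(df)) && any(functional_cols %in% names(df))
}

# Made-up examples.
ok      <- data.frame(Phylum = "Firmicutes", COG_Category = "C")
no_func <- data.frame(Phylum = "Firmicutes", KEGG = "K00001")

has_required_columns(ok)       # both kinds of column present
has_required_columns(no_func)  # functional column missing
```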

Metaproteomic utility input data

The fields in the csv file should be quoted, as the lists of UniProt accessions are also comma-separated in the output from the MPA. So, ensure the fields are quoted, commas are used as the separator, and commas are also used as the within-field separator. The following headers are required:

"Superkingdom","Kingdom","Phylum","Class","Order","Family","Genus","Species", "Proteins"

"Proteins" should contain one or more UniProt accessions, separated by commas; any extra headers are ignored.
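As a rough illustration of this format, the sketch below writes a quoted csv with the required headers, reads it back, checks the headers, and splits the "Proteins" field on commas. It uses base R only; `read_mpa_csv` is a hypothetical helper, and the example row and accessions are placeholders rather than real MPA output.

```r
required <- c("Superkingdom", "Kingdom", "Phylum", "Class", "Order",
              "Family", "Genus", "Species", "Proteins")

# Hypothetical helper: read a quoted, comma-separated file and check its headers.
read_mpa_csv <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE, check.names = FALSE)
  missing <- setdiff(required, names(df))
  if (length(missing) > 0) {
    stop("Missing required columns: ", paste(missing, collapse = ", "))
  }
  df
}

# Placeholder row for illustration only.
example <- data.frame(
  Superkingdom = "Eukaryota", Kingdom = "Fungi", Phylum = "Ascomycota",
  Class = "Saccharomycetes", Order = "Saccharomycetales",
  Family = "Saccharomycetaceae", Genus = "Saccharomyces",
  Species = "Saccharomyces cerevisiae", Proteins = "P00330,P00331",
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".csv")
write.csv(example, path, row.names = FALSE, quote = TRUE)  # quote every text field

df <- read_mpa_csv(path)

# Because the field was quoted, the comma inside "Proteins" survives the
# round trip and can be split into individual accessions afterwards.
accessions <- strsplit(df$Proteins, ",", fixed = TRUE)
```

Without the quotes, the comma between the two accessions would be read as a column separator and the row would no longer line up with the headers.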

Metagenomic/metatranscriptomic utility input data

If your selected MG-RAST ID is running slowly, it is likely due to the time it takes to download the data files. Sadly, MG-RAST does not provide a single file with both taxonomic and functional information, so we have to download both and merge them. Try with a small dataset first, such as "mgm4762935.3".
