The lncRNA landscape of Candida yeast pathogens

This GitHub page and directory contain all scripts and data files necessary to fully reproduce the results of the paper by Hovhannisyan and Gabaldon, "The lncRNA landscape of Candida yeast pathogens".

The directory contains several sub-directories for different types of analyses, described below. Each sub-directory contains annotated scripts, data files and a README.txt file describing the entire analysis. For analyses done in R, the sub-directory also contains a sessionInfo.txt file listing all R packages and their versions. Note that most of the analyses are computationally very demanding and, if run from start to end, the project may require ~10 terabytes of disk space. For some of the steps we therefore highly recommend using HPC resources.

Below you can find a general description of each sub-directory, in the order in which they need to be executed.

  1. data_fetching_and_QC_and_strand_detection - the initial part of the workflow, containing the scripts and data files needed to download, quality-check and trim all public RNA-Seq datasets used in our study. Note that downloading and processing the data is very time and space consuming, since it encompasses >2600 RNA-Seq samples.
  2. mapping_assembly_lncRNA_prediction - the next step of the workflow performs (among other analyses) mapping of RNA-Seq data to the corresponding reference genomes (located in the reference_genomes_and_annotations folder), genome-guided transcriptome assembly, transcriptome merging and coding potential assessment, which together constitute the lncRNA prediction process. Again, this step is computationally very expensive and ideally requires HPC resources.
  3. DE_analysis - the third step of the workflow (although the results of these analyses are described at the end of the paper), which performs differential expression analysis of lncRNAs across the time course of epithelial cell infection. These analyses are performed in R and can be run on an ordinary desktop computer.
  4. family_classification_and_motif_analysis - this folder contains scripts and data for classifying lncRNAs into families based on BLAST reciprocal hits, secondary structure and synteny. It also performs motif enrichment analysis. The structural analysis is computationally very demanding and requires an HPC and 3-4 terabytes of space.
  5. repeat_calling - this folder contains scripts for performing repeat calling and counting the lncRNAs that overlap repeats. The final results of this analysis are also available in the next directory, where they are used for the final visualization.
  6. ploting_and_network_analysis - the final step of the workflow performs most of the plotting tasks of the entire project, along with functional and network analysis. This step is done in R and can be run on a normal desktop (although the network analysis might be quite slow).
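
The execution order listed above can be captured in a small helper that checks whether a local copy of the repository contains every step. This is only a minimal sketch in Python and is not part of the repository: the function name `missing_steps` is hypothetical, and it assumes the sub-directories sit directly under the repository root with exactly the names given above.

```python
from pathlib import Path

# Sub-directory names copied from the list above, in execution order.
WORKFLOW_STEPS = [
    "data_fetching_and_QC_and_strand_detection",
    "mapping_assembly_lncRNA_prediction",
    "DE_analysis",
    "family_classification_and_motif_analysis",
    "repeat_calling",
    "ploting_and_network_analysis",
]


def missing_steps(repo_root):
    """Return the workflow sub-directories absent under repo_root.

    Hypothetical helper: assumes each step is a directory located
    directly under the repository root. Order follows WORKFLOW_STEPS.
    """
    root = Path(repo_root)
    return [step for step in WORKFLOW_STEPS if not (root / step).is_dir()]
```

Calling `missing_steps(".")` from the repository root should return an empty list when all six sub-directories are present.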

Each of the sub-directories described above has its own README.txt file with more detailed instructions on how to execute the scripts and which software is required. Note that the entire workflow uses many different software tools (and many more dependencies), so we recommend installing and managing them with a package manager such as conda.

Also note that, for convenience, we use shortened names of the studied species throughout the workflow; these appear in file and folder names. They are: calb - C. albicans; ctrop - C. tropicalis; cpar - C. parapsilosis; caur - C. auris; cglab - C. glabrata.
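
When working with the output files, the abbreviations above can be expanded programmatically. The following Python sketch is an illustration only and is not taken from the repository: the helper `species_from_name` and the example file names are hypothetical, and it assumes the abbreviation appears in a name as a separate token delimited by non-alphanumeric characters (such as `_`, `.` or `/`).

```python
import re

# Abbreviation -> species name, as listed above.
SPECIES = {
    "calb": "C. albicans",
    "ctrop": "C. tropicalis",
    "cpar": "C. parapsilosis",
    "caur": "C. auris",
    "cglab": "C. glabrata",
}


def species_from_name(name):
    """Return the species for the first abbreviation found in name, else None.

    Hypothetical helper: the name is split on non-alphanumeric characters
    and each resulting token is looked up in SPECIES.
    """
    for token in re.split(r"[^A-Za-z0-9]+", name):
        if token in SPECIES:
            return SPECIES[token]
    return None
```

For example, a hypothetical file name such as `calb_counts.txt` would resolve to `C. albicans`, while a name containing none of the abbreviations yields `None`.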

If there are any questions or requests, please email them to [email protected].
