agam-vgsc-report

This is a work in progress. It has not been reviewed, approved or endorsed by anyone. If you have any questions, please contact Alistair Miles ([email protected]) or Chris Clarkson ([email protected]).

Contributor guide

Getting started

The following steps describe how to set up a development environment for working on the manuscript and the supporting notebooks.

Step 1: Fork this repository into your own GitHub account.

Step 2: Clone your fork to your local system. E.g.:

$ git clone git@github.com:alimanfoo/agam-vgsc-report.git
$ cd agam-vgsc-report
$ git submodule update --init --recursive
$ git remote add upstream git@github.com:malariagen/agam-vgsc-report.git

...replacing 'alimanfoo' with your GitHub username.

Step 3: Install dependencies (Miniconda, TeXLive):

From the repo working directory, run:

$ ./agam-report-base/install/install-conda.sh

This will install Miniconda into the deps directory within the repository root directory.

Then run:

$ ./agam-report-base/install/install-texlive.sh

This will install TeXLive into the deps directory within the repository root directory.

Building the manuscript

To build the manuscript, from the repository root directory, run:

$ source env.sh
$ ./latex.sh

This should rebuild the file main.pdf.

Data dependencies

To build supporting data for the manuscript, you will need some files from the Ag1000G FTP site copied to your local filesystem. The data build assumes you have a mirror of the necessary files under a directory called ngs.sanger.ac.uk within the repository root directory.
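Before starting a data build, it can help to confirm that the mirror directory is actually in place. The following is a minimal sketch, not part of the repository: the function name find_data_mirror is hypothetical, and it only checks that the ngs.sanger.ac.uk directory exists and is not empty, since the exact files required are not listed here.

```python
from pathlib import Path

# Directory name taken from the paragraph above; everything else is illustrative.
MIRROR_DIRNAME = "ngs.sanger.ac.uk"


def find_data_mirror(repo_root="."):
    """Return the mirror directory under repo_root, or raise if missing or empty.

    Hypothetical helper: it does not verify which files are present, only
    that the directory exists and contains at least one entry.
    """
    mirror = Path(repo_root) / MIRROR_DIRNAME
    if not mirror.is_dir():
        raise FileNotFoundError(f"expected a data mirror at {mirror}")
    if not any(mirror.iterdir()):
        raise FileNotFoundError(f"data mirror at {mirror} is empty")
    return mirror
```

Calling find_data_mirror() from the repository root raises FileNotFoundError with a readable message if the mirror has not been set up yet.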

Working on the manuscript

The following steps describe how to do some work on the manuscript and contribute the work back to the MalariaGEN (upstream) repository.

Step 1: Make sure your master branch is synchronized with upstream master:

$ git checkout master
$ git pull
$ git fetch upstream
$ git rebase upstream/master
$ git push

Step 2: Create a branch to put your work in, e.g.:

$ git checkout -b edit-results-section
$ git push -u origin edit-results-section

...replacing "edit-results-section" with an appropriate branch name for the work you want to do.

Step 3: Do some work, then add, commit and push, e.g.:

$ # edit main.tex
$ git add main.tex
$ git commit -m 'corrected typo in results paragraph 1'
$ git push

Step 4: When the work is ready for review, build the manuscript locally:

$ source env.sh
$ ./latex.sh

Then make sure all local changes are committed and pushed up to your remote branch:

$ git status
$ # if anything to commit...
$ git commit -a -m 'rebuild'
$ git push

Then go to github.com and create a pull request from the branch on your repository to the master branch of malariagen/agam-vgsc-report.

Running a Jupyter notebook server

The install script will install Miniconda locally and create an environment with various scientific Python packages installed. To launch a Jupyter notebook server:

$ source env.sh
$ jupyter notebook

Running the repo notebooks

Due to dependencies between the notebooks in this repository, if you would like to re-run the analyses from this project you will need to download the required Ag1000G phase 1 data from the FTP site, then run the Python notebooks within the "notebooks/" directory in the following sequence:

Run the data notebooks first, in the following order:

  1. data_phasing_extra_phase1.ipynb - uses mvncall to phase two multiallelic variants and the N1570Y SNP filtered out of the PASS dataset.

  2. data_combined_haplotypes.ipynb - combines the haplotype data with extras phased by mvncall.

  3. data_variants_phase1.ipynb - extracts data on all VGSC mutations.

  4. data_misc.ipynb - brings some small data files directly into Git repo.

  5. table_variants_missense.ipynb

Next, run the following three artwork notebooks in this order; they generate some data files used later:

  1. artwork_hierarchical_cluster_vgsc.ipynb - performs hierarchical clustering based on the technique used in the Ag1000G Nature paper and produces a figure.

  2. artwork_median_joining_networks.ipynb - performs median joining network analysis and produces figures.

  3. artwork_ehh_decay.ipynb - performs EHH analysis using the haplotype clusters from hierarchical clustering.

Once the above have been run, the following notebooks can be run in any order:

artwork_ld.ipynb - runs LD analysis and produces a heatmap figure.

artwork_hapfreq_map.ipynb - generates network cluster map artwork.

artwork_assay_design.ipynb - analyses the number of SNPs required in a genetic assay to define haplotype groups and generates a figure.

analyse_Dxy_using_hierarchical_clusters.ipynb - analyses divergence between clusters and produces figures.

supp_tables.ipynb - generates supplementary data tables.

analyse_moving_haplotype_homozygosity.ipynb - does what it says on the tin.

table_variants_missense_display.ipynb - generates tables for LaTeX.
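
The running order above can be sketched as a small driver script. This is an illustration only, not something shipped with the repository: the names build_commands and run_all are hypothetical, and it assumes that jupyter nbconvert is available in the environment activated by env.sh. The notebooks in the final group are listed in an arbitrary order, since the text says any order works.

```python
import subprocess
from pathlib import Path

# Notebook names and grouping copied from the list above.
DATA_NOTEBOOKS = [
    "data_phasing_extra_phase1.ipynb",
    "data_combined_haplotypes.ipynb",
    "data_variants_phase1.ipynb",
    "data_misc.ipynb",
    "table_variants_missense.ipynb",
]
ARTWORK_FIRST = [
    "artwork_hierarchical_cluster_vgsc.ipynb",
    "artwork_median_joining_networks.ipynb",
    "artwork_ehh_decay.ipynb",
]
ANY_ORDER = [
    "artwork_ld.ipynb",
    "artwork_hapfreq_map.ipynb",
    "artwork_assay_design.ipynb",
    "analyse_Dxy_using_hierarchical_clusters.ipynb",
    "supp_tables.ipynb",
    "analyse_moving_haplotype_homozygosity.ipynb",
    "table_variants_missense_display.ipynb",
]


def build_commands(notebooks_dir="notebooks"):
    """Return one nbconvert command per notebook, in dependency order."""
    ordered = DATA_NOTEBOOKS + ARTWORK_FIRST + ANY_ORDER
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(Path(notebooks_dir) / name)]
        for name in ordered
    ]


def run_all(notebooks_dir="notebooks", dry_run=True):
    """Print each command; also execute it unless dry_run is True."""
    for cmd in build_commands(notebooks_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing notebook, so later
            # notebooks never run against missing inputs.
            subprocess.run(cmd, check=True)
```

Calling run_all() only prints the commands; run_all(dry_run=False) executes each notebook in place and stops at the first failure.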
