tadkeys / tabsat

Targeted Amplicon Bisulfite Sequencing Analysis Tool

genome fastq dockerfile methylation bisulfite bisulfite-sequencing targeted-sequencing

tabsat's Introduction

NEW VERSION OF TABSAT

Please check out our new version of TABSAT.
-> https://tabsat.ait.ac.at



TABSAT

TABSAT - Targeted Amplicon Bisulfite Sequencing Analysis Tool - is a tool for analyzing targeted bisulfite sequencing data generated on an Ion Torrent PGM / Illumina MiSeq. It performs

  • Quality Assessment
  • Alignment using Bismark
  • Result aggregation into a table
  • Visualization as lollipop plots

Available as

  • Fully configured Docker image (Dockerfile); see usage information below.
  • Source code

Collaboration

Please contact us if you need help running your analyses. We have also developed an extended version for our collaborators with the following additional features:

  • Interactive web-based visualization
  • Download FASTA of target regions
  • Strand specific CpGs
  • Automatic mapping of primers
  • Restriction enzyme positions
  • Start using web frontend
  • Pattern visualization and analysis

Publication

TABSAT is published:
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0160227

Example usage

${TABSAT} -l NONDIR -g hg19 -q 20 -m 10 -p 0.8 -r 0 -t target.csv -a tmap -o output_dir input.fastq

-t Target list in CSV format [mandatory]. Strand can be "+", "-", or "+/-".
-e Sequencing library: SE or PE (PE reads must be named *_1.fastq and *_2.fastq)
-g Genome (hg19, mm10)
-l Library mode of the bisulfite experiment
-a [optional] Aligner to use
-m [optional] Minimum read length; reads shorter than this threshold are filtered out.
-q [optional] Minimum base quality; bases below this threshold are trimmed from the 3’ end of the reads.
-p [optional] Fraction of the target (for example 0.8) that a read must cover to be included in pattern analysis.
-r [optional] Minimum number of mapped reads that must be present at each CpG site.
-s [optional] Sorted list of samples that specifies their order in the lollipop plots.
-o Output directory
-d [optional] Directory of input files (absolute path); if not specified, the input files are given at the end of the command line.
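
The naming rule for paired-end input (*_1.fastq, *_2.fastq) can be checked before a run is started. The following is a minimal sketch and not part of TABSAT; the function name check_pe_pairs and its directory argument are assumptions made for illustration.

```shell
# Hypothetical helper (not part of TABSAT): verify that every *_1.fastq
# in a directory has a matching *_2.fastq, as required for PE input.
check_pe_pairs() {
    dir=$1
    status=0
    for r1 in "$dir"/*_1.fastq; do
        # If the glob matched nothing, the unexpanded pattern is returned.
        [ -e "$r1" ] || { echo "no *_1.fastq files in $dir"; return 1; }
        r2="${r1%_1.fastq}_2.fastq"
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1 (expected $r2)"
            status=1
        fi
    done
    return $status
}
```

A call such as check_pe_pairs test_input_dir returns a non-zero status when a mate file is missing, so it can guard a PE run.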

Examples

Test with input file directory

tabsat -l NONDIR -g hg19 -t target.csv -d test_input_dir -a tmap -o test_output_dir

Test with separate input files

tabsat -l NONDIR -g hg19 -t target.csv -o test_output_files xy.fastq abs.fastq

Test data

Test data is available here

Installation

  • Prepare the reference
tabsat/reference/prepareReference.sh
  • Prepare the CpG file
apt-get install p7zip-full
7za e tabsat/tools/ait/all_cpgs_only_pos_hg19.7z
7za e tabsat/tools/ait/all_cpgs_only_pos_mm10.7z
  • Install Perl modules
    • Cairo.pm
    • Switch.pm
  • Run the 'install' script in the tabsat folder (installs SAMtools, Bedtools)
./install
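
Before running the steps above, it can help to check which prerequisites are already present. This is a hedged sketch, not a script shipped with TABSAT; the function names are hypothetical, and only the tools named in this section (7za, Perl, Cairo.pm, Switch.pm) are checked.

```shell
# Hypothetical pre-flight check (not part of TABSAT): report which of the
# prerequisites named in the installation steps are missing.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

have_perl_module() {
    # perl -M<Module> -e 1 exits non-zero if the module cannot be loaded.
    perl -M"$1" -e 1 >/dev/null 2>&1
}

check_prereqs() {
    missing=""
    have_cmd 7za || missing="$missing 7za"
    have_cmd perl || missing="$missing perl"
    for mod in Cairo Switch; do
        have_perl_module "$mod" || missing="$missing perl:$mod"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all prerequisites found"
}
```

Calling check_prereqs prints either the list of missing items or a confirmation line.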

Run example

Command line

  • After installation go to tabsat/tools/zz_test
  • Execute
./test_tabsat_tmap.sh
  • Inspect output at tabsat/tabsat_test_output

Docker

  • Build the Docker image
    docker build -t tabsat:v1 .

  • Run it
    docker run -t --name tabsat -d tabsat:v1

  • Connect to the container
    docker exec -ti tabsat /bin/bash

  • Stop container
    docker stop tabsat

  • Remove container
    docker rm tabsat

  • Remove image
    docker rmi tabsat:v1

tabsat's People

Contributors

jkrainer, spabinger, tadkeys

tabsat's Issues

analysis paired end using tmap

Hi,
I recently got paired-end AmpliSeq data, so I want to run Bismark with tmap to align it to the reference.
However, I think the Bismark script in tools/bismakr_tmap does not support paired-end reads when using tmap.

So I would like to modify that Bismark script.
Would you recommend anything?
Or is there some way to run tmap to align paired-end reads?

Best Regards.
Jeongmin

Hard coding of "NONDIR" in patternmap.sh

Stephan,

I think I found something that should be modified in the patternmap.sh script. The following line appears in there:
SAMPLE_C="${INDIR}/COVERAGE_NONDIR_${ALIGNER}/MethylSubpopulations/Output/SampleComparison.txt"

I was doing a "DIR" run so my outputs had gone to "COVERAGE_DIR_${ALIGNER}", therefore the following error message appeared at the pattern map stage (I assume due to the hard coding of "NONDIR" in the line above):

cp /media/MyBook/progs/tabsat-master/fhm_test_output_dir_all2/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/Output/SampleComparison.txt /media/MyBook/progs/tabsat-master/fhm_test_output_dir_all2/Patternmap/All_targets.txt
cp: cannot stat ‘/media/MyBook/progs/tabsat-master/fhm_test_output_dir_all2/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/Output/SampleComparison.txt’: No such file or directory

Thought you would like to know.

John

update documentation

Please provide a list of pre-requisites including:

  • Cairo.pm and Switch.pm perl modules
  • need to compile bedtools in the bedtools folder
  • other?

bugs in samtools command lines (main tabsat script)

In the main tabsat script, the samtools sort command lines (l. 347 & 366) should be corrected:

samtools sort "${current_sam}_removed_cov_one.bam" "${current_sam}_removed_cov_one_sorted"

into:

samtools sort "${current_sam}_removed_cov_one.bam" > "${current_sam}_removed_cov_one_sorted.bam"

and,

samtools sort "${current_sam}_removed_cov_one.bam" "${current_sam}_removed_cov_one_sorted"

into

samtools sort "${current_sam}_removed_cov_one.bam" > "${current_sam}_removed_cov_one_sorted.bam"

I don't know if this comes from changes in the samtools syntax, but currently it does not work with samtools 1.3.1, which is installed on our server.
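
As far as I know, the legacy form "samtools sort in.bam out_prefix" was removed in samtools 1.3, which would explain the failure with 1.3.1; from 1.0 onward the output file can be named with -o instead of redirecting stdout. The sketch below only prints the command that matches a given version string. The function name sort_cmd is hypothetical, and paths containing spaces are not handled.

```shell
# Sketch: choose the samtools sort invocation from a version string.
# Assumption: 0.x versions use the legacy "in.bam out_prefix" form,
# 1.x and later accept "-o out.bam in.bam".
sort_cmd() {
    version=$1
    in_bam=$2
    out_prefix=$3
    case "$version" in
        0.*) echo "samtools sort $in_bam $out_prefix" ;;
        *)   echo "samtools sort -o $out_prefix.bam $in_bam" ;;
    esac
}
```

For example, sort_cmd 1.3.1 in.bam in_sorted prints the -o form, which writes in_sorted.bam directly.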

Permission denied while accessing Docker.

Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/tabsat/json: dial unix /var/run/docker.sock: connect: permission denied
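
This message usually means that the current user cannot read or write the Docker daemon socket. The following sketch classifies access to the socket; the function name docker_socket_status is hypothetical, and the default path is the one from the error message.

```shell
# Sketch: classify access to the Docker daemon socket.
# The default path is the one shown in the error message.
docker_socket_status() {
    sock=${1:-/var/run/docker.sock}
    if [ ! -e "$sock" ]; then
        echo "missing"
    elif [ -r "$sock" ] && [ -w "$sock" ]; then
        echo "ok"
    else
        echo "no-access"
    fi
}
```

A result of "no-access" typically indicates that the user is neither root nor a member of the group that owns the socket (commonly docker); running the docker commands with sudo or adjusting group membership are the usual remedies.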

Unresponsive script

While running it on the test data, the script reaches a point where no more STDERR or STDOUT is produced. This is the last line of the combined STDERR and STDOUT:

---- Crete target lists in patternmap
-Patternmap:` removing *.log, *.target, *.jsons
.../tabsat/tabsat_test_output_miseq/Patternmap
.../tabsat/tools/Patternmap/patternmap.sh: line 181: $i: ambiguous redirect
.../tabsat/tools/Patternmap/patternmap.sh: line 182: $i: ambiguous redirect

The script keeps running for hours without any new output. A top command shows a tr process running.
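
In bash, "ambiguous redirect" is reported when a redirect target is an unquoted variable that expands to nothing or to more than one word. Whether that is what happens at lines 181 and 182 of patternmap.sh cannot be confirmed from the log alone. The sketch below shows a defensive pattern; append_line is a hypothetical helper, not code from patternmap.sh.

```shell
# Sketch of a defensive redirect: quote the target and refuse empty names.
# append_line is a hypothetical helper, not code from patternmap.sh.
append_line() {
    target=$1
    line=$2
    if [ -z "$target" ]; then
        echo "append_line: empty target, skipping" >&2
        return 1
    fi
    # Quoting keeps a name that contains spaces as a single word.
    printf '%s\n' "$line" >> "$target"
}
```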

bug in patternmap.sh script

In the patternmap.sh script, the variable SAMPLE_C contains the COVERAGE_NONDIR_tmap path, which does not exist if bowtie2 is used as aligner: SAMPLE_C="${INDIR}/COVERAGE_NONDIR_tmap/MethylSubpopulations/Output/SampleComparision.txt"

This makes the test_tabsat_miseq.sh script crash. An ALIGNER variable should be defined (see also this issue: #4).
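
One way to avoid both hard-coded parts of the path (the library mode and the aligner) is to build it from the run settings. This is a sketch under assumptions: the function name and the lib_mode parameter are hypothetical, INDIR and ALIGNER follow the names used in the reports, and the file name is the one from the cp error message in the earlier report.

```shell
# Sketch: build the SampleComparison path from the run settings instead of
# hard-coding "NONDIR" or "tmap". The lib_mode parameter is hypothetical.
sample_comparison_path() {
    indir=$1
    lib_mode=$2   # e.g. NONDIR or DIR
    aligner=$3    # e.g. tmap or bowtie2
    echo "${indir}/COVERAGE_${lib_mode}_${aligner}/MethylSubpopulations/Output/SampleComparison.txt"
}
```

With this, SAMPLE_C could be set as SAMPLE_C="$(sample_comparison_path "$INDIR" DIR bowtie2)" for a DIR run aligned with bowtie2.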

Install not quite correct

Hi. I've been trying to get the pipeline installed and feel like I put all the right pieces in place, but it seems a few things may not be right still, and I'm stumped. I have attached output streams from both test_tabsat_miseq.sh and test_tabsat_mouse.sh here. In the miseq test, it seems it runs fine but the .csv files produced in the latter stages are all empty except for the header line which (I think) messes up the final stages. In the mouse example it seems to have an error at the lollipop stage. In both cases, it seems to be missing some expected output files at the end, I'm presuming due to the earlier issues.

Also, when running either example, it sometimes throws a samtools error at the sort stage even though the target file has been generated. This doesn't always happen and seems weird to me, but I have observed this outcome a few times now. If the sort fails then obviously downstream processes get compromised. Fortunately, this doesn't seem to happen frequently.

Can you please look over the attached files and see if you can get any hint of what might not be right with my configuration? And if you have any insight into the samtools error that would be great too.

Finally, if it's of any use for you to know, I'm running in a RedHat 7 environment with 24G of RAM, but I am doing the work on an external hard drive due to lack of space on the hard drives of the machine.

Thanks,
John Martinson

miseq.test.txt
mouse.test.txt

Script fails when not installed in $HOME

The script fails to call a few Perl scripts from .../tabsat/tools/MethylSubpop/subpopulations.sh if it is not installed in $HOME.
The reason is that it expects the installation path to be $HOME/tabsat/...

unable to install in custom folder

The folder paths are hard-coded in multiple scripts, which makes it impossible to install tabsat anywhere other than /home without manually editing all the .sh scripts. I would suggest creating a single configuration file that loads all the relevant paths and variables. Using the export command might be useful.
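
The suggestion above could look roughly like the following sketch. TABSAT_HOME is a hypothetical variable name, and the fallback mirrors the $HOME/tabsat location the scripts reportedly assume; none of this is taken from the TABSAT code.

```shell
# Sketch: resolve the installation directory once and derive all tool paths
# from it. TABSAT_HOME is a hypothetical variable name; the default mirrors
# the $HOME/tabsat location reported in the issues.
tabsat_home() {
    echo "${TABSAT_HOME:-$HOME/tabsat}"
}

tabsat_tool() {
    # $1 is a path relative to the tools directory.
    echo "$(tabsat_home)/tools/$1"
}
```

A single configuration file sourced by every script would then only need a line such as export TABSAT_HOME=/opt/tabsat, and a script would call "$(tabsat_tool MethylSubpop/subpopulations.sh)" instead of a fixed path.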

bugs in samtools command lines (check_quality.sh script)

In the check_quality.sh script, the following lines (l. 63 & l. 66) give an error:

samtools sort -o ${BAM_FILE} aa | ${INTERSECTBED} -v -a - -b ${QUALITY_DIR}/target_list.bed > ${NON_INTERSECT_BAM}

samtools sort -o ${BAM_FILE} aa | ${INTERSECTBED} -a - -b ${QUALITY_DIR}/target_list.bed > ${INTERSECT_BAM}

If I understand correctly, they should be replaced by:

samtools sort ${BAM_FILE} | ${INTERSECTBED} -v -a - -b ${QUALITY_DIR}/target_list.bed > ${NON_INTERSECT_BAM}

samtools sort ${BAM_FILE} | ${INTERSECTBED} -a - -b ${QUALITY_DIR}/target_list.bed > ${INTERSECT_BAM}

error running test

I get the following error when running the MiSeq test (the mv line at the end of the first block). It might come from the path (...)-zz_test/SRR3296596_1.fastq, which should not contain the '-' before 'zz_test'. I have tried to figure out why this character was inserted, but have not found the cause so far.

  • CMD subpopulations: /home/gcristofari/tools/tabsat/tools/MethylSubpop/subpopulations.sh -i /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations -p 0.7 -t /home/gcristofari/tools/tabsat/tools/zz_test/target_list_miseq.csv
    Starting with methylation pattern analysis
    Output will be saved in /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/Output
    Whole Target for /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/SRR3296596_trimmed_1.fastq_bismark_bt2_pe.sam
    Intermediate Positions for /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/SRR3296596_trimmed_1.fastq_bismark_bt2_pe.sam
    Paste intermediate Positions for /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/SRR3296596_trimmed_1.fastq_bismark_bt2_pe.sam
    Intermediate Subpops for /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/SRR3296596_trimmed_1.fastq_bismark_bt2_pe.sam
    Final Positions for /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/SRR3296596_trimmed_1.fastq_bismark_bt2_pe.sam
    Paste final Positions for /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/SRR3296596_trimmed_1.fastq_bismark_bt2_pe.sam
    Final Subpops for /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/COVERAGE_NONDIR_bowtie2/MethylSubpopulations/SRR3296596_trimmed_1.fastq_bismark_bt2_pe.sam
    Comparision of first and last methylation positions in all samples
    Finding methylation subpopulations
    Done with workflow
    mv: cannot stat '/home/gcristofari/tools/tabsat/tools/-zz_test/SRR3296596_1.fastq /home/gcristofari/tools/tabsat/tools/zz_test/SRR3296596_2.fastq': No such file or directory

  • CMD patternmap: /home/gcristofari/tools/tabsat/tools/Patternmap/patternmap.sh -i /home/gcristofari/tools/tabsat/tabsat_test_output_miseq -s /home/gcristofari/tools/tabsat/tabsat_test_output_miseq/copied_inputs -t /home/gcristofari/tools/tabsat/tools/zz_test/target_list_miseq.csv

test script generates error

When starting the test script ./test_tabsat_tmap.sh

I've got the following error (note that I've got the same error running tabsat manually on the same dataset or using the script for MiSeq):

#################################

Welcome to TABSAT!

github.com/tadKeys/tabsat

#################################

  • Library is SE
  • Using aligner: tmap
  • MIN_READ_QUAL: 21
  • MIN_READ_LEN: 9
  • Maximum read length not specified. Setting it to 100000
  • MAX_READ_LEN: 100000
  • PERCENT_TARGET: 0.7
  • READ_CUTOFF: 4
  • Sort list not specified. Setting it to ''
  • SORT_LIST:
    Traceback (most recent call last):
      File "/Users/gcristof/tabsat/tools/ait/check_target_list.py", line 4, in <module>
        import create_final_table
      File "/Users/gcristof/tabsat/tools/ait/create_final_table.py", line 150
        print "Prefilling strand information buffer with " + str(len(key_list)) + " items"
        ^
    SyntaxError: invalid syntax
    Target list is not in the correct format
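
The traceback is consistent with a Python 2 script being run by a Python 3 interpreter, since a print statement without parentheses is a syntax error in Python 3. A wrapper could check the interpreter before starting; the sketch below only parses a version string, and the function name python_major is hypothetical.

```shell
# Sketch: extract the major version from "python --version" output so a
# wrapper can stop early when the interpreter is not Python 2.
python_major() {
    # Expects a string such as "Python 2.7.18".
    ver=${1#Python }
    echo "${ver%%.*}"
}
```

It could be called as python_major "$(python --version 2>&1)"; the 2>&1 is needed because Python 2 prints its version to stderr.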

Documentation update

It is not indicated in which folder the reference genome should be saved once downloaded. In tabsat/reference?
