Tandem Repeats Merger

Set of scripts for modifying output of Tandem Repeats Finder (TRF).

It finds candidate telomeric sequences in TRF's NGS output data.

Tested on Ubuntu 16.04 with Python 2.7.

You can either run TRM together with TRF, starting from .fasta files, or, if you already have NGS output data from TRF, run TRM alone.

This version is primarily used as a Galaxy Tool Shed repository definition, but it can also be run from the command line by following this README.

How to run together with TRF

  1. Place your input data in .fasta format into a single folder (e.g. ./data/).

  2. Download Tandem Repeats Finder from https://tandem.bu.edu/trf/ and place it into this folder. If your binary is not named trf407b.linux64, or you want to use a path other than $PWD, modify iterateTRF.sh.

A better solution is to use conda with the env.yaml configuration file. Just call conda env create -f ./env.yaml -n trm-env and, after the installation finishes, call source activate trm-env.

  3. Change the variable dataDir inside ./scripts/runAllWithTRF.sh to point to your directory with input data. You may also want to change the default name of the output data (variable shortName). The same script shows the default settings of the other input parameters. They can be changed inside the script or passed from the command line as follows: ./runAllWithTRF.sh 3 4 2 7 7 80 10 50 15 2 90 0 -h. The script creates the output folder structure described below.
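The steps above boil down to running TRF once per .fasta file with fixed settings. The Python 3 sketch below shows one way to assemble such a TRF call; it is not the project's iterateTRF.sh. The helper names (build_trf_command, iter_fasta, TRF_DEFAULTS, TRF_POSITIONAL) are hypothetical, the argument order follows TRF's own documented usage (trf File Match Mismatch Delta PM PI Minscore MaxPeriod [options]) rather than anything confirmed by this repository, and the -ngs flag is included on the assumption that the .dat files in TRF_res are TRF's NGS-style output.

```python
from pathlib import Path

# Defaults mirror the TRF-related values listed for runAllWithTRF.sh.
TRF_DEFAULTS = {
    "match": 2, "mismatch": 7, "delta": 7, "pm": 80, "pi": 10,
    "minscore": 50, "maxperiod": 15, "longest": 2,
}

# Positional arguments in TRF's documented order (after the input file).
TRF_POSITIONAL = ("match", "mismatch", "delta", "pm", "pi", "minscore", "maxperiod")


def build_trf_command(fasta, binary="./trf407b.linux64", html=False, **overrides):
    """Return the argv list for a single TRF run on `fasta` (hypothetical helper)."""
    unknown = set(overrides) - set(TRF_DEFAULTS)
    if unknown:
        raise ValueError("unknown TRF setting(s): " + ", ".join(sorted(unknown)))
    params = dict(TRF_DEFAULTS, **overrides)
    cmd = [binary, str(fasta)]
    cmd += [str(params[name]) for name in TRF_POSITIONAL]
    # -l: maximum expected TR length in millions; -ngs: compact NGS-style output (assumed)
    cmd += ["-l", str(params["longest"]), "-ngs"]
    if not html:
        cmd.append("-h")  # suppress TRF's HTML report
    return cmd


def iter_fasta(data_dir):
    """Return the .fasta files in data_dir, sorted for a reproducible order."""
    return sorted(Path(data_dir).glob("*.fasta"))
```

For example, shlex.join(build_trf_command("data/sample.fasta")) gives ./trf407b.linux64 data/sample.fasta 2 7 7 80 10 50 15 -l 2 -ngs -h. How iterateTRF.sh actually captures TRF's output into res/TRF_res/*.dat is not described here, so check that script before relying on this mapping.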

How to run without TRF

  1. Assuming you already have TRF's NGS output data, place it into the ./scripts/res/TRF_res directory with the .dat extension.

  2. Alternatively, change the variable myDir inside the ./scripts/runAllNoTRF.sh script and place your input data into the ${myDir}/TRF_res directory instead.

  3. This script takes far fewer input parameters. They can be changed inside the script or passed from the command line as follows: ./runAllNoTRF.sh 3 4 90 0. The script creates the output folder structure described below.
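If you want to sanity-check your .dat files before running runAllNoTRF.sh, a rough Python 3 sketch like the one below can count repeat patterns per file. It is not part of TRM: parse_ngs_dat and count_trf_res are hypothetical, the column layout is an assumption based on TRF's .dat field order (start, end, period size, copy number, consensus size, percent matches, percent indels, score, A, C, G, T, entropy, consensus pattern, ...), and the two filters only approximate minNumberOfRepeats (copy number) and minLengthOfPattern (consensus length).

```python
from collections import Counter
from pathlib import Path


def parse_ngs_dat(lines, min_repeats=3, min_pattern_len=4):
    """Count consensus patterns in TRF -ngs output lines (hypothetical helper).

    Assumed layout: a line starting with '@' names a read; every other
    non-empty line is one repeat record with whitespace-separated fields,
    where field 3 is the copy number and field 13 the consensus pattern.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("@"):
            continue  # skip blank lines and read headers
        fields = line.split()
        if len(fields) < 14:
            continue  # not a repeat record under the assumed layout
        copies = float(fields[3])
        pattern = fields[13].upper()
        if copies >= min_repeats and len(pattern) >= min_pattern_len:
            counts[pattern] += 1
    return counts


def count_trf_res(my_dir, **filters):
    """Parse every .dat file in <my_dir>/TRF_res; map file stem -> Counter."""
    result = {}
    for dat in sorted(Path(my_dir, "TRF_res").glob("*.dat")):
        with open(dat) as fh:
            result[dat.stem] = parse_ngs_dat(fh, **filters)
    return result
```

With the default directory layout, count_trf_res("./scripts/res") would return one Counter per dataset (for example under the key dataset_6484); an empty result means no .dat files were found where the scripts expect them.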

Explain the input parameters

All input parameters are defined together in the runAllWithTRF.sh script, so the explanations below are taken from there. For now, the parameters are positional: they must be passed in exactly the order listed.

  • minNumberOfRepeats="3" ... min number of repeats
  • minLengthOfPattern="4" ... min length of repeating pattern
  • trf_match="2" ... TRF's matching weight
  • trf_mism="7" ... TRF's mismatching penalty
  • trf_delta="7" ... TRF's indel penalty
  • trf_pm="80" ... TRF's match probability (whole number)
  • trf_pi="10" ... TRF's indel probability (whole number)
  • trf_min="50" ... TRF's minimum alignment score to report
  • trf_max="15" ... TRF's maximum period size to report
  • trf_longest="2" ... TRF's maximum TR length expected (in millions)
  • readLength="90" ... read length passed to restrZeros.py
  • relOccur="0" ... set to 1 to enable; otherwise leave the default value 0
  • trf_html="" ... TRF's HTML output; to suppress it, set the value to '-h'
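Because the parameters are positional, a wrapper that builds the argument list from named values can prevent ordering mistakes. The Python 3 sketch below is hypothetical (run_all_with_trf_args and PARAM_ORDER are not part of the project); the names, order and defaults are copied from the list above. Dropping trf_html when it is empty assumes the script treats a missing 13th argument the same as an empty one.

```python
# Names, positional order and defaults taken from the runAllWithTRF.sh list.
PARAM_ORDER = [
    ("minNumberOfRepeats", "3"), ("minLengthOfPattern", "4"),
    ("trf_match", "2"), ("trf_mism", "7"), ("trf_delta", "7"),
    ("trf_pm", "80"), ("trf_pi", "10"), ("trf_min", "50"),
    ("trf_max", "15"), ("trf_longest", "2"), ("readLength", "90"),
    ("relOccur", "0"), ("trf_html", ""),
]


def run_all_with_trf_args(script="./runAllWithTRF.sh", **overrides):
    """Build the positional command line for runAllWithTRF.sh (hypothetical helper)."""
    known = {name for name, _ in PARAM_ORDER}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError("unknown parameter(s): " + ", ".join(sorted(unknown)))
    if str(overrides.get("relOccur", "0")) not in ("0", "1"):
        raise ValueError("relOccur must be 0 or 1")
    if overrides.get("trf_html", "") not in ("", "-h"):
        raise ValueError("trf_html must be '' or '-h'")
    args = [str(overrides.get(name, default)) for name, default in PARAM_ORDER]
    if args[-1] == "":
        args.pop()  # assumption: an empty trf_html is equivalent to omitting it
    return [script] + args
```

Calling run_all_with_trf_args(trf_html="-h") reproduces the example call given in "How to run together with TRF": ./runAllWithTRF.sh 3 4 2 7 7 80 10 50 15 2 90 0 -h.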

Explain specific output folder structure

  • res ... predefined output directory name (can be changed in the variable myDir in the scripts runAllWithTRF.sh and runAllNoTRF.sh)
    • parsed
      • dataset_6484_ppr.txt ... intermediate file
      • dataset_6485_ppr.txt ... intermediate file
      • dataset_6486_ppr.txt ... intermediate file
      • res
        • dataset_6484_ppr_sorted.txt ... intermediate file
        • dataset_6485_ppr_sorted.txt ... intermediate file
        • dataset_6486_ppr_sorted.txt ... intermediate file
        • joined_fixed_pairedReverseComplement_merged_sorted_FINAL.txt ... FINAL output file with reverse-complement-paired sequences of tandem repeats with number of occurrences in the input datasets
        • joined_fixed_pairedReverseComplement_merged_sorted.txt ... intermediate file
        • joined_fixed_pairedReverseComplement_merged.txt ... intermediate file
        • joined_fixed_pairedReverseComplement.txt ... intermediate file
        • joined_fixed.txt ... intermediate file
        • joined_fixed_without_pairedReverseComplement_sorted_FINAL.txt ... FINAL output file sorted according to the number of occurrences of tandem repeats in the input datasets
        • joined_fixed_without_pairedReverseComplement_sorted.txt ... intermediate file
        • joined_fixed_without_pairedReverseComplement.txt ... intermediate file
        • joined.txt ... intermediate file
    • TRF_res ... directory containing all TRF outputs (either filled automatically when using runAllWithTRF.sh, or populated by copying your input here when using runAllNoTRF.sh)
      • dataset_6484.dat ... NGS data from TRF
      • dataset_6485.dat ... NGS data from TRF
      • dataset_6486.dat ... NGS data from TRF
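The two FINAL files differ in whether a repeat and its reverse complement are counted together (for example the telomeric motifs TTAGGG and CCCTAA, which are the same repeat read from opposite strands). The Python 3 sketch below illustrates that pairing-and-sorting idea on a pattern-to-count mapping. It is a hypothetical illustration, not TRM's implementation, and using the lexicographically smaller strand as the representative is an assumption.

```python
from collections import Counter

# Maps each base to its complement; lowercase is kept lowercase.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def pair_reverse_complements(counts):
    """Merge each pattern with its reverse complement, then sort by total count.

    Returns (representative, reverse_complement, total) tuples ordered by
    descending total and then alphabetically by representative.
    """
    merged = Counter()
    for pattern, n in counts.items():
        key = min(pattern, reverse_complement(pattern))  # strand-independent key
        merged[key] += n
    rows = [(key, reverse_complement(key), total) for key, total in merged.items()]
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows
```

This sketch does not merge cyclic rotations of the same unit (such as TTAGGG and TAGGGT), which TRF may report differently depending on where a repeat starts.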
