findtelomeres's Introduction

What does this script do?

This is a tool for finding telomeric repeats (TTAGGG/CCCTAA) in FASTA files.

What does this script NOT do?

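It only looks for telomeres at the start and end of each sequence, so telomeric repeats in the interior of a sequence are not reported. It also only looks for variations of the TTAGGG/CCCTAA repeat; telomeres built from other motifs will not be found.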

How does it do that?

It takes a FASTA file as input and goes through the sequences in it one by one. It ignores N's (unknown bases) at the start and the end of each sequence.
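The preprocessing step can be sketched as follows. The README says the script relies on BioPython for reading FASTA files. To keep this example dependency-free, it uses a small hand-written reader instead. The function names `read_fasta` and `trim_ns` are hypothetical and are not taken from the script.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream.

    Stand-in for BioPython's parser so the sketch runs without extra packages.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def trim_ns(seq: str) -> str:
    """Drop runs of unknown bases (N/n) from both ends; internal Ns are kept."""
    return seq.strip("Nn")


fasta = io.StringIO(">tig1\nNNNNCCCTAACCCTAA\nACGTNNTTAGGGnn\n>tig2\nACGT\n")
for name, seq in read_fasta(fasta):
    print(name, trim_ns(seq))
```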

For each sequence, it looks at the first and the last 50 nts separately and assesses how much of each window is covered by telomeric repeats. This is deliberately flexible, to allow for sequencing errors and for variation in the sequence and length of telomeric motifs. More specifically, if at least 50% of the first (or last) 50 nts are covered by telomeric repeats, it calls a telomere at that end of the sequence.
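Here is a minimal sketch of the window check. It assumes coverage is the number of window positions that fall inside non-overlapping regex matches, divided by the window length. Several details are assumptions, not the script's confirmed implementation:

- that coverage definition;
- the names `repeat_coverage` and `call_telomeres`;
- scanning the start with the C-rich pattern and the end with the G-rich pattern, which is inferred from the example output further down.

```python
import re
from typing import Tuple

# Motif patterns quoted in the README. Assumption: the C-rich pattern is
# checked at the start of a sequence and the G-rich pattern at the end.
C_RICH = re.compile(r"C{2,4}T{1,2}A{1,3}")
G_RICH = re.compile(r"T{1,3}A{1,2}G{2,4}")


def repeat_coverage(window: str, pattern: re.Pattern) -> float:
    """Fraction of `window` covered by non-overlapping matches of `pattern`."""
    if not window:
        return 0.0
    # upper() so that lowercase bases are matched as well
    covered = sum(m.end() - m.start() for m in pattern.finditer(window.upper()))
    return covered / len(window)


def call_telomeres(seq: str, window: int = 50, cutoff: float = 0.5) -> Tuple[bool, bool]:
    """Return (telomere at start, telomere at end) for one sequence."""
    seq = seq.strip("Nn")  # ignore unknown bases at both ends
    start_window = seq[:window]
    end_window = seq[-window:] if window > 0 else ""  # seq[-0:] would be the whole sequence
    return (
        repeat_coverage(start_window, C_RICH) >= cutoff,
        repeat_coverage(end_window, G_RICH) >= cutoff,
    )


contig = "NNNN" + "CCCTAA" * 10 + "ACGT" * 20 + "TTAGGG" * 10
print(call_telomeres(contig))  # (True, True): 48 of 50 nt covered at each end
```

Because `finditer` returns non-overlapping matches, no position is counted twice. A run of repeats that is interrupted by a sequencing error therefore still contributes most of its length.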

The default settings of 50% (-c/--cutoff) and 50 nts (-w/--window) seem to work well for most use cases. Some telomeres are very short or deviate from the canonical TTAGGG/CCCTAA motif; these will likely still be recovered with the defaults. Both parameters can, however, be set to other values.
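The following hypothetical sketch shows how the two options could be declared with Python's `argparse`. Only the flag names, the FASTA_FILE argument and the default values of 50 come from the README. Treating the cutoff as a percentage rather than a fraction, and the help texts, are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the command-line interface.
    parser = argparse.ArgumentParser(
        prog="FindTelomeres.py",
        description="Find telomeric repeats at the ends of FASTA sequences.",
    )
    parser.add_argument("fasta", metavar="FASTA_FILE", help="input FASTA file")
    # '%%' is needed because argparse %-formats help strings
    parser.add_argument("-c", "--cutoff", type=float, default=50.0,
                        help="minimum %% of the window covered by repeats (default: 50)")
    parser.add_argument("-w", "--window", type=int, default=50,
                        help="number of nts inspected at each end (default: 50)")
    return parser


args = build_parser().parse_args(["test.fasta", "-c", "40", "-w", "100"])
print(args.fasta, args.cutoff, args.window)
```

With a percentage cutoff like this, the measured coverage fraction would be compared against `args.cutoff / 100`.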

The telomeric motifs that are used in the search are these regular expressions: C{2,4}T{1,2}A{1,3} and T{1,3}A{1,2}G{2,4}. They can be changed by editing one line in the script to suit other needs.
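The two patterns are reverse complements of each other. Reverse-complementing C{2,4}T{1,2}A{1,3} gives T{1,3}A{1,2}G{2,4}, so one describes the C-rich strand and the other the G-rich strand. The sketch below checks a few candidate repeat units against both patterns. Keeping the patterns in a single tuple (the hypothetical name `TELOMERE_PATTERNS`) is one way to make them editable in one line, but it is not necessarily how the script stores them.

```python
import re

# Hypothetical single place to edit the motifs (patterns quoted from the README).
TELOMERE_PATTERNS = (r"C{2,4}T{1,2}A{1,3}", r"T{1,3}A{1,2}G{2,4}")
C_RICH, G_RICH = (re.compile(p) for p in TELOMERE_PATTERNS)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


for unit in ["CCCTAA", "CCTAA", "CCCCTTAAA", "CTAA", "CCCCCTAA"]:
    rc = revcomp(unit)
    # fullmatch: does the whole repeat unit fit the motif?
    print(unit, bool(C_RICH.fullmatch(unit)), rc, bool(G_RICH.fullmatch(rc)))
```

`fullmatch` only tests whole units. When a window is scanned, a search such as `finditer` still finds CCCCTAA inside CCCCCTAA, so runs with an extra base are mostly covered anyway.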

Installation and usage

The script is written in Python 3 and requires BioPython (https://biopython.org/wiki/Download).

After installing Python 3 and BioPython, run the script as follows:

usage: FindTelomeres.py FASTA_FILE

For example:

python FindTelomeres.py test.fasta

This will output:

##########
2 sequences to analyze for telomeric repeats (TTAGGG/CCCTAA) in file test.fasta
##########

tig00000045 (contig with one telomere)           Forward (start of sequence)     acCTAACCTAACCTAACCTAACCCTAACCTAACCCTAACTAACCTAACCT
tig00001011 (contig with two telomeres)          Forward (start of sequence)     cctaacctaaccctaaacctaaacccaaccccCTAACCCTAACCAACCTA
tig00001011 (contig with two telomeres)          Reverse (end of sequence)       TTAGGGTTAGGTGGTTTAGGTTAGGGTTAGAGTAGTGAGGTTaggttagg
findtelomeres's Issues

Use for yeast genome

Hi, can this be used for a yeast genome? I ran it on the reference genome of my yeast, but no telomeric repeats were found. How can it be adapted for yeast?