kn-bibs / dotplot


Simple visualisation tool for sequences' similarity in bioinformatics

License: GNU Lesser General Public License v3.0

Python 99.93% Shell 0.07%
dotplot visualisation protein-sequence bioinformatics gene-similarity

dotplot's People

Contributors

agreal1118, bartma11, behoston, chmielowyzboj, kinga322, krassowski, maciosz, magickris93, maryniak95, pasliwa, pjanek, rlatawiec, sienkie, szymek2137, vaira123, xidron

Forkers

sienkie jackywu

dotplot's Issues

Sequence label in matplotlib does not work well with window_size option

When plotting:

./dotplot.py --fasta 1.fa 2.fa --gui

and

./dotplot.py --fasta 1.fa 2.fa --window_size 10 --gui

In both cases all residues (or nucleotides, etc.) are taken into account, so the full sequence should be shown as the label in both. In the second case, however, the sequence is trimmed to the dimensions of the plotted matrix, which should not happen.

Let user decide if sequence should be displayed

Currently we display sequences as labels on the sides of the plot if the sequence is shorter than 100 (and only if the chosen drawer is matplotlib).

We could add a command line argument, drawer.show_sequences, which would allow users to override this behavior and specify whether sequences should be shown.

Adding sequences to other drawers would also be a nice-to-have feature. The problem arises when a window_size other than 1 is chosen: in that case we should refrain from adding sequences in unicode and ascii modes (for simplicity, and because those modes are not the foremost priority).
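A minimal sketch of how such an override might look with argparse. The flag spelling `--show_sequences`, its three values, and the helper `should_show_sequences` are assumptions made for illustration, not the project's actual interface; the "auto" branch assumes every sequence must be under the limit.

```python
import argparse

# Threshold taken from the issue text: sequences shorter than 100 are shown.
MAX_AUTO_LENGTH = 100


def build_parser():
    parser = argparse.ArgumentParser(prog="dotplot.py")
    # Hypothetical option; "auto" keeps the behaviour described in the issue.
    parser.add_argument(
        "--show_sequences",
        choices=["auto", "yes", "no"],
        default="auto",
        help="show sequences as labels on the sides of the plot",
    )
    return parser


def should_show_sequences(choice, sequences, drawer="matplotlib"):
    """Decide whether sequence labels should be drawn."""
    if choice == "yes":
        return True
    if choice == "no":
        return False
    # "auto": only for matplotlib and only for short sequences (assumed: all of them).
    return drawer == "matplotlib" and all(
        len(sequence) < MAX_AUTO_LENGTH for sequence in sequences
    )
```

A three-state option keeps the current default while letting users force labels on or off.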

Add axes labels

At the moment nothing indicates which sequence is on which axis. We should use at least a basic label with the sequence's name.

Loading non-FASTA plain text files

At the moment our program accepts only text in the FASTA format. We'd like to be able to load any plain text file as long as it meets certain criteria (i.e. it is a valid sequence). In that case we also need to provide a way for users to input basic information about the sequence (e.g. the name of the sequence).
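One possible shape for such a loader, as a sketch only. The validity rule (letters plus `*` and `-`) and the function name `parse_plain_sequence` are assumptions; the project may want a stricter, alphabet-specific check.

```python
import re

# Assumption: a "valid sequence" consists only of letters, '*' and '-'.
VALID_SEQUENCE = re.compile(r"[A-Za-z*\-]+")


def parse_plain_sequence(text, name="unnamed"):
    """Return (name, sequence) for plain-text input, or raise ValueError."""
    # Remove all whitespace so that multi-line input is accepted.
    sequence = "".join(text.split()).upper()
    if not sequence or not VALID_SEQUENCE.fullmatch(sequence):
        raise ValueError("input is not a valid sequence")
    return name, sequence
```

The `name` argument stands in for the user-supplied information that a FASTA header would otherwise provide.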

Add basic description to built-in help

It would be nice to have some more general information visible after typing:
python3 dotplot.py -h
Only small changes are needed: to start with, one can add a description parameter to ArgumentParser, copying and adapting some text from the README file.
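For illustration, a sketch of the change. The description string here is just the repository tagline used as a placeholder; the real wording would come from the README.

```python
import argparse

parser = argparse.ArgumentParser(
    prog="dotplot.py",
    # Placeholder text; the actual description would be adapted from the README.
    description="Simple visualisation tool for sequences' similarity in bioinformatics.",
)

# format_help() returns the same text that "-h" prints.
help_text = parser.format_help()
```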

We need to have a licence

Today we don't have any licence. We should add one to encourage contributions from outside programmers.

Fetch sequence by gene

We could add more data sources and enable users to download sequences by gene name (HGNC and Ensembl). We would retrieve the canonical sequence for the given gene.

Add exemplar sequence identifiers to chooser

It would be great if we could display an example or two of a sequence identifier (for each online source) in the window where the user chooses a sequence to download. It might be interactive (shown only after the user selects a given radio button) and implemented by changing the text of a new QLabel (e.g. below the radio buttons).

Graphical and command-line interface for sequence retrieval

We have implemented functions that fetch sequences directly from online databases (#23). We should expose an interface so that users can use them easily. Two things need to be done here:

  • add several options to argument parser
  • create GUI widgets (maybe dedicated window?) with adequate options

"Undo" option

That's self-explanatory. It would be nice to offer the user an option to undo/redo previous actions. We could add two entries to the menu and create a "state" object holding the whole current configuration, then keep a list of those states (or, even better, a queue of fixed length).
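The fixed-length queue idea could be sketched roughly as follows. The class name `History`, its methods, and the use of plain dicts as "state" objects are illustrative assumptions, not existing project code.

```python
from collections import deque


class History:
    """Fixed-length undo/redo history of configuration states."""

    def __init__(self, initial_state, max_length=20):
        # Once max_length is reached, the oldest state is dropped automatically.
        self._undo = deque([initial_state], maxlen=max_length)
        self._redo = []

    @property
    def current(self):
        return self._undo[-1]

    def push(self, state):
        """Record a new state; this discards any pending redo steps."""
        self._undo.append(state)
        self._redo.clear()

    def undo(self):
        # Always keep at least one state so that "current" stays defined.
        if len(self._undo) > 1:
            self._redo.append(self._undo.pop())
        return self.current

    def redo(self):
        if self._redo:
            self._undo.append(self._redo.pop())
        return self.current
```

The two menu entries would then simply call `undo()` and `redo()` and re-apply the returned state.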

Use block elements from UTF-8 charset to display different shades of dotplot

Since the implementation of window_size we are able to generate "gradients" that show partial matches between fragments of sequences in "windows" of a specified size. You can see the effect by running:

./dotplot.py --fasta 1.fa 2.fa --gui --window_size 2

It would be good to have this functionality available in the UTF-8 drawer as well. We can use the characters described on the Wikipedia "Block Elements" page, such as ▓, ▒ and ░, to show different percentages (right now we draw █ if the value is 1 and a space otherwise; we want ranges defined for different fractions of 1).
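A rough sketch of such a mapping. The split of partial matches into three equal ranges, and the names `shade` and `render`, are assumptions chosen for illustration; the actual thresholds are still to be decided.

```python
# Shades for partial matches, from lightest to darkest.
PARTIAL_SHADES = "░▒▓"


def shade(fraction):
    """Map a match fraction in [0, 1] to a block character."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if fraction == 0.0:
        return " "
    if fraction == 1.0:
        return "█"
    # Assumed ranges: (0, 1/3) -> ░, [1/3, 2/3) -> ▒, [2/3, 1) -> ▓.
    index = min(int(fraction * 3), 2)
    return PARTIAL_SHADES[index]


def render(matrix):
    """Render a matrix of fractions as lines of block characters."""
    return "\n".join("".join(shade(value) for value in row) for row in matrix)
```

Keeping the full block and the space for exact 1 and 0 preserves the current output when window_size is 1.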

Add appropriate shebang to main file

Right now we cannot invoke ./dotplot.py properly because we don't have a shebang pointing to the python3 location.

This is an easy task: one should find out the most generic (cross-platform compatible) shebang ("google is your friend") and place it at the very beginning of the dotplot.py file.
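A commonly used portable form relies on `env` to look up `python3` on the user's PATH instead of hard-coding the interpreter's location. The version check below is an optional extra added for illustration, not something the issue asks for.

```python
#!/usr/bin/env python3
# The shebang above must be the very first line of dotplot.py.
# "env" finds python3 on PATH, so no absolute interpreter path is needed.
import sys

# Optional guard: fail early with a clear message on an old interpreter.
if sys.version_info < (3,):
    raise SystemExit("dotplot requires Python 3")
```

The file also needs the executable bit set (for example with `chmod +x dotplot.py`) before `./dotplot.py` will work.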

Later it would be great if we could test it on both Linux and macOS.

We should generate graphics

Now Dotplot generates ASCII plots. We should enable creating more sophisticated graphics. Because it looks nicer, that's why.
