
dib-lab / kspider

A simple yet powerful sequence clustering tool.

Home Page: https://dib-lab.github.io/kSpider/

License: MIT License

CMake 2.83% C++ 74.31% Shell 0.71% Python 21.56% SWIG 0.58%
dna-sequences dna protein skipmers clustering

kspider's Introduction


@dib-lab/kSpider


-----------------------------------------------------

➤ Introduction

kSpider is a user-friendly command-line program for sequence clustering. First, it builds an index of the source sequences using kProcessor. Second, it constructs a pairwise containment matrix in a single pass over the index. Finally, it builds a graph from the pairwise matrix and applies a connected-components algorithm to extract the clusters, using a user-defined containment threshold.
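The indexing and matrix steps above can be sketched in plain Python. This is a toy model, not kSpider's implementation: a dictionary mapping each k-mer to the set of samples that contain it stands in for the kProcessor index, build_index and containment_matrix are hypothetical helpers, and the containment of sample A in sample B is assumed to be the fraction of A's distinct k-mers that also occur in B.

```python
from collections import Counter
from itertools import combinations


def build_index(samples, k):
    """Toy index (hypothetical): map every k-mer to the set of sample names containing it."""
    index = {}
    for name, seq in samples.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def containment_matrix(index):
    """Build pairwise containment values in a single pass over the index."""
    kmers_per_sample = Counter()
    shared = Counter()
    for names in index.values():
        kmers_per_sample.update(names)  # this k-mer counts once for each sample holding it
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1  # this k-mer is shared by both a and b
    matrix = {}
    for (a, b), n in shared.items():
        matrix[(a, b)] = n / kmers_per_sample[a]  # fraction of a's k-mers found in b
        matrix[(b, a)] = n / kmers_per_sample[b]  # fraction of b's k-mers found in a
    return matrix


samples = {"s1": "ACGTACGT", "s2": "ACGTAC", "s3": "TTTTTT"}
matrix = containment_matrix(build_index(samples, k=4))
```

Pairs that share no k-mers never appear in the result, so the matrix stays sparse when most samples are unrelated.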

Documentation is hosted at https://dib-lab.github.io/kSpider

-----------------------------------------------------

➤ Quick Installation (pip)

pip install kSpider

-----------------------------------------------------

➤ Manual build / Development

Install the dependencies, then clone and build:

sudo apt-get install g++ swig cmake python3-dev zlib1g-dev libghc-bzlib-dev python3-distutils libboost-all-dev
git clone https://github.com/dib-lab/kSpider.git
cd kSpider
git submodule update --init --recursive
cmake -Bbuild
cmake --build build
bash build_wrapper.sh

-----------------------------------------------------

➤ Authors

Mohamed Abuelanin
Tamer Mansour
You?

-----------------------------------------------------

➤ License

Licensed under MIT License.

kspider's Issues

updated usage example

I tried to follow the usage example at https://dib-lab.github.io/kSpider/, but the instructions no longer work. Specifically, the indexing step seems to have changed from kSpider index_kmers... to kSpider index, and many of the arguments in the example are no longer options of the new command. Would you be willing to provide updated instructions for clustering with kSpider? My use case is clustering isoforms in a de novo transcriptome when we have no knowledge of which gene each isoform/contig comes from. All of my transcripts are in a single FASTA file, and I would like to use clustering to predict which of them are isoforms of the same gene.

CI Improvement

Instead of building the whole project for each Python version, build it once and produce multiple wheels.

Add sourmash as a plugin

Since kSpider's command line interface is implemented in Python, we can add sourmash as a plugin.

Convert the clustering Python code to C++

The clustering script is currently written in Python, which is inefficient for large pairwise matrices. The adopted clustering technique relies on finding the connected components of a graph.

One option is retworkx, provided it is thread-safe; other C++ graph libraries are also candidates.

Split parsing and indexing/pairwise

This will unify kSpider's input data types.

Sourmash signatures, FASTA, and FASTQ files will be converted into a binary file. The binary files will then be used either for a brute-force pairwise comparison or for indexing followed by kSpider's original pairwise comparison.
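A binary intermediate format of the kind described here could be as simple as a count followed by fixed-width hash values. The layout below (a little-endian uint64 count, then the sorted uint64 hashes) and the helpers write_hashes, read_hashes, and containment are hypothetical; they only illustrate how parsed inputs, whatever their original format, could feed a brute-force pairwise comparison.

```python
import io
import struct


def write_hashes(stream, hashes):
    """Write one hypothetical record: uint64 count, then sorted, deduplicated uint64 hashes."""
    values = sorted(set(hashes))
    stream.write(struct.pack("<Q", len(values)))
    stream.write(struct.pack(f"<{len(values)}Q", *values))


def read_hashes(stream):
    """Read back one record written by write_hashes."""
    (count,) = struct.unpack("<Q", stream.read(8))
    return list(struct.unpack(f"<{count}Q", stream.read(8 * count)))


def containment(a, b):
    """Brute-force containment of hash list `a` in hash list `b`."""
    a_set = set(a)
    return len(a_set & set(b)) / len(a_set) if a_set else 0.0


# Round-trip through an in-memory buffer instead of a real file.
buffer = io.BytesIO()
write_hashes(buffer, [5, 3, 5, 2**64 - 1])
buffer.seek(0)
stored = read_hashes(buffer)
```

Storing the hashes sorted keeps the format deterministic and would also allow a merge-style comparison of two files without building sets.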

Explore other clustering methods

kSpider supports a single clustering method: graph clustering. Each sample/genome is a node, and each edge carries the containment percentage between the two samples it connects. An edge is created only if that containment is above the user-defined threshold. The connected components of the resulting undirected graph are the clusters.

I will list here other clustering methods that might work.
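The current graph clustering can be sketched with a union-find structure, which yields the same connected components as an explicit graph traversal. The function name cluster, the input shape (a dictionary mapping sample pairs to containment percentages), and the strict > comparison against the threshold are assumptions for illustration; kSpider's own code may differ.

```python
def cluster(nodes, containment, threshold):
    """Group samples into connected components.

    `containment` maps (sample_a, sample_b) pairs to containment
    percentages; a pair becomes an undirected edge only when its value is
    above `threshold`. Samples with no qualifying edge form singleton clusters.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in containment.items():
        if value > threshold:
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())


edges = {("a", "b"): 90.0, ("b", "c"): 85.0, ("c", "d"): 10.0}
groups = cluster(["a", "b", "c", "d"], edges, threshold=80)
```

Because union-find only ever merges components, the order in which pairs are visited does not change the resulting clusters.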

Update docs

kSpider now supports new modes and features that need to be documented.

Add support for clustering Sourmash Signatures

Thoughts,

I don't think we need to add the sourmash API as a dependency; we just need to implement a reader for signatures/zipped signatures and use the hash values as k-mers. If the signatures contain counts, we can use them for count-based trimming (i.e., removing singletons). We will assume the signatures are already scaled down, but will also add scaling as an option.

CC @ctb
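A reader along the lines proposed here might look like the sketch below. It assumes the JSON layout of an uncompressed sourmash .sig file (a list of records, each with a "signatures" list whose entries carry "ksize", "mins", and optionally a parallel "abundances" list); zipped collections are not handled. The function name load_signature_hashes is hypothetical, and the optional scaled filter keeps hashes below 2**64 // scaled, which only approximates sourmash's own cutoff.

```python
import json


def load_signature_hashes(sig_json, ksize, scaled=None, min_abundance=2):
    """Return {hash: count} for the first signature with the given ksize.

    Hash values play the role of k-mers. When abundances are present,
    hashes seen fewer than `min_abundance` times (singletons by default)
    are dropped; without abundances every hash gets a count of 1.
    """
    for record in json.loads(sig_json):
        for sig in record["signatures"]:
            if sig["ksize"] != ksize:
                continue
            abundances = sig.get("abundances")
            if abundances:
                hashes = {h: c for h, c in zip(sig["mins"], abundances)
                          if c >= min_abundance}
            else:
                hashes = dict.fromkeys(sig["mins"], 1)
            if scaled:
                # Optional extra downsampling; an approximation of sourmash's scaled cutoff.
                max_hash = 2**64 // scaled
                hashes = {h: c for h, c in hashes.items() if h < max_hash}
            return hashes
    raise ValueError(f"no signature with ksize={ksize}")


example = json.dumps([{"signatures": [
    {"ksize": 31, "mins": [10, 20, 2**63], "abundances": [1, 3, 5]}]}])
trimmed = load_signature_hashes(example, ksize=31)
```

Treating a missing "abundances" list as "no counts available" means count-based trimming is simply skipped for flat signatures rather than discarding every hash.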

updated installation instructions

Hello! I recently tried to install kSpider and wasn't able to using the installation instructions. On my Mac (M1, running Rosetta), I created a new conda env and then tried to pip install kSpider. I got errors from missing dependencies and ended up with issues related to clang. Would you be willing to provide updated installation instructions, or a conda environment file with all of the dependencies needed to install kSpider from pip?
