
akiyamalab / ghostz-gpu


A GPU-accelerated sequence homology search tool using database subsequence clustering

License: BSD 2-Clause "Simplified" License


ghostz-gpu's Introduction


GHOSTZ-GPU is a homology search tool that can detect remote homologues like BLAST and is about 5-7 times more efficient than GHOSTZ by using GPUs.

GHOSTZ-GPU outputs search results in a format similar to the BLAST tabular format.

Requirements

  • gcc >= 4.3
  • Boost >= 1.55.0
  • CUDA >= 6.0

Installation

  1. Download the archive of GHOSTZ-GPU from this repository.
  2. Extract the archive and cd into the extracted directory.
  3. Run the make command.
  4. Copy the ghostz-gpu binary to any directory you like.

Commands:

$ tar xvzf ghostz-gpu.tar.gz
$ cd ghostz-gpu
$ make BOOST_PATH=Boost CUDA_TOOLKIT_PATH=CUDA
$ cp ghostz-gpu /AS/YOU/LIKE/

Here, Boost and CUDA stand for the directories where Boost and the CUDA Toolkit are installed, respectively.

Usage

GHOSTZ-GPU requires specially formatted database files for homology search. These files can be generated from FASTA-formatted DNA or protein sequence files.

Users first have to prepare a database file in FASTA format and convert it into GHOSTZ-GPU database files with the GHOSTZ-GPU db command. The db command requires 2 args ([-i dbFastaFile] and [-o dbName]). It divides the database FASTA file into several chunks and generates several files (.inf, .ind, .nam, .pos, .seq), all of which are needed for the search. Users can specify the size of each chunk: a smaller chunk size requires less memory, but the search becomes less efficient.
To run a homology search, use the GHOSTZ-GPU aln command, which requires at least 3 args ([-i qryName], [-d dbName] and [-o output]).
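As a rough illustration of the chunk-size trade-off, the following shell sketch estimates how many chunks the db command would create for a given FASTA file and chunk size (-l). It assumes the count is simply the file size divided by the chunk size, rounded up; the real db command may split at sequence boundaries or measure size differently, and `estimate_chunks` is a hypothetical helper, not part of GHOSTZ-GPU.

```shell
#!/bin/sh
# Hypothetical helper (not part of GHOSTZ-GPU): rough estimate of the number of
# database chunks, ASSUMING chunks = ceil(FASTA file size / chunk size).
# The real db command may split at sequence boundaries, so treat it as approximate.
estimate_chunks() {
  # $1: database FASTA file
  # $2: chunk size in bytes (the db -l option); defaults to 1073741824 (= 1 GB)
  size=$(wc -c < "$1" | tr -d ' ')
  chunk=${2:-1073741824}
  # Integer ceiling division: a partially filled last chunk still counts as one.
  echo $(( (size + chunk - 1) / chunk ))
}
```

For example, `estimate_chunks ./data/db.fasta 536870912` estimates the chunk count for a 512 MB chunk size; under this assumption, halving -l roughly doubles the number of chunks while lowering the memory needed per chunk.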

Example

$ ghostz-gpu db  -i ./data/db.fasta -o exdb
$ ghostz-gpu aln -q d -t p -i ./data/queries.fasta -d exdb -o exout

Command and Options

db: convert a FASTA file to GHOSTZ format database files

ghostz-gpu db [-i dbFastaFile] [-o dbName] [-C clustering] [-l chunkSize]
    [-L clusteringSubsequenceLength] [-s seedThreshold] [-a numThreads]
Options:
(Required)
  -i STR    Protein sequences in FASTA format for a database
  -o STR    The name of the database
(Optional)
  -C STR    Clustering, T (enable) or F (disable) [T]
  -l INT    Chunk size of the database (bytes) [1073741824 (=1GB)]
  -L INT    Length of a subsequence for clustering [10]
  -s INT    The seed threshold [39]
  -a INT    The number of threads [1]

aln: search for homologues of queries in a database

ghostz-gpu aln [-i queries] [-o output] [-d database] [-v maxNumAliSub]
  [-b maxNumAliQue] [-h hitsSize] [-l queriesChunkSize] [-q queryType]
  [-t databaseType] [-F filter] [-a numThreads] [-g numGPUs]
Options:
(Required)
  -i STR    Sequences in FASTA format
  -o STR    Output file
  -d STR    Database name (must be formatted with the db command)
(Optional)
  -v INT    Maximum number of alignments for each subject [1]
  -b INT    Maximum number of outputs for each query [10]
  -l INT    Chunk size of the queries (bytes) [134217728 (=128MB)]
  -q STR    Query sequence type, p (protein) or d (dna) [p]
  -t STR    Database sequence type, p (protein) or d (dna) [p]
  -F STR    Filter query sequence, T (enable) or F (disable) [T] 
  -a INT    The number of threads [1]
  -g INT    The number of GPUs [the number of available GPUs]

Search results

GHOSTZ-GPU outputs search results as a tab-delimited file.

Example:

query0  subject0        100     25      0       0       1       75      1       25      2.75456e-15     60.4622
query0  subject6        100     10      0       0       46      75      16      25      2.58417e-05     27.335
query1  subject0        100     24      0       0       2       73      1       24      1.36707e-14     58.151
query1  subject6        100     9       0       0       47      73      16      24      0.000128251     25.0238
query3  subject6        100     14      0       0       34      75      12      25      3.60591e-07     33.4982
query3  subject0        100     10      0       0       46      75      16      25      2.58417e-05     27.335
query4  subject6        100     14      0       0       42      1       12      25      3.60591e-07     33.4982
query4  subject0        100     10      0       0       30      1       16      25      2.58417e-05     27.335

The columns are as follows:

  1. Name of a query sequence
  2. Name of a homologue sequence (subject)
  3. Sequence identity (%)
  4. Alignment length
  5. The number of mismatches in the alignment
  6. The number of gap openings in the alignment
  7. Start position of the query in the alignment
  8. End position of the query in the alignment
  9. Start position of the subject in the alignment
  10. End position of the subject in the alignment
  11. E-value
  12. Normalized score
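Because each hit is one tab-delimited line with the E-value in column 11, standard text tools can post-process the results. The sketch below keeps only hits at or below an E-value threshold; `filter_hits` is a hypothetical helper, and it assumes every line has the 12 columns listed above and that sequence names contain no tab characters.

```shell
#!/bin/sh
# Hypothetical helper (not part of GHOSTZ-GPU): print only the hits whose
# E-value (column 11) is at most a given threshold.
# Assumes the 12-column, tab-delimited layout described above.
filter_hits() {
  # $1: GHOSTZ-GPU result file (e.g. exout)
  # $2: maximum E-value, e.g. 1e-5
  # "+ 0" forces numeric comparison; awk parses exponents such as 2.75456e-15.
  awk -F'\t' -v max="$2" 'NF >= 12 && ($11 + 0) <= (max + 0)' "$1"
}
```

Applied to the example output above (saved with tab separators), `filter_hits exout 1e-5` would keep the four hits with E-values 2.75456e-15, 1.36707e-14 and 3.60591e-07 (twice), and drop the hits with E-values 2.58417e-05 and 0.000128251.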

References

Shuji Suzuki, Masanori Kakuta, Takashi Ishida, Yutaka Akiyama. "GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering." PLOS ONE 11(8): e0157338, 2016. https://doi.org/10.1371/journal.pone.0157338

Copyright © 2015 Akiyama_Laboratory, Tokyo Institute of Technology, All Rights Reserved.

Contributors

akiyamalab-web, gossan53, metavariable, shu65


ghostz-gpu's Issues

Porting to Sycl

Are you interested in having a SYCL (https://www.intel.com/content/www/us/en/developer/tools/oneapi/training/dpc-essentials.html#gs.bnjiaf) port of ghostz-gpu as a new backend?

With the SYCL backend, we'd like to extend the existing functionality of ghostz-gpu by enabling the application to leverage the multi-core accelerator devices of the Nvidia, AMD, and Intel vendor platforms.
