I stumbled across this program while looking for alternatives to the FASTX-Toolkit and CD-HIT for removing duplicate sequences from fasta files. My fasta file is very large (~2.5 GB) and contains many genes of interest. I'm working on a system whose nodes have up to 40 cores and 754 GB of memory per job, but we have a time limit of 72 hours.
This was my command:
module load sddc/3.0
eval "$(/util/common/python/py37/anaconda-2020.02/bin/conda shell.bash hook)"
conda activate sddc
python /util/common/sddc/3.0/Sequence-database-curator/sddc.py -in GENES.fasta -out GENES_set_sddc.fasta -n -mode derep -org_order
In your experience, should your program be able to de-replicate my fasta file of ~925,000 sequences within my 72-hour time limit? Based on the time calculation on the main tab, I suspect it wouldn't, but I wanted to ask. If not, do you recommend any other options for this kind of job? Thank you.
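For context, the operation I need is only exact-duplicate removal (keep the first record for each distinct sequence), which I believe can be done in a single streaming pass. A minimal sketch of that idea in plain Python (my own illustration, not sddc's code; it assumes exact, case-insensitive duplicates only, not clustering by identity threshold):

```python
def parse_fasta(lines):
    """Yield (header, sequence) tuples from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def dereplicate(lines):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    kept = []
    for header, seq in parse_fasta(lines):
        key = seq.upper()  # treat case variants as the same sequence
        if key not in seen:
            seen.add(key)
            kept.append((header, seq))
    return kept
```

Since memory here only grows with the set of unique sequences, I'd expect ~925,000 records to fit comfortably in 754 GB, so my question is really about whether sddc's derep mode scales similarly.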