A random protein sequence dataset can be useful in various sequence analyses, in particular for estimating and correcting for background noise. Here, we offer a tool that generates a dataset of random viral protein sequences.
python seqRandomizer.py [-h] [-o OUTPUT] [-l SEQLEN] [-n SEQNUM]
In the usage case below, the `seqRandomizer` tool is applied to generate a random viral protein sequence dataset in the folder `result`, named `random_vprot.fasta`, consisting of 1,000 sequences, each 1,000 amino acids long. The amino acid composition of the random sequences is based on all reported viral sequences retrieved from the NCBI Protein database (as of May 2021; `allVirus080521.fasta`).
python seqRandomizer.py -o random_vprot.fasta -l 1000 -n 1000
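To illustrate the idea of composition-based randomization, here is a minimal sketch, not the actual seqRandomizer implementation. It assumes that residues are drawn independently, weighted by their frequency in a reference set. The function names (`amino_acid_composition`, `random_sequences`, `to_fasta`), the FASTA header format, and the toy reference sequences are all hypothetical.

```python
import random
from collections import Counter


def amino_acid_composition(sequences):
    """Count how often each residue occurs across the reference sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    return counts


def random_sequences(composition, seqlen, seqnum, seed=None):
    """Draw seqnum sequences of length seqlen, weighting residues by their counts."""
    rng = random.Random(seed)
    residues = sorted(composition)
    weights = [composition[r] for r in residues]
    return [
        "".join(rng.choices(residues, weights=weights, k=seqlen))
        for _ in range(seqnum)
    ]


def to_fasta(sequences, prefix="random"):
    """Format sequences as FASTA records (the header naming is an assumption)."""
    return "".join(
        f">{prefix}_{i}\n{seq}\n" for i, seq in enumerate(sequences, start=1)
    )


# Toy reference set standing in for the real viral protein FASTA file.
reference = ["MKTAYIAKQR", "MSDNGPQNQR"]
composition = amino_acid_composition(reference)
seqs = random_sequences(composition, seqlen=20, seqnum=3, seed=0)
print(to_fasta(seqs))
```

In this sketch, only residues present in the reference set can appear in the output, and more frequent residues are sampled proportionally more often.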
Argument | Parameter | Type | Required | Description |
---|---|---|---|---|
-h | help | N/A | FALSE | Show this help message and exit |
-o | output | String | TRUE | Path of the output file to be created |
-l | seqlen | Integer | TRUE | The length of random protein sequences to be generated |
-n | seqnum | Integer | TRUE | The number of random protein sequences to be generated |
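The arguments in the table could be declared with Python's standard `argparse` module roughly as follows. This is a sketch under assumptions rather than the tool's actual source: the long option names (`--output`, `--seqlen`, `--seqnum`) are inferred from the Parameter column, and the options are marked as required to match the table.

```python
import argparse


def build_parser():
    """Build a parser mirroring the argument table (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="seqRandomizer.py",
        description="Generate random protein sequences (illustrative sketch).",
    )
    # Long option names are assumptions based on the Parameter column.
    parser.add_argument("-o", "--output", type=str, required=True,
                        help="Path of the output file to be created")
    parser.add_argument("-l", "--seqlen", type=int, required=True,
                        help="The length of random protein sequences to be generated")
    parser.add_argument("-n", "--seqnum", type=int, required=True,
                        help="The number of random protein sequences to be generated")
    return parser


# Parse the same arguments as in the usage case above.
args = build_parser().parse_args(
    ["-o", "random_vprot.fasta", "-l", "1000", "-n", "1000"]
)
print(args.output, args.seqlen, args.seqnum)
```

Because `type=int` is set, `seqlen` and `seqnum` arrive as integers, and a non-numeric value would make the parser exit with a usage error.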
- This seqRandomizer tool is specific to the protocol that describes the step-by-step use of UNIQmin.
Would you like a feature added, or do you have some feedback to share? Just open a new issue or send us an email ([email protected]).