There are of course several great and widely used packages of bioinformatics helper programs out there, including seqtk, fastX-toolkit, and bbtools, all of which I use regularly and which have helped me get done what I was trying to do. But tasks always crop up for which we can't find a helper program or script already written.
bit is a collection of one-liners, short scripts, and programs that run in a Unix-like command-line environment that I have been adding to over several years. Anytime I need to write something to perform a task that has more than a one-off, ad hoc use, I consider adding it here. This includes things like:
Purpose | Script(s) |
---|---|
quickly summarizing nucleotide assemblies | bit-summarize-assembly |
splitting a fasta file based on headers | bit-parse-fasta-by-headers |
renaming sequences in a fasta | bit-rename-fasta-headers |
re-ordering a fasta file | bit-reorder-fasta |
pulling out sequences from a fasta by their coordinates | bit-extract-seqs-by-coords |
pulling amino-acid or nucleotide sequences out of a GenBank file | bit-genbank-to-AA-seqs, bit-genbank-to-fasta |
counting the number of bases per sequence in a fasta file | bit-count-bases-per-seq |
calculating variation in each column of a multiple-sequence alignment | bit-calc-variation-in-msa |
filtering a table based on wanted IDs | bit-filter-table |
downloading NCBI assemblies in different formats by just providing accession numbers | bit-dl-ncbi-assemblies |
searching the (stellar) Genome Taxonomy Database by taxonomy and getting their NCBI accessions | bit-get-accessions-from-GTDB |
getting full lineage info from a list of taxon IDs (making use of the also stellar TaxonKit) | bit-get-lineage-from-taxids |
filtering KOFamScan results | bit-filter-KOFamScan-results |
getting information about a specific GO term | bit-get-go-term-info |
summarizing GO annotations | bit-summarize-go-annotations |
summarizing kraken2 outputs in a table with counts of full taxonomic lineages, and combining multiple samples | bit-kraken2-to-taxon-summaries, bit-combine-kraken2-taxon-summaries |
combining bracken outputs and adding full taxonomic lineage info | bit-combine-bracken-and-add-lineage |
generating color/mapping/data files for use with trees being viewed on the Interactive Tree of Life site | bit-gen-iToL-map, bit-gen-iToL-colorstrip, bit-gen-iToL-text-dataset, bit-gen-iToL-binary-dataset |
And other convenient things that are nice to have handy, like removing soft line wraps that some fasta files have (bit-remove-wraps), and printing out the column names of a TSV with numbers (bit-colnames) to quickly see which columns we want to provide to things like cut or awk.
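To make those two convenience tasks concrete, here is a minimal sketch of the general idea using only standard Unix tools. This is not the code bit itself uses; the function names `number_columns` and `unwrap_fasta` are hypothetical and exist only for illustration.

```shell
# Hypothetical helper (not bit's implementation): print each column name
# from the header line of a TSV, preceded by its 1-based position.
number_columns() {
    head -n 1 "$1" | tr '\t' '\n' | awk '{ print NR "\t" $0 }'
}

# Hypothetical helper (not bit's implementation): join wrapped sequence
# lines so each fasta record is one header line plus one sequence line.
unwrap_fasta() {
    awk '
        /^>/ { if (seq != "") print seq; print; seq = ""; next }
             { seq = seq $0 }
        END  { if (seq != "") print seq }
    ' "$1"
}

# Example usage on a small throwaway table
tmp=$(mktemp)
printf 'id\tlength\tgc\nseq1\t1200\t0.41\n' > "$tmp"
number_columns "$tmp"
rm -f "$tmp"
```

The numbered output is handy because the position printed next to each name is the number you would pass to something like `cut -f`.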
Each command has a help menu accessible by either entering the command alone or by providing -h as the only argument. Once installed, you can see all available commands by entering bit- and pressing tab twice.
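If you would rather list the commands non-interactively (for instance in a script or a log), something like the following sketch does roughly what tab-completion shows. It assumes a POSIX-style shell such as sh or bash, and the function name `list_commands_with_prefix` is hypothetical, not something bit provides.

```shell
# Hypothetical helper (not part of bit): print the names of executables
# found in any PATH directory whose file name starts with the given prefix.
list_commands_with_prefix() {
    prefix=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        [ -d "$dir" ] || continue
        for f in "$dir"/"$prefix"*; do
            # An unmatched glob stays literal, so check that the file exists
            [ -f "$f" ] && [ -x "$f" ] && basename "$f"
        done
    done
    IFS=$old_ifs
}

# With the bit environment active, this lists the bit commands on PATH
list_commands_with_prefix bit- | sort -u
```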
bit runs in a Unix-like environment and is recommended to be installed with conda as shown below.
If you are new to the wonderful world of conda and want to learn more, one place you can start learning about it is here.
Due to increasing program restrictions as bit has grown, it's easiest to install it in its own environment as shown below (though I still put it in my base environment when I can, given how much I rely on it ¯\_(ツ)_/¯):
conda create -n bit -c conda-forge -c bioconda -c defaults -c astrobiomike bit
conda activate bit
If you happen to find bit useful in your work, please be sure to cite it:
Lee M. bit: a multipurpose collection of bioinformatics tools. F1000Research 2022, 11:122. https://doi.org/10.12688/f1000research.79530.1
You can get the version you are using by running bit-version.
If you are using a program in bit that also leverages another program, please be sure to cite that program too. For instance, bit-get-lineage-from-taxids uses TaxonKit, and bit-slim-down-go-terms uses goatools. Where a bit script relies on other programs like those, this is indicated in the script's help menu.
For phylogenomics, check out GToTree.