bifido's Introduction
    wget http://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt
Retrieve the assembly summary file from the NCBI online database.

    grep "Bifidobacterium longum" assembly_summary_refseq.txt | awk 'BEGIN {FS="\t"};{print $8,$20}'
List the organism name and FTP path of every B. longum assembly. There are 184 B. longum genomes available. There are 23 B. longum subsp. infantis genomes available.

    grep "Bifidobacterium longum subsp. infantis" assembly_summary_refseq.txt | awk 'BEGIN {FS="\t"};{print $20}' > b_infantis_ftps.txt
Write just the FTP addresses to a file for easy genome downloading.

    while read -r line; do wget --recursive --no-host-directories --cut-dirs=6 "$line"; done < b_infantis_ftps.txt
This downloads all of the B. longum subsp. infantis genomes.

    mkdir genome && mv GCF* genome
This moves all downloaded genomes into a separate folder named genome.

    gunzip */*protein.faa.gz && cat */*protein.faa > all_protein_genomes.faa
This decompresses the protein FASTA files and concatenates them into one file named all_protein_genomes.faa.

    gunzip */*genomic.fna.gz && cat */*genomic.fna > all_dna_genomes.faa
This decompresses the DNA FASTA files and concatenates them into one file named all_dna_genomes.faa.

    makeblastdb -in all_dna_genomes.faa -dbtype 'nucl' -out blast_all_dna_genomes.faa -parse_seqids
This creates a BLAST database from the all_dna_genomes.faa file.

    time blastn -db blast/blast_all_dna_genomes.faa -query hmo_genes.faa -out blast_output
This blasts the thirty HMO genes against the B. infantis genomes.

    time blastn -db blast/blast_all_dna_genomes.faa -query hmo_genes.faa -out blast_output_fmt -outfmt 7
This blasts the thirty HMO genes against the B. infantis genomes and writes a tabular output file.

    time blastn -db blast/blast_all_dna_genomes.faa -query hmo_genes.faa -out blast_output_fmt -max_target_seqs -outfmt 7
This blasts the thirty HMO genes against the B. infantis genomes and writes a tabular output file, increasing the maximum number of aligned sequences to keep. No difference from the previous output.
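The grep filters above match anywhere on the line, and the dot in "subsp." acts as a regex wildcard. A minimal sketch of a stricter filter follows, assuming column 8 holds the organism name and column 20 the FTP path (as the awk commands above suggest). The function name ftp_paths_for is hypothetical and not part of the original workflow.

```shell
# Hypothetical helper: print the FTP path (column 20) of every assembly whose
# organism name (column 8) contains the given string, matched literally.
ftp_paths_for() {
    organism="$1"
    summary="$2"
    # index() does a plain substring search, so "." is not treated as a wildcard.
    awk -F '\t' -v org="$organism" 'index($8, org) > 0 { print $20 }' "$summary"
}

# Example usage (assumes assembly_summary_refseq.txt is in the current directory):
# ftp_paths_for "Bifidobacterium longum subsp. infantis" assembly_summary_refseq.txt > b_infantis_ftps.txt
# wc -l < b_infantis_ftps.txt    # number of matching assemblies
```

Matching on the organism column alone avoids picking up rows where the search string only appears in another field.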
    ssh -F $HOME/engaging-cluster/linux/config eofe4.mit.edu
This logs into the engaging cluster. The alias engaging performs the same task.

    ncbi-blast-2.8.1+/bin/makeblastdb -in echo/batch001/all_dna_genomes_echo.faa -dbtype nucl -out echo/batch001/blast/blast_all_dna_genomes_echo.faa -parse_seqids
This creates a BLAST database of all the ECHO samples in batch 001.

    sed -i '' 's/>/>GCF_000020425.1_ASM2042v1!/g' GCF_000020425.1_ASM2042v1_genomic.fna
This adds the genome filename to each sequence header in the genomic.fna file.
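The sed command above uses the BSD/macOS form of -i and hardcodes one genome name. As a rough sketch, the same header tagging could derive the prefix from the filename and write through a temporary file, which works with both GNU and BSD sed. The function name tag_fasta_headers is hypothetical, and the sketch assumes the file name ends in _genomic.fna and that the genome name contains no characters special to sed (such as "/" or "&").

```shell
# Hypothetical helper: prefix every FASTA header in a *_genomic.fna file with
# the genome name taken from the filename, using "!" as the separator.
tag_fasta_headers() {
    fasta="$1"
    prefix=$(basename "$fasta" _genomic.fna)
    tmp=$(mktemp)
    # Only the leading ">" of header lines is rewritten; sequence lines pass through.
    if sed "s/^>/>${prefix}!/" "$fasta" > "$tmp"; then
        # Copy the result back so the original file keeps its permissions.
        cat "$tmp" > "$fasta"
    fi
    rm -f "$tmp"
}

# Example usage:
# tag_fasta_headers GCF_000020425.1_ASM2042v1_genomic.fna
```

Anchoring the pattern at the start of the line keeps any ">" characters that appear later in a header description untouched.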