bifido's Introduction

wget http://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt
Retrieve the assembly summary file from the NCBI online database.

grep "Bifidobacterium longum" assembly_summary_refseq.txt | awk 'BEGIN {FS="\t"};{print $8,$20}'
This prints the organism name (column 8) and FTP path (column 20) of every B. longum assembly. Of the genomes available, 184 are B. longum and 23 are B. longum subsp. infantis.
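
The counts above can be reproduced by counting matching rows instead of printing them. A minimal sketch, assuming the organism name sits in column 8 of the summary file; count_assemblies is a hypothetical helper name, not part of any NCBI tooling:

```shell
# Count assemblies whose organism name (column 8) contains a given string.
# count_assemblies is a hypothetical helper introduced for illustration.
count_assemblies() {
    local pattern="$1" summary="$2"
    # index() does a literal substring match, so the dot in "subsp." is not
    # treated as a regex wildcard. "n + 0" prints 0 when nothing matches.
    awk -F '\t' -v pat="$pattern" \
        'index($8, pat) > 0 { n++ } END { print n + 0 }' "$summary"
}

# Example:
# count_assemblies "Bifidobacterium longum" assembly_summary_refseq.txt
# count_assemblies "Bifidobacterium longum subsp. infantis" assembly_summary_refseq.txt
```

Matching on column 8 only avoids counting rows where the name appears in some other field.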

grep "Bifidobacterium longum subsp. infantis" assembly_summary_refseq.txt | 
awk 'BEGIN {FS="\t"};{print $20}' > b_infantis_ftps.txt
Write just the ftp addresses to a file for easy genome downloading.

while read -r line; do wget --recursive --no-host-directories --cut-dirs=6 "$line";
done < b_infantis_ftps.txt
This downloads every B. longum subsp. infantis genome listed in b_infantis_ftps.txt.
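
Recursive wget fetches every file in each assembly directory. If only the protein and genomic FASTA files are needed, their URLs can be built from the directory paths, because NCBI names each file after the directory it sits in. A sketch under that assumption; fasta_urls is a hypothetical helper name:

```shell
# Print the protein and genomic FASTA URLs for each assembly directory
# listed (one per line) in the given file. fasta_urls is a hypothetical helper.
fasta_urls() {
    local list="$1" dir name
    while IFS= read -r dir; do
        [ -z "$dir" ] && continue      # skip blank lines
        dir="${dir%/}"                 # drop a trailing slash, if any
        name="${dir##*/}"              # last path component, e.g. GCF_000020425.1_ASM2042v1
        printf '%s/%s_protein.faa.gz\n' "$dir" "$name"
        printf '%s/%s_genomic.fna.gz\n' "$dir" "$name"
    done < "$list"
}

# Example:
# fasta_urls b_infantis_ftps.txt | wget -i -
```

Downloading this way puts the files in the current directory rather than in one folder per genome, so the */ globs used in the later steps would need adjusting.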

mkdir genome && mv GCF* genome
This creates a folder named genome and moves all downloaded genomes into it.

gunzip */*protein.faa.gz && cat */*protein.faa > all_protein_genomes.faa
This decompresses the protein files and concatenates all the protein sequences in FASTA format into one file named all_protein_genomes.faa.

gunzip */*genomic.fna.gz && cat */*genomic.fna > all_dna_genomes.faa
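This concatenates all the genomic DNA sequences in FASTA format into one file named all_dna_genomes.faa. The *genomic.fna pattern also matches the cds_from_genomic.fna and rna_from_genomic.fna files if they are present in the same directories, so those sequences end up in the file as well.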
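
A quick sanity check after concatenating is to compare the number of FASTA records in the combined file with the total across the input files. A minimal sketch; count_fasta_records is a hypothetical helper name:

```shell
# Count FASTA records (lines starting with ">") across one or more files.
# count_fasta_records is a hypothetical helper introduced for illustration.
count_fasta_records() {
    # cat merges the inputs so grep reports a single total rather than one
    # count per file. "|| true" keeps the exit status 0 when the count is 0.
    cat -- "$@" | grep -c '^>' || true
}

# Example: these two numbers should match.
# count_fasta_records */*protein.faa
# count_fasta_records all_protein_genomes.faa
```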

makeblastdb -in all_dna_genomes.faa -dbtype 'nucl' -out blast_all_dna_genomes.faa -parse_seqids
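This creates a nucleotide BLAST database from the all_dna_genomes.faa file. The blastn commands below look for the database under blast/, so move the database files there or adjust the -db path.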

time blastn -db blast/blast_all_dna_genomes.faa -query hmo_genes.faa -out blast_output
This runs a BLAST search of the thirty HMO genes against the B. infantis genomes.

time blastn -db blast/blast_all_dna_genomes.faa -query hmo_genes.faa -out blast_output_fmt -outfmt 7 
This runs the same search of the thirty HMO genes against the B. infantis genomes, writing tabular output with comment lines (-outfmt 7).
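
In -outfmt 7 output, comment lines start with # and each hit line has twelve tab-separated columns by default: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The sketch below keeps the highest-bitscore hit for each query; best_hits is a hypothetical helper name, and it assumes the default column layout:

```shell
# Print the best hit (highest bitscore, column 12) for each query (column 1)
# from a BLAST tabular file. best_hits is a hypothetical helper.
best_hits() {
    awk -F '\t' '
        /^#/ { next }                                  # skip comment lines
        !($1 in best) || $12 + 0 > score[$1] {         # first or better hit
            best[$1] = $0
            score[$1] = $12 + 0
        }
        END { for (q in best) print best[q] }
    ' "$1" | sort                                      # stable, readable order
}

# Example:
# best_hits blast_output_fmt
```

Queries that are missing from the result had no hits at all, which is a quick way to see which HMO genes are absent from the database.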

time blastn -db blast/blast_all_dna_genomes.faa -query hmo_genes.faa -out blast_output_fmt -max_target_seqs -outfmt 7
This runs the same tabular search with -max_target_seqs, which raises the maximum number of aligned sequences to keep. The option takes an integer value, which is not recorded in the command above. The output did not differ from the previous run.

ssh -F $HOME/engaging-cluster/linux/config eofe4.mit.edu
This logs into the engaging cluster. The alias engaging performs this same task.

ncbi-blast-2.8.1+/bin/makeblastdb -in echo/batch001/all_dna_genomes_echo.faa -dbtype nucl \
-out echo/batch001/blast/blast_all_dna_genomes_echo.faa -parse_seqids
This creates a blast database of all the ECHO samples in batch 001.

sed -i '' 's/^>/>GCF_000020425.1_ASM2042v1!/' GCF_000020425.1_ASM2042v1_genomic.fna
This prefixes every sequence header in the genomic.fna file with the genome name, so each sequence can be traced back to its genome.
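
The same header rewrite can be applied to every genome in a loop. The sketch below writes to a temporary file instead of using sed -i, whose syntax differs between BSD/macOS sed (-i '') and GNU sed (-i). prefix_fasta_headers is a hypothetical helper name, and it assumes the genome name is the file name minus the _genomic.fna suffix:

```shell
# Prefix every FASTA header in a file with "<genome name>!".
# prefix_fasta_headers is a hypothetical helper introduced for illustration.
prefix_fasta_headers() {
    local fasta="$1" name tmp
    name="$(basename "$fasta" _genomic.fna)"
    tmp="$(mktemp)"
    # Anchor on ^> so only header lines change; "|" is the sed delimiter,
    # which assumes the genome name contains no "|" or "&".
    if sed 's|^>|>'"$name"'!|' "$fasta" > "$tmp"; then
        cat "$tmp" > "$fasta"          # overwrite in place, keeping permissions
    fi
    rm -f "$tmp"
}

# Example loop, skipping any cds_from_genomic / rna_from_genomic files:
# for f in */*_genomic.fna; do
#     case "$f" in *_from_genomic.fna) continue ;; esac
#     prefix_fasta_headers "$f"
# done
```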

Contributors: ltso3
