In this repository you will find the workflow of the analyses performed for the article "Diversity and specificity of rhizobia from nodules of tropical tree-legumes".
In this study we analyzed nine bacterial strains isolated from the nodules of three different leguminous plants at different geographical locations:
| Strain | Host plant (nodule source) | Geographical location |
|---|---|---|
| CCGB01 | Lysiloma sp. | Cuernavaca, Morelos, Mexico |
| CCGB20 | Lysiloma sp. | Cuernavaca, Morelos, Mexico |
| CCGB12 | Lysiloma sp. | Cuernavaca, Morelos, Mexico |
| CCGUVB4N | Inga vera | Xico, Veracruz, Mexico |
| CCGUVB14 | Inga vera | Xico, Veracruz, Mexico |
| CCGUVB23 | Inga vera | Xico, Veracruz, Mexico |
| CCGUVB1N3 | Inga vera | Xico, Veracruz, Mexico |
| B51278 | Lysiloma divaricatum | Edinburgh, Scotland |
| B51279 | Lysiloma divaricatum | Edinburgh, Scotland |
|-- analysis
|-- data
|-- figures
|-- logs_results
|-- metadata
|-- src
|-- .gitignore
|-- LICENSE
|-- README.md
analysis/
Contains the full results of each analysis.
data/
Directory with the raw sequences and symbolic links to the clean sequences, sorted into long and short reads.
figures/
Directory with images of the results obtained.
logs_results/
Contains the reports of each analysis, which summarize the results.
metadata/
Contains tables with the particulars of each sample, tables describing the data downloaded and used in some analyses, and tables with summary results.
src/
Contains all the scripts needed to perform each analysis.
.gitignore
Lists the files and directories that are excluded from the repository because of their size, for example the data/ directory. These data will be available once they are made public, and we will add the accession numbers here so that they can be retrieved.
LICENSE
File with license specifications.
README
File with workflow and repository details.
bash src/00.md5sum.sh
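
The script name suggests an integrity check of the raw data. The following is only a minimal sketch of such a check, assuming `md5sum` is available. The function names and the manifest location are hypothetical and are not taken from `src/00.md5sum.sh`.

```shell
#!/bin/sh
# Sketch only: record and verify MD5 checksums for a directory of read files.
# Assumption: the manifest path is absolute and lives outside the directory.
set -eu

# Write one checksum line per file found in the directory.
write_md5_manifest() {
    md5_dir=$1
    md5_manifest=$2
    ( cd "$md5_dir" && md5sum ./* ) > "$md5_manifest"
}

# Re-compute the checksums and compare them with the manifest.
# Returns non-zero if any file is missing or has changed.
verify_md5_manifest() {
    md5_dir=$1
    md5_manifest=$2
    ( cd "$md5_dir" && md5sum -c "$md5_manifest" ) > /dev/null
}

# Example (hypothetical paths):
#   write_md5_manifest  "$PWD/data/raw" "$PWD/metadata/raw_reads.md5"
#   verify_md5_manifest "$PWD/data/raw" "$PWD/metadata/raw_reads.md5"
```

Keeping the manifest outside the checked directory avoids the manifest listing itself.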
In this step we checked the quality of the short reads with FastQC. We also removed adapters and low-quality reads with TrimGalore v0.6.7.
bash src/01.qc_preprocessing.sh
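
As an illustration of the quality-control step, here is a hedged sketch of how FastQC and Trim Galore are commonly invoked on one pair of Illumina files. It is not the content of `src/01.qc_preprocessing.sh`: the function names, the output directory layout and the `DRY_RUN` switch are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only: FastQC on raw reads, then adapter and quality trimming.
set -eu

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# qc_and_trim <R1.fastq.gz> <R2.fastq.gz> <output directory>
qc_and_trim() {
    qc_r1=$1
    qc_r2=$2
    qc_out=$3
    run mkdir -p "$qc_out/fastqc_raw" "$qc_out/trimmed"
    # Quality report for the raw reads.
    run fastqc -o "$qc_out/fastqc_raw" "$qc_r1" "$qc_r2"
    # Trim the pair and run FastQC again on the trimmed output.
    run trim_galore --paired --fastqc -o "$qc_out/trimmed" "$qc_r1" "$qc_r2"
}

# Example (hypothetical file names):
#   DRY_RUN=1 qc_and_trim sample_R1.fastq.gz sample_R2.fastq.gz analysis/qc
```

Running with `DRY_RUN=1` first makes it easy to inspect the commands before any tool is launched.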
Tables of reads before and after cleaning
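
One possible way to produce the numbers for such a table is to count FASTQ records (four lines per read) before and after cleaning. This is a sketch under that assumption; the function names and the column layout are hypothetical.

```shell
#!/bin/sh
# Sketch only: count reads in FASTQ files and print one table row per sample.
set -eu

# Count the reads in a plain or gzip-compressed FASTQ file.
count_reads() {
    case $1 in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'END { print NR / 4 }'
}

# Print: sample, raw reads, clean reads, percentage retained (tab-separated).
read_table_row() {
    row_sample=$1
    row_raw=$(count_reads "$2")
    row_clean=$(count_reads "$3")
    awk -v s="$row_sample" -v r="$row_raw" -v c="$row_clean" \
        'BEGIN { printf "%s\t%d\t%d\t%.1f\n", s, r, c, (r > 0 ? 100 * c / r : 0) }'
}

# Example (hypothetical file names):
#   read_table_row CCGB01 raw/CCGB01_R1.fastq.gz clean/CCGB01_R1.fastq.gz
```

The count assumes well-formed four-line FASTQ records; multi-line records would need a proper parser.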
The sequenced genomes were assembled in two ways:
- Hybrid assembly: used when both long reads (MinION) and short reads (Illumina) were available. This was done with Unicycler v0.4.8.
- Simple assembly: used when only short reads (Illumina) were available. This was done with SPAdes v3.13.1.
#Hybrid assembly
bash src/02.hybrid_assembly.sh
#Simple assembly
bash src/02.spades_assembly.sh
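
The choice between the two assembly strategies can be sketched as a small wrapper that picks the assembler according to whether a long-read file exists. This is an illustration only, not the content of the `src/02.*` scripts: the function names and the `DRY_RUN` switch are hypothetical, and only the basic input and output options of each tool are shown.

```shell
#!/bin/sh
# Sketch only: hybrid assembly when long reads exist, short-read assembly otherwise.
set -eu

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# assemble <R1> <R2> <output directory> [long reads]
assemble() {
    asm_r1=$1
    asm_r2=$2
    asm_out=$3
    asm_long=${4:-}
    if [ -n "$asm_long" ] && [ -s "$asm_long" ]; then
        # Illumina + MinION: hybrid assembly with Unicycler.
        run unicycler -1 "$asm_r1" -2 "$asm_r2" -l "$asm_long" -o "$asm_out"
    else
        # Illumina only: short-read assembly with SPAdes.
        run spades.py -1 "$asm_r1" -2 "$asm_r2" -o "$asm_out"
    fi
}

# Example (hypothetical file names):
#   DRY_RUN=1 assemble s_R1.fastq.gz s_R2.fastq.gz analysis/assembly/s s_minion.fastq.gz
```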
The hybrid assembly of strain 23-inga showed a putative plasmid, so we screened all strains for plasmids with RFPlasmid v0.0.18.
bash src/02.find_plasmids.sh
- Plasmid assembly: strains 23-inga and 1N3-inga were plasmid-positive. We assembled the plasmids of these two strains with plasmidSPAdes v3.13.1 and included one strain without a plasmid as a negative control.
bash src/02.plasmid_spades_assembly.sh
- Separate plasmid from chromosome: the reads of strains 23-inga and 1N3-inga were mapped against the plasmid assembly with bowtie2, minimap2, samtools and bedtools to separate the plasmid from the chromosome.
bash src/02.plasmid_separate.sh
- Chromosome reassembly: after removing the plasmid sequences, the chromosome was assembled again.
bash src/02.assembly_chr_fltr.sh
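
The idea behind the plasmid-separation step can be sketched for short reads as follows. This simplified example swaps in bowtie2's `--un-conc-gz` option in place of the samtools and bedtools filtering mentioned above, so it is not the method used in `src/02.plasmid_separate.sh`. It keeps the read pairs that do not align concordantly to the plasmid assembly, which may also retain pairs in which only one mate matched the plasmid. Function names, file names and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Sketch only: keep Illumina read pairs that do not align to the plasmid assembly.
set -eu

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# remove_plasmid_reads <plasmid.fasta> <R1> <R2> <output prefix>
remove_plasmid_reads() {
    pl_fasta=$1
    pl_r1=$2
    pl_r2=$3
    pl_prefix=$4
    # Index the plasmid assembly.
    run bowtie2-build "$pl_fasta" "$pl_prefix.plasmid_idx"
    # Pairs that fail to align concordantly are written to
    # <prefix>.chr_R1.fastq.gz and <prefix>.chr_R2.fastq.gz;
    # the alignments themselves are discarded.
    run bowtie2 -x "$pl_prefix.plasmid_idx" -1 "$pl_r1" -2 "$pl_r2" \
        --un-conc-gz "$pl_prefix.chr_R%.fastq.gz" -S /dev/null
}

# Example (hypothetical file names):
#   DRY_RUN=1 remove_plasmid_reads plasmid.fasta s_R1.fastq.gz s_R2.fastq.gz analysis/s
```

The filtered read pairs could then be passed to the assembler again to rebuild the chromosome.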
Gene prediction and annotation were done on the online server RAST v2.0 with the RASTtk toolkit. Complementary annotations were made on the eggNOG-mapper v2.17 web server and with Prokka v1.14.6.
The command lines generated by each web service can be displayed with:
cat src/03.annotation_server.sh
bash src/03.annot_prokka.sh
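
For the local Prokka annotation, a loop over the assembled genomes might look like the sketch below. It is an assumption about how such a step can be organized, not the content of `src/03.annot_prokka.sh`: the directory layout, the `.fasta` extension, the `THREADS` variable and the `DRY_RUN` switch are all hypothetical.

```shell
#!/bin/sh
# Sketch only: annotate every assembly in a directory with Prokka.
set -eu

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# annotate_assemblies <directory with *.fasta> <output directory>
annotate_assemblies() {
    ann_in=$1
    ann_out=$2
    for ann_fasta in "$ann_in"/*.fasta; do
        # Skip the unexpanded pattern when no assembly is present.
        [ -e "$ann_fasta" ] || continue
        ann_sample=$(basename "$ann_fasta" .fasta)
        # One output directory per strain, named after the assembly file.
        run prokka --outdir "$ann_out/$ann_sample" --prefix "$ann_sample" \
            --cpus "${THREADS:-4}" "$ann_fasta"
    done
}

# Example (hypothetical paths):
#   DRY_RUN=1 annotate_assemblies analysis/assemblies analysis/prokka
```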