The syn-mrl from zhaotao1987

Pipeline for whole-genome microsynteny-based phylogenetic inference

Our synteny-based phylogenetic reconstruction approach consists of four main steps, in order: phylogenomic synteny network construction, network clustering, matrix representation, and maximum-likelihood estimation. We call the approach ‘Syn-MRL’ for short.

Synteny network construction consists of two main steps: first, all-vs-all reciprocal comparisons of the annotated proteins of the whole genomes were performed using DIAMOND; second, MCScanX was used for pairwise synteny block detection. Parameter settings for MCScanX have been tested and compared before; here we adopt ‘b5s5m25’ (b: number of top homologous pairs, s: minimum number of matched syntenic anchors, m: maximum number of gene gaps), which various studies have shown to be appropriate for the evolutionary distances among angiosperm genomes. To avoid large numbers of local collinear gene pairs due to tandem arrays, if consecutive homologs (up to five genes apart) share a common gene, the homologs are collapsed to one representative pair (the one with the smallest E-value). Further details regarding phylogenomic synteny network construction can be found in a tutorial available in the associated GitHub repository (https://github.com/zhaotao1987/SynNet-Pipeline).

Each pairwise synteny block represents pairs of connected nodes (syntenic genes); all identified pairwise synteny blocks together form a comprehensive synteny network with millions of nodes and edges. In this synteny network, nodes are genes (from the synteny blocks), and edges connect syntenic genes. For our work, the entire synteny network summarizes information from 7,435,502 pairwise syntenic blocks and contains 3,098,333 nodes (genes) and 94,980,088 edges (syntenic connections).

The entire synteny network (database) is then clustered for further analysis. We used the Infomap algorithm, which detects clusters within the map equation framework (https://github.com/mapequation/infomap), to identify synteny clusters. We have discussed before why Infomap is more appropriate for clustering phylogenomic synteny networks. We used the two-level partitioning mode with ten trials (--clu -N 10 --map -2). The network was treated as undirected and unweighted.

The resulting synteny clusters vary in size and composition, depending on whether synteny is well conserved or rather lineage- or species-specific. A typical synteny cluster comprises syntenic genes shared by groups of species, and thus represents the phylogenetic relatedness of genomic architecture among those species. Here, we classified the entire synteny network into 137,833 synteny clusters.
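As a minimal sketch of how an undirected, unweighted gene-pair network might be prepared for clustering, the Python snippet below maps gene names to integer node IDs and writes one link per line. The function names (`to_link_list`, `format_link_list`) and the in-memory edge list are hypothetical and are not part of the SynNet pipeline; the only assumption about Infomap is that its link-list input uses integer node IDs.

```python
def to_link_list(edges):
    """Map gene names to integer IDs and return (ids, links).

    edges: iterable of (gene_a, gene_b) pairs from the synteny network.
    Self-pairs are skipped and duplicate or reversed pairs are merged,
    because the network is treated as undirected and unweighted.
    """
    ids = {}
    links = set()
    for a, b in edges:
        if a == b:
            continue  # ignore self-links
        for gene in (a, b):
            if gene not in ids:
                ids[gene] = len(ids) + 1  # IDs are assigned in order of first appearance
        u, v = sorted((ids[a], ids[b]))  # store each undirected link once
        links.add((u, v))
    return ids, sorted(links)


def format_link_list(links):
    """Render links as 'source target' lines, one link per line."""
    return "\n".join(f"{u} {v}" for u, v in links) + "\n"
```

Keeping the `ids` dictionary makes it possible to translate the node IDs in the clustering output back to gene names afterwards.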

The phylogenomic profile of a cluster describes its composition as the number of nodes (genes) from each species. We summarize the total information residing in all synteny clusters as a data matrix for tree inference. The phylogenomic profiles of all clusters form a large data matrix in which rows represent species and columns represent clusters. The matrix was then reduced to a binary presence-absence matrix to obtain the final synteny matrix. Tree estimation was based on maximum likelihood as implemented in IQ-TREE (version 1.7-beta7) (Nguyen et al., 2014), using the MK+R+FO model (where “M” stands for “Markov” and “k” refers to the number of states observed; in our case, k = 2). The +R (FreeRate) model was used to account for site heterogeneity, and typically fits large datasets better than the Gamma model. State frequencies were optimized by maximum likelihood (using ‘+FO’). We generated 1000 replicates for the SH-like approximate likelihood ratio test (SH-aLRT) and 1000 ultrafast bootstrap (UFBoot) replicates (-alrt 1000 -bb 1000).
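The matrix representation step can be illustrated with a short Python sketch. It assumes two hypothetical inputs that are not defined in the text above: a mapping from gene to cluster ID (as could be derived from the clustering output) and a mapping from gene to species. The function names and the relaxed PHYLIP layout are illustrative choices, not the authors' implementation.

```python
from collections import defaultdict


def cluster_profiles(gene_to_cluster, gene_to_species):
    """Count, for each cluster, how many member genes come from each species."""
    profiles = defaultdict(lambda: defaultdict(int))
    for gene, cluster in gene_to_cluster.items():
        profiles[cluster][gene_to_species[gene]] += 1
    return profiles


def binary_matrix(profiles, species):
    """Reduce the profiles to presence (1) / absence (0) per species.

    Rows are species and columns are clusters in sorted cluster-ID order.
    """
    clusters = sorted(profiles)
    return {
        sp: "".join("1" if profiles[c].get(sp, 0) > 0 else "0" for c in clusters)
        for sp in species
    }


def to_phylip(matrix):
    """Serialize the matrix in relaxed PHYLIP format (name, space, characters)."""
    n_sites = len(next(iter(matrix.values())))
    lines = [f"{len(matrix)} {n_sites}"]
    lines += [f"{sp} {row}" for sp, row in matrix.items()]
    return "\n".join(lines) + "\n"
```

The resulting file could then be given to IQ-TREE as the alignment (`-s`), with the model passed as `-m MK+R+FO` and the support options `-alrt 1000 -bb 1000` quoted above; any other options would depend on the IQ-TREE version in use.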

syn-mrl's Introduction

Microsynteny-based vs sequence-alignment based phylogenetic reconstruction

syn-mrl's Issues

Error when carrying out Phylogenetic profiling

Do you have a tutorial on how to run SYN-MRL after you have the Synteny Network file?

I have finished building the synteny network, but running Infomap clustering gives me an error:
Infomap SynNetformatted.txt Clustering/ --clu -N 10 -2 --flow-model undirected

Infomap v1.4.1 starts at 2023-10-19 14:37:23
-> Input network: SynNetformatted.txt
-> Output path: Clustering/
-> Configuration: clu
two-level
flow-model = undirected
num-trials = 10

Making the phylogenetic tree
