
rebuild_Lv_SARS2_tree

Interactive phylogeny of early SARS-CoV-2 that attempts to reproduce Fig. 3a of the 2024 study by the group of Yong-Zhen Zhang

Overview

This repository builds a phylogeny designed to reproduce the one in Fig. 3a of Lv et al, Virus Evolution (2024). That figure reports a phylogeny of SARS-CoV-2 sequences that were previously in public databases, together with new sequences collected by Zhang's group, from what Zhang calls "stages 0 and 1" of the outbreak (before March-1-2020).

The phylogeny here was built by trying to exactly reproduce the methods that Lv et al, Virus Evolution (2024) used to make Fig. 3a of that paper, so see that paper for details about the sequences included, their lineage designations, rooting of the tree, etc. However, the methods as described in the paper were not sufficient to reproduce every detail, so the tree here is very close to, but may not be exactly identical to, the one in the paper. In addition to the metadata from Lv et al, Virus Evolution (2024), sequences noted as being from 2019 in Tables 6 and 7 of the joint WHO-China report are also annotated on this phylogeny.

Note that the best way to root the SARS-CoV-2 phylogeny is still an open topic with substantial uncertainty (see Pipes et al (2021)). There are also open questions about the best way to subsample, de-duplicate, and quality control early SARS-CoV-2 sequences. Further, the high similarity of the sequences means there is only limited statistical support for drawing reliable conclusions about some aspects of the phylogeny (see the paper "Phylogenetic analysis of SARS-CoV-2 data is difficult" by Morel et al (2020)).

This phylogeny attempts to reproduce the one from the study by Zhang's group (Lv et al, Virus Evolution (2024)) since that is the most recent major study on the topic and includes additional sequence data not in prior analyses. However, further work on rooting and sequence curation is needed (this phylogeny should help with that). Recall the conclusion of Pipes et al (2021) that "These results suggest that phylogenetic evidence alone is unlikely to identify the origin of the SARS-CoV-2 virus and we caution against strong inferences regarding the early spread of the virus based solely on such evidence."

Details

The goal of this repository is to re-build the portion of the phylogenetic tree covering what the paper refers to as Stage 0 and Stage 1 of the outbreak, which is everything prior to March-1-2020, so that it can be rendered interactively using the Nextstrain auspice.us platform. This is the tree shown in Fig. 3a of the paper.
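The date cutoff that defines Stages 0 and 1 can be illustrated with a minimal sketch. This is not the repository's actual filtering code (that lives in the Snakefile and the augur steps); the record layout, the `strain` and `date` field names, the placeholder strain names, and the helper `is_stage_0_or_1` are all assumptions made here for illustration.

```python
from datetime import date

# Cutoff taken from the text: Stages 0 and 1 cover everything before March 1, 2020.
STAGE_CUTOFF = date(2020, 3, 1)


def is_stage_0_or_1(collection_date: str) -> bool:
    """Return True if an ISO date (YYYY-MM-DD) falls before the cutoff.

    Incomplete or unparseable dates (e.g. "2020-01-XX") return False. That
    choice is an assumption of this sketch, not a rule taken from the paper.
    """
    try:
        return date.fromisoformat(collection_date) < STAGE_CUTOFF
    except ValueError:
        return False


# Hypothetical metadata records; the strain names are placeholders.
records = [
    {"strain": "sample_A", "date": "2019-12-30"},
    {"strain": "sample_B", "date": "2020-02-29"},
    {"strain": "sample_C", "date": "2020-03-01"},
    {"strain": "sample_D", "date": "2020-01-XX"},
]

# Keep only strains collected strictly before the cutoff.
early = [r["strain"] for r in records if is_stage_0_or_1(r["date"])]
print(early)
```

How sequences with incomplete dates should be handled is a curation decision; the sketch simply excludes them.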

Because that tree is not provided in raw form in the paper, this repository tries to repeat the analysis in Lv et al (2024) to produce it.

The conda environment to run the analysis is in environment.yml.

The input data are in ./data/.

The tree is built mostly with augur, and the pipeline is run by snakemake using the file Snakefile.

All the files created by the pipeline are placed in the ./results/ subdirectory, which is not tracked in this GitHub repo due to GISAID data-sharing restrictions. The final tree is in auspice/rebuild_Lv_SARS2_tree.json, and can be viewed with Nextstrain at https://nextstrain.org/community/jbloom/rebuild_Lv_SARS2_tree.
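For readers who want to inspect the final tree outside the browser, the following is a minimal sketch of listing the tips in an Auspice-style JSON. It assumes the file follows the Auspice v2 layout, with a top-level `tree` key holding a single nested object whose nodes carry `name` and optional `children`. The toy tree and the helper `tip_names` are hypothetical and stand in for the real file so the example runs on its own.

```python
import json


def tip_names(tree: dict) -> list:
    """Return the names of all leaf nodes, in left-to-right order.

    Uses an explicit stack instead of recursion so deep trees do not hit
    Python's recursion limit.
    """
    names = []
    stack = [tree]
    while stack:
        node = stack.pop()
        children = node.get("children") or []
        if children:
            # Push children reversed so the leftmost child is visited first.
            stack.extend(reversed(children))
        else:
            names.append(node["name"])
    return names


# Toy stand-in for the real file. To read the actual tree one would instead do:
#   with open("auspice/rebuild_Lv_SARS2_tree.json") as f:
#       data = json.load(f)
toy_json = json.dumps({
    "version": "v2",
    "meta": {},
    "tree": {
        "name": "root",
        "children": [
            {"name": "tip_1"},
            {"name": "internal", "children": [{"name": "tip_2"}, {"name": "tip_3"}]},
        ],
    },
})

data = json.loads(toy_json)
tips = tip_names(data["tree"])
print(len(tips), tips)
```

If the real file's `tree` value turns out to be a list of trees rather than a single object, the helper would need to be called once per element.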
