Computational identification of within-host diversity of SARS-CoV-2 and its benefits for mRNA vaccine design

Publication Type:
Thesis
Issue Date:
2022
The emergence of COVID-19 in Wuhan in late 2019 has had a profound global impact. Despite extensive research on the causative virus, SARS-CoV-2, some questions about its origin remain unanswered. This thesis re-examines the sequencing reads and the assembled SARS-CoV-2 reference genome to address these gaps. Retracing the assembly process reveals evidence that multiple SARS-CoV-2 strains coexisted within patients. The assembly tool, MEGAHIT, tends to overlook alternative assembly routes, so a workflow was developed to recover them and identify multiple strains in COVID-19 patient samples. The workflow comprises error correction, relevant read extraction, strain identification, phylogenetic analysis, and protein structure examination. Results indicate that the sample used for the reference sequence contains multiple SARS-CoV-2 strains, which differ in their binding affinity and in their phylogenetic relationships to the published SARS-CoV-2 reference genome, SARS-CoV, and other variants. These strains exhibit structural differences that affect the protein binding affinity with human ACE2. Because the coexisting strains have nucleotide sequences that differ from the assembled reference sequence, these variations should be considered when designing mRNA vaccines.
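To illustrate what an "alternative assembly route" can look like, the following is a minimal sketch in Python. It assumes a toy de Bruijn graph (the kind of graph MEGAHIT builds on) and two hypothetical reads that differ by one base. It is not the workflow developed in the thesis and not MEGAHIT's implementation; the function names and the data are invented for illustration only.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_points(graph):
    """Return nodes with more than one successor, i.e. where paths diverge."""
    return {node: sorted(succ) for node, succ in graph.items() if len(succ) > 1}


# Hypothetical reads from two "strains" that differ at a single position.
reads = ["ACGTTGCA", "ACGTAGCA"]
branches = branch_points(build_de_bruijn(reads, k=4))
print(branches)  # {'CGT': ['GTA', 'GTT']}
```

The node `CGT` has two successors, so the graph holds two routes through this region. An assembler that reports a single contig keeps one of them, while the other may represent sequencing error or a coexisting strain.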