Detecting the undetected in metagenomics

Advisor:
Cortland Griswold (IB)

The goal of this project is to extract information from metagenomic datasets using population genetic theory and tools. Metagenomics involves sequencing individuals, particularly microbes, at the community level. It has applications in biodiversity research, agriculture and medicine. In amplicon-based metagenomic studies, a key question is the extent to which the amplified and sequenced reads are representative of the species present in a community. Species or taxa may go fully or partially unsequenced, and therefore undetected, because of DNA primer mismatches. My research group derived theory that predicts properties of DNA sequence variation when lineages go unsequenced in an amplicon-based metagenomic study (https://journals.sagepub.com/doi/full/10.1177/1176934319883612). In this project, a student will perform a comprehensive analysis of amplicon-based metagenomic studies from EBI’s MGnify database (https://www.ebi.ac.uk/metagenomics/), testing for signatures of unsequenced lineages.