Data Mining of Next-Generation Sequences to Discover New Viruses and Viral Strains in Grapevine

Accurate genome assembly from short sequencing reads is critical for downstream analysis; for example, it allows investigation of viral variants within a sequenced population. However, assembling sequencing data from virus samples into genome sequences is challenging, mainly because the virome of vegetatively propagated crops such as grapevine is highly complex, often comprising a large number of different viruses and viral strains. The challenge is compounded for plant viruses because many viruses and viral strains co-infect the same plant, and these co-infections often lead to recombination (natural or artificial). The proposed research involves the analysis of Illumina deep sequencing data generated in our laboratory. The goal of this project is to optimize genome assembly parameters so that draft genomes of the genetic variants of viruses present in a sample can be assembled accurately. The student may also develop an algorithm to detect recombination in genome sequences assembled in our laboratory or already published by others.

This project will include the following objectives:

  1. Data mining to discover novel viruses and new strains of known viruses from short reads obtained by NGS using Illumina technologies.
  2. Development of reliable systems for the accurate genome assembly of multiple viral variants that exist in a single plant sample.
  3. Identification and validation of potential recombinants and recombination sites in already assembled viral genomes, using algorithms developed by others or to be developed in this project.
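
As a rough illustration of the third objective, the sketch below shows one simple way recombination sites could be screened for: slide a window along a query genome and record which of two candidate parental sequences it matches best, then flag positions where the best match switches. This is not the project's algorithm or any published tool. It assumes the three sequences are already aligned to equal length (gaps written as "-"), that two candidate parents are known in advance, and all function names, the window size, and the step size are hypothetical choices made for the example.

```python
from typing import List, Optional, Tuple


def window_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned strings.

    Columns where either sequence has a gap ("-") are ignored.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def closest_parent_by_window(
    query: str,
    parent_a: str,
    parent_b: str,
    window: int = 200,
    step: int = 50,
) -> List[Tuple[int, str]]:
    """Label each window of the alignment "A", "B", or "tie".

    The label names the parent with the higher identity to the query
    in that window. Window size and step are illustrative defaults.
    """
    if not (len(query) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    last_start = max(len(query) - window, 0)
    for start in range(0, last_start + 1, step):
        end = start + window
        id_a = window_identity(query[start:end], parent_a[start:end])
        id_b = window_identity(query[start:end], parent_b[start:end])
        if id_a > id_b:
            label = "A"
        elif id_b > id_a:
            label = "B"
        else:
            label = "tie"
        calls.append((start, label))
    return calls


def candidate_breakpoints(calls: List[Tuple[int, str]]) -> List[int]:
    """Return window starts where the closest parent switches.

    Tied windows are skipped, so only A-to-B or B-to-A changes count.
    """
    breakpoints = []
    previous: Optional[str] = None
    for start, label in calls:
        if label == "tie":
            continue
        if previous is not None and label != previous:
            breakpoints.append(start)
        previous = label
    return breakpoints
```

A switch reported by `candidate_breakpoints` is only a candidate: it could reflect natural recombination, an assembly artifact, or noise in a short window, so any such site would still need the validation step named in the objective.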