(External) Machine-learning approaches to predict pathogen-host protein-protein interactions

Advisor: David Levy-Booth, Dyno Therapeutics 

Proposed co-advisors: Jennifer Geddes McAlister and Andrew Hamilton-Wright

Understanding and predicting pathogen-host protein-protein interactions has wide-ranging applications, from designing gene therapy vectors that reduce inflammation relative to wild-type vectors, to responding rapidly to novel pathogens such as SARS-CoV-2 (1), to antifungal drug development (2). Yet experimental data characterizing pathogen-host interactions are scarce, and the experimental methods used to generate them are expensive and time-consuming. One cause is that pathogen-host interactions are a Big Data problem: characterizing the interactions of a novel pathogen with a few thousand candidate proteins against, e.g., 26,000 human proteins leaves millions of candidate protein pairs to test experimentally (3). Machine learning (ML) approaches that classify and predict protein interactions have the potential to considerably narrow the search space for clinically relevant protein-protein pairs. This project will develop and evaluate ML models that classify proteins based on mass spectrometry data and predict pathogen-host interactions.

This project can be completed remotely over one or two semesters.

 

Knowledge and Skills

Students interested in this project should have experience with Python and an interest in ML applications.