Using side-effect machines to classify intrinsically disordered proteins

Advisors:
Steffen Graether (MCB)
Dan Ashlock (Math and Stats)

A side effect machine is a finite state machine with counters on its states. Running a DNA sequence through a side effect machine yields a fixed-size collection of numerical features describing that sequence. This project will use machine learning techniques, in particular evolutionary algorithms, to locate side effect machines that can classify intrinsically disordered proteins (IDPs). IDPs are proteins that lack ordered secondary or tertiary structure, breaking the biochemical dogma that the structure of a protein determines its function and vice versa. One IDP family under study is the dehydrins, plant proteins that protect against cold and drought damage. This project builds on an existing long-term project to classify dehydrins. If good classifiers are located, they will be used to sieve genetic databases for undiscovered dehydrins. This may provide information on the evolutionary origin of the dehydrins, which we speculate may have arisen through a frameshift mutation.
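
To make the idea of a side effect machine concrete, here is a minimal Python sketch, not the project's actual implementation. It assumes a machine over the DNA alphabet A, C, G, T in which each state's counter is incremented whenever a transition enters that state; the class name `SideEffectMachine`, the method `features`, and the two-state example `gc_machine` are hypothetical names chosen for illustration.

```python
from typing import Dict, List


class SideEffectMachine:
    """Finite state machine with one counter per state (illustrative sketch)."""

    def __init__(self, transitions: List[Dict[str, int]], start: int = 0) -> None:
        # transitions[s][symbol] is the state reached from state s on symbol.
        self.transitions = transitions
        self.start = start

    def features(self, sequence: str) -> List[int]:
        """Run the sequence through the machine and return the state counters.

        Assumption: a counter is incremented each time its state is entered,
        so the counters sum to the length of the sequence.
        """
        counts = [0] * len(self.transitions)
        state = self.start
        for symbol in sequence:
            state = self.transitions[state][symbol]
            counts[state] += 1
        return counts


# Hypothetical two-state machine: state 1 is entered on C or G, state 0 on A or T,
# so counter 1 ends up holding the number of G/C bases in the sequence.
gc_machine = SideEffectMachine(
    transitions=[
        {"A": 0, "T": 0, "C": 1, "G": 1},
        {"A": 0, "T": 0, "C": 1, "G": 1},
    ]
)

print(gc_machine.features("ACGTGG"))  # [2, 4]
```

The feature vector has one entry per state regardless of sequence length, which is what makes it usable as fixed-size input to a classifier. In an evolutionary search, the transition table would be the structure being varied and selected; that step is not shown here.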