Ph.D. Seminar – Masood Zamani
Join us Tuesday, April 26 at 9:30am in Reynolds 219 for the 2nd Ph.D. seminar by Ph.D. Candidate Masood Zamani.
Protein Secondary Structure (PSS) Prediction through a Novel Framework of Predicting PSS Transition Sites and New Encoding Schemes.
Rapid progress in genomics has led to the discovery of millions of protein sequences, while the structures of less than 0.2% of the sequenced proteins have been resolved by X-ray crystallography and NMR spectroscopy, which are time-consuming, complex and expensive. Given advances in computational resources, computational models for protein structure prediction at the secondary and tertiary levels provide an alternative way to close this gap in protein structure determination. In addition, only a handful of new protein folds have been identified in the past eight years according to the CATH and SCOP protein classifications. State-of-the-art protein secondary structure (PSS) prediction methods employ machine learning (ML) techniques, in contrast to early approaches based on statistical information and information theory.
In this study, we developed a two-phase PSS prediction method based on Artificial Neural Networks (ANNs) and Genetic Programming (GP), built on a novel PSS transition site prediction method and new amino acid encoding schemes derived from genetic codon mappings, clustering and information theory. PSS transition sites predicted from protein sequences represent "linear" structural information, which reduces the input space and the number of learning parameters of a PSS prediction model. In addition, PSS transition sites are valuable information that can be used in homology modeling when the boundaries of speculated secondary structures cannot be defined. The prediction performance of the proposed method is evaluated using Q3 and segment overlap (SOV) scores on two commonly used datasets, RS126 and CB513, and on the latest protein dataset, PISCES, compiled with very strict homology measures.
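As a rough illustration of two quantities mentioned in this abstract, the sketch below derives transition sites from a three-state secondary structure string and computes a Q3 score. It is not the method presented in the seminar. It assumes that a transition site is any position whose state differs from that of the preceding residue, and that Q3 is the percentage of residues whose predicted state (H, E or C) matches the observed one. The function names and the example strings are hypothetical. SOV is omitted because its segment-based definition is more involved.

```python
from typing import List


def transition_sites(ss: str) -> List[int]:
    """Return 0-based positions where the state differs from the previous residue.

    Assumption: this is one plausible reading of "transition site";
    the seminar's exact definition may differ.
    """
    return [i for i in range(1, len(ss)) if ss[i] != ss[i - 1]]


def q3_score(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted state matches the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


# Hypothetical 12-residue example (H = helix, E = strand, C = coil).
observed = "CCHHHHCCEEEC"
predicted = "CCHHHCCCEEEC"

print(transition_sites(observed))   # boundaries in the observed structure
print(transition_sites(predicted))  # boundaries in the predicted structure
print(round(q3_score(observed, predicted), 2))
```

In this toy example the prediction misplaces one helix boundary by a single residue, which costs one residue in Q3 while changing the set of transition sites.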
The experimental results and statistical analyses of the proposed PSS model indicate considerable improvements in PSS prediction accuracy compared to those of most known two-tier ML architectures, ANNs and SVMs. The proposed amino acid encodings show advantages in extracting sequence information, reducing the number of input parameters, and training performance. A successful PSS prediction model can be used in protein 3D structure prediction methods to guide more accurate and rapid structure prediction, which has important applications in medicine, agriculture and the biological sciences.
Advisor: Dr. Stefan Kremer
Advisory Committee: Dr. Medhat Moussa