Finding Composite Regulatory Patterns in DNA Sequences
Presented by the San Diego Division of Calit²
Presenter: Eleazar Eskin, Ph.D.
Host: Professor Pavel Pevzner, Dept. of Computer Science and Engineering, UCSD. For questions, or to contact Professor Pevzner, please email Kate Tull at ktull@soe.ucsd.edu.
Date: Friday, August 30, 2002
Time: 10:00 AM, reception to follow
Location: CMRR Auditorium, UCSD Campus, La Jolla
Live Webcast: http://earth.ucsd.edu:8080/ramgen/encoder/eskin.rm
Archived Webcasts available at: http://www.calit2.net/multimedia/archive.html
Courtesy: California Institute for Telecommunications and Information Technology [Calit²]
Abstract: Pattern discovery in unaligned DNA sequences is a fundamental problem in computational biology, with important applications in finding regulatory signals. Current approaches to pattern discovery focus on monad patterns, which correspond to relatively short contiguous strings. However, many actual regulatory signals are composite patterns: groups of monad signals. A difficulty in discovering composite signals is that one of the component monad signals in a group may be "too weak". Since traditional monad-based signal-finding algorithms usually output one (or a few) high-scoring patterns, they often fail to find composite regulatory signals that consist of weak monad parts.
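To make the monad/composite distinction concrete, here is a minimal sketch under assumptions of our own: a monad is a fixed string matched with at most d mismatches (Hamming distance), and a two-part composite is a pair of monads separated by a spacer whose length lies in a given range. The function names (`monad_hits`, `dyad_occurs`) and the gap model are hypothetical illustrations, not definitions taken from the talk.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def monad_hits(seq, pattern, d):
    """Start positions where `pattern` occurs in `seq` with at most d mismatches."""
    l = len(pattern)
    return [i for i in range(len(seq) - l + 1)
            if hamming(seq[i:i + l], pattern) <= d]


def dyad_occurs(seq, p1, p2, d, gap_min, gap_max):
    """Hypothetical two-part composite test: True if p1 occurs (<= d mismatches)
    and p2 (<= d mismatches) starts gap_min..gap_max bases after p1 ends."""
    starts2 = set(monad_hits(seq, p2, d))
    for i in monad_hits(seq, p1, d):
        end1 = i + len(p1)
        for gap in range(gap_min, gap_max + 1):
            if end1 + gap in starts2:
                return True
    return False


if __name__ == "__main__":
    seq = "TTGACA" + "C" * 5 + "TATAAT"  # p1 at 0, 5-base spacer, p2 at 11
    print(dyad_occurs(seq, "TTGACA", "TATAAT", 0, 4, 6))  # spacer 5 is in range
    print(dyad_occurs(seq, "TTGACA", "TATAAT", 0, 7, 9))  # spacer 5 is not
```

For intuition about "weak" parts, assume a uniform, independent background (an illustrative assumption): a given 6-mer matches a random position with at most one mismatch with probability (1 + 6·3)/4^6 = 19/4096 ≈ 0.0046, so a 500-base sequence (495 positions on one strand) is expected to contain about 2.3 such hits by chance. A weak monad alone is therefore hard to separate from background, whereas requiring a second part at a constrained spacing filters out most chance hits.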
In this project, we present a new algorithm for discovering composite signals. Its core is an exhaustive pattern search, similar to the classical sample-driven algorithm of Waterman et al. (1984), that discovers all patterns occurring at least a certain number of times with up to several mismatches. However, discovering composite signals requires exhaustive search over much longer patterns than the classical algorithms can handle. We present a new algorithm, MITRA (MIsmatch TRee Algorithm), which performs this search and improves on previous exhaustive search algorithms. We present several sets of experiments over biological and synthetic data and demonstrate that our approach performs well for both monad and composite signals. We also show that MITRA can scale to genome-wide pattern discovery, and we present results of applying MITRA to discover composite signals in bacterial genomes.
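The abstract does not describe how MITRA's mismatch tree works internally. As a rough illustration only, the sketch below shows one simple way an exhaustive, tree-structured search for (l, d)-patterns can be organized: a depth-first walk over pattern prefixes that carries along every l-mer of the input whose prefix is still within d mismatches, and prunes a branch as soon as fewer than q sequences remain. The function name `mismatch_tree_search`, the parameters `l`, `d`, `q`, and the choice to count supporting sequences (rather than total occurrences) are our assumptions; this is not the MITRA implementation.

```python
def mismatch_tree_search(seqs, l, d, q, alphabet="ACGT"):
    """Return every length-l pattern that occurs, with at most d mismatches,
    in at least q distinct input sequences (hypothetical sketch)."""
    # Support list entries: (sequence index, l-mer start, mismatches so far).
    # At the root (empty prefix) every l-mer in every sequence is supported.
    root = [(s, i, 0)
            for s, seq in enumerate(seqs)
            for i in range(len(seq) - l + 1)]
    results = []

    def dfs(prefix, support):
        # Prune: mismatches only accumulate along a branch, so if fewer than
        # q distinct sequences survive here, no extension can reach q.
        if len({s for s, _, _ in support}) < q:
            return
        if len(prefix) == l:
            results.append(prefix)
            return
        depth = len(prefix)
        for c in alphabet:
            # Extend each surviving l-mer by one position and keep it only
            # while its mismatch count stays within d.
            child = []
            for s, i, mm in support:
                mm2 = mm + (seqs[s][i + depth] != c)
                if mm2 <= d:
                    child.append((s, i, mm2))
            dfs(prefix + c, child)

    dfs("", root)
    return results


if __name__ == "__main__":
    seqs = ["ACGTACGT", "TTACGTTT", "GGGACGTG"]
    print(mismatch_tree_search(seqs, 4, 0, 3))
```

The design point this is meant to convey: a sample-driven search enumerates the d-mismatch neighborhood of each l-mer separately, while a prefix tree shares the work for patterns with a common prefix and discards whole subtrees once their support falls below the threshold. How much this helps, and how MITRA itself achieves its scaling, is described in the talk rather than in this sketch.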
Bio: Eleazar Eskin received his Ph.D. from the Computer Science Department at Columbia University. Previously, he received his bachelor's degree from the University of Chicago. His research interests are computational biology and bioinformatics, specifically the application of machine learning techniques to these areas.