Interdisciplinary Seminar Series

Title: Clustering Biological Data Using Mixture Models

Speaker: Alexander Schliep, Rutgers (CS and BioMaps)

Date: Monday, November 2, 2009 12:00 - 1:00 pm

Location: DIMACS Center, CoRE Bldg, Room 431, Rutgers University, Busch Campus, Piscataway, NJ

Abstract:

Clustering is a routine task in many biology applications, for example to organize data, to predict function, or to identify novel subpopulations for further study. Several peculiarities of biological data make clustering particularly interesting. The dimension of the data may exceed the number of samples, only a few features may be informative (possibly only for a subset of clusters), and error rates are typically high. More importantly, classes in the data will overlap; for example, function assignment is ambiguous when genes participate in several pathways.

Mixture models have been shown to address these issues and provide state-of-the-art solutions to many clustering problems in biology. We will present extensions to the classic machinery that deal with the dimensionality problem and that allow us to use secondary data in clustering. We will also present the detection of clusters of syn-expressed (temporally and spatially co-expressed) genes from in-situ images and gene expression time courses during embryogenesis.
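For readers unfamiliar with the classic machinery the abstract refers to, here is a minimal sketch, assuming a one-dimensional Gaussian mixture fitted by expectation-maximization (EM). It is not the speaker's method or any of the extensions mentioned above; the function names (normal_pdf, responsibilities, em_gmm_1d), the variance floor, and the toy data are hypothetical choices made for illustration only.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a 1-D normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(data, mus, variances, weights):
    """E-step: posterior probability that each point came from each component."""
    resp = []
    for x in data:
        p = [w * normal_pdf(x, m, v) for m, v, w in zip(mus, variances, weights)]
        total = sum(p)
        resp.append([pj / total for pj in p])
    return resp


def em_gmm_1d(data, mus, variances, weights, iterations=100, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture by EM (hypothetical helper, illustration only).

    Returns the fitted means, variances, mixing weights, and the soft
    cluster memberships (responsibilities) under the fitted parameters.
    Assumes every component keeps some probability mass (no empty clusters).
    """
    mus, variances, weights = list(mus), list(variances), list(weights)
    n = len(data)
    for _ in range(iterations):
        resp = responsibilities(data, mus, variances, weights)
        # M-step: re-estimate each component from its probability-weighted points.
        for j in range(len(mus)):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            # Floor the variance so a component cannot collapse onto a single point.
            variances[j] = max(var, var_floor)
    return mus, variances, weights, responsibilities(data, mus, variances, weights)


if __name__ == "__main__":
    # Hypothetical toy data: two tight groups plus one ambiguous point between them.
    data = [-0.1, 0.0, 0.1, 0.2, 2.5, 4.9, 5.0, 5.1, 5.2]
    mus, variances, weights, resp = em_gmm_1d(data, [-1.0, 6.0], [1.0, 1.0], [0.5, 0.5])
    print("means:", mus)
    print("membership of the ambiguous point 2.5:", resp[4])
```

Each point receives a probability for every cluster rather than a single hard label, which is one way a mixture model can express overlapping class membership.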

Slides: Clustering Biological Data Using Mixture Models