DIMACS TR: 98-15

Bellman Functions on Trees for Segmentation, Generalized Smoothing, Matching Multi-Alignment in Massive Data Sets

Authors: Ilya Muchnik and Vadim Mottl


A massive data set is considered as a set of experimentally acquired values of a number of variables each of which is associated with the respective node of an undirected adjacency graph that presets the fixed structure of the data set. The class of data analysis problems under consideration is outlined by the assumption that the ultimate aim of processing can be represented as a transformation of the original data array into a secondary array of the same structure but with node variables of, generally speaking, different nature, i.e. different ranges. Such a generalized problem is set as the formal problem of optimization (minimization or maximization) of a real-valued objective function of all the node variables. The objective function is assumed to consist of additive constituents of one or two arguments, respectively, node and edge functions. The former of them carry the data-dependent information on the sought-for values of the secondary variables, whereas the latter ones are meant to express the a priori model constraints. For the case when the graph of the pair-wise adjacency of the data set elements has the form of a tree, an effective global optimization procedure is proposed which is based on a recurrent decomposition of the initial optimization problem over all the node variables into a succession of partial problems each of which consists in optimization of an intervening function of only one variable like Bellman functions in the classical dynamic programming. Two kinds of numerical realization of the basic optimization procedure are considered on the basis of parametric representation of the Bellman functions, respectively, for discretely defined and quadratic node and edge functions. The proposed theoretical approach to the analysis of massive data sets is illustrated with its applications to the problems of segmentation, smoothing, fine texture analysis and matching of visual images and geophysical explorative data, as well as to the problem of multi-alignment of long molecular sequences.

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/1998/98-15.ps.gz
DIMACS Home Page