## DIMACS TR: 97-16

## Optimization algorithms for
separable functions with tree-like adjacency of variables and their
application to the analysis of massive data sets

### Authors: Ilya Muchnik and Vadim Mottl

**
ABSTRACT
**

A massive data set is considered as a set of experimentally
acquired values of a number of variables each of which is associated with the
respective node of an unoriented conjugacy graph that presets the fixed
structure
of the data set. The class of data analysis problems under consideration is
outlined by the assumption that the ultimate aim of processing can be
represented
as a transformation of the original data array into a secondary array of the
same
structure but with node variables of, generally speaking, different nature,
i.e.
different ranges. Such a generalized problem is set as the formal problem of
optimization (minimization or maximization) of a real-valued objective function
of
all the node variables. The objective function is assumed to consist of
additive
constituents of one or two arguments, respectively, node and edge functions.
The
former of them carry the data-dependent information on the sought-for values of
the secondary variables, whereas the latter ones are meant to express the a
priori
model constraints. For the case when the graph of the pair-wise adjacency of
the
data set elements has the form of a tree, an effective global optimization
procedure is proposed which is based on a recurrent decomposition of the
initial
optimization problem over all the node variables into a succession of partial
problems each of which consists in optimization of a function of only one
variable
like Bellman functions in the classical dynamic programming. Two kinds of
numerical realization of the basic optitimization procedure are considered on
the
basis of parametrical representation of Bellman functions, respectively, for
discretely defined and quadratic node and edge functions. The proposed
theoretical
approach to the analysis of massive data sets is illustarted with its
applications to the problems of the analysis of long molecular sequences and
large
visual images.

Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/1997/97-16.ps.gz

DIMACS Home Page