DIMACS Workshop on Algorithms for Multidimensional Scaling

August 10, 2001
DIMACS Center, Rutgers University, Piscataway

Organizers:: J. Douglas Carroll, Rutgers University, dcarroll@rci.rutgers.edu; Phipps Arabie, Rutgers University, arabie@andromeda.rutgers.edu; Lawrence J. Hubert, University of Illinois, lhubert@s.psych.uiuc.edu
Presented under the auspices of the Special Focus on Data Analysis and Mining.
Abstracts:


1.

Multiobjective Multidimensional Scaling
Michael Brusco
Florida State University

A number of previous studies have investigated combinatorial data analysis
problems where two or more objective criteria are relevant.  In such
situations, it is often important to generate (or, at least, estimate) the
set of efficient solutions, or, at least, a subset of those solutions.  For
example, Delattre and Hansen (1980) presented an algorithm for generating
the entire efficient set of partitions for a bicriterion clustering
problem.  The two relevant criteria pertained to the homogeneity
(minimization of diameter) and separation (maximization of split) of the
resulting partitions.  Ferligoj and Batagelj (1992) provide a somewhat
broader perspective of the use of multiobjective programming in cluster
analysis.  They note that multicriterion problems can be modeled in a
number of ways including, (a) reduction to a single criterion; (b)
constrained clustering, and (c) consnesus clustering, and (d) direct
clustering algorithms.  The authors subsequently presented and demonstrated
two direct algorithms for multicriterion hierarchical clustering.
More recently, Brusco, Cradit, and Stahl (2001) studied a bicriterion in
market segmentation.  The practical aspects of this paper pertained to the
development of clusters that were homogeneous with respect to a set of
descriptor variables, yet also enabled adequate explanation of a single
response variable.  A simulated annealing heuristic was used to provide a
solution to a scalar optimization problem with a weighted function of
variance measures for the descriptor variables and response
variable.  Brusco, Cradit, and Tashchian (2001) have recently extended this
methodology to the case of two or more response variables, as well as to
situations where it is desirable to generate predictive models for each
cluster.  Specifically, regression equations are established for each
response variable for each cluster using the descriptor variables as
predictors.  The result is a multicriterion version of Sp=E4th=92s (1979)
clusterwise linear regression model, where the regression sum-of squares
for each response variable is of independent interest and the homogeneity
of the clusters with respect to the descriptor variables is also considered
as part of the weighted objective function.

Brusco and Stahl (2001) demonstrated a mulitobjective programming approach
for seriation of asymmetric proximity matrices.  Their analyses focused on
a single n  n proximity matrix and the goal was to provide an ordering that
optimized a weighted objective function for multiple structural indices
originally suggested by Hubert and Golledge (1981).  Dynamic programming
was used to solve the scalar optimization problem for a number of weighting
schemes with the objective of generating a relevant portion of the
efficient set.  Brusco (2001) has recently extended the basic methodology
to the case of multiple proximity matrices.  In this problem, the terms in
the weighted objective function pertain to the same structural index
measure but there is one term for each of the different relevant
matrices.  The goal is to find a single ordering that provides an adequate
structural fit for each of the matrices.

In this workshop, we will investigate the extension of the multiobjective
programming paradigm to multidimensional scaling.  There are at least two
possible avenues for this investigation.  The first of these corresponds to
an extension of  Brusco and Stahl=92s (2001) method for unidimensional
seriation.  Given an n  n symmetric proximity matrix, it might be desirable
to fit a multidimensional structure (say, a two-dimensional structure) to
the matrix while considering two or more loss functions (least squares and
least absolute deviation loss functions would perhaps be the logical
starting point).  The goal would be to identify a two-dimensional scaling
solution that yields good loss function values for all relevant loss
functions.

The second avenue of research is, perhaps, much more promising. Consider a
situation where K n  n proximity matrices are available for a set of
stimuli.  Although it might be possible to aggregate the information from
the K matrices in some fashion and employ metric multidimensional scaling
techniques in the standard fashion, such an approach is apt to cloud the
appropriateness of the two dimensional solution for some of the
matrices.  Another option is to run separate analyses for each of the K
matrices independently and hope that they reveal similar
structures  Suppose that a least-squares two-dimensional solution (using
the city-block metric) were fit to each of the K matrices independently,
perhaps using combinatorial methods suggested by Brusco (2001) and Hubert,
Arabie, and Hesson-McInnis (1992).  The resulting best two-dimensional
solutions for the K matrices might be very different both in terms of the
ordering of objects on each dimension and the coordinates themselves.
Multiobjective programming provides a plausible alternative to the two
options suggested above.  The combinatorial methods of Brusco (2001) and
Hubert, Arabie, and Hesson-McInnis (1992) can be modified to optimize a
weighted function of least-squares loss functions (perhaps standardized
based on their best uniobjective loss function values) for the K
criteria.  The goal of the combinatorial search would be to identify an
ordering for each of the dimensions (say, for example, two dimensions) that
would yield good scaling solutions for each of the K matrices.  In other
words, the combinatorial search would focus on the orderings and
non-negative least squares would be applied to find the optimal coordinates
for each of the K matrices.  Upon the completion of the process, the K
matrices would have identical orderings on both dimensions, but different
coordinate structures.  Alternatively, multiobjective programming could be
used to identify a single coordinate representation for each of the K
matrices, but this would seem rather impractical in many settings.

During the workshop, we will explore the utility of mulitobjective
programming for metric multidimensional scaling of multiple proximity
matrices.  Results recently obtained by Brusco (2001) for unidimensional
seriation problems have been very promising, indicating that it is often
possible to identify an ordering that yields good to excellent index values
for each of the relevant proximity matrices.  Whether or not these findings
will extend to the case of least squares multidimensional (city-block)
scaling is the question of interest for the workshop.

References
BRUSCO, M. J. (2001), =93Fitting Structures to Multiple Proximity Matrices
Using Multiobjective Programming,=94 Working paper, Florida State
University.

BRUSCO, M. J., CRADIT, J. D., and STAHL, S. (2001), A Simulated Annealing
Heuristic for a Bicriterion Partitioning Problem in Market Segmentation,
Journal of Marketing Research, in press.

BRUSCO, M. J., CRADIT, J. D., and TASHCHIAN, A. (2001), Multicriterion
Clusterwise Regression: Identifying Homogeneous Segments while Explaining
Customer Satisfaction and Brand Switching, Working paper, Florida State
University.

BRUSCO, M. J., and STAHL, S. (2001), An Interactive Approach to
Multiobjective Combinatorial Data Analysis, Psychometrika, 66, 5-24.
DELATTRE, M. and HANSEN, P. (1980), Bicriterion Cluster Analysis, IEEE
Transactions on Pattern Analysis and Machine Intelligence, 2, 277-291.

FERLIGOJ, A. and BATAGELJ, V. (1992), Direct Multicriteria Clustering
Algorithms, Journal of Classification, 9, 43-61.

HUBERT, L., ARABIE, P., and HESSON-MCINNIS, M. (1992), Multidimensional
Scaling in the City-Block Metric: A Combinatorial Approach, Journal of
Classification, 9, 211-236.

HUBERT, L. J., and GOLLEDGE, R. G. (1981), Matrix Reorganization and
Dynamic Programming: Applications to Paired Comparisons and Unidimensional
seriation,=94 Psychometrika, 46, 429-441.

SPATH, H. (1979), Algorithm 39: Clusterwise Linear Regression,
Computing, 22, 367-73.



2.

TITLE: Data Visualization with Multidimensional Scaling

SPEAKER: Andreas Buja, AT&T Labs

JOINT WORK WITH: Deborah Swayne, Michael Littman, Heike Hofmann (AT&T Labs)
                  Nate Dean (Rice University)

ABSTRACT:

MDS is largely a tool for creating spatial maps of abstract objects
and rendering them visually.  One can therefore expect synergies if
MDS is embedded in a data visualization system.  We demonstrate an
example of such an embedding, a freeware system called XGvis.

The synergies between data visualization and MDS are numerous because
the results of direct manipulation of MDS maps can be instantly fed
back into MDS.  For example, subsets of objects can be selected and
they may be remapped separately.  Or points may be dragged manually to
examine the stability of an MDS map under perturbations.  At the same
time, interactive selection of MDS parameters permits immediate
feedback about their effects.  For example, the parameter that
specifies the Minkowski metric may be manipulated on a slider while
the map updates itself continually.  Finally, auxiliary diagnostics
such as Shepard diagrams may be generated at any time.

The marriage of an MDS implementation with a data visualization system
adds an element of fun to the analysis, not unlike a video game.
This, however, should not distract from the fact that the benefits of
such capabilities are serious: they afford unique familiarity with
data and methods.


3.

Deriving Market Structures via Additive Decomposition 
of Market Shares (Application of three-way generalized SINDCLUS.)
                                        
Anil Chaturvedi, Kraft Foods
J. Douglas Carroll, Rutgers University


Most existing Market Structure Models  are frequently based on
Brand-switching data. Brand switching data present difficulties in 
defining the switching "periods". Other models are based on choice 
modeling techniques. These techniques result in either overstatements
or understatements of the various marketing mix elements (Advertising,
trade promotions, consumer promotions, etc) depending on the data used -
Market level scanner data, Store level scanner data, or Panel data, or a
mix. Moreover, choice modeling techniques are difficult to execute on a
regular basis.

We present an approach to deriving market structure based on volume share,
or Share of Requirement (SOR) measures from household level data within a
category to define market structure. The data collection task is fairly
easy for retailers (Frequent Shopper Data) or Panel providers.  We assume
that there are underlying clusters of user-groups and brands, where brands
have similar market share patterns that define the market structure within
that cluster. The closest analog to this approach is block-clustering
(Hartigan et al 1997), which partitions =93blocks=94 within a 2-dimensional
map or other 2-dimensional data. DeSarbo=92s GENNCLUS model (1982) does 
provide overlapping block clustering, but the model cannot be used for 
multiple periods, and the estimation procedure is cumbersome.

We present an empirical illustration of our suggested 3-way
market-structure  model to user-group x market share data across five
geographic regions for the Frozen Pizza category.


4.

NONMETRIC UNFOLDING WITH PENALTY FOR LACK OF VARIATION
IN THE TRANSFORMED PROXIMITIES
Willem J. Heiser, Frank M.T.A. Busing & Patrick J.F. Groenen
Leiden University, The Netherlands
heiser@fsw.leidenuniv.nl

Multidimensional unfolding methods suffer from the degeneracy problem in
almost all circumstances. Most degeneracies are easily recognized: the
solutions are perfect but trivial, and are characterized by approximately
equal distances between pairs of points selected from the two different
sets that are scaled. Many solutions for the degeneracy problem have been
proposed and tested, but with little success so far. According to Kim,
Rangaswamy, and De Sarbo (1999), =93the exact causes of degeneracy are
currently unknown=94. Our own suggestion is that degenerate solutions occur
through the presence of an intercept in the data transformations.
We offer a fundamental modification of the approach initiated by Kruskal
and Carroll (1969), who introduced a normalization factor based on the
variance in the usual least squares loss function. Heiser (1981) and De
Leeuw (1983) showed that the normalization factor proposed by Kruskal and
Carroll was not strong enough to avoid degeneracies. The factor proposed in
the present paper for ordinal and interval-scale data discourages or
penalizes transformations of the proximities with small variation, so that
the procedure steers away from solutions with small variation in the
interpoint distances. In contrast to the variance, which is an absolute
measure of variability around the mean, the coefficient of variation is a
relative measure that relates variability around the mean to the distance
of the mean from the origin. The coefficient of variation decreases if a
distribution is uniformly moved away from the origin, while its variance
remains the same. So it is appropriate to use a penalty that is inversely
related to the coefficient of variation. When the transformation is
split-by-rows, we use the harmonic mean of the row-wise variation
coefficients to keep the aggregate factor sensitive for low variability in
any row.

A convergent algorithm is described for minimizing the re-adjusted loss
function, based on iterative majorization (Busing, Groenen, and Heiser,
2001). The results of a simulation study are discussed, in which the best
range of the penalty parameters is determined. Two empirical data sets are
analyzed by our method, clearly showing the benefits of the proposed loss
function.

Key words: degeneration, normalized stress, badness-of-fit function,
penalty function, coefficient of variation, harmonic mean, iterative
majorization.

References

Busing, F.M.T.A., Groenen, P.J.F., & Heiser, W.J. (2001). Avoiding
degeneracy in multidimensional unfolding by penalizing on the coefficient
of variation. Manuscript submitted for publication.

De Leeuw, J. (1983). On degenerate nonmetric unfolding solutions. Technical
Report, Department of Data Theory, Faculty of Social Sciences, Leiden
University.

Heiser, W.J.(1981). Unfolding Analysis of Proximity Data. Unpublished
Doctoral Dissertation, Leiden University.

Kim, C. Rangaswamy, A., & De Sarbo, W.S. (1999). A quasi-metric approach to
multidimensional unfolding for reducing the occurrence of degenerate
solutions. Multivariate Behavioral Research, 34(2), 143-180.

Kruskal, J.B. & Carroll, (1969). Geometrical models and badness-of-fit
functions. In P.R. Krishnaiah (Ed.), Multivariate Analysis, Vol. II, pp.
639-671. Amsterdam: North-Holland.



5.

Applications of MDS and Clustering Methods in Personality 
and Social Psychology
Seymour Rosenberg, Rutgers University

MDS and clustering algorithms have been used in personality and social
psychology for several decades to represent structures implicit in
two-way matrices (objects x attributes).  The general aim in this
research is to obtain a structure for the attributes (a/o objects) using
one or more serviceable measures of proximity.  The resulting structure
is then interpreted in a larger theoretical context, often with the help
of independently measured properties of the elements in the structure. 
Co-occurrences in sorting data also provide serviceable measures of
proximity for the objects sorted by judges (or found
"naturalistically").  Sorting methods are widely used in several
substantive domains, including person perception, clinical
symptomatology, and cognition (Rosenberg, 1982).  Also of recent
interest, on the data-analysis side, are two-mode representations that
simultaneously display the structure of the objects, the structure of
the attributes, and their associative relations (Rosenberg, Van
Mechelen, and De Boeck, 1996).  Such data-analysis methods are
particularly useful for data matrices with binary entries (an object
does/does not possess the attribute) and are appealing in that they do
not require a derived proximity measure.  A brief overview of typical
empirical applications of these structural methods will be given,
followed by a more detailed description of one or two empirical
applications in personality and social psychology.


            Rosenberg, S.  (1982).  The method of sorting in
multivariate research with applications selected from cognitive
psychology and person perception.  In N. Hirschberg & L. Humphreys
(Eds.), Multivariate applications in the social sciences (pp. 117-142). 
Hillsdale, NJ:  Erlbaum Associates.


Rosenberg, S., Van Mechelen, I., & De Boeck, P.  (1996).  A hierarchical
classes model:  Theory and method with applications in  psychology and
psychopathology.  In P. Arabie, L. Hubert, & G. De Soete (Eds.),
Clustering and classification (pp. 123-155).  Teaneck, NJ:  World
Scientific.
Previous: Participation
Next: Registration
Workshop Index
DIMACS Homepage
Contacting the Center

Document last modified on August 13, 2001.
DIMACS Workshop on Algorithms for Multidimensional Scaling

August 10, 2001 DIMACS Center, Rutgers University, Piscataway

Abstracts:

August 10, 2001
DIMACS Center, Rutgers University, Piscataway