DIMACS Workshop on Algorithms for Multidimensional Scaling

August 10, 2001
DIMACS Center, Rutgers University, Piscataway

Organizers:
J. Douglas Carroll, Rutgers University, dcarroll@rci.rutgers.edu
Phipps Arabie, Rutgers University, arabie@andromeda.rutgers.edu
Lawrence J. Hubert, University of Illinois, lhubert@s.psych.uiuc.edu
Presented under the auspices of the Special Focus on Data Analysis and Mining.

Abstracts:


1.

Multiobjective Multidimensional Scaling
Michael Brusco
Florida State University

A number of previous studies have investigated combinatorial data analysis
problems where two or more objective criteria are relevant.  In such
situations, it is often important to generate (or, at least, estimate) the
set of efficient solutions, or, at least, a subset of those solutions.  For
example, Delattre and Hansen (1980) presented an algorithm for generating
the entire efficient set of partitions for a bicriterion clustering
problem.  The two relevant criteria pertained to the homogeneity
(minimization of diameter) and separation (maximization of split) of the
resulting partitions.  Ferligoj and Batagelj (1992) provide a somewhat
broader perspective of the use of multiobjective programming in cluster
analysis.  They note that multicriterion problems can be modeled in a
number of ways including, (a) reduction to a single criterion; (b)
constrained clustering, and (c) consnesus clustering, and (d) direct
clustering algorithms.  The authors subsequently presented and demonstrated
two direct algorithms for multicriterion hierarchical clustering.
More recently, Brusco, Cradit, and Stahl (2001) studied a bicriterion in
market segmentation.  The practical aspects of this paper pertained to the
development of clusters that were homogeneous with respect to a set of
descriptor variables, yet also enabled adequate explanation of a single
response variable.  A simulated annealing heuristic was used to provide a
solution to a scalar optimization problem with a weighted function of
variance measures for the descriptor variables and response
variable.  Brusco, Cradit, and Tashchian (2001) have recently extended this
methodology to the case of two or more response variables, as well as to
situations where it is desirable to generate predictive models for each
cluster.  Specifically, regression equations are established for each
response variable for each cluster using the descriptor variables as
predictors.  The result is a multicriterion version of Sp=E4th=92s (1979)
clusterwise linear regression model, where the regression sum-of squares
for each response variable is of independent interest and the homogeneity
of the clusters with respect to the descriptor variables is also considered
as part of the weighted objective function.

Brusco and Stahl (2001) demonstrated a mulitobjective programming approach
for seriation of asymmetric proximity matrices.  Their analyses focused on
a single n  n proximity matrix and the goal was to provide an ordering that
optimized a weighted objective function for multiple structural indices
originally suggested by Hubert and Golledge (1981).  Dynamic programming
was used to solve the scalar optimization problem for a number of weighting
schemes with the objective of generating a relevant portion of the
efficient set.  Brusco (2001) has recently extended the basic methodology
to the case of multiple proximity matrices.  In this problem, the terms in
the weighted objective function pertain to the same structural index
measure but there is one term for each of the different relevant
matrices.  The goal is to find a single ordering that provides an adequate
structural fit for each of the matrices.

In this workshop, we will investigate the extension of the multiobjective
programming paradigm to multidimensional scaling.  There are at least two
possible avenues for this investigation.  The first of these corresponds to
an extension of  Brusco and Stahl=92s (2001) method for unidimensional
seriation.  Given an n  n symmetric proximity matrix, it might be desirable
to fit a multidimensional structure (say, a two-dimensional structure) to
the matrix while considering two or more loss functions (least squares and
least absolute deviation loss functions would perhaps be the logical
starting point).  The goal would be to identify a two-dimensional scaling
solution that yields good loss function values for all relevant loss
functions.

The second avenue of research is, perhaps, much more promising. Consider a
situation where K n  n proximity matrices are available for a set of
stimuli.  Although it might be possible to aggregate the information from
the K matrices in some fashion and employ metric multidimensional scaling
techniques in the standard fashion, such an approach is apt to cloud the
appropriateness of the two dimensional solution for some of the
matrices.  Another option is to run separate analyses for each of the K
matrices independently and hope that they reveal similar
structures  Suppose that a least-squares two-dimensional solution (using
the city-block metric) were fit to each of the K matrices independently,
perhaps using combinatorial methods suggested by Brusco (2001) and Hubert,
Arabie, and Hesson-McInnis (1992).  The resulting best two-dimensional
solutions for the K matrices might be very different both in terms of the
ordering of objects on each dimension and the coordinates themselves.
Multiobjective programming provides a plausible alternative to the two
options suggested above.  The combinatorial methods of Brusco (2001) and
Hubert, Arabie, and Hesson-McInnis (1992) can be modified to optimize a
weighted function of least-squares loss functions (perhaps standardized
based on their best uniobjective loss function values) for the K
criteria.  The goal of the combinatorial search would be to identify an
ordering for each of the dimensions (say, for example, two dimensions) that
would yield good scaling solutions for each of the K matrices.  In other
words, the combinatorial search would focus on the orderings and
non-negative least squares would be applied to find the optimal coordinates
for each of the K matrices.  Upon the completion of the process, the K
matrices would have identical orderings on both dimensions, but different
coordinate structures.  Alternatively, multiobjective programming could be
used to identify a single coordinate representation for each of the K
matrices, but this would seem rather impractical in many settings.

During the workshop, we will explore the utility of mulitobjective
programming for metric multidimensional scaling of multiple proximity
matrices.  Results recently obtained by Brusco (2001) for unidimensional
seriation problems have been very promising, indicating that it is often
possible to identify an ordering that yields good to excellent index values
for each of the relevant proximity matrices.  Whether or not these findings
will extend to the case of least squares multidimensional (city-block)
scaling is the question of interest for the workshop.

References
BRUSCO, M. J. (2001), =93Fitting Structures to Multiple Proximity Matrices
Using Multiobjective Programming,=94 Working paper, Florida State
University.

BRUSCO, M. J., CRADIT, J. D., and STAHL, S. (2001), A Simulated Annealing
Heuristic for a Bicriterion Partitioning Problem in Market Segmentation,
Journal of Marketing Research, in press.

BRUSCO, M. J., CRADIT, J. D., and TASHCHIAN, A. (2001), Multicriterion
Clusterwise Regression: Identifying Homogeneous Segments while Explaining
Customer Satisfaction and Brand Switching, Working paper, Florida State
University.

BRUSCO, M. J., and STAHL, S. (2001), An Interactive Approach to
Multiobjective Combinatorial Data Analysis, Psychometrika, 66, 5-24.
DELATTRE, M. and HANSEN, P. (1980), Bicriterion Cluster Analysis, IEEE
Transactions on Pattern Analysis and Machine Intelligence, 2, 277-291.

FERLIGOJ, A. and BATAGELJ, V. (1992), Direct Multicriteria Clustering
Algorithms, Journal of Classification, 9, 43-61.

HUBERT, L., ARABIE, P., and HESSON-MCINNIS, M. (1992), Multidimensional
Scaling in the City-Block Metric: A Combinatorial Approach, Journal of
Classification, 9, 211-236.

HUBERT, L. J., and GOLLEDGE, R. G. (1981), Matrix Reorganization and
Dynamic Programming: Applications to Paired Comparisons and Unidimensional
seriation,=94 Psychometrika, 46, 429-441.

SPATH, H. (1979), Algorithm 39: Clusterwise Linear Regression,
Computing, 22, 367-73.


2. TITLE: Data Visualization with Multidimensional Scaling SPEAKER: Andreas Buja, AT&T Labs JOINT WORK WITH: Deborah Swayne, Michael Littman, Heike Hofmann (AT&T Labs) Nate Dean (Rice University) ABSTRACT: MDS is largely a tool for creating spatial maps of abstract objects and rendering them visually. One can therefore expect synergies if MDS is embedded in a data visualization system. We demonstrate an example of such an embedding, a freeware system called XGvis. The synergies between data visualization and MDS are numerous because the results of direct manipulation of MDS maps can be instantly fed back into MDS. For example, subsets of objects can be selected and they may be remapped separately. Or points may be dragged manually to examine the stability of an MDS map under perturbations. At the same time, interactive selection of MDS parameters permits immediate feedback about their effects. For example, the parameter that specifies the Minkowski metric may be manipulated on a slider while the map updates itself continually. Finally, auxiliary diagnostics such as Shepard diagrams may be generated at any time. The marriage of an MDS implementation with a data visualization system adds an element of fun to the analysis, not unlike a video game. This, however, should not distract from the fact that the benefits of such capabilities are serious: they afford unique familiarity with data and methods.
3. Deriving Market Structures via Additive Decomposition of Market Shares (Application of three-way generalized SINDCLUS.) Anil Chaturvedi, Kraft Foods J. Douglas Carroll, Rutgers University Most existing Market Structure Models are frequently based on Brand-switching data. Brand switching data present difficulties in defining the switching "periods". Other models are based on choice modeling techniques. These techniques result in either overstatements or understatements of the various marketing mix elements (Advertising, trade promotions, consumer promotions, etc) depending on the data used - Market level scanner data, Store level scanner data, or Panel data, or a mix. Moreover, choice modeling techniques are difficult to execute on a regular basis. We present an approach to deriving market structure based on volume share, or Share of Requirement (SOR) measures from household level data within a category to define market structure. The data collection task is fairly easy for retailers (Frequent Shopper Data) or Panel providers. We assume that there are underlying clusters of user-groups and brands, where brands have similar market share patterns that define the market structure within that cluster. The closest analog to this approach is block-clustering (Hartigan et al 1997), which partitions =93blocks=94 within a 2-dimensional map or other 2-dimensional data. DeSarbo=92s GENNCLUS model (1982) does provide overlapping block clustering, but the model cannot be used for multiple periods, and the estimation procedure is cumbersome. We present an empirical illustration of our suggested 3-way market-structure model to user-group x market share data across five geographic regions for the Frozen Pizza category.
4. NONMETRIC UNFOLDING WITH PENALTY FOR LACK OF VARIATION IN THE TRANSFORMED PROXIMITIES Willem J. Heiser, Frank M.T.A. Busing & Patrick J.F. Groenen Leiden University, The Netherlands heiser@fsw.leidenuniv.nl Multidimensional unfolding methods suffer from the degeneracy problem in almost all circumstances. Most degeneracies are easily recognized: the solutions are perfect but trivial, and are characterized by approximately equal distances between pairs of points selected from the two different sets that are scaled. Many solutions for the degeneracy problem have been proposed and tested, but with little success so far. According to Kim, Rangaswamy, and De Sarbo (1999), =93the exact causes of degeneracy are currently unknown=94. Our own suggestion is that degenerate solutions occur through the presence of an intercept in the data transformations. We offer a fundamental modification of the approach initiated by Kruskal and Carroll (1969), who introduced a normalization factor based on the variance in the usual least squares loss function. Heiser (1981) and De Leeuw (1983) showed that the normalization factor proposed by Kruskal and Carroll was not strong enough to avoid degeneracies. The factor proposed in the present paper for ordinal and interval-scale data discourages or penalizes transformations of the proximities with small variation, so that the procedure steers away from solutions with small variation in the interpoint distances. In contrast to the variance, which is an absolute measure of variability around the mean, the coefficient of variation is a relative measure that relates variability around the mean to the distance of the mean from the origin. The coefficient of variation decreases if a distribution is uniformly moved away from the origin, while its variance remains the same. So it is appropriate to use a penalty that is inversely related to the coefficient of variation. When the transformation is split-by-rows, we use the harmonic mean of the row-wise variation coefficients to keep the aggregate factor sensitive for low variability in any row. A convergent algorithm is described for minimizing the re-adjusted loss function, based on iterative majorization (Busing, Groenen, and Heiser, 2001). The results of a simulation study are discussed, in which the best range of the penalty parameters is determined. Two empirical data sets are analyzed by our method, clearly showing the benefits of the proposed loss function. Key words: degeneration, normalized stress, badness-of-fit function, penalty function, coefficient of variation, harmonic mean, iterative majorization. References Busing, F.M.T.A., Groenen, P.J.F., & Heiser, W.J. (2001). Avoiding degeneracy in multidimensional unfolding by penalizing on the coefficient of variation. Manuscript submitted for publication. De Leeuw, J. (1983). On degenerate nonmetric unfolding solutions. Technical Report, Department of Data Theory, Faculty of Social Sciences, Leiden University. Heiser, W.J.(1981). Unfolding Analysis of Proximity Data. Unpublished Doctoral Dissertation, Leiden University. Kim, C. Rangaswamy, A., & De Sarbo, W.S. (1999). A quasi-metric approach to multidimensional unfolding for reducing the occurrence of degenerate solutions. Multivariate Behavioral Research, 34(2), 143-180. Kruskal, J.B. & Carroll, (1969). Geometrical models and badness-of-fit functions. In P.R. Krishnaiah (Ed.), Multivariate Analysis, Vol. II, pp. 639-671. Amsterdam: North-Holland.
5. Applications of MDS and Clustering Methods in Personality and Social Psychology Seymour Rosenberg, Rutgers University MDS and clustering algorithms have been used in personality and social psychology for several decades to represent structures implicit in two-way matrices (objects x attributes). The general aim in this research is to obtain a structure for the attributes (a/o objects) using one or more serviceable measures of proximity. The resulting structure is then interpreted in a larger theoretical context, often with the help of independently measured properties of the elements in the structure. Co-occurrences in sorting data also provide serviceable measures of proximity for the objects sorted by judges (or found "naturalistically"). Sorting methods are widely used in several substantive domains, including person perception, clinical symptomatology, and cognition (Rosenberg, 1982). Also of recent interest, on the data-analysis side, are two-mode representations that simultaneously display the structure of the objects, the structure of the attributes, and their associative relations (Rosenberg, Van Mechelen, and De Boeck, 1996). Such data-analysis methods are particularly useful for data matrices with binary entries (an object does/does not possess the attribute) and are appealing in that they do not require a derived proximity measure. A brief overview of typical empirical applications of these structural methods will be given, followed by a more detailed description of one or two empirical applications in personality and social psychology. Rosenberg, S. (1982). The method of sorting in multivariate research with applications selected from cognitive psychology and person perception. In N. Hirschberg & L. Humphreys (Eds.), Multivariate applications in the social sciences (pp. 117-142). Hillsdale, NJ: Erlbaum Associates. Rosenberg, S., Van Mechelen, I., & De Boeck, P. (1996). A hierarchical classes model: Theory and method with applications in psychology and psychopathology. In P. Arabie, L. Hubert, & G. De Soete (Eds.), Clustering and classification (pp. 123-155). Teaneck, NJ: World Scientific.

Previous: Participation
Next: Registration
Workshop Index
DIMACS Homepage
Contacting the Center

Document last modified on August 13, 2001.