DIMACS Center, Rutgers University, Piscataway

**Organizers:**- J. Douglas Carroll, Rutgers University, dcarroll@rci.rutgers.edu
- Phipps Arabie, Rutgers University, arabie@andromeda.rutgers.edu
- Lawrence J. Hubert, University of Illinois, lhubert@s.psych.uiuc.edu

1. Multiobjective Multidimensional Scaling Michael Brusco Florida State University A number of previous studies have investigated combinatorial data analysis problems where two or more objective criteria are relevant. In such situations, it is often important to generate (or, at least, estimate) the set of efficient solutions, or, at least, a subset of those solutions. For example, Delattre and Hansen (1980) presented an algorithm for generating the entire efficient set of partitions for a bicriterion clustering problem. The two relevant criteria pertained to the homogeneity (minimization of diameter) and separation (maximization of split) of the resulting partitions. Ferligoj and Batagelj (1992) provide a somewhat broader perspective of the use of multiobjective programming in cluster analysis. They note that multicriterion problems can be modeled in a number of ways including, (a) reduction to a single criterion; (b) constrained clustering, and (c) consnesus clustering, and (d) direct clustering algorithms. The authors subsequently presented and demonstrated two direct algorithms for multicriterion hierarchical clustering. More recently, Brusco, Cradit, and Stahl (2001) studied a bicriterion in market segmentation. The practical aspects of this paper pertained to the development of clusters that were homogeneous with respect to a set of descriptor variables, yet also enabled adequate explanation of a single response variable. A simulated annealing heuristic was used to provide a solution to a scalar optimization problem with a weighted function of variance measures for the descriptor variables and response variable. Brusco, Cradit, and Tashchian (2001) have recently extended this methodology to the case of two or more response variables, as well as to situations where it is desirable to generate predictive models for each cluster. Specifically, regression equations are established for each response variable for each cluster using the descriptor variables as predictors. The result is a multicriterion version of Sp=E4th=92s (1979) clusterwise linear regression model, where the regression sum-of squares for each response variable is of independent interest and the homogeneity of the clusters with respect to the descriptor variables is also considered as part of the weighted objective function. Brusco and Stahl (2001) demonstrated a mulitobjective programming approach for seriation of asymmetric proximity matrices. Their analyses focused on a single n n proximity matrix and the goal was to provide an ordering that optimized a weighted objective function for multiple structural indices originally suggested by Hubert and Golledge (1981). Dynamic programming was used to solve the scalar optimization problem for a number of weighting schemes with the objective of generating a relevant portion of the efficient set. Brusco (2001) has recently extended the basic methodology to the case of multiple proximity matrices. In this problem, the terms in the weighted objective function pertain to the same structural index measure but there is one term for each of the different relevant matrices. The goal is to find a single ordering that provides an adequate structural fit for each of the matrices. In this workshop, we will investigate the extension of the multiobjective programming paradigm to multidimensional scaling. There are at least two possible avenues for this investigation. The first of these corresponds to an extension of Brusco and Stahl=92s (2001) method for unidimensional seriation. Given an n n symmetric proximity matrix, it might be desirable to fit a multidimensional structure (say, a two-dimensional structure) to the matrix while considering two or more loss functions (least squares and least absolute deviation loss functions would perhaps be the logical starting point). The goal would be to identify a two-dimensional scaling solution that yields good loss function values for all relevant loss functions. The second avenue of research is, perhaps, much more promising. Consider a situation where K n n proximity matrices are available for a set of stimuli. Although it might be possible to aggregate the information from the K matrices in some fashion and employ metric multidimensional scaling techniques in the standard fashion, such an approach is apt to cloud the appropriateness of the two dimensional solution for some of the matrices. Another option is to run separate analyses for each of the K matrices independently and hope that they reveal similar structures Suppose that a least-squares two-dimensional solution (using the city-block metric) were fit to each of the K matrices independently, perhaps using combinatorial methods suggested by Brusco (2001) and Hubert, Arabie, and Hesson-McInnis (1992). The resulting best two-dimensional solutions for the K matrices might be very different both in terms of the ordering of objects on each dimension and the coordinates themselves. Multiobjective programming provides a plausible alternative to the two options suggested above. The combinatorial methods of Brusco (2001) and Hubert, Arabie, and Hesson-McInnis (1992) can be modified to optimize a weighted function of least-squares loss functions (perhaps standardized based on their best uniobjective loss function values) for the K criteria. The goal of the combinatorial search would be to identify an ordering for each of the dimensions (say, for example, two dimensions) that would yield good scaling solutions for each of the K matrices. In other words, the combinatorial search would focus on the orderings and non-negative least squares would be applied to find the optimal coordinates for each of the K matrices. Upon the completion of the process, the K matrices would have identical orderings on both dimensions, but different coordinate structures. Alternatively, multiobjective programming could be used to identify a single coordinate representation for each of the K matrices, but this would seem rather impractical in many settings. During the workshop, we will explore the utility of mulitobjective programming for metric multidimensional scaling of multiple proximity matrices. Results recently obtained by Brusco (2001) for unidimensional seriation problems have been very promising, indicating that it is often possible to identify an ordering that yields good to excellent index values for each of the relevant proximity matrices. Whether or not these findings will extend to the case of least squares multidimensional (city-block) scaling is the question of interest for the workshop. References BRUSCO, M. J. (2001), =93Fitting Structures to Multiple Proximity Matrices Using Multiobjective Programming,=94 Working paper, Florida State University. BRUSCO, M. J., CRADIT, J. D., and STAHL, S. (2001), A Simulated Annealing Heuristic for a Bicriterion Partitioning Problem in Market Segmentation, Journal of Marketing Research, in press. BRUSCO, M. J., CRADIT, J. D., and TASHCHIAN, A. (2001), Multicriterion Clusterwise Regression: Identifying Homogeneous Segments while Explaining Customer Satisfaction and Brand Switching, Working paper, Florida State University. BRUSCO, M. J., and STAHL, S. (2001), An Interactive Approach to Multiobjective Combinatorial Data Analysis, Psychometrika, 66, 5-24. DELATTRE, M. and HANSEN, P. (1980), Bicriterion Cluster Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 277-291. FERLIGOJ, A. and BATAGELJ, V. (1992), Direct Multicriteria Clustering Algorithms, Journal of Classification, 9, 43-61. HUBERT, L., ARABIE, P., and HESSON-MCINNIS, M. (1992), Multidimensional Scaling in the City-Block Metric: A Combinatorial Approach, Journal of Classification, 9, 211-236. HUBERT, L. J., and GOLLEDGE, R. G. (1981), Matrix Reorganization and Dynamic Programming: Applications to Paired Comparisons and Unidimensional seriation,=94 Psychometrika, 46, 429-441. SPATH, H. (1979), Algorithm 39: Clusterwise Linear Regression, Computing, 22, 367-73.

2. TITLE: Data Visualization with Multidimensional Scaling SPEAKER: Andreas Buja, AT&T Labs JOINT WORK WITH: Deborah Swayne, Michael Littman, Heike Hofmann (AT&T Labs) Nate Dean (Rice University) ABSTRACT: MDS is largely a tool for creating spatial maps of abstract objects and rendering them visually. One can therefore expect synergies if MDS is embedded in a data visualization system. We demonstrate an example of such an embedding, a freeware system called XGvis. The synergies between data visualization and MDS are numerous because the results of direct manipulation of MDS maps can be instantly fed back into MDS. For example, subsets of objects can be selected and they may be remapped separately. Or points may be dragged manually to examine the stability of an MDS map under perturbations. At the same time, interactive selection of MDS parameters permits immediate feedback about their effects. For example, the parameter that specifies the Minkowski metric may be manipulated on a slider while the map updates itself continually. Finally, auxiliary diagnostics such as Shepard diagrams may be generated at any time. The marriage of an MDS implementation with a data visualization system adds an element of fun to the analysis, not unlike a video game. This, however, should not distract from the fact that the benefits of such capabilities are serious: they afford unique familiarity with data and methods.

3. Deriving Market Structures via Additive Decomposition of Market Shares (Application of three-way generalized SINDCLUS.) Anil Chaturvedi, Kraft Foods J. Douglas Carroll, Rutgers University Most existing Market Structure Models are frequently based on Brand-switching data. Brand switching data present difficulties in defining the switching "periods". Other models are based on choice modeling techniques. These techniques result in either overstatements or understatements of the various marketing mix elements (Advertising, trade promotions, consumer promotions, etc) depending on the data used - Market level scanner data, Store level scanner data, or Panel data, or a mix. Moreover, choice modeling techniques are difficult to execute on a regular basis. We present an approach to deriving market structure based on volume share, or Share of Requirement (SOR) measures from household level data within a category to define market structure. The data collection task is fairly easy for retailers (Frequent Shopper Data) or Panel providers. We assume that there are underlying clusters of user-groups and brands, where brands have similar market share patterns that define the market structure within that cluster. The closest analog to this approach is block-clustering (Hartigan et al 1997), which partitions =93blocks=94 within a 2-dimensional map or other 2-dimensional data. DeSarbo=92s GENNCLUS model (1982) does provide overlapping block clustering, but the model cannot be used for multiple periods, and the estimation procedure is cumbersome. We present an empirical illustration of our suggested 3-way market-structure model to user-group x market share data across five geographic regions for the Frozen Pizza category.

4. NONMETRIC UNFOLDING WITH PENALTY FOR LACK OF VARIATION IN THE TRANSFORMED PROXIMITIES Willem J. Heiser, Frank M.T.A. Busing & Patrick J.F. Groenen Leiden University, The Netherlands heiser@fsw.leidenuniv.nl Multidimensional unfolding methods suffer from the degeneracy problem in almost all circumstances. Most degeneracies are easily recognized: the solutions are perfect but trivial, and are characterized by approximately equal distances between pairs of points selected from the two different sets that are scaled. Many solutions for the degeneracy problem have been proposed and tested, but with little success so far. According to Kim, Rangaswamy, and De Sarbo (1999), =93the exact causes of degeneracy are currently unknown=94. Our own suggestion is that degenerate solutions occur through the presence of an intercept in the data transformations. We offer a fundamental modification of the approach initiated by Kruskal and Carroll (1969), who introduced a normalization factor based on the variance in the usual least squares loss function. Heiser (1981) and De Leeuw (1983) showed that the normalization factor proposed by Kruskal and Carroll was not strong enough to avoid degeneracies. The factor proposed in the present paper for ordinal and interval-scale data discourages or penalizes transformations of the proximities with small variation, so that the procedure steers away from solutions with small variation in the interpoint distances. In contrast to the variance, which is an absolute measure of variability around the mean, the coefficient of variation is a relative measure that relates variability around the mean to the distance of the mean from the origin. The coefficient of variation decreases if a distribution is uniformly moved away from the origin, while its variance remains the same. So it is appropriate to use a penalty that is inversely related to the coefficient of variation. When the transformation is split-by-rows, we use the harmonic mean of the row-wise variation coefficients to keep the aggregate factor sensitive for low variability in any row. A convergent algorithm is described for minimizing the re-adjusted loss function, based on iterative majorization (Busing, Groenen, and Heiser, 2001). The results of a simulation study are discussed, in which the best range of the penalty parameters is determined. Two empirical data sets are analyzed by our method, clearly showing the benefits of the proposed loss function. Key words: degeneration, normalized stress, badness-of-fit function, penalty function, coefficient of variation, harmonic mean, iterative majorization. References Busing, F.M.T.A., Groenen, P.J.F., & Heiser, W.J. (2001). Avoiding degeneracy in multidimensional unfolding by penalizing on the coefficient of variation. Manuscript submitted for publication. De Leeuw, J. (1983). On degenerate nonmetric unfolding solutions. Technical Report, Department of Data Theory, Faculty of Social Sciences, Leiden University. Heiser, W.J.(1981). Unfolding Analysis of Proximity Data. Unpublished Doctoral Dissertation, Leiden University. Kim, C. Rangaswamy, A., & De Sarbo, W.S. (1999). A quasi-metric approach to multidimensional unfolding for reducing the occurrence of degenerate solutions. Multivariate Behavioral Research, 34(2), 143-180. Kruskal, J.B. & Carroll, (1969). Geometrical models and badness-of-fit functions. In P.R. Krishnaiah (Ed.), Multivariate Analysis, Vol. II, pp. 639-671. Amsterdam: North-Holland.

5. Applications of MDS and Clustering Methods in Personality and Social Psychology Seymour Rosenberg, Rutgers University MDS and clustering algorithms have been used in personality and social psychology for several decades to represent structures implicit in two-way matrices (objects x attributes). The general aim in this research is to obtain a structure for the attributes (a/o objects) using one or more serviceable measures of proximity. The resulting structure is then interpreted in a larger theoretical context, often with the help of independently measured properties of the elements in the structure. Co-occurrences in sorting data also provide serviceable measures of proximity for the objects sorted by judges (or found "naturalistically"). Sorting methods are widely used in several substantive domains, including person perception, clinical symptomatology, and cognition (Rosenberg, 1982). Also of recent interest, on the data-analysis side, are two-mode representations that simultaneously display the structure of the objects, the structure of the attributes, and their associative relations (Rosenberg, Van Mechelen, and De Boeck, 1996). Such data-analysis methods are particularly useful for data matrices with binary entries (an object does/does not possess the attribute) and are appealing in that they do not require a derived proximity measure. A brief overview of typical empirical applications of these structural methods will be given, followed by a more detailed description of one or two empirical applications in personality and social psychology. Rosenberg, S. (1982). The method of sorting in multivariate research with applications selected from cognitive psychology and person perception. In N. Hirschberg & L. Humphreys (Eds.), Multivariate applications in the social sciences (pp. 117-142). Hillsdale, NJ: Erlbaum Associates. Rosenberg, S., Van Mechelen, I., & De Boeck, P. (1996). A hierarchical classes model: Theory and method with applications in psychology and psychopathology. In P. Arabie, L. Hubert, & G. De Soete (Eds.), Clustering and classification (pp. 123-155). Teaneck, NJ: World Scientific.

Previous: Participation

Next: Registration

Workshop Index

DIMACS Homepage

Contacting the Center

Document last modified on August 13, 2001.