DIMACS Working Group Meeting on Algorithms for Multidimensional Scaling

August 6-9, 2001
DIMACS Center, Rutgers University, Piscataway

J. Douglas Carroll, Rutgers University, dcarroll@rci.rutgers.edu
Phipps Arabie, Rutgers University, arabie@andromeda.rutgers.edu
Lawrence J. Hubert, University of Illinois, lhubert@s.psych.uiuc.edu
Presented under the auspices of the Special Focus on Data Analysis and Mining.



Linear and Integer Programming Methods for Least Absolute Deviation
Unidimensional and Multidimensional Scaling
Michael Brusco, Florida State University

There has recently been significant interest in the solution of least
absolute deviation (LAD) unidimensional and multidimensional scaling
problems (Brusco, 2001; Chen, 2000; Lau, Leung, and Tse 1998; Simantiraki,
1996).  Both Simantiraki (1996) and Chen (2000) presented mixed-integer
linear programming formulations for LAD unidimensional scaling of symmetric
proximity matrices, which are capable of providing global optimal solutions
for rather small matrices (where the number of objects, n, is 10 or
less).  Unfortunately, neither of these formulations are especially tight
and optimality gaps are frequently quite large.  This results in
consumption of considerable amounts of memory and CPU time, rendering these
formulations computationally infeasible for even medium-sized problems.
There are number of possible strategies for improving the computational
feasibility of integer linear programming methods for LAD scaling
problems.  One possible avenue is to develop new formulations that yield
integer-optimal solutions upon solution of the LP relaxation.  For example,

Ali, Cook, and Kress (1986) present a formulation for a LAD ordering
problem that has a totally unimodular constraint set, enabling solution by
LP or network methods.  The direct extension of their formulation to the
problem studied by Chen (2000) and Simantiraki (1996) is not readily
apparent, but if such an extension were possible it would represent a
significant advancement.  Along the same lines, it might be possible to
model the disjunctive constraints in the Chen (2000) and Simantiraki (1996)
formulations in more efficient manner.  The literature on modeling
disjunctive constraints could prove useful in this regard (Jeroslow, 1989).
A second possible strategy for improving the performance of mixed integer
linear methods is enhanced solution strategies.  Perhaps most importantly,
the generation of strong cuts that rapidly improve lower bounds would be of
significant value.  Some recent experiments have indicated that the most
troublesome problem with current formulations is an extremely slow
improvement of the lower bound.  Strong cuts based on efficient solution of
subproblems appear to have considerable promise, but better methods for
their generation are necessary. The incorporation of information from
efficient heuristic methods can also enhance the solution process by
providing starting solutions as well as branching information.  The
development of methods for good cuts and the effective incorporation of
heuristic solution information will be actively pursued during the workshop.
In the absence of a significant breakthrough in integer linear programming
methods, it seems that heuristics will be necessary for tackling large
unidimensional and multidimensional scaling LAD problems.  Brusco (2001)
recently presented a hybrid solution method for one- and two-dimensional
LAD problems that consisted of two phases.  In the first phase, a simulated
annealing heuristic was used to develop very good starting solutions via a
combinatorial search of lattice points.  In the second phase, these initial
solutions were subsequently supplied to a combinatorial heuristic developed
by Hubert, Arabie, and Hesson-McInnis (1992).  This procedure uses a
pairwise interchange combinatorial search and an efficient simplex method
for l1 norm linear programs (Barrodale and Roberts, 1978, 1980).  The
efficiency and effectiveness of Brusco=92s (2001) heuristic method for large
matrices (say, n > 50) and three or more dimensions remains an open
research question.  However, it seems likely that some refinements will be
necessary and such refinements will be investigated at the workshop.


ALI, I., COOK, W. D., and KRESS, M. (1986), Ordinal Ranking and Intensity
of Preference: A Linear Programming Approach, Management Science, 32,

BARRODALE, I., and ROBERTS, F. D. K. (1978), An Efficient Algorithm for
Discrete l1 Linear Approximation with Linear Constraints, SIAM Journal on
Numerical Analysis, 15, 603-611.

BARRODALE, I., and ROBERTS, F. D. K. (1980), Algorithm 552: Solution of
the Constrained l1 Linear Approximation Problem, ACM Transactions on
Mathematical Software, 6, 231-235.

BRUSCO, M. J. (2001). A Simulated Annealing Heuristic for Unidimensional
and Multidimensional (City-Block) Scaling of Symmetric Proximity Matrices,
Journal of Classification, 18, 3-33.

CHEN, G. (2000). Metric Two-Way Multidimensional Scaling and Circular
Unidimensional Scaling: Mixed Integer Programming Approaches, Doctoral
Dissertation, Department of Management Science and Information Systems,
Rutgers University, Newark, New Jersey.

HUBERT, L., ARABIE, P., and HESSON-MCINNIS, M. (1992), Multidimensional
Scaling in the City-Block Metric: A Combinatorial Approach,=94 Journal of
Classification, 9, 211-236.

JEROSLOW, R. G. (1989). =93Logic-Based Decision Support: Mixed-Integer Model
Formulation,=94 Annals of Discrete Mathematics, 40.

LAU, K., LEUNG, P. L., and TSE, K. (1998), =93A Nonlinear Programming
Approach to Metric Unidimensional Scaling,=94 Journal of Classification, 15,

SIMANTIRAKI, E. (1996), =93Unidimensional Scaling: A Linear Programming
Approach Minimizing Absolute Deviations,=94 Journal of Classification, 13,

2. Triadic distance models: properties and estimation Mark de Rooij Willem J. Heiser Department of Psychology Leiden University A recently developing area within the field of multidimensional scaling is that of triadic distance models. Triadic distance models are suitable for analysing three-way one-mode dissimilarity data, where each element of a triple is treated equally. Different triadic distance models have been proposed in the literature, the two most popular are the perimeter model and the generalized Euclidean distance. We will discuss properties of both these models. Triadic distance models can be fitted to triadic dissimilarity data by minimizing a weighted least squares function. Algorithms for both the perimeter model and the generalized Euclidean model, based on the theory of Iterative Majorization, will be discussed. The notion of triadic distance models can be generalized to three-way three-mode data, where each element of a triple is again treated equally. This generalization parallels the generalization from multidimensional scaling models to multidimensional unfolding models in the two-way case. Thus in the three-way case we obtain a triadic unfolding model. This latter model can also be used to model asymmetry in a triadic dissimilarity table. We will show an algorithm to fit the triadic unfolding model based on the generalized Euclidean Model. Properties of these triadic unfolding models will be shown and some new ideas will be presented. The above mentioned triadic distance and unfolding models can be used to analyse large transition tables, for example obtained from a very long time series by using a moving time window. Even when the resulting transition table is sparse the methods give good results.
3. Proxscal: A General mds program for two-way and three-way Euclidean mds with restrictions on the coordinates Willem J. Heiser & Frank M.T.A. busing Leiden University, The Netherlands heiser@fsw.leidenuniv.nl busing@fsw.leidenuniv.nl PROXSCAL is an extended and generalized version of a previous multidimensional scaling program (SMACOF), based on iterative majorization and alternating least squares algorithms (De Leeuw and Heiser, 1980; a more recent general reference is Heiser, 1995; specific references for PROXSCAL are: Commandeur and Heiser, 1993, and Busing, Commandeur, and Heiser, 1997). The algorithmic strategies used ensure well-behaved convergence behavior, and allow for a wide range of data transformations and models being fitted by weighted least squares. The program has options for several forms of input data and data transformations, including monotone regression and monotone splines. For three-way data, four types of models can be fitted: (1) the Identity model, (2) the Weighted Euclidean or INDSCAL model, (3) the Generalized Euclidean or IDIOSCAL model, and (4) the Reduced Rank model. The positions of the objects in the configuration can be constrained by fixing subsets of their coordinates to be equal to prespecified values, or by estimating some optimal linear combination of prespecified independent variables, which can have missing values, and which can be analyzed with interval, nominal, ordinal or spline transformations. In the three-way case, the constraints are imposed on the common space. There are various alternatives for selecting an initial configuration, and for plotting the results. PROXSCAL is available through the SPSS Categories 10.0 package (Meulman, Heiser, and SPSS Inc., 1999). It offers several improvements upon the scaling procedure available in the SPSS Base system (ALSCAL). These improvements concern primarily the variety of individual differences models and the use of restrictions. Moreover, PROXSCAL aims at minimizing Kruskal's Stress, rather than S-stress. Stress is generally preferred, because it is a measure based on the distances, while S-stress is based on the squared distances. References Busing, F.M.T.A., Commandeur, J.J.F., & Heiser, W.J. (1997). PROXSCAL: A multidimensional scaling program for individual differences scaling with constraints. In W. Bandilla & F. Faulbaum (Eds.), Softstat'97: Advances in Statistical Software, Vol 6 , pp. 67-74. Stuttgart: Lucius & Lucius. Commandeur, J. & Heiser, W.J. (1993). Mathematical Derivations in the Proximity Scaling (PROXSCAL) of Symmetric Data Matrices. Research Report RR-93-04, Department of Data Theory, Leiden University. De Leeuw, J. & Heiser, W.J.(1980), Multidimensional scaling with restrictions on the configuration. In P.R. Krishnaiah (ed.), Multivariate Analysis, Vol. V, pp. 501-522. Amsterdam: North-Holland. Heiser, W.J.(1995), Convergent computation by iterative majorization: Theory and applications in multidimensional data analysis. In W.J. Krzanowski (Ed), Recent Advances in Descriptive Multivariate Analysis, pp. 157-189. Oxford: Oxford University Press. Meulman, J.J., Heiser, W.J., & SPSS Inc. (1999), SPSS Categories 10.0. Chicago, Il.: SPSS.
4. THE IM ALGORITHM FOR SPARSE MDS DATA: SOME INITIAL EXPERIENCES Willem J. Heiser & Frank M.T.A. Busing Leiden University, The Netherlands heiser@fsw.leidenuniv.nl busing@fsw.leidenuniv.nl One obvious approach for efficient multidimensional scaling of massive datasets could be discarding part of the dissimilarity data alltogether. There is room for this strategy since in complete data, the amount of overdetermination of the coordinates in low dimensionality grows rapidly with n, the number of objects to be scaled. In their pioneering study on incomplete designs for nonmetric multidimensional scaling, Spence and Domoney (1974) used Monte Carlo methods to show that both random deletion designs and cyclic designs perform quite well in terms of recovery. They found that in conditions with moderately low error about 60 percent of the dissimilarities could be deleted. Overlapping clique designs did not perform well. Spence (1982) indicated that the size of the fraction to be retained could be as low as 6p/(n-1), where p is the dimensionality of the solution. Graef and Spence (1979) studied which characteristics of the input data are essential for successful reconstruction, and found that large distances are crucially important; if the large distances are lost, recovery deteriorates dramatically. We discuss the algorithmic implications of incomplete data in terms of sparse matrix operations. The basic least squares MDS algorithm based on Iterative Majorization (IM) involves repeated matrix multiplication with Guttman's (1964) "correction matrix", which varies from iteration to iteration, alternated with repeatedly solving a linear system of equations with a fixed matrix of coefficients (cf. De Leeuw and Heiser, 1977). In principle, for both operations suitable sparse methods are available, but there is a technical complication because the system of equations is not of full rank, and the usual method for resolving this issue destroys the sparseness of the system. We are currently half-way in constructing an IM algorithm that profits from the sparseness in the data, and is still guaranteed to converge to at least a local minimum. In a small Monte Carlo experiment we studied our provisional sparse MDS program, using five designs: k cycles, k chains, k random bands, k bands based on the minimum spanning tree, and k bands based on the maximum spanning tree. Here k had levels k =3D 1, 2, 4, 8, 16, 32. Other factors manipulated were n =3D 32, 64, 128, 256, 512, and error level equal to 0%, 20%, 40%. It turns out that generally recovery is surprisingly good, but time required to reach the solution is disappointingly long, due to a strongly increased number of iterations required for reaching the usual convergence criterion. Key words: incomplete designs, sparse matrices, cyclic designs, minimum spanning tree. References Graef, J., & Spence, I. (1979). Using distance information in the design of large multidimensional scaling experiments. Psychological Bulletin, 86. 60-66. De Leeuw, J. & Heiser, W.J. (1977). Convergence of correction matrix algorithms for multidimensional Scaling. In J.C. Lingoes et al (Eds.), Geometric Representations of Relational Data, pp. 735-752. Mathesis Press, Ann. Arbor, MI. Guttman, L. (1968). A general nonmetric technique for finding the smallest coordinate space for a configuration of points. Psychometrika, 33, 469-504. Spence, I., & Domoney, D.W. (1974). Single subject incomplete designs for nonmetric multidimensional scaling. Psychometrika, 39, 469-490. Spence, I. (1982). Incomplete experimental designs for multidimensional scaling. In R.G. Golledge & J.N. Rayer (Eds.), Proximity and Preference: Problems in the Multidimensional Analysis of Large Data Sets, pp. 29-46. Minneapolis, MN: University of Minnesota Press.
5. Kernel methods for nonlinear multivariate analysis Yoshio Takane and Yuriko Oshima-Takane McGill University Kernel methods, originally developed in the context of support vector machines, can be used much more widely for nonlinear multivariate analysis. In this paper we explicate the basic idea, and demonstrate the usefulness, of kernel techniques in various multivariate analysis techniques including regression, discrimination, redundancy analysis, PCA, MDS and canonical correlation analysis. In the kernel methods input variables are first nonlinearly mapped into high dimensional feature spaces, to which linear methods are applied. Using a kernel "trick", this is done without directly dealing with the high dimensional feature spaces.
6. Jos M.F. Ten Berge University of Groningen, The Netherlands Some recent results of Three-way Algebra in the context of Tucker Three-Way PCA, CANDECOMP/PARAFAC (INDSCAL) and hybrid decompositions in between. 1. Real-valued three-way arrays can be decomposed in the least-squares sense by Tucker three-way PCA (Tucker-3). This yields component matrices A, B, C, (e.g., for individuals, traits, and occasions, respectively), and a three-way core array which contains weights for the joint impact of each triple of components from A, B, and C. Tucker-3 solutions are rotationally indeterminate. Question 1: To what extent can cores be transformed to extreme simplicity (a vast majority of core elements being zero), and what is the maximum simplicity (maximum number of zeros) that can be attained? 2. CANDECOMP/PARAFAC (CP), the preferred approach for the least-squares version of INDSCAL, is a constrained Tucker-3 analysis, the core array being a (constant) tridiagonal identity array. Whereas a Tucker-3 solution is rotationally indeterminate, CP solutions are unique (under relatively weak conditions). Question 2: Are the sufficient conditions for CP uniqueness also necessary? 3. The rank of a three-way array is the smallest number of components that yields a perfect CP fit. The maximal rank an array of any given size can have does not coincide with the typical rank of that array, the rank that arises with non-zero probability. Question 3: What results are available on the typical rank of three-way arrays? 4. When elements of a Tucker-3 core array are constrained to be zero, hybrid methods in between Tucker-3 and CP arise, that sometimes also have uniqueness, just like CP. Question 4: Which constraints on cores are inactive in view of the rotational indeterminacy? How do we tell tautologies from models? Partial answers, based on three-way algebra, will be given. These answers are closely related. In particular, some answers to Question 4 follow at once from results on maximal simplicity, and on typical rank.

Previous: Participation
Next: Registration
Workshop Index
DIMACS Homepage
Contacting the Center

Document last modified on July 30, 2001.