DIMACS Workshop on Algorithms for Multidimensional Scaling

DIMACS Working Group Meeting on Algorithms for Multidimensional Scaling

August 6-9, 2001
DIMACS Center, Rutgers University, Piscataway

Organizers:: J. Douglas Carroll, Rutgers University, dcarroll@rci.rutgers.edu; Phipps Arabie, Rutgers University, arabie@andromeda.rutgers.edu; Lawrence J. Hubert, University of Illinois, lhubert@s.psych.uiuc.edu
Presented under the auspices of the Special Focus on Data Analysis and Mining.
Abstracts:


1.

Linear and Integer Programming Methods for Least Absolute Deviation
Unidimensional and Multidimensional Scaling
Michael Brusco, Florida State University

There has recently been significant interest in the solution of least
absolute deviation (LAD) unidimensional and multidimensional scaling
problems (Brusco, 2001; Chen, 2000; Lau, Leung, and Tse 1998; Simantiraki,
1996).  Both Simantiraki (1996) and Chen (2000) presented mixed-integer
linear programming formulations for LAD unidimensional scaling of symmetric
proximity matrices, which are capable of providing global optimal solutions
for rather small matrices (where the number of objects, n, is 10 or
less).  Unfortunately, neither of these formulations are especially tight
and optimality gaps are frequently quite large.  This results in
consumption of considerable amounts of memory and CPU time, rendering these
formulations computationally infeasible for even medium-sized problems.
There are number of possible strategies for improving the computational
feasibility of integer linear programming methods for LAD scaling
problems.  One possible avenue is to develop new formulations that yield
integer-optimal solutions upon solution of the LP relaxation.  For example,

Ali, Cook, and Kress (1986) present a formulation for a LAD ordering
problem that has a totally unimodular constraint set, enabling solution by
LP or network methods.  The direct extension of their formulation to the
problem studied by Chen (2000) and Simantiraki (1996) is not readily
apparent, but if such an extension were possible it would represent a
significant advancement.  Along the same lines, it might be possible to
model the disjunctive constraints in the Chen (2000) and Simantiraki (1996)
formulations in more efficient manner.  The literature on modeling
disjunctive constraints could prove useful in this regard (Jeroslow, 1989).
A second possible strategy for improving the performance of mixed integer
linear methods is enhanced solution strategies.  Perhaps most importantly,
the generation of strong cuts that rapidly improve lower bounds would be of
significant value.  Some recent experiments have indicated that the most
troublesome problem with current formulations is an extremely slow
improvement of the lower bound.  Strong cuts based on efficient solution of
subproblems appear to have considerable promise, but better methods for
their generation are necessary. The incorporation of information from
efficient heuristic methods can also enhance the solution process by
providing starting solutions as well as branching information.  The
development of methods for good cuts and the effective incorporation of
heuristic solution information will be actively pursued during the workshop.
In the absence of a significant breakthrough in integer linear programming
methods, it seems that heuristics will be necessary for tackling large
unidimensional and multidimensional scaling LAD problems.  Brusco (2001)
recently presented a hybrid solution method for one- and two-dimensional
LAD problems that consisted of two phases.  In the first phase, a simulated
annealing heuristic was used to develop very good starting solutions via a
combinatorial search of lattice points.  In the second phase, these initial
solutions were subsequently supplied to a combinatorial heuristic developed
by Hubert, Arabie, and Hesson-McInnis (1992).  This procedure uses a
pairwise interchange combinatorial search and an efficient simplex method
for l1 norm linear programs (Barrodale and Roberts, 1978, 1980).  The
efficiency and effectiveness of Brusco=92s (2001) heuristic method for large
matrices (say, n > 50) and three or more dimensions remains an open
research question.  However, it seems likely that some refinements will be
necessary and such refinements will be investigated at the workshop.

References

ALI, I., COOK, W. D., and KRESS, M. (1986), Ordinal Ranking and Intensity
of Preference: A Linear Programming Approach, Management Science, 32,
1642-1647.

BARRODALE, I., and ROBERTS, F. D. K. (1978), An Efficient Algorithm for
Discrete l1 Linear Approximation with Linear Constraints, SIAM Journal on
Numerical Analysis, 15, 603-611.

BARRODALE, I., and ROBERTS, F. D. K. (1980), Algorithm 552: Solution of
the Constrained l1 Linear Approximation Problem, ACM Transactions on
Mathematical Software, 6, 231-235.

BRUSCO, M. J. (2001). A Simulated Annealing Heuristic for Unidimensional
and Multidimensional (City-Block) Scaling of Symmetric Proximity Matrices,
Journal of Classification, 18, 3-33.

CHEN, G. (2000). Metric Two-Way Multidimensional Scaling and Circular
Unidimensional Scaling: Mixed Integer Programming Approaches, Doctoral
Dissertation, Department of Management Science and Information Systems,
Rutgers University, Newark, New Jersey.

HUBERT, L., ARABIE, P., and HESSON-MCINNIS, M. (1992), Multidimensional
Scaling in the City-Block Metric: A Combinatorial Approach,=94 Journal of
Classification, 9, 211-236.

JEROSLOW, R. G. (1989). =93Logic-Based Decision Support: Mixed-Integer Model
Formulation,=94 Annals of Discrete Mathematics, 40.

LAU, K., LEUNG, P. L., and TSE, K. (1998), =93A Nonlinear Programming
Approach to Metric Unidimensional Scaling,=94 Journal of Classification, 15,
3-14.

SIMANTIRAKI, E. (1996), =93Unidimensional Scaling: A Linear Programming
Approach Minimizing Absolute Deviations,=94 Journal of Classification, 13,
19-25.


2.

Triadic distance models: properties and estimation

Mark de Rooij
Willem J. Heiser

Department of Psychology
Leiden University

A recently developing area within the field of multidimensional scaling
is that of triadic distance models. Triadic distance models are suitable
for analysing three-way one-mode dissimilarity data, where each element
of a triple is treated equally. Different triadic distance models have
been proposed in the literature, the two most popular are the perimeter
model and the generalized Euclidean distance. We will discuss properties
of both these models. Triadic distance models can be fitted to triadic
dissimilarity data by minimizing a weighted least squares function.
Algorithms for both the perimeter model and the generalized Euclidean
model, based on the theory of Iterative Majorization, will be discussed.

The notion of triadic distance models can be generalized to three-way
three-mode data, where each element of a triple is again treated
equally. This generalization parallels the generalization from
multidimensional scaling models to multidimensional unfolding models in
the two-way case. Thus in the three-way case we obtain a triadic
unfolding model. This latter model can also be used to model asymmetry
in a triadic dissimilarity table. We will show an algorithm to fit the
triadic unfolding model based on the generalized Euclidean Model.
Properties of these triadic unfolding models will be shown and some new
ideas will be presented.
The above mentioned triadic distance and unfolding models can be used to
analyse large transition tables, for example obtained from a very long
time series by using a moving time window. Even when the resulting
transition table is sparse the methods give good results.


3.

Proxscal:
A General mds program for two-way and three-way Euclidean mds
with restrictions on the coordinates
Willem J. Heiser & Frank M.T.A. busing
Leiden University, The Netherlands
heiser@fsw.leidenuniv.nl
busing@fsw.leidenuniv.nl

PROXSCAL is an extended and generalized version of a previous 
multidimensional scaling program (SMACOF), based on iterative majorization 
and alternating least squares algorithms (De Leeuw and Heiser, 1980; a more 
recent general reference is Heiser, 1995; specific references for PROXSCAL 
are: Commandeur and Heiser, 1993, and Busing, Commandeur, and Heiser, 
1997). The algorithmic strategies used ensure well-behaved convergence 
behavior, and allow for a wide range of data transformations and models 
being fitted by weighted least squares.

The program has options for several forms of input data and data 
transformations, including monotone regression and monotone splines. For 
three-way data, four types of models can be fitted: (1) the Identity model, 
(2) the Weighted Euclidean or INDSCAL model, (3) the Generalized Euclidean 
or IDIOSCAL model, and (4) the Reduced Rank model. The positions of the 
objects in the configuration can be constrained by fixing subsets of their 
coordinates to be equal to prespecified values, or by estimating some 
optimal linear combination of prespecified independent variables, which can 
have missing values, and which can be analyzed with interval, nominal, 
ordinal or spline transformations. In the three-way case, the constraints 
are imposed on the common space. There are various alternatives for 
selecting an initial configuration, and for plotting the results.
PROXSCAL is available through the SPSS Categories 10.0 package (Meulman, 
Heiser, and SPSS Inc., 1999). It offers several improvements upon the 
scaling procedure available in the SPSS Base system (ALSCAL). These 
improvements concern primarily the variety of individual differences models 
and the use of restrictions. Moreover, PROXSCAL aims at minimizing 
Kruskal's Stress, rather than S-stress. Stress is generally preferred, 
because it is a measure based on the distances, while S-stress is based on 
the squared distances.

References

Busing, F.M.T.A., Commandeur, J.J.F., & Heiser, W.J. (1997). PROXSCAL: A 
multidimensional scaling program for individual differences scaling with 
constraints. In W. Bandilla & F. Faulbaum (Eds.), Softstat'97: Advances in 
Statistical Software, Vol 6 , pp. 67-74. Stuttgart: Lucius & Lucius.

Commandeur, J. & Heiser, W.J. (1993). Mathematical Derivations in the 
Proximity Scaling (PROXSCAL) of Symmetric Data Matrices. Research Report 
RR-93-04, Department of Data Theory, Leiden University.

De Leeuw, J. & Heiser, W.J.(1980), Multidimensional scaling with 
restrictions on the configuration. In P.R. Krishnaiah (ed.), Multivariate 
Analysis, Vol. V, pp. 501-522. Amsterdam: North-Holland.

Heiser, W.J.(1995), Convergent computation by iterative majorization: 
Theory and applications in multidimensional data analysis. In W.J. 

Krzanowski (Ed), Recent Advances in Descriptive Multivariate Analysis, pp. 
157-189. Oxford: Oxford University Press.

Meulman, J.J., Heiser, W.J., & SPSS Inc. (1999), SPSS Categories 10.0. 
Chicago, Il.: SPSS.


4.

THE IM ALGORITHM FOR SPARSE MDS DATA:
SOME INITIAL EXPERIENCES
Willem J. Heiser & Frank M.T.A. Busing
Leiden University, The Netherlands
heiser@fsw.leidenuniv.nl
busing@fsw.leidenuniv.nl

One obvious approach for efficient multidimensional scaling of massive
datasets could be discarding part of the dissimilarity data alltogether.
There is room for this strategy since in complete data, the amount of
overdetermination of the coordinates in low dimensionality grows rapidly
with n, the number of objects to be scaled. In their pioneering study on
incomplete designs for nonmetric multidimensional scaling, Spence and
Domoney (1974) used Monte Carlo methods to show that both random deletion
designs and cyclic designs perform quite well in terms of recovery. They
found that in conditions with moderately low error about 60 percent of the
dissimilarities could be deleted. Overlapping clique designs did not
perform well. Spence (1982) indicated that the size of the fraction to be
retained could be as low as 6p/(n-1), where p is the dimensionality of the
solution. Graef and Spence (1979) studied which characteristics of the
input data are essential for successful reconstruction, and found that
large distances are crucially important; if the large distances are lost,
recovery deteriorates dramatically.

We discuss the algorithmic implications of incomplete data in terms of
sparse matrix operations. The basic least squares MDS algorithm based on
Iterative Majorization (IM) involves repeated matrix multiplication with
Guttman's (1964) "correction matrix", which varies from iteration to
iteration, alternated with repeatedly solving a linear system of equations
with a fixed matrix of coefficients (cf. De Leeuw and Heiser, 1977). In
principle, for both operations suitable sparse methods are available, but
there is a technical complication because the system of equations is not of
full rank, and the usual method for resolving this issue destroys the
sparseness of the system. We are currently half-way in constructing an IM
algorithm that profits from the sparseness in the data, and is still
guaranteed to converge to at least a local minimum.

In a small Monte Carlo experiment we studied our provisional sparse MDS
program, using five designs: k cycles, k chains, k random bands, k bands
based on the minimum spanning tree, and k bands based on the maximum
spanning tree. Here k had levels k =3D 1, 2, 4, 8, 16, 32. Other factors
manipulated were n =3D 32, 64, 128, 256, 512, and error level equal to 0%,
20%, 40%. It turns out that generally recovery is surprisingly good, but
time required to reach the solution is disappointingly long, due to a
strongly increased number of iterations required for reaching the usual
convergence criterion.

Key words: incomplete designs, sparse matrices, cyclic designs, minimum
spanning tree.

References

Graef, J., & Spence, I. (1979). Using distance information in the design of
large multidimensional scaling experiments. Psychological Bulletin, 86.
60-66.

De Leeuw, J. & Heiser, W.J. (1977). Convergence of correction matrix
algorithms for multidimensional Scaling. In J.C. Lingoes et al (Eds.),
Geometric Representations of Relational Data, pp. 735-752. Mathesis Press,
Ann. Arbor, MI.

Guttman, L. (1968). A general nonmetric technique for finding the smallest
coordinate space for a configuration of points. Psychometrika, 33, 469-504.

Spence, I., & Domoney, D.W. (1974). Single subject incomplete designs for
nonmetric multidimensional scaling. Psychometrika, 39, 469-490.

Spence, I. (1982). Incomplete experimental designs for multidimensional
scaling. In R.G. Golledge & J.N. Rayer (Eds.), Proximity and Preference:
Problems in the Multidimensional Analysis of Large Data Sets, pp. 29-46.
Minneapolis, MN: University of Minnesota Press.


5.


Kernel methods for nonlinear multivariate analysis
Yoshio Takane and Yuriko Oshima-Takane
McGill University

Kernel methods, originally developed in the context 
of support vector machines, can be used much more 
widely for nonlinear multivariate analysis.  In this
paper we explicate the basic idea, and demonstrate the
usefulness, of kernel techniques in various 
multivariate analysis techniques including regression, 
discrimination, redundancy analysis, PCA, MDS and 
canonical correlation analysis.  In the kernel methods 
input variables are first nonlinearly mapped into high 
dimensional feature spaces, to which linear methods 
are applied.  Using a kernel "trick", this is done 
without directly dealing with the high dimensional
feature spaces.



6.

Jos M.F. Ten Berge
University of Groningen, The Netherlands

Some recent results of Three-way Algebra in the context of Tucker Three-Way
PCA, CANDECOMP/PARAFAC (INDSCAL) and hybrid decompositions in between.

1. Real-valued three-way arrays can be decomposed in the least-squares
sense by Tucker three-way PCA (Tucker-3). This yields component matrices A,
B, C, (e.g., for individuals, traits, and occasions, respectively), and a
three-way core array which contains weights for the joint impact of each
triple of components from A, B, and C. Tucker-3 solutions are rotationally
indeterminate.

Question 1: To what extent can cores be transformed to extreme simplicity
(a vast majority of core elements being zero), and what is the maximum
simplicity (maximum number of zeros) that can be attained?

2. CANDECOMP/PARAFAC (CP), the preferred approach for the
least-squares version of INDSCAL, is a constrained Tucker-3 analysis, the
core array being a (constant) tridiagonal identity array. Whereas a
Tucker-3 solution is rotationally indeterminate, CP solutions are unique
(under relatively weak conditions).

Question 2: Are the sufficient conditions for CP uniqueness also necessary?

3. The rank of a three-way array is the smallest number of components
that yields a perfect CP fit. The maximal rank an array of any given size
can have does not coincide with the typical rank of that array, the rank
that arises with non-zero probability.

Question 3: What results are available on the typical rank of three-way
arrays?

4. When elements of a Tucker-3 core array are constrained to be zero,
hybrid methods in between Tucker-3 and CP arise, that sometimes also have
uniqueness, just like CP.

Question 4: Which constraints on cores are inactive in view of the
rotational indeterminacy? How do we tell tautologies from models?

Partial answers, based on three-way algebra, will be given. These answers
are closely related. In particular, some answers to Question 4 follow at
once from results on maximal simplicity, and on typical rank.
Previous: Participation
Next: Registration
Workshop Index
DIMACS Homepage
Contacting the Center

Document last modified on July 30, 2001.
DIMACS Working Group Meeting on Algorithms for Multidimensional Scaling

August 6-9, 2001 DIMACS Center, Rutgers University, Piscataway

Abstracts:

August 6-9, 2001
DIMACS Center, Rutgers University, Piscataway