DIMACS Series in
Discrete Mathematics and Theoretical Computer Science

VOLUME Seventy Two
TITLE: "Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications"
EDITORS: Regina Liu, Robert Serfling and Diane Souvaine


Ordering Information

This volume may be obtained from the AMS or through bookstores in your area.

To order through AMS contact the AMS Customer Services Department, P.O. Box 6248, Providence, Rhode Island 02940-6248 USA. For Visa, Mastercard, Discover, and American Express orders call 1-800-321-4AMS.

You may also visit the AMS Bookstore and order directly from there. DIMACS does not distribute or sell these books.



PREFACE


This book is a collection of some of the research work presented in the workshop on "DATA DEPTH: ROBUST MULTIVARIATE STATISTICAL ANALYSIS, COMPUTATIONAL GEOMETRY & APPLICATIONS". The workshop was held from May 14 to 16, 2003, at Rutgers University in New Jersey, and it was sponsored by DIMACS with support from the National Science Foundation. The workshop was co-organized by Regina Liu, Robert Serfling, Diane Souvaine and Yehuda Vardi. There were more than 100 participants from various fields, including: statistics, computer sciences, mathematics and operations research. This workshop brought together not only two DIMACS special foci but two different communities: statisticians and computational geometers. The result was a very exciting interdisciplinary workshop that laid the foundations for further interfaces and collaborations.

1. Goal of the Workshop

Multivariate data analysis plays a role of ever-increasing importance in scientific studies. In current developments of multivariate analysis, a more geometric point of view is being emphasized. Descriptive measures and sample statistics that can capture properly the higher-dimensional features of multivariate data are needed. Several geometric approaches have been proposed recently. Especially promising is the one founded on the concept of data depth. This new concept provides centeroutward orderings of points in Euclidean space of any dimension. It also provides many new perspectives to aspects of probability as well as of computer science. In particular, the development of implementable computing algorithms for depthbased statistics has brought about many new challenges in computational geometry. The extensive development of data depth in recent years has spawned attractive depth-based tools for nonparametric multivariate data analysis, with a wide range of applications. The diversity in approaches, emphases, and concepts, however, makes it necessary to seek unified views and perspectives that would guide the further development of the depth-based approach.

2. Data Depth and Statistics

There has been a flurry of activities in the development of data depth. Different notions of data depth include: the projection of half-space, the counting of random simplices, the peeling of convex hulls, the summing over volumes of random simplices, the summing over absolute distance among random pairs, and others. Each notion of data depth provides a distinctive center-outward ordering of sample points in a multidimensional space. The probabilistic geometry underlying the definition of each data depth produces a particular ordering of the points in the space and makes it especially suitable for certain types of applications. Variousstudies have been carried out to investigate and compare the structural properties of various data depths and their corresponding geometric layout such as contours and central quantiles. Data analysis methodologies based on data depth have also been developed for constructing confidence regions, computing p-values for testing hypotheses, constructing rank tests, and regression, etc. Applications to statistical quality control, aviation safety data analysis, and gene clustering have been demonstrated with success. Many more applications are currently on-going or continue to be discovered. Although research on the subject of data depth has been on-going for more than a decade, many problems are still open. Developing efficient and implementable computer algorithms is also a crucial element of the development of data depth analysis methodology, and it requires special expertise in computer science. The need for collaborations between statisticians and computer scientists in this regard is evident and so was the need for this special workshop to bring together experts from both communities.

3. Computational Efficiency and Algorithms

The computational geometry community has long recognized that there are many important and challenging problems that lie at the interface of geometry and statistics. There are many instances in which geometric techniques have been utilized to give efficient algorithms for important problems in statistics. However, some of these algorithms, while guaranteeing good theoretical asymptotic performance, are difficult to implement in practice and thus are of limited utility for statisticians or general consumers of "scientific computing". There is a clear demand for modifying or creating implementable algorithms with good theoretical performance for statistical analysis of large scientific datasets.

The concept of data depth has been developed by statisticians for nonparametric multivariate data analysis. Among the numerous different notions of data depth, the so-called halfspace depth lends itself nicely to the creation of convex depth contours within the multivariate data set. Computational geometers and statisticians have provided algorithms and code for computing the halfspace depth contours in two dimensions. The running time is O(n2 log n) for a dataset of size n. The theoretical result that requires only O(n log5 n) does not lend itself easily to implementation. Recent improvements have used either the duality and topological sweep of an arrangement of lines in the dual or some randomized versions. Although deepest points have been widely studied in the computational geometry literature, the best known time for algorithms for center points in high dimensions is still of O(nd+1). So far most computational algorithms are primarily centered around the halfspace depth in lower dimensions. Efficient computational algorithms are needed for other existing notions of depth, especially for higher dimensional settings. The main impediment to the widespread use of depth-induced data analysis is the computational bottleneck. Exploiting geometry to create implementable algorithms, both exact and approximate, would be extremely valuable, as evidenced by several of the papers in this volume.

4. Acknowledgements

This workshop was held under the auspices of the DIMACS Special Focus on Data Analysis and Mining and the DIMACS Special Focus on Computational Geometry and Applications, with support from NSF DMS 02-05136 and NSA MDA 04-02-0101. Many people in DIMACS and the Department of Statistics of Rutgers University had worked hard to help make this workshop a reality. In particular, we would like to thank the Department of Statistics of Rutgers University and Ms. Pat Wolf from the Department of Statistics, Rutgers University, for her enduring patience and efficiency in handling all the details from abstracts to those papers in this volume.


TABLE OF CONTENTS



Foreword                                                 ix

Preface                                                  xi

Depth functions in nonparametric multivariate inference   1
    R. Serfling

Rank tests for multivariate scale difference based on
  data depth                                             17
    R.Y. Liu and K. Singh

On scale curves for nonparametric description of
  dispersion                                             37
    J. Wang and R. Serfling

Data analysis and classification with the zonoid depth   49
    K. Mosler and R. Hoberg

On some parametric, nonparametric and semiparametric
  discrimination rules                                   61
    A. Hartikainen and H. Oja

Regression depth and support vector machine              71
    A. Christmann

Spherical data depth and a multivariate median           87
    R.T. Elmore, T.P. Hettmansperger, and F. Xuan

Depth-based classification for functional data          103
    S. Lopez-Pintado and J. Romo

Impartial trimmed means for functional data             121
    J.A. Cuesta-Albertos and R. Fraiman

Geometric measures of data depth                        147
    G. Aloupis

Computation of half-space depth using simulated
  annealing                                             159
    B. Chakraborty and P. Chaudhuri

Primal-dual algorithms for data depth                   171
    D. Bremner, K. Fukuda, and V. Rosta

Simplicial depth: An improved definition, analysis,
  and efficiency for the finite sample case             195
    M.A. Burr, E. Rafalin, and D.L. Souvaine

Fast algorithms for frames and point depth              211
    J.H. Dul

Statistical data depth and the graphics hardware        223
    S. Krishnan, N.H. Mustafa, and
      S. Venkatasubramanian

Index Index of Volumes
DIMACS Homepage
Contacting the Center
Document last modified on December 12, 2006.