Testing for Equal Distributions in High Dimension

by Gabor J. Szekely and Maria L. Rizzo.

Abstract: We propose a new nonparametric test for equality of two or more multivariate distributions based on Euclidean distance between sample elements. Several consistent tests for comparing multivariate distributions can be developed from the underlying theoretical results. The test procedure for the multisample problem is developed and applied for testing the composite hypothesis of equal distributions, when the distributions are unspecified. The proposed test is universally consistent against all fixed alternatives (not necessarily continuous) with finite second moments. The test is implemented by conditioning on the pooled sample to obtain an approximate permutation test, which is distribution free. Our Monte Carlo power study suggests that the new test may be much more sensitive than tests based on nearest neighbors against several classes of alternatives, and that it performs particularly well in high dimension. Computational complexity of our test procedure is independent of dimension and number of populations sampled. The test is applied to a high-dimensional problem: testing microarray data from cancer samples.
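The abstract gives no formulas, so the following is only a minimal sketch under stated assumptions. It assumes the two-sample energy statistic (n1*n2/(n1+n2)) * e(X, Y), where e(X, Y) = 2E|X - Y| - E|X - X'| - E|Y - Y'| is estimated by averaging Euclidean distances over all pairs. For the multisample case it assumes the statistic is the sum of these two-sample statistics over all pairs of samples. The permutation step conditions on the pooled sample exactly as the abstract describes: shuffle, re-split into groups of the original sizes, and recompute. The function names `energy_statistic`, `k_sample_energy` and `permutation_test` are hypothetical illustrations, not the authors' code.

```python
import math
import random


def _mean_dist(a, b):
    """Average Euclidean distance over all pairs (u, v), u in a, v in b."""
    total = 0.0
    for u in a:
        for v in b:
            total += math.dist(u, v)
    return total / (len(a) * len(b))


def energy_statistic(x, y):
    """Assumed two-sample energy statistic (n1*n2/(n1+n2)) * e(X, Y).

    The within-sample means divide by n^2 (zero diagonal included),
    i.e. the V-statistic form of e(X, Y).
    """
    n1, n2 = len(x), len(y)
    e = 2 * _mean_dist(x, y) - _mean_dist(x, x) - _mean_dist(y, y)
    return n1 * n2 / (n1 + n2) * e


def k_sample_energy(samples):
    """Assumed multisample statistic: sum of pairwise two-sample statistics."""
    k = len(samples)
    return sum(energy_statistic(samples[i], samples[j])
               for i in range(k) for j in range(i + 1, k))


def permutation_test(samples, reps=199, seed=0):
    """Approximate permutation test conditioned on the pooled sample.

    Returns (observed statistic, permutation p-value). Large values of
    the statistic are evidence against equal distributions.
    """
    rng = random.Random(seed)
    sizes = [len(s) for s in samples]
    pooled = [point for s in samples for point in s]
    observed = k_sample_energy(samples)
    exceed = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        # Re-split the shuffled pool into groups of the original sizes.
        groups, start = [], 0
        for n in sizes:
            groups.append(pooled[start:start + n])
            start += n
        if k_sample_energy(groups) >= observed:
            exceed += 1
    # Count the observed arrangement as one of the permutations.
    return observed, (exceed + 1) / (reps + 1)
```

For example, `permutation_test([x, y])` with `x` and `y` given as lists of coordinate lists returns the statistic and its p-value; passing three or more samples runs the multisample version. Because only pairwise distances enter the statistic, the dimension affects just the cost of each `math.dist` call.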

Key Words: homogeneity, two-sample problem, multisample problem, permutation test, e-distance, E-statistics, energy statistics

Gabor J. Szekely, gabors@bgnet.bgsu.edu
Maria L. Rizzo, rizzo@math.ohiou.edu

Editor: Ravindra Khattree, khattree@oakland.edu

READING THE ARTICLE: You can read the article in portable document format (.pdf) (210,501 bytes).

NOTE: The content of this article is the intellectual property of the authors, who retain all rights to future publication.


Return to the InterStat Home Page.