Testing for Equal Distributions in High Dimension
by Gabor J. Szekely and Maria L. Rizzo.
Abstract:
We propose a new nonparametric test for equality of two
or more multivariate distributions based on Euclidean distance
between sample elements. Several consistent tests for comparing
multivariate distributions can be developed from the underlying
theoretical results. The test procedure for the multisample
problem is developed and applied to testing the composite
hypothesis of equal distributions when the distributions are
unspecified. The proposed test is universally consistent against
all fixed alternatives (not necessarily continuous) with finite
second moments. The test is implemented by conditioning on the
pooled sample to obtain an approximate permutation test, which is
distribution free. Our Monte Carlo power study suggests that the
new test may be much more sensitive than tests based on nearest
neighbors against several classes of alternatives, and that it performs
particularly well in high dimension. The computational complexity of
our test procedure is independent of dimension and number of
populations sampled. The test is applied to a high-dimensional
problem: testing microarray data from cancer samples.
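The statistic and permutation procedure described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the standard two-sample e-distance (energy statistic) for samples X of size n1 and Y of size n2, e(X, Y) = (n1 n2 / (n1 + n2)) * (2 A - B - C), where A is the mean Euclidean distance between elements of different samples and B and C are the mean distances within X and within Y. It also assumes that the multisample statistic is the sum of e-distances over all pairs of samples. The function names (e_distance, k_sample_statistic, permutation_test), the default of 199 replicates, and the (count + 1)/(n_perm + 1) p-value convention are hypothetical choices made for this example.

```python
import itertools
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def euclid(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def e_distance(x: List[Point], y: List[Point]) -> float:
    """Two-sample e-distance: (n1*n2/(n1+n2)) * (2*between - within_x - within_y)."""
    n1, n2 = len(x), len(y)
    between = sum(euclid(a, b) for a in x for b in y) / (n1 * n2)
    within_x = sum(euclid(a, b) for a in x for b in x) / (n1 * n1)
    within_y = sum(euclid(a, b) for a in y for b in y) / (n2 * n2)
    return (n1 * n2 / (n1 + n2)) * (2.0 * between - within_x - within_y)


def k_sample_statistic(samples: List[List[Point]]) -> float:
    """Multisample statistic (assumed form): sum of e-distances over all pairs."""
    return sum(e_distance(a, b) for a, b in itertools.combinations(samples, 2))


def permutation_test(
    samples: List[List[Point]], n_perm: int = 199, seed: Optional[int] = None
) -> Tuple[float, float]:
    """Approximate permutation test, conditioning on the pooled sample.

    Returns (observed statistic, permutation p-value)."""
    rng = random.Random(seed)
    sizes = [len(s) for s in samples]
    pooled = [p for s in samples for p in s]  # new list; inputs are not modified
    observed = k_sample_statistic(samples)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the shuffled pooled sample into groups of the original sizes.
        groups, start = [], 0
        for n in sizes:
            groups.append(pooled[start:start + n])
            start += n
        if k_sample_statistic(groups) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Two 5-dimensional normal samples whose means differ by 1 in every coordinate.
    x = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20)]
    y = [[rng.gauss(1.0, 1.0) for _ in range(5)] for _ in range(20)]
    stat, p = permutation_test([x, y], n_perm=199, seed=2)
    print(f"statistic = {stat:.3f}, permutation p-value = {p:.3f}")
```

Because every replicate only re-labels the pooled observations, all pairwise distances could be computed once from the pooled sample and reused across permutations. This sketch recomputes them on each replicate to keep the code short.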
Key Words:
homogeneity, two-sample problem,
multisample problem, permutation test, e-distance, E-statistics,
energy statistics
Authors:
Gabor J. Szekely, gabors@bgnet.bgsu.edu
Maria L. Rizzo, rizzo@math.ohiou.edu
Editor:
Ravindra Khattree, khattree@oakland.edu
NOTE: The content of this article is the intellectual property of the authors, who retain all rights to future publication.