Generalized Principal Component Analysis of Continuous and Discrete Variables

by Avner BAR-HEN.

Abstract: This paper studies the problem of Ordination Analysis when both qualitative and quantitative variables are present.

Principal component analysis is mainly applicable to continuous variables, while discrete characters can be analyzed with correspondence analysis. In this article we propose a new approach that handles continuous and discrete variables simultaneously.

A similarity index between two variables is defined, and formulas for various cases are derived. A distance matrix between individuals is constructed and some of its properties are derived. This allows a unified approach to various ordination techniques (for example, PCA and CA).

As always, the observations are subject to sampling error, so the results may differ from one sample to another. Based on resampling techniques, we derive tools to determine which representation subspaces provide good insurance against instability, and thus against wrong conclusions. The proposed techniques also allow the detection of influential observations and outliers.
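
The general idea of checking a representation subspace by resampling can be illustrated with a minimal sketch. It is not the procedure derived in the article: it assumes ordinary PCA on purely continuous data, a plain bootstrap over individuals, and a stability score chosen here for illustration (the mean squared cosine of the principal angles between the reference subspace and each bootstrap subspace). The function names `top_subspace`, `subspace_similarity`, and `bootstrap_stability` are hypothetical.

```python
import numpy as np


def top_subspace(X, k):
    """Return a (p x k) orthonormal basis of the top-k principal subspace of X."""
    Xc = X - X.mean(axis=0)
    # The right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T


def subspace_similarity(A, B):
    """Mean squared cosine of the principal angles between span(A) and span(B).

    A and B must have orthonormal columns. The value lies in [0, 1],
    and equals 1 when the two subspaces coincide.
    """
    cosines = np.linalg.svd(A.T @ B, compute_uv=False)
    return float(np.mean(cosines ** 2))


def bootstrap_stability(X, k, n_boot=200, seed=0):
    """Compare the top-k subspace of X with that of bootstrap resamples of its rows.

    Assumption for illustration only: a plain bootstrap over individuals,
    not the specific resampling tools of the article.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    reference = top_subspace(X, k)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)  # sample individuals with replacement
        scores[b] = subspace_similarity(reference, top_subspace(X[rows], k))
    return scores
```

Under these assumptions, scores close to 1 for a given dimension `k` suggest that the corresponding subspace changes little from one resample to another, while a wide spread of scores suggests instability.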

Finally, the results are applied to data drawn from an agro-forestry study.

Key Words: correspondence analysis, distance, generalized canonical analysis, optimal scaling, RV coefficient, similarity index

Author:
Avner Bar-Hen, Avner.Bar-Hen@biomath.u-3mrs.fr

Editor: Ravi Khattree, khattree@oakland.edu

READING THE ARTICLE: You can read the article in portable document (.pdf) format (262288 bytes).

NOTE: The content of this article is the intellectual property of the author, who retains all rights to future publication.

Return to the InterStat Home Page.