Classification of Insolvent Small Businesses in Egypt by Some Running Cost Variables: A Decision Tree Approach

by Amr I. Abdelrahman and Dina H. Abdel-Hady.

Abstract: Discriminant analysis and logistic regression share a common model, "the general linear model". These two statistical classification approaches tend to concentrate on parameter values and their significance levels as a guide to the adequacy of the model. Data mining tools have also been used for classification and for predicting group membership. Data mining techniques such as neural networks, genetic algorithms, CART, CHAID, Exhaustive CHAID, and QUEST are data-driven rather than model-driven. The study applies the Exhaustive CHAID and CART decision tree methods to data on industrial small businesses (ISBs) to discover any latent relationship between the financial status of an ISB (solvent vs. insolvent) and some running cost obligations, including the costs of marketing, transportation, raw materials, social security and insurance, and wages. Using "age of business" as a covariate, MANCOVA and ANOVA reveal no significant differences between the two groups, and the assumptions of linear discriminant analysis and logistic regression were not satisfied. The Exhaustive CHAID and CART decision trees were applied, with equal priors, to classify "insolvent" ISBs and to discover any non-obvious, hidden relationships between the financial status of ISBs and the predictors as categorized by the decision tree algorithms. As run in SPSS, the two methods produce different rules for classifying each ISB as solvent or insolvent, and hence different classifiers. Applying Exhaustive CHAID, the predictors are selected in the following order: marketing cost, social security cost, and transportation cost; all splits are significant at α = 5%. When applying CART, with the Gini measure as the splitting criterion, the binary splits use the predictors in the following order: raw material cost, marketing cost, transportation cost, social security cost, annual taxes, and monthly wages.
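The Gini splitting criterion used by CART can be illustrated with a short sketch. The Python below is a hypothetical illustration, not the SPSS implementation used in the study: the function names are invented for this example, it searches a single numeric predictor, and it uses the prior-adjusted node probabilities of Breiman et al.'s CART, p(j|t) ∝ π_j N_j(t)/N_j, defaulting to equal priors as in the study. For each candidate threshold it computes the decrease in the Gini impurity i(t) = 1 − Σ_j p(j|t)² and keeps the largest.

```python
from collections import Counter


def node_probs(labels, class_totals, priors):
    """Return p(t) and the class probabilities p(j|t) for a node.

    Uses the prior-adjusted estimate p(j, t) = prior_j * N_j(t) / N_j,
    where N_j is the number of class-j cases in the whole sample.
    """
    counts = Counter(labels)
    joint = {j: priors[j] * counts.get(j, 0) / class_totals[j] for j in class_totals}
    p_t = sum(joint.values())
    if p_t == 0:
        return 0.0, {j: 0.0 for j in class_totals}
    return p_t, {j: v / p_t for j, v in joint.items()}


def gini(cond_probs):
    """Gini impurity i(t) = 1 - sum_j p(j|t)^2."""
    return 1.0 - sum(p * p for p in cond_probs.values())


def best_binary_split(x, y, priors=None):
    """Find the threshold on numeric predictor x (split 'x <= t') that
    maximises the decrease in Gini impurity of the class labels y."""
    class_totals = Counter(y)
    if priors is None:
        # Equal priors for every class, as used in the study.
        priors = {j: 1.0 / len(class_totals) for j in class_totals}
    p_root, cond_root = node_probs(y, class_totals, priors)
    parent_impurity = gini(cond_root)
    pairs = sorted(zip(x, y))
    best_threshold, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical predictor values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        p_left, cond_left = node_probs(left, class_totals, priors)
        p_right, cond_right = node_probs(right, class_totals, priors)
        gain = (parent_impurity
                - (p_left / p_root) * gini(cond_left)
                - (p_right / p_root) * gini(cond_right))
        if gain > best_gain:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_gain = gain
    return best_threshold, best_gain


# Toy data (not from the study): one cost predictor, S = solvent, I = insolvent.
costs = [1, 2, 3, 4, 5, 6]
status = ["S", "S", "S", "I", "I", "I"]
print(best_binary_split(costs, status))
```

On this toy data the best split is at 3.5 with a Gini decrease of 0.5, since both child nodes are pure. A full CART tree would repeat this search over every predictor at every node and then prune; the sketch shows only one split.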
The misclassification rates of the two methods are approximately equal (35.2% for Exhaustive CHAID and 33.5% for CART). It is recommended that both growing methods be applied to a larger data set.
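A misclassification (re-substitution) rate of this kind can be computed as in the following hypothetical sketch, which is not SPSS output. It assumes the true and predicted labels of the training cases are available. With priors π_j, the re-substitution risk is Σ_j π_j · (misclassified class-j cases)/N_j; with no priors it reduces to the plain error rate. The abstract does not state which of the two the reported 35.2% and 33.5% figures are, and the code does not reproduce them.

```python
from collections import Counter


def resubstitution_risk(y_true, y_pred, priors=None):
    """Re-substitution estimate of misclassification risk on the training cases.

    priors=None: plain error rate (misclassified cases / all cases).
    priors={class: pi_j}: sum_j pi_j * (misclassified class-j cases / N_j).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    if priors is None:
        errors = sum(t != p for t, p in zip(y_true, y_pred))
        return errors / len(y_true)
    totals = Counter(y_true)
    wrong = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return sum(priors[j] * wrong[j] / totals[j] for j in totals)


# Toy labels (not from the study): one error in each class.
actual = ["S", "S", "S", "S", "I", "I"]
predicted = ["S", "S", "S", "I", "I", "S"]
print(resubstitution_risk(actual, predicted))                        # plain rate
print(resubstitution_risk(actual, predicted, {"S": 0.5, "I": 0.5}))  # equal priors
```

Here the plain rate is 2/6 ≈ 0.333, while the equal-prior risk is 0.5 · 1/4 + 0.5 · 1/2 = 0.375. Equal priors give more weight to errors in the smaller class.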

Key Words: Logistic Regression; Linear Discriminant Analysis; Decision trees; CART; CHAID; Exhaustive CHAID; QUEST; Genetic Algorithm; Neural Networks; Classification; Gini Measure; Re-substitution Index

Authors:
Amr I. Abdelrahman, amrelatraby@yahoo.com
Dina H. Abdel-Hady, dina1002007@yahoo.com

Editor: Ahmed Youssef, ahyoussef@hotmail.com

READING THE ARTICLE: You can read the article in Portable Document Format (.pdf) (265,442 bytes).

NOTE: The content of this article is the intellectual property of the authors, who retain all rights to future publication.

This page has been accessed 1628 times since JULY 2, 2010.


Return to the InterStat Home Page.