Modeling Count Data with a High Proportion of Zeros (M.L.E. Calculation Using Simulated Annealing)

by Santanu Dutta.

Abstract: The data sets considered in this paper are non-negative and integer valued, and they generally contain a small number of distinct values with a high proportion of zeros. To model such frequency count data we consider a wide variety of discrete distributions as possible models. For most of these distributions, the maximum likelihood estimates cannot be obtained analytically (viz. by directly solving the maximum likelihood equations). We suggest treating maximum likelihood estimation as a non-linear optimization problem, and we calculate the maximum likelihood estimates of the model parameters using a stochastic optimization algorithm named 'Simulated Annealing'. For model selection we use both A.I.C. and B.I.C., which are penalized log-likelihood based model selection rules. Because both rules require the maximized log-likelihood, using Simulated Annealing to maximize the likelihood or log-likelihood also aids model selection, and it enables us to experiment with a large number of discrete probability models for each data set. Parametric bootstrapping is used to approximate the variances, covariances and means of the sampling distribution of the maximum likelihood estimates, and also to set confidence limits wherever necessary. The necessity of, and motivation for, using Simulated Annealing and parametric bootstrapping are stated in the 'Introduction' and 'Section 1'. In 'Section 2' we consider four real-life data sets and a simulated experiment, and we find the Negative Binomial, Geometric, Zero Inflated Logarithmic, Zero Inflated Poisson and Generalized Poisson distributions to be very useful for modeling frequency count data that have a high proportion of zeros and exhibit overdispersion. The detailed findings and comparisons are reported in the conclusion.
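The approach described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the frequency table `counts`, the parameter bounds, the geometric cooling schedule and the step-size rule are all assumptions made for illustration. The sketch fits a Zero Inflated Poisson model and a plain Poisson model by simulated annealing on the log-likelihood, then compares them by A.I.C. and B.I.C.

```python
import math
import random

# Hypothetical frequency table (value -> frequency); not data from the paper.
counts = {0: 60, 1: 15, 2: 12, 3: 8, 4: 3, 5: 2}
n_obs = sum(counts.values())

def log_pois(k, lam):
    """Log of the Poisson probability mass at k."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def pois_loglik(params):
    (lam,) = params
    return sum(n_k * log_pois(k, lam) for k, n_k in counts.items())

def zip_loglik(params):
    """Zero Inflated Poisson: extra mass pi at zero, Poisson(lam) otherwise."""
    pi, lam = params
    ll = 0.0
    for k, n_k in counts.items():
        p = (1 - pi) * math.exp(log_pois(k, lam))
        if k == 0:
            p += pi
        ll += n_k * math.log(p)
    return ll

def anneal(loglik, start, bounds, n_iter=20000, t0=1.0, cooling=0.9995, seed=0):
    """Maximize loglik by simulated annealing (assumed geometric cooling)."""
    rng = random.Random(seed)
    current, cur_ll = list(start), loglik(start)
    best, best_ll = current[:], cur_ll
    temp = t0
    for _ in range(n_iter):
        # Gaussian proposal, clamped to the bounds; steps shrink as temp falls.
        cand = [min(hi, max(lo, x + rng.gauss(0, 0.1 * (hi - lo) * math.sqrt(temp))))
                for x, (lo, hi) in zip(current, bounds)]
        cand_ll = loglik(cand)
        # Always accept uphill moves; accept downhill moves with
        # probability exp(decrease / temperature).
        if cand_ll >= cur_ll or rng.random() < math.exp((cand_ll - cur_ll) / temp):
            current, cur_ll = cand, cand_ll
            if cur_ll > best_ll:
                best, best_ll = current[:], cur_ll
        temp *= cooling
    return best, best_ll

def aic(ll, k):
    return 2 * k - 2 * ll

def bic(ll, k, n):
    return k * math.log(n) - 2 * ll

eps = 1e-6
zip_hat, zip_ll = anneal(zip_loglik, [0.5, 1.0], [(eps, 1 - eps), (eps, 20.0)])
pois_hat, pois_ll = anneal(pois_loglik, [1.0], [(eps, 20.0)])

print("ZIP estimate (pi, lam):", zip_hat, "AIC:", aic(zip_ll, 2), "BIC:", bic(zip_ll, 2, n_obs))
print("Poisson estimate (lam):", pois_hat, "AIC:", aic(pois_ll, 1), "BIC:", bic(pois_ll, 1, n_obs))
```

The model with the smaller A.I.C. or B.I.C. value is preferred. Since the annealer only needs a function that evaluates the log-likelihood, other discrete models can be tried by supplying a different `loglik` and bounds.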

Key Words: count data with high proportion of zeros, Simulated Annealing, parametric bootstrapping, maximum likelihood, model selection, A.I.C., B.I.C.

Author:
Santanu Dutta, sdutta@tezu.ernet.in

Editor: Debasis Kundu, kundu@iitk.ac.in

READING THE ARTICLE: You can read the article in Portable Document Format (.pdf, 81759 bytes).

NOTE: The content of this article is the intellectual property of the author, who retains all rights to future publication.



Return to the InterStat Home Page.