An Economical Sample Size Determination Algorithm for Clinical Data Statistical Analysis

by Hassan Assareh, Mary Waterhouse, Ian Smith, Russell Brighouse, Kelley Foster, and Kerrie Mengersen.

Abstract: For most data analysis problems, sample size formulae are constructed by focusing on statistical characteristics rather than economical constraints. When performing a complicated statistical analysis involving clinical data, such as risk model construction, choosing a sample size which simultaneously satisfies statistical (accuracy and precision) and economical (cost of data inspection and error modification) requirements is non-trivial. This research presents a general data capturing algorithm which addresses this issue. It uses Value of Information theory from a Bayesian decision making context and the concept of Utility. We propose a customized version of the algorithm to determine an appropriate sample size for risk model construction using logistic regression and then apply it to the calibration of the Acute Physiology and Chronic Health Evaluation II (APACHE II), a disease severity scoring system used in intensive care units, for various utility scenarios. We also outline extensions which could be made to the framework and techniques.

Key Words: Bayesian Statistics, Data Quality, Logistic Regression, Modification Cost, Optimization, Risk Model, Sample Size, Utility, Value of Information

Editor: James Knaub

READING THE ARTICLE: You can read the article in Portable Document Format (.pdf) (375506 bytes).

NOTE: The content of this article is the intellectual property of the authors, who retain all rights to future publication.

Return to the InterStat Home Page.