Quasi-Cutoff Sampling and Simple Small Area Estimation with Nonresponse

by James R. Knaub, Jr.

Abstract: Here, small area estimation is applied in the sense that we are “borrowing strength” from data outside of given subpopulations for which we are to publish estimated totals, means, or ratios of totals. We will consider estimated totals for establishment surveys. A subpopulation for which we wish to estimate a total will be called a “publication group” (PG), and data that may be modeled together, using one regression, will be called an “estimation group” (EG). See Knaub (1999, 2001, 2003) for a more complex application. When a PG consists of a set of EGs, that is stratification. When an EG contains PGs, this is a simple form of small area estimation, because data outside of a given publication group help estimate the parameters of the model, which is then used to estimate for each impacted PG. (In Knaub (1999, 2001), there are overlapping ‘areas’ as well.) Here we consider very small areas (PGs), which may fall within a ‘larger’ EG, and only one regressor, though this could be generalized. Sample sizes and population sizes considered in this paper can be very small within a given PG, say a State and economic end-use sector. In the case of n = N = 1, a single response is the total for that PG. If that PG is part of an EG with other data and there is a nonresponse, an estimate may be obtained in place of that observation to contribute, for example, to a US-level aggregate for that end-use sector, and a variance contribution to be added to the US-level variance would be found as well. Further, a scatterplot for such an estimation group, especially if a confidence band were constructed (Knaub (2009), section 4, and Knaub (2012b), Figure 1), could be used to help edit data. If that PG with n = N = 1 were looked at alone, one could not have a scatterplot that would determine whether a response were reasonable for the current circumstances.
(A forecast for that one point would not be as good if some event were to cause a break in the time series, and one would have to consider a time series for every single point, many more graphs, and for some there would be no series available. But a scatterplot to accompany this regression modeling would consider every point used in the model. Data for which there are no regressor data, such as “births,” are “added on” to totals outside of modeling.) Techniques here may be used for estimation (“prediction”) for sample surveys, and to impute for nonresponse for sample surveys and census surveys. There may be applications to other fields of statistics as well.
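
As a rough illustration of the idea described above, the following Python sketch pools all respondents in one EG to fit a single-regressor model through the origin, then predicts the missing responses within each PG. The abstract does not state the model form, so the sketch assumes the classical ratio model (y = b·x + error, with error variance proportional to x, i.e. weighted least squares with weights 1/x). The function name, the record layout, and the data values are all hypothetical and are not taken from the article.

```python
from collections import defaultdict


def predict_pg_totals(records):
    """Estimate PG totals within one EG under an ASSUMED ratio model.

    records: list of (pg, x, y) tuples; y is None for a nonrespondent
    or out-of-sample unit. Every x must be positive.
    Returns {pg: (estimated_total, variance_of_prediction_error)}.
    """
    observed = [(pg, x, y) for pg, x, y in records if y is not None]
    n = len(observed)
    x_s = sum(x for _, x, _ in observed)
    y_s = sum(y for _, _, y in observed)

    # WLS slope with weights 1/x reduces to the ratio of observed sums.
    b = y_s / x_s
    # Model variance scale, estimated from the weighted residuals.
    sigma2 = sum((y - b * x) ** 2 / x for _, x, y in observed) / (n - 1)

    obs_total = defaultdict(float)   # sum of observed y per PG
    x_missing = defaultdict(float)   # sum of x for missing units per PG
    for pg, x, y in records:
        if y is None:
            x_missing[pg] += x
        else:
            obs_total[pg] += y

    results = {}
    for pg in set(obs_total) | set(x_missing):
        x_r = x_missing[pg]
        total = obs_total[pg] + b * x_r
        # Prediction-error variance: slope uncertainty plus unit-level noise.
        variance = sigma2 * (x_r ** 2 / x_s + x_r)
        results[pg] = (total, variance)
    return results


# Hypothetical EG with three PGs; PG "C" has n = N = 1 and a nonresponse.
records = [
    ("A", 10.0, 21.0),
    ("A", 20.0, 38.0),
    ("B", 5.0, 11.0),
    ("B", 15.0, None),
    ("C", 8.0, None),
]
results = predict_pg_totals(records)
for pg in sorted(results):
    print(pg, results[pg])
```

In this sketch, PG "C" has no response of its own, yet it still receives an estimate and a variance because the slope is estimated from the other PGs in the EG. Note that the per-PG variances share the same estimated slope, so an EG-level variance under this assumed model would be computed from the pooled sum of missing x values rather than by adding the per-PG figures.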

Key Words: Regression, Model-Based Estimation, Weighted Least Squares, Scatterplots, Small Area Estimation, Data Editing, Establishment Surveys, Seasonality, Borrowing Strength

James R. Knaub, Jr., jamesRknaub@gmail.com

Editor: Richard G. Graf, rgraf@sunstroke.sdsu.edu

READING THE ARTICLE: You can read the article in Portable Document Format (.pdf) (451790 bytes).
For comments or further discussion, contact the author.

NOTE: The content of this article is the intellectual property of the authors, who retain all rights to future publication.

