# Sampling Error and Variance Estimation

## SAMPLE DESIGN

IPUMS-International is a collection of sample microdata based on subsets of full population data from countries around the world. Population estimates calculated from IPUMS data are subject to sampling error. Whenever possible, researchers who need accurate summary population totals should rely on official published tables from the country of origin. The main purpose of the census microdata in IPUMS is for the calculation of multivariate models and specialized statistics that are not available in summary tables.

The microdata samples in IPUMS-International employ a variety of sample designs. All IPUMS samples contain individual level data, most are clustered by household, many are stratified, and some are differentially weighted. The IPUMS samples are either systematically drawn from full count data by IPUMS (or according to IPUMS specifications) or they are drawn by the statistical offices of the country of origin according to a variety of complex sample designs. Where possible, IPUMS-International provides 10% samples of census data by selecting every 10th household after a random start. Samples drawn by countries of origin employ a variety of complex sampling techniques that may include oversampling, clustering and stratification.

All of these sample designs have potential effects on multivariate standard error calculation, and all data should be weighted to ensure representative estimates. This summary describes how sample design affects sample precision, estimates the resulting differences in standard errors across variables and IPUMS samples, and discusses strategies for obtaining unbiased and efficient estimates of statistical significance. For further discussion of the material presented here, see the IPUMS-International working paper by **Cleveland et al.** The **Sample Design Summary** page provides a quick reference to sample design characteristics that affect standard error estimation. The **IPUMS Sample Descriptions** page provides detailed information about the designs of the microdata samples. We have also calculated summary estimates suggestive of the quality of **age reporting** in the microdata samples.

## SAMPLING ERROR

The sample design of IPUMS-International samples can significantly affect the precision of sample estimates, particularly in samples that employ complex sampling designs. Estimates derived from any sample are subject to sampling variability, which is usually measured as the standard error. The standard error of a sample statistic estimates the variation of that statistic across many similar samples drawn from the same population. Approximately two-thirds of random samples will produce estimates within one standard error of the full-population value, and approximately 95 percent will produce estimates within two standard errors. Standard errors depend on both sample size and sample design.
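
The two coverage figures quoted above follow from the normal approximation to the sampling distribution. The short check below is only a sketch: it assumes the estimate is approximately normally distributed, which is reasonable for samples of this size, and the function name `coverage_within` is introduced here for illustration.

```python
from statistics import NormalDist

def coverage_within(k: float) -> float:
    """Share of samples whose estimate falls within k standard errors of the
    population value, assuming an approximately normal sampling distribution."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

print(round(coverage_within(1), 3))  # roughly two-thirds
print(round(coverage_within(2), 3))  # roughly 95 percent
```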

### CLUSTERING AND DIFFERENTIAL WEIGHTING

Nearly all the IPUMS samples are cluster samples: they are samples of households rather than individuals. Individuals are sampled as parts of households because many important topics of analysis, such as fertility, household composition, and nuptiality, require information about multiple individuals within the same household. In addition, some countries drew samples from select regions or sub-regions within the country, thereby adding a further layer of geographic clustering. Sometimes, the sample designs use differential probabilities of selection resulting in heterogeneity in sample weights. Clustering and weighting violate the assumption of independent observations and can produce exaggerated estimates of statistical significance.

Consider the case of clustering by households. Some individual characteristics, such as religion or ethnicity, are highly correlated within households. Suppose we wish to estimate the standard error for the proportion of the population that reports Buddhism as their religion. If one household member reports Buddhist religion, the odds are high that other household members are also Buddhist. If we had the sort of sample generally assumed by statistics textbooks - an independent random sample of all individuals in the population - the standard error for Buddhism would be inversely proportional to the square root of the number of individuals in the sample. But because the samples are cluster samples and Buddhism is highly correlated within clusters, the usual method for calculating standard errors would overestimate sample precision.

Standard errors in cluster samples depend on both the number of clusters sampled and on the homogeneity of variables within clusters. Calculation of standard errors for cluster samples is complicated. In the worst case, with perfect homogeneity within clusters, the standard errors for variables would be inversely proportional to the square root of the number of clusters rather than the number of individuals. For variables that are heterogeneous within clusters, such as age and sex, clustering may have little or no effect on sample precision.
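
A rough way to see both extremes described above is the common textbook approximation deff ≈ 1 + (m − 1)ρ, where m is the average cluster size and ρ is the within-cluster (intraclass) correlation. That approximation is not taken from this document, and the function names below are hypothetical. Treat the sketch as an illustration of the reasoning, not as a method for correcting real estimates.

```python
import math

def approx_design_effect(avg_cluster_size: float, rho: float) -> float:
    """Textbook approximation: variance inflation from equal-sized clusters
    with intraclass correlation rho (an assumption, not an IPUMS formula)."""
    return 1.0 + (avg_cluster_size - 1.0) * rho

def clustered_se_of_proportion(p, n_individuals, avg_cluster_size, rho):
    """Simple-random-sample SE of a proportion, inflated by the square root
    of the approximate design effect."""
    srs_se = math.sqrt(p * (1.0 - p) / n_individuals)
    return srs_se * math.sqrt(approx_design_effect(avg_cluster_size, rho))

# 50,000 individuals living in 10,000 households of 5 persons each.
perfectly_homogeneous = clustered_se_of_proportion(0.3, 50_000, 5, rho=1.0)
fully_heterogeneous = clustered_se_of_proportion(0.3, 50_000, 5, rho=0.0)

# With rho = 1 the SE matches a simple random sample of 10,000 (the number of
# clusters); with rho = 0 it matches a simple random sample of 50,000.
print(perfectly_homogeneous, math.sqrt(0.3 * 0.7 / 10_000))
print(fully_heterogeneous, math.sqrt(0.3 * 0.7 / 50_000))
```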

The impact of clustering therefore varies from variable to variable. It also varies from census year to census year and from country to country. The homogeneity of particular characteristics within clusters can change over time. For example, as ethnic intermarriage increases, ethnicity within households becomes less homogeneous. The larger the average size of clusters, the smaller the number of independent observations. Household size has fallen dramatically in many countries over the past century as fertility has declined, extended families have become less common, and more persons live alone. Thus, in many countries, the samples for recent years have smaller clusters, on average, than those for earlier years.

Cluster size is also influenced by the size and treatment of group quarters. To maximize sample precision, large dwelling units are sampled at the individual level in all census samples. Instead of treating a prison with 1,000 inmates as a single sample unit, individuals are sampled as if they were 1,000 one-person households, which multiplies the number of independent observations for persons in large units. The rule for designating units to be sampled on an individual basis varies among the samples, but the threshold for splitting a household into individual units is typically between 30 and 50 persons.

### STRATIFICATION

All of the IPUMS-International samples are implicitly or explicitly stratified. That is, they divide the population into strata based on key characteristics, and then sample separately from each stratum. This ensures that each stratum is proportionately represented in the final sample.

Stratification has the opposite effect of clustering; it increases the precision of sample estimates. It does so not only for those characteristics that are explicitly stratified, but also for any other characteristics that are correlated with them. In some cases, the positive effects of stratification outweigh the adverse effects of clustering, so the IPUMS sample designs can actually yield smaller standard errors than would be obtained through a simple random sample of similar size.

Nearly all of the samples in IPUMS-International are implicitly stratified in some way. In some cases, the IPUMS samples incorporate explicit stratification by such factors as household size, geographic location, household or even individual characteristics. More often, however, implicit geographic stratification occurs as an indirect byproduct of the sample design. Most IPUMS-International samples are systematic samples, typically drawn by selecting every nth household in the source file after designating a random starting point. The data are usually collected through direct enumeration, a procedure by which census enumerators travel from block to block and from village to village within a specified geographic area, recording census information in a roughly systematic order. Even where absolute sequential order is not preserved prior to sampling, census data are often sorted according to small geographic areas so that systematic sampling is equivalent to low-level geographic stratification. The IPUMS-International samples contain a more even geographical distribution of households than would be expected from a true random sample. This stratification directly improves the precision of geographic variables such as region and urban residence and indirectly improves the precision of variables highly correlated with geography, like race, ethnicity, education, household utilities, and dwelling characteristics.
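
The link between systematic selection and implicit geographic stratification can be sketched in a few lines. The example assumes a hypothetical source file of 3,000 households listed region by region. The field names and the `systematic_sample` helper are illustrative and do not represent the IPUMS sampling code.

```python
import random
from collections import Counter

def systematic_sample(households, interval=10, rng=None):
    """Select every `interval`-th household after a random start."""
    rng = rng or random.Random()
    start = rng.randrange(interval)  # random start in 0..interval-1
    return households[start::interval]

# Hypothetical source file: households appear in enumeration order,
# so each region occupies one contiguous block of 1,000 records.
source_file = [{"serial": i, "region": i // 1000} for i in range(3000)]

sample = systematic_sample(source_file, interval=10, rng=random.Random(1))
region_counts = Counter(h["region"] for h in sample)
print(region_counts)
```

Because each region's block length is a multiple of the sampling interval, every region contributes exactly 100 households whatever the random start. A simple random sample of 300 households would instead scatter around 100 per region.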

### ESTIMATING SAMPLING ERRORS

One consequence of the use of complex sample design is that sampling errors of estimates cannot be computed using a typical textbook formula. The basic formula for computing the variance of an estimate is based on the assumption that data were sampled using simple random sampling with replacement (unrestricted sampling). Kish (1965) used the term "design effect" to refer to the ratio of the variance of an estimate from a complex design to the variance of an estimate from an unrestricted (SRS) sample of the same size. The square root of the design effect, the "design factor," indicates the degree to which the simple random sample estimate of standard error differs from an estimate that explicitly incorporates information about the sample design.
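
In symbols, writing $\hat{\theta}$ for the estimate (notation introduced here for illustration), the two quantities defined above are:

$$
\mathrm{deff} = \frac{\operatorname{Var}_{\text{complex}}(\hat{\theta})}{\operatorname{Var}_{\text{SRS}}(\hat{\theta})},
\qquad
\mathrm{deft} = \sqrt{\mathrm{deff}} = \frac{\mathrm{SE}_{\text{complex}}(\hat{\theta})}{\mathrm{SE}_{\text{SRS}}(\hat{\theta})}
$$

For example, a design effect of 4.0 corresponds to a design factor of 2.0: the standard error under the complex design is twice what the simple random sample formula reports.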

To illustrate the types of variables most significantly influenced by sample design, Table I shows estimated design factors for selected variables in several IPUMS samples. The design factors were generated in three steps: dividing the full count data set into 100 subsample replicates that mimic the sample design of the IPUMS-International public use sample; calculating the standard deviation of the expected value of each variable across the 100 subsamples; and dividing the result by the standard error that statistical theory predicts, from the publicly released IPUMS 10% sample, for a simple random sample of the same size.
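
The replicate procedure can be imitated on simulated data. The sketch below is a simplified stand-in, not the code used to produce Table I. It assumes a made-up full count file of 20,000 four-person households in random order, splits it into 100 systematic replicates, and compares the spread of the replicate estimates with the simple random sample formula for one replicate. All names and parameters are hypothetical.

```python
import math
import random
import statistics

def design_factor_from_replicates(households, n_replicates=100):
    """households: list in source-file order; each household is a list of
    0/1 indicators, one per person. Returns empirical SE / SRS SE."""
    everyone = [v for hh in households for v in hh]
    p = sum(everyone) / len(everyone)          # full count proportion
    estimates, sizes = [], []
    for start in range(n_replicates):
        # Replicate = every n_replicates-th household from this start.
        persons = [v for hh in households[start::n_replicates] for v in hh]
        estimates.append(sum(persons) / len(persons))
        sizes.append(len(persons))
    empirical_se = statistics.stdev(estimates)  # spread across replicates
    srs_se = math.sqrt(p * (1 - p) / statistics.mean(sizes))
    return empirical_se / srs_se

rng = random.Random(2015)
# Trait shared by everyone in the household (like religion in the text).
homogeneous = [[int(rng.random() < 0.3)] * 4 for _ in range(20_000)]
# Trait assigned independently to each person (like sex).
heterogeneous = [[int(rng.random() < 0.3) for _ in range(4)]
                 for _ in range(20_000)]

deft_homogeneous = design_factor_from_replicates(homogeneous)
deft_heterogeneous = design_factor_from_replicates(heterogeneous)
print(deft_homogeneous, deft_heterogeneous)
```

With households of four, the perfectly homogeneous trait should land near a design factor of 2 and the independent trait near 1, subject to simulation noise. The simulated file has no geographic ordering, so it shows no stratification gain.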

Table I. Estimated design factors for selected variables in several IPUMS samples

| Variable | Ghana 2000 | Bolivia 2001 | Mongolia 2000 | Rwanda 2002 |
|---|---|---|---|---|
| **Household Level Variables** | | | | |
| Number of persons | 1.0 | 0.9 | 0.9 | 0.8 |
| Number of non-relatives | 1.0 | 0.9 | 0.9 | 0.9 |
| Electricity | 0.5 | 0.6 | 0.5 | 0.7 |
| Toilet | 0.6 | 0.7 | 0.6 | 0.8 |
| Phone | NA | 0.8 | 0.8 | 0.9 |
| Kitchen | 0.8 | 1.0 | 0.6 | NA |
| Bath | 0.7 | NA | 0.6 | NA |
| Floor Material | NA | 0.6 | NA | 0.6 |
| Ownership | NA | 0.9 | NA | 0.7 |
| Radio | NA | 0.9 | NA | NA |
| **Individual Level Variables** | | | | |
| Age | 1.0 | 1.1 | 0.9 | 1.0 |
| Sex | 1.0 | 1.0 | 0.9 | 0.9 |
| Marital Status | 0.9 | 1.1 | 1.2 | 1.0 |
| Literacy | 1.2 | 1.2 | 1.0 | 1.0 |
| Employment Status | 1.1 | 1.0 | 1.0 | 0.8 |
| **Ethnicity and Religion Variables (Individual Level)** | | | | |
| Ethnicity Ghana | Akan 1.9, Mole 2.1 | | | |
| Ethnicity Bolivia | | Quechua 1.2, Aymara 1.2 | | |
| Ethnicity Mongolia | | | Kalick 1.5, Kazak 1.2 | |
| Religion Rwanda | | | | Catholic 2.0, Protestant 2.1 |

A design factor of 1.0 means that the effects of stratification and clustering on sample precision are either negligible or cancel one another out. If the design factor is 1.0, standard statistical output that uses simple random sample assumptions would produce reliable significance statistics. A design factor larger than 1.0 means that the empirically observed standard errors are greater than those a standard formula assuming simple random sampling would produce. Statistical procedures that do not account for sample design would overestimate statistical significance in such cases. Conversely, a design factor less than 1.0 means that the sample is more precise than standard statistical tests would predict, so the resulting significance tests would be conservative.

Only a few variables have design factors that regularly exceed 1.0 by a wide margin. The most dramatic are race, religion, or ethnicity variables at the individual level. This reflects the influence of clustering, since households are frequently extremely homogeneous with respect to race, ethnicity and religion. Birthplace, language and citizenship status also tend to be homogeneous within clusters and often have relatively high design factors. Of course, any household characteristics attributed to all individuals within a household will be perfectly clustered by household. Researchers should be aware of this effect and make adjustments to their analysis if it involves variables highly subject to household clustering.

The design factors for household level variables are calculated from household, rather than individual, records. They are frequently lower than 1.0 because they reflect the influence of stratification, in the absence of clustering. As described above, most IPUMS-International samples are systematic geographically representative samples. Geographic stratification tends to decrease empirically observed standard errors in variables that are also geographically sorted, like household utilities or dwelling characteristics. For most applications using very large datasets these design effects can safely be ignored, because they only yield conservative estimates of significance. It may be helpful, however, to correct for overestimated standard errors in analyses of small populations or when analyzing data from complex samples. We suggest methods for doing so below.

Although illustrative, the design factors presented in Table I are inappropriate for adjusting standard errors in complex multivariate analyses or for subpopulation estimates. They are only valid for analyses of all individuals in the given country as a whole, and results for any population subgroup could differ significantly. In addition, many of the variables presented in Table I have been coded as dichotomous variables reflecting whether or not a person or household reported the characteristic noted in the table. Design factors, however, can differ across categories within a given variable. For example, the categories "head (reference person)" and "spouse" from the relationship variable have low design factors because there is no potential for homogeneity within households in non-polygamous countries. The design factor for "child" from the same variable is higher because households with children often have more than one child.

Users interested in further information about a variety of standard error estimation methods should see the **IPUMS User Note**. The note was originally written to explain approaches to standard error estimation for the IPUMS-USA data project, but the methods are relevant to all census microdata sample design issues and to complex samples more broadly. The note also discusses the weakness of the design factor approach to standard error estimation in greater detail. In short, design factors are not useful as corrective multipliers in complex data analysis, especially given the computing options available in current statistical software packages. Below, we discuss options available to users who wish to adjust their estimation for complex sample design.

## SAMPLE DESIGN ESTIMATION ADJUSTMENTS

The IPUMS samples are large, and for the great majority of studies there is little risk of drawing invalid inferences because of underestimated variance. For studies of weak relationships or small population subgroups, however, there can be risk of misleading estimates of statistical significance. This section provides guidance for minimizing such risk.

Underestimates of sample variance can arise from clustering and weighting. As shown in Table 1, most IPUMS-International data are systematic unweighted samples with no geographic clustering. For such samples, there is limited potential for underestimated variance that might lead to invalid statistical inferences. The only concern in these instances is clustering by households. Most census research has minimal household clustering because it focuses on particular subpopulations that rarely cluster in households. For example, studies of fertility focus on women of childbearing age, and households typically include only one such woman. The clustering concern for the systematic unweighted samples can arise with studies of children, since households often include multiple children. When doing analyses of children and other groups likely to appear multiple times in the same household, researchers can adopt strategies to eliminate the redundant cases. Instead of assessing the characteristics of all children, for example, one can look at eldest children, or youngest children, or children of a particular age, or a randomly selected child from each household.
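
The last strategy, keeping one randomly selected child per household, might look like the sketch below. `SERIAL` and `PERWT` are the IPUMS variables named in this document. The `AGE` field, the under-15 cutoff, and the `ADJWT` adjusted weight are assumptions made for illustration. The weight adjustment reflects the fact that a child in a household with k children is kept with probability 1/k.

```python
import random
from collections import defaultdict

def one_child_per_household(persons, rng, max_child_age=14):
    """Keep one randomly chosen child from each household (by SERIAL)."""
    children_by_household = defaultdict(list)
    for person in persons:
        if person["AGE"] <= max_child_age:
            children_by_household[person["SERIAL"]].append(person)

    selected = []
    for children in children_by_household.values():
        child = dict(rng.choice(children))  # copy; leave the input untouched
        # Hypothetical adjusted weight: compensate for the 1/k chance of
        # being the child selected within a household of k children.
        child["ADJWT"] = child["PERWT"] * len(children)
        selected.append(child)
    return selected

persons = [
    {"SERIAL": 1, "AGE": 35, "PERWT": 10},
    {"SERIAL": 1, "AGE": 9, "PERWT": 10},
    {"SERIAL": 1, "AGE": 6, "PERWT": 10},
    {"SERIAL": 1, "AGE": 2, "PERWT": 10},
    {"SERIAL": 2, "AGE": 40, "PERWT": 10},
    {"SERIAL": 2, "AGE": 12, "PERWT": 10},
]
children = one_child_per_household(persons, random.Random(0))
print([(c["SERIAL"], c["AGE"], c["ADJWT"]) for c in children])
```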

An alternative, thanks to improvements in the analytical power of modern statistical software, is to incorporate information about sample design into estimation procedures. All major statistical software programs, including SAS, Stata, SPSS, and R, now allow researchers to specify basic elements of complex sample design. These programs use Taylor series linearization to adjust variance estimates and tests of statistical significance. IPUMS users can specify the household identifier (SERIAL) as the cluster variable (or primary sampling unit) for any analysis that might be influenced by household clustering, and can also specify the weight variable (PERWT) to account for the effects of heterogeneous sample weights. The IPUMS staff is developing a cluster variable that will offer the potential for more refined variance estimates. The new variable will identify geographic clustering as well as household clustering.
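
To make the idea concrete, the sketch below computes a linearized standard error for a weighted mean, treating `SERIAL` as the cluster and `PERWT` as the weight. It is a minimal single-stratum, with-replacement version written for illustration. Survey procedures in the packages named above are more general, and the function name and the `BUDDHIST` indicator are hypothetical.

```python
import math
from collections import defaultdict

def weighted_mean_with_cluster_se(records, value_key):
    """Weighted mean and its linearized SE with households as clusters."""
    total_w = sum(r["PERWT"] for r in records)
    mean = sum(r["PERWT"] * r[value_key] for r in records) / total_w

    # Sum the linearized residuals within each household (cluster).
    cluster_totals = defaultdict(float)
    for r in records:
        cluster_totals[r["SERIAL"]] += (
            r["PERWT"] * (r[value_key] - mean) / total_w
        )

    n = len(cluster_totals)  # number of clusters, not number of persons
    variance = n / (n - 1) * sum(z * z for z in cluster_totals.values())
    return mean, math.sqrt(variance)

# Two households of two persons; the trait is identical within households.
records = [
    {"SERIAL": 1, "PERWT": 1.0, "BUDDHIST": 1},
    {"SERIAL": 1, "PERWT": 1.0, "BUDDHIST": 1},
    {"SERIAL": 2, "PERWT": 1.0, "BUDDHIST": 0},
    {"SERIAL": 2, "PERWT": 1.0, "BUDDHIST": 0},
]
mean, cluster_se = weighted_mean_with_cluster_se(records, "BUDDHIST")
print(mean, cluster_se)
```

In this toy case the clustered standard error is 0.5, twice the 0.25 that the textbook formula for four independent observations would give, because the data contain only two independent households.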

As of September 2015, we have added a new variable to aid in accounting for the effects of stratification on sample variance. As discussed above, stratification improves the precision of samples, and findings of statistical significance without adjustments for stratification will be conservative. Accordingly, adjusting for stratification effects is of less concern than adjusting for clustering. The new STRATA variable includes information about explicit strata whenever such information is available, and includes geographic pseudo-strata for systematic samples following the procedure described in **Davern et al. (2009)**.

Researchers using complex sample procedures should be cautious about using case selection to study subpopulations. In order to correctly adjust for clustering, the statistical software relies on information about the structure of the full sample. When the data are subset, clusters that contain no cases with the characteristic of interest may be dropped, and adjusted estimates based on the resulting incomplete information will be wrong. If subsetting is essential to the analysis, some statistical packages have domain analysis features that retain full information for purposes of calculating sample design adjustments. Again, see the **IPUMS User Note** for further detail about dealing with subpopulations.
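
The difference between subsetting and domain analysis can be seen in a small sketch. It assumes a stratified, with-replacement linearization in which every cluster in the full sample is retained and cases outside the subpopulation contribute zero. `STRATA`, `SERIAL`, and `PERWT` are the IPUMS variables named in this document. The function name, the `Y` outcome, and the `FEMALE` flag are hypothetical.

```python
import math
from collections import defaultdict

def domain_mean_se(records, value_key, in_domain):
    """Weighted domain mean and linearized SE, keeping all clusters."""
    domain = [r for r in records if in_domain(r)]
    total_w = sum(r["PERWT"] for r in domain)
    mean = sum(r["PERWT"] * r[value_key] for r in domain) / total_w

    # Every cluster gets an entry; out-of-domain cases add zero.
    z = defaultdict(lambda: defaultdict(float))
    for r in records:
        contrib = 0.0
        if in_domain(r):
            contrib = r["PERWT"] * (r[value_key] - mean) / total_w
        z[r["STRATA"]][r["SERIAL"]] += contrib

    variance = 0.0
    for clusters in z.values():
        n_h = len(clusters)
        if n_h < 2:
            continue  # this sketch skips strata with a single cluster
        z_bar = sum(clusters.values()) / n_h
        variance += n_h / (n_h - 1) * sum(
            (v - z_bar) ** 2 for v in clusters.values()
        )
    return mean, math.sqrt(variance)

records = [
    {"STRATA": 1, "SERIAL": 1, "PERWT": 1.0, "FEMALE": 1, "Y": 1},
    {"STRATA": 1, "SERIAL": 2, "PERWT": 1.0, "FEMALE": 1, "Y": 0},
    {"STRATA": 1, "SERIAL": 3, "PERWT": 1.0, "FEMALE": 0, "Y": 1},
    {"STRATA": 1, "SERIAL": 4, "PERWT": 1.0, "FEMALE": 0, "Y": 0},
]
is_female = lambda r: r["FEMALE"] == 1

# Domain analysis: all four clusters inform the variance calculation.
_, se_domain = domain_mean_se(records, "Y", is_female)
# Subsetting first: the software sees only two clusters.
subset = [r for r in records if is_female(r)]
_, se_subset = domain_mean_se(subset, "Y", lambda r: True)
print(se_domain, se_subset)
```

The point estimate is the same either way. Only the standard error changes, because subsetting hides the clusters that contain no members of the subpopulation.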

For most analyses using IPUMS data, there is little risk of drawing invalid conclusions due to underestimated variance. When examining relationships on the margin of statistical significance, however, it may be wise to adjust for household clustering and weighting as outlined above. These procedures will yield conservative estimates of statistical significance for all IPUMS samples except the few that incorporate geographic clustering. The **Sample Design Summary** page provides a quick reference to sample design characteristics that affect standard error estimation.