Harmonization

International census samples employ differing numeric classification systems, and reconciling these codes is a major part of this project. Variables must be easy to use for comparisons across time and space, which requires that we provide the lowest common denominator of detail that is fully comparable. At the same time, we must retain all meaningful detail in each sample, even when it is unique to a single dataset.

For most variables, it is impossible to construct a single uniform classification without losing information. Some samples provide far more detail than others, so the lowest common denominator of all samples inevitably loses important information. Composite coding schemes offer a solution. The first one or two digits of the code provide information available across all samples. The next one or two digits provide additional information available in a broad subset of samples. Finally, trailing digits provide detail that is only rarely available. For example, in IPUMS-International, the first digit of the variable for marital status is comparable across all samples. The second digit delineates consensual unions from other forms of marriage (where appropriate) and distinguishes among the categories separated, divorced, and married with spouse absent. The final digit provides additional detail within the married and married-spouse-absent categories (such as polygamous marriages in Kenya). The basic goal of our harmonization efforts is to simplify use of the data while losing no meaningful information.
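The composite scheme described above can be illustrated with a short Python sketch. It assumes a three-digit code in which each leading prefix is a coarser, more widely comparable category. The code values, labels, and function names here are hypothetical and chosen only for illustration; they are not the actual IPUMS-International marital status codes.

```python
# Hypothetical three-digit composite codes for marital status.
# These values are illustrative only, not the real IPUMS-International codes.
DETAILED_LABELS = {
    100: "single/never married",
    210: "married, formal union",
    211: "married, monogamous",
    212: "married, polygamous",
    220: "consensual union",
    310: "separated",
    320: "divorced",
    400: "widowed",
}

# Hypothetical labels for the first digit, the level assumed to be
# comparable across all samples.
GENERAL_LABELS = {
    1: "single/never married",
    2: "married/in union",
    3: "separated/divorced",
    4: "widowed",
}


def truncate(code: int, digits: int, width: int = 3) -> int:
    """Return the leading `digits` digits of a `width`-digit composite code."""
    if not 1 <= digits <= width:
        raise ValueError("digits must be between 1 and width")
    return code // 10 ** (width - digits)


def general_label(code: int) -> str:
    """Map a detailed code to its fully comparable first-digit category."""
    return GENERAL_LABELS[truncate(code, 1)]


if __name__ == "__main__":
    for code, label in DETAILED_LABELS.items():
        print(code, label, "->", general_label(code))
```

Under these assumptions, a researcher comparing all samples would analyze `truncate(code, 1)`, while a researcher working with a single detailed sample could use the full code; no detail is discarded from the stored value.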

In addition to providing harmonized codes for variables and accompanying documentation, the IPUMS-International project is carrying out several other tasks to improve data quality, not all of which have been implemented yet. These tasks include the following:
