Geography Variables

IPUMS provides two kinds of geography variables: harmonized and year-specific.

  • Harmonized variables provide consistent geographic units for a country across sample years, to facilitate comparisons over time. Some geographic detail is typically lost in the construction of the harmonized units.
  • Year-specific variables retain all of the original detail from each sample, but they are usually not fully consistent over time.
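The trade-off between the two kinds of variables can be shown with a minimal Python sketch. The unit codes, counts, and crosswalk are made up and are not IPUMS codes. The sketch assumes only that every year-specific unit falls inside exactly one harmonized unit. Summing year-specific counts into harmonized units makes years comparable, but the separate figures for the merged pieces are lost.

```python
from collections import defaultdict

# Hypothetical crosswalk: 2007 year-specific district code -> harmonized code.
# District "B" was split off from "A" after 1997, so both map to the single
# harmonized unit "AB" that covers the same territory in every year.
CROSSWALK_2007 = {"A": "AB", "B": "AB", "C": "C"}


def roll_up(counts_by_unit, crosswalk):
    """Sum counts for year-specific units into their harmonized units."""
    totals = defaultdict(int)
    for unit, count in counts_by_unit.items():
        totals[crosswalk[unit]] += count
    return dict(totals)


counts_2007 = {"A": 12_000, "B": 8_000, "C": 30_000}
print(roll_up(counts_2007, CROSSWALK_2007))  # {'AB': 20000, 'C': 30000}
```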

IPUMS geographic harmonization is performed on first- and second-level administrative units, which are provided by most countries. GIS boundary files are provided for harmonized and year-specific variables at both administrative levels. Some samples also have additional geography variables.

The basic set of geography variables for any IPUMS country includes the following (using Mozambique as an example):


Variable       Geographic level          Type of geography
GEO1_MZ        First level: provinces    Spatially harmonized, 1997-2007 [GIS]
GEO1_MZ1997    First level: provinces    1997 provinces [GIS]
GEO1_MZ2007    First level: provinces    2007 provinces [GIS]
GEO2_MZ        Second level: districts   Spatially harmonized, 1997-2007 [GIS]
GEO2_MZ1997    Second level: districts   1997 districts [GIS]
GEO2_MZ2007    Second level: districts   2007 districts [GIS]
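The names in the table follow a visible pattern. The sketch below generates the basic set of names for a country. It assumes, based only on the Mozambique example, that harmonized variables are named GEO<level>_<country code> and that year-specific variables append the sample year. The function name `basic_geography_variables` is hypothetical, and a real country may have additional or differently named variables.

```python
def basic_geography_variables(country, years):
    """List the basic geography variable names for one country.

    Assumes the pattern seen in the Mozambique example:
    GEO<level>_<country code> for the spatially harmonized variable and
    GEO<level>_<country code><year> for each year-specific variable.
    """
    names = []
    for level in (1, 2):  # first- and second-level administrative units
        names.append(f"GEO{level}_{country}")
        names.extend(f"GEO{level}_{country}{year}" for year in sorted(years))
    return names


print(basic_geography_variables("MZ", [2007, 1997]))
# ['GEO1_MZ', 'GEO1_MZ1997', 'GEO1_MZ2007', 'GEO2_MZ', 'GEO2_MZ1997', 'GEO2_MZ2007']
```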

IPUMS geography is in transition

IPUMS is in the process of applying the geographic variable treatment described above to all samples in the database. Many countries still have geography variables that have not yet been updated and that reflect an older way of organizing this information. These countries lack year-specific variables. Instead, they have variables that are harmonized over time by name only: the codes and labels are consistent across samples, but the boundaries are not necessarily consistent, and some places appear in some samples but not in others. No boundary files are associated with these variables. For these countries, only the first-level geographic units are spatially harmonized.

By summer 2016, all countries are expected to follow the new method described above. The basic set of variables under the old method looks like the following (using Vietnam as an example):

Variable       Geographic level          Type of geography
GEO1_VN        First level: provinces    Spatially harmonized, 1989-2009 [GIS]
GEO1_VNX       First level: provinces    Harmonized by name [Non-GIS]
GEO2_VNX       Second level: districts   Harmonized by name [Non-GIS]
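The two tables suggest a simple way to tell the variable types apart by name. Spatially harmonized variables have no suffix. Year-specific variables end in a four-digit year, and variables harmonized by name end in X. The Python sketch below encodes that pattern. The function `classify_geo_variable` and the assumption of two-letter country codes are illustrative, not an official IPUMS rule.

```python
import re

# Pattern inferred from the Mozambique and Vietnam examples (an assumption,
# not an official naming rule): level 1 or 2, a two-letter country code, and
# an optional suffix that is either "X" or a four-digit sample year.
_PATTERN = re.compile(r"^GEO(?P<level>[12])_(?P<country>[A-Z]{2})(?P<suffix>X|\d{4})?$")


def classify_geo_variable(name):
    """Return (level, type) for a basic IPUMS geography variable name."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"not a recognized geography variable: {name}")
    level = int(m.group("level"))
    suffix = m.group("suffix")
    if suffix is None:
        kind = "spatially harmonized"
    elif suffix == "X":
        kind = "harmonized by name"
    else:
        kind = f"year-specific ({suffix})"
    return level, kind


print(classify_geo_variable("GEO1_VN"))      # (1, 'spatially harmonized')
print(classify_geo_variable("GEO2_VNX"))     # (2, 'harmonized by name')
print(classify_geo_variable("GEO2_MZ1997"))  # (2, 'year-specific (1997)')
```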

Notes on the current state of geography

  • Spatially harmonized geography at the first administrative level is available for all IPUMS countries, with associated GIS boundary files.
  • Spatially harmonized geography at the second administrative level is available for many IPUMS countries, also with associated GIS boundary files.
  • Year-specific geography at the first and second administrative levels is available for many IPUMS countries, with associated GIS boundary files.
  • Variables harmonized by name are available only for countries where year-specific geography is unavailable. As year-specific geography at the first and second administrative levels becomes available, the variables harmonized by name will be phased out.
  • At present, IPUMS is working to provide spatially harmonized units only at the first and second levels of geography. Lower levels of geography that meet our confidentiality requirements are harmonized by name, as are other geography variables such as cities and urban areas.

Geographic harmonization process

Creating spatially consistent geographic units involves four steps:

(A) Acquisition or creation of historical GIS files for each of a country's censuses. For older samples, images of historical maps or of maps in census volumes are digitized to produce GIS files.

(B) Where boundary changes mean that the borders of modern units do not align with those of historical units, the affected units are aggregated into larger units that remain stable over time; we refer to this process as harmonization of geographic boundaries.
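One way to picture step (B) is as a graph problem. Treat each year-specific unit as a node and link units from different years whose territories overlap, for example as found by overlaying two boundary files. Each connected group then becomes one harmonized unit. The Python sketch below does this with a union-find structure. It is a simplified illustration under those assumptions, not necessarily the procedure IPUMS uses, and the `harmonize` function and its inputs are hypothetical.

```python
def harmonize(units, overlaps):
    """Group year-specific units into harmonized units (illustrative only).

    units: iterable of (year, code) pairs.
    overlaps: pairs of units from different years whose territories overlap
        (assumed to come from a GIS overlay of the boundary files).
    Returns a list of sets; each set is one harmonized unit, i.e. the smallest
    aggregation whose outer boundary is the same in every year.
    """
    parent = {u: u for u in units}

    def find(u):
        # Follow parent links to the root, halving the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in overlaps:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # overlapping units must end up in the same group

    groups = {}
    for u in parent:
        groups.setdefault(find(u), set()).add(u)
    return list(groups.values())


units = [(1997, "P"), (1997, "Q"), (1997, "S"),
         (2007, "P1"), (2007, "P2"), (2007, "Q"), (2007, "S")]
overlaps = [
    ((1997, "P"), (2007, "P1")),  # P was split into P1 and P2
    ((1997, "P"), (2007, "P2")),
    ((1997, "Q"), (2007, "Q")),
    ((1997, "Q"), (2007, "S")),   # part of Q was moved into S
    ((1997, "S"), (2007, "S")),
]
for group in harmonize(units, overlaps):
    print(sorted(group))
```

Here the split of district P does not affect its neighbors, so P, P1, and P2 form one harmonized unit on their own. The boundary shift between Q and S, however, forces those two districts to be merged into a single harmonized unit.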

(C) If an aggregated geographic unit has a population of fewer than 20,000 in the latest sample year, it is grouped with other units for confidentiality, with priority given to contiguity and similarity in population density; we refer to this process as regionalization.
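Step (C) can be sketched in Python as a greedy merge. The procedure repeatedly takes the least populous region below the threshold and joins it to the adjacent region whose population density is closest to its own. Only the 20,000 threshold and the two criteria, contiguity and similar density, come from the text. The greedy strategy, the `regionalize` function, and its inputs are assumptions for illustration. The populations passed in should be those of the latest sample year.

```python
MIN_POP = 20_000  # confidentiality threshold stated in step (C)


def regionalize(pop, area, neighbors, min_pop=MIN_POP):
    """Greedily merge small units with a contiguous neighbor (illustrative only).

    pop, area: dicts mapping unit id -> population (latest sample year) / area.
    neighbors: dict mapping unit id -> set of adjacent unit ids (symmetric).
    Returns a dict mapping region id -> set of member unit ids.
    """
    regions = {u: {u} for u in pop}
    adj = {u: set(neighbors.get(u, ())) for u in pop}
    rpop, rarea = dict(pop), dict(area)

    while True:
        # Regions below the threshold that still have a neighbor to merge with;
        # isolated small regions are left as they are.
        small = [r for r in regions if rpop[r] < min_pop and adj[r]]
        if not small:
            return regions
        r = min(small, key=lambda x: (rpop[x], x))  # least populous first
        density = rpop[r] / rarea[r]
        # Contiguity: only adjacent regions are candidates.
        # Similarity: choose the one with the closest population density.
        best = min(adj[r], key=lambda n: (abs(rpop[n] / rarea[n] - density), n))
        regions[best] |= regions.pop(r)
        rpop[best] += rpop.pop(r)
        rarea[best] += rarea.pop(r)
        # Rewire adjacency so that r's former neighbors border the merged region.
        for n in adj.pop(r):
            adj[n].discard(r)
            if n != best:
                adj[n].add(best)
                adj[best].add(n)


result = regionalize(
    pop={"A": 5_000, "B": 30_000, "C": 40_000},
    area={"A": 10, "B": 60, "C": 10},
    neighbors={"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}},
)
print({r: sorted(m) for r, m in result.items()})  # {'B': ['A', 'B'], 'C': ['C']}
```

Unit A (density 500) is joined to B (density 500) rather than C (density 4,000), even though both are adjacent.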

(D) Units that have been harmonized and regionalized are disaggregated into smaller year-specific units, so that users interested in a single sample year are not restricted to the coarser harmonized units. These smaller units are also grouped for confidentiality, as necessary.
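In step (D), year-specific units need only be grouped with other pieces of the same harmonized unit, so every group still nests inside it. The Python sketch below handles one harmonized unit. Pieces at or above a minimum population stay separate, and smaller pieces are pooled, smallest first. No threshold is stated for this step, so `min_pop` is an assumed parameter. The sketch also ignores contiguity for brevity, and `group_within_parent` is a hypothetical name.

```python
def group_within_parent(pieces, min_pop):
    """Group the year-specific pieces of ONE harmonized unit (illustrative only).

    pieces: dict mapping year-specific unit id -> population. All pieces lie in
        the same harmonized unit, so every group returned still nests inside it.
    min_pop: assumed confidentiality threshold for this step.
    Returns a list of groups (lists of unit ids).
    """
    # Pieces that are large enough on their own are left ungrouped.
    groups = [[u] for u in sorted(pieces) if pieces[u] >= min_pop]
    # Smaller pieces are pooled, smallest first, until a pool reaches min_pop.
    small = sorted((u for u in pieces if pieces[u] < min_pop),
                   key=lambda u: (pieces[u], u))
    pool, total = [], 0
    for u in small:
        pool.append(u)
        total += pieces[u]
        if total >= min_pop:
            groups.append(pool)
            pool, total = [], 0
    if pool:
        if groups:
            # A leftover pool that is still too small joins the least populous group.
            target = min(groups, key=lambda g: sum(pieces[u] for u in g))
            target.extend(pool)
        else:
            groups.append(pool)  # the whole harmonized unit becomes one group
    return groups


print(group_within_parent({"a": 3_000, "b": 9_000, "c": 25_000, "d": 40_000}, 20_000))
# [['c', 'a', 'b'], ['d']]
```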