March 15, 2018
- Historical census data from Canada, Denmark, the United Kingdom, Germany, Iceland, Norway, Sweden, and the United States for the period 1703 to 1911 are now available from IPUMS-International. The complete count and sample datasets were previously disseminated by the North Atlantic Population Project (NAPP). Where possible, the data have been integrated into existing IPUMS-International variable coding schema. Some new variables have been created that are available only for these pre-1960 datasets.
- NAPP data users should note that many NAPP variables are available from IPUMS-International under different names. For a complete list of NAPP variables that have been renamed in IPUMS-International, refer to the crosswalk.
- IPUMS-International now provides harmonized and year-specific geography variables for all countries, including 13 new samples from Dominican Republic, Germany, Indonesia, Israel, Malaysia, Mongolia, Nicaragua, Nigeria, Palestine, Paraguay, Thailand, United Kingdom, and Uruguay. First-level and second-level year-specific geography variables are also available for all countries. IPUMS provides corresponding, downloadable GIS boundary files for all harmonized and year-specific geography variables. More information about IPUMS geography variables is available here.
- Harmonized and year-specific geography variables for Brazil and Colombia have been edited to accommodate refined municipal boundaries. Users should be aware that codes and labels have changed in all harmonized and year-specific geography variables for these two countries.
- Errors affecting BPLSE2 (formerly BPLPARSE) for Sweden 1890 and the underlying source variable were corrected. Several thousand cases were incorrectly coded as 258101000. These cases have been updated with the correct code: 258171000.
Historical data from NAPP project now available from IPUMS-International.
August 1, 2017
- Released new full-count datasets for Great Britain 1851, 1861, 1871 (Scotland only), 1891, and 1901. Released a new full-count dataset for Sweden 1910.
- Released revised full-count data for Great Britain 1881.
- Released revised full-count datasets for Sweden 1890 and 1900. The revision includes the following changes that improve comparability across Sweden datasets:
- Revisions to certain ethnicity and work variables (and the underlying source data): ORIGIN, LABFORCE, OCCHISCO, OCRELATE, OCSTATUS.
July 15, 2017
- AGEMARR was edited to add data for Hungary 1980 and 1990.
April 19, 2017
- Twenty-four new samples for Belarus, Botswana, Canada, China, Egypt, Greece, Hungary, India, Iran, Mexico, Poland, Romania, Tanzania, and Trinidad and Tobago were added to the data series. Botswana, Poland, and Trinidad and Tobago are new countries in IPUMS. The samples for Belarus, Canada, China, Egypt, Greece, Hungary, India, Iran, Mexico, Romania, and Tanzania extend the pre-existing data series for those countries.
- IPUMS-International now includes harmonized second-level geography variables for 70 countries. New first-level and second-level year-specific geography variables are also available for many countries. IPUMS provides corresponding, downloadable GIS boundary files for all harmonized and year-specific geography variables. More information about IPUMS geography variables is available here.
- Harmonized geography variables for Iran, Mexico, and Tanzania have been edited to accommodate new samples. Likewise, harmonized geography variables for Malawi have been edited to accommodate recently acquired boundary information for the 1987 sample. Users should be aware that codes and labels have changed in all harmonized and year-specific geography variables for these countries.
- EDATTAIN for Brazil 2010 was edited to identify individuals who never attended school (code 110).
- ETHNICSN for Senegal 1988 and 2002 was edited to correct a label error. The label for code 113 was changed from "Wolof" to "Khassonke".
- CHDEAD was edited to remove data for Armenia 2001, Armenia 2011, Costa Rica 2011, Mozambique 1992, and Mozambique 2007. These data had been calculated by subtracting CHSURV from CHBORN. As a rule, we only include samples that directly reported total child deaths in CHDEAD.
- Errors affecting the family interrelationship pointer variables in all Zambia samples were corrected.
November 1, 2016
- Released the full-count dataset for United States 1850.
March 25, 2016
December 1, 2015
- An error in LIT for Senegal 2002 was corrected. The values for yes and no had been reversed.
November 1, 2015
- Released new full-count datasets for Great Britain 1911, Denmark 1787 and 1801, Iceland 1703 and 1910, and Sweden 1880.
- Released a new dataset for Iceland 1729 which contains full-count data for three counties: Rangárvallasýsla, Árnessýsla, Hnappadalssýsla.
- Released a new 5% sample for Canada 1911.
- Released a revised full-count dataset for United States 1880. The revision includes the following additions and improvements:
- The revised file includes the variable OCCHISCO, which provides occupation using the Historical International Standard Classification of Occupations (HISCO) coding scheme.
- The revised file also includes several variables that were not present in the original Church of Jesus Christ of Latter-day Saints complete-count database for 1880.
- Several minor edits to individual cases correcting entry errors. Most notably, most of the cases formerly coded as "Adopted, n.s." (RELATEH code 0304) have been changed to "Adopted Child". This change also affects the variables STEPMOMH and STEPPOPH, as some children coded as "Adopted, n.s." were considered step-children when adoption status could not be determined. RELATEH codes for some boarders and lodgers were being coded as "Relative of Employee" in cases where there was no employee in the household. These codes have been corrected. There were also several cases where an original RELATEH code was missing and allocated, but the RELATEH codes of the individuals who followed in the household were incorrectly edited to Boarder/Lodger. This has been fixed to reflect the original input values for RELATEH. This change to RELATEH also affects variables that were generated based on family interrelationship.
- Data from one missing reel of microfilm were restored to the 1880 complete-count database.
- Released a revised full-count dataset for Iceland 1901. The revision includes improved household breaks as well as refined versions of parish and farm ID variables. The dataset also includes a new parish of birth variable.
October 8, 2015
- PERSONS was expanded to include the following samples: Armenia 2011; Austria 2011; Costa Rica 2011; Ethiopia 1984, 1994, 2007; France 2011; Ghana 1984; Mozambique 1997, 2007; Paraguay 1962, 1972, 1982, 1992, 2002; Portugal 2011; Puerto Rico 2010; South Africa 2011; Spain 2011.
- YRIMM was expanded to include Spain 2011.
September 1, 2015
- Nineteen new samples for Armenia, Austria, Costa Rica, Ethiopia, France, Ghana, Mozambique, Paraguay, Portugal, Puerto Rico, South Africa, and Spain were added to the data series. Ethiopia, Mozambique, and Paraguay were newly added countries to IPUMS. Samples for other countries extend pre-existing series for those countries.
- IPUMS-International has continued to improve geography, providing harmonized geographic units for the second administrative level for roughly half the countries. The revisions to geography are expected to be completed by summer 2016. More information about IPUMS geography variables is available here.
- IPUMS-International renamed approximately 100 integrated variables, expanding the names to be somewhat more consistent and intuitive. Affected variables with their current and previous names are listed here. Geography variables also underwent wholesale renaming. We apologize for any inconvenience these changes cause users.
July 1, 2014
- IPUMS-International has revised the geographic variables, introducing first-level harmonized subnational geography for all countries, along with associated GIS boundary files. Pre-existing geography variables were renamed to conform to a more systematic naming convention intended to distinguish harmonized and unharmonized variables at the first and second administrative levels.
- Twenty new samples for Dominican Republic, Ghana, Ireland, Liberia, Mali, Nigeria, Ukraine, Uruguay, and Zambia were added to the data series. The samples for Ghana, Ireland, Mali, and Uruguay extend the pre-existing series for those countries.
July 1, 2013
- Twenty-seven new samples for Argentina, Bangladesh, Brazil, Burkina Faso, Cameroon, Ecuador, Fiji, Haiti, Kenya, Kyrgyz Republic, Panama, South Sudan, and the United States were added to the data series. The South Sudan sample contains records that were formerly part of the Sudan 2008 sample, but which are now treated as a separate country.
- Data from four recent censuses from Brazil, Ecuador, Haiti, and Panama that record individual mortality and/or migration events were made available from IPUMS-International. These files can be downloaded and linked to data produced by the extract system.
June 1, 2012
- Twenty-six new samples for El Salvador, Indonesia, Mexico, Morocco, Nicaragua, Turkey, and Uruguay were added to the data series. The sample for Mexico extends the pre-existing series for that country.
- Released new full-count datasets for Norway 1910 and Sweden 1890.
- Released a slightly revised full-count dataset for Sweden 1900.
- Released samples of linked males, females and couples across the 1851 and 1881 Great Britain datasets.
- The data release also added 40 new harmonized variables and approximately 2300 unharmonized variables specific to the individual samples.
August 1, 2011
- IPUMS-International significantly redesigned the extract process in the web dissemination system. The new process is far more streamlined, relegating the numerous steps in the old system to a list of options that users can choose to ignore.
June 1, 2011
- Twenty-six new samples for Cambodia, Egypt, France, Germany, Iran, Ireland, Jamaica, Malawi, Palestine, Sierra Leone, Sudan, and Vietnam were added to the data series. The samples for Cambodia, Egypt, France, Palestine and Vietnam extend pre-existing series for those countries.
- Released new full-count datasets for Iceland 1801 and 1901 and Norway 1801.
- Released two new samples for Canada. The 1852 sample is a systematic 1-in-5 sample of the national population. The 1891 sample combines three slightly overlapping subsamples of 5, 10 and 100% into one national sample.
- The data release also added 40 new harmonized variables and approximately 2100 unharmonized variables specific to the individual samples.
February 1, 2011
- IPUMS-International introduced a new version of the web user interface for browsing variables and creating data extracts. The new system is explicitly designed around the concept of a "data cart" one adds to while browsing and from which one "checks out" to generate a data extract. We continue to develop new features based on this design.
July 1, 2010
- A new sample for India 2004 was added to the data series. The sample is an employment survey similar to the other India samples.
- Released a new sample from Mecklenburg-Schwerin 1819, which includes full count data for the city of Rostock.
- Released new linked data samples for the United States 1850 to 1870 and 1900 to 1910.
- Linked datasets across samples in the United States and Norway.
- The datasets for Norway include linked males and couples across all three census years from 1865 to 1900.
- The datasets for the United States include 7 linked pairs of census years involving the 1880 complete count data. The linked years include: 1850-1880, 1860-1880, 1870-1880, 1880-1900, 1880-1910, 1880-1920, and 1880-1930. We have created three independent linked samples for each paired year: linked men, linked women, and linked married couples. For more information on the linked samples, refer to the linked samples page.
- Added an expanded version of the 1880 United States dataset, with additional education and disability variables and a 1-in-5 oversample of the minority population.
June 1, 2010
- Twenty-eight new samples for Cuba, Mali, Nepal, Pakistan, Peru, Puerto Rico, Saint Lucia, Senegal, Switzerland, Tanzania, and Thailand were added to the data series.
- The data release also added 55 new harmonized variables and approximately 2500 unharmonized variables specific to the individual samples.
- IPUMS-International added a discussion of sampling error that highlights situations where sample design can significantly affect standard errors. We continue to develop this material.
February 1, 2010
- IPUMS-International made available downloadable datasets containing fertility, mortality and migration events for seven censuses from developing countries. Because there can be multiple events per person or per household, these data do not fit within the data structure handled by the IPUMS extract system. Instead, these files can be downloaded and matched onto the extract data, giving researchers complete flexibility to devise their own measures.
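As a rough illustration of matching an event file onto extract data, the Python sketch below counts event records per household and attaches the count to each person record. It is not the layout of the actual files: the linking variables SAMPLE and SERIAL are assumed here, and AGE_AT_DEATH, N_DEATHS, and the function names are hypothetical. Check the documentation of each downloaded file for the real identifiers.

```python
from collections import defaultdict


def count_events(events, keys=("SAMPLE", "SERIAL")):
    """Count event records per household.

    `keys` names the variables ASSUMED to identify a household in both
    files; the real linking variables may differ.
    """
    counts = defaultdict(int)
    for event in events:
        counts[tuple(event[k] for k in keys)] += 1
    return counts


def attach_counts(persons, events, name="N_EVENTS", keys=("SAMPLE", "SERIAL")):
    """Return copies of the extract records with a per-household event count."""
    counts = count_events(events, keys)
    attached = []
    for person in persons:
        household = tuple(person[k] for k in keys)
        # Households with no event records get a count of zero.
        attached.append({**person, name: counts.get(household, 0)})
    return attached


# Made-up records for illustration only.
extract = [
    {"SAMPLE": 1, "SERIAL": 10, "PERNUM": 1},
    {"SAMPLE": 1, "SERIAL": 10, "PERNUM": 2},
    {"SAMPLE": 1, "SERIAL": 20, "PERNUM": 1},
]
deaths = [
    {"SAMPLE": 1, "SERIAL": 10, "AGE_AT_DEATH": 2},   # hypothetical field
    {"SAMPLE": 1, "SERIAL": 10, "AGE_AT_DEATH": 67},
]
merged = attach_counts(extract, deaths, name="N_DEATHS")
```

Because several events can belong to one household, the sketch summarizes them before attaching; researchers who need the individual events would keep the event file separate and join on the same keys.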
January 1, 2010
- IPUMS-International also introduced a new web interface that integrates variable browsing with the data extract process. The new system also includes a variable search feature.
- A problem with MARST affecting Brazil 2000 was corrected. There are three source variables that are not entirely consistent with one another. After review, we altered our interpretation of the data, which results in more consensual unions and separated persons and fewer married, divorced and widowed. The unharmonized source variables are unchanged, so users can access them to explore this issue further.
- An error in INCEARN affecting Venezuela 1981 was corrected. Too many cases were receiving a value of "1".
May 1, 2009
- Nineteen new samples for Armenia, Bolivia, France, Guinea, India, Italy, Jordan, Kyrgyz Republic, Mongolia, Romania, Slovenia, and South Africa were added to the data series. The Indian samples are large employment surveys that asked many questions common in censuses. The French, Romanian, and South African samples extend pre-existing series of samples for those countries.
- IPUMS-International also added approximately 60 new harmonized variables and 1700 unharmonized variables specific to the individual samples.
- IPUMS-International introduced GIS boundary files. These enable users to map variables with country-level geography and variables relating to the first administrative level within each country, such as place of residence and birthplace.
October 1, 2008
- Released a complete count dataset of Sweden 1900.
- Released a sample of Great Britain (England, Wales, and Scotland) 1851.
- Revised Canada 1871, 1881, 1901, Norway 1865, 1875, 1900, United States 1880, and Great Britain 1881 (England, Wales, and Scotland).
June 1, 2008
- Thirty-two new samples for Austria, Canada, China, Colombia, Egypt, Ghana, Iraq, Malaysia, Mexico, Netherlands, Panama, United Kingdom, United States, and Venezuela were added to IPUMS-International.
- IPUMS-International added approximately 100 new harmonized variables and 2000 unharmonized variables specific to the individual samples.
- Location-of-mother and location-of-father data were developed for all samples. The constructed parental locator variables MOMLOC and POPLOC give the record number within the household of each person's mother or father using information on age, relationship, marital status, child-bearing, and other data. The variables make it easy to attach the characteristics of a person's parent to their own record (such as mother's age or father's occupation), or to summarize the characteristics of dependent children (such as number of own children in school). The basis for making the links is summarized in the variable PARRULE.
- Users can customize their extract size by selecting the number of households or persons they want from each dataset. The extract system draws a subset of households that match the desired case-count or sample fraction and generates syntax files that adjust the weight variables appropriately.
- While browsing the documentation, users can save variables to include in their data extract later in their web session.
- The extract system will "attach characteristics": it will use information from the record of the spouse, mother, father, or head to create new variables such as "mother's employment status." The feature uses the constructed family interrelationship "pointer" variables, MOMLOC, POPLOC, and SPLOC.
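The pointer logic behind "attach characteristics" can be sketched in a few lines of Python. This is an illustration, not the extract system's implementation. It assumes person records held as dictionaries, that PERNUM is the record number within the household, and that a MOMLOC of 0 means no mother was linked in the household. The function name `attach_mother_age` and the output field `AGE_MOM` are hypothetical.

```python
def attach_mother_age(household):
    """Return copies of the person records with the mother's age attached.

    `household` is a list of dicts for one household. Each dict is assumed
    to carry PERNUM (record number within the household), AGE, and MOMLOC
    (PERNUM of the person's mother; 0 is assumed to mean "none linked").
    """
    by_pernum = {p["PERNUM"]: p for p in household}
    result = []
    for person in household:
        mother = by_pernum.get(person["MOMLOC"])
        enriched = dict(person)
        # AGE_MOM is a hypothetical name; None marks "no mother linked".
        enriched["AGE_MOM"] = mother["AGE"] if mother else None
        result.append(enriched)
    return result


# Made-up household: two adults and a child whose mother is person 2.
household = [
    {"PERNUM": 1, "AGE": 41, "MOMLOC": 0},
    {"PERNUM": 2, "AGE": 38, "MOMLOC": 0},
    {"PERNUM": 3, "AGE": 12, "MOMLOC": 2},
]
attached = attach_mother_age(household)
```

The same lookup works for POPLOC or SPLOC by swapping the pointer variable, which is why a single pointer per relationship is enough to attach any characteristic of the linked person.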
July 1, 2007
- Corrected an error in SPLOC affecting Rwanda 1991, South Africa 2001, and Uganda 1991. Second and higher order spouses in polygamous unions were not being linked to their husband.
- Improved the RELATE codes in 1980 and 1990 Hungary for persons who were not part of the primary family. The samples only contain relationship-to-subfamily-head information, but it is possible to infer the relationships between subfamilies in most cases.
June 1, 2007
- Seventeen new samples for Argentina, Hungary, Israel, Palestine, Portugal, and Rwanda were added to the data series.
- Location-of-spouse data were developed for all samples. The constructed spouse locator variable SPLOC gives the record number within the household of each person's spouse using information on age, relationship, marital status, and other data. The variable makes it easy to attach the characteristics of a person's spouse to their own record (such as spouse's age or occupation). The basis for making the link is summarized in the variable SPRULE.
December 1, 2006
- Sixteen new samples for Belarus, Cambodia, Greece, Philippines, Romania, Spain, and Uganda were added to the data series.
- IPUMS-International added unharmonized variables as a new feature of the documentation system, giving users access to the full information of the original samples -- even those variables we have not harmonized cross-nationally. Also introduced content filtering, so only information for selected countries appears on the various documentation pages.
October 1, 2006
- Released four new datasets: Canada 1871 and 1901, Norway 1865, Scotland 1881.
- Revised Canada 1881, England and Wales 1881, Norway 1900, United States 1880 to add a substantial number of newly constructed variables:
- Family interrelationship variables: Added variables on number of couples, mothers, and fathers in household. Added grandparent pointers (analogous to the existing MOMLOC, POPLOC, and SPLOC pointer variables, but for grandparents). Number of sons or daughters married or unmarried is now available for all datasets with relationship information; previously, these variables were only available for England and Wales 1881. Added a new variable for number of children under age 10, analogous to the existing NCHLT5 variable.
- Geographic variables: For Canada and the United States, urban residence, (IPUMS compatible) city codes, and city populations are now available. Enumeration and supervisor's districts are available for the United States.
- Work and employment variables: Labor force participation is now available for all samples. Harmonized occupational codes (adapted from the HISCO coding scheme) are available for the United States, Canada, and Norway. PRODUCT codes for sales workers are now available for both the United States and Norway. Standardized occupational strings (OCCLABEL) are available for the United States; this variable corrects spelling mistakes, expands abbreviations, and standardizes common phrases, to allow researchers searching for very specific occupations a better chance of finding these individuals.
- Ethnicity and migration variables: Simplified country of birth codes identifying individuals as being born in a specific NAPP country, or in any other country (NAPPSTER), are now available for all samples. SPANNAME is now available for the United States 1880.
- Other variables: AGEMONTH now available for the United States 1880.
June 1, 2006
- Nineteen new samples for Chile, Costa Rica, Ecuador, South Africa, and Venezuela were added to the data series.
- Approximately 50 new variables were added to IPUMS-International.
- IPUMS-International introduced a dynamically generated variables page allowing users to customize their view of the contents of the data series.
- IPUMS-International also added a feature that compiles on a single web page, for any IPUMS variable, all relevant enumeration text from every census. This page can also be customized to include only the samples of interest to researchers.
December 1, 2005
- The expanded samples and improved data extraction system developed in March were moved to the regular IPUMS-International site. The beta test site was deactivated.
March 1, 2005
- Thirty-three new household variables and 100 new person variables were added to the data series on the beta test site. The new release adds substantially to the household record, and nearly completes all remaining person variables for the 28 samples currently in the data series.
- IPUMS-International introduced a new data extraction system with improved features. The most significant improvement is the ability to revise and resubmit past data extracts. The system also allows the user, when performing case selection, the choice of including only the persons meeting the selection criteria, or including all persons within households in which any person meets the selection criteria.
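The two case-selection modes can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not the extraction system's code: records are dictionaries, SERIAL is assumed to identify the household, and the names `select_cases`, `meets_criteria`, and `whole_households` are hypothetical.

```python
def select_cases(persons, meets_criteria, whole_households=False):
    """Select person records in one of two modes.

    whole_households=False: keep only persons meeting the criteria.
    whole_households=True: keep every person in any household where at
    least one person meets the criteria.
    """
    if not whole_households:
        return [p for p in persons if meets_criteria(p)]
    # SERIAL is assumed to be the household identifier.
    selected = {p["SERIAL"] for p in persons if meets_criteria(p)}
    return [p for p in persons if p["SERIAL"] in selected]


def is_elderly(person):
    return person["AGE"] >= 65


# Made-up records for illustration only.
persons = [
    {"SERIAL": 1, "PERNUM": 1, "AGE": 70},
    {"SERIAL": 1, "PERNUM": 2, "AGE": 34},
    {"SERIAL": 2, "PERNUM": 1, "AGE": 29},
]
only_matches = select_cases(persons, is_elderly)
with_households = select_cases(persons, is_elderly, whole_households=True)
```

The second mode matters whenever the analysis needs co-resident persons, for example the household members living with an elderly person.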
January 1, 2005
- Added Scotland to the Great Britain 1881 dataset.
- Added harmonized occupational data for Great Britain 1881.
- Added imputed relationships for Canada 1881.
December 1, 2004
- Posted revisions to the Canada 1881 and United States 1880 data. Corrected missing values in Canada 1881. Corrected missing values and improved the code for constructing household inter-relationship pointers for the United States 1880 data. Also added the following new variables:
- LABFORCE: Added labor force participation variable based on the gainful occupation definition. This variable is consistent with the IPUMS LABFORCE variable for all pre-1940 censuses.
- SEIUS: This variable for the Duncan Socioeconomic Index is consistent with the IPUMS variable SEI.
- OCSCORUS: This occupational income score variable reports median total income in 1950 for the occupation (OCC50US). The unit of this variable is hundreds of 1950 dollars. Thus, an occupational income score of 70 means that the median total income of all people with the same occupation in 1950 was $7000. This variable is consistent with the IPUMS variable OCCSCORE.
- NAPPSTER: This recode of country of birth identifies the five NAPP countries, assigns all other birthplaces to one code, and retains a code for unknown birthplace. This variable allows users to easily select all people born in any NAPP country.
- SEAUS: This variable for state economic area (a grouping of contiguous counties that had close economic ties at the 1940 and 1950 censuses) is consistent with the IPUMS variable SEA, except in the Dakota Territory.
- YEAR: Reports the year the census was conducted. Note that this is a four digit variable, while the IPUMS YEAR variable uses two digits.
- RELEASED: Reports the date this version of the data was released.
November 1, 2004
- Released data for Norway 1900. Users should be aware of the following data characteristics:
- The Norwegian enumeration of relationship to head does not correspond to the household breaks. Specifically, not everyone who is the first person in a household is enumerated as a head. The Norwegian enumeration of relationships recorded relationships within dwellings, which sometimes contained multiple households.
- To generate statistics at the household level, users should select people with a PERNUM of 1. Selecting people with a RELATE code of 0101 will not select all households.
- Users should note that several coding schemes and variables retain information that was not consistently enumerated, but which was nevertheless recorded by some enumerators. In particular:
- BPLNO retains some information on the U.S. states where people were born, though this information was not required.
- Ethnic origin and language spoken were not collected in all areas of Norway. See the variable descriptions for ORIGIN and LANGUAGE variables for further detail.
- Added a NAPP version of HISCO for the first occupation specified.
- The following variables are alpha-numeric: RECTYP, RESNAMNO, OCCSTRNG, NAMEFRST and NAMELAST.
- Released data for England and Wales 1881. Users should be aware of the following characteristics:
- The data for Scotland and occupational harmonization have yet to be added to this sample.
- Two sets of household relationship codes are available for this census: the codes used by the 1881 British census project (RELAGB) and a harmonized set of codes used in all NAPP censuses (RELATE).
- We have included the pointer variables constructed by the 1881 British census project (MOTHERGB, FATHERGB, SPOUSEGB) for comparison with the NAPP pointers constructed by the Minnesota Population Center. NAPP-constructed pointers follow the IPUMS conventions and are constructed similarly for all datasets.
- The following variables are alpha-numeric: RECTYP, BPSTPAGB, DISABGB, OCCSTRING, NAMEFRST and NAMELAST.
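The Norway 1900 advice on household-level statistics (select PERNUM of 1 rather than RELATE code 0101) can be illustrated with a small Python sketch. The records are made up, RELATE is assumed to be stored as a four-character string, SERIAL is assumed to identify the household, and "OTHER" is a placeholder rather than a real RELATE code.

```python
def first_persons(persons):
    """One record per household: the first person listed (PERNUM == 1)."""
    return [p for p in persons if p["PERNUM"] == 1]


def enumerated_heads(persons, head_code="0101"):
    """Persons enumerated as head; may miss households without a head."""
    return [p for p in persons if p["RELATE"] == head_code]


# Made-up dwelling containing two households; only the first household's
# first person was enumerated as a head. "OTHER" is a placeholder value.
persons = [
    {"SERIAL": 1, "PERNUM": 1, "RELATE": "0101"},
    {"SERIAL": 1, "PERNUM": 2, "RELATE": "OTHER"},
    {"SERIAL": 2, "PERNUM": 1, "RELATE": "OTHER"},
]
n_households = len(first_persons(persons))
n_heads = len(enumerated_heads(persons))
```

In this example the PERNUM rule finds both households while the RELATE rule finds only one, which is the undercount the note warns about.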
September 1, 2004
- New five percent samples for USA 1980-2000 replaced the previously disseminated one percent samples.
July 1, 2004
- An error in OCCISCO for Kenya 1989 was corrected. Persons in the labor force were incorrectly coded to "Not Applicable."
June 1, 2004
- Preliminary samples for Brazil 1960, 1970, 1980, 1991 and 2000 were added to IPUMS-International. Some constructed variables, such as the spouse and parental locators, are not yet included in the Brazil samples.
January 1, 2004
- New versions of the Colombia samples were introduced that allow the identification of municipalities and municipality groupings with at least 20,000 population in 1993.
December 1, 2003
- Released data for Canada 1881. All variables that are common to the historical censuses of the United States and Canada are coded in harmonized coding schemes. Users should note the following issues:
- Relationship to household head was not enumerated in the 1881 Canadian census. Without relationship information, all spouses are coded as 2, "married, spouse absent". An imputed version of relationship, similar to IMPREL in IPUMS, will be constructed in the future, and will help identify whether or not a spouse is present.
- Users should note that several coding schemes and variables retain information that was not consistently enumerated, but which was nevertheless recorded by some enumerators. In particular:
- BPLCA retains information on birthplace below province, although respondents were only required to state their province.
- The majority of respondents stated only one religion and one ethnic origin. If a second ethnic origin or religion was entered, that information is recorded in a second variable, ORIGN2CA and RELIGON2 respectively. Because most people did not list a second entry, the vast majority of cases are blank.
- Canadian occupations have been coded into a version of HISCO modified by NAPP.
- The following variables are alpha-numeric: RECTYP, SDSTCA, SDSTNMCA, OCLANGCA, OCCSTRING, NAMEFRST and NAMELAST.
- An error in RELATE for Mexico 1970 was corrected. The census asked for relationship to head of family, not head of household. In the previous version of the data, we erroneously interpreted these families/subfamilies in multi-family households as if they were separate households in multi-household dwellings. Only households with subfamilies were affected by the error. In the current data, subfamily members in multi-family households are coded "unknown" for relationship to household head, but a separate variable SUBFREL retains their relationship to their family head.
October 1, 2003
- An error in URBAN for 1970 Mexico was corrected. The values for urban and rural had been reversed.
- All the French samples in IPUMS-International were replaced with versions containing persons organized in households. The previous versions were individual-level samples that did not group co-resident persons.
August 1, 2003
- We have posted some corrections to the previous United States 1880 dataset.
July 1, 2003
- IPUMS-International added numerous variables on birthplace, migration, and disability, among others.
- United States 2000 was added to IPUMS-International.
- Released preliminary data from the U.S. Census of 1880. These preliminary data are not complete. Users should be aware of several issues.
- Preliminary data include the states alphabetically from Alabama through Ohio.
- Missing data values have not been allocated. Users may encounter a "Z" in numeric fields where there are missing values in MARST, RELATE, FBPLDTUS, and MBPLDTUS.
- A coding error has changed the ages of 13 year olds to 12 years old.
- Occupation coding is incomplete for approximately 180,000 people. People with occupation code 997 also have temporary codes for industry.
- Pointer and associated constructed variables should be used with caution in this beta release.
- At this time, we are undertaking a final round of geography checking on the complete-count data, so small level geographic units are not yet available. If your research requires data from smaller geographic areas than counties, you should download the string variable (hyphen delimited) RECIDUS. This variable includes information on microfilm reel number, microfilm sequence number and stamped page number from the microfilmed manuscripts, which provides enough information to identify smaller geographic areas.
- Users should only download the alphabetic strings NAMEFRST, NAMELAST, and OCCSTRNG if they are necessary for research. These 32 character variables increase file sizes significantly.
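For users who download RECIDUS to approximate smaller geographic areas, a minimal Python sketch of working with a hyphen-delimited string follows. The number and order of the components are not stated above, so the sketch only splits on hyphens and groups records by their leading components. The sample values and function names are hypothetical, and which component holds the reel, sequence, or page number must be checked against the variable description.

```python
from collections import defaultdict


def split_recidus(recidus):
    """Split a hyphen-delimited RECIDUS string into its components."""
    return tuple(part.strip() for part in recidus.split("-"))


def group_by_leading_parts(recidus_values, n_parts=2):
    """Group RECIDUS strings that share their first `n_parts` components."""
    groups = defaultdict(list)
    for value in recidus_values:
        groups[split_recidus(value)[:n_parts]].append(value)
    return dict(groups)


# Made-up values purely for illustration; real RECIDUS strings may differ.
values = ["0001-0012-045A", "0001-0012-045B", "0001-0013-046A"]
groups = group_by_leading_parts(values)
```

Keeping the components as strings avoids losing leading zeros or letter suffixes that may carry meaning in the microfilm references.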
April 1, 2003
- CHSURV, which reports the number of children born to a woman who were still living at the time of the census, was added to IPUMS-International.
- Preliminary versions of the constructed household and family interrelationship variables, including MOMLOC, POPLOC, and SPLOC were added to IPUMS-International.
- The Kenya 1999 sample was replaced with an improved version.
March 1, 2003
- China 1982 was added to the IPUMS-International database.
August 1, 2002
- A problem with PERWT for Vietnam 1989 was corrected. The previous weights were erroneous.