Interpreting Variable Harmonization Files
How to read a harmonization table
General
- The first two columns provide the output codes (column A) and labels (column B) for the harmonized variable.
- Each subsequent column represents an input dataset. The header rows for those columns give the sample name (row 1), sample label (row 2), and source variable name (row 3). Row 4 indicates whether any conditions govern recoding for the sample, as described below. The body of the table, starting with row 6, contains the values the input variable takes for the sample. For most categorical variables, the code is followed by a "=" sign and the original value label.
- All values aligned on a row of the table get recoded to the harmonized value in column A.
- Metadata for the source variables referenced in the table can be accessed via the "source variables" tab for the harmonized variable. That information can also be found using the variable search function, selecting the option to include source variables.
- Multiple samples in a column (row 1) - All samples listed in the column are recoded identically, with their source variables listed one-for-one in the source variable row.
- Multiple source variables for a sample (row 3) - Sometimes two source variables are used as input for a sample and are recoded as one concatenated field. Usually the concatenated input codes are not labelled.
- A harmonized variable is given as the source variable (row 3) - In some cases the output of a harmonized variable is used as input for another harmonized variable. The data for the relevant samples are subject to the additional recoding specified in the table. Essentially, the original source variables are recoded a second time.
- "No recode" in a sample column (row 4) - The source variable codes are not altered.
- "Partial recode" in a sample column (row 4) - Only the specific codes listed in the sample column are recoded; others are unaltered.
- ## (hashtags) in the harmonized code (column A) - Indicates a heading used when displaying the codes on the web. No data are recoded on these rows.
- " (quotation mark) in the harmonized label column (column B) - This is a placeholder when multiple rows are assigned the same output value.
- * (asterisk) in the body of the table - An indication of where variable programming is expected to assign cases (see the programming file). The asterisks are only a convention and do not actually constrain the values to which the programming may recode cases.
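The recoding rules above can be sketched in a few lines of C++. This is only an illustration of how one sample column might be read, not the code IPUMS runs: the names `RecodeMode`, `SampleColumn`, `harmonize`, and `concat_codes` are hypothetical, and the `unmatched` fallback for codes missing from a fully recoded column is an assumption, since the table itself does not say what happens to them.

```cpp
#include <map>
#include <string>

// Hypothetical model of one sample column; none of these names come from IPUMS.
enum class RecodeMode { Full, Partial, NoRecode };  // condition shown in row 4

struct SampleColumn {
    RecodeMode mode;
    // Input code from the table body -> harmonized code in column A.
    // Several input codes aligned on the same row share one output code.
    std::map<std::string, int> recodes;
};

// Returns the harmonized code for a single input value.
// `unmatched` is an assumed fallback for codes absent from a full recode.
int harmonize(const SampleColumn& col, const std::string& input, int unmatched = -1) {
    if (col.mode == RecodeMode::NoRecode) {
        return std::stoi(input);  // "No recode": source codes are not altered
    }
    auto it = col.recodes.find(input);
    if (it != col.recodes.end()) {
        return it->second;        // code is listed in the sample column
    }
    if (col.mode == RecodeMode::Partial) {
        return std::stoi(input);  // "Partial recode": unlisted codes pass through
    }
    return unmatched;
}

// Two source variables treated as one concatenated input field.
std::string concat_codes(const std::string& first, const std::string& second) {
    return first + second;
}
```

For example, a column holding `{"1", 10}` and `{"2", 10}` sends both input codes to harmonized code 10, which corresponds to two input values aligned on the same table row. Under a partial recode that lists only `{"9", 99}`, an input of "3" is returned as 3.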
How to read variable programming
- The code is written in C++, but should be largely intelligible to statistical package users.
- The programming is normally executed after any recodes specified in the harmonization table.
- Most of the programming is sample-specific, with blocks of code starting with "case dataset_id::[sample name]" and ending with "break;".
- Sample names match those in the harmonization table, which also includes their labels.
- Source variables referenced in the code use the short names generated by IPUMS. You can locate the variables in the web system most easily by using the variable search feature, selecting the option to include source variables.
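As a rough guide to what a sample-specific block looks like, the sketch below follows the "case dataset_id::[sample name]" ... "break;" pattern described above. Everything specific in it is invented for illustration: the sample names `xx2000a` and `yy2010a`, the source variable names `src_age` and `src_status`, the function `apply_programming`, and the rules themselves are assumptions, not actual IPUMS programming.

```cpp
// Hypothetical sample names; real ones match row 1 of the harmonization table.
enum class dataset_id { xx2000a, yy2010a };

// Hypothetical source variables, standing in for the IPUMS short names.
struct SourceRecord {
    int src_age;
    int src_status;
};

// Takes the value produced by the harmonization table and applies
// sample-specific adjustments, mirroring the order described above
// (table recodes first, programming second).
int apply_programming(dataset_id sample, int recoded, const SourceRecord& rec) {
    int value = recoded;
    switch (sample) {
    case dataset_id::xx2000a:
        // Invented rule: young respondents are assigned code 0.
        if (rec.src_age < 15) {
            value = 0;
        }
        break;
    case dataset_id::yy2010a:
        // Invented rule: refine table code 9 using a second source variable.
        if (recoded == 9 && rec.src_status == 1) {
            value = 10;
        }
        break;
    default:
        // Samples without their own block keep the table's value.
        break;
    }
    return value;
}
```

In a sketch like this, codes 0 and 10 are the kind of output values that would be marked with an asterisk in the harmonization table, since the programming rather than the table assigns cases to them.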