PUMS datafiles contain records representing 1-in-1,000, 1 percent, and 5 percent samples of the housing units in the United States and the persons in them. Each PUMS file provides records for states and for some of the geographic levels within them.
The 5% sample identifies every state and various subdivisions of states called Public Use Microdata Areas (PUMAs), each with at least 100,000 persons. These PUMAs were based primarily on counties and may be whole counties, groups of counties, or places. When these entities have more than 200,000 persons, PUMAs can represent parts of counties, places, and so forth. No PUMA in the 5% sample crosses state lines.
The 1% sample was based primarily on metropolitan/nonmetropolitan areas and contains PUMAs made from whole central cities, whole Metropolitan Statistical Areas (MSAs) or Primary Metropolitan Statistical Areas (PMSAs), MSAs or PMSAs outside the central city, groups of MSAs or PMSAs, and groups of areas outside MSAs or PMSAs. When the areas have more than 200,000 persons, 1% PUMAs can represent parts of central cities, MSAs/PMSAs, and so forth. 1% PUMAs may cross state lines, in which case state codes are not shown.
Each datafile is documented in a codebook containing a data dictionary and supporting appendix information. Electronic versions of the codebooks are available only for the 1980 and 1990 datafiles.
Each file generally contains two record types, each with different variables, rather than one longer record with all the variables. The two basic record types are the housing unit record and the person record. A serial number in each record links the persons in the housing unit to the proper housing unit record.
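The linkage between the two record types can be sketched in a few lines of Python. This is only an illustration: the column positions, the 'H'/'P' record type codes, and the function name link_records are assumptions made for the example, and the actual layout for a given census must be taken from its codebook.

```python
from collections import defaultdict

# Hypothetical layout for illustration only; real positions come from the codebook.
RECTYPE = slice(0, 1)    # assumed: 'H' = housing unit record, 'P' = person record
SERIALNO = slice(1, 8)   # assumed: serial number shared by a unit and its persons


def link_records(lines):
    """Group person records under the housing unit record with the same serial number."""
    housing = {}
    persons = defaultdict(list)
    for line in lines:
        serial = line[SERIALNO]
        if line[RECTYPE] == "H":
            housing[serial] = line
        elif line[RECTYPE] == "P":
            persons[serial].append(line)
    # A housing unit with no person records gets an empty list.
    return {serial: (rec, persons.get(serial, [])) for serial, rec in housing.items()}


# Made-up records: one unit with two persons, one unit with none.
sample = [
    "H0000001AAA",
    "P0000001x01",
    "P0000001x02",
    "H0000002BBB",
]
linked = link_records(sample)
for serial, (unit, people) in linked.items():
    print(serial, len(people))
```

Keying both record types on the serial number means the housing characteristics never have to be repeated on each person record, which is the reason for the two-record design described above.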
Information from the censuses was derived either from questions asked of the entire population or from questions asked of only a sample of the population. Those questions asked about every person and housing unit are called 100-percent or short-form questions. The others are called sample or long-form questions.
Those households receiving the short-form questionnaires were asked only the 100-percent questions, and those receiving the long form were asked both the 100-percent questions and the sample questions. In 1990, some 17.7 million housing units received a long form, out of an estimated total of 106 million units (about 16.7%). Sampling rates vary depending on geographic location and population size.
PUMS datafiles contain a sample of the individual long-form census records showing most population and housing characteristics with identifying information removed.
The coding system varies for each census, so it is important to have access to the codebook for each census in order to assess the meaning of a specific field in a census record and its comparability across censuses. Geographic identifiers are rarely comparable from one census file to the next, but housing and population characteristics are similar. Because of this similarity, microdata files from the most recent censuses are useful for analysis of trends.
The sample questionnaires were edited for completeness and consistency, and substitutions or allocations were made for any missing data. Allocation flags appear at the end of each record to indicate when an item has been allocated. A user who wishes to tabulate only actually observed values can exclude the values whose allocation flag is set.
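As a rough sketch of how allocation flags might be used, the Python example below keeps only the values of an item that were reported rather than allocated. The field names, the "_alloc" suffix, the flag coding (1 = allocated, 0 = as reported), and the sample values are all assumptions made for illustration; the real flag names and codes are given in each census codebook.

```python
# Hypothetical parsed person records. Field names and flag coding
# (1 = allocated, 0 = as reported) are assumptions, not the codebook's.
records = [
    {"age": 34, "age_alloc": 0, "income": 28000, "income_alloc": 0},
    {"age": 51, "age_alloc": 0, "income": 19500, "income_alloc": 1},
    {"age": 27, "age_alloc": 1, "income": 31000, "income_alloc": 0},
]


def observed_values(records, item):
    """Return the values of `item` that were reported rather than allocated."""
    flag = item + "_alloc"
    return [rec[item] for rec in records if rec[flag] == 0]


observed_income = observed_values(records, "income")
observed_age = observed_values(records, "age")
print(observed_income)
print(observed_age)
```

The filter is applied item by item, so a record whose income was allocated still contributes its reported age. Dropping the whole record instead would be a stricter choice that discards more data.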
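The subject content of the files is discussed in more detail in each decennial datafile description. Generally, the following topics are of interest, listed first for housing unit records and then for person records: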
allocation flags for housing items; bedrooms; condominium status; contract rent; cost of utilities; family income; family, subfamily, and relationship recodes; farm status and value; fire, hazard, flood insurance; fuels used; gross rent; house heating fuel; household income; household type; housing unit weight; kitchen facilities; linguistic isolation; meals included in rent; mortgage status and selected monthly owner costs; plumbing facilities; presence and age of own children; presence of subfamilies in household; property value; real estate taxes; rooms; sewage disposal; source of water; state; telephone in housing unit; tenure; units in structure; vacancy status; vehicles available; year householder moved into unit; and year structure built.
ability to speak English; age; allocation flags for population items; ancestry; children ever born; citizenship; class of worker; disability status; educational attainment; Hispanic origin; hours worked; income by type; industry; language spoken at home; marital status; means of transportation; migration PUMA; migration state; military status, periods of active duty military service, veteran period of service; mobility status; occupation; person's weight; personal care limitation; place of birth; place of work PUMA; place of work state; poverty status; race; relationship; school enrollment and type of school; time of departure for work; travel time to work; vehicle occupancy; weeks worked; work status; work limitation status; and year of entry.