Census Bureau

American Community Survey (ACS): Public Use Microdata Samples

ACS microdata overview

The United States Census Bureau conducts an annual survey of approximately 3.5 million households called the American Community Survey (ACS), and releases the data in two formats that we use: a Summary file and a Public Use Microdata Sample (PUMS) file. The data is widely used by the federal government to shape legislation and funding for the nation.

Methodology and content

In contrast to the decennial census, which is designed to count every person living in the US, the ACS is administered by sampling: a long-form questionnaire is sent to a small percentage of the population every year. To protect privacy in the microdata, some values, such as financial gains and losses or a person's age, are top- and/or bottom-coded, and geographical location is only broadly known. The summary files, by contrast, can share aggregate information for very small geographical areas, such as ZIP codes or even blocks, without endangering respondent privacy. The household PUMS data is reported across Public Use Microdata Areas (PUMAs), which each contain at least 100,000 people and are defined after each decennial census.

Included in the microdata files are a wide range of social, demographic, economic, and housing data which serve to enable analysis of relationships between various data points in a large community. For example, Ididio is able to describe the characteristics of people with given careers or degrees by aggregating the individual responses. The microdata is available in 1-year and 5-year formats; we use the 5-year data to allow greater statistical accuracy of the information presented.

Finding the data

Formatting and viewing the microdata files is not an easy task. However, people without data-wrangling knowledge have terrific access to this data through the Census Bureau's American FactFinder or, alternatively, through IPUMS. We access the microdata files through the Census FTP Server. We most recently accessed the 1-year PUMS data for 2018 on January 7, 2020, and the 5-year PUMS data for 2018 on February 5, 2020.

Additional details for this source

Which records are used to create data?

At Ididio, we used the house-by-house ACS microdata to investigate outcomes associated with people who (1) earned bachelor's degrees in various fields or (2) were working in various occupations.

Creating data based on college majors

When reporting outcomes for people based on the field of an earned bachelor's degree, we limit the microdata to those people who have attained a bachelor's degree. Some people report two undergraduate majors, and for such individuals, we split the corresponding survey record into two records. Each of the new records contains a single major, and the new record's survey weight is equal to half of the original's weight. This allows us to aggregate according to a single major, given that all possible combinations of double-majors would not yield statistically useful results.
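The record-splitting step can be sketched as follows. This is a minimal illustration, not Ididio's actual code: the field names `major1`, `major2`, `weight`, and `income` are hypothetical stand-ins for the corresponding PUMS variables.

```python
def split_double_majors(records):
    """Return one record per (person, major), dividing the survey weight
    evenly across the majors that the person reported."""
    out = []
    for rec in records:
        # Hypothetical field names; a missing second major is None.
        majors = [m for m in (rec["major1"], rec["major2"]) if m is not None]
        if not majors:
            continue  # no bachelor's field reported, so the record is not used
        share = rec["weight"] / len(majors)
        for major in majors:
            out.append({"major": major, "weight": share, "income": rec["income"]})
    return out


people = [
    {"major1": "Nursing", "major2": None, "weight": 100.0, "income": 60000},
    {"major1": "Economics", "major2": "Mathematics", "weight": 80.0, "income": 90000},
]
split = split_double_majors(people)
```

Because each half-record carries half of the original weight, the total weight, and therefore the estimated population, is unchanged by the split.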

Creating workforce data

The assumptions made in compiling workforce data significantly affect the results that we present on Ididio. Each assumption is based on an extensive study of the assumptions made by BLS and the Census Bureau in other data products, as well as a comparative analysis of the data that resulted when using differing assumptions.

Except when creating statistics related to age, we limit the person records to those who report occupations and who are 65 or younger. We found many more outliers among workers as they age, as post-retirement careers can be atypical.
With the exception of calculations involving employment status and part-time/full-time status for all who report occupations, we only calculate workforce data for individuals who report working 35 hours a week or more. This is our effort to create salary and other statistics for full-time work only.
Many individuals report both wage and salary income (paid by an employer) and self-employment income, and we combine these two values into a single income for each individual. However, ACS top-codes these values at differing levels, so it is difficult to identify or quantify top-coding errors in our resulting estimates for high wage earners. It is also not possible to infer from the survey questions whether both wage earnings and self-employment earnings are for the same reported occupation; without additional information, we assume that both types of earnings apply to the reported occupation.
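The three rules above can be combined into a single filter, sketched below. The field names (`occupation`, `age`, `hours_per_week`, `wage_income`, `self_employment_income`) are hypothetical, and the sketch covers only the default case, not the exceptions for age-related or employment-status statistics.

```python
def full_time_workforce(records, max_age=65, min_hours=35):
    """Keep records that report an occupation, are at most max_age years old,
    and work at least min_hours per week; attach a combined income."""
    kept = []
    for rec in records:
        if rec["occupation"] is None:
            continue  # no reported occupation
        if rec["age"] > max_age:
            continue  # older than 65
        if rec["hours_per_week"] < min_hours:
            continue  # part-time
        # Assumption from the text: both income types apply to the one occupation.
        total = rec["wage_income"] + rec["self_employment_income"]
        kept.append({**rec, "total_income": total})
    return kept


sample = [
    {"occupation": "Nurse", "age": 40, "hours_per_week": 40,
     "wage_income": 60000, "self_employment_income": 5000},
    {"occupation": None, "age": 30, "hours_per_week": 40,
     "wage_income": 0, "self_employment_income": 0},
    {"occupation": "Consultant", "age": 70, "hours_per_week": 40,
     "wage_income": 0, "self_employment_income": 80000},
    {"occupation": "Cashier", "age": 22, "hours_per_week": 20,
     "wage_income": 15000, "self_employment_income": 0},
    {"occupation": "Teacher", "age": 65, "hours_per_week": 35,
     "wage_income": 55000, "self_employment_income": 0},
]
workforce = full_time_workforce(sample)
```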

Measuring experience

One of our big goals with careers is to separate the jobs that come with experience from those that are starting-level, and to share with our members the potential for economic growth that may come without switching jobs and climbing a career ladder. Many resources that we've read choose a static age range to identify starting salaries, and similarly a static age range for mid-career salaries. However, starting ages differ quite a bit in reality depending on the education or experience required for entry.

To get our hands around the concept of experience, we define the starting age for a career as the age at which that age and all younger ages account for roughly 10% of the career's workforce. For a job that doesn't require education, that age is typically in the early 20s or even late teens; for a job requiring further education, the age rises.

To find a starting salary, we found the median (middle) salary reported for each career at the starting age as well as a year younger and a year older. We averaged those median salaries to provide an approximate starting salary.

We then wanted to determine a salary that might best reflect an experienced worker's salary, and we decided to choose the youngest age that is older than 75% of workers. We again took a year younger and a year older, and we again averaged the median salaries for these three ages.

Once we found an experienced salary and a starting salary, we divided the experienced salary by the starting salary to find the experience quotient. The average experience quotient for all workers is 1.5, meaning that within the same job we could typically expect a salary increase of about 50% with experience. Higher quotients mean better return for experience, although we noted that the highest quotients tended to reflect abysmal starting pay.
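The calculation described in this section can be sketched as follows for a single career. Several details are assumptions made for illustration: workers are `(age, salary, weight)` tuples, "roughly 10%" is read as the first age at which the cumulative weight reaches 10%, and the median is a simple weighted median rather than the interpolated percentile method described later on this page.

```python
def weighted_median(pairs):
    """Simple weighted median of (value, weight) pairs: the first value whose
    cumulative weight reaches half of the total weight."""
    pairs = sorted(pairs)
    half = sum(w for _, w in pairs) / 2
    running = 0.0
    for value, w in pairs:
        running += w
        if running >= half:
            return value


def weight_by_age(workers):
    """Sum the survey weights for each age."""
    totals = {}
    for age, _salary, w in workers:
        totals[age] = totals.get(age, 0.0) + w
    return totals


def starting_age(workers, share=0.10):
    """Youngest age at which that age and all younger ages hold `share` of the weight."""
    totals = weight_by_age(workers)
    cutoff = share * sum(totals.values())
    running = 0.0
    for age in sorted(totals):
        running += totals[age]
        if running >= cutoff:
            return age


def experienced_age(workers, share=0.75):
    """Youngest age that is older than `share` of the weighted workforce."""
    totals = weight_by_age(workers)
    cutoff = share * sum(totals.values())
    younger = 0.0
    for age in sorted(totals):
        if younger >= cutoff:
            return age
        younger += totals[age]
    return max(totals)


def salary_near(workers, age):
    """Average of the median salaries at age - 1, age, and age + 1."""
    medians = []
    for a in (age - 1, age, age + 1):
        pairs = [(s, w) for ag, s, w in workers if ag == a]
        if pairs:
            medians.append(weighted_median(pairs))
    return sum(medians) / len(medians)


def experience_quotient(workers):
    start = salary_near(workers, starting_age(workers))
    experienced = salary_near(workers, experienced_age(workers))
    return experienced / start


# Made-up data: one worker per age from 20 to 59, salary rising $1,000 per year.
workers = [(age, 30000 + 1000 * (age - 20), 2.0 if age == 20 else 1.0)
           for age in range(20, 60)]
quotient = experience_quotient(workers)
```

In this made-up career the starting age is 23 and the experienced age is 50, which gives a quotient of about 1.8.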

Special program statistics

When analyzing bachelor's programs, one statistic Ididio offers is an underemployment metric. Using the ACS microdata, we find the percentage of bachelor's graduates in a given field who end up in each possible career field. For each career field, we find the percentage of workers who have less than a bachelor's-level education. Finally, for each bachelor's degree field, we take a weighted average of the percentage of workers who have less than a bachelor's education, using the percentage of the field's graduates in each career as weights. The result is the average percentage of coworkers without a bachelor's degree among the people working alongside graduates of a given field. Ididio provides this metric as one of the ways to think about the end value of a degree: could you have had the same job without earning a bachelor's first?
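The weighted average can be sketched in a few lines. The career names and percentages below are made up for illustration and are not Ididio's published figures.

```python
def underemployment(career_shares, below_bachelors):
    """Weighted average of each career's share of workers without a bachelor's
    degree, weighted by the share of the major's graduates in that career."""
    total = sum(career_shares.values())
    weighted = sum(share * below_bachelors[career]
                   for career, share in career_shares.items())
    return weighted / total


# Hypothetical major: share of its graduates working in each career.
career_shares = {"Career A": 0.65, "Career B": 0.35}
# Hypothetical share of all workers in each career with less than a bachelor's.
below_bachelors = {"Career A": 0.40, "Career B": 0.50}

metric = underemployment(career_shares, below_bachelors)
```

Here the metric is 0.65 × 0.40 + 0.35 × 0.50 = 0.435, so about 44% of the people working alongside this major's graduates would have less than a bachelor's degree.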

Career focus by college major

Another statistic that Ididio creates to help consider potential college majors is a measure of the focus of a major: the number of end-careers held by at least 1% of ACS respondents for each college major. A small number of end-careers suggests a very focused major; for example, Registered Nursing prepares students for nursing careers and we have about a half dozen associated careers, with 65% of majors working as registered nurses. A large number of end-careers suggests a more general major that doesn't necessarily lead to a clear career path; for example, about two dozen end-careers are held by at least 1% of business administration and management majors. The most populated career for business majors accounts for only 7% of graduates.
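Counting end-careers above the 1% threshold can be sketched as below. The input is assumed to be the summed survey weight of a major's graduates in each career; the career names and numbers are made up.

```python
def career_focus(career_weights, threshold=0.01):
    """Count the careers that hold at least `threshold` of a major's graduates."""
    total = sum(career_weights.values())
    return sum(1 for w in career_weights.values() if w / total >= threshold)


# Hypothetical weighted counts of one major's graduates by career.
career_weights = {"Career A": 650, "Career B": 200, "Career C": 145, "Career D": 5}
focus = career_focus(career_weights)
```

Career D holds only 0.5% of the graduates in this example, so it is not counted and the focus number is 3.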

STEM designations for careers

The Census Bureau maintains a list of STEM and STEM-related careers to support their data products. We use this list to flag careers as STEM. The list is available on the Industry and Occupation Code Lists & Crosswalks page. We are using the 2018 list which we downloaded on October 7, 2019.

Methodology for creating percentiles from ACS microdata

At Ididio, we use the ACS microdata to create five-number summaries of the spread of responses:

The median is the value dividing the data into two parts, with 50% of the responses at or below the median and 50% at or above the median.
The 25th and 75th percentiles bound the middle 50% of the data, with 25% of the data at or below the 25th percentile and 25% of the data at or above the 75th percentile.
Eighty percent of the data is in between the 10th and 90th percentiles, with 10% of responses less than or equal to the 10th percentile, and 10% of responses at least as large as the 90th percentile.

We follow the method described by the National Institute of Standards and Technology (NIST) to calculate percentiles from a set of numbers, and we use interpolation to calculate values from binned data.
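For a plain list of numbers, the percentile method in the NIST/SEMATECH e-Handbook ranks the sorted values 1 through N, computes p(N + 1), and interpolates between the two neighboring values. The sketch below is unweighted; how Ididio folds the person weights and binned data into this calculation is not spelled out here, so treat the function as an illustration of the NIST rule only.

```python
def nist_percentile(values, p):
    """Percentile for 0 < p < 1 using the NIST rule: write p * (N + 1) as
    k + d with integer part k and fraction d, then interpolate between the
    k-th and (k + 1)-th smallest values."""
    ys = sorted(values)
    n = len(ys)
    rank = p * (n + 1)
    k = int(rank)   # integer part
    d = rank - k    # fractional part
    if k <= 0:
        return ys[0]    # below the first rank: use the minimum
    if k >= n:
        return ys[-1]   # at or above the last rank: use the maximum
    return ys[k - 1] + d * (ys[k] - ys[k - 1])


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = [nist_percentile(data, p) for p in (0.10, 0.25, 0.50, 0.75, 0.90)]
```

For the nine values above, the five-number summary comes out to approximately 1, 2.5, 5, 7.5, and 9.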

Each person record in the ACS is given a weight, so that the summed weights of all people estimate the total population. Each person is also given 80 replicate weight values that can be used to estimate the uncertainty of statistics inferred using the person record. For all statistics Ididio creates using the PUMS data, we calculate the percent standard error using the replicate weights and the Technical Documentation for ACS data methods.

We do not share any statistics created for which the standard error is greater than 25% of the data value.
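The Census Bureau's PUMS accuracy documentation gives the replicate-weight standard error as the square root of 4/80 times the sum of squared differences between each of the 80 replicate estimates and the full-sample estimate. The sketch below applies that formula and then the 25% rule; the function names are illustrative, and recomputing the statistic once per replicate weight is assumed to happen elsewhere.

```python
import math


def replicate_standard_error(estimate, replicate_estimates):
    """Standard error from the 80 replicate estimates:
    sqrt(4/80 * sum((x_r - x)^2))."""
    if len(replicate_estimates) != 80:
        raise ValueError("expected 80 replicate estimates")
    squared = sum((r - estimate) ** 2 for r in replicate_estimates)
    return math.sqrt(4 / 80 * squared)


def publishable(estimate, replicate_estimates, max_relative_se=0.25):
    """True when the standard error is at most 25% of the estimate."""
    se = replicate_standard_error(estimate, replicate_estimates)
    return se <= max_relative_se * abs(estimate)


# Made-up example: a median salary of 50,000 whose replicate estimates
# alternate between 49,000 and 51,000.
estimate = 50000
replicates = [49000, 51000] * 40
se = replicate_standard_error(estimate, replicates)
ok = publishable(estimate, replicates)
```

In this example the standard error is 2,000, or 4% of the estimate, so the statistic would be shared.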

Special errors in reporting high salaries and old age

The ACS microdata uses a special kind of top-coding for certain variables in order to protect the privacy of respondents. As an example, all reported wage and salary incomes above a certain percentile of responses are replaced with a single new value that causes the average over all data to be equivalent to that of the unchanged data. The trigger and replacement salary values vary by state and year.
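A simplified way to see how this works: if every value above the trigger is replaced with the mean of those values, the overall average is unchanged. The sketch below is unweighted and uses a made-up threshold, so it illustrates the idea rather than the Census Bureau's exact procedure.

```python
def top_code(values, threshold):
    """Replace every value above `threshold` with the mean of those values,
    which leaves the overall mean of the data unchanged."""
    above = [v for v in values if v > threshold]
    if not above:
        return list(values)
    replacement = sum(above) / len(above)
    return [replacement if v > threshold else v for v in values]


incomes = [40000, 60000, 300000, 500000]
coded = top_code(incomes, threshold=250000)
```

Both high incomes are reported as 400,000 here, so the person who earned 300,000 appears to earn considerably more than they did, while the overall mean stays at 225,000.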

The result of this policy is that some reported values may be quite a bit higher than their original value, and the resulting errors may not be possible to infer. The academic article Top-Coding and Public Use Microdata Samples from the US Census Bureau is a very helpful explanation of this top-coding approach and its ramifications. A Census Bureau list of top-coded values, the impacted variables, and the levels that trigger top-coding is also available.