(Compilation of illustrations by Pat Kinsella for the Center for Public Integrity)

Economists estimate wage theft costs workers more than $15 billion a year. Journalists at the Center for Public Integrity spent a year investigating how widespread the practice is and how effectively the U.S. Department of Labor’s Wage and Hour Division combats this systemic problem.

Now we’re making the data and code that drove these investigations available to the public.

Use our data

Interested in using this data for your own reporting? You can find all the data along with the code we used to produce our analyses on GitHub. Journalists, researchers and anyone else with questions about how to use the data can contact Public Integrity data reporter Joe Yerardi at jyerardi@publicintegrity.org.

Our reporting found:

  • Companies that repeatedly steal from employees rarely face enhanced penalties.
  • The United States Postal Service is one of the worst offenders.
  • The higher an industry’s share of immigrant workers, the greater the rate of wage theft.
  • Explosive growth in guest worker visas has not been accompanied by an equivalent increase in wage theft investigations.

Here’s what we’re releasing: 

  • Department of Labor Wage and Hour Division closed investigative cases from fiscal year 2006 through 2020, obtained via a Freedom of Information Act request.
  • American Community Survey microdata on employment by nativity status and industry from IPUMS USA.
  • Visa totals for fiscal years 2011 to 2019 from the U.S. Department of State.

Our analysis

As is typical with data-driven investigations, we had to deal with challenges and come up with creative solutions. 

For a story in May about companies that repeatedly violated wage regulations, we grouped employers by their employer identification number (EIN) because company names were inconsistent in the data. EINs were either missing or withheld in about one-third of cases involving minimum wage or overtime violations. Some employers, for example, used Social Security numbers in place of EINs, and those numbers were withheld for privacy reasons. We excluded those cases from the portions of the analysis based on specific companies, and we did not include those employers when calculating the proportion of repeat offenders fined by the WHD. When calculating overall figures, such as the total money improperly withheld from employees each year, we used the entire universe of cases.
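The split described above (company-level figures restricted to cases with a usable EIN, overall totals drawn from every case) might look roughly like the following. This is a minimal sketch, not the code in our repository: the field names `ein`, `back_wages` and `penalty`, and all of the sample records, are invented for illustration.

```python
from collections import defaultdict

# Hypothetical case records. Field names and values are invented;
# a missing or withheld EIN is represented as None.
cases = [
    {"case_id": 1, "ein": "12-3456789", "back_wages": 1000.0, "penalty": 0.0},
    {"case_id": 2, "ein": "12-3456789", "back_wages": 2500.0, "penalty": 500.0},
    {"case_id": 3, "ein": None, "back_wages": 800.0, "penalty": 0.0},
    {"case_id": 4, "ein": "98-7654321", "back_wages": 300.0, "penalty": 0.0},
    {"case_id": 5, "ein": "55-5555555", "back_wages": 400.0, "penalty": 0.0},
    {"case_id": 6, "ein": "55-5555555", "back_wages": 600.0, "penalty": 0.0},
]


def group_by_ein(records):
    """Group cases by EIN, skipping cases whose EIN is missing or withheld."""
    groups = defaultdict(list)
    for record in records:
        if record["ein"]:
            groups[record["ein"]].append(record)
    return dict(groups)


def repeat_offender_fined_share(records):
    """Share of employers with more than one case that were ever fined."""
    repeaters = [g for g in group_by_ein(records).values() if len(g) > 1]
    if not repeaters:
        return 0.0
    fined = sum(1 for g in repeaters if any(c["penalty"] > 0 for c in g))
    return fined / len(repeaters)


def total_back_wages(records):
    """Overall total, computed from every case regardless of EIN."""
    return sum(r["back_wages"] for r in records)


share = repeat_offender_fined_share(cases)
total = total_back_wages(cases)
```

In this toy data, case 3 has no EIN, so it drops out of the repeat-offender calculation but still counts toward the overall total.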

As we began working on our story focused on immigrant workers, we faced another challenge: how to quantify the number of immigrant workers and identify the industries cited most often for cheating them. The WHD does not document immigration status when conducting investigations, so we used American Community Survey data to calculate the proportion of foreign-born workers by industry. Because the two data sets use different industry codes, we relied on a crosswalk to match them. By combining the data sets, we were able to calculate the rate of wage theft cases per 100,000 workers for 95 industries along with the proportion of foreign-born workers in each industry.
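To illustrate the idea, here is a minimal sketch of joining case counts to workforce counts through a crosswalk and computing a rate per 100,000 workers. The industry codes, the crosswalk entries and the worker counts are placeholders made up for this example. The actual crosswalk and data are in the GitHub repository.

```python
from collections import Counter

# Hypothetical crosswalk from the codes used in the case data to the
# codes used in the survey data. All codes here are placeholders.
crosswalk = {"WHD_A": "ACS_1", "WHD_B": "ACS_1", "WHD_C": "ACS_2"}

# One entry per case, keyed by the case data's industry code (invented).
case_industry_codes = ["WHD_A", "WHD_B", "WHD_A", "WHD_C", "WHD_UNMAPPED"]

# Invented worker counts per survey industry.
workers = {
    "ACS_1": {"total": 200_000, "foreign_born": 50_000},
    "ACS_2": {"total": 50_000, "foreign_born": 25_000},
}


def industry_rates(case_codes, code_map, worker_counts):
    """Return cases per 100,000 workers and foreign-born share per industry."""
    counts = Counter()
    for code in case_codes:
        mapped = code_map.get(code)
        # Cases whose code has no crosswalk match are left out of the rates.
        if mapped in worker_counts:
            counts[mapped] += 1
    rates = {}
    for industry, w in worker_counts.items():
        rates[industry] = {
            "cases_per_100k": counts[industry] * 100_000 / w["total"],
            "foreign_born_share": w["foreign_born"] / w["total"],
        }
    return rates


rates = industry_rates(case_industry_codes, crosswalk, workers)
```

How unmatched codes are handled is a design choice. This sketch simply drops them, which is an assumption rather than a description of our method.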

For our reporting on the vulnerability of guest workers to wage theft, the last story in our series, we again turned to the WHD data. We wanted to highlight some of the specific obligations employers have to guest workers, such as reimbursing visa costs and the cost of transportation between the workers’ homes outside the United States and their U.S. worksites. The data contains more than 1,000 violations, with more than 250 violations related to guest workers who use H-2A or H-2B visas. Because the descriptions of violations varied widely, we grouped them into broad categories such as “cost-shifting” and “housing” to provide meaningful analysis to readers.
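One simple way to collapse free-text descriptions into broad categories is keyword matching, sketched below. This is an assumption about how such grouping could be done, not a description of the method we used. The keyword lists and sample descriptions are invented, and only the category names “cost-shifting” and “housing” come from our analysis.

```python
from collections import Counter

# Hypothetical keyword lists. Only the category names come from the
# article; the keywords themselves are invented for illustration.
CATEGORY_KEYWORDS = {
    "cost-shifting": ("visa fee", "transportation", "reimburse"),
    "housing": ("housing", "lodging"),
}


def categorize(description):
    """Return the first category whose keywords appear in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


# Invented sample descriptions.
descriptions = [
    "Failed to reimburse inbound transportation",
    "Housing did not meet safety standards",
    "Worker charged for visa fee",
    "Recordkeeping deficiency",
]

category_counts = Counter(categorize(d) for d in descriptions)
```

A description that matches more than one category lands in whichever category is listed first, so hand review of ambiguous cases would still be needed.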


Joe Yerardi

Joe Yerardi is a data reporter at the Center for Public Integrity, reporting on a broad range of topics....