Focus
August 22, 2025 | 13:40
U.S. Data Quality in the Firing Line
Recent hefty revisions to payrolls and news that, owing to shortages of data collectors, the CPI could be becoming less accurate are casting doubt on the quality of critical U.S. economic data. This is a serious but fixable problem.
The U.S. employment report for July revealed large downward revisions to payrolls for the previous two months. May’s growth was revised down by 125k to 19k and June’s was reduced by 133k to 14k. Moreover, with July payrolls reported at just +73k, this marked the weakest three-month tally since the first few months of the pandemic and, before then, the Great Recession. The narrative on the labour market had gone from ‘slowing but still sturdy’ to ‘sluggish’ in one fell swoop. This eroded the Administration’s confidence in the head of the Bureau of Labor Statistics (BLS), Erika McEntarfer, who was fired, with E. J. Antoni announced as her replacement (pending Senate approval).

The monthly employment data are based on the Current Employment Statistics (CES) survey, a.k.a. the establishment survey. The CES surveys roughly 631k establishments each month to produce granular estimates of nonfarm employment, hours worked, and earnings of workers on payrolls. It forms the focal point of the monthly Employment Situation report issued by the BLS alongside the Current Population Survey (CPS), a.k.a. the household survey. The CPS is used to produce estimates of labour force status by demographic characteristics, with the unemployment rate being the most closely watched metric.

As the timeliest of the major official U.S. macroeconomic indicators (it often precedes the CPI by at least one week), the employment report is the release most anticipated by market watchers and policymakers. And, for that reason, it has historically caused the most volatility in markets. Hence, any disruption to the current data offerings or erosion of trust in the official employment numbers would have significant ramifications for markets. Accordingly, previous heads of the BLS have maintained a strict nonpartisan ethos to avoid any inkling that an administration was attempting to ‘cook the books’ to make the numbers look more favourable.
The process for putting together the payrolls data involves revisions over the course of three consecutive months as the collection rates from the CES typically climb from around two-thirds of the sample to over 90% (Chart 1). Additionally, when the January figures are released each year, the CES data are revised again to align with employment levels captured in the Quarterly Census of Employment and Wages (QCEW), which is compiled from state and federal unemployment insurance data and covers over 95% of U.S. jobs. While the recent revisions to the payroll data can be jarring, it’s important to place them in the appropriate historical context. Revisions to macroeconomic indicators are common and sometimes large. The 133k revision to June was not outsized relative to the roughly 83k standard error on the month-over-month change in CES employment in any given month.
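As a rough, back-of-the-envelope illustration of that comparison (not the BLS’ own significance test), the snippet below simply uses the figures quoted above:

```python
# Back-of-the-envelope check: is a 133k revision large relative to sampling error?
# Figures are taken from the text above; this is not the BLS's methodology.

revision = 133_000        # downward revision to June payrolls
standard_error = 83_000   # approx. standard error on the m/m change in CES employment

z_score = revision / standard_error
print(f"Revision equals {z_score:.1f} standard errors")  # ~1.6, inside a typical ~2 cutoff
```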
There are two important contributing factors to the run of larger-than-expected downward revisions to payrolls over the past few years. First, establishment survey responses are increasingly received late. Specifically, in recent months, businesses have probably had more pressing issues (e.g., tariffs) than filling out voluntary surveys in a timely fashion. So, by the time more CES survey responses arrive by the second and third estimates, the story can change markedly. The second issue stems from the process that the BLS uses to estimate the net additions to payrolls arising from new business formations. This is referred to as the net business birth/death adjustment. Each year, the BLS models how many new jobs it expects to be added based on the difference between entering and exiting businesses (new firms aren’t directly captured in the CES). Recently, those adjustments have been exceeding the trend in establishments based on the QCEW data (Chart 2). That makes further downward revisions more likely than in periods where the birth/death model was better aligned to the QCEW.
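To see how that interaction produces downward benchmark revisions, consider the hypothetical sketch below; all of the numbers are invented for illustration and are not BLS figures:

```python
# Hypothetical illustration of how a birth/death adjustment that runs ahead of the
# QCEW trend feeds a later downward benchmark revision. All numbers are invented.

surveyed_net_change = 40_000      # net job change measured from responding establishments
birth_death_adjustment = 90_000   # model-based add-on for net new business formation
published_change = surveyed_net_change + birth_death_adjustment   # +130k headline gain

qcew_implied_change = 95_000      # what near-universal UI records later suggest

benchmark_revision = qcew_implied_change - published_change
print(f"Benchmark revision: {benchmark_revision:+,}")   # -35,000 (revised down)
```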
Concerns over BLS data, specifically the Consumer Price Index, had been brewing even before the payroll revision controversy. A shortage of CPI data collectors emerged in the wake of the federal government’s hiring freeze that went into effect on January 20, 2025. Apart from normal attrition, it’s unclear how many separations arose from the efforts of the Department of Government Efficiency (DOGE). To adjust for the labour constraints, the BLS began collecting less data in some local areas and no data in a few others, thus relying on less accurate imputation methods to ‘fill in’ the missing data.

When a specific price is not available, the price change is imputed via one of three methods. ‘Home cell’ imputation employs the average price from other stores in the same area. When this is not available, ‘different cell’ imputation employs the average price from other stores in a broader region. And when even this is not available, ‘carry-forward’ imputation uses the same price as the previous month. As you move from a direct price quote sequentially through the three methods, the accuracy of the data suffers.

The BLS does not publish how many prices were imputed. But, since 2019, it has published the shares of the methods employed among all imputations. Typically, ‘home cell’ is about 90%, ‘different cell’ is 10%, and ‘carry-forward’ is 0%. Beginning with the March CPI data, the share of ‘different cell’ imputation began to rise, hitting 15% (the previous high was 16% in April 2020 at the onset of the pandemic). It nearly doubled to 29% for April and hit 35% for the June data (Chart 3). It was 32% in July. The BLS has said that this shift should have “minimal impact on the overall inflation rate” but market participants are a little skeptical. It doesn’t help that the consensus forecast for the monthly move in the core CPI has come up short in four of the past five periods, including by as much as 0.2 ppts for March and May.
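The three-tier fallback described above can be sketched in simplified form. The function and variable names below are hypothetical, and the actual BLS estimation is considerably more involved (cell weights, geometric means, seasonal factors, etc.):

```python
# Simplified illustration of the CPI imputation fallback hierarchy.
# Names are hypothetical; this is not the BLS's actual estimation code.

def impute_price(home_cell_prices, different_cell_prices, last_month_price):
    """Fall back through the three imputation methods when a direct quote is missing."""
    if home_cell_prices:                  # 'home cell': other stores in the same area
        return sum(home_cell_prices) / len(home_cell_prices)
    if different_cell_prices:             # 'different cell': stores in a broader region
        return sum(different_cell_prices) / len(different_cell_prices)
    return last_month_price               # 'carry-forward': repeat last month's price

# Example: no usable quotes in the home cell, two quotes in the broader region
print(impute_price([], [2.00, 3.00], 2.95))   # -> 2.5 (different-cell average)
```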
Last month, Reuters conducted a survey of 100 leading policy experts, including Nobel laureates, former policymakers, academics from top U.S. universities, and economists from major banks, consultancies and think tanks. The survey showed that 41% were ‘very concerned’ about the quality of data, with a further 48% ‘slightly concerned’. Some 71% felt that U.S. authorities weren’t treating the issue with sufficient urgency and 63% judged that agencies don’t have adequate resources to produce high-quality data. With Social Security benefits and several other income benchmarks indexed to the CPI, getting it ‘right’ matters to the broader economy.
Also stoking these concerns was the termination of the Federal Economic Statistics Advisory Committee (FESAC), effective February 28, 2025. The FESAC had advised the Bureau of Economic Analysis (BEA), the Census Bureau and the BLS on “statistical methodology and other technical matters related to the collection, tabulation, and analysis of federal economic statistics”. Even Fed Chair Powell weighed in. During recent congressional testimony, he said: “I wouldn’t say that I’m concerned about the data today, although there has been a very mild degradation of the scope of the surveys… But I would say the direction of travel is something I’m concerned about.” Indeed, the response rates for several household and establishment surveys have fallen significantly since 2015 (Chart 4) [1].
There are ways to turn the ‘direction of travel’ around and improve data quality. Ending the hiring freeze for, and rehabilitating the budgets of, the statistical agencies would be a quick fix. The adage ‘you get what you pay for’ can apply, partly, to data quality. In real terms, the BLS’ budget hasn’t risen in over two decades despite immense growth in the labour market and the consumer product landscape over the same period.

Another fix would be to make survey responses mandatory. The Census Bureau’s American Community Survey (ACS) and the decennial census are both mandatory at the federal level, but the CES is not. (Note that some states make CES survey responses mandatory.)

Employing more technology and big data is another way to turn things around. For the CPI, the BLS is already using big data, with ‘one firm’ (the name is not published owing to confidentiality) providing the BLS “with a large volume of price data” on apparel and household goods, rather than having in-house data collectors gather those prices via store visits. However, of the roughly 100,000 price quotes collected per month on commodities and services, approximately two-thirds are still collected via personal visits. The BLS was already looking at expanding its ‘alternative data’ (anything not collected in person), such as corporate-supplied data, secondary-source data (third-party datasets), web scraping, and establishment-provided application programming interfaces (APIs). We reckon these efforts have been given extra impetus.

Bottom Line: Recent revisions aside, data from the BLS remain the gold standard for producing market-relevant macroeconomic data. Equipping the agency with the necessary tools and resources to continue to produce high-quality data should be a priority for the current and future administrations.

[1] For the CES, the response rate is measured as the number of establishments responding to the survey compared to the number of eligible establishments based on the universe of UI-covered businesses (including those that no longer respond). Collection rates for the CES, however, exclude establishments from the denominator if they are nonresponders.
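As a minimal sketch of the distinction drawn in footnote [1], using invented counts rather than BLS figures:

```python
# Invented counts illustrating the response rate vs. collection rate distinction
# described in footnote [1]; these are not BLS figures.

responders = 600
persistent_nonresponders = 250   # eligible establishments that no longer respond at all
not_yet_collected = 150          # still in the active sample but not yet collected

response_rate = responders / (responders + persistent_nonresponders + not_yet_collected)
collection_rate = responders / (responders + not_yet_collected)

print(f"Response rate:   {response_rate:.0%}")    # 60%
print(f"Collection rate: {collection_rate:.0%}")  # 80%
```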