When the government wants to know how many people are unemployed, it calls people and asks them whether they’re working. When it wants to know how quickly prices are rising, it sends researchers to stores to check price tags. And when it wants to know how much consumers are spending, it mails forms to thousands of retailers asking about their sales.
“Big data” may have revolutionized industries from advertising to transportation, but many of our most vital economic statistics are still based on methods that are decidedly, well, small.
Now economists both inside and outside government are trying to change that. They are working to open up access to government records, make better use of private-sector data and use modern statistical techniques to link together different sources — steps, they believe, that could allow for economic statistics to become more accurate, more detailed and perhaps available more quickly. Ultimately, they hope to allow economists to tackle questions that aren’t answerable using currently available data sources — how government programs affect participants years or even decades down the line, for example. President Obama’s latest budget, released last month, dedicates an entire chapter to proposals to expand access to so-called administrative data, records collected as part of government programs rather than through surveys.
But it won’t be easy. Efforts to change the way the government collects statistics face legal, bureaucratic and practical hurdles and in some cases could run afoul of privacy advocates worried about how the government tracks its citizens. Despite bipartisan support for change, actual progress has been slow.
“It’s just taking a lot longer than anyone wanted,” said Hal Varian, chief economist for Google and an outspoken advocate for the expanded use of modern data-collection approaches.
One example of the shortcomings of the current system — and the potential for improvement — is the monthly jobs report. Every month, investors, economists and journalists (myself included) race to digest the Bureau of Labor Statistics’ count of new jobs. But the report is highly imperfect. Its numbers, based on a survey of businesses, are volatile and subject to revision. It provides little information about what kinds of jobs are being created or how much they pay. The only demographic information available is based on an entirely separate survey with even larger margins of error.
The government has much more complete information on almost all those subjects. State unemployment systems collect detailed information on employment and wages. The Internal Revenue Service has extensive information about virtually every business in the country, and the Social Security Administration tracks nearly all workers. There’s even the National Directory of New Hires, a little-known database created to track child-support delinquents.
Statisticians at the Bureau of Labor Statistics don’t have access to any of those sources for the monthly jobs report, however. IRS data is off-limits under federal law. Other sources are unavailable either because the agencies that control them won’t share them or because the BLS doesn’t have the resources to turn them into a usable form.
The problem goes far beyond the jobs report. The majority of the government’s most closely watched economic indicators are based on surveys, from monthly reports on construction, retail sales and inflation to annual reports on household income and consumer spending. In nearly every case, more complete data exists from either public or private sources, if only government agencies could get access to it.
The access limitations are at least partly the legacy of the last time that the government tried to expand its collection and use of administrative data. In 1965, the Johnson administration proposed the creation of a national data center in part to track the performance of Johnson’s Great Society initiatives.