Major technological revolutions often prompt doubts about whether existing economic measures can keep up. Artificial intelligence is no exception. But in the case of AI, the anxiety is misplaced.
A wave of commentary now argues that AI is generating enormous economic value that official statistics cannot see. Erik Brynjolfsson of Stanford University and coauthors have proposed supplementary national accounts, branded “GDP-B,” to capture the consumer surplus created by free digital goods that conventional GDP ignores. A recent Peterson Institute analysis estimates that quality-adjusted AI output has grown by more than 2,000 percent per year. An essay from the semiconductor research firm SemiAnalysis argues that AI is generating an industrial-revolution-scale transformation, resulting in “dark output” that existing national accounts are structurally incapable of capturing.
These arguments, while differing in origin and intent, share a common diagnostic error. They treat the problem as one of GDP when the underlying issue lies in the construction of price indices. This distinction matters because the appropriate remedy depends on the problem being solved. Attempting to fix measurement by expanding or redefining GDP is not only unnecessary, it is counterproductive. It corrupts a stable and well-understood measure of a nation’s income in pursuit of impacts that, properly understood, are not fundamentally GDP phenomena.
Price Indices Are Where the Trouble Lies
Gross domestic product is the market value of final goods and services produced in the economy during a given period. Nominal GDP records those values in the current year’s dollars. Real GDP adjusts nominal GDP using a price index to remove the effects of inflation, leaving a measure intended to reflect changes in the quantity and quality of output produced. That adjustment depends not only on how statisticians account for price changes but also on how they account for changes in the quality of goods and services. The quality adjustment problem does not reside in nominal GDP, which simply asks what was spent, earned, or produced in current dollars, but in the deflator applied to convert those nominal figures into real terms.
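To make the point concrete, here is a minimal sketch with invented numbers (not actual data). The same nominal spending figure produces very different real growth depending on how much quality improvement statisticians build into the deflator. The function names and the 20 percent quality judgment are hypothetical illustrations, not any agency’s method.

```python
# Illustrative numbers only: nominal spending on one product in two years.
NOMINAL_Y1 = 100.0  # base year, where the deflator is set to 100
NOMINAL_Y2 = 110.0  # 10% more dollars spent in year 2


def real_output(nominal: float, deflator: float) -> float:
    """Deflate nominal spending by a price index whose base-year value is 100."""
    return nominal / deflator * 100


def real_growth(deflator_y2: float) -> float:
    """Real growth from year 1 to year 2, given a year-2 deflator."""
    return real_output(NOMINAL_Y2, deflator_y2) / real_output(NOMINAL_Y1, 100.0) - 1


# Case A (hypothetical): the list price rose 10% and no quality change is credited.
deflator_no_quality = 110.0
# Case B (hypothetical): same list price rise, but the product is judged 20% better,
# so the quality-adjusted price index is 110 / 1.2 (about 91.7).
deflator_with_quality = 110.0 / 1.2

print(f"nominal growth:               {NOMINAL_Y2 / NOMINAL_Y1 - 1:.1%}")
print(f"real growth, no quality adj:  {real_growth(deflator_no_quality):.1%}")
print(f"real growth, 20% quality adj: {real_growth(deflator_with_quality):.1%}")
```

Nominal growth is 10 percent in both cases; only the deflator, and therefore measured real growth, changes. That is where the judgment calls about quality enter.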
The Boskin Commission drew attention to this issue in the 1990s, when it identified several sources of upward bias in the Consumer Price Index. These included substitution bias (consumers switch to cheaper alternatives), outlet bias (shoppers move to discount retailers), new goods bias (new products are late to enter the index), and quality change bias (price increases that reflect improvements in products are recorded as pure inflation).
Statistical agencies have spent the better part of three decades trying to address these biases through methods such as hedonic regressions (which estimate how much of a price difference reflects differences in product characteristics, so that the remainder can be treated as pure price change), as well as through the use of scanner data from retail transactions and other innovations in price measurement. The effort has been substantial. The quality problem in price indices, however, has not been resolved, and with AI it may be about to become dramatically worse.
The reason is simple. Price indices work better when quality changes only gradually or in an observable way. A faster processor has a speed you can compare. But a large language model does not lend itself to clean unit measurement. Its value depends on a combination of its performance on various benchmarks, reliability, latency, context length, integration into business systems, and the skill of the operator. The apparent unit of output—a token—is not a stable unit of economic value.
As SemiAnalysis observes, “a million tokens can produce junk, a useful email summary, a legal document, or a decision that changes a company’s strategy. The economic value depends on the output, not the token count.” However, this is a price index problem, not a GDP problem. The nominal transaction is always visible, whether as spending on token use, software subscriptions, cloud services, or capital equipment; determining how its real content evolves over time is where the measurement machinery breaks down.
Falling Prices Can Mean Falling Incomes
One of the most important dimensions of AI’s economic impact may involve what happens to measured output when technology causes prices to fall. Falling prices are usually celebrated, and for consumers they are often an unambiguous gain. Generating more output per dollar also means the economy is getting more productive. That is all true. But rapid price declines carry a distinctive risk that this typical framing obscures: total income can fall even as productivity and output rise.
If AI raises productivity in service sectors while driving prices toward zero, the income that previously supported the American workers who produce those services can collapse. Revenue is price times quantity, so if price declines outpace output gains, total revenues in affected industries must fall, reducing the income available to workers and other producers despite higher levels of production. The result is a form of growth that may leave large segments of the workforce worse off even as the economy becomes more productive.
Critically, however, nominal GDP correctly accounts for this outcome. If drafting a legal document recently cost $150 and now costs $0.50 in AI tokens, and if the lawyer who previously drafted that document is no longer employed in the same capacity, then market income in the legal services sector falls. Nominal GDP records that fall accurately. That it shows up as declining income is not a deficiency of GDP; it is GDP functioning as intended. One would emphatically not want to “control” for these price declines in the national accounts, as if the falling cost of legal services were a measurement error rather than an income loss for the people who used to provide them.
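The revenue arithmetic behind the legal-document example can be sketched in a few lines. The $150 and $0.50 prices come from the example; the baseline volume and the output multipliers are hypothetical, and the sketch deliberately ignores where customers’ savings are spent elsewhere in the economy.

```python
# Prices from the legal-document example; volumes and multipliers are hypothetical.
OLD_PRICE = 150.00  # dollars per drafted document, before AI
NEW_PRICE = 0.50    # dollars of AI tokens per document, after AI


def sector_revenue(price: float, documents: float) -> float:
    """Nominal revenue (income available to producers) is price times quantity."""
    return price * documents


def break_even_multiplier(old_price: float, new_price: float) -> float:
    """How many times output must grow for revenue to stay flat after a price cut."""
    return old_price / new_price


baseline_docs = 1_000.0  # hypothetical documents drafted per period before AI
baseline = sector_revenue(OLD_PRICE, baseline_docs)

# Real output rises with each multiplier, yet nominal income falls until output grows 300-fold.
for multiplier in (1, 10, 100, 300):
    revenue = sector_revenue(NEW_PRICE, baseline_docs * multiplier)
    change = revenue / baseline - 1
    print(f"output x{multiplier:>3}: revenue ${revenue:>10,.2f} ({change:+.1%} vs. baseline)")

print(f"break-even output multiplier: {break_even_multiplier(OLD_PRICE, NEW_PRICE):.0f}x")
```

Even a hundredfold increase in documents drafted leaves sector revenue far below its starting level, which is exactly the income decline nominal GDP should record.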
In the SemiAnalysis essay, the authors call this “substitution dark output.” The idea is that a lawyer’s invoice, previously used to measure the value of the final output, may be replaced by a few cents of AI spending even though the output is the same. They estimate roughly $1.5 trillion of exposure in sectors where AI could plausibly take over tasks. But that substitution is not automatically invisible to GDP; when the savings are passed on to customers, its measured effect depends on what they do with the extra money in their pockets. Even so, if token spending is the only transactional trace under the new arrangement, then it is also the only new income generated, and that is precisely what GDP should record.
Keep Welfare Economics Distinct From GDP
Brynjolfsson and coauthors have done useful work documenting the consumer surplus generated by free digital goods. Their proposed GDP-B measure attempts to quantify the consumer surplus from services that conventional GDP does not fully reflect, even after accounting for the spending that lower prices may free up elsewhere in the economy. This includes value from free or low-cost digital goods such as internet search, social media, and AI assistants.
One way to estimate that surplus is to ask how much people would have to be paid to give up access to those services. The problem here lies less in the exercise than in the branding. Calling the resulting measure “GDP-B” implies a proximity to GDP that is methodologically misleading. GDP is a measure of market income and production. The consumer surplus generated by a free AI assistant may be a useful welfare measure, but the portion not redirected into market activity is not income. It is not something households can spend, save, or invest. An official metric that incorporates psychological surplus alongside market transactions has ceased to function as a national income account.
Alternative welfare measures that are clearly labeled, methodologically distinct, and quarantined from core national accounts are entirely appropriate. The Bureau of Economic Analysis (BEA) has already experimented with digital economy satellite accounts, and researchers can continue to build parallel dashboards. But experimental welfare measures should not migrate into the core GDP statistics. Indeed, one could argue that too much imputation already goes on, as with the imputed rent on owner-occupied housing (the national-accounts counterpart of the CPI’s owners’ equivalent rent), which treats homeowners as if they rent their homes to themselves.
An adjustment that raises measured GDP growth by even a few tenths of a percentage point in one year can shape public understanding of economic growth, inform tax and transfer policy, and influence monetary policy decisions. The burden of proof for such adjustments should be extremely high.
Reduce the Stakes of Measurement Error
Satellite accounts branded “GDP-B,” measures that blur the line between income and surplus, and quality adjustments that cannot be verified by what people actually pay in markets all invite political pressure on a statistical system whose legitimacy depends on perceived objectivity. Once GDP becomes a receptacle for values other than observed market income, it becomes difficult to explain why some unpriced benefits should be included and others should not. Subjective quality improvements, estimated consumer surplus, environmental benefits, equity goals, or other politically favored priorities can all claim admission. The number then becomes harder to interpret and easier to manipulate.
The most honest conclusion may also be the most sobering one. Price indices are imperfect instruments, and their imperfections cannot be eliminated through methodological ingenuity alone. The solution is a more humble acknowledgment of what price indices can and cannot do, combined with policy institutions that reduce the stakes of getting measurement wrong.
The most durable path to better measurement is a stable monetary unit. Fiscal discipline and monetary institutions that preserve the long-run stability of the dollar can reduce some of the burden placed on price indices by limiting large swings in the value of money. A government that maintains price stability narrows the gap between nominal and real magnitudes, reducing one source of measurement uncertainty.
Even without such policy improvements, and whatever happens with price indices, the nation needs to track its income, just as any responsible household or business would. Otherwise, its debts and obligations have no clear reference point. Imperfect measurement is not an argument for embracing ignorance.
AI will eventually prove its economic worth—or it will not. If welfare is improving in ways that generate market activity and income, those effects will eventually be reflected in conventional GDP. To the extent welfare gains remain outside the sphere of market transactions, they may be worth measuring separately, but they should not be relabeled as national income. On the other hand, if what AI generates is primarily more slop, busy work, and token spend without corresponding income gains, official statistics should not be inflated to flatter the technology.
Likewise, welfare economists can keep measuring consumer surplus from digital as well as other types of goods. These are valuable research programs. But they should remain clearly labeled as what they are: supplementary analyses of welfare, not revisions to the core national income and product accounts.
The integrity of a national income measure is worth protecting. It is the shared language in which households, firms, investors, and governments account for the economy they all inhabit. Once that language is diluted by objectives other than measuring income and production, economic debates become less coherent and public decision-making less reliable.