Government economists are starting to think of ways to more specifically and comprehensively measure the production of artificial intelligence, prompting consideration of a new task force within the Federal Economic Statistics Advisory Committee.
“Where we could go from here, in terms of questions that were asked of FESAC members,” Rajshree Agarwal said, is to form “potentially a joint task force that carefully addresses … conceptual and measurement issues … maybe leverage AI itself in creating new data sources and the use of external data sets beyond what is being identified.”
Agarwal is an economics professor and faculty director at the University of Maryland and a member of FESAC, which collectively advises the Bureau of Economic Analysis, the Bureau of Labor Statistics and the Census Bureau. The group discussed challenges in measuring the production of AI in the U.S. economy during a Dec. 13 meeting, where Agarwal was responding to presentations by BEA research economist Tina Highfill, BLS economist and program manager Michael Wolf, and Emin Dinlersoz, the Census Bureau’s principal economist.
“My concern, in spite of all of these really commendable efforts at using existing data to capture changing industry and occupational structure,” Agarwal said, is that, given the transformational nature of the technology, the presented “statistics likely mask underlying changes in tasks within occupations and complementarities between humans and technology.”
The question of how to accurately measure the value of AI is expected to come into greater focus over the next year, as proponents of a hands-off approach to regulating the technology continue to cite projected economic benefits in their appeals to policymakers, even amid observations of a potentially looming investment bubble.
“Given the rapid development of the technology and massive capital outlays, developers of these technologies will be under pressure to define and verify their presumed benefits,” Nigam Shah, the chief data scientist for Stanford Health Care, wrote in “Predictions for AI in 2025” for the university’s Institute for Human-Centered AI.
“Shared and transparent benchmarking … will become mainstream, so that informed decisions can be made about the claimed benefits of using generative AI … we will have to devise ways of thinking that go beyond a narrow efficiency or productivity lens as we currently do,” he wrote, linking to research on practical applications for AI in the field.
But Highfill, the BEA researcher, said that in addition to researching use cases for AI, it’s also important to measure AI production more deliberately, particularly since the epic levels of investment in resources and capital for AI are coming from both the private sector and the government.
“We also have an additional motivation to measure AI production because of some recent federal government actions to encourage domestic production of AI,” she said, noting the 2022 CHIPS Act and President Biden’s 2023 executive order on AI. “[In] both of these, the idea is to really encourage AI production in the United States. So these are just two reasons to think about why we want to measure production of AI, not just think about the uses of it.”
She added, “if we want to have a comprehensive measure of the overall impact of AI, we need to know not just who's using AI, but who's producing it. What industries are producing it, how many Americans are employed doing this? How have those relationships changed over time and things like that.”
BEA typically uses data collected by the Census Bureau in biweekly surveys of U.S. companies. Highfill sought feedback on how BEA might define AI in such surveys, and where else the agency might look for data sources that address AI production more specifically than four broad elements commonly used as proxies: chip manufacturing, software publishing, computer and data services, and research and development.
“Given what's going on with resources and budget cuts and things like that, how should we prioritize measuring production of AI areas of research?” she asked. “And then, if you do think that this should be a priority, that we should pursue this, how do you think AI should be defined? We went through the four common elements, but I think that there could be arguments for adding different things, like perhaps construction of data centers or something like that.”
Agarwal and another FESAC member -- Avinash Collis, an assistant professor at Carnegie Mellon’s Heinz College of Information Systems and Public Policy -- both agreed it would be valuable to explore other data sources for the agencies to measure AI production and to generally broaden their aperture when thinking about the issue.
Considering gross domestic product, for example, which factors in capital investments, Agarwal said, “We may be able to see that productivity of labor or GDP is increasing, but, in both production and use, the data creation needs to also differentiate between volume -- the number of technologies, the firms that are coming in -- versus newness of what it is that they're bringing with them, across these technological systems.”
She added, “there’s a caution of taking firms at their word on what they're planning on doing regarding employment and use of AI, not because I believe that they're lying, but because they may not even know [the answer] -- the person that's being interviewed in the survey.”
Agarwal suggested combing through sites like arxiv.org, a repository where academics have been sharing their research on AI, as one possible source of data to gauge AI production.
“Many of the top tech firms are doing patent publication papers and publishing their work there. So this may be a very good way,” she said, “because AI is so fundamentally affected by academic research.”
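For a rough sense of what mining that repository could look like, the sketch below queries arXiv’s public API for recent submissions in its artificial intelligence category; the choice of category, the fields pulled out, and any downstream use for production statistics are illustrative assumptions, not a methodology endorsed by the agencies or the FESAC members.

```python
# A minimal sketch of pulling recent AI papers from arXiv's public API --
# one possible raw input for gauging AI research output. The cs.AI category
# and the fields extracted here are assumptions for illustration only.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the arXiv feed


def fetch_recent_ai_papers(max_results: int = 25):
    """Query arXiv's public API for recent cs.AI submissions."""
    query = urllib.parse.urlencode({
        "search_query": "cat:cs.AI",       # arXiv's Artificial Intelligence category
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    })
    url = f"http://export.arxiv.org/api/query?{query}"
    with urllib.request.urlopen(url) as resp:
        feed = ET.parse(resp).getroot()

    papers = []
    for entry in feed.findall(f"{ATOM}entry"):
        papers.append({
            "title": entry.findtext(f"{ATOM}title", "").strip(),
            "published": entry.findtext(f"{ATOM}published", ""),
            "authors": [a.findtext(f"{ATOM}name", "")
                        for a in entry.findall(f"{ATOM}author")],
        })
    return papers


if __name__ == "__main__":
    for paper in fetch_recent_ai_papers(5):
        print(paper["published"][:10], "-", paper["title"])
```

Turning a feed like this into a production measure would still require linking authors to firms and deduplicating across venues, which is where the measurement questions raised at the meeting remain open.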
Providing another possibility, she said, “for jobs and occupational data, my colleagues in the Smith School of Business have actually used AI to map jobs in collaboration with firms that are tracking job data.”
Collis, the other FESAC member, identified a number of ways in which the available data sources fall short, noting reliance on developers’ projections and inconsistency in how AI is defined.
“Compared to firms, households seem to be adopting [AI] at a much faster pace,” he said, for example, noting “I think it would be super useful to measure these things” but adding that “we have to take whatever OpenAI says their usage numbers are at face value” because “there isn't really good data there.”
He expressed particular concern about “AI washing,” noting that commonly used business formation statistics might reflect companies misrepresenting their use of AI in fundraising applications. “Oftentimes if you look at the underlying technology, there isn’t much AI about what they’re using,” he said, “so that could be a potential overestimate if you look at business formation statistics data.”
“What should count and what shouldn't count as AI?” Collis said, noting a significant difference between using Copilot, Microsoft’s AI assistant, as a routine part of Microsoft Office, and using it to come up with predictions or trends with more sophisticated modeling.
Those differences in definition could yield artificial disparities when surveying uptake of the technology, as in a comparison of AI adoption in the U.S. and the U.K., where the U.K. appeared to be using AI to a much greater degree largely because the survey relied on a much broader definition.
On that note, Highfill said BEA has been collaborating with the Census Bureau and international partners on a “handbook” for measuring AI production due out from the Organisation for Economic Co-operation and Development.
“The idea is to come up with solid definitions that will hopefully allow for internationally comparable data, so that we can at least know that we're measuring the same thing across countries,” she said.
