As businesses of all kinds are integrating automation tools into coding, into writing, and yes, into transactions, there’s some amount of concern about accuracy and oversight. I say “some amount of concern,” but I think it’s safe to say it’s a rather large amount.
In finance and elsewhere, LLMs, chatbots and various kinds of neural nets do make mistakes. But why? Aren’t they computers? How can they make mistakes? And what happens when they make mistakes in the real world?
Answering the last question first, I found this article by Asha Kiran Kumar at Analytics Insight, in which the writer supplies these basic points on AI mistakes and their consequences. I’m including them verbatim because they are so good:
- When machine-driven decisions cause harm, blame moves through the chain of developers, companies, managers, and leaders who shaped and released the system.
- Governments are building strict rules for high-risk uses, demanding transparency, oversight, and clear documentation to prove systems were deployed responsibly.
- From insurers to executives, everyone involved in building or using these systems will carry a piece of the accountability as autonomy grows and risks rise.
To recap:
Any process is like an organism, with its own touch points. Regulators are trying to stay ahead of AI problems. And those who use AI must wrestle with the liabilities.
Thoughts from a Financial AI Panel in Boston
At the Imagination in Action conference in Boston this past April, I heard Nina Gregory, formerly of NPR, interview a panel on financial AI, and what it means when models don’t work perfectly in this sector. Nina managed many of the great conversations that we had at IIA, an event that I help to put on at MIT each year.
For this panel, we had Celestino Amore of IlliquidX.AI, Miquel Noguer of the Artificial Intelligence Finance Institute, and Brian Peltonen of Parcosm AI, a company working on different kinds of structured data solutions.
Cybersecurity Advice
Before talking about AI mistakes specifically, the panel started with cybersecurity. I want to highlight part of what each participant said.
“We hired the top guys out there that make sure that the system is contained,” Amore said, citing the importance of proprietary control of systems. “We have our own data center. We have our own control of the LLM. We have our own control of the data where it’s possible, and we only go out (sic) where it is an exchange or a trading platform for a particular type of function. But the majority of the data and the LLMs are built and used inside … we use multiple instruments, not just only one, and it’s supervised by this big infrastructure of cybersecurity.”
Noguer promoted the value of gaming out scenarios.
“I think that the computer science community is launching a kind of a little bit of a dangerous message in the sense that ‘all that matters is to get the agents going, then we’ll see what happens,’” he said. “Whereas what you need to do is, you need to pick all the tasks and game-theory them.”
Peltonen had a concern about agents exploring environments where they can essentially make security mistakes, which leads into the rest of the conversation about handling LLMs and their liabilities in finance.
“My brother once said something about hackers: that they’re like water,” he said. “If there’s a crack, they’ll find it. And I think with agents, there’s something similar: that in trying to be helpful, they will find those cracks. So if you don’t have a really structured environment, making sure that they have a walled-in playground, they have the ability to find places where somebody, somewhere, left their database credentials in a file that they find. … if you don’t have a very structured environment where these things are placed in walled environments, they can find these cracks. And it’s not anything devious; they’re trying to be helpful.”
That’s in addition to problems like data drift, bias in training data, black swan events, and other things that the panelists touched on in pondering the challenges of instituting these systems in a highly regulated industry.
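To make Peltonen’s “walled-in playground” a bit more concrete, here is a minimal sketch of one possible guardrail: checking every file path an agent asks for against a single allowed directory. The directory name and the `is_allowed` helper are my own illustrative assumptions, not anything the panelists described, and a real deployment would layer this with operating-system-level isolation.

```python
from pathlib import Path

# Hypothetical sandbox root: the only directory the agent may read from.
SANDBOX_ROOT = Path("/srv/agent_workspace")


def is_allowed(requested: str, root: Path = SANDBOX_ROOT) -> bool:
    """Return True only if the requested path resolves inside the sandbox root."""
    root = root.resolve()
    # Resolving collapses ".." segments and symlinks before the comparison,
    # so "../secrets/.env" cannot slip past a simple prefix check.
    target = (root / requested).resolve()
    return target == root or root in target.parents
```

With this check in front of every file read, a request such as `reports/q1.txt` passes, while `../secrets/.env` or an absolute path like `/etc/passwd` is refused before the agent ever sees the contents.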
Picking Out Key Things
Peltonen also made a pitch for the importance of data accuracy. He and Gregory discussed the idea that AI needs to surface points that correspond to what human analysts would say in a given situation, such as after reading a financial report, and boil them down to the essentials.
“We would need attribution,” Gregory said.
“Not only attribution, but just finding, like, is this similar to what your team of analysts would say?” Peltonen replied. “Is this similar to what other experts would say? It’s a bunch of stuff. Maybe that stuff was in the documents, but are these the important things?”
Amore weighed in, noting the role of regulation.
“The regulated financial environment puts a lot of guardrails in this environment,” he said. “So on top of the control of the data, control of the various programs, how they are built, they want to know how precise these agents are. They don’t want them to go around by themselves doing whatever they want to do. They ask specifically: where does the human have the final say in any final decision?”
People and Machines
Noguer had a very provocative take on AI work and its quality, comparing it to humans who come to work drunk.
“It’s like a human,” he said of AI. “You need to ensure that it doesn’t come drunk to work.”
He explained.
“I think we’re misrepresenting the human abilities, in the sense that, remember that in the past we had ‘four eyes,’ right? There’s a very important task, so somebody performs a task, somebody else approves. There’s obvious risk that the second guy just clicks and says, ‘Okay, I’m going to trust him, he’s always done a good job.’ You can have AI doing the same thing: being lazy, reading stuff, and just giving you a kind of a boring summary of a 10-K or a 10-Q, because you haven’t worked enough.”
Read the Check
Noguer also gave a compelling specific example of a simple error that could have enormous consequences every time it happens.
“You did an OCR on a banking check, and it was wrong,” he said, describing a hypothetical scenario that shows why the ‘four eyes’ are critical. “The OCR told you that this is 10 million, and it was 10,000. You see, it happens all the time. Really, OCR is not perfect.”
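A ‘four eyes’ rule for Noguer’s check example can be sketched in a few lines. A check carries its amount twice, once as digits in the numeric box and once in words on the written line, so two independent readings can be compared. The sketch below assumes both readings have already been extracted upstream as decimal values; the function name and the review threshold are hypothetical choices of mine, not a description of any bank’s actual process.

```python
from decimal import Decimal

# Hypothetical cutoff: amounts at or above this always get a second reviewer.
REVIEW_THRESHOLD = Decimal("5000")


def needs_second_review(
    numeric_box: Decimal,
    written_line: Decimal,
    threshold: Decimal = REVIEW_THRESHOLD,
) -> bool:
    """Decide whether a human must confirm the amount read from a check."""
    # Two readings that disagree mean at least one of them is wrong.
    if numeric_box != written_line:
        return True
    # Even when the readings agree, large amounts still get a second pair of eyes.
    return numeric_box >= threshold
```

In the scenario Noguer describes, a numeric reading of 10,000,000 against a written reading of 10,000 would be flagged on the mismatch alone, while a small check whose two readings agree would go through without a second reviewer.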
AI’s Utility
I also want to highlight this part of the conversation near the end, where Gregory got an idea from each panelist about why AI is so useful in this field, despite all of the caveats that they had presented.
“You can produce things very quickly,” Peltonen said of AI’s acceleration, while also noting that it’s important to get things right. “The gap between idea and realization is very small. Things that would have required large teams, long periods of time, careful planning, can now be developed with simple prompts. There are some things that you can do in like 15 to 30 minutes, where you can really build out something that would have taken many months of human labor.”
Amore cited an example:
“For an old system, or in an old fashion, it used to take two years and a bunch of programmers to connect a financial platform to SWIFT—to the SWIFT engine, which is a system of payments but also settlement for any securities, clearing securities. Now with AI, it took a team of few guys and six months, and a fraction of a multi-billion-dollar budget.”
Noguer’s example was more geopolitical, and points to the greater context of AI in finance:
“I don’t think there’s a single portfolio manager in the planet that hasn’t asked Claude about the Iran war, when it’s going to end, and the implications in oil and so on,” he said. “So everybody, all the portfolio managers, are using Claude or ChatGPT.”
I thought about this. Indeed, finance takes place in a world beset by change. So part of navigating financial markets is reading those tea leaves that have to do with more than just pricing. In other words, what’s happening in the world is crucial to financial work. Now, we have Kalshi, where people can actually place trades on events, even fairly quotidian ones – but maybe we had a form of Kalshi all along.
What we have to figure out, in finance, and in work, and in life, is what AI will be responsible for – and what we will continue to be responsible for as humans. Stay tuned.











