In today’s column, I will showcase an intriguing and disturbing facet of OpenAI’s latest ChatGPT-like model known as o1. Even if you aren’t actively using o1, the issue has notable repercussions for all generative AI apps and large language models (LLMs).

The problem has to do with so-called AI hallucinations popping up in the most unusual or unexpected of places. Hint: I’ll be telling you quite a bit about chain-of-thought reasoning.

Let’s heartily discuss the matter.

In case you need some comprehensive background about o1, take a look at my overall assessment in my Forbes column (see the link here). I subsequently posted a series of pinpoint analyses covering exceptional features, such as a new capability encompassing automatic double-checking to produce more reliable results (see the link here).

Unpacking AI Hallucinations And Their Occurrences

First, a quick overview of AI hallucinations is worthwhile to set the table.

I am not a fan of the “AI hallucinations” expression since it suggests that today’s AI hallucinates the way humans do, a misleading and wrongful anthropomorphizing of generative AI. Lamentably, the catchphrase has caught on and we are stuck with it. AI is not sentient, and we need to be mindfully careful not to imply otherwise.

The gist of the terminology is that at times the AI can mathematically and computationally make up content that is woefully ungrounded and has no basis in fact. For example, an AI-generated essay about Abraham Lincoln might contain a portion that says he flew jet planes all across the country to give his memorable stump speeches. You could rightfully call that an error by the AI; some prefer to describe it as a confabulation rather than a hallucination.

Almost everyone who uses generative AI has likely heard of AI hallucinations and possibly experienced a few from time to time. Sometimes they are obvious. The Lincoln example would seem to be pretty obvious. Other times the AI hallucinations can be insidiously believable, and someone might not realize they are being tricked by the AI.

Chain-Of-Thought Explained And Wisely Used

Shifting gears, I want to introduce an aspect of generative AI that is less well-known but gaining awareness. I am referring to chain-of-thought reasoning, also known as CoT.

Here’s what CoT is about. Research on conventional generative AI urges the use of a chain-of-thought processing approach to potentially achieve better results from the AI (see my detailed explanation at the link here). A user can tell the AI to proceed on a step-at-a-time basis, producing a chain of thoughts akin to how humans seem to think things through (again, be cautious about overstating this capacity of AI). Using CoT seems to drive generative AI toward being more systematic rather than rushing to derive a response. Another advantage is that you can then see the steps that were undertaken and decide for yourself, by inspection, whether the AI seemed logically consistent.
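
To make that concrete, here is a minimal sketch of chain-of-thought prompting against a conventional chat model, using the OpenAI Python SDK. The model name and the exact wording of the step-by-step instruction are my own illustrative choices, not anything o1-specific.

```python
# A minimal sketch of chain-of-thought prompting with a conventional chat model.
# Assumes the OpenAI Python SDK (pip install openai) and an OPENAI_API_KEY set in
# the environment; the model name "gpt-4o-mini" is just an illustrative choice.
from openai import OpenAI

client = OpenAI()

question = "What is the best way to get from San Francisco to New York City?"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            # The explicit step-by-step instruction is what invokes CoT here.
            "content": f"{question}\n\nWork through this step by step, "
                       "numbering each step, and then give your final answer.",
        }
    ],
)

# The reply should contain the numbered steps followed by the final answer.
print(response.choices[0].message.content)
```

That plain instruction is essentially all that conventional CoT prompting amounts to; the model is nudged into showing its work before it answers.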

OpenAI’s latest model o1 takes this to an interesting extreme.

The AI maker has opted to always force o1 to undertake a chain-of-thought approach. The user cannot turn it off, nor sway the AI from doing a CoT. The upside is that o1 seems to do better on certain classes of questions, especially in the sciences, mathematics, and programming or coding tasks. A downside is that the extra effort means that users pay more and must wait longer to see the generated results.

Chain-Of-Thought Hidden Versus Seen

We are getting closer to the issue at hand, so hang in there.

For o1, the chain-of-thought that is displayed to you is not the actual under-the-hood chain-of-thought that arose during the processing of your prompt. OpenAI has said that, for various reasons, including that the hidden CoT might reveal the proprietary secret sauce of o1, it won’t let you see it.

So, instead, the AI concocts a special displayable chain-of-thought that is presumably a summarized or recast version of the true chain-of-thought. Is the displayed version a reasonable variation of the underlying and hidden chain-of-thought? We have no means of knowing. Without getting access to the hidden chain-of-thought, there is no workable means to gauge how close or how far off the displayed version is.

The key here is that few people seem to realize that in o1 there is a raw or hidden chain-of-thought and there is a kind of faked displayable version that you see.

Please keep that in mind.

AI Hallucinations Found In Unlikely Places

We usually expect that AI hallucinations might occur in everyday content produced by generative AI but would be somewhat taken aback if an AI hallucination occurs in the chain-of-thought.

The thing is, since o1 only displays a contrived chain-of-thought, there is now a chance for AI hallucinations to appear in the visible chain-of-thought. The reasoning is that since the displayed version is itself AI-generated, akin to generating everyday content such as essays, there is ample opportunity for AI hallucinations to sneak into the visible CoT.

AI hallucinations can arise in these three circumstances:

  • (1) Within AI-generated content such as essays. We already know, assume, and generally anticipate this might occur from time to time.
  • (2) Within displayed AI chain-of-thought for o1. People hadn’t expected AI hallucinations to happen in a conventional chain-of-thought, but they do seem to occur in the displayable chain-of-thought of o1.
  • (3) Within the AI chain-of-thought in o1 that is hidden from view. Does the chain-of-thought that is hidden in o1 possibly contain AI hallucinations? It is hard to know from the outside. Maybe yes, maybe no.

Let’s chat about proof of the pudding.

Various social media postings have showcased AI hallucinations appearing within the displayed chain-of-thought while using o1. How often does this occur? There aren’t any statistics yet on this phenomenon. Perhaps it rarely happens. One can certainly hope so. On the other hand, it could be happening quite a bit, but people aren’t noticing, or don’t care, or know but aren’t sure whether to come forth about it.

Quick Example Of AI Hallucination Inside Chain-Of-Thought

Suppose I log into a generative AI app and ask a question about how I can best get from San Francisco to New York City. I will keep this simple so that I can bring attention to the matters at hand.

Let’s see what happens without invoking chain-of-thought.

  • My entered prompt: “What is the best way to get from San Francisco to New York City?”
  • AI-generated response: “Fly there.”

Okay, that answer makes sense. The outcome seems reasonable, though we only see the generated result and do not know what steps the AI took to arrive at the answer.

I will do the same prompt, and this time enable a chain-of-thought feature so we can see what steps are being undertaken. This is a displayable chain-of-thought. Let’s also pretend that an AI hallucination exists in the chain-of-thought (keep an eye out for it in step 3).

  • My entered prompt: “What is the best way to get from San Francisco to New York City?”
  • Generative AI chain-of-thought enabled.
  • Step 1: Assume that speed of travel is key.
  • Step 2: Airline travel would be the fastest method.
  • Step 3: But airplanes don’t yet exist and only horse-and-buggy are available.
  • Step 4: Airline tickets are often available at low prices so the trip can be affordable.
  • Step 5: Conclusion that flying is the best option.
  • Step 6: Display the answer.
  • AI-generated response: “Fly there.”

You can see that six steps took place.

The steps seem to be relatively logical, except for the zany step 3. Step 3 is an AI hallucination. The generative AI concocted the notion that airplanes do not yet exist and that a horse-and-buggy would be the way to proceed.

We were lucky that the response still came out as flying. Imagine if step 3 had completely confounded the answer. The response might have been to take a horse and buggy. Yes, you could probably still do so, but I doubt most of us have that option in mind.
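
Since the displayed steps are out in the open, you could even vet them programmatically. Here is a rough sketch of one way to do so: each displayed step is handed to a separate model call that is asked only whether the statement is factually plausible. The step texts come from the toy example above, and the model name and the yes/no verification prompt are my own assumptions, not a built-in o1 capability.

```python
# Rough sketch: flag displayed chain-of-thought steps that a second model call
# judges to be factually implausible. Assumes the OpenAI Python SDK and an
# OPENAI_API_KEY; the model name and verification wording are illustrative.
from openai import OpenAI

client = OpenAI()

displayed_steps = [
    "Assume that speed of travel is key.",
    "Airline travel would be the fastest method.",
    "But airplanes don't yet exist and only horse-and-buggy are available.",
    "Airline tickets are often available at low prices so the trip can be affordable.",
    "Conclusion that flying is the best option.",
]

for number, step in enumerate(displayed_steps, start=1):
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": "Answer only YES or NO. Is the following statement "
                           f"factually plausible?\n\n{step}",
            }
        ],
    )
    answer = verdict.choices[0].message.content.strip().upper()
    if answer.startswith("NO"):
        # Step 3 of the toy example should get caught here.
        print(f"Possible hallucination in step {number}: {step}")
```

A check like this is only as good as the model doing the checking, but it illustrates the practical benefit of having the steps visible at all.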

Debating About The AI Hallucinations Of o1

One viewpoint is that having an AI hallucination in the displayed chain-of-thought is not that big a deal.

Why so?

First, we don’t know how many people look at the displayed chain-of-thought of o1 anyway, and thus, if it does have something fake perhaps few people will be fooled.

Second, if the AI hallucination is exclusively in the displayable chain-of-thought and not within the hidden version, this is somewhat good news that the raw chain-of-thought presumably is not contaminated. The raw chain-of-thought would seemingly be untouched and produce a viable result or response.

Again, we don’t know what the hidden chain-of-thought contains; therefore, we must keep our fingers crossed that the underlying chain-of-thought has not also gone awry. If it does contain AI hallucinations, we should be notably suspicious of the generated response. I am expecting that some enterprising AI empiricists will perform experiments to try to assess whether the hidden chain-of-thought might in fact contain AI hallucinations. I’ll let you know.

A twist about this might make your head spin.

Suppose that the hidden chain-of-thought does at times contain AI hallucinations. We don’t know if this will happen but go with me on a thought experiment. The common assumption would be that the AI hallucination in the hidden chain-of-thought will be shown in the displayable chain-of-thought. That seems to make abundant sense.

Sorry, there isn’t any kind of guarantee about that brazen assumption.

It could be that sometimes the hidden chain-of-thought that has an AI hallucination gets cleaned up when the displayable chain-of-thought is prepared by the AI. In that case, the displayed version masks the fact that an underlying AI hallucination arose. Another disconcerting possibility is that, during the process of converting or summarizing the hidden chain-of-thought, the AI hallucination gets transformed. The displayable version might change the location or sequence of where the AI hallucination occurred or might alter what the AI hallucination actually was.

Do not assume that any AI hallucinations in the raw chain-of-thought, if any, will be transparently carried over and shown accordingly in the displayable chain-of-thought. We don’t know that any kind of ironclad 1-for-1 is at play.

Your Crucial Takeaways

Some final big-picture points for you.

The odds are that other AI makers are going to modify or enhance their generative AI to similarly force a hidden chain-of-thought to automatically be undertaken. They will almost certainly have to do so to remain competitive. You see, it seems that the forced use of a chain-of-thought can produce better results.

One way or another, this ups the ante for everyone.

A monumental difference will be whether other AI makers choose to make the raw chain-of-thought visible. A true open-source generative AI would presumably do so. Some proprietary AI makers might be willing to take a chance and possibly reveal their secret sauce. Kudos to them.

If the raw chain-of-thought is shown, you can readily inspect it to detect any AI hallucinations that might have occurred. The idea of a summary version of the hidden chain-of-thought could still be carried forward, making it easier for users to inspect the condensed version rather than a lengthier raw version.

Right now, for those who opt to use o1, make sure to review the displayable chain-of-thought and watch for any AI hallucinations. Of course, you won’t know for sure whether the hidden chain-of-thought suffers accordingly. Sad face. Probably the best bet is to run the prompt again if you find an AI hallucination in the displayed chain-of-thought. Maybe just take it on faith that the hidden chain-of-thought is free and clear whenever the displayed version is (a large leap of faith).
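
If you want to turn that advice into a routine, the workflow amounts to a simple retry loop: get the answer, flag any suspicious displayed steps, and re-run when something looks off. The sketch below demonstrates the loop with toy stand-ins; how you actually capture the displayed chain-of-thought and how you flag a step (by eye, or with a checker like the earlier sketch) are left as placeholders since they depend on the interface you are using.

```python
# Sketch of the "re-run the prompt if the displayed chain-of-thought looks off"
# advice. The ask() and flag_step() callables are hypothetical stand-ins for
# however you obtain and review the displayed steps; toy versions are used
# below purely to demonstrate the retry flow.
from typing import Callable, List, Tuple

def answer_with_retries(
    ask: Callable[[str], Tuple[str, List[str]]],
    flag_step: Callable[[str], bool],
    prompt: str,
    max_attempts: int = 3,
) -> str:
    """Re-run the prompt until no displayed step is flagged or attempts run out."""
    answer = ""
    for attempt in range(1, max_attempts + 1):
        answer, steps = ask(prompt)
        flagged = [s for s in steps if flag_step(s)]
        if not flagged:
            return answer
        print(f"Attempt {attempt}: suspicious step(s) {flagged}; retrying.")
    return answer  # Out of attempts; treat this answer with extra suspicion.

# Toy stand-ins: the first call replays the hallucinated example, the second is clean.
calls = iter([
    ("Fly there.", ["Airline travel would be the fastest method.",
                    "But airplanes don't yet exist and only horse-and-buggy are available."]),
    ("Fly there.", ["Airline travel would be the fastest method.",
                    "Conclusion that flying is the best option."]),
])

def toy_ask(prompt: str) -> Tuple[str, List[str]]:
    return next(calls)

def toy_flag(step: str) -> bool:
    return "don't yet exist" in step  # Stand-in for a human or model-based check.

print(answer_with_retries(toy_ask, toy_flag, "Best way from SF to NYC?"))
```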

May your days be free of AI hallucinations, including wherever and whenever they might arise.
