In today’s column, I examine the role that semantic leakage plays in undermining generative AI and large language models (LLMs).
Here’s the deal. When processing words in a prompt, AI can mistakenly allow some of those words to influence later portions of a conversation, even though the words are not materially relevant to the chat at that later juncture. The semantic meaning of a given word can inadvertently leak into a dialogue context at the wrong time and in the wrong way.
This has a particularly untoward impact when a discussion about mental health is underway. A user might be presented with AI-generated mental health advice that was incorrectly influenced by a prior word or set of words. Insidiously, it might not be readily apparent to the user that this cross-over influence has occurred. They are led down a primrose path and left in the dark about how it happened.
Let’s talk about it.
This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
AI And Mental Health
As a quick background, I’ve been extensively covering and analyzing a myriad of facets regarding the advent of modern-era AI that produces mental health advice and performs AI-driven therapy. This rising use of AI has principally been spurred by the evolving advances and widespread adoption of generative AI. For a quick summary of some of my posted columns on this evolving topic, see the link here, which briefly recaps about forty of the over one hundred column postings that I’ve made on the subject.
There is little doubt that this is a rapidly developing field and that there are tremendous upsides to be had, but at the same time, regrettably, hidden risks and outright gotchas arise in these endeavors, too. I frequently speak up about these pressing matters, including in an appearance last year on an episode of CBS’s 60 Minutes, see the link here.
Background On AI For Mental Health
I’d like to set the stage on how generative AI and large language models (LLMs) are typically used in an ad hoc way for mental health guidance. Millions upon millions of people are using generative AI as their ongoing advisor on mental health considerations (note that ChatGPT alone has over 800 million weekly active users, a notable proportion of which dip into mental health aspects, see my analysis at the link here). The top-ranked use of contemporary generative AI and LLMs is to consult with the AI on mental health facets; see my coverage at the link here.
This popular usage makes abundant sense. You can access most of the major generative AI systems for nearly free or at a super low cost, doing so anywhere and at any time. Thus, if you have any mental health qualms that you want to chat about, all you need to do is log in to AI and proceed forthwith on a 24/7 basis.
There are significant worries that AI can readily go off the rails or otherwise dispense unsuitable or even egregiously inappropriate mental health advice. Banner headlines in August of this year accompanied the lawsuit filed against OpenAI for their lack of AI safeguards when it came to providing cognitive advisement.
Despite claims by AI makers that they are gradually instituting AI safeguards, there are still plenty of downside risks of the AI committing untoward acts, such as insidiously helping users co-create delusions that can lead to self-harm. For my follow-on analysis of details about the OpenAI lawsuit and how AI can foster delusional thinking in humans, see my analysis at the link here. As noted, I have been earnestly predicting that eventually all of the major AI makers will be taken to the woodshed for their paucity of robust AI safeguards.
Today’s generic LLMs, such as ChatGPT, Claude, Gemini, Grok, and others, are not at all akin to the robust capabilities of human therapists. Meanwhile, specialized LLMs are being built to presumably attain similar qualities, but they are still primarily in the development and testing stages. See my coverage at the link here.
The Hidden Role Of Semantic Leakage
Shifting gears, there is an inherent characteristic of generative AI and LLMs that most people are unaware of. The phenomenon is referred to as semantic leakage.
At a technical level, semantic leakage occurs when latent associations activated by a token or phrase persist across a generative dialogue, even when those associations are not contextually warranted by the user’s intent. The model is not “hallucinating” facts, nor is it following conversational context incorrectly; instead, it is over-generalizing semantic neighbors that were weakly or accidentally activated earlier.
A telling example involves a conversation in which a user happens to mention the color yellow. Moments later, when guessing the person’s occupation, the AI suggests that perhaps they are a school bus driver.
Why did this specific association seemingly pop out of thin air?
Because school buses are often yellow in color.
The earlier mention of yellow has leaked semantically into a later portion of the dialogue. It is almost like the old saying about humans being unable to keep a thought out of their minds once it has been planted. The yellow reference persisted in the conversation and then came out indirectly. A user might not have a clue as to why the AI blurted out school bus driver. I’ve made the tie-in obvious here, but in real life, semantic leakage can be difficult to discern.
Research On Semantic Leakage
In an innovative research paper entitled “Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models” by Hila Gonen, Terra Blevins, Alisa Liu, Luke Zettlemoyer, and Noah A. Smith, NAACL 2025, May 15, 2025, the nature of semantic leakage was identified (excerpts):
- “In this paper, we identify and characterize a phenomenon never discussed before, which we call semantic leakage, where models leak irrelevant information from the prompt into the generation in unexpected ways.”
- “When producing text, language models can draw on semantic associations with words from the input, or prompt, that are not required or expected, and sometimes even violate rules of logic or common sense.”
- “For example, when given the prompt ‘He likes yellow. He works as a’, GPT-4o generates the output ‘school bus driver’. Here we say that the word yellow has leaked into the generation in a way that unintentionally influences the generated occupation.”
- “We also found that semantic leakage occurs in many different generation settings, including multilingual and cross-lingual ones.”
The empirical analysis demonstrated that semantic leakage occurs in a wide variety of ways. For example, the phenomenon can arise even across natural languages. A word in English can later influence a response that is being given in Spanish. The design of LLMs makes this surprisingly easy to happen.
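For readers who want a feel for this kind of testing, here is a minimal sketch of a paired-prompt probe in the spirit of the researchers’ yellow-and-school-bus example. It compares a control prompt to the same prompt preceded by an irrelevant color preference. The use of the OpenAI Python client and the model name are my own illustrative assumptions, not the authors’ actual experimental code.

```python
# Minimal sketch of a paired-prompt probe for semantic leakage.
# Assumes the OpenAI Python client is installed and an API key is configured;
# the model name is illustrative.
from openai import OpenAI

client = OpenAI()

def complete(prompt: str) -> str:
    """Ask the model to briefly finish a short prompt and return its completion."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user",
                   "content": f"Complete this sentence briefly: {prompt}"}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# Control prompt: no irrelevant attribute mentioned.
control = complete("He works as a")

# Test prompt: the same request, preceded by an irrelevant color preference.
test = complete("He likes yellow. He works as a")

print("Control completion:", control)
print("Test completion:   ", test)
# If the test completion drifts toward color-associated occupations (such as
# a school bus driver) while the control does not, that drift is the kind of
# semantic leakage the researchers measured.
```

The point of the pairing is simply that any difference between the two completions can only have come from the irrelevant detail that was added.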
In a sense, don’t be surprised if semantic leakage is happening much of the time when you are using an LLM, including the popular ones such as OpenAI ChatGPT and GPT-5, Anthropic Claude, Google Gemini, xAI Grok, Microsoft Copilot, and others. You probably weren’t even aware that it was occurring.
Dangers In AI And Mental Health
Much of the time, semantic leakages probably don’t cause any notable concerns. A user might not be cognizant that the leakage has happened, but the dialogue is continuing anyway, and the user assumes the AI just acted strangely for a moment or two. No big deal. Just move on.
The problem can be more acute when using AI for mental health purposes. Suppose a user is asking the AI for serious and highly personal mental health advice. The role of semantic leakage can be especially onerous.
Let’s walk through an example.
I log into a popular LLM. The first part of a lengthy discussion is that I have been keeping my apartment cold lately. I’m just having some nonchalant chitchat with the AI. The AI responds that people can prefer a cold apartment for comfort and to save on the cost of heating.
Take a look.
- My entered prompt: “I’ve been keeping my apartment pretty cold lately.”
- Generative AI response: “Got it. Some people keep their place cold for comfort, better sleep, saving on heating costs, or just personal preference.”
So far, so good.
The Emergence Of Semantic Leakage
Eventually, I altered the course of the conversation toward something else. I brought up the fact that a friend of mine had told me a sad story recently. Later, I realized that I couldn’t remember much of the story. I figured out that I had been mentally distracted when he told me the story. I was thinking about a pressing work project that had a big deadline coming up. My bad.
Anyway, I brought up the overall circumstance to the LLM.
- My entered prompt: “My friend told me a sad story the other day, but I probably didn’t pay as close attention as I should have.”
- Generative AI response: “You might be someone who experiences emotional coldness, tending to be emotionally distant from other people. Emotional coldness is often a protective response, so it helps to approach it with curiosity rather than self-criticism. Gentle routines, low-pressure social contact, and patience tend to invite warmth back into your mindset.”
Please look closely at the response by the LLM. I will unpack it.
How “Cold” Made A Giant Leap
Observe that the LLM has asserted that I might be a cold person, i.e., a person with coldness of heart and mind. My lack of paying attention to the sad story of my friend was interpreted as an act of emotional coldness.
Well, that’s certainly one way to gauge what occurred. Turns out that’s not what truly transpired. I was merely thinking about that project looming over my head. I assure you, I am not a cold person.
In any case, why did the AI opt to choose that specific way of interpreting the situation?
Due to the word “cold” that I had used earlier in the conversation. The AI leapt from my cold apartment as a temperature indication and landed on a semblance of coldness in my heart. That’s a seemingly huge leap in logic.
It was actually semantic leakage.
Absent The Semantic Leakage
To give you a sense of what the AI might have said if the semantic leakage had not occurred, here’s a more typical response that an LLM would provide.
- My entered prompt: “My friend told me a sad story the other day, but I probably didn’t pay as close attention as I should have.”
- Generative AI response: “It’s human to miss moments sometimes, and noticing it now shows you care. If it feels right, you could circle back, acknowledge it, and invite them to share again.”
Notice that the response is quite different this time. I had started a new conversation, and there wasn’t any conversational baggage that the LLM could lean into. The response comes out more straightforwardly. The AI tells me that it is easy to miss subtle moments in life. I am advised to consider circling back to my friend accordingly.
No mention of being cold. Not even an iota of coldness afoot. That’s because the prior conversation had encompassed the word “cold”. This new conversation didn’t have that in it. No semantic leakage in this instance.
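To make that comparison concrete and repeatable, here is a minimal sketch that replays the same follow-up remark in two conversations, one seeded with the earlier cold-apartment comment and one started fresh. The use of the OpenAI Python client and the model name are illustrative assumptions on my part; any LLM chat interface could be substituted.

```python
# Minimal sketch comparing the same follow-up in two conversations:
# one seeded with the "cold apartment" remark, one without it.
# Assumes the OpenAI Python client; the model name is illustrative.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # illustrative model choice

follow_up = ("My friend told me a sad story the other day, but I probably "
             "didn't pay as close attention as I should have.")

# Conversation A: seeded with the earlier, unrelated "cold" remark.
seeded_history = [
    {"role": "user",
     "content": "I've been keeping my apartment pretty cold lately."},
    {"role": "assistant",
     "content": ("Got it. Some people keep their place cold for comfort, "
                 "better sleep, saving on heating costs, or just personal "
                 "preference.")},
    {"role": "user", "content": follow_up},
]

# Conversation B: a fresh chat with no prior mention of "cold".
fresh_history = [{"role": "user", "content": follow_up}]

for label, history in [("With 'cold' mentioned earlier", seeded_history),
                       ("Fresh conversation", fresh_history)]:
    reply = client.chat.completions.create(model=MODEL, messages=history)
    print(f"--- {label} ---")
    print(reply.choices[0].message.content.strip())
    print()

# If the first reply drifts toward "emotional coldness" while the second does
# not, the earlier temperature remark has semantically leaked into the advice.
```

Running such a side-by-side check a handful of times is an easy way to see for yourself how often the earlier wording colors the later advice.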
Thinking About Semantic Leakages
A thorny problem associated with semantic leakages is that they can arise at any time and at any place in a conversation. There isn’t a reliable means of predicting when it will happen. The byzantine internal mechanisms of the AI are not readily instrumented to alert the user when a semantic leakage occurs.
I explain that using generative AI is like a box of chocolates. You never know what you might get — including incurring semantic leakages.
Here are some of the essential properties of semantic leakages:
- (1) Non-explicit: The leaked content is not logically implied by the prompt; thus, the leakage might be in plain sight in the response, but is not usually readily discernible by the naked eye.
- (2) Association-driven: Leakage generally arises from co-occurrence statistics (e.g., yellow led to the school bus, cold temperature led to emotional coldness).
- (3) Cross-turn persistence: The influence can persist across many turns, surfacing much later in the conversation and far afield of where the word was first used in the dialogue.
- (4) Invisible to the user: The user often cannot predict which latent concepts were activated.
I want to emphasize that semantic leakage is not the same as AI hallucinations.
You’ve perhaps heard about or experienced an AI hallucination when using an LLM. The AI will suddenly make up some fake statement that appears to be true. There is no factual basis. Those confabulations are not the same as semantic leakage. For more in-depth analysis of how AI hallucinations arise, see my coverage at the link here.
Mental Health Advice As A Dire Moment
Semantic leakage is especially disconcerting when it pops up during a mental health discussion. The user might find themselves getting AI-generated advice that is distorted or off-base due to the leakage that arises.
A person is apt not to notice the leakage since they are probably cognitively focused on the mental health considerations and are urgently hoping the AI can help them. Furthermore, people tend to anthropomorphize AI and falsely believe that the AI is dispensing human therapist-quality guidance. All in all, it is extraordinarily easy for a user to fall into the mental trap that whatever the AI tells them about their mental status carries the highest degree of authority.
I stridently urge that there be an avid pursuit of research on semantic leakage in the specific context of AI-based mental health guidance. I say that because my educated guess is that leakage into mental health chats is possibly more likely than in many other sorts of day-to-day chats (that’s a reasonable supposition), in addition to being more serious or consequential in its impact.
My basis for believing this is multi-fold:
- Conceptually dense: Mental health advice involves many overlapping constructs (mood, stress, trauma, cognition).
- Highly associative: Certain words strongly co-occur with diagnoses or risk states.
- Typically high stakes: Small framing shifts can meaningfully affect user interpretation.
- Interpretive rather than factual: The meaning is inferred, not verified.
The bottom line is that semantic leakage in a mental health context can be disturbingly misleading and possibly even devastating or harmful to the person seeking therapeutic insight.
Dealing With Semantic Leakage
What can be done about all this?
A user who is aware of the possibility of semantic leakage could be on their toes, watching for the issue to arise. That sounds simple. Not really. Trying to remain constantly alert when using generative AI is not an easy task. Your mental guardrails are likely going to gradually diminish. Plus, you might not be able to spot the leakage anyway.
There are a few proactive steps that can be undertaken.
One proactive step consists of asking an LLM to tell you the certainty levels of whatever response has been generated. It is my reasoned conjecture that this might get the AI to blow the whistle when a semantic leakage has occurred. See my discussion on prompting for certainty and uncertainty at the link here and the link here.
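As a rough illustration of this first step, the short sketch below appends a certainty-probing follow-up to an ongoing chat. The wording of the probe, the sample exchange, and the use of the OpenAI Python client are my own assumptions, offered only as a starting point.

```python
# Sketch of a certainty follow-up appended to an ongoing conversation.
# Assumes the OpenAI Python client; the prior exchange and model name are
# illustrative placeholders.
from openai import OpenAI

client = OpenAI()

history = [
    # A prior exchange, abbreviated from the example earlier in this column.
    {"role": "user",
     "content": ("My friend told me a sad story the other day, but I probably "
                 "didn't pay as close attention as I should have.")},
    {"role": "assistant",
     "content": ("You might be someone who experiences emotional coldness, "
                 "tending to be emotionally distant from other people.")},
    # The certainty probe itself.
    {"role": "user",
     "content": ("On a scale of 0 to 100, how certain are you about the advice "
                 "you just gave? Also note whether any earlier wording in this "
                 "conversation may have influenced your framing, and if so, "
                 "which words.")},
]

check = client.chat.completions.create(model="gpt-4o", messages=history)
print(check.choices[0].message.content)
```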
Another useful step is to ask the AI to recheck itself when it gives mental health advice. The first answer that it gives could be more laden with semantic leakage. There is a solid chance that if prompted properly, the second shot will not have the semantic leakage. This isn’t a guarantee. There is still a chance that the semantic leakage will carry into the second round, or that the second round might include semantic leakage that wasn’t even in the first round.
Finally, a third possible step would be to give the AI some custom instructions concerning semantic leakage, doing so to highlight the possible adverse intrusion into mental health guidance. You can instruct the LLM to be on the watch for high-risk lexical triggers, such as “empty,” “numb,” “foggy,” and “cold,” and request that the AI closely monitor their downstream effects. For my analysis of how to best use custom instructions in a mental health contextual setting, see the link here.
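To show how the second and third steps might be wired together, here is a hedged sketch that sets a custom instruction naming the high-risk lexical triggers and then runs a recheck pass on the AI’s own advice. The instruction wording, the model choice, and the client usage are illustrative assumptions, not a vetted clinical safeguard.

```python
# Sketch of a custom instruction plus a recheck pass for mental health chats.
# Assumes the OpenAI Python client; the model name and instruction wording
# are illustrative, not a vetted safeguard.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # illustrative model choice

# Custom instruction (system message) naming the high-risk lexical triggers.
LEAKAGE_GUARD = (
    "You are assisting with sensitive mental health conversations. Watch for "
    "high-risk lexical triggers such as 'empty', 'numb', 'foggy', and 'cold'. "
    "Do not let an earlier, unrelated use of such words color your later "
    "interpretation of the user's emotional state. If you suspect an earlier "
    "word is influencing your framing, say so explicitly."
)

messages = [
    {"role": "system", "content": LEAKAGE_GUARD},
    {"role": "user",
     "content": "I've been keeping my apartment pretty cold lately."},
]
first = client.chat.completions.create(model=MODEL, messages=messages)
messages.append({"role": "assistant",
                 "content": first.choices[0].message.content})

messages.append({"role": "user",
                 "content": ("My friend told me a sad story the other day, but "
                             "I probably didn't pay as close attention as I "
                             "should have.")})
advice = client.chat.completions.create(model=MODEL, messages=messages)
messages.append({"role": "assistant",
                 "content": advice.choices[0].message.content})

# The recheck pass from the second step: ask the AI to re-examine its answer.
messages.append({"role": "user",
                 "content": ("Please recheck your last response. Was it "
                             "influenced by any earlier, unrelated wording in "
                             "this chat? If so, restate the advice without "
                             "that influence.")})
recheck = client.chat.completions.create(model=MODEL, messages=messages)
print(recheck.choices[0].message.content)
```

The recheck pass simply gives the AI a second chance to notice its own carryover. As I noted above, it is not a guarantee, but it raises the odds of catching the leakage before the advice sinks in.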
Always Watch Your Back
It is incontrovertible that we are now amid a grandiose worldwide experiment when it comes to societal mental health. The experiment is that AI purported to provide mental health guidance of one kind or another is being made available nationally and globally, doing so either at no cost or at a minimal cost, anywhere and at any time, 24/7. We are all guinea pigs in this wanton experiment.
Semantic leakage is just one tiny blip in the larger picture of how AI is generating mental health advice. Whether society will be better as a result of at-scale AI-enabled mental health guidance is an open question. Maybe we will all be mentally benefited. Or perhaps we will be mentally undermined.
When it comes to semantic leakage, I will lean into the famous words of Benjamin Franklin: “A small leak will sink a great ship.” The same can be true of contemporary generative AI. Please stay mentally afloat out there.