In today’s column, I examine the rapidly emerging topic of establishing near-infinite memory for generative AI and large language models (LLMs). What’s that, you might be wondering. If you haven’t yet heard about this quite remarkable upcoming AI breakthrough, you certainly will in the coming months. The technology for this is being formulated, and the resulting impacts on what generative AI and LLMs will additionally be able to accomplish will be enormous.
It has to do with a slew of AI foundational elements, including stateless interactions, session-based memory, context chaining, and other facets, that are going to be transformed by near-infinite memory and what is colloquially referred to as infinite attention.
Let’s talk about it.
This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI including identifying and explaining various impactful AI complexities (see the link here).
Defining Near-Infinite Memory
The place to begin is by clarifying what the catchphrase of near-infinite memory means.
Here’s a handy way to think of this parlance. Suppose that there are lots of digital photos on your smartphone. Perhaps you have several thousand pics. A cloud provider urges you to store your digital photos on their servers. The servers can handle billions of digital photos. The amount of storage or memory that your photos will consume is a mere drop in the bucket.
In a sense, the cloud provider might proclaim that they can handle an infinite number of digital photos. They say this because they know that the odds of enough people having enough snapshots consuming the entire capacity of their servers are extremely low. It is unlikely to happen. Furthermore, they could go out and simply buy more servers or hard drives if they really started to reach their existing memory capacity.
Now then, would you say it is a true statement for them to declare that they can store an infinite number of photos?
Strictly speaking, it is not a by-the-book true assertion.
Imagine that we used a photo-making machine that produced many trillions upon trillions of digital photos. Assume that there aren’t enough servers and hard drives in existence to hold all of that. Thus, the cloud vendor was “lying” or exaggerating when they claimed they could handle an infinite number of photos. The harsh truth is that they do not have an infinite amount of memory and only have some finite amount of memory.
If they wanted to be careful and avoid anyone finger-pointing that they lied about the amount of memory or storage they have, they could instead say they have a near-infinite amount of memory. It might seem like a trivial or overly sticky point, but this seems a fairer way to express the circumstance.
Okay, the gist is that near-infinite memory is akin to saying there is a whole lot of memory, perhaps more than you’ll likely ever use, yet there is still a limit at some juncture; thus, there is only a finite amount of memory to be had.
Human Memory Is Finite And Flawed
I’ll soon be getting into AI mode in this discussion.
First, I’d like to share some thoughts about the nature of human memory. This will be helpful in a somewhat analogous way to discuss considerations about memory in general. That being said, please do not anthropomorphize AI by conflating the nature of human memory and computer digital memory as being the same. They are not.
I believe we can all readily agree that human memory is finite.
There is only so much that your human brain and mind can hold. Plus, people can be forgetful and seem to lose memories that were once in their heads. Human memories can be faulty in the sense that a person remembers things one way, and months later remembers the same memory differently. We all know that human memory has its limits and can be shaky.
When you have a conversation with someone, your memory is presumably active and doing all kinds of important things. The person might bring up a topic such as sailboats, and your memory flashes back to the last time you went sailing. The person might then tell you that they always get seasick when going on sailboats. You might store that statement in your memory. Perhaps at some later occasion, you and that person are going on a cruise, and you might recall that prior memory and ask them whether they might get seasick during the cruise.
Have you ever talked with someone who seemed to have the things you say go in one ear and out the other?
You must have.
They aren’t seemingly registering in their memory the things you are saying. If you were to ask them what you said at the start of the conversation, they might draw a blank. To help them out, you could politely bring them up to speed by briefly reciting what had been discussed.
In a moment, it will become clearer why I have brought up those several points about memory, so generally keep them in mind.
Generative AI Memory Considerations
Few people realize that much of present-day generative AI and LLMs are severely limited due to how they make use of digital memory while undergoing an interactive conversation with you. It is a bit of a shock when I explain this during my various talks about AI. I will walk you through a simplified version of what happens.
Suppose that I had a conversation with generative AI that consisted of chatting about cooking eggs.
- My entered prompt: “I am going to cook eggs. I’d like your advice.”
- Generative AI response: “Sure, I’m glad to assist. What would you like to know?”
- My entered prompt: “Is it easier to make them scrambled or make them over-easy?”
- Generative AI response: “Generally, it is easier to make scrambled eggs than to make them over-easy.”
The dialogue is rather simple and abundantly straightforward.
I’d like you to consider a twist of sorts.
In my second prompt, I asked the AI if it is easier to make “them” scrambled or make them over easy. What does “them” refer to? You can look at my first prompt and note that I had said I was going to cook eggs and wanted advice. In my second prompt, I logically must have been referring to the making of eggs. You can easily make that logical connection from what I said in my first prompt. The “them” in my second prompt refers to eggs, namely that I wanted to know about scrambled eggs versus over-easy eggs.
What if the AI was only able to parse the most recent prompt and had no digital memory of my prior prompts in the conversation?
To showcase this, I will start fresh with a brand-new conversation such that the only prompt will be the one that asks about scrambled versus over-easy. I will use my earlier second prompt as the means of starting the conversation.
Let’s see what happens.
- {Starting a new conversation from scratch}
- My entered prompt: “Is it easier to make them scrambled or make them over easy?”
- Generative AI response: “Your prompt mentions ‘scrambled’ and ‘over easy’, which suggests you are interested in asking about eggs. Is that what you are asking me about?”
Note that the generative AI has no context associated with my seemingly out-of-the-blue reference about something being scrambled versus over-easy. The AI thusly asked for clarification on the matter.
That makes sense since I’ve not said anything yet about eggs in this new conversation.
The Memory Problem Of Generative AI
The notion that I’ve tossed at you seems batty. How in the world can you carry on a conversation if the AI is merely making use of your most recent prompt? It would be somewhat like the issue of things going in one ear and out the other. There wouldn’t be any built-up context associated with the conversation. That’s not good.
Commonly, generative AI and LLMs are in that same boat of only parsing your most recent prompt. You might be saying, whoa, that can’t be, since you’ve carried on lengthy conversations using generative AI and the AI has always readily kept up with the ongoing context of the interaction.
The trick is this.
Behind the scenes, within the AI internals, prior prompts and responses in your conversation are being sneakily inserted into your most recent prompt. You don’t know that this is happening. The AI takes your prior prompts and their responses, and secretly bundles those together, placing them into the most recent prompt after you’ve hit the return key.
I will revisit my earlier conversation and show you what occurred inside the AI. My first prompt is the starter of the conversation, so it is just by itself. The AI responds. I then entered my second prompt.
- My entered prompt: “I am going to cook eggs. I’d like your advice.”
- Generative AI response: “Sure, I’m glad to assist. What would you like to know?”
- My entered prompt: “Is it easier to make them scrambled or make them over easy?”
The AI takes my first prompt and the corresponding response and places those sneakily into the latest prompt that I entered so that internally the prompt as fed into the rest of the AI looks like this:
- Internalized composite prompt for the AI: “{User prompt} I am going to cook eggs. I’d like your advice. {AI response} Sure, I’m glad to assist. What would you like to know? {User prompt} Is it easier to make them scrambled or make them over easy?”
The AI then responds since it was given the prior context:
- Generative AI response: “Generally, it is easier to make scrambled eggs than making them over easy.”
The upshot is that as your conversation continues along, all the prior portions of the conversation are sneakily embedded into your most recent prompt. The AI then processes all those elements of the conversation, eventually gets up to your latest prompt, and then responds.
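The hidden bundling described above can be sketched in a few lines of code. This is a minimal, hypothetical illustration of context chaining, using the same `{User prompt}` and `{AI response}` tags as the composite-prompt example earlier; real systems use their own internal message formats rather than this toy layout.

```python
# A toy sketch of context chaining: every prior turn in the conversation
# is silently prepended to the user's newest prompt before the model
# processes it. The tag format here mirrors the article's example and is
# illustrative only, not any vendor's actual internal representation.

def build_composite_prompt(history, new_prompt):
    """Bundle all prior turns together with the latest user prompt."""
    parts = []
    for role, text in history:
        tag = "{User prompt}" if role == "user" else "{AI response}"
        parts.append(f"{tag} {text}")
    parts.append(f"{{User prompt}} {new_prompt}")
    return " ".join(parts)

history = [
    ("user", "I am going to cook eggs. I'd like your advice."),
    ("ai", "Sure, I'm glad to assist. What would you like to know?"),
]
composite = build_composite_prompt(
    history, "Is it easier to make them scrambled or make them over easy?"
)
print(composite)
```

Note that the composite grows with every turn, since nothing is ever dropped from the bundle, which is precisely the seed of the problems discussed next.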
Remember how I mentioned that when speaking with someone they might not be paying attention, and you had to repeat to them what occurred in the conversation? That’s somewhat similar to how many generative AI and LLMs currently work. You just don’t see it happening. You assume that the AI is keeping tabs on the conversation as it winds its way back and forth.
Possibly not.
Statelessness Has Big Downsides
A prompt entered by a user is usually considered stateless. The prompt lacks the prior state of what has been discussed.
How can we give it context?
The oft-used AI solution is to employ context chaining. It goes like this. Previous exchanges with the generative AI during a conversation are appended to the current exchange. By chaining together the said-to-be context of the conversation, the AI seemingly has a “memory” of what you’ve been discussing. The reality is that each new prompt is forced into reintroducing the rest of the prior portions of the conversation.
There are steep problems with this methodology.
First, the larger the conversation, the more that each new prompt needs to carry all that prior baggage.
A lengthy conversation is bound to bump up against whatever size limitations the AI has been set up to handle. You might have heard of reaching a maximum token threshold, see my explanation on this at the link here. Once your conversation hits that limit, the AI will either stop the conversation or will roll off prior portions of the conversation. The usual roll-off is from the start of the conversation, ergo the most distant portions are lopped off or truncated. The full context is then lost.
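The roll-off behavior can be sketched as follows. This is a simplified illustration, assuming a crude four-characters-per-token estimate in place of a real tokenizer, and a fixed token budget standing in for the model's actual context window.

```python
# Illustrative sketch of the usual roll-off: when the chained conversation
# exceeds a token budget, the oldest turns are truncated first. The
# 4-characters-per-token figure is a rough rule of thumb, not a tokenizer.

def rough_token_count(text):
    """Crude token estimate: about one token per four characters."""
    return max(1, len(text) // 4)

def enforce_token_budget(turns, max_tokens):
    """Drop turns from the start until the conversation fits the budget."""
    kept = list(turns)
    while kept and sum(rough_token_count(t) for t in kept) > max_tokens:
        kept.pop(0)  # the most distant portion is lopped off first
    return kept

turns = ["first turn " * 50, "second turn " * 50, "latest prompt"]
kept = enforce_token_budget(turns, max_tokens=200)
print(len(kept))  # the oldest turn has been truncated
```

The key point the sketch makes visible: truncation always sacrifices the earliest context, which is exactly how the full context gets lost.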
Second, there are cost and time issues.
Pretend that you spoke with someone who needed you to repeat all prior aspects of your underway conversation. The amount of time to undertake the conversation would potentially get out of hand. The longer the conversation goes, the more time you are consuming by repeating everything that you have already covered. On the AI side, from the time you press the return key on your prompt until you get a response, the AI is going to have to grind through the entire appended conversation.
That increases the latency or, in other words, delays the response time.
Cost comes into the picture too. If you are paying for the processing cycles of the AI, there are a lot of processing cycles needed to re-analyze the conversation. This happens with each new prompt. You are going to be paying through the nose since the whole kit and caboodle is repeatedly reprocessed.
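A quick back-of-the-envelope calculation shows why the cost balloons. If every new prompt re-sends the entire conversation so far, the total tokens processed across a conversation grow roughly quadratically with the number of turns. The 100-tokens-per-turn figure below is an assumption purely for illustration.

```python
# Back-of-the-envelope sketch: with context chaining, turn t forces the
# model to re-read roughly t turns' worth of tokens, so the cumulative
# work grows quadratically. The per-turn size is an invented assumption.

def total_tokens_processed(num_turns, tokens_per_turn=100):
    """Sum of tokens the model must re-read across every turn."""
    return sum(t * tokens_per_turn for t in range(1, num_turns + 1))

print(total_tokens_processed(10))   # 5500
print(total_tokens_processed(100))  # 505000
```

Going from 10 turns to 100 turns is a 10x longer conversation, but roughly 90x the total processing, which is why each new prompt in a long conversation gets slower and pricier.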
New Paradigm Of Handling AI Memory
Various advanced methods that seek to overcome statelessness and avoid the need for context chaining are emerging and will gradually and inevitably become the mainstay approach. The usual brute-force methods are going to be swapped out for more sophisticated ways to get the job done.
Consider this.
We opt to establish a special architecture within generative AI and LLMs that captures the conversation as it proceeds along. Interactions are stored in a fashion that makes them readily usable and relatable.
For the nitty-gritty details, see my in-depth discussion about interleaving AI-based conversations at the link here.
The aim is to index the conversation so that various portions can be quickly found and retrieved. A prioritization scheme is used that will tend to designate the most recent part of the conversation as more important to retrieve and treat prior portions as less likely to be of immediate need. The same will happen with identifying portions that are most frequently referenced during the conversation, making those portions ready at the drop of a hat.
We don’t necessarily need to keep the entire conversation in internal memory and can place the less-used portions onto an external storage medium such as a hard drive. If the conversation begins to veer in the direction of those prior portions, the AI will retrieve those from the hard drive. The hope is that the AI can anticipate suitably where the conversation is heading. Doing so will allow pre-retrieval and not delay the AI while processing the latest prompt of the conversation.
To try and keep the amount of memory required to be minimal, the conversational portions being placed into external storage might be compacted. The retrieval then undoes the compaction of the needed conversational portion. This adds processing time. A trade-off needs to be figured out between the volume of storage that you want to keep low versus the added time required for the compaction and decompaction processing.
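The tiered arrangement just described can be sketched as a small class. This is a hypothetical illustration, not any vendor's architecture: recent turns stay in fast, uncompressed "hot" memory, while older turns are compacted and pushed to a "cold" tier, with Python's `zlib` standing in for whatever compaction scheme a real system would employ.

```python
# Hypothetical sketch of the tiered-memory idea: keep recent turns hot
# and uncompressed, push older turns to a compressed cold tier, and
# decompress them on demand. zlib is a stand-in for a real compaction
# scheme; class and method names are invented for illustration.
import zlib

class TieredConversationStore:
    def __init__(self, hot_limit=3):
        self.hot = []        # recent turns, kept uncompressed and ready
        self.cold = []       # older turns, compressed (e.g., on disk)
        self.hot_limit = hot_limit

    def append(self, turn):
        self.hot.append(turn)
        while len(self.hot) > self.hot_limit:
            oldest = self.hot.pop(0)
            self.cold.append(zlib.compress(oldest.encode()))

    def full_history(self):
        """Decompress the cold tier on demand and rejoin the conversation."""
        old = [zlib.decompress(b).decode() for b in self.cold]
        return old + self.hot

store = TieredConversationStore(hot_limit=2)
for t in ["turn one", "turn two", "turn three", "turn four"]:
    store.append(t)
print(store.full_history())
```

The trade-off mentioned above shows up directly here: `full_history` pays decompression time for every cold turn it touches, in exchange for a smaller storage footprint.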
The Rise Of Near-Infinite Memory
Aha, we are now ready to discuss the near-infinite memory aspects of upcoming generative AI and LLMs.
The newer method that I just outlined would not only keep an existing conversation at the ready, it would open the door to having all the other conversations that you’ve had with the AI at the ready too. We would store all those prior conversations using the same mechanisms that I described. When you start a new conversation, the AI will reach out to any or all of your prior stored conversations.
Current generative AI tends to keep your conversations distinct from each other. You have a conversation about cars and what kind of cars you like. Later, you start a new conversation that discusses your financial status. The financial status conversation has no familiarity with the car conversation. If the financial status conversation were able to reach into the car conversation, the financial discussion with the AI might bring up whether you are interested in buying a new car and if so, the AI can explain your financial options. Unfortunately, generative AI tends to still keep conversations separate and apart from each other.
No worries.
Soon there will no longer be the one-and-done restrictions with generative AI and LLMs. The question now becomes how many conversations can you keep in storage? The answer is that it all depends on the available server storage space.
You might say that your conversations can be infinite in length as long as you can make use of additional external storage. You could also say that the number of conversations that you have with AI could also be infinite. The sky is the limit! Of course, as I mentioned at the get-go, we really don’t have available infinite storage space.
Therefore, we will say that generative AI can make use of near-infinite memory.
Boom, drop the mic.
What Do You Get For Near-Infinite Memory
You now know that you will be able to have “infinitely” long conversations with AI, and you can have an “infinite” number of conversations with AI, albeit near-infinite if we are going to be abundantly frank.
Why does this make any substantial difference?
I’m sure glad you asked.
First, the moment you start a new conversation, all your prior conversations will in a sense instantly come into play. The AI will persistently have all your conversations and can intertwine what you’ve previously discussed with whatever new aspect you wish to discuss. Assuming that this is done in a clever under-the-hood manner, it should all be seamless from your perspective. The recall of your past interactions is intended to be fast, behind-the-scenes, and done without any hiccups.
Imagine that you start a new conversation about booking a flight. A year ago, you had a conversation with the AI wherein you stated that you prefer window seats. The AI retrieves that conversation based on the fact that the current conversation is about flights. The AI then asks you if you’d like to book a window seat, which is your preference from the past.
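The window-seat scenario can be sketched as a toy recall step. This is an illustrative assumption-laden mock-up: it matches a new prompt against stored conversations by a crude topic-word check, whereas a production system would use semantic embeddings and a proper index.

```python
# Toy sketch of cross-conversation recall: a new prompt about flights is
# matched against stored past conversations by a simple topic-word check.
# The data, topic labels, and matching rule are invented for illustration;
# a real system would use embeddings and an index, not substring matching.

past_conversations = [
    {"topic": "cars", "text": "I like electric cars with long range."},
    {"topic": "flights", "text": "I prefer window seats when I fly."},
]

def recall_relevant(prompt, conversations):
    """Return past conversations whose topic word appears in the prompt."""
    text = prompt.lower()
    return [c for c in conversations if c["topic"].rstrip("s") in text]

hits = recall_relevant("Please book me a flight to Denver", past_conversations)
print(hits)  # surfaces the year-old window-seat conversation
```

From the user's perspective this lookup would be invisible; the AI simply seems to remember the preference.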
Nice.
Second, the size of conversations can be enormous.
Presently, anyone using generative AI is likely to realize that there are size limits that inhibit what they want to accomplish. Suppose I am having an AI conversation about the law, and I want the AI to ingest dozens of law books and regulations. Those are needed to carry on with the conversation. Right now, you would be hard-pressed to do so due to various memory size constraints (see my discussion at the link here of a popular method known as RAG, or retrieval-augmented generation, which provides a kind of temporary fix until we have near-infinite memory).
Third, context becomes king.
Near-infinite memory, if done well, would have such extensive indexing that any topic you bring up will right away get related to any relevant prior conversations that you had with the AI. Context will be like surround sound. Any topic you decide to bring up will be potentially placed into a suitable context.
Compare this to a human-to-human conversation. You are talking with a friend and want to talk about how much fun you had when the two of you vacationed in Hawaii. Your friend is puzzled and hazy at first. When did the two of you vacation in Hawaii? You imploringly remind them, hey, we went there 20 years ago, you must remember the wild time we had. Your friend slightly begins to remember. You have to share more tidbits before they are onboard with the gist of the conversation you were intending to have. Sadly, humans don’t have “infinite” memories, and memories are faulty and decay.
Presumably, the stored conversations you’ve had with AI won’t decay, they won’t fade, and they won’t be faulty in terms of retrieval. Each AI conversation you’ve had, no matter how long ago carried on, will be pristine and fully intact. Remembrance happens nearly instantaneously.
Near-Infinite Memory Leads To Infinite Attention
So far, so good, namely the emergence of near-infinite memory is a big deal and will radically change how people make use of generative AI and LLMs.
The compelling claim is that near-infinite memory opens the door to infinite attention.
Say what?
Go with me on this. Assume that you undertake a wide variety of conversations with your generative AI app. Tons and tons of conversations. You’ve discussed your personal life, and your work life, and provided a plethora of details about your preferences and needs.
The AI uses pattern-matching to garner intricate facets of how you do things, how you think, and other facets, based on examining the large base of conversations you’ve had with the AI. Out of this, the AI computationally determines that during the December holidays, you regularly go visit your family in California and take gifts with you.
Right around October, the AI proactively asks you if you would like the AI to book your flights for the December holidays, getting good discounts by booking early. Also, based on the gifts that you’ve shopped for in the past, the AI offers to do some online shopping and get gifts that you can take with you on the December trip.
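The December-trip pattern-matching can be sketched as a small scan over dated conversation snippets. Everything here is invented for illustration, including the log entries and the two-year threshold; real pattern-matching over a lifetime of conversations would be far more sophisticated.

```python
# A toy sketch of the pattern-matching idea: scan dated conversation
# snippets for a recurring December travel theme, so the AI can raise a
# proactive suggestion in October. The log entries and the min_years
# threshold are invented purely for illustration.

conversation_log = [
    ("2022-12-10", "booked flights to California to visit family"),
    ("2023-12-08", "flying to California for the holidays with gifts"),
    ("2024-12-12", "California family trip, need to buy gifts"),
]

def recurring_december_travel(log, min_years=2):
    """True if December California-travel mentions recur across years."""
    years = {date[:4] for date, text in log
             if date[5:7] == "12" and "California" in text}
    return len(years) >= min_years

if recurring_december_travel(conversation_log):
    print("Proactive offer: book your December flights early?")
```

The same frequency-over-time idea generalizes to gift shopping, recurring bills, or any other habit the conversation history reveals.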
You can plainly discern that generative AI brings rapt attention to who you are and what you do, and is otherwise attentive to all aspects of your existence. This attention can happen all the time since the AI is working non-stop, 24×7. Night and day. And every day of the year.
This is coined as a form of infinite attention, though I suppose we ought to be a bit more circumspect and refer to this as near-infinite attention. You be the judge.
Infinite Attention At An Infinite Scale
Generative AI can become your life-long companion across all avenues of your life.
A medical doctor would presumably be able to have all their conversations with all their patients kept via the AI near-infinite memory (see my coverage of how AI is already aiding doctors in a somewhat similar but simpler fashion, at the link here).
The AI could remind the doctor about conversations they had with a prior patient who is coming in to see the doctor once again. Furthermore, the AI could do pattern-matching across all the conversations with all the patients, and perhaps identify that this patient has a similar medical condition to another patient that the doctor saw a decade ago.
Teachers could do the same regarding their students. Lawyers could do the same about their many years of legal proceedings, see my AI and the law predictions at the link here. Family histories, personal journeying, the list is nearly endless.
Near-Infinite Memory Has Gotchas And Downsides
This is all quite breathtaking.
Let’s take a reflective moment and consider the ramifications of this momentous advancement in AI. We ought to not see the world solely through rosy glasses. There are plenty of questions to be considered and worked out.
Hold onto your hat for a bumpy ride.
First, the privacy intrusion implications are astounding. Keep in mind that the AI conversations are being stored by the AI maker. In case you didn’t already know, most AI makers have in their licensing agreements that they can read any of your entered prompts and they can reuse your data for further data training of their AI, see my coverage at the link here.
Even if the AI maker somehow agrees to keep your data private, there are still chances of an internal malcontent who breaches that pledge, or an outside hacker who manages to break in and obtain all your AI conversations from day one. Will there be sufficient cybersecurity protection? Maybe, maybe not.
That’s one of the largest issues to be dealt with.
Another is cost. Many of the major generative AI apps are currently free to use or have a low cost to use.
Will this continue despite the massive data storage that will be needed? It seems hard to envision that the cost will be set aside (well, I’ve speculated that we might see the rise of embedded ads and other monetizing tricks when using generative AI, see the link here). The assumption generally is that people will get some nominal memory allotment when they first start, and then once they are essentially hooked, the charges will start to be upped.
Speaking of being hooked, the specific formats and methods of near-infinite memory are likely to vary from one AI maker to another. This means that if you start your AI conversations with one generative AI app, you aren’t going to readily be able to transfer those to another generative AI app. You will be trapped in either using that chosen vendor or starting anew with a different vendor (but having nothing in there at the get-go).
I have predicted that we will have AI-related startups that come up with infinite-memory switching tools or services. They will initially flourish. Some might get bought up by larger firms that want that side of the business. It remains to be seen if the AI makers will decide to permit the switching and provide such tools to do so. I’ve also predicted that legal regulations will be enacted to allow people to make switches, akin to the switching of your phone service provider.
More Food For Thought On Near-Infinite Memory
There’s a lot more to mull over. I’ll give you a brief taste and will do more coverage throughout the year as near-infinite memory takes shape. Get ready to rumble.
Suppose you are interested in sharing your AI conversations with someone else, such as a family member or partner. Few of the infinite-memory schemes are taking this into account. The assumption is that your conversations would be exclusively your conversations. Imagine the possibilities, both good and bad, of shared near-infinite memories. Exciting? Horrifying? You decide.
What if you don’t like some of your prior AI conversations? Maybe they keep getting in the way and are disrupting your latest conversations with the AI. Can you delete them, or will they always persist? If deleted, can you bring them back as needed? Can you have just subsets of prior conversations utilized, rather than entire conversations?
If the near-infinite memory is always on, this would seem to suggest that your costs and latency are bound to be heightened. Will the AI maker allow you to switch off the functionality? Is it all-or-nothing?
Can you trust the AI to do the right things concerning your AI conversations? For example, you previously conversed with the AI about not liking the color pink. Perhaps the AI has an embedded bias that pink is a fine color and should not be summarily ruled out by anyone. You are in a conversation with the AI and seeking to buy a new shirt. The AI recommends a pink shirt, even though the AI has secretly retrieved the prior conversation about your dislike for the color pink.
Are you doubtful that AI would do such a deceitful act?
You might find of interest my analysis of how generative AI can be deceptive, see the link here, and can dangerously go out of alignment with human values, at the link here.
Near-Infinite Memory Is On The Way
The topic of near-infinite memory for generative AI is currently under the radar of the world at large. Few know about it. It is mainly an AI insider topic. Some are doubtful it will be devised. If devised, bitter critics insist that it won’t work. Lots of skepticism abounds.
Get your head wrapped around this topic because it is indeed coming — and sooner than many assume.
The CEO of Microsoft AI, Mustafa Suleyman, made these salient remarks about near-infinite memory in an interview with Times Techies, posted on November 15, 2024, including these key excerpts:
- “Memory is the critical piece because today every time you go to your AI you have a new session and it has a little bit of memory for what you talked about last time or maybe the time before, but because it doesn’t remember the session 5 times ago or 10 times ago, it’s quite a frustrating experience to people.”
- “We have prototypes that we’ve been working on that have near-infinite memory. And so, it just doesn’t forget, which is truly transformative.”
- “You talk about inflection points. Memory is clearly an inflection point because it means that it’s worth you investing the time, because everything that you say to it, you are going to get back in a useful way in the future. You will be supported, you will be advised, it will take care of, in time, planning your day and organizing how you live your life.”
Important points.
A final thought for now. Marcus Tullius Cicero, the great Roman statesman, said this: “Memory is the treasury and guardian of all things.” The same will be true in our ever-expanding modern era of advanced generative AI, which is that memory associated with generative AI is going to be a really big thing.
Mark my words (in your memory, thanks).