There is a whole lot of merging going on.

If that doesn’t ring a bell, let me be a bit more specific. In the realm of generative AI, there are relatively widespread efforts underway to take multiple generative AI systems and merge them together. This is mainly being done by those within the AI insider community. Not many people outside the AI realm are aware that this is taking place.

There are plenty of busy bees seeking to merge a generative AI model of one kind with a generative AI model of either a similar kind or an entirely different kind.

Why do any of this?

One aim is to garner the best of both worlds.

Go with me on this.

Suppose that there is a generative AI model that does great stuff when it comes to generating text essays and the like. People use it for writing up material, summarizing narratives, and otherwise interacting in a text mode with the AI. That’s wonderful. But let’s pretend or assume that this particular generative AI app or model is lousy at doing mathematics such as solving word problems involving algebra. Sad face.

I will refer to this generative AI instance as Model A, doing so merely for the sake of discussion.

Well, pretend that there is a different generative AI model, which I’ll call Model B, that does wonders when it comes to mathematical problems such as solving algebraic equations. Let’s assume that this Model B is not very good at text generation. It is less capable of text generation than Model A.

If you wanted to generate text of high quality, you would need to log into Model A. That’s fine. But if you suddenly realize that you want to solve a mathematics problem and want a more reliably complete answer, you’ll need to separately log into Model B. This is undoubtedly going to be annoying and get out of hand. You will need to constantly switch between Model A and Model B. Neither is connected to the other, and thus you need to start anew on whatever you are doing whenever you go from one to the other.

Frustrating, exasperating, irritating, obnoxious, time-consuming, and altogether a pain in the neck.

We have this sticky predicament:

  • Model A: Strong on text, weak on mathematics.
  • Model B: Weak on text, strong on mathematics.

What are we to do?

You could just shrug your shoulders, grit your teeth, and live with life as it presents itself.

Or you could boldly seek to merge the Model A and the Model B, arriving at a new Model C.

Imagine the immense joy and sense of satisfaction. You could log into Model C, setting aside entirely Model A and Model B, and simplify your world by always using Model C to do everything you need to do. To satisfy your text generation needs, use Model C. Have a mathematics problem to be figured out, yes, still go ahead and use Model C. You’ve got an all-in-one solution.

The hope of course is that the merger will produce this:

  • Model C: Strong on text, strong on mathematics.

Mull that over.

I know that some of you are thinking that instead of merging Model A and Model B, maybe we should have built Model C from scratch. If you want generative AI that is strong on both fronts of text and mathematics, build it from zero. I dare say that building generative AI from scratch can be quite a hefty chore. It can take a long time to do. It can be costly. All sorts of resources are needed and consumed.

There might be a better path.

The better path could be to merge generative AI models. This potentially can be faster to undertake than building a comparable Model C from scratch. This might be cheaper to do. Various advantages potentially arise.

Wow, this seems convincing that mergers are the way to go.

Alas, a merger might get us this:

  • Model C (but not what we intended): Weak on text, weak on mathematics.

The rub is that merging generative AI models is tricky, it is risky, and it might not go well. The end result could be the worst of both worlds.

There you have it: the upside is the best of both worlds, and the downside is the worst of both worlds.

Let’s talk about it.

For my ongoing readers and those new to my column, this topic is a continuation of my coverage of trending AI advances that are worthy of in-depth analysis and avid attention.

Recent examples include that generative AI is as valuable for being able to come up with intelligent questions as it is for providing answers, see the link here, and that agentic AI is opening the door to end-to-end AI processing such as for scientific discovery, see the link here. Another and quite popular example is my explanation of the so-called shared imagination among multiple but disparate generative AI apps, see the link here, and the role of deductive versus inductive reasoning for generative AI and large language models (LLMs), see the link here.

On with the show.

The Big Picture About Generative AI And LLMs

I’m sure you’ve heard of generative AI, the darling of the tech field these days.

Perhaps you’ve used a generative AI app, such as the popular ones of ChatGPT, GPT-4o, Gemini, Bard, Claude, etc. The crux is that generative AI can take input from your text-entered prompts and produce or generate a response that seems quite fluent. This is a vast overturning of old-time natural language processing (NLP), which used to be stilted and awkward to use, and which has now given way to a new caliber of NLP fluency that is at times startling or amazing.

The customary means of achieving modern generative AI involves using a large language model or LLM as the key underpinning.

In brief, a computer-based model of human language is established, consisting of a large-scale data structure that does massive pattern-matching on a large volume of data used for initial data training. The data is typically found by extensively scanning the Internet for lots and lots of essays, blogs, poems, narratives, and the like. The mathematical and computational pattern-matching homes in on how humans write, and then henceforth generates responses to posed questions by leveraging those identified patterns. It is said to be mimicking the writing of humans.

Generative AI and LLMs tend to be designed and programmed by using mathematical and computational techniques and methods known as artificial neural networks (ANNs).

The keystone idea behind this approach is inspired by the human brain, which consists of real neurons biochemically wired together into a complex network within our head. I want to clarify and emphasize that how artificial neural networks or ANNs work is not truly akin to the real-world complexities of so-called wetware or the human brain, the real neurons, and the real neural networks. Artificial neural networks are a tremendous simplification of the real thing. They are at best a modest computational simulation. Indeed, various aspects of artificial neural networks are not viably comparable to what happens in a real neural network. ANNs can somewhat be used to try and simulate some limited aspects of real neural networks, but at this time they are a far cry from what our brains do, see my detailed explanation at the link here.

When people read or hear that a computer system is using “neurons” and doing “neuron activation” they might make the reasoned leap of faith that the computer is acting exactly like our brains do. Wrong. This is regrettably anthropomorphizing AI. The dilemma for those of us in AI is that the entire field of study devoted to ANNs makes use of the same language as is used for the biological side of the neurosciences. This is certainly sensible since the inspiration for the mathematical and computational formulation is based on those facets. Plus, the hope is that someday ANNs will indeed match the real things, allowing us to fully emulate or simulate the human brain.

Here’s what I try to do.

When I refer to ANNs and their components, I aim to use the word “artificial” in whatever related wording I use. For example, I would say “artificial neurons” when I am referring to the inspired mathematical and computational mechanisms. I would say “neurons” when referring to the biological kinds. This ends up requiring a lot of repeated uses of the word “artificial” when discussing ANNs, which some people find annoying, but I think it is worth the price to emphasize that artificial neurons are not the same today as true neurons.

You can envision that an artificial neuron is like a mathematical function that you learned in school.

An artificial neuron is a mathematical function implemented computationally that takes an input and produces an output, numerically so. We can implement that mathematical function via a computer system, either as software and/or hardware. The artificial neurons or mathematical functions usually involve the use of arithmetic weights and values, all of which are customarily grouped and organized into a series of layers.
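To make that concrete, here is a toy sketch of my own devising (not how production LLMs are actually implemented) showing an artificial neuron as a small function that multiplies inputs by weights, adds a bias, and squashes the result, plus a "layer" that is just a group of such neurons:

```python
import math

def artificial_neuron(inputs, weights, bias):
    # Weighted sum of inputs plus a bias, passed through a
    # nonlinearity (a sigmoid here) to produce the output.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows, biases):
    # A layer is simply several artificial neurons that all
    # receive the same inputs but have their own weights.
    return [artificial_neuron(inputs, w, b)
            for w, b in zip(weight_rows, biases)]

# Two inputs feeding a layer of two artificial neurons.
out = layer([0.5, -1.0], [[0.2, 0.8], [-0.4, 0.1]], [0.0, 0.1])
```

A real model stacks many such layers, with the weights and biases learned during data training rather than hand-supplied as in this sketch.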

I think that is sufficient for the moment as a quickie backgrounder. Take a look at my extensive coverage of the technical underpinnings of generative AI and LLMs at the link here and the link here, just to name a few.

Generative AI And LLMs Are Different From Each Other

When an AI maker develops a generative AI or LLM from scratch, they typically use an approach that is relatively commonly used by other AI makers. In that sense, the internal mechanisms are roughly similar much of the time.

The specific values, weights, groupings, layers, and other elements will differ, but nonetheless, the same overarching structures are being utilized under the hood. You might broadly say that they are all using Legos even though the assembly of the Legos differs. If you were to use ChatGPT by OpenAI, and then use Claude by Anthropic, they are both at a 30,000-foot level generally leveraging underpinnings that are roughly the same.

The data training was done differently, though they each undoubtedly touched upon much of the same data found on the Internet, and meanwhile, each encountered distinctly different data along the way too. The values, weights, groupings, and layers will thusly be different. Yet the structure is of a comparable nature.

I have dragged you through this fast-paced review to bring up an important consideration.

In some ways, merging disparate generative AI or LLM models is eased due to the across-the-board commonly utilized structures, but do not be fooled into assuming that these mergers are a piece of cake. They aren’t. I will be sharing with you some of the complexities that make this merging task quite an arduous chore.

As noted earlier, the impetus for seeking to do mergers of generative AI models is that you might be able to get the best of all worlds. Like my example of Model A and Model B being merged into Model C, you could leverage all that hard work that went into making Model A and Model B, potentially conjuring up an even better Model C.

This seems like a perhaps obvious and sensible thing to do.

There are numerous technological hurdles involved, which I’ll mainly focus on in this discussion.

One momentous hurdle, though, is not technological at all, but instead involves business and economic issues.

If you were to have spent many millions or maybe billions of dollars to craft Model A, and some other firm did likewise to create Model B, your desire to merge the two into a Model C is probably not going to be very high. You want to squeeze as much profit out of your Model A, and so does the firm that made Model B. They will each cherish the proprietary nature of what they have built.

This is why much of the merging of generative AI and LLMs is typically done with open-source generative AI and LLMs. All in all, the proprietary hiccups are lessened when using open-source models. It isn’t all wine and roses, though. There are still some potential sticking points about licensing stipulations. Also, some purported open-source generative AI and LLMs are only partially open, thus not all the internal bits and pieces are available for inspection and reuse.

It is noteworthy to realize that the straight-ahead path to merging multiple generative AI and LLMs is when the whole kit and caboodle is openly available. The moment you land in the zone of unrevealed facets, any merging effort faces a steeper uphill battle. I am not saying that you can’t still attempt a merger. It is just that the effort to attain a merger and end up with something of equal or higher caliber is decidedly uncertain.

Merging Across Differences Of Generative AI And LLM

Let’s consider my four major considerations about merging multiple generative AI models:

  • (1) Of a like kind. Merging of generalized generative AI models that are of a like kind.
  • (2) Of differing specialties. Merging of different specialized generative AI models.
  • (3) Of differing modalities. Merging generative AI models of varying modalities.
  • (4) Of different natural languages. Merging generative AI models of different natural languages.

I’ll briefly elaborate on those.

You might have a generative AI and another different generative AI that are quite similar in what they do. Suppose they both are quite good at text-based essay generation. Assume they are on par with each other. Neither is less or more than the other.

The merging of those two is probably going to be somewhat easier than when faced with the other listed circumstances, though it also depends greatly on how they each were devised. You might also question whether the merger in this use case is worthwhile since they both already do the same thing to the same level of proficiency. What are you gaining by merging them?

That’s a merger of a like kind.

The merger of differing generative AI specialties such as one that is adept at text while the other is adept at mathematics might be a more logically prudent operation to undertake. I gave the example of one generative AI adept at text-based generation but weak on math problems, and the other generative AI was weak on the text side and strong on the mathematics side. We can potentially do ourselves some real good by pulling both specialties into one generative AI.

That being said, the merger might be trickier. You want to somehow ensure that the strengths of each get carried over into the merged generative AI. And you want to somehow avoid having the weaknesses of each one get carried over. They can potentially clobber each other too, making a big mess that doesn’t do anything right.

My third route entails merging generative AI models that are each devised to handle particular modalities. Some generative AI apps are only text-based generators. Some are only audio generators. Some are only video generators. If you want a generative AI that does text, audio, and video, you can either build it that way or seek to merge the respective types into a merged model.

My fourth mode is a twist that not many realize exists, namely that some generative AI models are principally built on a chosen natural language (English is the usual default). You see, the data training is conventionally done on Internet data that is in the English language. A curious outcome is that often the generative AI gets exposed to some data in other natural languages, and relatively quickly can be brought up-to-speed in other languages accordingly, see my analysis of how this happens at the link here.

As an example of models based entirely on different natural languages, imagine that we had a generative AI model that was data-trained on content in English, and we had another generative AI model that was data-trained on content entirely or principally in Japanese. It might be interesting and a notable payoff to merge the two models. The merger is not merely a matter of merging rote language differences. The odds are that the underlying content will differ too. Perhaps there are inherently different philosophies exemplified in the content, along with differing uses of language and stories told, all of which could make for a very enriched merged model.

There are other categories of different types of generative AI models, but I’ve found that those four types seem to account for the bulk of merging endeavors underway.

Strategic Approaches To Merging Generative AI

Let’s continue with the magic number of four.

Consider my four overarching strategic approaches to merging generative AI models:

  • (1) The Outsider. Output Combiner Approach: Collect outputs from multiple generative AI models (an ensemble) and combine the outputs externally, outside of the respective models, giving the appearance of a merged model existence.
  • (2) Big Brother/Sister. Train Into One Approach: Use each of multiple generative AI models to train a fresh generative AI from scratch (the merged model) or do this by dovetailing into a pre-made chosen base model. Also known as training transfer.
  • (3) Mosaic. Fusion Among Disparates Approach: Use distinct generative AI models such as ones devoted to different individual modalities such as text, audio, video, and merge them into one (referred to as multi-modal fusion, though other types of fusion exist too).
  • (4) Pick-and-Choose. Architectural Piecemeal Approach: Selectively identify internal architectural facets and values of multiple generative AI models and pick and choose to form a merged model. Sometimes said to be a hybrid model, but there are other kinds of hybrids, such as the neuro-symbolic models that I describe at the link here, which are considered outside this same bailiwick.

I’ll briefly elaborate on each of the four.

The first approach is almost a form of cheating. Here’s the deal. You take the outputs from two or more generative AI, and you combine them post-generation. The user is presented with the combined response. The models aren’t truly being merged, just their outputs are being merged.

For example, someone asks how to open a jar that has a stuck screw-on top that won’t seem to budge. Suppose we secretly ask ChatGPT about this dire need and ChatGPT says to wrap a towel around the top and twist with that for leverage. We also secretly ask Claude, and the AI says to put the top of the jar under hot water to loosen the lid. The user hasn’t seen any of those replies as yet. We then take both of those replies and present them to the user.

Voila, from a user’s perspective, it seems as though they are interacting with a merged generative AI.

I tongue-in-cheek refer to this as cheating because you really haven’t merged the two generative AIs. Things might seem that way to the user, but the reality is that we aren’t likely to get the best of both worlds via this approach. Some emerging add-on formulations sit outside of generative AI and do actions like this, generally referred to as trust layers, see my analysis at the link here.
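To make the output-combiner idea concrete, here is a minimal sketch. The two `ask_model_*` functions are hypothetical stand-ins for real API calls to two separate generative AI models; nothing inside either model is touched:

```python
def ask_model_a(prompt):
    # Hypothetical stand-in for calling the first generative AI's API.
    return "Wrap a towel around the lid and twist for extra leverage."

def ask_model_b(prompt):
    # Hypothetical stand-in for calling the second generative AI's API.
    return "Run the lid under hot water to loosen it."

def combined_answer(prompt):
    # The "merger" happens entirely outside the models: collect each
    # reply separately, then present them together to the user.
    replies = [ask_model_a(prompt), ask_model_b(prompt)]
    return "\n".join(f"Suggestion {i + 1}: {r}"
                     for i, r in enumerate(replies))

answer = combined_answer("How do I open a stuck jar lid?")
```

A fancier combiner might rank, deduplicate, or blend the replies before presenting them, but the principle is the same: the models remain entirely separate behind the scenes.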

The second approach involves using generative AI to train another generative AI. Again, this is somewhat at the edges of doing an actual merger. I would tend to rate this a tad above the output-based merger-like endeavors. For more on this approach, see my discussion at the link here.

The third approach is about the modalities merging that I mentioned a few moments ago. The modalities merger can be done in an easy surface manner or an intricate fashion. For the surface approach, you tie together the disparate modality generative AIs by connecting them via APIs. Once again, you aren’t formally merging them. The harder approach involves merging the internal mechanisms into a merged model.
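A bare-bones sketch of that surface-level tie-together might look like the following, where the registered model functions are made-up placeholders for real per-modality API calls:

```python
def route_request(modality, prompt, models):
    # Surface-level "merger": a single front door that dispatches each
    # request to the appropriate standalone modality model via its API.
    if modality not in models:
        raise ValueError(f"No model registered for modality: {modality}")
    return models[modality](prompt)

# Placeholder per-modality models (real ones would be remote API calls).
models = {
    "text": lambda prompt: f"[text model reply to: {prompt}]",
    "audio": lambda prompt: f"[audio model reply to: {prompt}]",
}

reply = route_request("text", "Write a haiku", models)
```

The user sees one unified front end, yet each modality is still handled by a wholly separate model, which is exactly why this counts as a surface tie rather than a true merger.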

The fourth approach is what most would tend to agree is an actual merger. You figure out the internals and decide what will be merged into the resultant merged model. Sometimes, you might begin with an empty merged model that is nothing more than a kind of shell. On other occasions, you might decide to use one of the to-be-merged models as both a source and a target.

That last point probably sounds odd, so let me unpack it.

I have a generative AI that I’ll refer to as Model A and a different one is Model B. I am determined to merge them. I prepare from scratch a Model C that contains no data, no values, no weights, etc. I take Model A and Model B and merge them into Model C. All done.

Let’s try that again, differently. I have Model A and Model B. Model B is going to end up as the merged model. I don’t want to clobber Model B, so I make a copy of it, which I’ll refer to as Model C, wanting to avoid any confusion that it is the original Model B. I merge Model A into Model C. All done.

One other point about this merger effort is that sometimes the merger is done on a serial or one-by-one basis, while other times you might proceed in a parallel manner. Suppose I have Model X, Model Y, and Model Z, all of which are going to be merged into a brand-new Model M. I could merge one at a time, such as first Model X into Model M, then Model Y into Model M, and finally Model Z into Model M. That’s considered a serial merger. Alternatively, I could use Model X, Model Y, and Model Z on a somewhat simultaneous basis to merge into Model M, picking and choosing from them on a round-robin basis. Each of those techniques has numerous tradeoffs and it all depends on what preferences you have when doing these mergers.
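One common piecemeal technique is a weighted average of corresponding parameters, assuming the models share the same architecture. Here is a toy sketch of both flavors; the tiny weight dictionaries are made-up stand-ins for real model checkpoints:

```python
def merge_weights(models, coeffs):
    # Parameter-space merge: each merged weight is a weighted average
    # of the corresponding weights across all of the source models.
    return {
        name: sum(c * m[name] for c, m in zip(coeffs, models))
        for name in models[0]
    }

# Toy "checkpoints" standing in for Model X, Model Y, and Model Z.
model_x = {"layer1": 1.0, "layer2": 4.0}
model_y = {"layer1": 3.0, "layer2": 0.0}
model_z = {"layer1": 2.0, "layer2": 2.0}

# Parallel-style merge: combine all three at once, equal coefficients.
model_m = merge_weights([model_x, model_y, model_z], [1/3, 1/3, 1/3])

# Serial-style merge: fold in one model at a time, rebalancing each step.
step1 = merge_weights([model_x, model_y], [0.5, 0.5])
step2 = merge_weights([step1, model_z], [2/3, 1/3])
```

With carefully rebalanced coefficients, the serial route lands on the same result as the parallel route; in practice the coefficients themselves get tuned per layer, which is where the tradeoffs between the two styles really show up.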

By Hand Versus Automated Merging Of Generative AI

I am switching from the magic number four to the magic number two.

There are two major ways to devise a merged model:

  • (a) Do a merger by hand. AI developers and researchers do the heavy lifting during the merging process, possibly using tools as a form of assistance.
  • (b) Do a merger via an automated process. A fully automated or semi-automated process consisting of various merging tools stitched together that can take multiple generative AI models and craft a merged model, either at the explicit overall guidance by AI developers or via the press of a button (often known as a neural architecture search or NAS).

Merging by hand is the mainstay of today’s efforts. Gradually, automated processing is being advanced and growing in use.

If you are interested in the different kinds of tools and kits for doing these mergers, you’ll find a lot of them posted on GitHub and similar repositories. By and large, I would estimate that most of the tools and kits, and mergers of open-source generative AI, have been created on the side by AI researchers, AI hobbyists, and the like. I’m not saying there aren’t professional tools, which there are, and indeed there are vendors in this space.

Lastly, I would characterize the merging of generative AI and LLMs as the Wild West at this time. It seems like just about everyone has their own proprietary merging recipe they prefer or have devised. It is interesting, fun, exciting, and offers great potential.

Gotchas With Generative AI Model Mergers

Is the merging of generative AI a silver bullet and an ironclad way of garnering the best of all worlds?

Nope.

Here are my ten topmost gotchas associated with merged generative AI models:

  • (i) Falters On All Counts. The merged model is a mess and does not rise to either of the sourced individual models and falters so badly that it is unreliable and likely unusable.
  • (ii) Amplifies Weaknesses. The merged model works but regrettably has amplified the weaknesses from the sourced generative AI models and tends to exhibit heightened frequency and degrees of AI hallucinations.
  • (iii) Complexity Explosion. The merged model is so complex that trying to maintain it and decipher how it works is highly problematic.
  • (iv) Computational Hog. The magnitude of computational resources required to run the merged generative AI model is far beyond that of the individual generative AI models (if being run separately on a collective basis).
  • (v) Generalization Lost. Ruinous overfitting might arise in the merged model such that it does not generalize as well as the sourced generative AI models and therefore becomes unduly narrow in its capabilities.
  • (vi) Specialization Deficit. Generalization became the primary attainment but at the loss of specializations that were in the sourced generative AI models.
  • (vii) Performance Pig. The merged generative AI model is bloated and runs extremely slowly, possibly so piggish that using it on everyday tasks is exasperating and imprudent due to enormous delays while processing.
  • (viii) Explainability Goes Out The Window. Even if the individual generative AI sourced models had some form of explainability or interpretability built in, the merged generative AI model no longer has that capability and becomes a disconcerting black box.
  • (ix) Biases Carried Over. The biases in the individual generative AI sourced models are inadvertently carried over into the merged generative AI model, possibly still hidden and perhaps even magnified.
  • (x) Other Complications. All kinds of problems can occur when creating a merged model, some of which can be anticipated and some are likely hard to predict, let alone ferret out once the merged model is presumably completed.

There you have it, my magic number ten of the big-time gotchas.

State-Of-The-Art Research On Merging Generative AIs

The field of generative AI or LLM model mergers is rapidly advancing.

I will go ahead and share with you a recent research study that illustrates the kinds of advancements that are taking place. In this instance, the research I selected is one of those focused on trying to automate the process of doing mergers.

In the research paper entitled “Evolutionary Optimization of Model Merging Recipes” by Takuya Akiba, Makoto Shing, Yujin Tang, Qi Sun, and David Ha, arXiv, March 19, 2024, these salient points were made (excerpts):

  • “Model merging strives to create a versatile and comprehensive model by combining the knowledge from multiple pre-trained models, potentially yielding a model capable of handling various tasks simultaneously.”
  • “While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential.”
  • “Given the large diversity of open models and benchmarks in the community, human intuition can only go so far, and we believe a more systematic approach for discovering new model combinations will take things much further.”
  • “We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models.”
  • “Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models.”

The points above note that so far, the manual method has been the mainstay approach.

A less manual and more systematic approach that relies on automation would seem highly beneficial. The efficiency of the merger process is presumably going to rise. The cost will hopefully be lower. The speed of merging will quicken. And, significantly, the merged model will possibly be more robust and effective. That’s at least the aspiration.

At a high level, here’s what they did (excerpts):

  • “Our goal is to create a unified framework capable of automatically generating a merged model from a selection of foundation models, ensuring that the performance of this merged model surpasses that of any individual in the collection.” (ibid).
  • “Our approach encompasses (1) evolving the weights for mixing parameters at each layer in the parameter space (PS); (2) evolving layer permutations in the data flow space (DFS); and (3) an integrated strategy that combines both methods for merging in both PS and DFS.” (ibid).
  • “It is possible to first apply PS merging to a collection of models, and then put back this merged model in the collection and apply DFS merging from this enlarged collection.” (ibid).
  • “Our motivation is for the evolutionary search to discover novel ways to merge different models from vastly different domains (e.g., non-English language and Math, or non-English language and Vision) which might be difficult for human experts to discover effective merging solutions themselves.” (ibid).
  • “Furthermore, effectively merging models from very different domains can lead to models of wider real-world applicability and enable us to develop models beyond the large population of models that are optimized for the narrow range of tasks defined by a leaderboard.” (ibid).
  • “With these promising initial results, we believe we are just scratching the surface of unlocking the full capabilities of evolutionary model merging, and this is the inception of a long-term development of applying evolutionary principles to foundation model development.” (ibid).
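The researchers’ actual method is far more elaborate, but the evolutionary core can be caricatured as a simple search over merge coefficients. In this toy version of mine, the fitness function is a fake benchmark that pretends the best merge uses 70% of one source model; a real system would run actual evaluation tasks instead:

```python
import random

def fitness(coeff):
    # Fake benchmark score: pretend a 70/30 mix of the two source
    # models performs best (stand-in for a real evaluation suite).
    return -abs(coeff - 0.7)

def evolve_merge_coefficient(generations=50, population=20, seed=42):
    rng = random.Random(seed)
    # Start with a random population of candidate mixing coefficients.
    pop = [rng.random() for _ in range(population)]
    for _ in range(generations):
        # Keep the fitter half, then mutate survivors to refill the pool.
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: population // 2]
        children = [min(1.0, max(0.0, c + rng.gauss(0, 0.05)))
                    for c in survivors]
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve_merge_coefficient()
```

The paper’s approach additionally searches the data-flow space (which layers feed which), not just the parameter space that this sketch caricatures.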

I tend to prefer studies like this that not only provide an insightful theoretical premise but also put rubber to the road. The researchers depicted how they merged generative AI with different specialties such as text versus math and went the extra mile by also merging across modalities, plus merging across natural languages.

That is some handy-dandy road testing.

Some Mind-Bending Twists And Turns

Prepare yourself for a dollop of mind-bending.

If we were to wave a magic wand and make a new grandiose law that required all the AI makers to allow their generative AI or LLMs to be merged into one gigantic model, would that get us to AGI (artificial general intelligence)?

That’s a question I often get asked when I give talks and presentations on the latest in AI. Here’s my answer. First, it seems doubtful you could make such a law and/or enforce such a law, see my discussion on AI laws at the link here. Second, putting aside that limitation, the question is more about whether we have what it takes, right now, to produce AI that is on par with overall human intelligence or AGI.

I say no.

I assert that you would get a generative AI that perhaps is somewhat better than the individual ones (assuming you did the merger with amazing adroitness and deftness), but you aren’t going to dramatically move the needle.

The whole is not going to be impressively greater than the sum of the parts. I am of the viewpoint that we need to find other avenues beyond the path we are taking today to reach AGI, see my discussion at the link here. A building and simmering sentiment among some AI insiders is similar, suggesting that we are going to reach a ceiling with our present approaches. Boom, drop the mic.

I’ve got another mind twister for you.

There are a bunch of conspiracy theories about AI, see my analysis of them at the link here. A variation is that if we did somehow jam together all or a lot of the existing generative AI or LLMs we would have a humongous existential risk on our hands. The idea is similar to the question of giving rise to AGI. The key to this is whether the combined AI would seek to wipe out humankind. Or maybe enslave us. Or both at the same time.

I’m not buying into that premise per se. I’ve repeatedly indicated that for the time being our heightened risk is somewhat more mundane than the AI takeover prophecy.

This is about the dual-use AI dilemma, see my coverage at the link here. We can choose to use AI responsibly or we can shoot our own foot.

Think of it this way. A gigantic AI is crafted via a merger of AIs and we give access to our nuclear arsenal to this AI. The AI isn’t sentient. It isn’t AGI. It is still just everyday AI. But due to a glitch or due to some sneaky embedded hack, or whatever, the AI proceeds to launch the arsenal. This is not because the AI has it out to get us. It is because we have put automation consisting of AI into a position of grand power and are allowing said automation to do things that endanger our lives and existence.

We need to be mindful of what we do with AI.

Back to the mainstay. Regardless of those conceptually disconcerting considerations, there is little doubt that right now there is an urge and tremendous drive toward bigger is better when it comes to AI. One ready path to bigger is better consists of merging AIs. You can expect this to be a growth niche.

Conclusion

Congratulations, you are now generally versed in the hidden world of merging generative AI or LLMs. I’ve tried to arm you with the secrets and incantations that are used. Welcome to the inner sanctum.

Many have grown up believing that Aristotle supposedly uttered that the whole is greater than the sum of the parts. Scholars tend to indicate that what he really said was this: “The totality is not, as it were, a mere heap, but the whole is something besides the parts.”

I’d strongly suggest we keep that bit of wisdom in mind when it comes to merging generative AIs or LLMs. The merger will almost certainly be something besides the parts, but not necessarily greater than the parts. This doesn’t imply we are to be disappointed or discouraged in such pursuits. Just be aware and realistic.

Go ahead and get on with the merging, carefully and mindfully, thanks.
