When you talk to most people who have been casually following AI, the first thing that comes to mind is, unsurprisingly, ChatGPT. OpenAI has done a masterful job of marketing its flagship product, which at one point was the fastest-growing consumer product in history.
But now that ChatGPT has been on the market for more than a year, there are not only plenty of alternative options, but also a desire in the business community to pause and understand the technology behind large language models (LLMs).
This article is intended to provide a basic overview of what LLMs are, some potential use cases, and what your options are, since OpenAI is far from the only vendor of LLMs on the market.
What are LLMs?
Think of LLMs as sophisticated language machines. Trained on massive datasets of text and code, they develop the ability to understand and generate human-quality language. While traditional chatbots excel at scripted interactions, LLMs go further, grasping context, nuance, and complex sentence structure. They can write many kinds of creative content, translate languages, analyze sentiment, and answer questions in an informative way.
What is the technology behind LLMs?
Architecture: LLMs are typically based on the Transformer architecture, introduced in the paper “Attention is All You Need” by Vaswani et al. in 2017. Without getting too into the weeds, transformers use self-attention mechanisms to weigh the importance of different words within a sentence, enabling a deeper understanding of context.
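For readers who want a feel for what self-attention actually computes, here is a minimal sketch of scaled dot-product attention in NumPy. This is an illustration only, not any vendor's implementation: the projection matrices below are random stand-ins for the learned weights a real transformer would train.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                          # each output is a weighted mix of all value vectors

rng = np.random.default_rng(0)
seq_len, d = 4, 8                               # toy sizes: 4 tokens, 8-dimensional embeddings
X = rng.standard_normal((seq_len, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)             # output has the same shape as the input: (4, 8)
```

The key point is the softmax-weighted mixing step: every token's output vector blends information from every other token in the sequence, which is what lets the model weigh context.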
Pre-training: LLMs undergo an extensive pre-training phase on vast datasets of text. During pre-training, the model learns to predict the next word in a sentence given the previous words, among other tasks designed to improve its grasp of language syntax, semantics, and context. This phase requires massive amounts of computational resources and time, which is why you hear about companies like NVIDIA thriving in this environment.
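The next-word objective can be illustrated with a toy model. A real LLM learns billions of neural-network parameters, but this simple bigram count model, shown purely for intuition, captures the core idea of predicting the next word from what came before:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Pre-training": count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` during training."""
    return follows[word].most_common(1)[0][0]

print(predict_next("sat"))  # -> "on" (it followed "sat" twice in the corpus)
```

Scale that counting idea up to trillions of words, replace the count table with a transformer, and you have the essence of why pre-training is so compute-hungry.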
Fine-tuning: After pre-training, LLMs can be fine-tuned on smaller, domain-specific datasets. This process adapts the model to specific tasks, such as question answering, sentiment analysis, or document summarization, enhancing its performance by tailoring its responses to the nuances of the target domain. You can fine-tune existing LLMs, such as OpenAI's GPT-3.5, with your own data.
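Conceptually, fine-tuning is just continued training on a smaller, domain-specific dataset. The toy bigram counter below is not how GPT-3.5 fine-tuning works mechanically (real fine-tuning adjusts network weights via gradient descent), but it shows how a small domain dataset shifts a pre-trained model's predictions:

```python
from collections import Counter, defaultdict

def train(counts, corpus):
    """Update bigram counts in place from a whitespace-tokenized corpus."""
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    return counts[word].most_common(1)[0][0]

# "Pre-training" on broad, general text ...
counts = train(defaultdict(Counter),
               "the bank of the river the bank of the river the bank of the river")
print(predict_next(counts, "bank"))  # -> "of"

# ... then "fine-tuning" on finance-domain text shifts the model's behavior.
train(counts, "the bank approved the loan the bank approved the loan "
              "the bank approved the loan the bank approved")
print(predict_next(counts, "bank"))  # -> "approved"
```

The general-text counts are still there; the domain data simply outweighs them for the patterns that matter in the new context, which is the intuition behind adapting an existing model instead of training one from scratch.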
The time and cost of building your own LLM would, in all likelihood, be prohibitive. Models like Anthropic's Claude, Google's Gemini, and OpenAI's GPT-4 are estimated to be trained on trillions of words. So the best option for most of the world is to build products on top of existing LLMs rather than create your own (unless you are sitting on vast amounts of proprietary data).
How can you use LLMs?
I am separately writing up potential use cases for LLMs as part of an ongoing series in this column. But some of the most common uses are:
- Code generation
- Writing marketing copy
- Customer service
- Translation
The list goes on, and the pace of innovation in the space is mind-boggling. As an example, OpenAI just released a product called Sora in the past week, which allows those with access to generate one-minute videos from text prompts.
What are some options outside of OpenAI technology?
As I alluded to at the start of the article, there are many different LLM options on the market today. One consideration is whether to use open-source or closed-source LLMs. Open-source models offer transparency and community development, but may require more technical expertise and raise data security concerns. Closed-source models generally provide ease of use, support, and security, but can be expensive and limit customization.
Some LLMs to consider:
- BLOOM – open source, from the BigScience research project
- PaLM – by Google
- Claude – by Anthropic
- Cohere – Enterprise focused
- Llama – by Meta
There are of course many other options, but be sure to research what the best option is for you as you embark on using AI for your company.