RAG, AI Agents & Context Engineering
RAG
Though you’re able to optimize prompts with prompt engineering, if a model isn’t well trained for your use case, you’ll still get irrelevant results, or none at all. There are many ways to solve problems with LLMs and AI in general; one of the most effective and popular is a technique called RAG.
Retrieval Augmented Generation (RAG) is a technique in AI that combines data retrieval with a generative model to improve the model’s output. To give a simplified overview of how models generate output, think of a 3D space with items arranged at various positions: some above each other, some beside, some at an angle, and so on. When a model receives a prompt, it compares the prompt with this data. It then responds with the tokens that are closest in that space to the given prompt, based on the model’s understanding of the prompt - the patterns it has learned.
So when a model doesn’t have enough data about your given prompt, its responses will be tokens much farther away from what is desired. This usually means the results will be less relevant and, at times, wildly inaccurate. There are many reasons for these inaccuracies, but they are often caused by the quality of the model’s training: the training data may be incomplete or biased. If you train a model to believe that 1 + 1 = 3, that’s the answer you’ll get when you ask it.
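The “closeness in space” idea above can be made concrete with a toy example. The sketch below uses made-up 3-dimensional vectors and cosine similarity to find the nearest neighbor of a token; real models use embeddings with hundreds or thousands of dimensions, and the vectors here are invented for illustration only.

```python
import math

# Toy 3-dimensional "embeddings" -- invented values for illustration.
# Real models learn vectors with hundreds or thousands of dimensions,
# but the distance intuition is the same.
embeddings = {
    "dog":    [0.90, 0.80, 0.10],
    "puppy":  [0.85, 0.75, 0.15],
    "car":    [0.10, 0.20, 0.90],
    "engine": [0.15, 0.25, 0.85],
}

def cosine_similarity(a, b):
    """Measure how closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(token):
    """Return the other token whose vector is closest to `token`'s vector."""
    query = embeddings[token]
    candidates = {t: v for t, v in embeddings.items() if t != token}
    return max(candidates, key=lambda t: cosine_similarity(query, candidates[t]))

print(nearest("dog"))     # puppy
print(nearest("engine"))  # car
```

Notice that “dog” lands nearest to “puppy” and “engine” nearest to “car”: tokens with related meanings sit close together, which is exactly the proximity a model exploits when choosing its next token.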
In reality though, no matter how good an LLM is, it will never know everything. Sure, it may know many generally available basic facts, but it’ll never be able to tell you about research work documented in a university somewhere that wasn’t part of its training data. Neither can it tell you who won the last FIFA World Cup if it was never fed that information and has no way of acquiring it at the time of asking.
This is where RAG comes in. It allows LLMs to generate responses based on extra data. This greatly reduces hallucinations, improving the quality of the responses. The LLM no longer has to guess, because it now has everything it needs in addition to what it already knows. It also means the additional data must be high quality for the response to improve.
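At its core, RAG is a two-step process: retrieve the most relevant extra data, then fold it into the prompt before generation. The sketch below shows a minimal version of that pipeline; the document store and the word-overlap scoring are deliberately simplistic stand-ins (real systems typically use embedding-based vector search), and the documents themselves are illustrative.

```python
# A minimal RAG sketch: retrieve the most relevant snippet from a small
# document store, then augment the prompt with it before sending the
# whole thing to an LLM. Scoring by word overlap is a toy stand-in for
# real vector search.
documents = [
    "Argentina won the 2022 FIFA World Cup, beating France on penalties.",
    "RAG combines retrieval with generation to ground model responses.",
    "The Eiffel Tower is located in Paris, France.",
]

def retrieve(question, docs, top_k=1):
    """Rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(question):
    """Assemble an augmented prompt: retrieved context plus the question."""
    context = "\n".join(retrieve(question, documents))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt("Who won the 2022 FIFA World Cup?")
print(prompt)
```

The model answering this augmented prompt no longer needs the World Cup result in its training data; the retrieval step supplied it, which is why the quality of the retrieved documents directly bounds the quality of the answer.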
You may not always be able to give the LLM the extra data you need. It could be because it’s far too much data, or because it contains inaccuracies that require some massaging before a good final response is possible. Allowing RAG systems to retrieve data on their own, through mechanisms known as tools, greatly enhances their abilities. You can go even further by empowering your application to use these tools to accomplish tasks on your behalf without your direct input. Keep reading to learn more about these advanced AI uses.
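A tool, in practice, is just a function the system can call when its own knowledge falls short. The sketch below shows one hedged way to wire this up: a registry of named functions and a dispatcher. The tool names and data here are invented for illustration; real frameworks have the LLM emit a structured tool-call request, which the application then executes.

```python
import datetime

# A sketch of tool use: a registry of functions the system can invoke
# when its own knowledge is insufficient. Names and data are illustrative.
def get_current_year():
    """A tool for live information the model cannot have memorized."""
    return datetime.date.today().year

def lookup_capital(country):
    """An illustrative stand-in for querying a real data source or API."""
    capitals = {"France": "Paris", "Japan": "Tokyo"}
    return capitals.get(country, "unknown")

TOOLS = {
    "get_current_year": get_current_year,
    "lookup_capital": lookup_capital,
}

def call_tool(name, *args):
    """Dispatch a tool call that the model requested by name."""
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](*args)

print(call_tool("lookup_capital", "Japan"))  # Tokyo
```

The key design point is the indirection: the model never runs code itself; it only names a tool and its arguments, and the application decides whether and how to execute the call.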
AI Agents
AI Agents are software programs that use AI to make decisions and perform tasks with minimal to no human intervention. AI Agents typically have a persona, memory, tools and are backed by a model, usually an LLM. They’re designed to take on a specific role and are equipped with external resources as tools to aid in accomplishing their given tasks. Given that they have memory, they’re able to stay on a topic and delve deeper to reason about their tasks until they’re done. Common applications of AI Agents include customer service chatbots, autonomous vehicles, recommendation systems, fraud detection systems, and HR recruitment tools.
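The pieces named above (persona, memory, tools, a backing model) can be sketched as a minimal agent loop. In this hedged example, the “model” is a stub function that decides whether a tool is needed; in a real agent that decision would come from an LLM call, and all names here are illustrative.

```python
# A minimal agent loop: a persona (system prompt), memory (message
# history), a tool, and a stub standing in for the LLM.
PERSONA = "You are a helpful customer-service agent."

def check_order(order_id):
    """A tool the agent can use. The order data is illustrative."""
    return f"Order {order_id} has shipped."

def stub_model(memory, user_message):
    """Stand-in for an LLM: returns either a tool request or a direct reply."""
    if "order status" in user_message.lower():
        return {"tool": "check_order", "args": ["A123"]}
    return {"reply": "How else can I help you?"}

def run_agent(user_message, memory):
    """One turn of the agent: record input, consult the model, act, record output."""
    memory.append({"role": "user", "content": user_message})
    decision = stub_model(memory, user_message)
    if "tool" in decision:
        result = check_order(*decision["args"])
        memory.append({"role": "tool", "content": result})
        return result
    memory.append({"role": "assistant", "content": decision["reply"]})
    return decision["reply"]

memory = [{"role": "system", "content": PERSONA}]
print(run_agent("What is my order status?", memory))  # Order A123 has shipped.
```

Because every turn is appended to `memory`, the agent can stay on topic across turns and reason over earlier tool results, which is exactly what distinguishes it from a single stateless prompt.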
Building AI agents is one of the most advanced uses of AI today. Consider autonomous vehicles, rockets, and other machines that accomplish complex tasks in various fields. They take time to build and even more time to refine continuously until they become commercially usable. Given their complex nature, they’re also expensive to run: they use lots of tokens and, consequently, significant computing resources. Since they act autonomously for the most part, AI agents require extensive checks and balances to ensure efficiency, accuracy, and high performance. In a bid to address this, AI scientists and engineers have gradually shifted their attention to something now known as context engineering.
Context Engineering
Context engineering is widely seen as a superset of prompt engineering and many of the other components that make a great AI agent. While prompt engineering focuses on optimizing prompts, context engineering focuses on optimizing everything included in a prompt. The context therefore comprises components like the system prompt, user prompt, memory, RAG, tools, and a defined output format. Context engineering is the process of designing and optimizing instructions together with all relevant context for models to perform a given task.
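The component list above can be made concrete as a single context-assembly step. The sketch below shows one hedged way to combine a system prompt, memory, retrieved documents, tool descriptions, an output format, and the user prompt into the final context a model would receive; all the names and example values are illustrative.

```python
# A sketch of assembling a full context from the components listed
# above. In context engineering, each of these parts is designed and
# optimized, not just the user prompt.
def build_context(system_prompt, memory, retrieved_docs, tools,
                  user_prompt, output_format):
    """Combine every context component into one string for the model."""
    parts = [
        f"System: {system_prompt}",
        "Conversation so far:\n" + "\n".join(memory),
        "Relevant documents:\n" + "\n".join(retrieved_docs),
        "Available tools: " + ", ".join(tools),
        f"Respond in this format: {output_format}",
        f"User: {user_prompt}",
    ]
    return "\n\n".join(parts)

context = build_context(
    system_prompt="You are a concise support assistant.",
    memory=["User: Hi", "Assistant: Hello! How can I help?"],
    retrieved_docs=["Refund policy: refunds are available within 30 days."],
    tools=["lookup_order", "issue_refund"],
    user_prompt="Can I get a refund?",
    output_format="JSON with keys 'answer' and 'confidence'",
)
print(context)
```

Seen this way, the prompt the model finally receives is only the visible surface; context engineering is the discipline of deciding what goes into each of these slots and how.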
A key feature of context engineering is iteration. Whereas in prompt engineering the focus is on optimizing a single prompt for maximum output, context engineering iteratively optimizes the instructions for each of the components mentioned earlier that are relevant to the given task. Apart from a few differences, a prompt in context engineering closely resembles what you’d get from prompt engineering. In the subsequent lessons, you’ll see practical examples of context engineering in action.