Since the release of ChatGPT, AI and Generative AI have changed the world and business. Generative AI is now used in almost every area, on smaller or larger scales, and it has made an impact. One of the most common use cases, other than content creation, is Retrieval-Augmented Generation, aka RAG. Large Language Models are trained largely on internet data and come with a known limitation, the training cut-off date, so RAG has emerged to help businesses make use of Large Language Models across industries. RAG has become a go-to approach for many industry use cases.
What is RAG?
RAG is an AI framework that combines a Large Language Model (LLM) with retrieval from a given knowledge base: relevant information is retrieved first, and the LLM uses it to generate the answer. It allows you to apply an LLM to a specific domain, which results in more accurate answers based on up-to-date information.
RAG is used in various applications like,
Knowledge ingestion:
We gather data from different sources, such as documents (docx, pptx, pdf), audio (mp3), video (mp4), and images (png/jpeg). Using techniques such as Optical Character Recognition (OCR) to convert scanned PDFs and images to text, and transcription to convert video or audio to text, we standardize data from different formats into a common format such as plain text or JSON (which is also helpful for handling metadata).
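To make the "common format" idea concrete, here is a minimal sketch of wrapping extracted text and its metadata in a JSON record. The function name `to_record`, the field names, and the sample file path are all hypothetical, chosen for illustration only; the text is assumed to have been extracted already by OCR or transcription.

```python
import json
from pathlib import Path


def to_record(source_path: str, text: str, media_type: str) -> dict:
    """Wrap already-extracted text in a uniform, JSON-serializable record.

    Hypothetical schema: real pipelines usually carry more metadata
    (page numbers, timestamps, language, and so on).
    """
    cleaned = text.strip()
    return {
        "text": cleaned,
        "metadata": {
            "source": Path(source_path).name,  # file name only
            "media_type": media_type,          # e.g. "pdf", "mp3", "png"
            "num_chars": len(cleaned),
        },
    }


# Illustrative input; the path and sentence are made up for this sketch.
record = to_record(
    "reports/q1_summary.pdf",
    "  Revenue grew in the first quarter.  ",
    "pdf",
)
print(json.dumps(record, indent=2))
```

Keeping the text and its metadata together in one record means later steps can filter or cite by source without going back to the original file.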
Some documents can be very large. For example, Harry Potter and the Deathly Hallows has 784 pages. Processing such a large context can be costly, and an LLM will have difficulty handling it efficiently. To handle this, we use a method called chunking, which breaks large pieces of text into smaller, more manageable units or “chunks.”
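As an illustration, chunking can be as simple as cutting the text into fixed-size windows. The sketch below splits by character count and lets neighbouring chunks overlap, so a sentence cut at a boundary still appears whole in one of them. The overlap, the default sizes, and the name `chunk_text` are assumptions made for this example; real systems often split on sentences, paragraphs, or tokens instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with some overlap.

    A simple sketch: it ignores sentence and word boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Illustrative input: one sentence repeated to simulate a long document.
document = "RAG retrieves relevant chunks before asking the model. " * 20
pieces = chunk_text(document, chunk_size=200, overlap=50)
print(len(pieces), "chunks; first chunk starts with:", pieces[0][:40])
```

Smaller chunks make retrieval more precise but carry less context each, so the chunk size is usually tuned per use case.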
After chunking, we create embeddings. An embedding is a numerical representation of text (typically a word, sentence, or document) as a vector in a high-dimensional space. This step is required because the system compares numbers, not raw text: texts with similar meaning end up with vectors that are close together.
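To show what "text becomes a vector" looks like, here is a toy sketch that hashes each word into one slot of a fixed-length vector. This is not how real embedding models work: they are trained neural networks that capture meaning, while this stand-in only captures word overlap. The function `toy_embed` and the dimension of 64 are assumptions made purely for illustration.

```python
import hashlib
import math
import re


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Map text to a fixed-length unit vector by hashing its words.

    Toy stand-in for an embedding model: it reflects shared words,
    not semantic similarity.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Use a stable hash so the same word always lands in the same slot.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        vec[index] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Normalize to length 1 so that a dot product equals cosine similarity.
    return [v / norm for v in vec] if norm else vec


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


a = toy_embed("The cat sat on the mat")
b = toy_embed("A cat sat on a mat")
print("vector length:", len(a))
print("similarity of the two sentences:", round(dot(a, b), 3))
```

In a real pipeline, `toy_embed` would be replaced by a call to an embedding model, and every chunk would be embedded once at ingestion time.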
Next, we use a specialized database, which is different from a traditional relational database. A vector database is designed to store and manage high-dimensional vectors that represent text or other types of data. Vector databases are well suited to searching large volumes of data by semantic similarity rather than exact matches.
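The core operation of a vector database, "find the stored vectors most similar to this one," can be sketched with a small in-memory class. The class name `InMemoryVectorStore` and the hand-written three-dimensional vectors are hypothetical. The sketch scans every stored vector, whereas production vector databases typically use approximate nearest-neighbour indexes to avoid a full scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Brute-force similarity search over (text, vector) pairs."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._items.append((text, list(vector)))

    def search(self, query_vector: list[float], top_k: int = 3) -> list[tuple[float, str]]:
        """Return the top_k (score, text) pairs, best match first."""
        scored = [(cosine_similarity(query_vector, vec), text)
                  for text, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


# Hand-written vectors for illustration; real ones come from an embedding model.
store = InMemoryVectorStore()
store.add("Amazon is a river in South America.", [0.9, 0.1, 0.0])
store.add("Amazon is an e-commerce company.", [0.1, 0.9, 0.0])
store.add("Chunking splits documents into pieces.", [0.0, 0.1, 0.9])

for score, text in store.search([0.8, 0.2, 0.0], top_k=2):
    print(round(score, 3), text)
```

Because matching is based on vector closeness rather than exact keywords, a query can retrieve a chunk that shares no literal words with it, provided the embedding model places them near each other.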
Asking Questions & Generating Answers:
Once the user asks a question, we generate an embedding for that query and find the most similar chunks in the vector database. The vector database provides a similarity score, allowing us to rank the chunks by score and select the top-k chunks (for example 1, 3, 5, or 6, depending on the use case). We then send these chunks, along with the question, to an LLM (Large Language Model) such as GPT, Gemini, or LLaMA. The LLM responds with an answer based on the chunks/context.
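The ranking and prompting step can be sketched as follows, assuming the similarity scores have already been returned by the vector database. The helper names (`select_top_k`, `build_prompt`, `answer`), the prompt wording, and the `llm` callable are all hypothetical. In practice `llm` would wrap a call to whichever model provider you use; here a fake function stands in so the example runs on its own.

```python
from typing import Callable


def select_top_k(scored_chunks: list[tuple[float, str]], k: int = 3) -> list[str]:
    """Rank (score, text) pairs by score and keep the k best texts."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks and the user question into one prompt."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question: str, scored_chunks: list[tuple[float, str]],
           llm: Callable[[str], str], k: int = 3) -> str:
    """Select the top-k chunks, build the prompt, and ask the model."""
    return llm(build_prompt(question, select_top_k(scored_chunks, k)))


def fake_llm(prompt: str) -> str:
    # Placeholder for a real model call; it only reports the prompt size.
    return f"(model reply to a prompt of {len(prompt)} characters)"


# Illustrative scores, as if returned by a vector database.
scored = [
    (0.42, "Chunking splits documents into pieces."),
    (0.91, "Amazon is an e-commerce company."),
    (0.77, "Amazon is a river in South America."),
]
print(build_prompt("Can you tell me about Amazon?", select_top_k(scored, k=2)))
print(answer("Can you tell me about Amazon?", scored, llm=fake_llm, k=2))
```

Telling the model to rely only on the supplied context is a common way to keep the answer tied to the retrieved chunks, although it does not guarantee that it will.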
This simple RAG has a lot of limitations:
Inaccurate or partially correct answers: When a user asks a question, the context behind it can vary significantly. For example, if someone asks, “Can you tell me about Amazon?”, the question could be about the Amazon river or the multinational e-commerce company. In real-world business scenarios, questions are even more context-dependent and can be specific to an industry, country, or location. If that context is not provided accurately, RAG often fails to understand it, leading to partially correct or totally wrong answers. This can happen because the retrieved chunks are not the right ones, or because the question did not supply enough context.
Cost: If we keep sending questions to the LLM, cost can increase significantly. Even retrieving chunks/context from a vector database can be expensive, and the cost rises further if users are not satisfied with the answers and keep asking.
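Hallucinations: A hallucination is the perception of something that doesn’t exist in reality; for an LLM, it means producing an answer that sounds plausible but is not supported by the context or by fact. Even as more powerful LLMs arrive, LLMs will continue to hallucinate to some degree, and we need to live with this. One example is the Air Canada case, where the chatbot created a policy out of nowhere.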
These are only a few of the limitations; RAG can have many others.