
WHY RE-RANKING IN RAG?

 RAG-Raptor-RE-Ranker-demo

https://github.com/BlueBash/RAG-Raptor-RE-Ranker-demo


 

Introduction

This project simplifies extracting and querying information from complex PDF documents, including text, tables, graphs, and images. Leveraging state-of-the-art natural language processing models and Unstructured.io for document parsing, the system employs a Cohere Reranker to refine the initial search results so that the most pertinent information is prioritized.

Following this, the system integrates RAPTOR, which constructs a recursive tree structure from documents, allowing more efficient and context-aware information retrieval across large texts (described in detail below). RAPTOR RAG is then used to retrieve semantic chunks, further refining the search results. The chatbot thus provides a user-friendly interface for interacting with and retrieving detailed information from these documents.

Re-Ranker

Before jumping into the solution, let's talk about the problem. With RAG, we are performing a semantic search across many text documents — these could be tens of thousands up to tens of billions of documents.

To ensure fast search times at scale, we typically use vector search — that is, we transform our text into vectors, place them all into a vector space, and compare their proximity to a query vector using a similarity metric like cosine similarity.
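As a minimal sketch of that comparison (the hash-seeded embed() is a toy stand-in for a real embedding model; everything here is illustrative, not the project's actual retrieval code):

```python
import numpy as np

def embed(text: str, dim: int = 768) -> np.ndarray:
    # Toy stand-in for a real embedding model; a real system would call
    # an embedding API that returns a 768- or 1536-dimensional vector.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = [
    "Reranking improves precision at the top of the result list.",
    "Vector search scales to billions of documents.",
    "A recipe for banana bread.",
]
query_vec = embed("why rerank in RAG?")
# Rank every document by its proximity to the query vector.
ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, embed(d)), reverse=True)
print(ranked[:2])  # the top_k most similar documents
```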

For vector search to work, we need vectors. These vectors are essentially compressions of the "meaning" behind some text into a single (typically) 768- or 1536-dimensional vector, and squeezing that much meaning into one vector inevitably loses some information.

Because of this information loss, we often see that the top three (for example) vector search results miss relevant documents: the retrieval may rank relevant information below our top_k cutoff, where it never reaches the LLM.

What do we do if relevant information at a lower position would help our LLM formulate a better response?

The solution to this issue is to maximize retrieval recall by retrieving plenty of documents, then maximize LLM recall by minimizing the number of documents that actually reach the LLM. To do that, we reorder the retrieved documents and keep just the most relevant for our LLM; that reordering step is reranking.
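Here is a hedged sketch of that retrieve-then-rerank flow using Cohere's Python SDK, since the demo uses a Cohere Reranker (the API key, candidate list, and model name are illustrative assumptions, not taken from the repo):

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # assumption: a valid Cohere API key

query = "Why use a reranker in RAG?"
# Stage 1 stand-in: pretend these came back from vector search with a large top_k.
candidates = [
    "Rerankers score each query-document pair directly.",
    "Vector search compresses meaning into a single vector.",
    "An unrelated passage about something else entirely.",
]

# Stage 2: rerank the candidates and keep only the best few for the LLM.
response = co.rerank(
    model="rerank-english-v3.0",  # illustrative model name
    query=query,
    documents=candidates,
    top_n=2,
)
for result in response.results:
    print(result.relevance_score, candidates[result.index])
```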


  • What is a Re-Ranker?

Re-ranking adds a second stage to retrieval: after the initial search, a re-ranker evaluates the relevance of each retrieved document to the query. By selecting a good reranking model, a RAG system can mitigate hallucinations and deliver dependable search results.

  • The Power of Rerankers

A reranking model — also known as a cross-encoder — is a type of model that, given a query and document pair, will output a similarity score. We use this score to reorder the documents by relevance to our query.
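A minimal sketch of cross-encoder scoring, assuming the sentence-transformers library and a public MS MARCO checkpoint (both illustrative choices, not necessarily what this project uses):

```python
from sentence_transformers import CrossEncoder

# A small public cross-encoder; any reranking model behaves similarly.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What does a reranker do?"
docs = [
    "A cross-encoder scores a query and a document together.",
    "Bananas are rich in potassium.",
]
# One score per (query, document) pair; higher means more relevant.
scores = model.predict([(query, d) for d in docs])
ranked = sorted(zip(scores, docs), reverse=True)
print(ranked[0])  # the most relevant document and its score
```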


Search engineers have used rerankers in two-stage retrieval systems for a long time. In these two-stage systems, a first-stage model (an embedding model/retriever) retrieves a set of relevant documents from a larger dataset. Then, a second-stage model (the reranker) is used to rerank those documents retrieved by the first-stage model.
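To make the two stages concrete, here is a minimal sketch using sentence-transformers for both stages (model names are illustrative; the demo itself uses Cohere for stage two):

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

corpus = [
    "Rerankers improve precision at the top of the result list.",
    "Vector search is fast but compresses meaning lossily.",
    "A recipe for banana bread.",
]
query = "How do rerankers help retrieval?"

# Stage 1: a bi-encoder retrieves candidates from the whole corpus (fast).
retriever = SentenceTransformer("all-MiniLM-L6-v2")
hits = util.semantic_search(
    retriever.encode(query, convert_to_tensor=True),
    retriever.encode(corpus, convert_to_tensor=True),
    top_k=2,
)[0]

# Stage 2: a cross-encoder reranks only those candidates (slow but accurate).
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
candidates = [corpus[hit["corpus_id"]] for hit in hits]
scores = reranker.predict([(query, c) for c in candidates])
print(max(zip(scores, candidates)))  # best passage after reranking
```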

  • Why Rerankers?

A reranker is slow because a cross-encoder must run a full model pass over every query-document pair at query time, whereas document embeddings can be computed once in advance. If rerankers are so much slower, why bother using them? The answer is that rerankers are much more accurate than embedding models: scoring the query and document together lets the model capture their interaction, rather than comparing two independently compressed vectors.


RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval


RAPTOR introduces a novel approach to retrieval-augmented language models by constructing a recursive tree structure from documents. This allows for more efficient and context-aware information retrieval across large texts, addressing common limitations in traditional language models.

For detailed methodology and implementation, refer to the original paper: RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval (arXiv:2401.18059).
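A conceptual sketch of that recursion follows: cluster the chunk embeddings, summarize each cluster, then recurse on the summaries. Note that the paper uses soft Gaussian-mixture clustering over dimension-reduced embeddings and an LLM for the summaries; KMeans and the placeholder summarize() here are simplifying assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def summarize(texts: list[str]) -> str:
    # Placeholder: a real RAPTOR build prompts an LLM to summarize the cluster.
    return " ".join(texts)[:200]

def build_raptor_levels(chunks: list[str], embed) -> list[list[str]]:
    """Cluster-and-summarize recursion: each level holds summaries of the
    level below; `embed` is any text-embedding function (e.g. the toy one
    from the earlier sketch)."""
    levels = [chunks]
    while len(levels[-1]) > 4:
        nodes = levels[-1]
        vectors = np.array([embed(t) for t in nodes])
        k = max(2, len(nodes) // 4)
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(vectors)
        levels.append(
            [summarize([t for t, l in zip(nodes, labels) if l == c]) for c in range(k)]
        )
    return levels  # leaf chunks first, highest-level summaries last
```

For retrieval, the paper's "collapsed tree" variant simply pools nodes from every level and searches them together, so a query can match either a fine-grained chunk or a high-level summary.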

Features

  • Table Extraction: Identify and parse tables to retrieve structured data, making it easier to answer data-specific questions.
  • Text Extraction: Efficiently extract and process text from PDFs, enabling accurate and comprehensive information retrieval.
  • Image Analysis: Extract and interpret images within the PDFs to provide contextually relevant information.
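As a hedged sketch of how such extraction might look with Unstructured.io's partition_pdf (the filename is hypothetical, and parameter availability varies across unstructured versions):

```python
from unstructured.partition.pdf import partition_pdf

# "hi_res" enables layout-model parsing; infer_table_structure asks
# unstructured to attach an HTML rendering of each detected table.
elements = partition_pdf(
    filename="report.pdf",  # hypothetical input PDF
    strategy="hi_res",
    infer_table_structure=True,
)

for el in elements:
    if el.category == "Table":
        print(el.metadata.text_as_html)  # structured table, ready to index
    else:
        print(el.category, el.text[:80])  # titles, narrative text, etc.
```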

 
