Retrieval-Augmented Generation (RAG) is a hybrid approach that combines information retrieval with generative models. By bringing in external knowledge, it improves a language model's performance, yielding more accurate and factually grounded answers.
Basic RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/1_simple_rag.ipynb
In a simple RAG setup, we follow these steps:
Data ingestion: load and preprocess the text data.
Chunking: split the data into smaller chunks to improve retrieval performance.
Embedding generation: convert the text chunks into numerical representations using an embedding model.
Semantic search: retrieve the chunks relevant to a user query.
Response generation: use a language model to produce an answer based on the retrieved text.
def chunk_text(text, n, overlap):
    chunks = []  # Initialize an empty list to store the chunks

    # Loop through the text with a step size of (n - overlap)
    for i in range(0, len(text), n - overlap):
        # Append a chunk of text from index i to i + n to the chunks list
        chunks.append(text[i:i + n])

    return chunks  # Return the list of text chunks
import os
from openai import OpenAI

# Initialize the OpenAI client with the base URL and API key
client = OpenAI(
    base_url="https://api.studio.nebius.com/v1/",
    api_key=os.getenv("OPENAI_API_KEY")  # Retrieve the API key from environment variables
)
# Define the system prompt for the AI assistant
system_prompt = "You are an AI assistant that strictly answers based on the given context. If the answer cannot be derived directly from the provided context, respond with: 'I do not have enough information to answer that.'"

def generate_response(system_prompt, user_message, model="meta-llama/Llama-3.2-3B-Instruct"):
    """
    Generates a response from the AI model based on the system prompt and user message.

    Args:
        system_prompt (str): The system prompt to guide the AI's behavior.
        user_message (str): The user's message or query.
        model (str): The model to be used for generating the response. Default is "meta-llama/Llama-3.2-3B-Instruct".

    Returns:
        dict: The response from the AI model.
    """
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ]
    )
    return response
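The embedding and semantic-search steps follow the same pattern. Below is a minimal sketch of those two steps under the same setup; the embedding model name and the helper signatures are illustrative assumptions, not copied from the original.

import numpy as np

def create_embeddings(text, model="BAAI/bge-en-icl"):
    # Request embeddings for a string (or list of strings); model name is an assumption
    return client.embeddings.create(model=model, input=text)

def cosine_similarity(vec1, vec2):
    # Cosine similarity = dot product of the two vectors divided by their norms
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def semantic_search(query, text_chunks, embeddings, k=5):
    # Embed the query, score every chunk, and return the top-k chunks
    query_embedding = create_embeddings(query).data[0].embedding
    scores = [
        (i, cosine_similarity(np.array(query_embedding), np.array(e.embedding)))
        for i, e in enumerate(embeddings)
    ]
    scores.sort(key=lambda x: x[1], reverse=True)
    return [text_chunks[i] for i, _ in scores[:k]]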
Semantic Chunking
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/2_semantic_chunking.ipynb
Unlike traditional fixed-length chunking, semantic chunking places chunk boundaries according to the semantic similarity between sentences.
The method computes the similarity between sentence embeddings and splits the text wherever the similarity between adjacent sentences drops below a threshold; a sliding window can be used, for example, to measure this sentence-to-sentence relatedness.
import numpy as np

def compute_breakpoints(similarities, method="percentile", threshold=90):
    # Determine the threshold value based on the selected method
    if method == "percentile":
        # Calculate the Xth percentile of the similarity scores
        threshold_value = np.percentile(similarities, threshold)
    elif method == "standard_deviation":
        # Calculate the mean and standard deviation of the similarity scores
        mean = np.mean(similarities)
        std_dev = np.std(similarities)
        # Set the threshold value to mean minus X standard deviations
        threshold_value = mean - (threshold * std_dev)
    elif method == "interquartile":
        # Calculate the first and third quartiles (Q1 and Q3)
        q1, q3 = np.percentile(similarities, [25, 75])
        # Set the threshold value using the IQR rule for outliers
        threshold_value = q1 - 1.5 * (q3 - q1)
    else:
        # Raise an error if an invalid method is provided
        raise ValueError("Invalid method. Choose 'percentile', 'standard_deviation', or 'interquartile'.")

    # Identify indices where similarity drops below the threshold value
    return [i for i, sim in enumerate(similarities) if sim < threshold_value]
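To show where compute_breakpoints fits, here is a minimal sketch of the surrounding pipeline, reusing the create_embeddings and cosine_similarity helpers sketched in the Basic RAG section; the naive splitting on ". " is a simplification, not the original's sentence tokenizer.

def semantic_chunk(text):
    # Naive sentence splitting; a real pipeline would use a proper sentence tokenizer
    sentences = [s.strip() for s in text.split(". ") if s.strip()]

    # Embed every sentence in one batch request
    embeddings = [np.array(e.embedding) for e in create_embeddings(sentences).data]

    # Similarity between each pair of consecutive sentences
    similarities = [
        cosine_similarity(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]

    # A breakpoint after sentence i means a new chunk starts at sentence i + 1
    breakpoints = compute_breakpoints(similarities, method="percentile", threshold=90)

    chunks, start = [], 0
    for bp in breakpoints:
        chunks.append(". ".join(sentences[start:bp + 1]))
        start = bp + 1
    chunks.append(". ".join(sentences[start:]))  # Remaining sentences form the last chunk
    return chunks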
Evaluating Chunk Size
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/3_chunk_size_selector.ipynb
Evaluate faithfulness and relevancy: score each generated answer for faithfulness (does it accurately reflect the retrieved chunks?) and relevancy (does it address the user query?).
Compare results across chunk sizes: contrast retrieval and generation quality at different chunk sizes to pick the one that works best.
# Define strict evaluation prompt templates
FAITHFULNESS_PROMPT_TEMPLATE = """
Evaluate the faithfulness of the AI response compared to the true answer.
User Query: {question}
AI Response: {response}
True Answer: {true_answer}

Faithfulness measures how well the AI response aligns with facts in the true answer, without hallucinations.

INSTRUCTIONS:
- Score STRICTLY using only these values:
    * {full} = Completely faithful, no contradictions with true answer
    * {partial} = Partially faithful, minor contradictions
    * {none} = Not faithful, major contradictions or hallucinations
- Return ONLY the numerical score ({full}, {partial}, or {none}) with no explanation or additional text.
"""
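A minimal sketch of turning this template into a numeric score, assuming the client from the Basic RAG section; the score constants and the function name are illustrative.

SCORE_FULL, SCORE_PARTIAL, SCORE_NONE = 1.0, 0.5, 0.0  # Illustrative score scale

def evaluate_faithfulness(question, response, true_answer,
                          model="meta-llama/Llama-3.2-3B-Instruct"):
    # Fill in the template and ask the judge model for a bare numeric score
    prompt = FAITHFULNESS_PROMPT_TEMPLATE.format(
        question=question, response=response, true_answer=true_answer,
        full=SCORE_FULL, partial=SCORE_PARTIAL, none=SCORE_NONE
    )
    result = client.chat.completions.create(
        model=model, temperature=0,
        messages=[{"role": "user", "content": prompt}]
    )
    try:
        return float(result.choices[0].message.content.strip())
    except ValueError:
        return SCORE_NONE  # Fall back to the lowest score on malformed output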
Context-Enriched Retrieval
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/4_context_enriched_rag.ipynb
Overlapping chunking: split the text into chunks with overlapping context to preserve semantic continuity.
Embedding generation: convert the chunks into numerical representations (embedding vectors).
Context-aware retrieval: retrieve the most relevant chunk together with its neighbors so answers are more complete.
def context_enriched_search(query, text_chunks, embeddings, k=1, context_size=1):
    """
    Retrieves the most relevant chunk along with its neighboring chunks.
    """
    # Convert the query into an embedding vector
    query_embedding = create_embeddings(query).data[0].embedding
    similarity_scores = []

    # Compute similarity scores between query and each text chunk embedding
    for i, chunk_embedding in enumerate(embeddings):
        # Calculate cosine similarity between the query embedding and current chunk embedding
        similarity_score = cosine_similarity(np.array(query_embedding), np.array(chunk_embedding.embedding))
        # Store the index and similarity score as a tuple
        similarity_scores.append((i, similarity_score))

    # Sort chunks by similarity score in descending order (highest similarity first)
    similarity_scores.sort(key=lambda x: x[1], reverse=True)

    # Get the index of the most relevant chunk
    top_index = similarity_scores[0][0]

    # Define the range for context inclusion
    # Ensure we don't go below 0 or beyond the length of text_chunks
    start = max(0, top_index - context_size)
    end = min(len(text_chunks), top_index + context_size + 1)

    # Return the relevant chunk along with its neighboring context chunks
    return [text_chunks[i] for i in range(start, end)]
Contextual Chunk Headers (CCH)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/5_contextual_chunk_headers_rag.ipynb
CCH prepends high-level context (such as the document title or section heading) to each chunk before the chunk is embedded. This markedly improves retrieval quality and helps avoid answers taken out of context.
Extract contextual headers: before chunking, pull high-level context from the document, such as its title, section names, or subheadings.
Build contextualized chunks: attach the extracted context as a "header" at the start of each chunk.
def generate_chunk_header(chunk, model="meta-llama/Llama-3.2-3B-Instruct"):
    """
    Generates a title/header for a given text chunk using an LLM.
    """
    # Define the system prompt to guide the AI's behavior
    system_prompt = "Generate a concise and informative title for the given text."

    # Generate a response from the AI model based on the system prompt and text chunk
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": chunk}
        ]
    )

    # Return the generated header/title, stripping any leading/trailing whitespace
    return response.choices[0].message.content.strip()
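A minimal sketch of building the contextualized chunks, reusing chunk_text, generate_chunk_header, and create_embeddings from earlier sections; the dict layout and function name are illustrative.

def chunk_with_headers(text, n, overlap):
    # Generate a header per chunk and prepend it before embedding
    enriched = []
    for chunk in chunk_text(text, n, overlap):
        header = generate_chunk_header(chunk)
        enriched.append({
            "header": header,
            "text": chunk,
            # Embedding the header together with the body is what makes
            # the chunk retrievable by its high-level topic
            "embedding": create_embeddings(header + "\n" + chunk).data[0].embedding,
        })
    return enriched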
Document Augmentation (Question Generation)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/6_doc_augmentation_rag.ipynb
By generating relevant questions for each text chunk, we improve retrieval, which in turn lets the language model produce better, more accurate answers.
import re

def generate_questions(text_chunk, num_questions=5, model="meta-llama/Llama-3.2-3B-Instruct"):
    """
    Generates relevant questions that can be answered from the given text chunk.

    Args:
        text_chunk (str): The text chunk to generate questions from.
        num_questions (int): Number of questions to generate.
        model (str): The model to use for question generation.
    """
    # Define the prompts (reconstructed here; the excerpt omitted them)
    system_prompt = "You are an expert at generating relevant questions from text. Create concise questions that can be answered using only the provided text."
    user_prompt = f"Based on the following text, generate {num_questions} different questions that can be answered using only this text:\n\n{text_chunk}"

    # Generate questions using the OpenAI API
    response = client.chat.completions.create(
        model=model,
        temperature=0.7,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )

    # Extract and clean questions from the response
    questions_text = response.choices[0].message.content.strip()
    questions = []

    # Extract questions using regex pattern matching
    for line in questions_text.split('\n'):
        # Remove numbering and clean up whitespace
        cleaned_line = re.sub(r'^\d+\.\s*', '', line.strip())
        if cleaned_line and cleaned_line.endswith('?'):
            questions.append(cleaned_line)

    return questions
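A minimal sketch of how the generated questions can be indexed alongside the chunks, reusing create_embeddings; the store layout is illustrative.

def build_augmented_store(text_chunks):
    # Index each chunk under its own embedding and under the embeddings of
    # the questions it can answer, so a user query can match either the
    # chunk text or a generated question
    store = []
    for i, chunk in enumerate(text_chunks):
        store.append({"text": chunk, "chunk_index": i,
                      "embedding": create_embeddings(chunk).data[0].embedding})
        for question in generate_questions(chunk):
            store.append({"text": question, "chunk_index": i,
                          "embedding": create_embeddings(question).data[0].embedding})
    return store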
Query Transformations
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/7_query_transform.ipynb
Query rewriting: analyze the intent behind the user query and add more specific keywords or constraints.
Step-back prompting: broaden the query to retrieve wider background information.
Sub-query decomposition: break a complex query into several simpler queries, retrieve for each separately, then merge the results (see the sketch after the rewrite_query function below).
def rewrite_query(original_query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Define the system prompt to guide the AI assistant's behavior
    system_prompt = "You are an AI assistant specialized in improving search queries. Your task is to rewrite user queries to be more specific, detailed, and likely to retrieve relevant information."

    # Define the user prompt with the original query to be rewritten
    user_prompt = f"""
    Rewrite the following query to make it more specific and detailed. Include relevant terms and concepts that might help in retrieving accurate information.

    Original query: {original_query}

    Rewritten query:
    """

    # Generate the rewritten query using the specified model
    response = client.chat.completions.create(
        model=model,
        temperature=0.0,  # Low temperature for deterministic output
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )

    # Return the rewritten query, stripping any leading/trailing whitespace
    return response.choices[0].message.content.strip()
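For comparison, a minimal sketch of sub-query decomposition under the same assumptions (the client from the Basic RAG section); the prompt wording is illustrative.

def decompose_query(original_query, num_subqueries=4,
                    model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model for one simple sub-query per line, then parse the lines
    system_prompt = "You break complex search queries into simpler sub-queries. Return one sub-query per line, with no numbering or extra text."
    user_prompt = f"Break this query into at most {num_subqueries} simpler sub-queries:\n\n{original_query}"
    response = client.chat.completions.create(
        model=model, temperature=0.0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    lines = response.choices[0].message.content.strip().split("\n")
    return [line.strip() for line in lines if line.strip()]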
Reranking
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/8_reranker.ipynb
Each document or chunk returned by the initial retrieval is scored for how well it matches the user query. The most relevant items are then moved to the top, so the highest-quality content is considered first when generating the answer.
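No code accompanies this section in the excerpt, so here is a minimal LLM-as-judge reranking sketch, assuming the client from the Basic RAG section; the 0-10 rating prompt and the function name are illustrative.

def rerank_with_llm(query, chunks, top_n=3, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model to rate each chunk's relevance from 0 to 10, then sort
    scored = []
    for chunk in chunks:
        prompt = (f"Rate the relevance of the document to the query on a scale "
                  f"from 0 to 10. Respond with only the number.\n\n"
                  f"Query: {query}\n\nDocument: {chunk}")
        response = client.chat.completions.create(
            model=model, temperature=0,
            messages=[{"role": "user", "content": prompt}]
        )
        try:
            score = float(response.choices[0].message.content.strip())
        except ValueError:
            score = 0.0  # Treat unparseable replies as irrelevant
        scored.append((score, chunk))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [chunk for _, chunk in scored[:top_n]]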
Relevant Segment Extraction (RSE)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/9_rse.ipynb
RSE identifies the segments within the retrieved chunks that are highly relevant to the user query, then reassembles those segments into contiguous text passages.
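A minimal sketch of the segment-finding step, under the assumption that each chunk has already been assigned a value (for example, its query similarity minus a constant irrelevance penalty such as 0.2, so irrelevant chunks go negative); the brute-force search and function name are illustrative.

def find_best_segment(chunk_values, max_length=20):
    # With negative values for irrelevant chunks, the best contiguous run
    # of chunks is a bounded maximum-subarray problem over the values
    best_score, best_range = float("-inf"), (0, 0)
    for start in range(len(chunk_values)):
        total = 0.0
        for end in range(start, min(start + max_length, len(chunk_values))):
            total += chunk_values[end]
            if total > best_score:
                best_score, best_range = total, (start, end + 1)
    return best_range  # Half-open [start, end) index range into the chunk list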
Contextual Compression
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/10_contextual_compression.ipynb
When retrieving documents in RAG, the chunks we get back often mix relevant and irrelevant information. Contextual compression helps us (a sketch follows the list):
Remove irrelevant sentences and paragraphs.
Focus only on the information relevant to the query.
Maximize the useful signal inside the context window.
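A minimal sketch of LLM-based compression, assuming the client from the Basic RAG section; the prompt wording and the NO_RELEVANT_CONTENT sentinel are illustrative.

def compress_chunk(chunk, query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model to keep only the sentences that help answer the query
    system_prompt = ("You compress text. Keep only the sentences from the "
                     "document that are relevant to the query, verbatim. If "
                     "nothing is relevant, respond with 'NO_RELEVANT_CONTENT'.")
    user_prompt = f"Query: {query}\n\nDocument:\n{chunk}"
    response = client.chat.completions.create(
        model=model, temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    compressed = response.choices[0].message.content.strip()
    return "" if compressed == "NO_RELEVANT_CONTENT" else compressed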
Feedback Loop RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/11_feedback_loop_rag.ipynb
Traditional RAG systems are static: they retrieve information based purely on embedding similarity. With a feedback loop we create a dynamic system that can (a minimal sketch follows the list):
Remember what worked and what did not: by recording user ratings of answers, the system learns which retrieval results and responses were high quality and which need improvement.
Adjust document relevance scores: based on user feedback, the system dynamically adjusts the relevance scores of documents or chunks so future retrievals better reflect user needs.
Add successful Q&A pairs to the knowledge base: answers the user endorsed are stored as new knowledge for reuse in later interactions.
Get smarter with every interaction: by continually learning user preferences and needs, the system gradually refines its retrieval and generation strategy and delivers better answers.
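A minimal sketch of the score-adjustment idea, assuming a store of dicts with "embedding" and "chunk_index" keys (as in the document-augmentation sketch) and a feedback log of {"chunk_index": ..., "helpful": ...} records; all names and the flat boost are illustrative.

def adjust_relevance_scores(query_embedding, store, feedback_log, boost=0.1):
    # Nudge each chunk's similarity score up or down based on past
    # thumbs-up / thumbs-down feedback recorded against that chunk
    scored = []
    for item in store:
        score = cosine_similarity(np.array(query_embedding),
                                  np.array(item["embedding"]))
        for fb in feedback_log:
            if fb["chunk_index"] == item["chunk_index"]:
                score += boost if fb["helpful"] else -boost
        scored.append((score, item))
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored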
Adaptive RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/12_adaptive_rag.ipynb
Different questions call for different retrieval strategies. Our system adapts retrieval through the following steps (sketched after the list):
Classify the user query by type, for example factual, analytical, opinion, or contextual.
Design a different retrieval strategy for each query type.
Execute the concrete retrieval technique chosen by that strategy.
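A minimal sketch of the classification step, assuming the client from the Basic RAG section; the category set mirrors the list above, and the prompt and default fallback are illustrative.

def classify_query(query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model to label the query with exactly one category name
    system_prompt = ("Classify the query as exactly one of: Factual, "
                     "Analytical, Opinion, Contextual. Respond with only "
                     "the category name.")
    response = client.chat.completions.create(
        model=model, temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query}
        ]
    )
    category = response.choices[0].message.content.strip()
    valid = {"Factual", "Analytical", "Opinion", "Contextual"}
    return category if category in valid else "Factual"  # Fall back on bad output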
Self-RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/13_self_rag.ipynb
Self-RAG adds dynamic decision-making to retrieval and generation: it decides when and how retrieved information should be used, which lets it produce higher-quality, more reliable answers.
Decide whether retrieval is needed for a given query.
When it is, retrieve potentially relevant documents (a sketch of the retrieval decision follows).
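A minimal sketch of the first decision point, assuming the client from the Basic RAG section; the Yes/No prompt is illustrative.

def retrieval_needed(query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Does answering this query require external knowledge, or can the
    # model answer from its own parameters?
    system_prompt = ("Decide if answering the query requires retrieving "
                     "external documents. Respond with only 'Yes' or 'No'.")
    response = client.chat.completions.create(
        model=model, temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query}
        ]
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")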
Proposition Chunking
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/14_proposition_chunking.ipynb
Proposition chunking decomposes documents into atomic factual statements, enabling more precise retrieval. Unlike simple character-count chunking, it preserves the semantic integrity of each individual fact.
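A minimal sketch of the proposition-generation step, assuming the client from the Basic RAG section; the prompt wording is illustrative.

def generate_propositions(chunk, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model to rewrite a chunk as standalone atomic facts, one per
    # line, each interpretable without the surrounding text
    system_prompt = ("Break the text into simple, self-contained propositions. "
                     "Each proposition must be a minimal atomic fact, use full "
                     "names instead of pronouns, and stand alone without "
                     "context. Return one proposition per line.")
    response = client.chat.completions.create(
        model=model, temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": chunk}
        ]
    )
    lines = response.choices[0].message.content.strip().split("\n")
    return [line.strip() for line in lines if line.strip()]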
Multi-Modal RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/15_multimodel_rag.ipynb
A Multi-Modal RAG system extracts both text and images from documents, generates descriptive captions for the images, and draws on both content types to answer user queries. Bringing visual information into the knowledge base significantly extends what a traditional RAG system can do.
Traditional RAG handles only text, yet much of the key information in many documents lives in images, charts, and tables. By generating descriptions of these visual elements and making them retrievable, we can (a captioning sketch follows the list):
Capture information from charts and diagrams: unlock the information hidden inside images and figures so it can be retrieved and used.
Understand tables and charts: use image captions to grasp the tables and charts that complement the surrounding text.
Build a more comprehensive knowledge base: combine visual and textual information into a richer, more complete corpus.
Answer questions that depend on visual data: handle queries that can only be resolved with visual information.
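A minimal sketch of the captioning step, assuming the endpoint serves an OpenAI-compatible vision-capable chat model; the model name here is a placeholder, not taken from the original.

import base64

def generate_image_caption(image_path, model="vision-model-placeholder"):
    # Send the image as a base64 data URL and ask for a detailed,
    # retrieval-friendly caption
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model=model, temperature=0,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail, including any text, data, or chart values it contains."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}}
            ]
        }]
    )
    return response.choices[0].message.content.strip()

The caption text is then chunked and embedded exactly like ordinary document text, which is what makes the visual content retrievable.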
Fusion Retrieval
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/16_fusion_rag.ipynb
A fusion retrieval system combines the strengths of semantic vector search and keyword-based BM25 retrieval. By capturing both conceptual similarity and exact keyword matches, it significantly improves retrieval quality.
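A minimal sketch of the fusion step, combining the rank_bm25 library with the embedding helpers from the Basic RAG section; the min-max normalization and the alpha weighting are illustrative choices.

import numpy as np
from rank_bm25 import BM25Okapi

def fusion_retrieval(query, text_chunks, embeddings, k=5, alpha=0.5):
    # BM25 keyword scores over whitespace-tokenized chunks
    bm25 = BM25Okapi([chunk.split() for chunk in text_chunks])
    bm25_scores = np.array(bm25.get_scores(query.split()))

    # Dense cosine-similarity scores using the helpers defined earlier
    query_embedding = np.array(create_embeddings(query).data[0].embedding)
    vector_scores = np.array([
        cosine_similarity(query_embedding, np.array(e.embedding)) for e in embeddings
    ])

    # Min-max normalize both score lists so they are comparable,
    # then blend them with weight alpha
    def normalize(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-9)

    combined = alpha * normalize(vector_scores) + (1 - alpha) * normalize(bm25_scores)
    top = np.argsort(combined)[::-1][:k]
    return [text_chunks[i] for i in top]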
Graph RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/17_graph_rag.ipynb
Compared with a traditional RAG system, Graph RAG organizes knowledge as a connected graph rather than a flat collection of documents, markedly improving retrieval capability and generation quality. This lets the system navigate between related concepts and retrieve information that is more contextually relevant than standard vector-similarity methods allow.
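A minimal sketch of the graph idea using networkx, assuming a set of concepts has already been extracted per chunk (for example by an LLM extraction step); the shared-concept edge rule and function names are illustrative.

import networkx as nx

def build_concept_graph(chunks, chunk_concepts):
    # One node per chunk, with an edge whenever two chunks share at least
    # one extracted concept; chunk_concepts is a list of concept sets
    graph = nx.Graph()
    for i, chunk in enumerate(chunks):
        graph.add_node(i, text=chunk, concepts=chunk_concepts[i])
    for i in range(len(chunks)):
        for j in range(i + 1, len(chunks)):
            shared = chunk_concepts[i] & chunk_concepts[j]
            if shared:
                graph.add_edge(i, j, weight=len(shared))
    return graph

def graph_search(graph, seed_node, hops=1):
    # Start from the best vector match and walk to its graph neighbors
    nodes = nx.single_source_shortest_path_length(graph, seed_node, cutoff=hops)
    return [graph.nodes[n]["text"] for n in nodes]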
Hierarchical Indices
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/18_hierarchy_rag.ipynb
Hierarchical indices use a two-tier retrieval strategy to improve both efficiency and quality: first identify the relevant document sections via their summaries, then retrieve the specific details from within those sections.
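A minimal sketch of the two tiers, assuming each section is a dict carrying a summary embedding plus its chunks and per-chunk embeddings; the data layout is illustrative.

def hierarchical_search(query, sections, k_sections=2, k_chunks=3):
    # sections: [{"summary_embedding": [...], "chunks": [...], "chunk_embeddings": [...]}, ...]
    query_embedding = np.array(create_embeddings(query).data[0].embedding)

    # Tier 1: rank sections by summary similarity and keep the top few
    ranked = sorted(
        sections,
        key=lambda s: cosine_similarity(query_embedding, np.array(s["summary_embedding"])),
        reverse=True,
    )[:k_sections]

    # Tier 2: rank the chunks inside the winning sections
    candidates = []
    for section in ranked:
        for chunk, emb in zip(section["chunks"], section["chunk_embeddings"]):
            candidates.append((cosine_similarity(query_embedding, np.array(emb)), chunk))
    candidates.sort(key=lambda x: x[0], reverse=True)
    return [chunk for _, chunk in candidates[:k_chunks]]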
Hypothetical Document Embedding (HyDE)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/19_HyDE_rag.ipynb
HyDE bridges the semantic gap between short queries and long documents by first transforming the user query into a hypothetical document that answers it, then retrieving with that document instead of the raw query.
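A minimal sketch, reusing the semantic_search helper from the Basic RAG section; the passage-writing prompt is illustrative.

def hyde_search(query, text_chunks, embeddings, k=5,
                model="meta-llama/Llama-3.2-3B-Instruct"):
    # Step 1: generate a hypothetical document that answers the query
    response = client.chat.completions.create(
        model=model, temperature=0.7,
        messages=[
            {"role": "system", "content": "Write a short passage that directly answers the question, as if quoted from a reference document."},
            {"role": "user", "content": query}
        ]
    )
    hypothetical_doc = response.choices[0].message.content.strip()

    # Step 2: retrieve with the hypothetical document's embedding instead
    # of the raw query's; long text matches long text better
    return semantic_search(hypothetical_doc, text_chunks, embeddings, k=k)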
Corrective RAG (CRAG)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/20_crag.ipynb
Corrective RAG (CRAG) dynamically evaluates the retrieved information and, when necessary, corrects the retrieval step by falling back to web search, significantly improving accuracy and reliability.
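A minimal sketch of the decision logic, reusing semantic_search, generate_response, and the global system_prompt from the Basic RAG section; the grading prompt is illustrative, and web_search is a hypothetical placeholder for whatever search API a deployment actually uses (e.g. DuckDuckGo, SerpAPI).

def evaluate_relevance(query, chunk, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model for a 0-1 relevance grade for one retrieved chunk
    prompt = (f"On a scale from 0 to 1, how relevant is this document to the "
              f"query? Respond with only the number.\n\nQuery: {query}\n\n"
              f"Document: {chunk}")
    response = client.chat.completions.create(
        model=model, temperature=0,
        messages=[{"role": "user", "content": prompt}]
    )
    try:
        return float(response.choices[0].message.content.strip())
    except ValueError:
        return 0.0  # Treat unparseable replies as irrelevant

def corrective_rag(query, text_chunks, embeddings, threshold=0.5):
    # Keep only chunks the grader considers relevant; otherwise fall back
    # to web search before generating the final answer
    retrieved = semantic_search(query, text_chunks, embeddings, k=3)
    relevant = [c for c in retrieved if evaluate_relevance(query, c) >= threshold]
    if not relevant:
        relevant = web_search(query)  # Hypothetical placeholder
    context = "\n\n".join(relevant)
    return generate_response(system_prompt, f"Context:\n{context}\n\nQuestion: {query}")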