Retrieval-Augmented Generation (RAG) is a hybrid approach that combines information retrieval with generative models. By bringing in external knowledge, it improves a language model's performance, yielding more accurate and factually grounded answers.
Basic RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/1_simple_rag.ipynb
In a simple RAG setup, we follow these steps:
Data ingestion: load and preprocess the text data.
Chunking: split the data into smaller chunks to improve retrieval performance.
Embedding generation: convert the text chunks into numerical representations using an embedding model.
Semantic search: retrieve the chunks relevant to a user query.
Response generation: use a language model to produce an answer based on the retrieved text.
def chunk_text(text, n, overlap):
    chunks = []  # Initialize an empty list to store the chunks

    # Loop through the text with a step size of (n - overlap)
    for i in range(0, len(text), n - overlap):
        # Append a chunk of text from index i to i + n to the chunks list
        chunks.append(text[i:i + n])

    return chunks  # Return the list of text chunks
import os
from openai import OpenAI

# Initialize the OpenAI client with the base URL and API key
client = OpenAI(
    base_url="https://api.studio.nebius.com/v1/",
    api_key=os.getenv("OPENAI_API_KEY")  # Retrieve the API key from environment variables
)
# Define the system prompt for the AI assistant
system_prompt = "You are an AI assistant that strictly answers based on the given context. If the answer cannot be derived directly from the provided context, respond with: 'I do not have enough information to answer that.'"

def generate_response(system_prompt, user_message, model="meta-llama/Llama-3.2-3B-Instruct"):
    """
    Generates a response from the AI model based on the system prompt and user message.

    Args:
        system_prompt (str): The system prompt to guide the AI's behavior.
        user_message (str): The user's message or query.
        model (str): The model to be used for generating the response. Default is "meta-llama/Llama-3.2-3B-Instruct".

    Returns:
        dict: The response from the AI model.
    """
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ]
    )
    return response
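The embedding and semantic-search steps follow the same pattern. Below is a minimal sketch of those two steps under the same setup; the embedding model name and the helper signatures are illustrative assumptions, not copied from the original.

import numpy as np

def create_embeddings(text, model="BAAI/bge-en-icl"):
    # Request embeddings for a string (or list of strings); model name is an assumption
    return client.embeddings.create(model=model, input=text)

def cosine_similarity(vec1, vec2):
    # Cosine similarity = dot product of the two vectors divided by their norms
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def semantic_search(query, text_chunks, embeddings, k=5):
    # Embed the query, score every chunk, and return the top-k chunks
    query_embedding = create_embeddings(query).data[0].embedding
    scores = [
        (i, cosine_similarity(np.array(query_embedding), np.array(e.embedding)))
        for i, e in enumerate(embeddings)
    ]
    scores.sort(key=lambda x: x[1], reverse=True)
    return [text_chunks[i] for i, _ in scores[:k]]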
Semantic Chunking
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/2_semantic_chunking.ipynb
Unlike traditional fixed-length chunking, semantic chunking places chunk boundaries according to the semantic similarity between sentences.
The method computes the similarity between sentence embeddings and splits the text wherever the similarity between adjacent sentences drops below a threshold; a sliding window can be used, for example, to measure this sentence-to-sentence relatedness.
import numpy as np

def compute_breakpoints(similarities, method="percentile", threshold=90):
    # Determine the threshold value based on the selected method
    if method == "percentile":
        # Calculate the Xth percentile of the similarity scores
        threshold_value = np.percentile(similarities, threshold)
    elif method == "standard_deviation":
        # Calculate the mean and standard deviation of the similarity scores
        mean = np.mean(similarities)
        std_dev = np.std(similarities)
        # Set the threshold value to mean minus X standard deviations
        threshold_value = mean - (threshold * std_dev)
    elif method == "interquartile":
        # Calculate the first and third quartiles (Q1 and Q3)
        q1, q3 = np.percentile(similarities, [25, 75])
        # Set the threshold value using the IQR rule for outliers
        threshold_value = q1 - 1.5 * (q3 - q1)
    else:
        # Raise an error if an invalid method is provided
        raise ValueError("Invalid method. Choose 'percentile', 'standard_deviation', or 'interquartile'.")

    # Identify indices where similarity drops below the threshold value
    return [i for i, sim in enumerate(similarities) if sim < threshold_value]
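To show where compute_breakpoints fits, here is a minimal sketch of the surrounding pipeline, reusing the create_embeddings and cosine_similarity helpers sketched in the Basic RAG section; the naive splitting on ". " is a simplification, not the original's sentence tokenizer.

def semantic_chunk(text):
    # Naive sentence splitting; a real pipeline would use a proper sentence tokenizer
    sentences = [s.strip() for s in text.split(". ") if s.strip()]

    # Embed every sentence in one batch request
    embeddings = [np.array(e.embedding) for e in create_embeddings(sentences).data]

    # Similarity between each pair of consecutive sentences
    similarities = [
        cosine_similarity(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]

    # A breakpoint after sentence i means a new chunk starts at sentence i + 1
    breakpoints = compute_breakpoints(similarities, method="percentile", threshold=90)

    chunks, start = [], 0
    for bp in breakpoints:
        chunks.append(". ".join(sentences[start:bp + 1]))
        start = bp + 1
    chunks.append(". ".join(sentences[start:]))  # Remaining sentences form the last chunk
    return chunks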
Evaluating Chunk Size
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/3_chunk_size_selector.ipynb
Evaluate faithfulness and relevancy: score each generated answer for faithfulness (does it accurately reflect the retrieved chunks?) and relevancy (does it address the user query?).
Compare results across chunk sizes: contrast retrieval and generation quality at different chunk sizes to pick the one that works best.
# Define strict evaluation prompt templates
FAITHFULNESS_PROMPT_TEMPLATE = """
Evaluate the faithfulness of the AI response compared to the true answer.
User Query: {question}
AI Response: {response}
True Answer: {true_answer}

Faithfulness measures how well the AI response aligns with facts in the true answer, without hallucinations.

INSTRUCTIONS:
- Score STRICTLY using only these values:
    * {full} = Completely faithful, no contradictions with true answer
    * {partial} = Partially faithful, minor contradictions
    * {none} = Not faithful, major contradictions or hallucinations
- Return ONLY the numerical score ({full}, {partial}, or {none}) with no explanation or additional text.
"""
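A minimal sketch of turning this template into a numeric score, assuming the client from the Basic RAG section; the score constants and the function name are illustrative.

SCORE_FULL, SCORE_PARTIAL, SCORE_NONE = 1.0, 0.5, 0.0  # Illustrative score scale

def evaluate_faithfulness(question, response, true_answer,
                          model="meta-llama/Llama-3.2-3B-Instruct"):
    # Fill in the template and ask the judge model for a bare numeric score
    prompt = FAITHFULNESS_PROMPT_TEMPLATE.format(
        question=question, response=response, true_answer=true_answer,
        full=SCORE_FULL, partial=SCORE_PARTIAL, none=SCORE_NONE
    )
    result = client.chat.completions.create(
        model=model, temperature=0,
        messages=[{"role": "user", "content": prompt}]
    )
    try:
        return float(result.choices[0].message.content.strip())
    except ValueError:
        return SCORE_NONE  # Fall back to the lowest score on malformed output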
Context-Enriched Retrieval
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/4_context_enriched_rag.ipynb
Overlapping chunking: split the text into chunks with overlapping context to preserve semantic continuity.
Embedding generation: convert the chunks into numerical representations (embedding vectors).
Context-aware retrieval: retrieve the most relevant chunk together with its neighbors so answers are more complete.
def context_enriched_search(query, text_chunks, embeddings, k=1, context_size=1):
    """
    Retrieves the most relevant chunk along with its neighboring chunks.
    """
    # Convert the query into an embedding vector
    query_embedding = create_embeddings(query).data[0].embedding
    similarity_scores = []

    # Compute similarity scores between query and each text chunk embedding
    for i, chunk_embedding in enumerate(embeddings):
        # Calculate cosine similarity between the query embedding and current chunk embedding
        similarity_score = cosine_similarity(np.array(query_embedding), np.array(chunk_embedding.embedding))
        # Store the index and similarity score as a tuple
        similarity_scores.append((i, similarity_score))

    # Sort chunks by similarity score in descending order (highest similarity first)
    similarity_scores.sort(key=lambda x: x[1], reverse=True)

    # Get the index of the most relevant chunk
    top_index = similarity_scores[0][0]

    # Define the range for context inclusion
    # Ensure we don't go below 0 or beyond the length of text_chunks
    start = max(0, top_index - context_size)
    end = min(len(text_chunks), top_index + context_size + 1)

    # Return the relevant chunk along with its neighboring context chunks
    return [text_chunks[i] for i in range(start, end)]
Contextual Chunk Headers (CCH)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/5_contextual_chunk_headers_rag.ipynb
CCH prepends high-level context (such as the document title or section heading) to each chunk before the chunk is embedded. This markedly improves retrieval quality and helps avoid answers taken out of context.
Extract contextual headers: before chunking, pull high-level context from the document, such as its title, section names, or subheadings.
Build contextualized chunks: attach the extracted context as a "header" at the start of each chunk.
def generate_chunk_header(chunk, model="meta-llama/Llama-3.2-3B-Instruct"):
    """
    Generates a title/header for a given text chunk using an LLM.
    """
    # Define the system prompt to guide the AI's behavior
    system_prompt = "Generate a concise and informative title for the given text."

    # Generate a response from the AI model based on the system prompt and text chunk
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": chunk}
        ]
    )

    # Return the generated header/title, stripping any leading/trailing whitespace
    return response.choices[0].message.content.strip()
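A minimal sketch of building the contextualized chunks, reusing chunk_text, generate_chunk_header, and create_embeddings from earlier sections; the dict layout and function name are illustrative.

def chunk_with_headers(text, n, overlap):
    # Generate a header per chunk and prepend it before embedding
    enriched = []
    for chunk in chunk_text(text, n, overlap):
        header = generate_chunk_header(chunk)
        enriched.append({
            "header": header,
            "text": chunk,
            # Embedding the header together with the body is what makes
            # the chunk retrievable by its high-level topic
            "embedding": create_embeddings(header + "\n" + chunk).data[0].embedding,
        })
    return enriched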
Document Augmentation (Question Generation)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/6_doc_augmentation_rag.ipynb
By generating relevant questions for each text chunk, we improve retrieval, which in turn lets the language model produce better, more accurate answers.
import re

def generate_questions(text_chunk, num_questions=5, model="meta-llama/Llama-3.2-3B-Instruct"):
    """
    Generates relevant questions that can be answered from the given text chunk.

    Args:
        text_chunk (str): The text chunk to generate questions from.
        num_questions (int): Number of questions to generate.
        model (str): The model to use for question generation.
    """
    # Define the prompts (reconstructed here; the excerpt omitted them)
    system_prompt = "You are an expert at generating relevant questions from text. Create concise questions that can be answered using only the provided text."
    user_prompt = f"Based on the following text, generate {num_questions} different questions that can be answered using only this text:\n\n{text_chunk}"

    # Generate questions using the OpenAI API
    response = client.chat.completions.create(
        model=model,
        temperature=0.7,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )

    # Extract and clean questions from the response
    questions_text = response.choices[0].message.content.strip()
    questions = []

    # Extract questions using regex pattern matching
    for line in questions_text.split('\n'):
        # Remove numbering and clean up whitespace
        cleaned_line = re.sub(r'^\d+\.\s*', '', line.strip())
        if cleaned_line and cleaned_line.endswith('?'):
            questions.append(cleaned_line)

    return questions
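A minimal sketch of how the generated questions can be indexed alongside the chunks, reusing create_embeddings; the store layout is illustrative.

def build_augmented_store(text_chunks):
    # Index each chunk under its own embedding and under the embeddings of
    # the questions it can answer, so a user query can match either the
    # chunk text or a generated question
    store = []
    for i, chunk in enumerate(text_chunks):
        store.append({"text": chunk, "chunk_index": i,
                      "embedding": create_embeddings(chunk).data[0].embedding})
        for question in generate_questions(chunk):
            store.append({"text": question, "chunk_index": i,
                          "embedding": create_embeddings(question).data[0].embedding})
    return store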
Query Transformations
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/7_query_transform.ipynb
Query rewriting: analyze the intent behind the user query and add more specific keywords or constraints.
Step-back prompting: broaden the query to retrieve wider background information.
Sub-query decomposition: break a complex query into several simpler queries, retrieve for each separately, then merge the results (see the sketch after the rewrite_query function below).
def rewrite_query(original_query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Define the system prompt to guide the AI assistant's behavior
    system_prompt = "You are an AI assistant specialized in improving search queries. Your task is to rewrite user queries to be more specific, detailed, and likely to retrieve relevant information."

    # Define the user prompt with the original query to be rewritten
    user_prompt = f"""
    Rewrite the following query to make it more specific and detailed. Include relevant terms and concepts that might help in retrieving accurate information.

    Original query: {original_query}

    Rewritten query:
    """

    # Generate the rewritten query using the specified model
    response = client.chat.completions.create(
        model=model,
        temperature=0.0,  # Low temperature for deterministic output
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )

    # Return the rewritten query, stripping any leading/trailing whitespace
    return response.choices[0].message.content.strip()
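For comparison, a minimal sketch of sub-query decomposition under the same assumptions (the client from the Basic RAG section); the prompt wording is illustrative.

def decompose_query(original_query, num_subqueries=4,
                    model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model for one simple sub-query per line, then parse the lines
    system_prompt = "You break complex search queries into simpler sub-queries. Return one sub-query per line, with no numbering or extra text."
    user_prompt = f"Break this query into at most {num_subqueries} simpler sub-queries:\n\n{original_query}"
    response = client.chat.completions.create(
        model=model, temperature=0.0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    lines = response.choices[0].message.content.strip().split("\n")
    return [line.strip() for line in lines if line.strip()]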
Reranking
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/8_reranker.ipynb
Each document or chunk returned by the initial retrieval is scored for how well it matches the user query. The most relevant items are then moved to the top, so the highest-quality content is considered first when generating the answer.
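No code accompanies this section in the excerpt, so here is a minimal LLM-as-judge reranking sketch, assuming the client from the Basic RAG section; the 0-10 rating prompt and the function name are illustrative.

def rerank_with_llm(query, chunks, top_n=3, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model to rate each chunk's relevance from 0 to 10, then sort
    scored = []
    for chunk in chunks:
        prompt = (f"Rate the relevance of the document to the query on a scale "
                  f"from 0 to 10. Respond with only the number.\n\n"
                  f"Query: {query}\n\nDocument: {chunk}")
        response = client.chat.completions.create(
            model=model, temperature=0,
            messages=[{"role": "user", "content": prompt}]
        )
        try:
            score = float(response.choices[0].message.content.strip())
        except ValueError:
            score = 0.0  # Treat unparseable replies as irrelevant
        scored.append((score, chunk))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [chunk for _, chunk in scored[:top_n]]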
Relevant Segment Extraction (RSE)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/9_rse.ipynb
RSE identifies the segments within the retrieved chunks that are highly relevant to the user query, then reassembles those segments into contiguous text passages.
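A minimal sketch of the segment-finding step, under the assumption that each chunk has already been assigned a value (for example, its query similarity minus a constant irrelevance penalty such as 0.2, so irrelevant chunks go negative); the brute-force search and function name are illustrative.

def find_best_segment(chunk_values, max_length=20):
    # With negative values for irrelevant chunks, the best contiguous run
    # of chunks is a bounded maximum-subarray problem over the values
    best_score, best_range = float("-inf"), (0, 0)
    for start in range(len(chunk_values)):
        total = 0.0
        for end in range(start, min(start + max_length, len(chunk_values))):
            total += chunk_values[end]
            if total > best_score:
                best_score, best_range = total, (start, end + 1)
    return best_range  # Half-open [start, end) index range into the chunk list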
Contextual Compression
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/10_contextual_compression.ipynb
When retrieving documents in RAG, the chunks we get back often mix relevant and irrelevant information. Contextual compression helps us (a sketch follows the list):
Remove irrelevant sentences and paragraphs.
Focus only on the information relevant to the query.
Maximize the useful signal inside the context window.
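A minimal sketch of LLM-based compression, assuming the client from the Basic RAG section; the prompt wording and the NO_RELEVANT_CONTENT sentinel are illustrative.

def compress_chunk(chunk, query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model to keep only the sentences that help answer the query
    system_prompt = ("You compress text. Keep only the sentences from the "
                     "document that are relevant to the query, verbatim. If "
                     "nothing is relevant, respond with 'NO_RELEVANT_CONTENT'.")
    user_prompt = f"Query: {query}\n\nDocument:\n{chunk}"
    response = client.chat.completions.create(
        model=model, temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    compressed = response.choices[0].message.content.strip()
    return "" if compressed == "NO_RELEVANT_CONTENT" else compressed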
Feedback Loop RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/11_feedback_loop_rag.ipynb
Traditional RAG systems are static: they retrieve information based purely on embedding similarity. With a feedback loop we create a dynamic system that can (a minimal sketch follows the list):
Remember what worked and what did not: by recording user ratings of answers, the system learns which retrieval results and responses were high quality and which need improvement.
Adjust document relevance scores: based on user feedback, the system dynamically adjusts the relevance scores of documents or chunks so future retrievals better reflect user needs.
Add successful Q&A pairs to the knowledge base: answers the user endorsed are stored as new knowledge for reuse in later interactions.
Get smarter with every interaction: by continually learning user preferences and needs, the system gradually refines its retrieval and generation strategy and delivers better answers.
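A minimal sketch of the score-adjustment idea, assuming a store of dicts with "embedding" and "chunk_index" keys (as in the document-augmentation sketch) and a feedback log of {"chunk_index": ..., "helpful": ...} records; all names and the flat boost are illustrative.

def adjust_relevance_scores(query_embedding, store, feedback_log, boost=0.1):
    # Nudge each chunk's similarity score up or down based on past
    # thumbs-up / thumbs-down feedback recorded against that chunk
    scored = []
    for item in store:
        score = cosine_similarity(np.array(query_embedding),
                                  np.array(item["embedding"]))
        for fb in feedback_log:
            if fb["chunk_index"] == item["chunk_index"]:
                score += boost if fb["helpful"] else -boost
        scored.append((score, item))
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored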
Adaptive RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/12_adaptive_rag.ipynb
Different questions call for different retrieval strategies. Our system adapts retrieval through the following steps (sketched after the list):
Classify the user query by type, for example factual, analytical, opinion, or contextual.
Design a different retrieval strategy for each query type.
Execute the concrete retrieval technique chosen by that strategy.
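A minimal sketch of the classification step, assuming the client from the Basic RAG section; the category set mirrors the list above, and the prompt and default fallback are illustrative.

def classify_query(query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model to label the query with exactly one category name
    system_prompt = ("Classify the query as exactly one of: Factual, "
                     "Analytical, Opinion, Contextual. Respond with only "
                     "the category name.")
    response = client.chat.completions.create(
        model=model, temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query}
        ]
    )
    category = response.choices[0].message.content.strip()
    valid = {"Factual", "Analytical", "Opinion", "Contextual"}
    return category if category in valid else "Factual"  # Fall back on bad output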
Self-RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/13_self_rag.ipynb
Self-RAG adds dynamic decision-making to retrieval and generation: it decides when and how retrieved information should be used, which lets it produce higher-quality, more reliable answers.
Decide whether retrieval is needed for a given query.
When it is, retrieve potentially relevant documents (a sketch of the retrieval decision follows).
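A minimal sketch of the first decision point, assuming the client from the Basic RAG section; the Yes/No prompt is illustrative.

def retrieval_needed(query, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Does answering this query require external knowledge, or can the
    # model answer from its own parameters?
    system_prompt = ("Decide if answering the query requires retrieving "
                     "external documents. Respond with only 'Yes' or 'No'.")
    response = client.chat.completions.create(
        model=model, temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query}
        ]
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")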
Proposition Chunking
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/14_proposition_chunking.ipynb
Proposition chunking decomposes documents into atomic factual statements, enabling more precise retrieval. Unlike simple character-count chunking, it preserves the semantic integrity of each individual fact.
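A minimal sketch of the proposition-generation step, assuming the client from the Basic RAG section; the prompt wording is illustrative.

def generate_propositions(chunk, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model to rewrite a chunk as standalone atomic facts, one per
    # line, each interpretable without the surrounding text
    system_prompt = ("Break the text into simple, self-contained propositions. "
                     "Each proposition must be a minimal atomic fact, use full "
                     "names instead of pronouns, and stand alone without "
                     "context. Return one proposition per line.")
    response = client.chat.completions.create(
        model=model, temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": chunk}
        ]
    )
    lines = response.choices[0].message.content.strip().split("\n")
    return [line.strip() for line in lines if line.strip()]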
Multi-Modal RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/15_multimodel_rag.ipynb
A Multi-Modal RAG system extracts both text and images from documents, generates descriptive captions for the images, and draws on both content types to answer user queries. Bringing visual information into the knowledge base significantly extends what a traditional RAG system can do.
Traditional RAG handles only text, yet much of the key information in many documents lives in images, charts, and tables. By generating descriptions of these visual elements and making them retrievable, we can (a captioning sketch follows the list):
Capture information from charts and diagrams: unlock the information hidden inside images and figures so it can be retrieved and used.
Understand tables and charts: use image captions to grasp the tables and charts that complement the surrounding text.
Build a more comprehensive knowledge base: combine visual and textual information into a richer, more complete corpus.
Answer questions that depend on visual data: handle queries that can only be resolved with visual information.
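A minimal sketch of the captioning step, assuming the endpoint serves an OpenAI-compatible vision-capable chat model; the model name here is a placeholder, not taken from the original.

import base64

def generate_image_caption(image_path, model="vision-model-placeholder"):
    # Send the image as a base64 data URL and ask for a detailed,
    # retrieval-friendly caption
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model=model, temperature=0,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail, including any text, data, or chart values it contains."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}}
            ]
        }]
    )
    return response.choices[0].message.content.strip()

The caption text is then chunked and embedded exactly like ordinary document text, which is what makes the visual content retrievable.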
Fusion Retrieval
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/16_fusion_rag.ipynb
A fusion retrieval system combines the strengths of semantic vector search and keyword-based BM25 retrieval. By capturing both conceptual similarity and exact keyword matches, it significantly improves retrieval quality.
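A minimal sketch of the fusion step, combining the rank_bm25 library with the embedding helpers from the Basic RAG section; the min-max normalization and the alpha weighting are illustrative choices.

import numpy as np
from rank_bm25 import BM25Okapi

def fusion_retrieval(query, text_chunks, embeddings, k=5, alpha=0.5):
    # BM25 keyword scores over whitespace-tokenized chunks
    bm25 = BM25Okapi([chunk.split() for chunk in text_chunks])
    bm25_scores = np.array(bm25.get_scores(query.split()))

    # Dense cosine-similarity scores using the helpers defined earlier
    query_embedding = np.array(create_embeddings(query).data[0].embedding)
    vector_scores = np.array([
        cosine_similarity(query_embedding, np.array(e.embedding)) for e in embeddings
    ])

    # Min-max normalize both score lists so they are comparable,
    # then blend them with weight alpha
    def normalize(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-9)

    combined = alpha * normalize(vector_scores) + (1 - alpha) * normalize(bm25_scores)
    top = np.argsort(combined)[::-1][:k]
    return [text_chunks[i] for i in top]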
Graph RAG
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/17_graph_rag.ipynb
Compared with a traditional RAG system, Graph RAG organizes knowledge as a connected graph rather than a flat collection of documents, markedly improving retrieval capability and generation quality. This lets the system navigate between related concepts and retrieve information that is more contextually relevant than standard vector-similarity methods allow.
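A minimal sketch of the graph idea using networkx, assuming a set of concepts has already been extracted per chunk (for example by an LLM extraction step); the shared-concept edge rule and function names are illustrative.

import networkx as nx

def build_concept_graph(chunks, chunk_concepts):
    # One node per chunk, with an edge whenever two chunks share at least
    # one extracted concept; chunk_concepts is a list of concept sets
    graph = nx.Graph()
    for i, chunk in enumerate(chunks):
        graph.add_node(i, text=chunk, concepts=chunk_concepts[i])
    for i in range(len(chunks)):
        for j in range(i + 1, len(chunks)):
            shared = chunk_concepts[i] & chunk_concepts[j]
            if shared:
                graph.add_edge(i, j, weight=len(shared))
    return graph

def graph_search(graph, seed_node, hops=1):
    # Start from the best vector match and walk to its graph neighbors
    nodes = nx.single_source_shortest_path_length(graph, seed_node, cutoff=hops)
    return [graph.nodes[n]["text"] for n in nodes]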
Hierarchical Indices
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/18_hierarchy_rag.ipynb
Hierarchical indices use a two-tier retrieval strategy to improve both efficiency and quality: first identify the relevant document sections via their summaries, then retrieve the specific details from within those sections.
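A minimal sketch of the two tiers, assuming each section is a dict carrying a summary embedding plus its chunks and per-chunk embeddings; the data layout is illustrative.

def hierarchical_search(query, sections, k_sections=2, k_chunks=3):
    # sections: [{"summary_embedding": [...], "chunks": [...], "chunk_embeddings": [...]}, ...]
    query_embedding = np.array(create_embeddings(query).data[0].embedding)

    # Tier 1: rank sections by summary similarity and keep the top few
    ranked = sorted(
        sections,
        key=lambda s: cosine_similarity(query_embedding, np.array(s["summary_embedding"])),
        reverse=True,
    )[:k_sections]

    # Tier 2: rank the chunks inside the winning sections
    candidates = []
    for section in ranked:
        for chunk, emb in zip(section["chunks"], section["chunk_embeddings"]):
            candidates.append((cosine_similarity(query_embedding, np.array(emb)), chunk))
    candidates.sort(key=lambda x: x[0], reverse=True)
    return [chunk for _, chunk in candidates[:k_chunks]]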
Hypothetical Document Embedding (HyDE)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/19_HyDE_rag.ipynb
HyDE bridges the semantic gap between short queries and long documents by first transforming the user query into a hypothetical document that answers it, then retrieving with that document instead of the raw query.
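A minimal sketch, reusing the semantic_search helper from the Basic RAG section; the passage-writing prompt is illustrative.

def hyde_search(query, text_chunks, embeddings, k=5,
                model="meta-llama/Llama-3.2-3B-Instruct"):
    # Step 1: generate a hypothetical document that answers the query
    response = client.chat.completions.create(
        model=model, temperature=0.7,
        messages=[
            {"role": "system", "content": "Write a short passage that directly answers the question, as if quoted from a reference document."},
            {"role": "user", "content": query}
        ]
    )
    hypothetical_doc = response.choices[0].message.content.strip()

    # Step 2: retrieve with the hypothetical document's embedding instead
    # of the raw query's; long text matches long text better
    return semantic_search(hypothetical_doc, text_chunks, embeddings, k=k)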
Corrective RAG (CRAG)
https://github.com/FareedKhan-dev/all-rag-techniques/blob/main/20_crag.ipynb
Corrective RAG (CRAG) dynamically evaluates the retrieved information and, when necessary, corrects the retrieval step by falling back to web search, significantly improving accuracy and reliability.
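A minimal sketch of the decision logic, reusing semantic_search, generate_response, and the global system_prompt from the Basic RAG section; the grading prompt is illustrative, and web_search is a hypothetical placeholder for whatever search API a deployment actually uses (e.g. DuckDuckGo, SerpAPI).

def evaluate_relevance(query, chunk, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the model for a 0-1 relevance grade for one retrieved chunk
    prompt = (f"On a scale from 0 to 1, how relevant is this document to the "
              f"query? Respond with only the number.\n\nQuery: {query}\n\n"
              f"Document: {chunk}")
    response = client.chat.completions.create(
        model=model, temperature=0,
        messages=[{"role": "user", "content": prompt}]
    )
    try:
        return float(response.choices[0].message.content.strip())
    except ValueError:
        return 0.0  # Treat unparseable replies as irrelevant

def corrective_rag(query, text_chunks, embeddings, threshold=0.5):
    # Keep only chunks the grader considers relevant; otherwise fall back
    # to web search before generating the final answer
    retrieved = semantic_search(query, text_chunks, embeddings, k=3)
    relevant = [c for c in retrieved if evaluate_relevance(query, c) >= threshold]
    if not relevant:
        relevant = web_search(query)  # Hypothetical placeholder
    context = "\n\n".join(relevant)
    return generate_response(system_prompt, f"Context:\n{context}\n\nQuestion: {query}")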