RAG
Retrieval-Augmented Generation
RAG addresses two major limitations of large language models (LLMs) in practice: hallucination and the training-data cutoff. RAG combines information retrieval with generation: before text is generated, relevant material is retrieved from a data store and placed in the context, so the LLM can ground its output in accurate, up-to-date information.
Benefits of RAG:
- Reduces AI hallucinations
- Improves data security
- Reduces the need for model fine-tuning
- Mitigates the training-data cutoff (answers can use up-to-date information)
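The retrieve-then-generate flow can be sketched in a few lines. The example below is a minimal illustration, assuming sentence-transformers for embeddings and plain cosine similarity for retrieval; the embedding model, sample documents, and the call_llm stub are illustrative assumptions, not tied to any particular framework.

```python
# Minimal RAG sketch: embed documents, retrieve the closest match, build a grounded prompt.
# Assumes `pip install sentence-transformers numpy`.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Support hours are 09:00-18:00, Monday through Friday.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
doc_embeddings = embedder.encode(documents, normalize_embeddings=True)

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in: replace with your LLM client of choice (OpenAI, Ollama, ...).
    raise NotImplementedError

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_embeddings @ q  # normalized vectors: dot product == cosine similarity
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

def answer(query: str) -> str:
    # Put the retrieved passages into the context so the LLM answers from them
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)
```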
Process flow diagram
Introduction
Tutorials
Introduction to RAG
- ollama + Langchain + Gradio RAG code example
- A flexible Q&A-chat-app for your selection of documents with langchain, Streamlit and chatGPT | by syrom | Medium
- 【圖解】4步驟教人資打造AI法律顧問!讓你的ChatGPT不再胡說八道|數位時代 BusinessNext (bnext.com.tw)
- 創建本地PDF Chatbot with Llama3 & RAG技術 #chatbot #chatgpt #llama3 #rag #chatpdf - YouTube
- Some code examples: https://github.com/Shubhamsaboo/awesome-llm-apps
- Easy AI/Chat For Your Docs with Langchain and OpenAI in Python
- RAG共学一:16个问题帮你快速入门RAG
- YT:RAG共学一:16个问题帮你快速入门RAG - YouTube
- 全端 LLM 應用開發-Day26-用 Langchain 來做 PDF 文件問答 - iT 邦幫忙::一起幫忙解決難題,拯救 IT 人的一天 (ithome.com.tw)
- RAG實作教學,LangChain + Llama2 |創造你的個人LLM | by ChiChieh Huang | Medium
- Python RAG Tutorial (with Local LLMs): AI For Your PDFs - YouTube
- Vectorizing and retrieving PDF text, tables, and images
Embedding/Rerank Models
Vector Databases
- Qdrant - an open-source vector search engine designed for high-dimensional data, with a GUI management interface.
- Chroma (see the usage sketch after this list)
- VectorAdmin - a management UI for vector databases (embedding models: OpenAI only)
- Pinecone (Cloud)
- Supabase (Cloud)
- Astra DB (Cloud)
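A minimal usage sketch for one of the options above (Chroma), assuming the default in-memory client and its built-in embedding function; the collection name and documents are made up:

```python
# Chroma sketch: add a couple of documents and run a similarity query.
# Assumes `pip install chromadb`; texts are embedded with Chroma's default embedding function.
import chromadb

client = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path=...) to keep data on disk
collection = client.create_collection("docs")  # hypothetical collection name

collection.add(
    ids=["doc1", "doc2"],
    documents=[
        "Qdrant is an open-source vector search engine.",
        "Chroma is an open-source embedding database.",
    ],
)

results = collection.query(query_texts=["open-source vector database"], n_results=1)
print(results["documents"])
```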
Advanced RAG
- RAG 優化技巧| 7 大挑戰與解決方式 | 增進你的 LLM. 儘管 LLM + RAG 的能力已經令人驚嘆,但我們在使用 RAG 優化… | by ChiChieh Huang | Medium
- ReRank (see the cross-encoder sketch after this list)
- Advanced RAG: MultiQuery and ParentDocument | RAGStack | DataStax Docs
- Advanced Retrieval With LangChain (ipynb)
- Advanced RAG Implementation using Hybrid Search, Reranking with Zephyr Alpha LLM | by Nadika Poudel | Medium
- Five Levels of Chunking Strategies in RAG| Notes from Greg’s Video | by Anurag Mishra | Medium
- Chunking/Splitting
- Advanced RAG: Query Expansion
- Cohere Cookbooks
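A sketch of the retrieve-then-rerank pattern referenced in the ReRank and hybrid-search links above, assuming sentence-transformers' CrossEncoder with a public MS MARCO checkpoint; the query and candidate passages are made up:

```python
# Rerank sketch: score first-stage candidates against the query with a cross-encoder
# and keep the highest-scoring ones. Assumes `pip install sentence-transformers`.
from sentence_transformers import CrossEncoder

query = "How do I reset my password?"
candidates = [  # e.g. the top-k results returned by a first-stage vector search
    "To reset your password, open Settings and choose 'Reset password'.",
    "Our office is located in Taipei.",
    "Passwords must contain at least 12 characters.",
]

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed checkpoint
scores = reranker.predict([(query, doc) for doc in candidates])

# Sort candidates by relevance score, highest first, and keep the best one
reranked = [doc for _, doc in sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)]
print(reranked[0])
```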
RAG Projects
Danswer
Danswer is the AI Assistant connected to your company's docs, apps, and people. Danswer provides a Chat interface and plugs into any LLM of your choice. Danswer can be deployed anywhere and for any scale - on a laptop, on-premise, or to cloud.
Embedchain
Embedchain streamlines the creation of personalized LLM applications, offering a seamless process for managing various types of unstructured data.
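A minimal usage sketch following Embedchain's quickstart pattern (add a source, then query it); the URL and question are illustrative, and the default app configuration assumes an OpenAI API key is available:

```python
# Embedchain sketch: ingest a web page, then ask a question grounded in it.
# Assumes `pip install embedchain` and OPENAI_API_KEY set for the default configuration.
from embedchain import App

app = App()
app.add("https://en.wikipedia.org/wiki/Retrieval-augmented_generation")  # illustrative source
print(app.query("What problem does retrieval-augmented generation address?"))
```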
GraphRAG
Microsoft's open-source, knowledge-graph-based retrieval and reasoning solution. GraphRAG incorporates knowledge-graph retrieval and reasoning throughout the pipeline, from pre-retrieval and post-retrieval through prompt compression, producing more precise and relevant answers.
- Get Started
- GitHub: https://github.com/microsoft/graphrag
- YT: Microsoft GraphRAG | 基于知识图谱的RAG套件,构建更完善的知识库 - YouTube
- GitHub: GraphRAG Local with Ollama and Gradio UI
- YT: 颠覆传统RAG!GraphRAG结合本地大模型:Gemma 2+Nomic Embed齐上阵,轻松掌握GraphRAG+Chainlit+Ollama技术栈 #graphrag #ollama #ai - YouTube
- GitHub: GraphRAG + AutoGen + Ollama + Chainlit UI = Local Multi-Agent RAG Superbot
- Doc: GenAI Ecosystem - Neo4j Labs
- Chinese: 生成式 AI 的資料救星!GraphRAG 知識圖譜革命,大幅提升 LLM 準確度! | T客邦 (techbang.com)
- NeoConverse - Graph Database Search with Natural Language - Neo4j Labs
- LangChain: Enhancing RAG-based application accuracy by constructing and leveraging knowledge graphs (langchain.dev)
- Build a Question Answering application over a Graph Database | 🦜️🔗 LangChain
- LangChain: https://neo4j.com/labs/genai-ecosystem/langchain/
- https://github.com/neo4j-labs/llm-graph-builder
- ipynb: https://github.com/tomasonjo/blogs/blob/master/llm/enhancing_rag_with_graph.ipynb
Verba
Verba is a fully-customizable personal assistant for querying and interacting with your data, either locally or deployed via cloud. Resolve questions around your documents, cross-reference multiple data points or gain insights from existing knowledge bases. Verba combines state-of-the-art RAG techniques with Weaviate's context-aware database. Choose between different RAG frameworks, data types, chunking & retrieving techniques, and LLM providers based on your individual use-case.
- Github: Retrieval Augmented Generation (RAG) chatbot powered by Weaviate
- Weaviate is an open source, AI-native vector database
- Video: Open Source RAG with Ollama - YouTube
PrivateGPT
- Introduction – PrivateGPT | Docs
- GitHub: https://github.com/zylon-ai/private-gpt
- Video: PrivateGPT 2.0 - FULLY LOCAL Chat With Docs (PDF, TXT, HTML, PPTX, DOCX, and more) - YouTube
- Video: Installing Private GPT to interact with your own documents!! - YouTube
LLMWare
The Ultimate Toolkit for Enterprise RAG Pipelines with Small, Specialized Models.
talkd/dialog
Talkd.ai - Optimizing LLMs with easy RAG deployment and management.
RAG Evaluation
Generation metrics
- Faithfulness
Faithfulness measures how truthful and reliable a generated answer is: how consistent the answer is with the facts in the provided context. A highly faithful answer means the model accurately extracted information from the given context and produced a response consistent with it, which is essential for the quality and trustworthiness of the output.
- Answer Relevancy
Answer relevancy measures how well the generated answer matches the user's question. A highly relevant answer requires the model not only to understand the question but also to produce a response that directly addresses it; this has a direct impact on user satisfaction and the model's practical usefulness.
- Answer Correctness
Answer correctness measures how well the generated answer agrees with a known ground-truth answer. It is computed by assessing the accuracy of the generated answer, i.e. its agreement with the reference answer, typically by comparing the textual similarity of the two. It is similar to answer relevancy, but focuses on accuracy rather than relevance.
Retrieval metrics
- Context Recall
Context recall measures whether the retriever can accurately find the context relevant to the question. A retriever with high recall effectively filters the most relevant information out of a large corpus, which is key to the accuracy and efficiency of a question-answering system.
- Context Precision
Context precision measures how relevant the context a RAG system uses to answer a question actually is. It is computed by determining how relevant the context selected by the system is to the given question, typically by comparing the selected context against a set of predefined relevant contexts and weighting how much each contributes to the generated answer.
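These metrics can be computed automatically with an evaluation library such as Ragas. Below is a minimal sketch assuming a Ragas 0.1-style API and made-up sample data; the default configuration also assumes an OpenAI API key for the judge model.

```python
# Ragas sketch: score one question/answer/contexts sample on the metrics described above.
# Assumes `pip install ragas datasets` and an OpenAI API key for the default judge LLM.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    answer_correctness,
    context_recall,
    context_precision,
)

# Made-up evaluation sample: question, generated answer, retrieved contexts, reference answer
sample = {
    "question": ["When was the product released?"],
    "answer": ["The product was released in March 2023."],
    "contexts": [["The product launched in March 2023 after a two-year beta."]],
    "ground_truth": ["The product was released in March 2023."],
}

result = evaluate(
    Dataset.from_dict(sample),
    metrics=[faithfulness, answer_relevancy, answer_correctness, context_recall, context_precision],
)
print(result)  # per-metric scores in the 0-1 range
```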