LangChain

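LangChain is an open-source development framework that gives developers a set of tools and programming interfaces for working with large language models (LLMs) more easily and effectively, with a focus on context-aware, reasoning applications. It has several components: Python and JavaScript libraries, templates for rapid deployment, LangServe for building REST APIs, and LangSmith for debugging and monitoring. LangChain streamlines development, production, and deployment by providing tools for interacting with language models, running retrieval strategies, and assembling complex application architectures.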
 
 Introduction | 🦜️🔗 LangChain 
 What Is LangChain? The Open-Source LLM Framework AI Developers Need to Know - ALPHA Camp 
 GitHub:  https://github.com/langchain-ai/langchain   
 Hub:  LangSmith (langchain.com)   
 Tutorial: sugarforever/wtf-langchain 
 CookBook 
 LangChain Templates 
 
 LangSmith 
 A cloud service from LangChain for debugging programs and monitoring backend processes, for example the retrieval steps of a RAG pipeline. 
 
 https://github.com/langchain-ai/langsmith-cookbook   
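 How to Use LangChain? Tracking Down Problems with LangSmith - MyApollo 
 Deep Dive into LangSmith: How Does It Help LLM Applications Go from Prototype to Production? [Part 1] - Article - Developer Community - Volcano Engine 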
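LangSmith records each step of a run (its inputs, outputs, and timing) as a trace. To make that idea concrete, here is a toy, dependency-free sketch of what a tracer captures. `traced` and `TRACE` are hypothetical names, not the LangSmith SDK (which has its own `traceable` decorator and sends traces to the cloud service); this version only keeps records in a local list and, as a simplification, records nothing when the wrapped function raises.

```python
import functools
import time

TRACE = []  # hypothetical in-memory trace store: one dict per finished call


def traced(fn):
    """Record the name, inputs, output, and latency of every call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)  # if fn raises, nothing is recorded
        TRACE.append({
            "name": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "latency_s": time.perf_counter() - start,
        })
        return result
    return wrapper


@traced
def retrieve(question):
    # Stand-in for a RAG retrieval step
    return ["doc about " + question]


@traced
def answer(question):
    docs = retrieve(question)
    return f"Answer based on {len(docs)} document(s)"


answer("pricing")
for span in TRACE:  # the inner call finishes first, so it is recorded first
    print(span["name"], span["output"])
```

Seeing the retrieval step's output next to the final answer is what makes this kind of trace useful for debugging a RAG pipeline.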
 
 RAG 
 
 Learn RAG with Langchain ( ipynb ) 
 LangChain: A Complete Guide & Tutorial (nanonets.com) 
 Meta-Llama CookBook for RAG (ipynb) 
 LangChain and Streamlit RAG | Medium 
 
 GitHub: https://github.com/streamlit/example-app-langchain-rag   
 
 
 
 Retrievers in LCEL 
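from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Assumes `retriever` already exists, e.g. retriever = vectorstore.as_retriever()
# on a vector store that already holds the documents to search.

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()  # reads the OPENAI_API_KEY environment variable

def format_docs(docs):
    # Join the retrieved Documents into a single context string
    return "\n\n".join(d.page_content for d in docs)

# The input string goes to the retriever (-> {context}) and, unchanged, to {question}
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

chain.invoke("What did the president say about technology?")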
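The key step in the chain above is the dict at its head: one input string is fanned out to both prompt variables. The sketch below reproduces that data flow in plain Python so it runs without an API key. `Doc`, `toy_retriever`, and `build_prompt` are hypothetical stand-ins (the toy retriever just counts shared words, not vector similarity), and it stops at the filled prompt instead of calling a model.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


CORPUS = [
    Doc("The president announced new funding for semiconductor research."),
    Doc("The weather in the capital was sunny on Tuesday."),
    Doc("The technology budget will focus on AI and clean energy."),
]

TEMPLATE = """Answer the question based only on the following context:

{context}

Question: {question}
"""


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def toy_retriever(query: str, k: int = 2) -> list[Doc]:
    """Hypothetical retriever: rank docs by the number of words shared with the query."""
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda d: len(q & tokens(d.page_content)), reverse=True)
    return ranked[:k]


def format_docs(docs: list[Doc]) -> str:
    return "\n\n".join(d.page_content for d in docs)


def build_prompt(question: str) -> str:
    # Same fan-out as {"context": retriever | format_docs, "question": RunnablePassthrough()}
    variables = {
        "context": format_docs(toy_retriever(question)),
        "question": question,  # passed through unchanged
    }
    # In the real chain, the filled prompt then goes to the model and StrOutputParser.
    return TEMPLATE.format(**variables)


print(build_prompt("What did the president say about technology?"))
```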
 
 ChatPromptTemplate 
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

few_shot_examples = [
{"input":"Could you please clarify the terms outlined in section 3.2 of the contract?",
"output":"Certainly, I will provide clarification on the terms in section 3.2."},
{"input":"We are interested in extending the payment deadline to 30 days instead of the current 15 days. Additionally, we would like to add a clause regarding late payment penalties.",
"output":"Our request is to extend the payment deadline to 30 days and include a clause on late payment penalties."},
{"input":"""The current indemnification clause seems too broad. We would like to narrow it down to cover only direct damages and exclude consequential damages.
Additionally, we propose including a dispute resolution clause specifying arbitration as the preferred method of resolving disputes.""",
"output":"""We suggest revising the indemnification clause to limit it to covering direct damages and excluding consequential damages.
Furthermore, we recommend adding a dispute resolution clause that specifies arbitration as the preferred method of resolving disputes."""},
{"input":"I believe the proposed changes are acceptable.",
"output":"Thank you for your feedback. I will proceed with implementing the proposed changes."}
]

few_shot_template = ChatPromptTemplate.from_messages(
 [
 ("human", "{input}"),
 ("ai", "{output}")
 ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
 example_prompt=few_shot_template,
 examples=few_shot_examples,
)

print(few_shot_prompt.format()) 
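`FewShotChatMessagePromptTemplate` formats each example dict with `example_prompt`, so every example becomes a human turn followed by an AI turn. The plain-Python sketch below mimics that expansion so the resulting message order is easy to see. `expand_few_shot` and `render` are hypothetical helpers, and the `Human:`/`AI:` prefixes only approximate what `format()` prints.

```python
def expand_few_shot(examples, system=None):
    """Turn [{"input": ..., "output": ...}, ...] into alternating (role, text) turns."""
    messages = []
    if system:
        messages.append(("system", system))
    for ex in examples:
        messages.append(("human", ex["input"]))
        messages.append(("ai", ex["output"]))
    return messages


def render(messages):
    role_names = {"system": "System", "human": "Human", "ai": "AI"}
    return "\n".join(f"{role_names[role]}: {text}" for role, text in messages)


examples = [
    {"input": "I believe the proposed changes are acceptable.",
     "output": "Thank you for your feedback. I will proceed with implementing the proposed changes."},
]
# The real user question comes after the examples
msgs = expand_few_shot(examples) + [("human", "Can we extend the payment deadline?")]
print(render(msgs))
```

In practice the few-shot block is usually placed between a system message and the final `("human", "{input}")` turn of a larger `ChatPromptTemplate`.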
from langchain_core.prompts import ChatPromptTemplate

custom_prompt = ChatPromptTemplate.from_template("""
You are an information extraction assistant.
Read the text below and identify important entities.

**Extraction rules:**
- Always extract the **Report Id** (this is the central node).
- Extract **people**, **institutions**, **places**, **dates**, **monetary amounts**, and **vehicle registration numbers** (e.g., MH12AB1234, PK-02-4567, KA05MG2020).
- Do not ignore any people's names; extract every person mentioned in the document, even if they seem minor or their role is unclear.
- Treat all types of vehicles (e.g., cars, bikes) as the same kind of entity, called "Vehicle".

**Output format:**
1. List all nodes (unique entities).
2. Identify the central node (Report Id).
3. Create relationships of the form:
 (Report Id)-[HAS_ENTITY]->(Entity),
4. Do not create any other types of relationships. 

Text:
{input}

Return only structured data like:
Nodes:
- Report SYN-REP-2024
- Honda bike ABCD1234
- XYZ College, Chennai
- ...
""") 
   
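The prompt above asks the model for a `Nodes:` list in which the Report Id is the central node. A minimal sketch of turning such a reply into `(Report Id)-[HAS_ENTITY]->(Entity)` triples might look like this. It assumes the model follows the requested format exactly and that the central node's text starts with `Report ` (as in the prompt's example output); `parse_nodes` and `build_relationships` are hypothetical helpers, and the sample reply is made up.

```python
def parse_nodes(llm_output: str) -> list[str]:
    """Collect the '- ' items under 'Nodes:', dropping duplicates and '...' placeholders."""
    nodes, in_nodes = [], False
    for line in llm_output.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("nodes:"):
            in_nodes = True
            continue
        if in_nodes:
            if stripped.startswith("- "):
                node = stripped[2:].strip()
                if node and node != "..." and node not in nodes:
                    nodes.append(node)
            elif stripped:  # any other non-empty line ends the Nodes list
                in_nodes = False
    return nodes


def build_relationships(nodes, report_prefix="Report "):
    """Link the central Report node to every other node with HAS_ENTITY (assumed prefix)."""
    central = next((n for n in nodes if n.startswith(report_prefix)), None)
    if central is None:
        raise ValueError("no Report Id node found")
    return [(central, "HAS_ENTITY", n) for n in nodes if n != central]


sample = """Nodes:
- Report SYN-REP-2024
- Honda bike ABCD1234
- XYZ College, Chennai
- Honda bike ABCD1234
"""
nodes = parse_nodes(sample)
for rel in build_relationships(nodes):
    print(rel)
```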
 Input Data Loader 
 Web 
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Parse only the post title, header, and content; skip navigation and other page chrome
loader = WebBaseLoader(
 web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
 bs_kwargs=dict(
 parse_only=bs4.SoupStrainer(
 class_=("post-content", "post-title", "post-header")
 )
 ),
)
docs = loader.load() 
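Passing `parse_only=bs4.SoupStrainer(class_=...)` keeps only the elements that carry one of the listed CSS classes. To illustrate that filtering idea without BeautifulSoup, here is a standard-library sketch. `ClassFilterParser` and `extract_by_class` are hypothetical, and the sketch assumes reasonably well-formed HTML (every non-void start tag has a matching end tag).

```python
from html.parser import HTMLParser

# Elements that never have an end tag
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassFilterParser(HTMLParser):
    """Collect text only from inside elements whose class list hits `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.depth = 0  # > 0 while inside a matching element (counts nested tags)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.depth > 0 or classes & self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_by_class(html, wanted):
    parser = ClassFilterParser(wanted)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = ('<html><nav class="menu">Home</nav><h1 class="post-title">LLM Agents</h1>'
        '<div class="post-content"><p>Planning<br>and memory</p></div><footer>c</footer></html>')
print(extract_by_class(page, ("post-content", "post-title", "post-header")))
```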
 Text file 
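from langchain_community.document_loaders import DirectoryLoader

# Recursively load every .md file under the parent directory.
# By default each file is parsed with UnstructuredFileLoader (needs the `unstructured` package).
loader = DirectoryLoader("../", glob="**/*.md")
docs = loader.load()
print(len(docs))
print(docs[0].page_content[:100])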
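import os
from langchain_community.document_loaders import TextLoader

dataset_folder_path = "/path/to/dataset/"
documents = []
for file in os.listdir(dataset_folder_path):
    path = os.path.join(dataset_folder_path, file)
    if not os.path.isfile(path):  # skip subdirectories
        continue
    # TextLoader returns one Document per file
    loader = TextLoader(path)
    documents.extend(loader.load())

print(documents[:3])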
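Both loaders above walk a folder and turn each file into a Document whose `metadata["source"]` records the file path. A dependency-free sketch of the same pattern, using `pathlib` globbing, is shown below. `Doc` and `load_text_files` are hypothetical stand-ins; unlike `DirectoryLoader`'s default, it reads files as plain UTF-8 text rather than parsing them with `unstructured`.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Doc:
    """Stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_files(root, pattern="**/*.md"):
    """Read every file matching `pattern` under `root` (recursively) into a Doc."""
    docs = []
    for path in sorted(Path(root).glob(pattern)):
        if path.is_file():
            docs.append(Doc(path.read_text(encoding="utf-8"), {"source": str(path)}))
    return docs


docs = load_text_files(".")
print(len(docs))
```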
 Markdown file 
# Requires: %pip install "unstructured[md]"
from langchain_community.document_loaders import UnstructuredMarkdownLoader
markdown_path = "../../../README.md"
loader = UnstructuredMarkdownLoader(markdown_path)

data = loader.load()
assert len(data) == 1
readme_content = data[0].page_content
print(readme_content[:3]) 
 PDF + Text file 
import os
from langchain_community.document_loaders import PyPDFLoader, TextLoader

# File names (relative to the current working directory) to process; fill in before running
SAMPLEDATA = []

documents = []
for filename in SAMPLEDATA:
    path = os.path.join(os.getcwd(), filename)

    if filename.endswith(".pdf"):
        loader = PyPDFLoader(path)  # needs the `pypdf` package
    elif filename.endswith(".txt"):
        loader = TextLoader(path)
    else:
        print(f"Unsupported file type: {filename}")
        continue  # skip it, so the previous file's chunks are not added again

    # load_and_split() loads the file and splits it into smaller Documents
    new_docs = loader.load_and_split()
    documents.extend(new_docs)
    print(f"Processed {filename}: {len(new_docs)} chunks")

print("\nProcessing done.")
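By default, `load_and_split()` splits the loaded text with LangChain's `RecursiveCharacterTextSplitter`, which prefers to cut at paragraph and line breaks. The simplified sketch below shows only the core idea of fixed-size chunks with overlap. `split_text` is a hypothetical helper that cuts purely by character count, so it is not equivalent to the real splitter, and its default sizes are arbitrary.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Slide a window of chunk_size characters, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

The overlap repeats a little text at each boundary so that a sentence cut in half still appears intact in at least one chunk more often.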
 OCR 
 
 A Simple Guide to OCR with Vision LLMs, LangChain, and Ollama | by Andreas Klos | Medium 
 
 Common Functions 
 Formatted Output 
 # Helper function for printing docs
def pretty_print_docs(docs):
 print(
 f"\n{'-' * 100}\n".join(
 [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
 )
 )