# LangChain

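LangChain is an open-source development framework that gives developers tools and programming interfaces for building applications on large language models (LLMs) more easily and efficiently, with a focus on context awareness and reasoning. It comprises several components: Python and JavaScript libraries, templates for quick deployment, LangServe for exposing chains as REST APIs, and LangSmith for debugging and monitoring. LangChain streamlines development, productionization, and deployment by providing tools for interacting with language models, implementing retrieval strategies, and assembling complex application architectures.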

- [Introduction | 🦜️🔗 LangChain](https://python.langchain.com/docs/get_started/introduction)
- [What is LangChain? The open-source LLM framework AI developers need to know - ALPHA Camp](https://tw.alphacamp.co/blog/langchain-intro) (in Chinese)
- GitHub: [https://github.com/langchain-ai/langchain](https://github.com/langchain-ai/langchain)
- Hub: [LangSmith (langchain.com)](https://smith.langchain.com/hub)
- Tutorial: [sugarforever/wtf-langchain](https://github.com/sugarforever/wtf-langchain)
- [CookBook](https://github.com/langchain-ai/langchain/tree/master/cookbook)
- [LangChain Templates](https://templates.langchain.com/)

#### LangSmith

A cloud service provided by LangChain for debugging programs and monitoring backend processes, such as the retrieval step of a RAG pipeline.

- [https://github.com/langchain-ai/langsmith-cookbook](https://github.com/langchain-ai/langsmith-cookbook)
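- [How to play with LangChain? Use LangSmith to track down problems - MyApollo](https://myapollo.com.tw/blog/langchain-langsmith/) (in Chinese)
- [A deep dive into LangSmith: how does it help LLM applications go from prototype to production? (Part 1) - Volcano Engine developer community](https://developer.volcengine.com/articles/7370375414524411931) (in Chinese)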
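
Tracing is typically switched on through environment variables rather than code changes. The sketch below is a minimal example, assuming the `LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY`, and `LANGCHAIN_PROJECT` variables read by the LangChain SDK. The helper name `enable_langsmith_tracing` and the project name are hypothetical, and newer SDK releases may prefer `LANGSMITH_`-prefixed equivalents.

```python
import os


def enable_langsmith_tracing(api_key, project="rag-debugging"):
    """Set the environment variables LangChain checks before sending traces.

    Hypothetical helper: it only sets environment variables and makes no network calls.
    """
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_API_KEY"] = api_key
    os.environ["LANGCHAIN_PROJECT"] = project  # assumed project name, pick your own


# Call once at startup, before any chain is invoked.
enable_langsmith_tracing(api_key="<your-api-key>")
```

With these variables set, runs of chains such as a RAG pipeline should appear under the chosen project in the LangSmith UI.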

#### RAG

- [Learn RAG with Langchain](https://sakunaharinda.xyz/) ([ipynb](https://github.com/sakunaharinda/ragatouille-book/tree/main/book))
- [LangChain: A Complete Guide & Tutorial (nanonets.com)](https://nanonets.com/blog/langchain/)
- [Meta-Llama CookBook for RAG](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/RAG/hello_llama_cloud.ipynb) (ipynb)
- [LangChain and Streamlit RAG | Medium](https://medium.com/snowflake/langchain-and-streamlit-rag-c5f53af8f6ba)
    - GitHub: [https://github.com/streamlit/example-app-langchain-rag](https://github.com/streamlit/example-app-langchain-rag)

##### Retrievers in LCEL

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()


def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])


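# `retriever` is assumed to be defined elsewhere, e.g. vectorstore.as_retriever().
# The dict sends the input question to both branches before it reaches the prompt.
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

chain.invoke("What did the president say about technology?")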
```
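
To make the data flow of the chain above easier to follow, here is a plain-Python sketch of the same steps without LangChain: the input question is sent to a retriever and also passed through unchanged, and both results fill the prompt template. `Doc`, `CORPUS`, `toy_retriever`, and `build_prompt` are hypothetical stand-ins, and the keyword match replaces a real vector search. The model call and output parsing are left out.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    page_content: str


# Hypothetical mini corpus standing in for a vector store.
CORPUS = [
    Doc("The president praised investment in technology."),
    Doc("The budget debate continued late into the night."),
    Doc("New technology grants were announced for schools."),
]

TEMPLATE = """Answer the question based only on the following context:

{context}

Question: {question}
"""


def keywords(text):
    # Lowercased words longer than three characters, with punctuation stripped.
    return {w.strip("?.,").lower() for w in text.split() if len(w.strip("?.,")) > 3}


def toy_retriever(question):
    # Stand-in for retriever.invoke(question): keep docs sharing a keyword.
    q = keywords(question)
    return [d for d in CORPUS if q & keywords(d.page_content)]


def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)


def build_prompt(question):
    # Mirrors {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt
    inputs = {"context": format_docs(toy_retriever(question)), "question": question}
    return TEMPLATE.format(**inputs)


prompt_text = build_prompt("What did the president say about technology?")
print(prompt_text)
```

In the real chain, the resulting prompt is then piped into the chat model and `StrOutputParser` reduces the reply to a plain string.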

#### ChatPromptTemplate

```python
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

few_shot_examples = [
{"input":"Could you please clarify the terms outlined in section 3.2 of the contract?",
"output":"Certainly, I will provide clarification on the terms in section 3.2."},
{"input":"We are interested in extending the payment deadline to 30 days instead of the current 15 days. Additionally, we would like to add a clause regarding late payment penalties.",
"output":"Our request is to extend the payment deadline to 30 days and include a clause on late payment penalties."},
{"input":"""The current indemnification clause seems too broad. We would like to narrow it down to cover only direct damages and exclude consequential damages.
Additionally, we propose including a dispute resolution clause specifying arbitration as the preferred method of resolving disputes.""",
"output":"""We suggest revising the indemnification clause to limit it to covering direct damages and excluding consequential damages.
Furthermore, we recommend adding a dispute resolution clause that specifies arbitration as the preferred method of resolving disputes."""},
{"input":"I believe the proposed changes are acceptable.",
"output":"Thank you for your feedback. I will proceed with implementing the proposed changes."}
]

few_shot_template = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}")
    ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=few_shot_template,
    examples=few_shot_examples,
)

print(few_shot_prompt.format())
```
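
For intuition, the few-shot template above expands each example into a human/AI message pair, in order. The plain-Python sketch below mimics that expansion without LangChain. `expand_few_shot` and the shortened examples are hypothetical, and the real class produces message objects rather than tuples.

```python
def expand_few_shot(examples, example_template):
    """Render every example through each (role, template) pair, preserving order."""
    messages = []
    for example in examples:
        for role, template in example_template:
            messages.append((role, template.format(**example)))
    return messages


# Same shape as the ("human", "{input}"), ("ai", "{output}") template above.
example_template = [("human", "{input}"), ("ai", "{output}")]
examples = [
    {"input": "Could you please clarify section 3.2?",
     "output": "Certainly, I will clarify section 3.2."},
    {"input": "I believe the proposed changes are acceptable.",
     "output": "Thank you. I will implement the proposed changes."},
]

messages = expand_few_shot(examples, example_template)
for role, text in messages:
    print(f"{role}: {text}")
```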

```python
from langchain_core.prompts import ChatPromptTemplate

custom_prompt = ChatPromptTemplate.from_template("""
You are an information extraction assistant.
Read the text below and identify important entities.

**Extraction rules:**
- Always extract the **Report Id** (this is the central node).
- Extract **people**, **institutions**, **places**, **dates**, **monetary amounts**, and **vehicle registration numbers** (e.g., MH12AB1234, PK-02-4567, KA05MG2020).
- Do not skip any person's name; extract every person mentioned in the document, even if they seem minor or their role is unclear.
- Treat all types of vehicles (e.g., cars, bikes) as the same kind of entity called "Vehicle".

**Output format:**
1. List all nodes (unique entities).
2. Identify the central node (Report Id).
3. Create relationships of the form:
   (Report Id)-[HAS_ENTITY]->(Entity)
4. Do not create any other types of relationships.

Text:
{input}

Return only structured data like:
Nodes:
- Report SYN-REP-2024
- Honda bike ABCD1234
- XYZ College, Chennai
- ...
""")
```
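
The prompt above asks the model to return a `Nodes:` list whose central node is the Report Id. A small parser can turn that text into `(Report Id)-[HAS_ENTITY]->(Entity)` triples. This is a minimal sketch, assuming the model follows the bullet format exactly and that the report node is the line starting with `Report`. `parse_nodes` and `build_relationships` are hypothetical helpers, not part of LangChain.

```python
def parse_nodes(llm_output):
    """Collect the '- item' lines that follow the 'Nodes:' header."""
    nodes, in_nodes = [], False
    for line in llm_output.splitlines():
        line = line.strip()
        if line == "Nodes:":
            in_nodes = True
        elif in_nodes and line.startswith("- "):
            node = line[2:].strip()
            # Skip placeholders and duplicates so every node is unique.
            if node and node != "..." and node not in nodes:
                nodes.append(node)
        elif in_nodes and line:
            break  # a non-bullet, non-empty line ends the node list
    return nodes


def build_relationships(nodes):
    """Link the report node to every other node with HAS_ENTITY."""
    report = next((n for n in nodes if n.startswith("Report ")), None)
    if report is None:
        return []
    return [(report, "HAS_ENTITY", n) for n in nodes if n != report]


# Sample output reusing the illustrative nodes from the prompt itself.
sample_output = """Nodes:
- Report SYN-REP-2024
- Honda bike ABCD1234
- XYZ College, Chennai
"""

nodes = parse_nodes(sample_output)
relationships = build_relationships(nodes)
for source, rel, target in relationships:
    print(f"({source})-[{rel}]->({target})")
```

Real model output may drift from this format, so the parser should be treated as a starting point rather than a robust extractor.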

#### Input Data Loader

##### Web

```python
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Parse only the post title, header, and body; skip the rest of the page.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()
```

##### Text file

```python
from langchain_community.document_loaders import DirectoryLoader
loader = DirectoryLoader("../", glob="**/*.md")
docs = loader.load()
print(len(docs))
print(docs[0].page_content[:100])
```

```python
import os

from langchain_community.document_loaders import TextLoader

dataset_folder_path = "/path/to/dataset/"
documents = []
# Load every file in the folder as plain text.
for file in os.listdir(dataset_folder_path):
    loader = TextLoader(os.path.join(dataset_folder_path, file))
    documents.extend(loader.load())

print(documents[:3])
```

##### Markdown file

```python
# Requires: %pip install "unstructured[md]"
from langchain_community.document_loaders import UnstructuredMarkdownLoader
markdown_path = "../../../README.md"
loader = UnstructuredMarkdownLoader(markdown_path)

data = loader.load()
assert len(data) == 1
readme_content = data[0].page_content
print(readme_content[:3])
```

##### PDF + Text file

```python
import os

from langchain_community.document_loaders import PyPDFLoader, TextLoader

# Placeholder: fill with the file names to process (relative to the working directory).
SAMPLEDATA = []

documents = []
for filename in SAMPLEDATA:
    path = os.path.join(os.getcwd(), filename)

    if filename.endswith(".pdf"):
        loader = PyPDFLoader(path)
        new_docs = loader.load_and_split()
        print(f"Processed pdf file: {filename}")
    elif filename.endswith(".txt"):
        loader = TextLoader(path)
        new_docs = loader.load_and_split()
        print(f"Processed txt file: {filename}")
    else:
        print(f"Unsupported file type: {filename}")
        continue  # nothing was loaded for this file

    if len(new_docs) > 0:
        documents.extend(new_docs)

# Reset the queue once everything has been processed.
SAMPLEDATA = []

print("\nProcessing done.")
```

#### OCR

- [A Simple Guide to OCR with Vision LLMs, LangChain, and Ollama | by Andreas Klos | Medium](https://medium.com/@a-klos/a-simple-guide-to-ocr-with-vision-llms-langchain-and-ollama-2dc5c15660d4)

#### Common helper functions

Formatted output

```python
# Helper function for printing docs
def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )
```
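
As a quick check of the helper above, the sketch below reimplements it so the block runs on its own and feeds it two stand-in documents. `SimpleNamespace` is used only as a hypothetical substitute for LangChain's `Document`; any object with a `page_content` attribute works. The variant `format_docs_for_display` (a hypothetical name) returns the string instead of printing it, which makes it easier to test.

```python
from types import SimpleNamespace


def format_docs_for_display(docs):
    """Same layout as pretty_print_docs, but returns the text instead of printing."""
    separator = f"\n{'-' * 100}\n"
    return separator.join(
        f"Document {i + 1}:\n\n" + d.page_content for i, d in enumerate(docs)
    )


# Stand-in documents; a retriever would normally supply these.
docs = [
    SimpleNamespace(page_content="First retrieved chunk."),
    SimpleNamespace(page_content="Second retrieved chunk."),
]
text = format_docs_for_display(docs)
print(text)
```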