Gen AI

Generative Artificial Intelligence

LLM Models

Chinese LLMs
Code LLMs
Benchmark
Function Calling LLMs
Commercial proprietary models

Voice

Gen Audio
Instant voice cloning
Text to Speech (TTS)

RAG

Retrieval-Augmented Generation (RAG)

RAG mainly addresses two limitations of large language models (LLMs) in practice: hallucination and the knowledge cut-off. RAG combines information retrieval with generation: before text is generated, relevant material is retrieved from a data store and placed into the context, so the LLM can produce results grounded in correct, up-to-date information.
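
A minimal retrieve-then-generate sketch with LangChain (assumes OPENAI_API_KEY is set and the langchain-openai / faiss-cpu packages are installed; the sample texts and question are made up for illustration):

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# Index a few documents (in practice, load and chunk your own data)
texts = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Support hours are 9am to 6pm, Monday to Friday.",
]
vectorstore = FAISS.from_texts(texts, OpenAIEmbeddings())

# Retrieve relevant chunks and put them into the prompt context
question = "How long do customers have to request a refund?"
docs = vectorstore.similarity_search(question, k=2)
context = "\n".join(d.page_content for d in docs)

llm = ChatOpenAI(model="gpt-4o-mini")
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```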

Advantages of RAG:

Tutorials

Advanced RAG
Embedding/Rerank Models
Vector Databases

RAG Projects

Embedchain

Embedchain streamlines the creation of personalized LLM applications, offering a seamless process for managing various types of unstructured data.
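
A quick sketch of the typical Embedchain flow (assumes OPENAI_API_KEY is set; the URL and question are placeholders):

```python
from embedchain import App

app = App()
# Ingest an unstructured source (web page, PDF, YouTube video, ...)
app.add("https://en.wikipedia.org/wiki/Retrieval-augmented_generation")
# Ask a question grounded in the ingested data
print(app.query("What problem does RAG try to solve?"))
```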

LangChain

LlamaIndex

GraphRAG

Microsoft has open-sourced a graph-based retrieval and reasoning augmentation solution. By taking knowledge-graph retrieval and reasoning into account throughout the pipeline, from pre-retrieval and post-retrieval to prompt compression, GraphRAG provides a more precise and relevant approach to answer generation.

neo4j

Verba

Verba is a fully-customizable personal assistant for querying and interacting with your data, either locally or deployed via cloud. Resolve questions around your documents, cross-reference multiple data points or gain insights from existing knowledge bases. Verba combines state-of-the-art RAG techniques with Weaviate's context-aware database. Choose between different RAG frameworks, data types, chunking & retrieving techniques, and LLM providers based on your individual use-case.

PrivateGPT

LLMWare

The Ultimate Toolkit for Enterprise RAG Pipelines with Small, Specialized Models.

talkd/dialog

Talkd.ai—Optimizing LLMs with easy RAG deployment and management.

RAG Evaluation

Generation evaluation metrics

Retrieval evaluation metrics

Ragas
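
A minimal Ragas evaluation sketch (assumes the ragas 0.1-style evaluate API, the Hugging Face datasets package, and an OpenAI key for the judge model; the sample rows are made up):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

data = {
    "question": ["How long do customers have to request a refund?"],
    "answer": ["Customers can request a refund within 30 days of purchase."],
    "contexts": [["Our return policy allows refunds within 30 days of purchase."]],
    "ground_truth": ["Refunds are available within 30 days."],
}

result = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)  # per-metric scores covering both generation and retrieval quality
```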

Fine-Tune

Model fine-tuning workflow

  1. Prepare the dataset (training data)
  2. Prepare the base model
  3. Import the dataset
  4. Start the fine-tuning job
  5. Evaluate the new model's loss curve
  6. Run actual inference with the new model

Prepare the dataset

Before you start fine-tuning a model, you must first build the dataset used to tune it. For best performance, the examples in the dataset should be high quality, diverse, and representative of real inputs and outputs.

Format

The examples in the dataset should match the production traffic you expect. If your dataset uses particular formatting, keywords, instructions, or information, the production data should be formatted the same way and contain the same instructions. For example, if the dataset examples contain "question:" and "context:", production traffic should also be formatted with "question:" and "context:", in the same order as in the dataset examples. If you leave out that structure, the model will not recognize the pattern, even when the exact question appears in the dataset.

Adding a prompt or preamble to every example in the dataset can also help improve the performance of the tuned model. Note that if a prompt or preamble is included in the dataset, it should also be included when prompting the tuned model at inference time. A sketch of what such examples might look like follows.
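
A sketch of such training examples; the field names (input_text / output_text) and the "question:" / "context:" structure are illustrative only and should be matched to your tuning platform's schema:

```python
examples = [
    {
        "input_text": "question: What is the boiling point of water? context: science QA for students",
        "output_text": "Water boils at 100 degrees Celsius at sea level.",
    },
    {
        "input_text": "question: Who wrote Hamlet? context: literature QA for students",
        "output_text": "Hamlet was written by William Shakespeare.",
    },
]
# Production prompts should then follow the same "question: ... context: ..." structure.
```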

Tools & Platform

Unsloth

Unsloth - Easily finetune & train LLMs

A Python library dedicated to model fine-tuning; it uses on-premises GPU resources to fine-tune a variety of open-source models.

Atlas

Atlas by NOMIC - a quality-inspection service for (unstructured) datasets

AnythingLLM

A platform that combines Chat, Fine-Tune, and Multi-Model capabilities

LLaMA-Factory
outlines

Generates structured text output. Useful for preprocessing datasets before fine-tuning a model.

Models

Gemini-Pro

To fine-tune the Gemini-Pro model, there are three different ways to call the Gemini API for tuning jobs: Google AI Studio, the Python SDK, and the REST API (curl).

Mistral

The official fine-tuning SDK and API released by Mistral AI.

AI Applications

Merlinn - open-source AI on-call developer
aidocx

Uses AI to automatically generate technical books (*.epub) on a specific body of knowledge

ASR - Automatic Speech Recognition
Translator
WrenAI - text-to-SQL

WrenAI is a text-to-SQL solution for data teams to get results and insights faster by asking business questions without writing SQL.

Chatbox

Chatbox supports many of the world's most advanced AI model services and runs on Windows, Mac, and Linux. It uses AI to boost productivity and is highly regarded by professionals worldwide.

QAnything

An open-source, enterprise-grade, local knowledge-base question-answering system and applications

GPT Academic

Provides a practical interaction interface for LLMs such as GPT and GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel queries against multiple LLMs; and local models such as chatglm3. Integrations include Tongyi Qianwen, deepseekcoder, iFlytek Spark, Wenxin Yiyan (ERNIE Bot), llama2, rwkv, claude2, moss, and more.

AI Dev

OpenUI

Uses AI to automatically generate web page source code with a live preview

Data Analysis (Chat with CSV)

Chat with Dataset

PandasAI

PandasAI is a Python library that makes it easy to ask questions to your data in natural language. It helps you to explore, clean, and analyze your data using generative AI.
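
A short sketch assuming the pandasai 2.x SmartDataframe API (the column data is made up; an OpenAI API token is required for the LLM):

```python
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

df = pd.DataFrame({
    "country": ["Taiwan", "Japan", "Korea"],
    "sales": [1200, 3400, 2100],
})

# Wrap the DataFrame so it can be queried in natural language
sdf = SmartDataframe(df, config={"llm": OpenAI(api_token="sk-...")})
print(sdf.chat("Which country has the highest sales?"))
```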

Web Scraper

- Crawlee

A web scraping and browser automation library. 

- ScrapeGraphAI

ScrapeGraphAI is an open-source web scraping Python library designed to usher in a new era of scraping tools.

- Crew AI

Crew AI is a collaborative working system designed to enable various artificial intelligence agents to work together as a team, efficiently accomplishing complex tasks. Each agent has a specific role, resembling a team composed of researchers, writers, and planners.

 

OpenAI API

Web UI Development for LLM

- Gradio

Gradio is the fastest way to demo your machine learning model with a friendly web interface so that anyone can use it, anywhere!
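
A minimal Gradio demo; replace the greet function with a call into your own model:

```python
import gradio as gr

def greet(name: str) -> str:
    # Swap this for your model's inference call
    return f"Hello, {name}!"

gr.Interface(fn=greet, inputs="text", outputs="text").launch()
```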

- Streamlit

Streamlit is the UI powering the LLM movement

AI Memory

VS Code

Instructor

Structured outputs powered by llms. Designed for simplicity, transparency, and control.
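
A sketch of the typical Instructor pattern, assuming the instructor 1.x from_openai wrapper and an OpenAI key; the Pydantic model and model name are illustrative:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,  # Instructor validates the output against this schema
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
print(user.name, user.age)
```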

Phidata

Phidata adds memory, knowledge and tools to LLMs.

PDF Extractor

- gptpdf

Uses the OpenAI API to extract PDF content and output it as Markdown.

- omniparse

PDF to Markdown

- PDF-Extract-Kit

URLs

- Marker

Marker converts PDF to markdown quickly and accurately.

URLs

- gptpdf

Using VLLM (like GPT-4o) to parse PDF into markdown.

Gemini API

Learning AI

Common AI terminology

Gen AI (Generative AI)

Artificial intelligence (AI) imitates human behaviour by using machine learning to interact with its environment and carry out tasks, without explicit instructions about what to output.

Generative AI is a branch of artificial intelligence that creates new content based on natural-language input. Generative AI is usually built into software applications and uses language models trained on large amounts of text data to produce human-like natural-language responses, and even original images. A popular example of this kind of application is ChatGPT, a chatbot created by OpenAI, an AI research company that works closely with Microsoft.

LLM (Large Language Model)

Typical natural language processing (NLP) tasks supported by language models include:

Others

Introduction

Medium Articles

Course/HandBook

Google AI Courses for Free
Microsoft

NCHC (National Center for High-performance Computing) tutorials

LLM Tokenizer

PyImageSearch tutorials (English)

AI websites


RedHat AI

Red Hat® Enterprise Linux® AI is a foundation model platform to seamlessly develop, test, and run Granite family large language models (LLMs) for enterprise applications.

Red Hat Enterprise Linux AI brings together:

URLs:

InstructLab

Command-line interface. Use this to chat with the model or train the model (training consumes the taxonomy data)

What are the components of the InstructLab project?

How is InstructLab different from retrieval-augmented generation (RAG)?

RAG is a cost-efficient method for supplementing an LLM with domain-specific knowledge that wasn’t part of its pretraining. RAG makes it possible for a chatbot to accurately answer questions related to a specific field or business without retraining the model. Knowledge documents are stored in a vector database, then retrieved in chunks and sent to the model as part of user queries. This is helpful for anyone who wants to add proprietary data to an LLM without giving up control of their information, or who needs an LLM to access timely information.

This is in contrast to the InstructLab method, which sources end-user contributions to support regular builds of an enhanced version of an LLM. InstructLab helps add knowledge and unlock new skills of an LLM.

It’s possible to "supercharge" a RAG process by using the RAG technique on an InstructLab-tuned model.

URLs:

Agent

Camel AI

CAMEL-AI.org is the 1st LLM multi-agent framework and an open-source community dedicated to finding the scaling law of agents.

Crew AI

Crew AI is a collaborative working system designed to enable various artificial intelligence agents to work together as a team, efficiently accomplishing complex tasks. Each agent has a specific role, resembling a team composed of researchers, writers, and planners.

SWE-agent

SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.

AutoGen

Enable Next-Gen Large Language Model Applications

AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

AutoGPT-Code-Ability

AutoGPT's coding ability is an open-source coding assistant powered by AI. The goal is to make software development more accessible to everyone, regardless of skill level or resources. By generating code in Python, a popular and very accessible language, AutoGPT acts as a virtual co-pilot to help users build projects like backends for existing frontends or command-line tools.

 

AI Cloud Providers

AI Platform
Data Analysis
Dev Platform
Code Review
Monitoring applications during development

Prompt Engineering

Prompt Engineering

The quality of the responses returned by a generative-AI application depends not only on the model itself but also on the kind of prompt it is given. The term "prompt engineering" describes the process of improving prompts. Both the developers who design applications and the consumers who use them can apply prompt engineering to improve the quality of generative-AI responses.

A prompt is how we tell an application what we expect it to do. Engineers can use prompts to add instructions to a program. For example, a developer can build a generative-AI application for teachers that creates multiple-choice questions about a text their students are reading. During application development, the developer can add extra rules that define what the program should do with the prompts it receives; a minimal sketch of this pattern is shown below.
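
A minimal sketch of that idea using the OpenAI chat API: the application wraps the teacher's input with its own system-prompt rules (the model name and wording are just examples):

```python
from openai import OpenAI

client = OpenAI()

system_prompt = (
    "You create multiple-choice questions for students. "
    "Always produce 4 options labelled A-D and indicate the correct answer."
)
reading_text = "Photosynthesis converts sunlight, water and CO2 into glucose and oxygen."

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},  # rules added by the developer
        {"role": "user", "content": f"Write 3 questions about: {reading_text}"},  # teacher's request
    ],
)
print(resp.choices[0].message.content)
```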

Why prompt in English rather than asking in Chinese?

  1. English accounts for more than 93% of the training corpus, while Chinese is only about 0.04%, so questions asked in English tend to get more accurate answers.
  2. ChatGPT processes at most 4096 tokens per input; anything beyond that limit is silently ignored. English uses far fewer tokens, so asking in English leaves more room for the answer (Chinese consumes roughly twice as many tokens as English). See OpenAI's token documentation and the official token calculator; a quick comparison with tiktoken is sketched after this list.
  3. Once you have the English answer, simply ask ChatGPT to translate it into Chinese, for example: Please write in Traditional Chinese language.
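
A quick way to compare token counts locally with tiktoken (cl100k_base is the encoding used by the GPT-3.5/GPT-4 family; the sample sentences are arbitrary):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

english = "Please explain the difference between supervised and unsupervised learning."
chinese = "請解釋監督式學習與非監督式學習的差異。"

print(len(enc.encode(english)), "tokens (English)")
print(len(enc.encode(chinese)), "tokens (Chinese)")  # usually noticeably more tokens
```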

Prompt Writing

Prompt frameworks

Prompt Dev
Prompt Generator
For Code LLMs

Instruct

# Prompt
You are an expert programmer that writes simple, concise code and explanations. Write a python function to generate the nth fibonacci number.

Response:

Here is a Python function that generates the nth Fibonacci number:

def fib(n):
    if n <= 1:
        return n
    else:
        return fib(n-1) + fib(n-2)

This function uses the recursive formula for the Fibonacci sequence, which is:

fib(n) = fib(n-1) + fib(n-2)

Code Review

# Prompt
Where is the bug in this code?

def fib(n):
    if n <= 0:
        return n
    else:
        return fib(n-1) + fib(n-2)

Response:

The bug in this code is that it does not handle the case where `n` is equal to 1.

Unit Tests

# Prompt
write a unit test for this function: $(cat fib.py)

Response:

Here's an example of how you could write a unit test for the `fib` function:

```
import unittest

class TestFib(unittest.TestCase):
    def test_fib(self):
        self.assertEqual(fib(0), 0)
        self.assertEqual(fib(1), 1)
        self.assertEqual(fib(2), 1)
        self.assertEqual(fib(3), 2)
        self.assertEqual(fib(4), 3)
        self.assertEqual(fib(5), 5)
        self.assertEqual(fib(6), 8)
        self.assertEqual(fib(7), 13)
        self.assertEqual(fib(8), 21)
        self.assertEqual(fib(9), 34)
if __name__ == '__main__':
    unittest.main()
```

This test case covers all the cases where the input is an integer between 0 and 9, inclusive.

Code Completion

# Prompt
# A simple python function to remove whitespace from a string:

Response:

def remove_whitespace(s):
    return ''.join(s.split())

Infill: <PRE> {prefix} <SUF>{suffix} <MID>

# Prompt
<PRE> def compute_gcd(x, y): <SUF>return result <MID>

Response:

    if x == y:
        return x

    if x > y:
        x = x - y
    else:
        y = y - x

    result = compute_gcd(x, y)
Cheat Sheet

chatgpt_prompt.jpg

Function Calling

One newer capability of LLMs (Large Language Models) is Function Calling. It allows an LLM to call external functions and integrate their results into the model's output. Here is how it works and where it is used:
How it works
When generating output, the LLM can recognize a specific function name and its arguments and pass them to an external function for execution. Once the external function has run, its result is returned to the LLM, which integrates it into the output. This process can repeat, so several function calls can be chained and combined.

Example
For example, suppose an LLM needs to produce a weather report. It can call an external function that fetches the current weather from a weather API, then integrate that result into its output to generate a complete weather report.

Applications
Function Calling has a very wide range of applications.
In short, Function Calling is a powerful LLM capability that extends what the model can do and enables more complex and varied tasks; a minimal sketch of the flow is shown below.
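
A minimal sketch of this flow with the OpenAI tool-calling API (the get_weather function and model name are placeholders; a real application would call an actual weather API):

```python
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    # Placeholder for a real weather API call
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 28})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Taipei?"}]
resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
call = resp.choices[0].message.tool_calls[0]

# Execute the requested function, return its result to the model, and get the final answer
result = get_weather(**json.loads(call.function.arguments))
messages += [
    resp.choices[0].message,
    {"role": "tool", "tool_call_id": call.id, "content": result},
]
final = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(final.choices[0].message.content)
```
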
Tutorials
Models

Python Coding

LLM Model

LMStudio
from langchain.llms import OpenAI

#set llm for langchain using model from lmstudio
llm = OpenAI(
       openai_api_base='http://localhost:1234/v1',
       openai_api_key='NULL'
       )
import streamlit as st
from openai import OpenAI

# Set up the Streamlit App
st.title("ChatGPT Clone using Llama-3 🦙")
st.caption("Chat with locally hosted Llama-3 using the LM Studio 💯")

# Point to the local server setup using LM Studio
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Initialize the chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display the chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Accept user input
if prompt := st.chat_input("What is up?"):
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": prompt})
    # Display user message in chat message container
    with st.chat_message("user"):
        st.markdown(prompt)
    # Generate response
    response = client.chat.completions.create(
        model="lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
        messages=st.session_state.messages, temperature=0.7
    )
    # Add assistant response to chat history
    st.session_state.messages.append({"role": "assistant", "content": response.choices[0].message.content})
    # Display assistant response in chat message container
    with st.chat_message("assistant"):
        st.markdown(response.choices[0].message.content)

GPT

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # api_key="...",
    # base_url="...",
    # organization="...",
    # other params...
)

Ollama

from langchain_community.llms import Ollama

llm = Ollama(model="llama2:13b")
llm.invoke("The first man on the moon was ... think step by step")

LLM Engine

Software that can load and serve LLM models

LLM Engine

Open WebUI

A Web UI Tool for Ollama

URLs
Installation

Installing Both Open WebUI and Ollama Together:

# With GPU Support
docker run -d -p 3000:8080 --gpus=all \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama
# For CPU only
docker run -d -p 3000:8080 \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama

LLM Engine

Kuwa Gen AI OS

A free, open, secure, and privacy-conscious generative-AI service system, featuring a friendly interface for large language models and a new GenAI kernel that supports generative-AI applications.

  1. 🌐 A complete solution for multilingual GenAI development and deployment, supporting Windows and Linux
  2. 💬 Friendly features such as group chat, citations, and import/export/sharing of complete prompt lists
  3. 🔄 Flexible combinations of Prompt x RAGs x Bot x models x hardware/GPUs to meet application needs
  4. 💻 Supports environments ranging from VMs, laptops, and PCs to on-premises servers and public/private clouds
  5. 🔓 Open source, allowing developers to contribute and build customized systems for their own needs
URLs
LLM Engine

AnythingLLM

The ultimate AI business intelligence tool. Any LLM, any document, full control, full privacy.

AnythingLLM is a "single-player" (單機個人)application you can install on any Mac, Windows, or Linux operating system and get local LLMs, RAG, and Agents with little to zero configuration and full privacy.

AnythingLLM also offers a self-hosted web version; see the links at the bottom of this article.

You can install AnythingLLM as a Desktop Application, Self Host it locally using Docker and Host it on cloud (aws, google cloud, railway etc..) using Docker

You want AnythingLLM Desktop if...

URLs
LLM Engine

Ollama

Run Llama 3, Phi 3, Mistral, Gemma, and other models. Customize and create your own.

Installation

ollama + open webui
mkdir ollama-data download open-webui-data

docker-compose.yml:

services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - 11434:11434
    volumes:
      - ./ollama-data:/root/.ollama
      - ./download:/download
    container_name: ollama
    pull_policy: always
    tty: true
    restart: always
    networks:
      - ollama-docker

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - ./open-webui-data:/app/backend/data
    depends_on:
      - ollama
    ports:
      - 3000:8080
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:11434'
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
    networks:
      - ollama-docker

networks:
  ollama-docker:
    external: false
ollama
mkdir ollama-data download

docker run --name ollama -d --rm \
    -v $PWD/ollama-data:/root/.ollama \
    -v $PWD/download:/download \
    -p 11434:11434 \
    ollama/ollama

Models

List Models Installed

ollama list

Load a GGUF model manually

ollama create <my-model-name> -f <modelfile>
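
A minimal example: the Modelfile below points at a hypothetical downloaded GGUF file (adjust the path and parameters to your model), then the model is created and run:

```
# Modelfile -- adjust the path to your downloaded GGUF file
FROM ./download/llama-3-8b-instruct.Q4_K_M.gguf
PARAMETER temperature 0.7
SYSTEM "You are a helpful assistant."
```

ollama create my-llama3 -f Modelfile
ollama run my-llama3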

Page Assist

Page Assist is an open-source Chrome Extension that provides a Sidebar and Web UI for your Local AI model.

LLM Engine

LM Studio

Discover, download, and run local LLMs.

With LM Studio, you can ...

🤖 - Run LLMs on your laptop, entirely offline
👾 - Use models through the in-app Chat UI or an OpenAI compatible local server
📂 - Download any compatible model files from HuggingFace 🤗 repositories
🔭 - Discover new & noteworthy LLMs in the app's home page
URLs

 

LLM Engine

Text generation web UI

A Gradio web UI for Large Language Models.

It can only run local models; external model APIs are not supported.

An AI platform that supports the following features

Tutorials

LLM Engine

OpenLLM

OpenLLM helps developers run any open-source LLMs, such as Llama 2 and Mistral, as OpenAI-compatible API endpoints, locally and in the cloud, optimized for serving throughput and production deployment.

Install

Using a Python virtual environment is recommended

pip install openllm
Start a LLM Server
openllm start microsoft/Phi-3-mini-4k-instruct --trust-remote-code

To interact with the server, you can visit the web UI at http://localhost:3000/ or send a request using curl. You can also use OpenLLM’s built-in Python client to interact with the server:

import openllm

client = openllm.HTTPClient('http://localhost:3000')
client.generate('Explain to me the difference between "further" and "farther"')
OpenAI Compatible Endpoints
import openai

client = openai.OpenAI(base_url='http://localhost:3000/v1', api_key='na')  # Here the server is running on 0.0.0.0:3000

completions = client.chat.completions.create(
  model='microsoft/Phi-3-mini-4k-instruct',  # the model started above
  messages=[{'role': 'user', 'content': 'Write me a tag line for an ice cream shop.'}],
  max_tokens=64,
)
LangChain
from langchain.llms import OpenLLMAPI

llm = OpenLLMAPI(server_url='http://44.23.123.1:3000')
llm.invoke('What is the difference between a duck and a goose? And why there are so many Goose in Canada?')

# streaming
for it in llm.stream('What is the difference between a duck and a goose? And why there are so many Goose in Canada?'):
  print(it, flush=True, end='')

# async context
await llm.ainvoke('What is the difference between a duck and a goose? And why there are so many Goose in Canada?')

# async streaming
async for it in llm.astream('What is the difference between a duck and a goose? And why there are so many Goose in Canada?'):
  print(it, flush=True, end='')

 

 

LLM Engine

NVIDIA NIM

Explore the latest community-built AI models with an API optimized and accelerated by NVIDIA, then deploy anywhere with NVIDIA NIM inference microservices.

URLs

LLM Engine

Xinference

Xorbits Inference (Xinference) is an open-source platform to streamline the operation and integration of a wide array of AI models. With Xinference, you’re empowered to run inference using any open-source LLMs, embedding models, and multimodal models either in the cloud or on your own premises, and create robust AI-driven applications.

 

LLM Engine

Benchmark

bench.py
import aiohttp
import asyncio
import time
from tqdm import tqdm

import random

questions = [
    "Why is the sky blue?", "Why do we dream?", "Why is the ocean salty?", "Why do leaves change color?",
    "Why do birds sing?", "Why do we have seasons?", "Why do stars twinkle?", "Why do we yawn?",
    "Why is the sun hot?", "Why do cats purr?", "Why do dogs bark?", "Why do fish swim?",
    "Why do we have fingerprints?", "Why do we sneeze?", "Why do we have eyebrows?", "Why do we have hair?",
    "Why do we have nails?", "Why do we have teeth?", "Why do we have bones?", "Why do we have muscles?",
    "Why do we have blood?", "Why do we have a heart?", "Why do we have lungs?", "Why do we have a brain?",
    "Why do we have skin?", "Why do we have ears?", "Why do we have eyes?", "Why do we have a nose?",
    "Why do we have a mouth?", "Why do we have a tongue?", "Why do we have a stomach?", "Why do we have intestines?",
    "Why do we have a liver?", "Why do we have kidneys?", "Why do we have a bladder?", "Why do we have a pancreas?",
    "Why do we have a spleen?", "Why do we have a gallbladder?", "Why do we have a thyroid?", "Why do we have adrenal glands?",
    "Why do we have a pituitary gland?", "Why do we have a hypothalamus?", "Why do we have a thymus?", "Why do we have lymph nodes?",
    "Why do we have a spinal cord?", "Why do we have nerves?", "Why do we have a circulatory system?", "Why do we have a respiratory system?",
    "Why do we have a digestive system?", "Why do we have an immune system?"
]

async def fetch(session, url):
    """
    Args:
        session (aiohttp.ClientSession): the session used to send the request.
        url (str): the URL to send the request to.

    Returns:
        tuple: the number of completion tokens and the request time.
    """
    start_time = time.time()

    # Pick a random question
    question = random.choice(questions) # <--- comment out one of these two lines

    # Fixed question
    # question = questions[0]             # <--- comment out one of these two lines

    # Request payload
    json_payload = {
        "model": "llama3:8b-instruct-fp16",
        "messages": [{"role": "user", "content": question}],
        "stream": False,
        "temperature": 0.7 # 0.7 keeps each response slightly different
    }
    async with session.post(url, json=json_payload) as response:
        response_json = await response.json()
        end_time = time.time()
        request_time = end_time - start_time
        completion_tokens = response_json['usage']['completion_tokens'] # number of generated tokens, taken from the response
        return completion_tokens, request_time

async def bound_fetch(sem, session, url, pbar):
    # Use the semaphore to cap concurrency so the maximum number of concurrent requests is never exceeded
    async with sem:
        result = await fetch(session, url)
        pbar.update(1)
        return result

async def run(load_url, max_concurrent_requests, total_requests):
    """
    Run the benchmark by sending multiple concurrent requests.

    Args:
        load_url (str): the URL to send requests to.
        max_concurrent_requests (int): the maximum number of concurrent requests.
        total_requests (int): the total number of requests to send.

    Returns:
        tuple: the total number of completion tokens and the list of response times.
    """
    # Create a Semaphore to limit the number of concurrent requests
    sem = asyncio.Semaphore(max_concurrent_requests)

    # Create an asynchronous HTTP session
    async with aiohttp.ClientSession() as session:
        tasks = []

        # Progress bar to visualize request progress
        with tqdm(total=total_requests) as pbar:
            # Keep creating tasks until the total number of requests is reached
            for _ in range(total_requests):
                # Create a task for each request, respecting the semaphore limit
                task = asyncio.ensure_future(bound_fetch(sem, session, load_url, pbar))
                tasks.append(task)  # add the task to the task list

            # Wait for all tasks to finish and collect their results
            results = await asyncio.gather(*tasks)

        # Total number of completion tokens across all results
        completion_tokens = sum(result[0] for result in results)

        # Extract the response time of each request
        response_times = [result[1] for result in results]

        # Return the total completion tokens and the list of response times
        return completion_tokens, response_times

if __name__ == '__main__':
    import sys

    if len(sys.argv) != 3:
        print("Usage: python bench.py <C> <N>")
        sys.exit(1)

    C = int(sys.argv[1])  # maximum number of concurrent requests
    N = int(sys.argv[2])  # total number of requests

    # Both vLLM and Ollama expose an OpenAI-compatible API, which makes this test simpler
    url = 'http://localhost:11434/v1/chat/completions'

    start_time = time.time()
    completion_tokens, response_times = asyncio.run(run(url, C, N))
    end_time = time.time()

    # Total wall-clock time
    total_time = end_time - start_time
    # Average time per request
    avg_time_per_request = sum(response_times) / len(response_times)
    # Tokens generated per second
    tokens_per_second = completion_tokens / total_time

    print(f'Performance Results:')
    print(f'  Total requests            : {N}')
    print(f'  Max concurrent requests   : {C}')
    print(f'  Total time                : {total_time:.2f} seconds')
    print(f'  Average time per request  : {avg_time_per_request:.2f} seconds')
    print(f'  Tokens per second         : {tokens_per_second:.2f}')

LLM Engine

OpenAI Proxy

Proxy Server to call 100+ LLMs in a unified interface & track spend, set budgets per virtual key/user

Features:

When adopting LLMs, an enterprise may end up using many different models, spanning commercial and open-source licences and coming from different providers. To manage and build applications on these heterogeneous models in a unified way, using the OpenAI Proxy platform is recommended, in order to achieve the following goals:
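
Because the proxy exposes an OpenAI-compatible endpoint, existing OpenAI SDK code only needs a different base_url and a virtual key; a client-side sketch (the localhost:4000 address and the key are placeholders):

```python
from openai import OpenAI

# Point the standard OpenAI client at the proxy instead of api.openai.com
client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-my-virtual-key")

resp = client.chat.completions.create(
    model="gpt-4o",  # the proxy maps this name to whichever backend provider is configured
    messages=[{"role": "user", "content": "Hello from behind the proxy"}],
)
print(resp.choices[0].message.content)
```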

Translator

Using LLMs for language translation

LiteLLM + reflection prompting + workflow
Translation Agent
RTranslator

RTranslator is an (almost) open-source, free, and offline real-time translation app for Android.

Jupyter Notebook

Installation

With pip

pip install notebook
Python Virtual Environment

With Python Venv

mkdir my-rag
cd my-rag
python -m venv .venv
source .venv/bin/activate
(my-rag)> pip install --upgrade pip
(my-rag)> pip install notebook
(my-rag)> jupyter notebook

With Conda

conda create -n my-rag python=3.10
conda activate my-rag
(my-rag)> pip install --upgrade pip
(my-rag)> pip install notebook
(my-rag)> jupyter notebook

The Jupyter UI can switch between different virtual environments (a separate ipykernel must be created for each one first)

mkdir my-rag
cd my-rag
python -m venv .venv
source .venv/bin/activate
(my-rag)> pip install --upgrade pip
(my-rag)> pip install ipykernel
(my-rag)> ipython kernel install --user --name="my-rag-kernel"
(my-rag)> jupyter notebook

Cloud Providers

LangChain

LangChain is an open-source development framework that gives developers a set of tools and interfaces for using large language models (LLMs) more easily and effectively, with a focus on context awareness and reasoning. It includes several components, such as Python and JavaScript libraries, templates for rapid deployment, LangServe for building REST APIs, and LangSmith for debugging and monitoring. LangChain simplifies development, productionization, and deployment, providing tools for interacting with language models, running retrieval strategies, and assembling complex application architectures.

LangSmith

A cloud service from LangChain for debugging and monitoring back-end processes, such as the retrieval steps of a RAG pipeline. Enabling tracing from Python is sketched below.
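
Enabling tracing is mostly a matter of environment variables; a minimal sketch (the API key and project name are placeholders):

```python
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls__..."         # LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "my-rag-project"  # traces are grouped under this project

# Any LangChain chain or LLM invoked after this point is traced to LangSmith automatically.
```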

 

RAG

Retrievers in LCEL
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()


def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

chain.invoke("What did the president say about technology?")

ChatPromptTemplate

from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

few_shot_examples = [
{"input":"Could you please clarify the terms outlined in section 3.2 of the contract?",
"output":"Certainly, I will provide clarification on the terms in section 3.2."},
{"input":"We are interested in extending the payment deadline to 30 days instead of the current 15 days. Additionally, we would like to add a clause regarding late payment penalties.",
"output":"Our request is to extend the payment deadline to 30 days and include a clause on late payment penalties."},
{"input":"""The current indemnification clause seems too broad. We would like to narrow it down to cover only direct damages and exclude consequential damages.
Additionally, we propose including a dispute resolution clause specifying arbitration as the preferred method of resolving disputes.""",
"output":"""We suggest revising the indemnification clause to limit it to covering direct damages and excluding consequential damages.
Furthermore, we recommend adding a dispute resolution clause that specifies arbitration as the preferred method of resolving disputes."""},
{"input":"I believe the proposed changes are acceptable.",
"output":"Thank you for your feedback. I will proceed with implementing the proposed changes."}
]

few_shot_template = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}")
    ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=few_shot_template,
    examples=few_shot_examples,
)

print(few_shot_prompt.format())

Loader

Web

import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

Text

from langchain_community.document_loaders import DirectoryLoader
loader = DirectoryLoader("../", glob="**/*.md")
docs = loader.load()
len(docs)
print(docs[0].page_content[:100])

Markdown

'''
%pip install "unstructured[md]"
'''
from langchain_community.document_loaders import UnstructuredMarkdownLoader
markdown_path = "../../../README.md"
loader = UnstructuredMarkdownLoader(markdown_path)

data = loader.load()
assert len(data) == 1
readme_content = data[0].page_content
print(readme_content[:3])

Common helper functions

Formatted output

# Helper function for printing docs
def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

Finance AI

OpenBB

Investment research made easy with AI.

StockBot