Gen AI

Generative Artificial Intelligence

LLM Models

Chinese LLMs
Code LLMs
Benchmark
Function Calling LLMs
Commercial proprietary models

Voice

Gen Audio
Instant voice cloning
Text to Speech (TTS)

RAG

Retrieval-Augmented Generation (RAG)

RAG mainly addresses two limitations of large language models (LLMs) in practice: hallucination and the knowledge cut-off. RAG combines information retrieval with generation: before text is generated, relevant material is retrieved from a data store and placed into the context, so the LLM can produce results grounded in correct, up-to-date information.
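
A minimal retrieve-then-generate sketch with LangChain (assumes OPENAI_API_KEY is set and the langchain-openai / faiss-cpu packages are installed; the sample texts and question are made up for illustration):

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# Index a few documents (in practice, load and chunk your own data)
texts = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Support hours are 9am to 6pm, Monday to Friday.",
]
vectorstore = FAISS.from_texts(texts, OpenAIEmbeddings())

# Retrieve relevant chunks and put them into the prompt context
question = "How long do customers have to request a refund?"
docs = vectorstore.similarity_search(question, k=2)
context = "\n".join(d.page_content for d in docs)

llm = ChatOpenAI(model="gpt-4o-mini")
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```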

Advantages of RAG:

Tutorials

Advanced RAG
Embedding/Rerank Models
Vector Databases

RAG Projects

Embedchain

Embedchain streamlines the creation of personalized LLM applications, offering a seamless process for managing various types of unstructured data.
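
A quick sketch of the typical Embedchain flow (assumes OPENAI_API_KEY is set; the URL and question are placeholders):

```python
from embedchain import App

app = App()
# Ingest an unstructured source (web page, PDF, YouTube video, ...)
app.add("https://en.wikipedia.org/wiki/Retrieval-augmented_generation")
# Ask a question grounded in the ingested data
print(app.query("What problem does RAG try to solve?"))
```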

LangChain

LlamaIndex

GraphRAG

Microsoft has open-sourced a graph-based retrieval and reasoning augmentation solution. By taking knowledge-graph retrieval and reasoning into account throughout the pipeline, from pre-retrieval and post-retrieval to prompt compression, GraphRAG provides a more precise and relevant approach to answer generation.

neo4j

Verba

Verba is a fully-customizable personal assistant for querying and interacting with your data, either locally or deployed via cloud. Resolve questions around your documents, cross-reference multiple data points or gain insights from existing knowledge bases. Verba combines state-of-the-art RAG techniques with Weaviate's context-aware database. Choose between different RAG frameworks, data types, chunking & retrieving techniques, and LLM providers based on your individual use-case.

PrivateGPT

LLMWare

The Ultimate Toolkit for Enterprise RAG Pipelines with Small, Specialized Models.

talkd/dialog

Talkd.ai—Optimizing LLMs with easy RAG deployment and management.

RAG Evaluation

Generation evaluation metrics

Retrieval evaluation metrics

Ragas
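
A minimal Ragas evaluation sketch (assumes the ragas 0.1-style evaluate API, the Hugging Face datasets package, and an OpenAI key for the judge model; the sample rows are made up):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

data = {
    "question": ["How long do customers have to request a refund?"],
    "answer": ["Customers can request a refund within 30 days of purchase."],
    "contexts": [["Our return policy allows refunds within 30 days of purchase."]],
    "ground_truth": ["Refunds are available within 30 days."],
}

result = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)  # per-metric scores covering both generation and retrieval quality
```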

Fine-Tune

Model fine-tuning workflow

  1. Prepare the dataset (training data)
  2. Prepare the base model
  3. Import the dataset
  4. Start the fine-tuning job
  5. Evaluate the new model's loss curve
  6. Run actual inference with the new model

Prepare the dataset

Before you start fine-tuning a model, you must first build the dataset used to tune it. For best performance, the examples in the dataset should be high quality, diverse, and representative of real inputs and outputs.

Format

The examples in the dataset should match the production traffic you expect. If your dataset uses particular formatting, keywords, instructions, or information, the production data should be formatted the same way and contain the same instructions. For example, if the dataset examples contain "question:" and "context:", production traffic should also be formatted with "question:" and "context:", in the same order as in the dataset examples. If you leave out that structure, the model will not recognize the pattern, even when the exact question appears in the dataset.

Adding a prompt or preamble to every example in the dataset can also help improve the performance of the tuned model. Note that if a prompt or preamble is included in the dataset, it should also be included when prompting the tuned model at inference time. A sketch of what such examples might look like follows.
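
A sketch of such training examples; the field names (input_text / output_text) and the "question:" / "context:" structure are illustrative only and should be matched to your tuning platform's schema:

```python
examples = [
    {
        "input_text": "question: What is the boiling point of water? context: science QA for students",
        "output_text": "Water boils at 100 degrees Celsius at sea level.",
    },
    {
        "input_text": "question: Who wrote Hamlet? context: literature QA for students",
        "output_text": "Hamlet was written by William Shakespeare.",
    },
]
# Production prompts should then follow the same "question: ... context: ..." structure.
```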

Tools & Platform

Unsloth

Unsloth - Easily finetune & train LLMs

A Python library dedicated to model fine-tuning; it uses on-premises GPU resources to fine-tune a variety of open-source models.

Atlas

Atlas by NOMIC - a quality-inspection service for (unstructured) datasets

AnythingLLM

A platform that combines Chat, Fine-Tune, and Multi-Model capabilities

LLaMA-Factory
outlines

Generates structured text output. Useful for preprocessing datasets before fine-tuning a model.

Models

Gemini-Pro

To fine-tune the Gemini-Pro model, there are three different ways to call the Gemini API for tuning jobs: Google AI Studio, the Python SDK, and the REST API (curl).

Mistral

The official fine-tuning SDK and API released by Mistral AI.

AI Applications

Merlinn - open-source AI on-call developer
aidocx

Uses AI to automatically generate technical books (*.epub) on a specific body of knowledge

ASR - Automatic Speech Recognition
Translator
WrenAI - text-to-SQL

WrenAI is a text-to-SQL solution for data teams to get results and insights faster by asking business questions without writing SQL.

Chatbox

Chatbox supports many of the world's most advanced AI model services and runs on Windows, Mac, and Linux. It uses AI to boost productivity and is highly regarded by professionals worldwide.

QAnything

An open-source, enterprise-grade, local knowledge-base question-answering system and applications

GPT Academic

Provides a practical interaction interface for LLMs such as GPT and GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel queries against multiple LLMs; and local models such as chatglm3. Integrations include Tongyi Qianwen, deepseekcoder, iFlytek Spark, Wenxin Yiyan (ERNIE Bot), llama2, rwkv, claude2, moss, and more.

AI Dev

OpenUI

Uses AI to automatically generate web page source code with a live preview

Data Analysis (Chat with CSV)

Chat with Dataset

PandasAI

PandasAI is a Python library that makes it easy to ask questions to your data in natural language. It helps you to explore, clean, and analyze your data using generative AI.
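
A short sketch assuming the pandasai 2.x SmartDataframe API (the column data is made up; an OpenAI API token is required for the LLM):

```python
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

df = pd.DataFrame({
    "country": ["Taiwan", "Japan", "Korea"],
    "sales": [1200, 3400, 2100],
})

# Wrap the DataFrame so it can be queried in natural language
sdf = SmartDataframe(df, config={"llm": OpenAI(api_token="sk-...")})
print(sdf.chat("Which country has the highest sales?"))
```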

Web Scraper

- Crawlee

A web scraping and browser automation library. 

- ScrapeGraphAI

ScrapeGraphAI is an open-source web scraping Python library designed to usher in a new era of scraping tools.

- Crew AI

Crew AI is a collaborative working system designed to enable various artificial intelligence agents to work together as a team, efficiently accomplishing complex tasks. Each agent has a specific role, resembling a team composed of researchers, writers, and planners.

 

OpenAI API

Web UI Development for LLM

- Gradio

Gradio is the fastest way to demo your machine learning model with a friendly web interface so that anyone can use it, anywhere!
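
A minimal Gradio demo; replace the greet function with a call into your own model:

```python
import gradio as gr

def greet(name: str) -> str:
    # Swap this for your model's inference call
    return f"Hello, {name}!"

gr.Interface(fn=greet, inputs="text", outputs="text").launch()
```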

- Streamlit

Streamlit is the UI powering the LLM movement

AI Memory

VS Code

Instructor

Structured outputs powered by llms. Designed for simplicity, transparency, and control.
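
A sketch of the typical Instructor pattern, assuming the instructor 1.x from_openai wrapper and an OpenAI key; the Pydantic model and model name are illustrative:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,  # Instructor validates the output against this schema
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
print(user.name, user.age)
```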

Phidata

Phidata adds memory, knowledge and tools to LLMs.

PDF Extractor

- gptpdf

Uses the OpenAI API to extract PDF content and output it as Markdown.

- omniparse

PDF to Markdown

- PDF-Extract-Kit

URLs

- Marker

Marker converts PDF to markdown quickly and accurately.

URLs

- gptpdf

Using VLLM (like GPT-4o) to parse PDF into markdown.

Gemini API

Learning AI

Common AI terminology

Gen AI (Generative AI)

Artificial intelligence (AI) imitates human behaviour by using machine learning to interact with its environment and carry out tasks, without explicit instructions about what to output.

Generative AI is a branch of artificial intelligence that creates new content based on natural-language input. Generative AI is usually built into software applications and uses language models trained on large amounts of text data to produce human-like natural-language responses, and even original images. A popular example of this kind of application is ChatGPT, a chatbot created by OpenAI, an AI research company that works closely with Microsoft.

LLM (Large Language Model)

Typical natural language processing (NLP) tasks supported by language models include:

Others

Introduction

Medium Articles

Course/HandBook

Google AI Courses for Free
Microsoft

NCHC (National Center for High-performance Computing) tutorials

LLM Tokenizer

PyImageSearch tutorials (English)

AI websites


RedHat AI

Red Hat® Enterprise Linux® AI is a foundation model platform to seamlessly develop, test, and run Granite family large language models (LLMs) for enterprise applications.

Red Hat Enterprise Linux AI brings together:

URLs:

InstructLab

Command-line interface. Use this to chat with the model or train the model (training consumes the taxonomy data)

What are the components of the InstructLab project?

How is InstructLab different from retrieval-augmented generation (RAG)?

RAG is a cost-efficient method for supplementing an LLM with domain-specific knowledge that wasn’t part of its pretraining. RAG makes it possible for a chatbot to accurately answer questions related to a specific field or business without retraining the model. Knowledge documents are stored in a vector database, then retrieved in chunks and sent to the model as part of user queries. This is helpful for anyone who wants to add proprietary data to an LLM without giving up control of their information, or who needs an LLM to access timely information.

This is in contrast to the InstructLab method, which sources end-user contributions to support regular builds of an enhanced version of an LLM. InstructLab helps add knowledge and unlock new skills of an LLM.

It’s possible to "supercharge" a RAG process by using the RAG technique on an InstructLab-tuned model.

URLs:

Agent

Camel AI

CAMEL-AI.org is the 1st LLM multi-agent framework and an open-source community dedicated to finding the scaling law of agents.

Crew AI

Crew AI is a collaborative working system designed to enable various artificial intelligence agents to work together as a team, efficiently accomplishing complex tasks. Each agent has a specific role, resembling a team composed of researchers, writers, and planners.

SWE-agent

SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.

AutoGen

Enable Next-Gen Large Language Model Applications

AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

AutoGPT-Code-Ability

AutoGPT's coding ability is an open-source coding assistant powered by AI. The goal is to make software development more accessible to everyone, regardless of skill level or resources. By generating code in Python, a popular and very accessible language, AutoGPT acts as a virtual co-pilot to help users build projects like backends for existing frontends or command-line tools.

 

AI Cloud Providers

AI Platform
Data Analysis
Dev Platform
Code Review
Monitoring applications during development

Prompt Engineering

Prompt Engineering

The quality of the responses returned by a generative-AI application depends not only on the model itself but also on the kind of prompt it is given. The term "prompt engineering" describes the process of improving prompts. Both the developers who design applications and the consumers who use them can apply prompt engineering to improve the quality of generative-AI responses.

A prompt is how we tell an application what we expect it to do. Engineers can use prompts to add instructions to a program. For example, a developer can build a generative-AI application for teachers that creates multiple-choice questions about a text their students are reading. During application development, the developer can add extra rules that define what the program should do with the prompts it receives; a minimal sketch of this pattern is shown below.
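
A minimal sketch of that idea using the OpenAI chat API: the application wraps the teacher's input with its own system-prompt rules (the model name and wording are just examples):

```python
from openai import OpenAI

client = OpenAI()

system_prompt = (
    "You create multiple-choice questions for students. "
    "Always produce 4 options labelled A-D and indicate the correct answer."
)
reading_text = "Photosynthesis converts sunlight, water and CO2 into glucose and oxygen."

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},  # rules added by the developer
        {"role": "user", "content": f"Write 3 questions about: {reading_text}"},  # teacher's request
    ],
)
print(resp.choices[0].message.content)
```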

Why prompt in English rather than asking in Chinese?

  1. English accounts for more than 93% of the training corpus, while Chinese is only about 0.04%, so questions asked in English tend to get more accurate answers.
  2. ChatGPT processes at most 4096 tokens per input; anything beyond that limit is silently ignored. English uses far fewer tokens, so asking in English leaves more room for the answer (Chinese consumes roughly twice as many tokens as English). See OpenAI's token documentation and the official token calculator; a quick comparison with tiktoken is sketched after this list.
  3. Once you have the English answer, simply ask ChatGPT to translate it into Chinese, for example: Please write in Traditional Chinese language.
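
A quick way to compare token counts locally with tiktoken (cl100k_base is the encoding used by the GPT-3.5/GPT-4 family; the sample sentences are arbitrary):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

english = "Please explain the difference between supervised and unsupervised learning."
chinese = "請解釋監督式學習與非監督式學習的差異。"

print(len(enc.encode(english)), "tokens (English)")
print(len(enc.encode(chinese)), "tokens (Chinese)")  # usually noticeably more tokens
```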

Prompt Writing

Prompt frameworks

Prompt Dev
Prompt Generator
For Code LLMs

Instruct

# Prompt
You are an expert programmer that writes simple, concise code and explanations. Write a python function to generate the nth fibonacci number.

Response:

Here is a Python function that generates the nth Fibonacci number:

def fib(n):
    if n <= 1:
        return n
    else:
        return fib(n-1) + fib(n-2)

This function uses the recursive formula for the Fibonacci sequence, which is:

fib(n) = fib(n-1) + fib(n-2)

Code Review

# Prompt
Where is the bug in this code?

def fib(n):
    if n <= 0:
        return n
    else:
        return fib(n-1) + fib(n-2)

Response:

The bug in this code is that it does not handle the case where `n` is equal to 1.

Unit Tests

# Prompt
write a unit test for this function: $(cat fib.py)

Response:

Here's an example of how you could write a unit test for the `fib` function:

```
import unittest

class TestFib(unittest.TestCase):
    def test_fib(self):
        self.assertEqual(fib(0), 0)
        self.assertEqual(fib(1), 1)
        self.assertEqual(fib(2), 1)
        self.assertEqual(fib(3), 2)
        self.assertEqual(fib(4), 3)
        self.assertEqual(fib(5), 5)
        self.assertEqual(fib(6), 8)
        self.assertEqual(fib(7), 13)
        self.assertEqual(fib(8), 21)
        self.assertEqual(fib(9), 34)
if __name__ == '__main__':
    unittest.main()
```

This test case covers all the cases where the input is an integer between 0 and 9, inclusive.

Code Completion

# Prompt
# A simple python function to remove whitespace from a string:

Response:

def remove_whitespace(s):
    return ''.join(s.split())

Infill: <PRE> {prefix} <SUF>{suffix} <MID>

# Prompt
<PRE> def compute_gcd(x, y): <SUF>return result <MID>

Response:

    if x == y:
        return x

    if x > y:
        x = x - y
    else:
        y = y - x

    result = compute_gcd(x, y)
Cheat Sheet

chatgpt_prompt.jpg

Function Calling

One newer capability of LLMs (Large Language Models) is Function Calling. It allows an LLM to call external functions and integrate their results into the model's output. Here is how it works and where it is used:
How it works
When generating output, the LLM can recognize a specific function name and its arguments and pass them to an external function for execution. Once the external function has run, its result is returned to the LLM, which integrates it into the output. This process can repeat, so several function calls can be chained and combined.

Example
For example, suppose an LLM needs to produce a weather report. It can call an external function that fetches the current weather from a weather API, then integrate that result into its output to generate a complete weather report.

Applications
Function Calling has a very wide range of applications.
In short, Function Calling is a powerful LLM capability that extends what the model can do and enables more complex and varied tasks; a minimal sketch of the flow is shown below.
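
A minimal sketch of this flow with the OpenAI tool-calling API (the get_weather function and model name are placeholders; a real application would call an actual weather API):

```python
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    # Placeholder for a real weather API call
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 28})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Taipei?"}]
resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
call = resp.choices[0].message.tool_calls[0]

# Execute the requested function, return its result to the model, and get the final answer
result = get_weather(**json.loads(call.function.arguments))
messages += [
    resp.choices[0].message,
    {"role": "tool", "tool_call_id": call.id, "content": result},
]
final = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(final.choices[0].message.content)
```
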
Tutorials
Models

Python Coding

LLM Model

LMStudio
from langchain.llms import OpenAI

#set llm for langchain using model from lmstudio
llm = OpenAI(
       openai_api_base='http://localhost:1234/v1',
       openai_api_key='NULL'
       )
import streamlit as st
from openai import OpenAI

# Set up the Streamlit App
st.title("ChatGPT Clone using Llama-3 🦙")
st.caption("Chat with locally hosted Llama-3 using the LM Studio 💯")

# Point to the local server setup using LM Studio
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Initialize the chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display the chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Accept user input
if prompt := st.chat_input("What is up?"):
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": prompt})
    # Display user message in chat message container
    with st.chat_message("user"):
        st.markdown(prompt)
    # Generate response
    response = client.chat.completions.create(
        model="lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
        messages=st.session_state.messages, temperature=0.7
    )
    # Add assistant response to chat history
    st.session_state.messages.append({"role": "assistant", "content": response.choices[0].message.content})
    # Display assistant response in chat message container
    with st.chat_message("assistant"):
        st.markdown(response.choices[0].message.content)

GPT

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # api_key="...",
    # base_url="...",
    # organization="...",
    # other params...
)

Ollama

from langchain_community.llms import Ollama

llm = Ollama(model="llama2:13b")
llm.invoke("The first man on the moon was ... think step by step")

LLM Engine

Software that can load and serve LLM models

LLM Engine

Open WebUI

A Web UI Tool for Ollama

URLs
Installation

Installing Both Open WebUI and Ollama Together:

# With GPU Support
docker run -d -p 3000:8080 --gpus=all \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama
# For CPU only
docker run -d -p 3000:8080 \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama

LLM Engine

Kuwa Gen AI OS

A free, open, secure, and privacy-conscious generative-AI service system, featuring a friendly interface for large language models and a new GenAI kernel that supports generative-AI applications.

  1. 🌐 A complete solution for multilingual GenAI development and deployment, supporting Windows and Linux
  2. 💬 Friendly features such as group chat, citations, and import/export/sharing of complete prompt lists
  3. 🔄 Flexible combinations of Prompt x RAGs x Bot x models x hardware/GPUs to meet application needs
  4. 💻 Supports environments ranging from VMs, laptops, and PCs to on-premises servers and public/private clouds
  5. 🔓 Open source, allowing developers to contribute and build customized systems for their own needs
URLs
LLM Engine

AnythingLLM

The ultimate AI business intelligence tool. Any LLM, any document, full control, full privacy.

AnythingLLM is a "single-player" (單機個人)application you can install on any Mac, Windows, or Linux operating system and get local LLMs, RAG, and Agents with little to zero configuration and full privacy.

AnythingLLM also offers a self-hosted web version; see the links at the bottom of this article.

You can install AnythingLLM as a Desktop Application, Self Host it locally using Docker and Host it on cloud (aws, google cloud, railway etc..) using Docker

You want AnythingLLM Desktop if...

URLs
LLM Engine

Ollama

Run Llama 3, Phi 3, Mistral, Gemma, and other models. Customize and create your own.

Installation

ollama + open webui
mkdir ollama-data download open-webui-data

docker-compose.yml:

services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - 11434:11434
    volumes:
      - ./ollama-data:/root/.ollama
      - ./download:/download
    container_name: ollama
    pull_policy: always
    tty: true
    restart: always
    networks:
      - ollama-docker

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - ./open-webui-data:/app/backend/data
    depends_on:
      - ollama
    ports:
      - 3000:8080
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:11434'
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
    networks:
      - ollama-docker

networks:
  ollama-docker:
    external: false
ollama
mkdir ollama-data download

docker run --name ollama -d --rm \
    -v $PWD/ollama-data:/root/.ollama \
    -v $PWD/download:/download \
    -p 11434:11434 \
    ollama/ollama

Models

List Models Installed

ollama list

Load a GGUF model manually

ollama create <my-model-name> -f <modelfile>
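
A minimal example: the Modelfile below points at a hypothetical downloaded GGUF file (adjust the path and parameters to your model), then the model is created and run:

```
# Modelfile -- adjust the path to your downloaded GGUF file
FROM ./download/llama-3-8b-instruct.Q4_K_M.gguf
PARAMETER temperature 0.7
SYSTEM "You are a helpful assistant."
```

ollama create my-llama3 -f Modelfile
ollama run my-llama3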

Page Assist

Page Assist is an open-source Chrome Extension that provides a Sidebar and Web UI for your Local AI model.

LLM Engine

LM Studio

Discover, download, and run local LLMs.

With LM Studio, you can ...

🤖 - Run LLMs on your laptop, entirely offline
👾 - Use models through the in-app Chat UI or an OpenAI compatible local server
📂 - Download any compatible model files from HuggingFace 🤗 repositories
🔭 - Discover new & noteworthy LLMs in the app's home page
URLs

 

LLM Engine

Text generation web UI

A Gradio web UI for Large Language Models.

It can only run local models; external model APIs are not supported.

An AI platform that supports the following features

Tutorials

LLM Engine

OpenLLM

OpenLLM helps developers run any open-source LLMs, such as Llama 2 and Mistral, as OpenAI-compatible API endpoints, locally and in the cloud, optimized for serving throughput and production deployment.

Install

Using a Python virtual environment is recommended

pip install openllm
Start a LLM Server
openllm start microsoft/Phi-3-mini-4k-instruct --trust-remote-code

To interact with the server, you can visit the web UI at http://localhost:3000/ or send a request using curl. You can also use OpenLLM’s built-in Python client to interact with the server:

import openllm

client = openllm.HTTPClient('http://localhost:3000')
client.generate('Explain to me the difference between "further" and "farther"')
OpenAI Compatible Endpoints
import openai

client = openai.OpenAI(base_url='http://localhost:3000/v1', api_key='na')  # Here the server is running on 0.0.0.0:3000

completions = client.chat.completions.create(
  model='microsoft/Phi-3-mini-4k-instruct',  # the model started above
  messages=[{'role': 'user', 'content': 'Write me a tag line for an ice cream shop.'}],
  max_tokens=64,
)
LangChain
from langchain.llms import OpenLLMAPI

llm = OpenLLMAPI(server_url='http://44.23.123.1:3000')
llm.invoke('What is the difference between a duck and a goose? And why there are so many Goose in Canada?')

# streaming
for it in llm.stream('What is the difference between a duck and a goose? And why there are so many Goose in Canada?'):
  print(it, flush=True, end='')

# async context
await llm.ainvoke('What is the difference between a duck and a goose? And why there are so many Goose in Canada?')

# async streaming
async for it in llm.astream('What is the difference between a duck and a goose? And why there are so many Goose in Canada?'):
  print(it, flush=True, end='')

 

 

LLM Engine

NVIDIA NIM

Explore the latest community-built AI models with an API optimized and accelerated by NVIDIA, then deploy anywhere with NVIDIA NIM inference microservices.

URLs

LLM Engine

Xinference

Xorbits Inference (Xinference) is an open-source platform to streamline the operation and integration of a wide array of AI models. With Xinference, you’re empowered to run inference using any open-source LLMs, embedding models, and multimodal models either in the cloud or on your own premises, and create robust AI-driven applications.

 

LLM Engine

Benchmark

bench.py
import aiohttp
import asyncio
import time
from tqdm import tqdm

import random

questions = [
    "Why is the sky blue?", "Why do we dream?", "Why is the ocean salty?", "Why do leaves change color?",
    "Why do birds sing?", "Why do we have seasons?", "Why do stars twinkle?", "Why do we yawn?",
    "Why is the sun hot?", "Why do cats purr?", "Why do dogs bark?", "Why do fish swim?",
    "Why do we have fingerprints?", "Why do we sneeze?", "Why do we have eyebrows?", "Why do we have hair?",
    "Why do we have nails?", "Why do we have teeth?", "Why do we have bones?", "Why do we have muscles?",
    "Why do we have blood?", "Why do we have a heart?", "Why do we have lungs?", "Why do we have a brain?",
    "Why do we have skin?", "Why do we have ears?", "Why do we have eyes?", "Why do we have a nose?",
    "Why do we have a mouth?", "Why do we have a tongue?", "Why do we have a stomach?", "Why do we have intestines?",
    "Why do we have a liver?", "Why do we have kidneys?", "Why do we have a bladder?", "Why do we have a pancreas?",
    "Why do we have a spleen?", "Why do we have a gallbladder?", "Why do we have a thyroid?", "Why do we have adrenal glands?",
    "Why do we have a pituitary gland?", "Why do we have a hypothalamus?", "Why do we have a thymus?", "Why do we have lymph nodes?",
    "Why do we have a spinal cord?", "Why do we have nerves?", "Why do we have a circulatory system?", "Why do we have a respiratory system?",
    "Why do we have a digestive system?", "Why do we have an immune system?"
]

async def fetch(session, url):
    """
    Args:
        session (aiohttp.ClientSession): the session used to send the request.
        url (str): the URL to send the request to.

    Returns:
        tuple: the number of completion tokens and the request time.
    """
    start_time = time.time()

    # Pick a random question
    question = random.choice(questions) # <--- comment out one of these two lines

    # Fixed question
    # question = questions[0]             # <--- comment out one of these two lines

    # Request payload
    json_payload = {
        "model": "llama3:8b-instruct-fp16",
        "messages": [{"role": "user", "content": question}],
        "stream": False,
        "temperature": 0.7 # 0.7 keeps each response slightly different
    }
    async with session.post(url, json=json_payload) as response:
        response_json = await response.json()
        end_time = time.time()
        request_time = end_time - start_time
        completion_tokens = response_json['usage']['completion_tokens'] # number of generated tokens, taken from the response
        return completion_tokens, request_time

async def bound_fetch(sem, session, url, pbar):
    # Use the semaphore to cap concurrency so the maximum number of concurrent requests is never exceeded
    async with sem:
        result = await fetch(session, url)
        pbar.update(1)
        return result

async def run(load_url, max_concurrent_requests, total_requests):
    """
    Run the benchmark by sending multiple concurrent requests.

    Args:
        load_url (str): the URL to send requests to.
        max_concurrent_requests (int): the maximum number of concurrent requests.
        total_requests (int): the total number of requests to send.

    Returns:
        tuple: the total number of completion tokens and the list of response times.
    """
    # Create a Semaphore to limit the number of concurrent requests
    sem = asyncio.Semaphore(max_concurrent_requests)

    # Create an asynchronous HTTP session
    async with aiohttp.ClientSession() as session:
        tasks = []

        # Progress bar to visualize request progress
        with tqdm(total=total_requests) as pbar:
            # Keep creating tasks until the total number of requests is reached
            for _ in range(total_requests):
                # Create a task for each request, respecting the semaphore limit
                task = asyncio.ensure_future(bound_fetch(sem, session, load_url, pbar))
                tasks.append(task)  # add the task to the task list

            # Wait for all tasks to finish and collect their results
            results = await asyncio.gather(*tasks)

        # Total number of completion tokens across all results
        completion_tokens = sum(result[0] for result in results)

        # Extract the response time of each request
        response_times = [result[1] for result in results]

        # Return the total completion tokens and the list of response times
        return completion_tokens, response_times

if __name__ == '__main__':
    import sys

    if len(sys.argv) != 3:
        print("Usage: python bench.py <C> <N>")
        sys.exit(1)

    C = int(sys.argv[1])  # maximum number of concurrent requests
    N = int(sys.argv[2])  # total number of requests

    # Both vLLM and Ollama expose an OpenAI-compatible API, which makes this test simpler
    url = 'http://localhost:11434/v1/chat/completions'

    start_time = time.time()
    completion_tokens, response_times = asyncio.run(run(url, C, N))
    end_time = time.time()

    # Total wall-clock time
    total_time = end_time - start_time
    # Average time per request
    avg_time_per_request = sum(response_times) / len(response_times)
    # Tokens generated per second
    tokens_per_second = completion_tokens / total_time

    print(f'Performance Results:')
    print(f'  Total requests            : {N}')
    print(f'  Max concurrent requests   : {C}')
    print(f'  Total time                : {total_time:.2f} seconds')
    print(f'  Average time per request  : {avg_time_per_request:.2f} seconds')
    print(f'  Tokens per second         : {tokens_per_second:.2f}')

LLM Engine

OpenAI Proxy

Proxy Server to call 100+ LLMs in a unified interface & track spend, set budgets per virtual key/user

Features:

When adopting LLMs, an enterprise may end up using many different models, spanning commercial and open-source licences and coming from different providers. To manage and build applications on these heterogeneous models in a unified way, using the OpenAI Proxy platform is recommended, in order to achieve the following goals:
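
Because the proxy exposes an OpenAI-compatible endpoint, existing OpenAI SDK code only needs a different base_url and a virtual key; a client-side sketch (the localhost:4000 address and the key are placeholders):

```python
from openai import OpenAI

# Point the standard OpenAI client at the proxy instead of api.openai.com
client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-my-virtual-key")

resp = client.chat.completions.create(
    model="gpt-4o",  # the proxy maps this name to whichever backend provider is configured
    messages=[{"role": "user", "content": "Hello from behind the proxy"}],
)
print(resp.choices[0].message.content)
```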

Translator

Using LLMs for language translation

LiteLLM + reflection prompting + workflow
Translation Agent
RTranslator

RTranslator is an (almost) open-source, free, and offline real-time translation app for Android.

Jupyter Notebook

Installation

With pip

pip install notebook
Python Virtual Environment

With Python Venv

mkdir my-rag
cd my-rag
python -m venv .venv
source .venv/bin/activate
(my-rag)> pip install --upgrade pip
(my-rag)> pip install notebook
(my-rag)> jupyter notebook

With Conda

conda create -n my-rag python=3.10
conda activate my-rag
(my-rag)> pip install --upgrade pip
(my-rag)> pip install notebook
(my-rag)> jupyter notebook

The Jupyter UI can switch between different virtual environments (a separate ipykernel must be created for each one first)

mkdir my-rag
cd my-rag
python -m venv .venv
source .venv/bin/activate
(my-rag)> pip install --upgrade pip
(my-rag)> pip install ipykernel
(my-rag)> ipython kernel install --user --name="my-rag-kernel"
(my-rag)> jupyter notebook

Cloud Providers

LangChain

LangChain is an open-source development framework that gives developers a set of tools and interfaces for using large language models (LLMs) more easily and effectively, with a focus on context awareness and reasoning. It includes several components, such as Python and JavaScript libraries, templates for rapid deployment, LangServe for building REST APIs, and LangSmith for debugging and monitoring. LangChain simplifies development, productionization, and deployment, providing tools for interacting with language models, running retrieval strategies, and assembling complex application architectures.

LangSmith

A cloud service from LangChain for debugging and monitoring back-end processes, such as the retrieval steps of a RAG pipeline. Enabling tracing from Python is sketched below.
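
Enabling tracing is mostly a matter of environment variables; a minimal sketch (the API key and project name are placeholders):

```python
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls__..."         # LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "my-rag-project"  # traces are grouped under this project

# Any LangChain chain or LLM invoked after this point is traced to LangSmith automatically.
```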

 

RAG

Retrievers in LCEL
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()


def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

chain.invoke("What did the president say about technology?")

ChatPromptTemplate

from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

few_shot_examples = [
{"input":"Could you please clarify the terms outlined in section 3.2 of the contract?",
"output":"Certainly, I will provide clarification on the terms in section 3.2."},
{"input":"We are interested in extending the payment deadline to 30 days instead of the current 15 days. Additionally, we would like to add a clause regarding late payment penalties.",
"output":"Our request is to extend the payment deadline to 30 days and include a clause on late payment penalties."},
{"input":"""The current indemnification clause seems too broad. We would like to narrow it down to cover only direct damages and exclude consequential damages.
Additionally, we propose including a dispute resolution clause specifying arbitration as the preferred method of resolving disputes.""",
"output":"""We suggest revising the indemnification clause to limit it to covering direct damages and excluding consequential damages.
Furthermore, we recommend adding a dispute resolution clause that specifies arbitration as the preferred method of resolving disputes."""},
{"input":"I believe the proposed changes are acceptable.",
"output":"Thank you for your feedback. I will proceed with implementing the proposed changes."}
]

few_shot_template = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}")
    ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=few_shot_template,
    examples=few_shot_examples,
)

print(few_shot_prompt.format())

Loader

Web

import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

Text

from langchain_community.document_loaders import DirectoryLoader
loader = DirectoryLoader("../", glob="**/*.md")
docs = loader.load()
len(docs)
print(docs[0].page_content[:100])

Markdown

'''
%pip install "unstructured[md]"
'''
from langchain_community.document_loaders import UnstructuredMarkdownLoader
markdown_path = "../../../README.md"
loader = UnstructuredMarkdownLoader(markdown_path)

data = loader.load()
assert len(data) == 1
readme_content = data[0].page_content
print(readme_content[:3])

Common helper functions

Formatted output

# Helper function for printing docs
def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

Finance AI

OpenBB

Investment research made easy with AI.

StockBot