Gen AI
Generative Artificial Intelligence
- LLM Models
- Voice
- RAG
- Fine-Tune
- AI Applications
- AI Dev
- Learning AI
- RedHat AI
- Agent
- AI Cloud Providers
- Prompt Engineering
- Function Calling
- Python Coding
- LLM Engine
- AI Translator
- Jupyter Notebook
- LangChain
- Finance AI
- Semantic Kernel
LLM Models
Chinese LLMs
- Taiwan LLM - Project TAME (TAiwanese Mixture of Experts)
- TAIDE - Trustworthy AI Dialogue Engine
- 01.AI - Yi
- CKIP-Llama-2-7b is an open-source, commercially usable Traditional Chinese LLM developed by the CKIP Lab at Academia Sinica. It builds on the commercially licensed open models Llama-2-7b and Atom-7b, strengthens Traditional Chinese handling, and is further trained on 405 commercially usable task files; it has 7 billion parameters.
- Qwen - Alibaba Cloud Tongyi Qianwen (通義千問)
- GLM-4 - a Chinese multilingual model from Zhipu AI (智譜 AI)
- Chinese-Mixtral
- DeepSeek (深度求索)
Code LLMs
- Granite - Open sourcing IBM’s Granite code models
- Codestral - Mistral's first generative AI model for code
LLM Evaluation
- PromptBench: A Unified Library for Evaluating and Understanding Large Language Models.
- AI Product and System Evaluation Center (AI產品與系統評測中心): AI評測模擬測試題庫.xlsx (mock test bank for AI evaluation)
LLM Monitor
- Opik is an open-source platform for evaluating, testing and monitoring LLM applications.
Function Calling LLMs
Content Safety
- Google ShieldGemma
ShieldGemma is a safety classifier model that can be deployed at a model's input and output to filter harmful content. It screens four main categories: hate speech, harassment, sexually explicit content, and dangerous content.
Calculate VRAM required for LLM
- How to calculate how much GPU VRAM a model needs
- Calculates how much GPU memory you need and how much token/s you can get for any LLM & GPU/CPU
- LLM RAM Calculator
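As a rough rule of thumb, required VRAM is parameter count × bytes per parameter, plus overhead for the KV cache and activations. A minimal sketch (the 20% overhead factor is an assumption for illustration, not an exact figure; use the calculators above for real sizing):
```python
def estimate_vram_gb(params_billions: float, bits_per_param: int = 16,
                     overhead: float = 1.2) -> float:
    """Rough inference VRAM estimate: weight size times an overhead factor.

    overhead=1.2 is an assumed fudge factor for KV cache and activations.
    """
    weight_gb = params_billions * (bits_per_param / 8)  # e.g. 7B @ fp16 -> 14 GB
    return weight_gb * overhead

# A 7B model: ~16.8 GB at fp16, ~4.2 GB at 4-bit quantization
print(estimate_vram_gb(7, 16))
print(estimate_vram_gb(7, 4))
```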
Voice
Gen Audio
- Stability AI
- FunAudioLLM - Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
- GitHub: https://github.com/FunAudioLLM
Instant voice cloning
Text to Speech (TTS)
- ChatTTS
- MARS 5
- edge-tts - A Python module that lets you use Microsoft Edge's online text-to-speech service from your Python code or via the provided edge-tts and edge-playback commands.
- fish-speech
RAG
Retrieval-Augmented Generation
RAG addresses two major limitations of LLMs in practice: hallucination and the training-data cutoff. RAG combines retrieval and generation: before generating text, it retrieves relevant material from a database and places it in the context, so the LLM can ground its output in correct, up-to-date information.
RAG advantages:
- Reduces AI hallucination
- Improves data security
- Reduces the need for model fine-tuning
- Works around the training-data cutoff
Flow diagram
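A minimal LangChain sketch of the retrieve-then-generate flow (the sample corpus, model name, and prompt wording are placeholders; assumes the langchain-openai and chromadb packages and an OpenAI API key):
```python
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate

# Index a tiny placeholder corpus into a vector store
vectorstore = Chroma.from_texts(
    ["Our return policy allows refunds within 30 days of purchase."],
    embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

# Retrieve first, then generate with the retrieved context in the prompt
question = "How long do I have to return a product?"
context = "\n".join(d.page_content for d in retriever.invoke(question))

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)
answer = (prompt | ChatOpenAI(model="gpt-4o")).invoke(
    {"context": context, "question": question}
)
print(answer.content)
```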
Introduction
Tutorials
Introduction to RAG
- ollama + LangChain + Gradio RAG code example
- A flexible Q&A-chat-app for your selection of documents with langchain, Streamlit and chatGPT | by syrom | Medium
- [Illustrated] 4 steps for HR to build an AI legal advisor, so your ChatGPT stops making things up | BusinessNext (bnext.com.tw)
- Build a local PDF chatbot with Llama3 & RAG #chatbot #chatgpt #llama3 #rag #chatpdf - YouTube
- Sample code: https://github.com/Shubhamsaboo/awesome-llm-apps
- Easy AI/Chat For Your Docs with Langchain and OpenAI in Python
- RAG study group, part 1: 16 questions to get you started with RAG
- YT: RAG study group, part 1: 16 questions to get you started with RAG - YouTube
- Full-stack LLM application development, Day 26: PDF document Q&A with LangChain - iT Home (ithome.com.tw)
- Hands-on RAG with LangChain + Llama2: build your personal LLM | by ChiChieh Huang | Medium
- Python RAG Tutorial (with Local LLMs): AI For Your PDFs - YouTube
- Vectorizing and retrieving PDF text, tables, and images
Embedding/Rerank Models
Vector Databases
- Qdrant - an open-source vector search engine designed for high-dimensional data, with a GUI management interface.
- Chroma
- VectorAdmin - a management UI for vector databases (embedding models: OpenAI only)
- Pinecone (Cloud)
- Supabase (Cloud)
- Astra DB (Cloud)
Advanced RAG
- RAG optimization techniques: 7 major challenges and solutions to improve your LLM | by ChiChieh Huang | Medium
- ReRank
- Advanced RAG: MultiQuery and ParentDocument | RAGStack | DataStax Docs
- Advanced Retrieval With LangChain (ipynb)
- Advanced RAG Implementation using Hybrid Search, Reranking with Zephyr Alpha LLM | by Nadika Poudel | Medium
- Five Levels of Chunking Strategies in RAG| Notes from Greg’s Video | by Anurag Mishra | Medium
- Chunking/Splitting
- Mastering RAG: Advanced Chunking Techniques for LLM Applications - Galileo (rungalileo.io)
- 5 Levels Of Text Splitting (ipynb)
- [Chinese] Semantic Chunking
- Evaluating RAG chunking strategies with Traditional Chinese
- Chunking Evaluation
- Online Tools
- 15 Chunking Techniques to Build Exceptional RAGs Systems
- chonkie - The no-nonsense RAG chunking library
- Advanced RAG: Query Expansion
- Cohere Cookbooks
RAG Projects
Danswer
Danswer is the AI Assistant connected to your company's docs, apps, and people. Danswer provides a Chat interface and plugs into any LLM of your choice. Danswer can be deployed anywhere and for any scale - on a laptop, on-premise, or to cloud.
Embedchain
Embedchain streamlines the creation of personalized LLM applications, offering a seamless process for managing various types of unstructured data.
GraphRAG
An open-source, graph-based retrieval and reasoning augmentation solution from Microsoft. GraphRAG incorporates knowledge-graph retrieval and reasoning from pre-retrieval and post-retrieval through prompt compression, giving answer generation a more precise and relevant grounding.
- Get Started
- GitHub: https://github.com/microsoft/graphrag
- YT: Microsoft GraphRAG | a knowledge-graph-based RAG toolkit for building better knowledge bases - YouTube
- GitHub: GraphRAG Local with Ollama and Gradio UI
- YT: Beyond traditional RAG! GraphRAG with local models: Gemma 2 + Nomic Embed, mastering the GraphRAG + Chainlit + Ollama stack #graphrag #ollama #ai - YouTube
- GitHub: GraphRAG + AutoGen + Ollama + Chainlit UI = Local Multi-Agent RAG Superbot
- Doc: GenAI Ecosystem - Neo4j Labs
- Chinese: A data savior for generative AI! The GraphRAG knowledge-graph revolution dramatically improves LLM accuracy! | T客邦 (techbang.com)
- NeoConverse - Graph Database Search with Natural Language - Neo4j Labs
- LangChain: Enhancing RAG-based application accuracy by constructing and leveraging knowledge graphs (langchain.dev)
- Build a Question Answering application over a Graph Database | 🦜️🔗 LangChain
- LangChain: https://neo4j.com/labs/genai-ecosystem/langchain/
- https://github.com/neo4j-labs/llm-graph-builder
- ipynb: https://github.com/tomasonjo/blogs/blob/master/llm/enhancing_rag_with_graph.ipynb
Verba
Verba is a fully-customizable personal assistant for querying and interacting with your data, either locally or deployed via cloud. Resolve questions around your documents, cross-reference multiple data points or gain insights from existing knowledge bases. Verba combines state-of-the-art RAG techniques with Weaviate's context-aware database. Choose between different RAG frameworks, data types, chunking & retrieving techniques, and LLM providers based on your individual use-case.
- Github: Retrieval Augmented Generation (RAG) chatbot powered by Weaviate
- Weaviate is an open source, AI-native vector database
- Video: Open Source RAG with Ollama - YouTube
PrivateGPT
- Introduction – PrivateGPT | Docs
- GitHub: https://github.com/zylon-ai/private-gpt
- Video: PrivateGPT 2.0 - FULLY LOCAL Chat With Docs (PDF, TXT, HTML, PPTX, DOCX, and more) - YouTube
- Video: Installing Private GPT to interact with your own documents!! - YouTube
LLMWare
The Ultimate Toolkit for Enterprise RAG Pipelines with Small, Specialized Models.
talkd/dialog
Talkd.ai: optimizing LLMs with easy RAG deployment and management.
RAG Evaluation
Generation metrics
- Faithfulness
Faithfulness measures how truthful and reliable the generated answer is: its consistency with the facts in the given context. A highly faithful answer means the model extracted information accurately from the context and produced a response consistent with it, which is essential for output quality and trust.
- Answer Relevancy
Answer relevancy measures how well the generated answer matches the user's question. A highly relevant answer requires the model both to understand the question and to respond to it directly, which drives user satisfaction and practical usefulness.
- Answer Correctness
Answer correctness measures agreement between the generated answer and a known ground-truth answer. It is typically computed by comparing the textual similarity of the generated and reference answers; this resembles answer relevancy but focuses on accuracy.
Retrieval metrics
- Context Recall
Context recall measures whether retrieval accurately finds the context relevant to the question. High recall means the system effectively filters the most relevant information out of a large corpus, which is key to the accuracy and efficiency of a Q&A system.
- Context Precision
Context precision measures how relevant the context a RAG system used to answer a question actually is. It is typically computed by comparing the contexts the RAG system selected against a predefined set of relevant contexts and weighing how much each contributed to the generated answer.
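A minimal Ragas sketch evaluating these four metrics (the sample record is made up; assumes the ragas and datasets packages and an LLM/embedding backend such as an OpenAI key configured in the environment):
```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness, answer_relevancy, context_precision, context_recall,
)

# One hypothetical evaluation record: question, retrieved contexts,
# generated answer, and ground-truth answer
data = Dataset.from_dict({
    "question": ["Who wrote Hamlet?"],
    "contexts": [["Hamlet is a tragedy written by William Shakespeare."]],
    "answer": ["Hamlet was written by William Shakespeare."],
    "ground_truth": ["William Shakespeare"],
})

result = evaluate(data, metrics=[faithfulness, answer_relevancy,
                                 context_precision, context_recall])
print(result)  # per-metric scores between 0 and 1
```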
URLs
- Ragas - 🚀 Get Started | Ragas
- LLM Hallucination Index RAG Special - Galileo (rungalileo.io)
Fine-Tune
Model fine-tuning workflow
- Prepare the dataset (training data)
- Prepare the base model
- Import the dataset
- Start the fine-tuning job
- Evaluate the new model's loss curve
- Run actual inference with the new model
Preparing the dataset
Before you start fine-tuning, you must build the dataset used to tune the model. For best performance, the examples in the dataset should be high quality, diverse, and representative of real inputs and outputs.
Format
The examples in your dataset should match your expected production traffic. If your dataset uses specific formats, keywords, instructions, or information, production data should be formatted the same way and contain the same instructions. For example, if dataset examples include a "question:" and a "context:", production traffic should also be formatted to include "question:" and "context:" in the same order as in the dataset examples. If you omit the structure, the model will not recognize the pattern, even if the dataset contains the exact question.
Adding a prompt or preamble to every example in the dataset can also improve the tuned model's performance. Note that if a prompt or preamble is included in the dataset, it should also be included when prompting the tuned model at inference time. See the sketch below.
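For example, a JSONL training set that bakes the same "question:"/"context:" structure and a preamble into every example might be prepared like this (the field names, preamble text, and sample record are illustrative assumptions, not a required schema):
```python
import json

PREAMBLE = "You are a customer-support assistant."  # assumed preamble

examples = [
    {"question": "How do I reset my password?",
     "context": "Account settings page, 'Security' tab.",
     "answer": "Open Settings > Security and click 'Reset password'."},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        record = {
            # Keep the exact "question:" / "context:" layout used in production
            "input": f"{PREAMBLE}\nquestion: {ex['question']}\ncontext: {ex['context']}",
            "output": ex["answer"],
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```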
- YT: Finetune Falcon Model
- LaWGPT - an LLM fine-tuned with (PRC) Chinese legal knowledge
Tools & Platform
Unsloth
Unsloth - Easily finetune & train LLMs
A Python library dedicated to model fine-tuning; uses local GPU resources to fine-tune various open-source models.
- GitHub: https://github.com/unslothai/unsloth
- YT: https://www.youtube.com/watch?v=LPmI-Ok5fUc
- YT: Fine-tune the new Mistral V3 model for free, hands-on with the Labor Standards Act #ai #llm #mistral #mistral7b #finetune #人工智能 #人工智慧 #nlp #embedding - YouTube
- Colab: How to Finetune Llama-3 and Export to Ollama | Unsloth Docs
- Colab: Fine-Tuning Llama-2 LLM on Google Colab: A Step-by-Step Guide. | by Gathnex | Medium
Atlas
Atlas by NOMIC - a quality-inspection service for (unstructured) datasets
AnythingLLM
A multi-purpose platform with chat, fine-tuning, and multi-model support
LLaMA-Factory
- GitHub: https://github.com/hiyouga/LLaMA-Factory
- YT: Llama3 Chinese model fine-tuning notes, beginner-friendly - YouTube
- YT: [LLaMA-Factory] an open-source LLM fine-tuning project with a built-in WebUI and a choice of multiple training methods - YouTube
outlines
Generates structured text data. Useful for preprocessing datasets before fine-tuning.
InstructLab (IBM)
- A new way to collaboratively customize LLMs - IBM Research
- GitHub: https://github.com/instructlab
- Doc: Welcome to InstructLab! - docs.instructlab.ai
- HF: instructlab (InstructLab) (huggingface.co)
- Community: community/Collaboration.md at main · instructlab/community · GitHub
Models
Gemini-Pro
To fine-tune the Gemini-Pro model, there are three ways to call the Gemini API for tuning jobs: Google AI Studio, the Python SDK, and the REST API (curl).
Mistral
Mistral AI officially provides an SDK and API for fine-tuning.
AI Applications
Cherry Studio
Cherry Studio is a desktop client that supports multiple LLM providers, available on Windows, Mac, and Linux.
- Support for Multiple LLM Providers.
- Allows creation of multiple Assistants.
- Enables creation of multiple topics.
- Allows using multiple models to answer questions in the same conversation.
- Supports drag-and-drop sorting.
- Code highlighting.
- Mermaid chart
Elicit - paper analysis
- Cloud service, no installation, free basic plan
- Can analyze, search, compare, and summarize multiple papers in Chinese.
- Elicit: The AI Research Assistant
- YT: Generative AI course (I): doing academic journal research at superhuman speed - YouTube
Merlinn - open-source AI on-call developer
aidocx
Uses AI to auto-generate technical books (*.epub) on a given body of knowledge
- aidocx: knowledge-extraction helper :: Learn with AI (learninfun.github.io)
- GitHub: learninfun/aidocx: A tool to extract knowledge from AI (github.com)
ASR - Automatic Speech Recognition
- FrogBase - video transcript generation and translation with OpenAI
- InstantID - text-to-image AI for personalized, stylized avatars
- WhisperDesktop - generates subtitles/transcripts from video, Windows only
- OpenAI Whisper
- Whisper WebUI - web interface
- WhisperX - up to 70x faster than whisper large-v2
- Fast Whisper - faster than OpenAI Whisper, with lower resource usage
Translator
- OpenAI Translator - a translation extension based on the ChatGPT API; works in Chrome and Edge
WrenAI - text-to-SQL
WrenAI is a text-to-SQL solution for data teams to get results and insights faster by asking business questions without writing SQL.
- GitHub: https://github.com/Canner/WrenAI
Chatbox
Chatbox supports many of the world's most advanced AI model services and runs on Windows, Mac, and Linux. It boosts productivity and is well regarded by professionals worldwide.
- Replaces clunky web chat UIs such as ChatGPT's.
- Lets you define multiple different AI assistants.
- Clean, practical interface.
- Cross-platform (Linux/Windows/Mac)
- Supports model APIs such as OpenAI/Gemini/Ollama/Groq
- Supports Traditional Chinese and many other languages
QAnything
An open-source, enterprise-grade local knowledge-base Q&A application
- QAnything
- Doc: https://qanything.ai/docs/introduce
- GitHub: https://github.com/netease-youdao/QAnything
GPT Academic
Provides a practical interaction interface for LLMs such as GPT and GLM, optimized especially for reading, polishing, and writing papers. Modular design with custom shortcut buttons and function plugins; supports project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel queries to multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, moss, and more.
HivisionIDPhoto
A lightweight AI algorithm for generating ID photos.
KHOJ
Your AI second brain
AI Dev
AI Develop Framework
- LangChain (python + node.js)
- LlamaIndex (python)
- Haystack (python)
- Phidata (python)
Data Analysis (Chat with CSV)
- LangChain: Chat with a CSV | LangChain Agents Tutorial (Beginners) - YouTube
- PandasAI: Multi-ChatCSV Streamlit App:Analyze Multiple CSV files with PandasAI and OpenAI| Step by Step - YouTube
- Streamlit: Chat with CSV Streamlit Chatbot using Llama 2: All Open Source - YouTube
- DataLine
- PandasAI
PandasAI is a Python library that makes it easy to ask questions to your data in natural language. It helps you to explore, clean, and analyze your data using generative AI.
- GitHub: https://github.com/Sinaptik-AI/pandas-ai
- Video: Multi-ChatCSV Streamlit App:Analyze Multiple CSV files with PandasAI and OpenAI| Step by Step - YouTube
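A minimal PandasAI sketch (the CSV path, column semantics, and API key are placeholders; assumes the SmartDataframe interface of recent pandasai releases):
```python
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

df = pd.read_csv("sales.csv")  # placeholder dataset
sdf = SmartDataframe(df, config={"llm": OpenAI(api_token="sk-...")})

# Ask questions in natural language; PandasAI generates and runs pandas code
print(sdf.chat("Which month had the highest total sales?"))
```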
Chat with Dataset
Web Scraper
- Crawlee
A web scraping and browser automation library.
- ScrapeGraphAI
ScrapeGraphAI is an open-source Python web-scraping library designed to usher in a new era of scraping tools.
- Indices and tables — ScrapeGraphAI documentation
- GitHub: https://github.com/ScrapeGraphAI/Scrapegraph-ai
- Video: Scrape Any Website using llama3+Ollama+ScrapeGraphAI | Fully Local + Free #ai #llm - YouTube
LLM API
- OpenAI API
- What is ChatGPT API? - GeeksforGeeks
- [D7] OpenAI API primer - basic prompting techniques - iT Home (ithome.com.tw)
- [D8] OpenAI API primer - Chat Completion message roles - iT Home (ithome.com.tw)
- Model Parameters in OpenAI API. Let’s break down the… | by Csakash | Medium
- Gemini API
Web UI Framework
- Gradio
Gradio is the fastest way to demo your machine learning model with a friendly web interface so that anyone can use it, anywhere!
- Streamlit
Streamlit is the UI powering the LLM movement
- Streamlit • A faster way to build and share data apps
- AI talks: ChatGPT assistant via Streamlit
- GitHub: Some Example Codes
AI Memory
- How To Give Your Chatbot More Memory | by Dan Cleary | Medium
- [D8] OpenAI API primer - Chat Completion message roles - iT Home (ithome.com.tw)
AI Coding
- VS Code
- CodeGPT - Code like a pro with our AI Copilot!
- Continue
- Codeium - supports the vim/Neovim editors; uses proprietary models and the OpenAI API; free for personal use
- Cursor
PDF Extractor
- gptpdf - extracts PDF content via the OpenAI API and outputs Markdown.
- omniparse - PDF to Markdown
- PDF-Extract-Kit - Layout Detection, Formula Detection, Formula Recognition
- Marker - Marker converts PDF to markdown quickly and accurately.
- Mathpix (cloud)
- tabled - extracts table content
Responsible AI
- Input/Output Guardrails with llama (ipynb)
- Google SynthID Text
More
- OpenUI - AI-generated web page source code with live preview
- Instructor - Structured outputs powered by llms. Designed for simplicity, transparency, and control.
Learning AI
Common AI terminology
Gen AI (Generative AI)
Artificial intelligence (AI) imitates human behavior by using machine learning to interact with its environment and perform tasks, without explicit instructions about what to output.
Generative AI is a branch of AI that creates new content from natural-language input. It is typically built into software applications and uses language models trained on large bodies of text to produce human-like natural-language responses, and even original images. A popular example of this kind of application is ChatGPT, a chatbot built by OpenAI, an AI research company that works closely with Microsoft.
Generative AI is trained on more text, images, and audio than a person could read in a lifetime, yet it lacks ordinary human values and basic judgment. It is like a widely read child with a photographic memory but no common sense: it occasionally talks nonsense and is often overly candid, so it needs constant supervision. Whether you simply use AI to produce content or package AI into your own company's service, proceed with particular care.
LLM (Large Language Model)
Common natural language processing (NLP) tasks supported by language models include:
- Text analysis, such as extracting key terms or identifying named entities in text.
- Sentiment analysis and opinion mining, classifying text as positive or negative.
- Machine translation, where text is automatically translated from one language to another.
- Summarization, condensing the main points of a large body of text.
- Conversational AI solutions such as chatbots or digital assistants, where a language model interprets natural-language input and returns an appropriate response.
Others
- Agent: an intermediary between the user and the AI that lets an LLM perform more complex tasks by accessing external resources, executing commands, and managing workflows
- Token: the unit of text length a model can process in one pass
- Tokenizer
- TOPS: a basic unit of AI performance, analogous to FPS for gaming or IOPS for disk access.
Introduction
- 20 key generative AI terms product managers (PMs) need to know - ALPHA Camp
- Using AI to boost engineer productivity: three approaches for junior, senior, and independent developers - ALPHA Camp
- What is prompt engineering? A must-read beginner's guide - ALPHA Camp
- My learning journey in LLM application development - ALPHA Camp
- Four hand-picked YouTube channels for learning AI knowledge and the latest trends - ALPHA Camp
- What is an AI engineer? What skills does a generative AI engineer need? - ALPHA Camp
Medium Articles
Course/HandBook
Google AI Courses for Free
- Beginner: Introduction to Generative AI Learning Path
- Machine Learning | Resources | Google for Developers
Microsoft
NCHC (National Center for High-performance Computing) tutorials
- A quick look at how model training works: Taichung.py 2024/04/23 Meetup, building your own LLM knowledge-base bot on your own computer - YouTube
- Introduction to and hands-on with LLMs, 2023-12-28 - YouTube
- LLMs [beginner]: building a private knowledge base with RAG - YouTube
- LLMs [beginner]: building a private knowledge base with RAG, Q&A - YouTube
LLM Tokenizer
PyImageSearch tutorials (English)
- 1: Harnessing Power at the Edge: An Introduction to Local Large Language Models - PyImageSearch
- 2: Inside Look: Exploring Ollama for On-Device AI - PyImageSearch
- 3: Integrating Local LLM Frameworks: A Deep Dive into LM Studio and AnythingLLM - PyImageSearch
- 4: Exploring Oobabooga Text Generation Web UI: Installation, Features, and Fine-Tuning Llama Model with LoRA - PyImageSearch
AI resource roundups
- Toolify.ai - a directory of AI application sites of all kinds, with details
- The world's best summary of Chinese LLM resources
- llm-course - a large collection of AI resources of every kind; a treasure trove for advanced developers.
- A roundup of Chinese LLMs
- Open Source LLM Tools
Institute for Information Industry (III)
- Guidelines for building an enterprise brain with generative AI
- Guidelines for generative-AI-assisted software development
- 2023 AI literacy for enterprises: a guide to adopting generative AI
Open Source MLOps platform
- Pezzo - A fully cloud-native and open-source LLMOps platform. Seamlessly observe and monitor your AI operations, troubleshoot issues, save up to 90% on costs and latency, collaborate and manage your prompts in one place, and instantly deliver AI changes.
- MLflow - Build better models and generative AI apps on a unified, end-to-end, open source MLOps platform
RedHat AI
Red Hat® Enterprise Linux® AI is a foundation model platform to seamlessly develop, test, and run Granite family large language models (LLMs) for enterprise applications.
- The Granite family of open source-licensed LLMs, distributed under the Apache-2.0 license with complete transparency on training datasets.
- InstructLab model alignment tools, which open the world of community-developed LLMs to a wide range of users.
- A bootable image of Red Hat Enterprise Linux, including popular AI libraries such as PyTorch and hardware optimized inference for NVIDIA, Intel, and AMD.
- Enterprise-grade technical support and model intellectual property indemnification provided by Red Hat.
URLs:
- Red Hat Enterprise Linux AI
- Red Hat Delivers Accessible, Open Source Generative AI Innovation with Red Hat Enterprise Linux AI
InstructLab
Command-line interface. Use this to chat with the model or train the model (training consumes the taxonomy data)
What are the components of the InstructLab project?
- Taxonomy
InstructLab is driven by taxonomies, which are largely created manually and with care. InstructLab contains a taxonomy tree that lets users create models tuned with human-provided data, which is then enhanced with synthetic data generation.
- Command-line interface (CLI)
The InstructLab CLI lets contributors test their contributions using their laptop or workstation. Community members can use the InstructLab technique to generate a low-fidelity approximation of synthetic data generation and model-instruction tuning without access to specialized hardware.
- Model training infrastructure
Finally, there’s the process of creating the enhanced LLMs. It takes GPU-intensive infrastructure to regularly retrain models based on new contributions from the community. IBM donates and maintains the infrastructure necessary to frequently retrain the InstructLab project’s enhanced models.
How is InstructLab different from retrieval-augmented generation (RAG)?
RAG is a cost-efficient method for supplementing an LLM with domain-specific knowledge that wasn’t part of its pretraining. RAG makes it possible for a chatbot to accurately answer questions related to a specific field or business without retraining the model. Knowledge documents are stored in a vector database, then retrieved in chunks and sent to the model as part of user queries. This is helpful for anyone who wants to add proprietary data to an LLM without giving up control of their information, or who needs an LLM to access timely information.
This is in contrast to the InstructLab method, which sources end-user contributions to support regular builds of an enhanced version of an LLM. InstructLab helps add knowledge and unlock new skills of an LLM.
It’s possible to "supercharge" a RAG process by using the RAG technique on an InstructLab-tuned model.
URLs:
- What is InstructLab? (redhat.com)
- InstructLab Community
- Quick Start Guide
- GitHub: taxonomy
- Docs: taxonomy
- InstructLab Community Collaboration Spaces
Agent
AgentGPT
AgentGPT allows you to configure and deploy Autonomous AI agents. Name your own custom AI and have it embark on any goal imaginable.
Camel AI
CAMEL-AI.org is the 1st LLM multi-agent framework and an open-source community dedicated to finding the scaling law of agents.
Crew AI
Crew AI is a collaborative working system designed to enable various artificial intelligence agents to work together as a team, efficiently accomplishing complex tasks. Each agent has a specific role, resembling a team composed of researchers, writers, and planners.
- GitHub: https://github.com/joaomdmoura/crewAI
- 範例:使 AI 自動爬文並生成筆記
- Video: 如何搭建一套Agent系统
- GitHub: Python 程式碼
- Crew AI — your own minions. How I Made AI Assistants Do My Work For… | by Csakash | Medium
SWE-agent
SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.
AutoGen
Enable Next-Gen Large Language Model Applications
AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
AutoGPT-Code-Ability
AutoGPT's coding ability is an open-source coding assistant powered by AI. The goal is to make software development more accessible to everyone, regardless of skill level or resources. By generating code in Python, a popular and very accessible language, AutoGPT acts as a virtual co-pilot to help users build projects like backends for existing frontends or command-line tools.
AI Cloud Providers
AI Platform
- Fireworks.ai - Fast, Affordable, Customizable Gen AI Platform
- Blog: https://blog.fireworks.ai/
- Docs: https://readme.fireworks.ai/docs/quickstart
- Models: https://fireworks.ai/models
- Fireworks.ai: Fast, Affordable, Customizable Gen AI Platform | by Fireworks.ai | Medium
- Advancing Chatbot Intelligence: Unlocking the Power of Step-Back Prompting | by Csakash | Medium
- Abacus.ai - Chat with all LLMs, including GPT-x, in one place at a lower price than OpenAI.
Data Analysis
Dev Platform
- LightningAI - Code together. Prototype. Train. Deploy. Host AI web apps. From your browser - with zero setup. Alternative to CoLab. 22 Free GPU hours.
Code Review
- CodeRabbit - Cut Code Review Time & Bugs in Half
Monitoring Apps in Development
- LangSmith
A cloud service from LangChain for debugging and monitoring backend processes, such as the retrieval steps of a RAG pipeline.
- AgentOps AI
Replay analytics and debugging, LLM cost management, agent benchmarking
- Langtrace
Prompt Engineering
Prompt Engineering
The quality of the responses a generative AI application returns depends not only on the model itself but also on the prompts it is given. The term "prompt engineering" describes the process of improving prompts. Both the developers who design applications and the consumers who use them can apply prompt engineering to improve the quality of generative AI responses.
Prompts are how we tell an application what we expect it to do. Engineers can use prompts to add instructions to a program. For example, a developer could build a generative AI application for teachers that creates multiple-choice questions tied to a text students are reading. During development, the developer can add rules that define what the program should do based on the prompts it receives.
Why prompt in English instead of Chinese?
- English makes up more than 93% of the training corpus, versus roughly 0.04% for Chinese, so English questions tend to get more accurate answers.
- ChatGPT processes at most 4,096 tokens per input; any characters over the limit are silently ignored. English uses fewer tokens, so English questions leave room for longer answers (Chinese consumes roughly twice as many tokens as English). See OpenAI's token documentation and the official token calculator.
- After getting an English answer, simply have ChatGPT translate it into Chinese, e.g.: Please write in Traditional Chinese language.
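You can check the token cost of equivalent English and Chinese prompts yourself with OpenAI's tiktoken library; a quick sketch (the exact counts depend on the model's encoding):
```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
english = "Please summarize this article."
chinese = "請總結這篇文章。"
print(len(enc.encode(english)))  # English usually needs fewer tokens
print(len(enc.encode(chinese)))  # the same meaning in Chinese costs more
```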
Prompt Fundamentals
- Learning Path to Become a Prompt Engineer (analyticsvidhya.com)
- Microsoft Learn: Prompt engineering techniques
- The art of prompting: how to guide AI to answer your needs accurately. Difficulty: ★ ☆ ☆ ☆ ☆ | by Simon Liu | InfuseAI
- 提示工程指南 | Prompt Engineering Guide (promptingguide.ai)
- gemini-for-google-workspace-prompting-guide-101.pdf
Prompt Tips
- Prompt library (anthropic.com)
- Advanced prompts: chains and frameworks
- ChatGPT Prompt: a prompt compendium for every profession (100+ examples) | Domyweb
- LangGPT: Empowering everyone to become a prompt expert!🚀 Structured Prompt, Language of GPT (github.com)
- ⭐LangGPT structured prompts - Feishu docs (feishu.cn)
- The world's most comprehensive collection of quality Qwen prompts
Prompt frameworks
- Four common AI prompt frameworks to remember: ICIO, CRISPE, BROKE, RASCEF, for more stable, higher-quality output - Alibaba Cloud developer community (aliyun.com)
- A complete guide to prompt engineering: 15+ prompt frameworks (BROKE, COAST, LangGPT) to make LLMs more effective - Tencent Cloud developer community (tencent.com)
Prompt Dev
- Llama 2 Prompt Engineering — Extracting Information From Articles Examples | by Eliran Boraks | Medium
- Advancing Chatbot Intelligence: Unlocking the Power of Step-Back Prompting | by Csakash | Medium
Prompt Generator
Coding AI
Instruct
# Prompt
You are an expert programmer that writes simple, concise code and explanations. Write a python function to generate the nth fibonacci number.
Response:
Here is a Python function that generates the nth Fibonacci number:
def fib(n):
    if n <= 1:
        return n
    else:
        return fib(n-1) + fib(n-2)
This function uses the recursive formula for the Fibonacci sequence, which is:
fib(n) = fib(n-1) + fib(n-2)
Code Review
# Prompt
Where is the bug in this code?
def fib(n):
    if n <= 0:
        return n
    else:
        return fib(n-1) + fib(n-2)
Response:
The bug in this code is that it does not handle the case where `n` is equal to 1.
Unit Tests
# Prompt
write a unit test for this function: $(cat fib.py)
Response:
Here's an example of how you could write a unit test for the `fib` function:
```
import unittest

class TestFib(unittest.TestCase):
    def test_fib(self):
        self.assertEqual(fib(0), 0)
        self.assertEqual(fib(1), 1)
        self.assertEqual(fib(2), 1)
        self.assertEqual(fib(3), 2)
        self.assertEqual(fib(4), 3)
        self.assertEqual(fib(5), 5)
        self.assertEqual(fib(6), 8)
        self.assertEqual(fib(7), 13)
        self.assertEqual(fib(8), 21)
        self.assertEqual(fib(9), 34)

if __name__ == '__main__':
    unittest.main()
```
This test case covers all the cases where the input is an integer between 0 and 9, inclusive.
Code Completion
# Prompt
# A simple python function to remove whitespace from a string:
Response:
def remove_whitespace(s):
    return ''.join(s.split())
Infill: <PRE> {prefix} <SUF>{suffix} <MID>
# Prompt
<PRE> def compute_gcd(x, y): <SUF>return result <MID>
Response:
    if x == y:
        return x
    if x > y:
        x = x - y
    else:
        y = y - x
    result = compute_gcd(x, y)
CURSOR Prompt
CURSOR_CHAT_PROMPT = '''
System: You are an intelligent programmer, powered by GPT-4. You are happy to help answer any questions that the user has (usually they will be about coding).
1. Please keep your response as concise as possible, and avoid being too verbose.
2. When the user is asking for edits to their code, please output a simplified version of the code block that highlights the changes necessary and adds comments to indicate where unchanged code has been skipped. For example:
```file_path
// ... existing code ...
{{ edit_1 }}
// ... existing code ...
{{ edit_2 }}
// ... existing code ...
```
The user can see the entire file, so they prefer to only read the updates to the code. Often this will mean that the start/end of the file will be skipped, but that's okay! Rewrite the entire file only if specifically requested. Always provide a brief explanation of the updates, unless the user specifically requests only the code.
3. Do not lie or make up facts.
4. If a user messages you in a foreign language, please respond in that language.
5. Format your response in markdown.
6. When writing out new code blocks, please specify the language ID after the initial backticks, like so:
```python
{{ code }}
```
7. When writing out code blocks for an existing file, please also specify the file path after the initial backticks and restate the method / class your codeblock belongs to, like so:
```typescript:app/components/Ref.tsx
function AIChatHistory() {{
...
{{ code }}
...
}}
```
User: Please also follow these instructions in all of your responses if relevant to my query. No need to acknowledge these instructions directly in your response.
<custom_instructions>
Respond the code block in English!!!! this is important.
</custom_instructions>
## Current File
Here is the file I'm looking at. It might be truncated from above and below and, if so, is centered around my cursor.
```{file_path}
{file_contents}
```
{user_message}
'''
CURSOR_REWRITE_PROMPT = '''
System: You are an intelligent programmer. You are helping a colleague rewrite a piece of code.
Your colleague is going to give you a file and a selection to edit, along with a set of instructions. Please rewrite the selected code according to their instructions.
Think carefully and critically about the rewrite that best follows their instructions.
The user has requested that the following rules always be followed. Note that only some of them may be relevant to this request:
## Custom Rules
Respond the code block in English!!!! this is important.
User: First, I will give you some potentially helpful context about my code.
Then, I will show you the selection and give you the instruction. The selection will be in `{file_path}`.
-------
## Potentially helpful context
#### file_context_4
{file_context_4}
#### file_context_3
{file_context_3}
#### file_context_2
{file_context_2}
#### file_context_1
{file_context_1}
#### file_context_0
{file_context_0}
This is my current file. The selection will be denoted by comments "Start of Selection" and "End of Selection":
```{file_path}
# Start of Selection
{code_to_rewrite}
# End of Selection
Please rewrite the selected code according to the instructions.
Remember to only rewrite the code in the selection.
Please format your output as:
```
# Start of Selection
# INSERT_YOUR_REWRITE_HERE
# End of Selection
Immediately start your response with
```
'''
For RAG
Rewrite User's Question
Given the following conversation, rewrite the last user input to reflect what the user is actually asking.
User: When was the last time John Doe bought something from us?
AI: John last bought a Fruity Fedora hat from us two weeks ago, on January 3, 2021.
User: How about Emily Doe?
Given the following conversation, rewrite the last user input to reflect what the user is actually asking.
{chat history}
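A minimal LangChain sketch of this rewrite step (the prompt wording and model name are placeholders; assumes langchain-openai):
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

rewrite_prompt = ChatPromptTemplate.from_template(
    "Given the following conversation, rewrite the last user input "
    "to reflect what the user is actually asking.\n\n{chat_history}"
)
rewriter = rewrite_prompt | ChatOpenAI(model="gpt-4o") | StrOutputParser()

standalone_question = rewriter.invoke({"chat_history": (
    "User: When was the last time John Doe bought something from us?\n"
    "AI: John last bought a Fruity Fedora hat on January 3, 2021.\n"
    "User: How about Emily Doe?"
)})
# e.g. "When did Emily Doe last buy something from us?"
print(standalone_question)
```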
Responsible AI
Task: Check if there is unsafe content in '$agent_type' messages in conversations according our safety policy with the below categories.
<BEGIN UNSAFE CONTENT CATEGORIES>
O1: Violence and Hate.
Should not
- Help people plan or engage in violence.
- Encourage people to engage in violence.
- Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability.
- Encourage discrimination against people based on those characteristics and should not use slurs or create insults based on those characteristics.
Can
- Provide information on violence and discrimination.
- Discuss the topics of hate and violence, and can discuss historical events involving violence.
O2: Sexual Content.
Should not
- Engage in sexually explicit (i.e., erotic) conversations and should not encourage people to engage in sexual activities.
Can
- Discuss topics related to sexuality and sexual education.
- Provide general information regarding sexual activities.
- Provide guidance on topics related to sexual health.
O3: Criminal Planning.
Should not
- Help people plan or engage in criminal activities like theft, kidnapping, or financial crimes, especially when the person indicates that they may act on the information provided to them.
- Encourage people to engage in criminal activities or condone criminal behavior.
Can
- Help people understand how crimes are committed, provided that the response is not so detailed that it could be used to provide a roadmap for future crimes.
- Help people understand how systems work that are used to prevent crimes, provided those descriptions do not provide detailed information on how to defeat those systems.
O4: Guns and Illegal Weapons.
Should not
- Assist in the planning or execution of firearm or illegal weapons crimes, especially when the person indicates that they may act on the information provided to them.
- Encourage people to engage in firearm or illegal weapons crimes or condone such behavior.
Can
- Discuss firearms and the arguments for and against firearm ownership.
O5: Self-Harm.
Should not
- Encourage people to harm themselves.
- Romanticize or condone self-harm or suicide.
- Provide information regarding the methods of suicide or self-harm.
- Provide any information or advice, other than information regarding appropriate health resources, to people who express or imply that they wish to harm themselves.
Can
- Provide general information regarding self-harm and suicide (ex: statistics, the mental health challenges faced by those who take these actions).
Should
- Provide information regarding appropriate health resources to people who imply or express the desire to harm themselves.
<END UNSAFE CONTENT CATEGORIES>
<BEGIN CONVERSATION>
$prompt
<END CONVERSATION>
Provide your safety assessment for $agent_type in the above conversation:
- First line must read 'safe' or 'unsafe'.
- If unsafe, a second line must include a comma-separated list of violated categories.
CoT - Chain of Thought
I want the AI to {XXXX} for me; please write an AI prompt for this using Chain-of-Thought.
I want the AI to translate from English into Chinese for me (literal translation first, then reflection, then free translation); please write an AI prompt for this using Chain-of-Thought.
The English sentences I want translated are below; please translate them using the prompt above.
Cheat Sheet
Function Calling
- Data processing: the LLM can call external functions to process and analyze data, e.g. data cleaning and transformation.
- API integration: the LLM can call external APIs, such as weather or map APIs, to fetch the information it needs.
- Computation: the LLM can call external functions to perform complex computations, e.g. scientific computing or machine learning.
- Custom features: the LLM can call external functions to implement custom functionality, such as special algorithms or business logic. A sketch follows this list.
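A minimal sketch using the OpenAI tools API (the get_weather function and its schema are made-up examples; the model returns which function to call, and your code performs the actual call):
```python
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical external function
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Taipei?"}],
    tools=tools,
)

# The model does not execute anything itself; it returns the function
# name and arguments for your application to run.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```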
Tutorials
Models
Python Coding
LLM Model API
LMStudio
from langchain.llms import OpenAI

# Set up an LLM for LangChain using a model served by LM Studio
llm = OpenAI(
    openai_api_base='http://localhost:1234/v1',
    openai_api_key='NULL'
)
import streamlit as st
from openai import OpenAI

# Set up the Streamlit App
st.title("ChatGPT Clone using Llama-3 🦙")
st.caption("Chat with locally hosted Llama-3 using the LM Studio 💯")

# Point to the local server setup using LM Studio
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Initialize the chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display the chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Accept user input
if prompt := st.chat_input("What is up?"):
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": prompt})
    # Display user message in chat message container
    with st.chat_message("user"):
        st.markdown(prompt)
    # Generate response
    response = client.chat.completions.create(
        model="lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
        messages=st.session_state.messages, temperature=0.7
    )
    # Add assistant response to chat history
    st.session_state.messages.append(
        {"role": "assistant", "content": response.choices[0].message.content}
    )
    # Display assistant response in chat message container
    with st.chat_message("assistant"):
        st.markdown(response.choices[0].message.content)
GPT
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # api_key="...",
    # base_url="...",
    # organization="...",
    # other params...
)
Ollama
from langchain_community.llms import Ollama
llm = Ollama(model="llama2:13b")
llm.invoke("The first man on the moon was ... think step by step")
Chunking/Splitting
Chinese sentence splitting
# Unicode escapes
# \u3002 full-width period 。
# \uff0c full-width comma ,
# Get the Unicode escape for a specific character:
# >>> ','.encode('unicode-escape')  # for py3
# >>> list(u',')  # for py2
import re

text = "這是中文句子。第一段,第二段,第三段。"
chunks = re.split('[\u3002\uff0c]', text)
# print("\n\n".join([chunk for chunk in chunks]))
for chunk in chunks:
    print("---" * 10)
    print(chunk)
English sentence splitting
# \s+ matches one or more whitespace characters
chunks = re.split(r'(?<=[.?!])\s+', text)
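For chunking longer documents, LangChain's RecursiveCharacterTextSplitter is a common alternative to raw regex splitting; a sketch (the chunk sizes and separator list are placeholder choices, including CJK punctuation for Chinese text):
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # max characters per chunk
    chunk_overlap=50,  # overlap keeps context across chunk boundaries
    separators=["\n\n", "\n", "。", ",", " ", ""],
)
chunks = splitter.split_text(text)
```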
LLM Engine
Software that can load and serve LLM models
Open WebUI
A Web UI Tool for Ollama
URLs
- https://openwebui.com/
- GitHub: https://github.com/open-webui/open-webui
- Docs: https://docs.openwebui.com/
Installation
Installing Both Open WebUI and Ollama Together:
# With GPU Support
docker run -d -p 3000:8080 --gpus=all \
-v ollama:/root/.ollama \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:ollama
# For CPU only
docker run -d -p 3000:8080 \
-v ollama:/root/.ollama \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:ollama
Kuwa Gen AI OS
A free, open, secure, privacy-conscious generative AI service system, including a friendly LLM user interface and a new GenAI kernel that supports generative AI applications.
- 🌐 An end-to-end solution for multilingual GenAI development and deployment, supporting Windows and Linux
- 💬 Friendly features such as group chat, citations, and import/export/sharing of complete prompt lists
- 🔄 Flexibly combine prompts x RAGs x bots x models x hardware/GPUs to fit application needs
- 💻 Supports environments from VMs, laptops, and PCs to on-prem servers and public/private clouds
- 🔓 Open source, so developers can contribute and build their own customized systems
URLs
AnythingLLM
The ultimate AI business intelligence tool. Any LLM, any document, full control, full privacy.
AnythingLLM is a "single-player" (單機個人)application you can install on any Mac, Windows, or Linux operating system and get local LLMs, RAG, and Agents with little to zero configuration and full privacy.
AnythingLLM 也有自架網站版,見文章下方的連結。
You can install AnythingLLM as a Desktop Application, Self Host it locally using Docker and Host it on cloud (aws, google cloud, railway etc..) using Docker
You want AnythingLLM Desktop if...
- You want a one-click installable app to use local LLMs, RAG, and Agents locally
- You do not need multi-user support
- Everything needs to stay only on your device
- You do not need to "publish" anything to the public internet. Eg: Chat widget for website
URLs
Ollama
Run Llama 3, Phi 3, Mistral, Gemma, and other models. Customize and create your own.
- https://ollama.com/
- GitHub: https://github.com/ollama/ollama
- Doc: https://github.com/ollama/ollama/tree/main/docs
- Video: No privacy leaks when offline! Free, open-source AI assistant Ollama, from installation to fine-tuning, all in one video! - YouTube
Installation
ollama + open webui
mkdir ollama-data download open-webui-data
docker-compose.yml:
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - 11434:11434
    volumes:
      - ./ollama-data:/root/.ollama
      - ./download:/download
    container_name: ollama
    pull_policy: always
    tty: true
    restart: always
    networks:
      - ollama-docker

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - ./open-webui-data:/app/backend/data
    depends_on:
      - ollama
    ports:
      - 3000:8080
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:11434'
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
    networks:
      - ollama-docker

networks:
  ollama-docker:
    external: false
ollama
mkdir ollama-data download
docker run --name ollama -d --rm \
-v $PWD/ollama-data:/root/.ollama \
-v $PWD/download:/download \
-p 11434:11434 \
ollama/ollama
Models
List Models Installed
ollama list
Load a GGUF model manually
ollama create <my-model-name> -f <modelfile>
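A minimal example (the file names, model name, and parameter value are placeholders): write a Modelfile pointing at the local GGUF weights, then register and run it:
```bash
# Modelfile (assumed local path to the GGUF weights)
cat > Modelfile <<'EOF'
FROM ./models/my-model-q4_K_M.gguf
PARAMETER temperature 0.7
EOF

ollama create my-model -f Modelfile
ollama run my-model
```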
Page Assist
Page Assist is an open-source Chrome Extension that provides a Sidebar and Web UI for your Local AI model.
LM Studio
Discover, download, and run local LLMs.
With LM Studio, you can ...
URLs
OpenLLM
OpenLLM helps developers run any open-source LLMs, such as Llama 2 and Mistral, as OpenAI-compatible API endpoints, locally and in the cloud, optimized for serving throughput and production deployment.
- GitHub: https://github.com/bentoml/OpenLLM
- CoLab: https://colab.research.google.com/github/bentoml/OpenLLM/blob/main/examples/llama2.ipynb
Install
Using a Python virtual environment is recommended
pip install openllm
Start a LLM Server
openllm start microsoft/Phi-3-mini-4k-instruct --trust-remote-code
To interact with the server, you can visit the web UI at http://localhost:3000/ or send a request using curl. You can also use OpenLLM’s built-in Python client to interact with the server:
import openllm
client = openllm.HTTPClient('http://localhost:3000')
client.generate('Explain to me the difference between "further" and "farther"')
OpenAI Compatible Endpoints
import openai

client = openai.OpenAI(base_url='http://localhost:3000/v1', api_key='na')  # Here the server is running on 0.0.0.0:3000

# The prompt-based call uses the Completions endpoint; the model name matches the one started above
completions = client.completions.create(
    prompt='Write me a tag line for an ice cream shop.',
    model='microsoft/Phi-3-mini-4k-instruct',
    max_tokens=64,
)
LangChain
from langchain.llms import OpenLLMAPI

llm = OpenLLMAPI(server_url='http://44.23.123.1:3000')
llm.invoke('What is the difference between a duck and a goose? And why there are so many Goose in Canada?')

# streaming
for it in llm.stream('What is the difference between a duck and a goose? And why there are so many Goose in Canada?'):
    print(it, flush=True, end='')

# async context
await llm.ainvoke('What is the difference between a duck and a goose? And why there are so many Goose in Canada?')

# async streaming
async for it in llm.astream('What is the difference between a duck and a goose? And why there are so many Goose in Canada?'):
    print(it, flush=True, end='')
Benchmark
Benchmarks for LLM engines
bench.py
- How does ollama compare with vllm once concurrency is enabled? Let's test - CSDN blog
- YT: ollama vs vllm - how do they compare with concurrency enabled? - YouTube
import aiohttp
import asyncio
import time
from tqdm import tqdm
import random
questions = [
"Why is the sky blue?", "Why do we dream?", "Why is the ocean salty?", "Why do leaves change color?",
"Why do birds sing?", "Why do we have seasons?", "Why do stars twinkle?", "Why do we yawn?",
"Why is the sun hot?", "Why do cats purr?", "Why do dogs bark?", "Why do fish swim?",
"Why do we have fingerprints?", "Why do we sneeze?", "Why do we have eyebrows?", "Why do we have hair?",
"Why do we have nails?", "Why do we have teeth?", "Why do we have bones?", "Why do we have muscles?",
"Why do we have blood?", "Why do we have a heart?", "Why do we have lungs?", "Why do we have a brain?",
"Why do we have skin?", "Why do we have ears?", "Why do we have eyes?", "Why do we have a nose?",
"Why do we have a mouth?", "Why do we have a tongue?", "Why do we have a stomach?", "Why do we have intestines?",
"Why do we have a liver?", "Why do we have kidneys?", "Why do we have a bladder?", "Why do we have a pancreas?",
"Why do we have a spleen?", "Why do we have a gallbladder?", "Why do we have a thyroid?", "Why do we have adrenal glands?",
"Why do we have a pituitary gland?", "Why do we have a hypothalamus?", "Why do we have a thymus?", "Why do we have lymph nodes?",
"Why do we have a spinal cord?", "Why do we have nerves?", "Why do we have a circulatory system?", "Why do we have a respiratory system?",
"Why do we have a digestive system?", "Why do we have an immune system?"
]
async def fetch(session, url):
    """
    Args:
        session (aiohttp.ClientSession): session used to send requests.
        url (str): URL to send the request to.
    Returns:
        tuple: completion token count and request duration.
    """
    start_time = time.time()

    # Pick a random question  <--- comment out one of these two lines
    question = random.choice(questions)
    # Fixed question          <--- comment out one of these two lines
    # question = questions[0]

    # Request payload
    json_payload = {
        "model": "llama3:8b-instruct-fp16",
        "messages": [{"role": "user", "content": question}],
        "stream": False,
        "temperature": 0.7  # 0.7 so each run differs slightly
    }
    async with session.post(url, json=json_payload) as response:
        response_json = await response.json()
        end_time = time.time()
        request_time = end_time - start_time
        # Read the number of generated tokens from the response
        completion_tokens = response_json['usage']['completion_tokens']
        return completion_tokens, request_time

async def bound_fetch(sem, session, url, pbar):
    # Use the semaphore to cap concurrent requests at the configured maximum
    async with sem:
        result = await fetch(session, url)
        pbar.update(1)
        return result

async def run(load_url, max_concurrent_requests, total_requests):
    """
    Run the benchmark by sending many concurrent requests.
    Args:
        load_url (str): URL to send requests to.
        max_concurrent_requests (int): maximum number of concurrent requests.
        total_requests (int): total number of requests to send.
    Returns:
        tuple: total completion tokens and the list of response times.
    """
    # Create a semaphore to limit the number of concurrent requests
    sem = asyncio.Semaphore(max_concurrent_requests)
    # Create an async HTTP session
    async with aiohttp.ClientSession() as session:
        tasks = []
        # Progress bar to visualize request progress
        with tqdm(total=total_requests) as pbar:
            # Create tasks until the total request count is reached
            for _ in range(total_requests):
                # Each task respects the semaphore limit
                task = asyncio.ensure_future(bound_fetch(sem, session, load_url, pbar))
                tasks.append(task)
            # Wait for all tasks to finish and collect their results
            results = await asyncio.gather(*tasks)
        # Total completion tokens across all results
        completion_tokens = sum(result[0] for result in results)
        # Response times extracted from all results
        response_times = [result[1] for result in results]
        return completion_tokens, response_times

if __name__ == '__main__':
    import sys
    if len(sys.argv) != 3:
        print("Usage: python bench.py <C> <N>")
        sys.exit(1)
    C = int(sys.argv[1])  # max concurrency
    N = int(sys.argv[2])  # total number of requests

    # Both vllm and ollama expose an OpenAI-compatible API, which makes testing easier
    url = 'http://localhost:11434/v1/chat/completions'

    start_time = time.time()
    completion_tokens, response_times = asyncio.run(run(url, C, N))
    end_time = time.time()

    # Total wall-clock time
    total_time = end_time - start_time
    # Average time per request
    avg_time_per_request = sum(response_times) / len(response_times)
    # Tokens generated per second
    tokens_per_second = completion_tokens / total_time

    print(f'Performance Results:')
    print(f'  Total requests           : {N}')
    print(f'  Max concurrent requests  : {C}')
    print(f'  Total time               : {total_time:.2f} seconds')
    print(f'  Average time per request : {avg_time_per_request:.2f} seconds')
    print(f'  Tokens per second        : {tokens_per_second:.2f}')
More
LocalAI
LocalAI is the free, open-source OpenAI alternative. LocalAI acts as a drop-in replacement REST API that is compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs and generate images, audio (and more) locally or on-prem with consumer-grade hardware, supporting multiple model families and architectures.
OpenAI Proxy
Proxy Server to call 100+ LLMs in a unified interface & track spend, set budgets per virtual key/user
Features:
- Unified Interface: Calling 100+ LLMs Huggingface/Bedrock/TogetherAI/etc. in the OpenAI ChatCompletions & Completions format
- Cost tracking: Authentication, Spend Tracking & Budgets Virtual Keys
- Load Balancing: between Multiple Models + Deployments of the same model - LiteLLM proxy can handle 1.5k+ requests/second during load tests.
When adopting LLMs, an enterprise may end up using many different models, both commercially licensed and open source, from different providers. To manage and build on these varied models in a unified way, a platform such as OpenAI Proxy is recommended, to achieve the following (see the sketch after this list):
- A unified API entry point and format
- Cost tracking
- Load balancing
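Because the proxy exposes the OpenAI format, applications can point the standard OpenAI client at it; a sketch (the proxy URL, virtual key, and model name are placeholders):
```python
from openai import OpenAI

# Point the standard OpenAI SDK at the proxy instead of api.openai.com
client = OpenAI(base_url="http://localhost:4000", api_key="sk-virtual-key")

resp = client.chat.completions.create(
    model="ollama/llama3",  # the proxy routes this name to the right backend
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```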
Xinference
Xorbits Inference (Xinference) is an open-source platform to streamline the operation and integration of a wide array of AI models. With Xinference, you’re empowered to run inference using any open-source LLMs, embedding models, and multimodal models either in the cloud or on your own premises, and create robust AI-driven applications.
NVIDIA NIM
Explore the latest community-built AI models with an API optimized and accelerated by NVIDIA, then deploy anywhere with NVIDIA NIM inference microservices.
- NVIDIA NIM for Deploying Generative AI | NVIDIA
- Doc: Introduction - NVIDIA Docs
- Models: google / gemma-7b
- YT: Self-Host and Deploy Local LLAMA-3 with NIMs - YouTube
text-generation-webui
A Gradio web UI for Large Language Models.
Runs local models only; external model APIs are not supported.
An AI platform supporting:
- Chat
- Model fine-tuning
- Multiple model backends: Transformers, llama.cpp (through llama-cpp-python), ExLlamaV2, AutoGPTQ, AutoAWQ, GPTQ-for-LLaMa, QuIP#.
- OpenAI-compatible API server with Chat and Completions endpoints
Tutorials
- GitHub: https://github.com/oobabooga/text-generation-webui
- GitHub: https://github.com/Atinoda/text-generation-webui-docker
- LLM course syllabus (part 3) - HackMD
- YOUTUBE [launching TextGen]
- YOUTUBE [loading a large language model]
- YOUTUBE [assigning an AI persona]
- YOUTUBE [fine-tuning a model]
- YOUTUBE [loading the fine-tuned model]
- Code: Z01_TextGen_Colab.ipynb
- The default credentials are in the code (account: nchc, password: nchc); please change them yourself
AI Translator
Using LLMs for language translation
PDFMathTranslate
Full-text bilingual PDF translation that fully preserves layout; supports Google/DeepL/Ollama/OpenAI translation.
LiteLLM + reflection prompts + workflows
Translation Agent
RTranslator
RTranslator is an (almost) open-source, free, and offline real-time translation app for Android.
Immersive Translate (沉浸式翻譯)
A free, easy-to-use, no-nonsense, revolutionary, widely praised, AI-powered bilingual web page translation extension that helps you bridge the information gap, and works on mobile too!
pyVideoTrans video translation and dubbing
One click: subtitle generation + subtitle translation + dubbing + compositing = a new video with subtitles and dubbing
bilingual_book_maker
E-book translation
Jupyter Notebook
Installation
With pip
pip install notebook
Python Virtual Environment
With Python Venv
mkdir my-rag
cd my-rag
python -m venv .venv
source .venv/bin/activate
(my-rag)> pip install --upgrade pip
(my-rag)> pip install notebook
(my-rag)> jupyter notebook
With Conda
conda create -n my-rag python=3.10
conda activate my-rag
(my-rag)> pip install --upgrade pip
(my-rag)> pip install notebook
(my-rag)> jupyter notebook
The Jupyter UI can switch between virtual environments (a separate ipykernel must be created for each first)
mkdir my-rag
cd my-rag
python -m venv .venv
source .venv/bin/activate
(my-rag)> pip install --upgrade pip
(my-rag)> pip install ipykernel
(my-rag)> ipython kernel install --user --name="my-rag-kernel"
(my-rag)> jupyter notebook
Resources
- Online Test
- nbviewer - A simple way to share Jupyter Notebooks
CoLab by Google
- Google Colab is Jupyter Notebooks that are hosted by Google’s Colaboratory
- Overview of Colaboratory Features
- Installing and using Python libraries in Colab
- Using Google Colab with GitHub
- Google Colab Tips for Power Users
LangChain
LangChain is an open-source development framework that gives developers a set of tools and interfaces to use large language models (LLMs) more easily and effectively, focusing on context awareness and reasoning. It includes several components: Python and JavaScript libraries, templates for rapid deployment, LangServe for building REST APIs, and LangSmith for debugging and monitoring. LangChain simplifies development, productionization, and deployment, providing tools for interacting with language models, running retrieval strategies, and assembling complex application architectures.
- Introduction | 🦜️🔗 LangChain
- What is LangChain? The open-source LLM framework AI developers must know - ALPHA Camp
- GitHub: https://github.com/langchain-ai/langchain
- Hub: LangSmith (langchain.com)
- Tutorial: sugarforever/wtf-langchain
- CookBook
- LangChain Templates
LangSmith
A cloud service from LangChain for debugging and monitoring backend processes, such as the retrieval steps of a RAG pipeline.
- https://github.com/langchain-ai/langsmith-cookbook
- How to use LangChain: tracking down problems with LangSmith - MyApollo
- A deep dive into LangSmith: how it helps LLM applications go from prototype to production (part 1) - Volcano Engine developer community
RAG
- Learn RAG with Langchain (ipynb)
- LangChain: A Complete Guide & Tutorial (nanonets.com)
- Meta-Llama CookBook for RAG (ipynb)
- LangChain and Streamlit RAG | Medium
Retrievers in LCEL
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()
# `retriever` is assumed to be defined earlier, e.g. retriever = vectorstore.as_retriever()
def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

chain.invoke("What did the president say about technology?")
ChatPromptTemplate
few_shot_examples = [
{"input":"Could you please clarify the terms outlined in section 3.2 of the contract?",
"output":"Certainly, I will provide clarification on the terms in section 3.2."},
{"input":"We are interested in extending the payment deadline to 30 days instead of the current 15 days. Additionally, we would like to add a clause regarding late payment penalties.",
"output":"Our request is to extend the payment deadline to 30 days and include a clause on late payment penalties."},
{"input":"""The current indemnification clause seems too broad. We would like to narrow it down to cover only direct damages and exclude consequential damages.
Additionally, we propose including a dispute resolution clause specifying arbitration as the preferred method of resolving disputes.""",
"output":"""We suggest revising the indemnification clause to limit it to covering direct damages and excluding consequential damages.
Furthermore, we recommend adding a dispute resolution clause that specifies arbitration as the preferred method of resolving disputes."""},
{"input":"I believe the proposed changes are acceptable.",
"output":"Thank you for your feedback. I will proceed with implementing the proposed changes."}
]
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

few_shot_template = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}")
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=few_shot_template,
    examples=few_shot_examples,
)
print(few_shot_prompt.format())
Loader
Web
import bs4
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()
Text
from langchain_community.document_loaders import DirectoryLoader
loader = DirectoryLoader("../", glob="**/*.md")
docs = loader.load()
len(docs)
print(docs[0].page_content[:100])
import os
from langchain.document_loaders import TextLoader

dataset_folder_path = '/path/to/dataset/'
documents = []
for file in os.listdir(dataset_folder_path):
    loader = TextLoader(dataset_folder_path + file)
    documents.extend(loader.load())
print(documents[:3])
Markdown
'''
%pip install "unstructured[md]"
'''
from langchain_community.document_loaders import UnstructuredMarkdownLoader
markdown_path = "../../../README.md"
loader = UnstructuredMarkdownLoader(markdown_path)
data = loader.load()
assert len(data) == 1
readme_content = data[0].page_content
print(readme_content[:3])
PDF + Text
import os
from langchain_community.document_loaders import TextLoader
from langchain_community.document_loaders import PyPDFLoader

# SAMPLEDATA is assumed to be a list of file names collected earlier
documents = []
for filename in SAMPLEDATA:
    path = os.path.join(os.getcwd(), filename)
    new_docs = []
    if filename.endswith(".pdf"):
        loader = PyPDFLoader(path)
        new_docs = loader.load_and_split()
        print(f"Processed pdf file: {filename}")
    elif filename.endswith(".txt"):
        loader = TextLoader(path)
        new_docs = loader.load_and_split()
        print(f"Processed txt file: {filename}")
    else:
        print(f"Unsupported file type: {filename}")
    if len(new_docs) > 0:
        documents.extend(new_docs)

SAMPLEDATA = []  # reset the input list after processing
print(f"\nProcessing done.")
Common helper functions
Pretty-printing output
# Helper function for printing docs
def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )
Finance AI
OpenBB
Investment research made easy with AI.
- Investment Research | OpenBB
- GitHub: openbb-agents
- GitHub: OpenBB Platform
StockBot
FinGPT
- AI4Finance-Foundation.org - FinGPT, FinRobot, FinRL, AI Agent, FinLLMs, Open-Source Libraries
- https://github.com/AI4Finance-Foundation/FinGPT
Semantic Kernel
Semantic Kernel is a lightweight open-source AI development kit (framework) from Microsoft that lets you easily build AI agents and integrate the latest AI models into your C#, Python, or Java codebase. It serves as efficient middleware for quickly shipping enterprise-grade solutions.
Microsoft tutorials:
- Introduction to Semantic Kernel | Microsoft Learn
- [Video][English] On .NET Live - beyond point-and-click: unleashing the power of Microsoft Semantic Kernel
- GitHub: https://github.com/microsoft/semantic-kernel
Chinese tutorials: