# OpenLLM
OpenLLM helps developers run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint, locally and in the cloud, optimized for serving throughput and production deployment.
- GitHub: https://github.com/bentoml/OpenLLM
- Colab: https://colab.research.google.com/github/bentoml/OpenLLM/blob/main/examples/llama2.ipynb
## Install

We recommend installing OpenLLM inside a Python virtual environment:

```bash
pip install openllm
```
## Start an LLM Server
```bash
openllm start microsoft/Phi-3-mini-4k-instruct --trust-remote-code
```
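Once the server is up, a quick way to confirm it is serving the model is to query the OpenAI-compatible `/v1/models` route. Below is a minimal sketch using the `requests` library; the route and the default port 3000 are assumptions based on OpenLLM's advertised OpenAI compatibility:

```python
import requests

# Assumes the default OpenLLM port (3000) and the OpenAI-compatible /v1/models route.
resp = requests.get('http://localhost:3000/v1/models')
resp.raise_for_status()
print(resp.json())  # should list the model started above
```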
To interact with the server, visit the web UI at http://localhost:3000/ or send a request using curl. You can also use OpenLLM’s built-in Python client:
```python
import openllm

# Connect to the server started above (default port 3000).
client = openllm.HTTPClient('http://localhost:3000')
client.generate('Explain to me the difference between "further" and "farther"')
```
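Because the server exposes OpenAI-compatible routes, the official `openai` Python client should also work against it. Here is a minimal sketch, assuming the default port and the model ID started above; the `base_url` and placeholder `api_key` are assumptions (local servers typically ignore the key):

```python
from openai import OpenAI

# Sketch: point the standard OpenAI client at the local OpenLLM server.
# base_url and api_key are assumptions; the key is usually ignored locally.
client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')

response = client.chat.completions.create(
    model='microsoft/Phi-3-mini-4k-instruct',  # the model started above
    messages=[{'role': 'user', 'content': 'What is the capital of France?'}],
)
print(response.choices[0].message.content)
```

This is the practical payoff of the OpenAI-compatible endpoint: existing tooling built on the OpenAI client can be pointed at the local `base_url` without code changes.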