# OpenLLM
OpenLLM helps developers run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint, locally and in the cloud, optimized for serving throughput and production deployment.
- GitHub: https://github.com/bentoml/OpenLLM
- Colab: https://colab.research.google.com/github/bentoml/OpenLLM/blob/main/examples/llama2.ipynb
## Install

We recommend installing OpenLLM inside a Python virtual environment:

```bash
pip install openllm
```
## Start an LLM Server
```bash
openllm start microsoft/Phi-3-mini-4k-instruct --trust-remote-code
```
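Once the server is up, a quick way to confirm it is serving the model is to query the OpenAI-compatible `/v1/models` route. Below is a minimal sketch using the `requests` library; the route and the default port 3000 are assumptions based on OpenLLM's advertised OpenAI compatibility:

```python
import requests

# Assumes the default OpenLLM port (3000) and the OpenAI-compatible /v1/models route.
resp = requests.get('http://localhost:3000/v1/models')
resp.raise_for_status()
print(resp.json())  # should list the model started above
```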
To interact with the server, visit the web UI at http://localhost:3000/ or send a request using curl. You can also use OpenLLM’s built-in Python client:
```python
import openllm

# Connect to the server started above (default port 3000).
client = openllm.HTTPClient('http://localhost:3000')
client.generate('Explain to me the difference between "further" and "farther"')
```
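Because the server exposes OpenAI-compatible routes, the official `openai` Python client should also work against it. Here is a minimal sketch, assuming the default port and the model ID started above; the `base_url` and placeholder `api_key` are assumptions (local servers typically ignore the key):

```python
from openai import OpenAI

# Sketch: point the standard OpenAI client at the local OpenLLM server.
# base_url and api_key are assumptions; the key is usually ignored locally.
client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')

response = client.chat.completions.create(
    model='microsoft/Phi-3-mini-4k-instruct',  # the model started above
    messages=[{'role': 'user', 'content': 'What is the capital of France?'}],
)
print(response.choices[0].message.content)
```

This is the practical payoff of the OpenAI-compatible endpoint: existing tooling built on the OpenAI client can be pointed at the local `base_url` without code changes.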