Ollama

Run Llama 3, Phi 3, Mistral, Gemma, and other models. Customize and create your own.

Installation

Ollama + Open WebUI

mkdir ollama-data download open-webui-data

docker-compose.yml:

services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - 11434:11434
    volumes:
      - ./ollama-data:/root/.ollama
      - ./download:/download
    container_name: ollama
    pull_policy: always
    tty: true
    restart: always
    networks:
      - ollama-docker

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - ./open-webui-data:/app/backend/data
    depends_on:
      - ollama
    ports:
      - 3000:8080
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:11434'
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
    networks:
      - ollama-docker

networks:
  ollama-docker:
    external: false
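With the file above saved as docker-compose.yml, the stack can be started and a first model pulled roughly as follows. This is a minimal sketch, assuming Docker Compose v2 (the `docker compose` subcommand); the function name `start_ollama_stack`, the `DOCKER` override variable, and the default model `llama3` are illustrative choices, not part of the original notes.

```shell
# DOCKER can be overridden (e.g. DOCKER="echo docker") to preview the commands.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: start both services, then pull a model inside the
# "ollama" container (the container_name set in docker-compose.yml).
start_ollama_stack() {
    model="${1:-llama3}"
    $DOCKER compose up -d &&
    $DOCKER exec ollama ollama pull "$model"
}
```

Calling `start_ollama_stack` from the directory containing docker-compose.yml should leave Open WebUI reachable at http://localhost:3000 (the 3000:8080 mapping above) and the Ollama API at http://localhost:11434.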

Ollama only (without Open WebUI)

mkdir ollama-data download

docker run --name ollama -d --rm \
    -v "$PWD/ollama-data":/root/.ollama \
    -v "$PWD/download":/download \
    -p 11434:11434 \
    ollama/ollama

K8s Deployment

See also: Enable GPU Support in Kubernetes: Complete Guide

1. ollama-pv.yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ollama-pv
  labels:
    type: local
spec:
  storageClassName: local-path
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/k8svol/ollama"

2. ollama-pvc.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
  namespace: ollama
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  storageClassName: local-path

3. ollama-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          env:
            - name: OLLAMA_HOST
              value: 0.0.0.0:11434
          ports:
            - name: http
              containerPort: 11434
              protocol: TCP
          volumeMounts:
            - name: ollama-data
              mountPath: /root/.ollama
      volumes:
        - name: ollama-data
          persistentVolumeClaim:
            claimName: ollama-pvc

4. ollama-svc.yaml:

apiVersion: v1
kind: Service
metadata:
  name: ollama-service
  namespace: ollama
spec:
  selector:
    app: ollama
  ports:
  - protocol: TCP
    port: 11434
    targetPort: 11434
  type: ClusterIP
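
The four manifests above can be applied in order once the `ollama` namespace exists, since the PVC, Deployment, and Service all reference it. A minimal sketch, assuming the files are saved under the names given above in the current directory; the function name `apply_ollama` and the `KUBECTL` override variable are illustrative, not part of the original notes.

```shell
# KUBECTL can be overridden (e.g. KUBECTL="echo kubectl") to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: create the namespace, then apply each manifest in order.
apply_ollama() {
    $KUBECTL create namespace ollama
    for f in ollama-pv.yaml ollama-pvc.yaml ollama-deployment.yaml ollama-svc.yaml; do
        $KUBECTL apply -f "$f"
    done
}
```

After calling `apply_ollama`, `kubectl get pods -n ollama` should eventually show the ollama pod as Running.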

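Testing with curl

The Service above is of type ClusterIP, so it has no node port. Either change its type to NodePort and use the assigned port in the command below, or run kubectl port-forward -n ollama svc/ollama-service 11434:11434 and target http://localhost:11434 instead.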

curl -s http://<NODE_IP>:<nodeport>/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}' | jq -r '.response' | tr -d '\n'
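
By default /api/generate streams its answer as a series of JSON objects, which is why the command above strips newlines after extracting each `.response` fragment. Setting `"stream": false` in the request body returns a single JSON object instead. The sketch below builds such a body; `generate_body` is a hypothetical helper, and it does no JSON escaping, so it assumes the prompt contains no double quotes or backslashes.

```shell
# Hypothetical helper: print a JSON body for /api/generate with streaming disabled.
#   $1 = model name, $2 = prompt (must not contain " or \ in this simple sketch)
generate_body() {
    printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Usage (OLLAMA_URL is an assumed variable holding a reachable endpoint):
#   curl -s "$OLLAMA_URL/api/generate" \
#       -d "$(generate_body llama2 'Why is the sky blue?')" | jq -r '.response'
```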

Models

List installed models

ollama list

Load a GGUF model manually

ollama create <my-model-name> -f <modelfile>
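
The Modelfile passed to `ollama create` needs at least a `FROM` line pointing at the GGUF file. A minimal sketch, assuming the GGUF file was placed in the ./download directory mounted at /download in the container; the file name `my-model.gguf`, the model name `my-model`, and the helper `write_modelfile` are placeholders, not values from these notes.

```shell
# Hypothetical helper: write a one-line Modelfile that references a local GGUF file.
#   $1 = path to the GGUF file as seen by ollama, $2 = output file (default: Modelfile)
write_modelfile() {
    gguf_path="$1"
    out="${2:-Modelfile}"
    printf 'FROM %s\n' "$gguf_path" > "$out"
}

write_modelfile /download/my-model.gguf Modelfile

# Then, where the ollama CLI can see both files (e.g. inside the container):
#   ollama create my-model -f Modelfile
```

Once created, the model should appear in the output of `ollama list`.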

Page Assist

Page Assist is an open-source Chrome extension that provides a sidebar and web UI for your local AI models.

Video: This Chrome Extension Surprised Me - YouTube