Run Your Own ChatGPT with Open WebUI and Ollama

ChatGPT works well until you remember that every prompt you send sits on someone else's servers, your conversations may be used as training data, and API usage is billed per request.

Open WebUI and Ollama give you a similar experience: a polished chat interface with multiple AI models, running entirely on your own server. No API costs, no data leaving your network, no rate limits.

Open WebUI is the interface (129k+ stars on GitHub). Ollama is the engine that runs the models. Together they deploy in 5 minutes with Docker Compose, and the result feels like a real product, not a weekend hack.

This guide walks through the complete setup on a Linux server.

What you'll need

A Linux server (Ubuntu 22.04/24.04, Debian 12, or similar)
At least 8 GB RAM to run 7-8B parameter models on CPU
4+ CPU cores recommended
20 GB storage minimum (models weigh 3-8 GB each)
Docker Engine 25+ with the docker compose plugin

For CPU inference with small models (Llama 3.1 8B, Mistral 7B), a VPS with 4 cores and 16 GB RAM gets the job done. Responses are slower than ChatGPT but perfectly usable for personal use or a small team. For faster inference, a dedicated server with a GPU like an RTX 4090 or A100 changes everything.
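
Before installing anything, it can help to confirm the server meets these minimums. The sketch below is one way to check, assuming a Linux host with /proc/meminfo and assuming Docker will keep its data under /var/lib (its default location). The check_min helper is a name made up for this example.

```shell
#!/bin/sh
# Preflight check against the minimums listed above (8 GB RAM, 4 cores, 20 GB disk).

# Print OK or LOW for one resource; arguments: label, actual value, minimum.
check_min() {
    if [ "$2" -ge "$3" ]; then
        echo "OK   $1: $2 (minimum $3)"
    else
        echo "LOW  $1: $2 (minimum $3)"
    fi
}

# Total RAM in GB, rounded to the nearest integer (MemTotal is reported in kB).
ram_gb=$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 + 0.5 }' /proc/meminfo)
cores=$(nproc)
# Free space in GB on the filesystem that holds /var/lib.
disk_gb=$(df -Pk /var/lib | awk 'NR == 2 { printf "%d", $4 / 1024 / 1024 }')

check_min "RAM (GB)" "$ram_gb" 8
check_min "CPU cores" "$cores" 4
check_min "Free disk (GB)" "$disk_gb" 20
```

A LOW line is a warning rather than a hard stop: a smaller model may still run, just with less headroom.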

Step 1, Install Docker

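If Docker isn't already on the box (these commands target Ubuntu; on Debian, replace ubuntu with debian in both download.docker.com URLs):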

bash

sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

sudo usermod -aG docker $USER

Log out and back in, then verify:

bash

docker compose version

Step 2, Create the Docker Compose file

Make a directory and set up the compose file:

bash

sudo mkdir -p /opt/openwebui && sudo chown "$USER" /opt/openwebui && cd /opt/openwebui

Create docker-compose.yml:

yaml

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
    tty: true

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - openwebui_data:/app/backend/data
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=your_secret_key_here
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped

volumes:
  ollama_data:
  openwebui_data:

Generate a secret key:

bash

openssl rand -hex 32

Replace your_secret_key_here with the generated value.
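
If you would rather not paste the key by hand, the replacement can be scripted. This is a minimal sketch, assuming you are in the directory that holds docker-compose.yml and that the placeholder is still the literal text your_secret_key_here. The set_secret_key function is a hypothetical helper, not part of Docker or Open WebUI.

```shell
#!/bin/sh
# Replace the placeholder in a compose file with the given key.
# Writes through a temporary file so it works with any POSIX sed.
set_secret_key() {
    file=$1
    key=$2
    sed "s/your_secret_key_here/$key/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Only touch the file if it exists in the current directory.
if [ -f docker-compose.yml ]; then
    set_secret_key docker-compose.yml "$(openssl rand -hex 32)"
    # Show the resulting line so you can confirm the key was written.
    grep WEBUI_SECRET_KEY docker-compose.yml
fi
```

A hex key contains only the characters 0-9 and a-f, so it is safe to splice into the sed expression without escaping.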

Step 3, Start the stack

bash

docker compose up -d

Watch the logs:

bash

docker compose logs -f open-webui

Wait until the log shows the server listening on port 8080. That is the container's internal port; the compose file maps it to port 3000 on the host. Then open:

http://your-server-ip:3000

The first user to register becomes the admin. Create your account right away.
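
As an alternative to watching the logs, you can poll the published port until something answers. The sketch below assumes curl is installed on the host and that the stack uses the 3000:8080 mapping from the compose file above. The retry function is a generic helper written for this example.

```shell
#!/bin/sh
# Run a command until it succeeds.
# Arguments: maximum attempts, delay between attempts in seconds, then the command.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        "$@" && return 0
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}

# Usage (waits up to about two minutes for the web interface):
#   retry 60 2 curl -fsS -o /dev/null http://localhost:3000 \
#       && echo "Open WebUI is answering on port 3000"
```

The first start can take a while because Open WebUI does some initial setup before it accepts connections, so a generous attempt count is reasonable.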

Step 4, Download your first model

Open WebUI lets you pull models directly from the interface, but the command line is faster:

bash

docker exec -it ollama ollama pull llama3.1:8b

This downloads Llama 3.1 8B (around 4.7 GB). A few other popular models worth trying:

bash

# Fast and capable general model
docker exec -it ollama ollama pull mistral:7b

# Google's compact model
docker exec -it ollama ollama pull gemma2:9b

# Coding-focused model
docker exec -it ollama ollama pull qwen2.5-coder:7b

# Small and fast for quick tasks
docker exec -it ollama ollama pull phi4-mini

Each model takes 3-8 GB of disk space. You can download as many as your storage allows. Ollama loads and unloads them from memory as needed.
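
To fetch several models in one go, the pull command can be wrapped in a loop. This is a small sketch, assuming the container is named ollama as in the compose file. The OLLAMA variable and the pull_models function are conventions invented for this example.

```shell
#!/bin/sh
# Command used to reach the Ollama CLI inside the container from the compose file.
OLLAMA=${OLLAMA:-"docker exec ollama ollama"}

# Pull each model given as an argument, stopping at the first failure.
pull_models() {
    for model in "$@"; do
        echo "Pulling $model"
        # $OLLAMA is left unquoted on purpose so it splits into command and arguments.
        $OLLAMA pull "$model" || return 1
    done
}

# Usage:
#   pull_models llama3.1:8b mistral:7b
#   $OLLAMA list    # models stored on disk
#   $OLLAMA ps      # models currently loaded in memory
```

Running ollama list afterwards is a quick way to see how much disk space each model actually takes.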

Step 5, Start chatting

Go back to the Open WebUI interface. Pick a model from the dropdown at the top, type a message, and you have your own private ChatGPT.

What you get out of the box:

Multiple models, switch between Llama, Mistral, Gemma, and others in one click
Conversation history, all chats saved locally on your server
Multi-user support, create accounts for your team, each with their own history
RAG (document chat), upload PDFs and ask questions about their content
System prompts, customize how each model behaves
Model presets, save temperature, context length, and other settings per model
Dark mode, obviously

Step 6, Set up HTTPS for remote access

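If you want to reach your AI from outside your network, point a DNS record for your domain (ai.yourdomain.com in this example) at the server, then install Nginx and Certbot: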

bash

sudo apt install -y nginx certbot python3-certbot-nginx

Create /etc/nginx/sites-available/openwebui:

nginx

server {
    listen 80;
    server_name ai.yourdomain.com;

    client_max_body_size 100M;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (needed for streaming responses)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

Enable the site and request a certificate:

bash

sudo ln -s /etc/nginx/sites-available/openwebui /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
sudo certbot --nginx -d ai.yourdomain.com

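The WebSocket headers matter. Without them, streaming responses (where text appears word by word) won't work through the proxy. Once HTTPS works, consider changing the port mapping in docker-compose.yml to "127.0.0.1:3000:8080" so the plain HTTP port is no longer reachable from outside.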

Step 7, Connect cloud APIs (optional)

Open WebUI isn't limited to local models. You can also connect it to cloud APIs.

In the admin panel, go to Settings → Connections and add your API keys for:

OpenAI: GPT-4o, GPT-4 Turbo
Anthropic: Claude
Any OpenAI-compatible API: Groq, Together AI, Fireworks, etc.

This turns Open WebUI into a unified interface for all your AI models, local and cloud, in one place. Use local models for daily tasks (free, private) and switch to cloud models when you need maximum capability.

How much RAM do you actually need?

The biggest question with self-hosted LLMs is always RAM. Here's a practical guide:

Model	Parameters	RAM needed (CPU)	Quality
Phi-4 Mini	3.8B	4 GB	Good for quick tasks
Mistral 7B	7B	8 GB	Strong general use
Llama 3.1 8B	8B	8 GB	Excellent all-rounder
Gemma 2 9B	9B	10 GB	Google's best compact model
Qwen 2.5 Coder 7B	7B	8 GB	Best for coding tasks
Llama 3.1 70B	70B	48 GB+	Near GPT-4 quality (needs GPU)

For CPU inference, 16 GB RAM gives you comfortable headroom for 7-8B models. Responses take 2-5 seconds to start, slower than ChatGPT, but completely private and free.

With a GPU (RTX 4090, 24 GB VRAM), responses are near-instant and you can run models up to 30B parameters comfortably.
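
For models not in the table, a back-of-the-envelope estimate is parameters times bits per weight divided by 8, plus some overhead. The sketch below applies that arithmetic. The 4.5 bits per weight (roughly a Q4 quantization) and the flat 2 GB overhead are illustrative assumptions, not figures published by Ollama, so treat the output as a rough guide only.

```shell
#!/bin/sh
# Rough RAM estimate in GB.
# Arguments: parameters in billions, bits per weight (default 4.5), overhead in GB (default 2).
estimate_ram_gb() {
    awk -v p="$1" -v bits="${2:-4.5}" -v overhead="${3:-2}" \
        'BEGIN { printf "%.1f\n", p * bits / 8 + overhead }'
}

estimate_ram_gb 8      # Llama 3.1 8B: prints 6.5
estimate_ram_gb 70     # Llama 3.1 70B: prints 41.4
```

The estimates land a little under the table values, which leave extra headroom for the operating system and longer conversations.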

Updating

Open WebUI and Ollama ship updates often. To upgrade:

bash

cd /opt/openwebui
docker compose pull
docker compose up -d
docker image prune -f

Your conversations, users, and settings stay in the Docker volumes.
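
Since everything lives in those volumes, it is worth archiving them before an upgrade. This is a minimal sketch, assuming the volume is named openwebui_openwebui_data (Compose prefixes volume names with the project name, which defaults to the directory name; confirm with docker volume ls). The backup_volume function and the DOCKER variable are hypothetical names for this example.

```shell
#!/bin/sh
# Docker command to use; overridable for dry runs.
DOCKER=${DOCKER:-docker}

# Archive a named volume into the current directory using a throwaway Alpine container.
backup_volume() {
    volume=$1
    $DOCKER run --rm \
        -v "$volume":/data:ro \
        -v "$PWD":/backup \
        alpine tar czf "/backup/$volume.tgz" -C /data .
}

# Usage (stop the app first so its data files are not changing during the copy):
#   docker compose stop open-webui
#   backup_volume openwebui_openwebui_data
#   docker compose up -d
```

The volume is mounted read-only inside the helper container, so the backup cannot modify your data.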

Troubleshooting

"Model not found" error: the model isn't downloaded yet. Run docker exec -it ollama ollama pull model_name.

Slow responses: that's CPU inference. It's normal for 7B models to take 2-5 seconds to start responding on CPU, and the text then streams more slowly than ChatGPT. For faster results, use a smaller model like Phi-4 Mini or add a GPU.

Out of memory: the model is too big for your RAM. Stick to 7-8B models on 16 GB RAM. Ollama uses quantized models (Q4) by default to reduce memory usage.

WebSocket errors through proxy: check that your Nginx config includes the Upgrade and Connection headers for WebSocket support.

Can't register: by default the first user is admin. If registration got disabled, the admin can create accounts in Settings → Admin → Users.
