Recommended configurations

Ollama lets you run large language models locally with a single command. It exposes an OpenAI-compatible API and supports Llama, Mistral, Gemma, and hundreds of other models. CPU mode works for small models; a GPU delivers roughly 10x faster inference.

Small models, CPU

Llama 3 8B, Mistral 7B, Gemma 7B. Slower inference, functional for personal use.
from €9.99/mo
VPS
CPU: 4 cores
RAM: 16 GB
Storage: 50 GB NVMe
Network: 1 Gbps unlimited
Delivery: Instant

Good entry point — CPU inference for 7-8B models

See matching servers

Large models, GPU

Llama 3 70B, Mixtral 8x7B. Maximum capability, production-grade.
from €199.00/mo
Dedicated server
GPU: 24–80 GB VRAM
CPU: 8 cores
RAM: 64 GB
Storage: 200 GB NVMe
Network: 1 Gbps unlimited
Delivery: 24–72h

For 30-70B models and production workloads

See matching servers

Looking for a specific GPU configuration?

Browse all GPU dedicated server plans →

Why Ollama needs the right server

One-command install

Install Ollama with a single command: curl -fsSL https://ollama.com/install.sh | sh. It handles everything — service setup, GPU detection, and model management.
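A minimal sketch of a first setup on a fresh Linux server (the ollama systemd service name and default port 11434 match the standard Linux installer, but verify on your system):

    # download and run the official installer (sets up the ollama service and detects GPUs)
    curl -fsSL https://ollama.com/install.sh | sh

    # confirm the install and that the server is listening
    ollama --version
    systemctl status ollama
    curl http://localhost:11434   # should answer "Ollama is running"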

OpenAI-compatible API

Ollama exposes an OpenAI-compatible REST API. Any application built for ChatGPT works with Ollama without code changes — just point the base URL to your server.
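As an illustration, a chat completion request against the OpenAI-compatible endpoint looks roughly like this (assuming the default port 11434 and an already-pulled llama3 model):

    curl http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "model": "llama3",
            "messages": [{"role": "user", "content": "Say hello in one sentence."}]
          }'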

Quantized models cut VRAM

Quantized models (Q4_K_M) reduce VRAM requirements by ~50% with minimal quality loss. A model that normally needs 16 GB VRAM runs comfortably in 8 GB.
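For example, pulling a 4-bit quantized build instead of the default tag (the exact tag below is an assumption; check the model's page on ollama.com/library for the tags that actually exist):

    # hypothetical quantized tag; confirm it on the Ollama library first
    ollama pull llama3:8b-instruct-q4_K_M
    ollama show llama3:8b-instruct-q4_K_M   # inspect quantization level and parameters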

CPU works, GPU transforms it

Small 7-8B models run on CPU with 16 GB RAM — useful for development and testing. A GPU with 8+ GB VRAM delivers 10x faster inference, making it viable for real use.
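To check whether a loaded model is actually running on the GPU (ollama ps is available in recent Ollama releases):

    nvidia-smi                   # the ollama process should appear once a model is loaded
    ollama run llama3 "hello"    # load the model with a short prompt
    ollama ps                    # the PROCESSOR column shows GPU vs CPU placement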

Frequently asked questions

Can Ollama run without a GPU?

Yes. Ollama supports CPU-only inference. Models like Llama 3 8B and Mistral 7B run on CPU with 16 GB RAM — slower than GPU, but functional for development and personal use. A VPS from €9.99/mo covers CPU inference.

Which GPU do I need for Ollama?

For 7-8B models, a GPU with 8+ GB VRAM (RTX 3070 or similar) is sufficient. For 30-70B models like Llama 3 70B or Mixtral, you need 24–80 GB VRAM (RTX 4090 or A100). Quantized models reduce VRAM requirements by ~50%.
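A quick way to see how much VRAM a server actually exposes before picking a model (NVIDIA GPUs; the size ranges below are rough guidance, not hard limits):

    nvidia-smi --query-gpu=name,memory.total --format=csv
    # ~8 GB: quantized 7-8B models
    # ~24 GB: quantized 30B-class models
    # 48-80 GB: 70B models, even quantized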

Is Ollama compatible with my existing apps?

Yes. Ollama exposes an OpenAI-compatible API. Any application that uses the OpenAI SDK or REST API works with Ollama without code changes — update the base URL to your server and it works immediately.
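With the official OpenAI SDKs, the switch usually comes down to two environment variables (recent SDK versions read OPENAI_BASE_URL and OPENAI_API_KEY; Ollama ignores the key's value, but the SDKs require one to be set):

    export OPENAI_BASE_URL="http://your-server:11434/v1"   # your Ollama server's address
    export OPENAI_API_KEY="ollama"                         # any non-empty placeholder
    # the application now sends its chat completion calls to your server instead of api.openai.com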

How do I manage models with Ollama?

Use the Ollama CLI: ollama pull llama3 to download a model, ollama list to see installed models, ollama run llama3 for interactive chat. Models are stored as files — download them once and run them offline.
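The day-to-day workflow looks roughly like this:

    ollama pull llama3    # download a model once; it is stored locally and reusable offline
    ollama list           # show installed models and their size on disk
    ollama run llama3     # interactive chat session
    ollama show llama3    # print model details (parameters, template, quantization)
    ollama rm llama3      # delete a model and reclaim disk space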

Can I run multiple models at once?

Yes. Ollama can load and serve multiple models concurrently, limited by available VRAM. With 24 GB of VRAM you can keep two 7B models or one 13B model loaded at once; for CPU-only inference the limit is system RAM instead.
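Recent Ollama versions control this through server environment variables; a hedged example (the variable names assume a current release, and the right values depend on your VRAM):

    # keep up to two models resident and allow two parallel requests per model
    export OLLAMA_MAX_LOADED_MODELS=2
    export OLLAMA_NUM_PARALLEL=2
    ollama serve

If Ollama runs as the systemd service installed by the script, set these with systemctl edit ollama (an Environment= line in the override) rather than a shell export.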

Ollama is the easiest way to run open-source large language models on your own infrastructure. A single installation command gives you a local LLM server with an OpenAI-compatible API — point your existing applications at it without any code changes. Small models like Llama 3 8B and Mistral 7B run on CPU with 16 GB RAM, which works for development and experimentation. For production use or faster inference, a GPU server with 8+ GB VRAM delivers 10x the speed. Dedimax VPS plans from €9.99/mo cover CPU workloads; dedicated GPU servers handle anything from 7B to 70B parameter models.

Community zone

A question?
Find answers and share your knowledge!

We're waiting for you in the community zone. More than 70 guides (sysadmin, gaming, devops...)!

Check it out

Need a quote?

Write to us!

Contact us
