Ollama lets you run large language models locally with a single command. OpenAI-compatible API, support for Llama, Mistral, Gemma, and hundreds more. CPU mode works for small models; GPU unlocks 10x faster inference.
Good entry point: CPU inference for 7-8B models. See matching servers →
Recommended: full-speed inference for small models. See matching servers →
For 30-70B models and production workloads. See matching servers →
Looking for a specific GPU configuration? Browse all GPU dedicated server plans →
Install Ollama with a single command: curl -fsSL https://ollama.com/install.sh | sh. It handles everything: service setup, GPU detection, and model management.
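A minimal first session might look like the sketch below; the model name llama3 is just an example, pick any model from the Ollama library:

    # Install Ollama (same official script as above)
    curl -fsSL https://ollama.com/install.sh | sh

    # Download a model, then chat with it from the terminal
    ollama pull llama3
    ollama run llama3 "Explain what a context window is in two sentences."

The first pull downloads several gigabytes, so fetching the model usually takes far longer than the install itself.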
Ollama exposes an OpenAI-compatible REST API. Any application built for ChatGPT works with Ollama without code changes — just point the base URL to your server.
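As a rough sketch, a chat completion request against a local Ollama instance looks like this (11434 is the default port; llama3 is a placeholder model name):

    curl http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "llama3",
        "messages": [{"role": "user", "content": "Write a haiku about servers."}]
      }'

Existing OpenAI SDK clients work the same way: keep your code, change the base URL to http://your-server:11434/v1, and pass any non-empty string as the API key (Ollama ignores it, but the SDKs require one).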
Quantized models (Q4_K_M) reduce VRAM requirements by ~50% with minimal quality loss. A model that normally needs 16 GB VRAM runs comfortably in 8 GB.
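Pulling an explicitly quantized build is just a matter of choosing the right tag. The tag below is illustrative; check the model's page in the Ollama library for the tags that actually exist:

    # Q4_K_M build of an 8B instruct model (example tag)
    ollama pull llama3.1:8b-instruct-q4_K_M

    # Compare the on-disk size of the tags you have pulled
    ollama list

The size reported by ollama list is a rough proxy for the VRAM the model weights will occupy once loaded.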
Small 7-8B models run on CPU with 16 GB RAM — useful for development and testing. A GPU with 8+ GB VRAM delivers 10x faster inference, making it viable for real use.
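If you are unsure which side of that line your server falls on, Ollama will tell you where a loaded model ended up (the output below is illustrative):

    # Load a model once, then check how it is being served
    ollama run llama3 "hello" >/dev/null
    ollama ps
    # NAME           SIZE    PROCESSOR    UNTIL
    # llama3:latest  5.4 GB  100% GPU     4 minutes from now

A PROCESSOR value of "100% GPU" means the whole model fits in VRAM; a CPU/GPU split means it is partially offloaded and will run noticeably slower.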
Ollama is the easiest way to run open-source large language models on your own infrastructure. A single installation command gives you a local LLM server with an OpenAI-compatible API — point your existing applications at it without any code changes. Small models like Llama 3 8B and Mistral 7B run on CPU with 16 GB RAM, which works for development and experimentation. For production use or faster inference, a GPU server with 8+ GB VRAM delivers 10x the speed. Dedimax VPS plans from €9.99/mo cover CPU workloads; dedicated GPU servers handle anything from 7B to 70B parameter models.
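One practical detail when the model runs on a remote VPS or dedicated server: the install script sets Ollama up as a systemd service listening on 127.0.0.1 only, so you need to open it up before other machines can reach the API. A minimal sketch, assuming the standard ollama.service and a placeholder server IP:

    # On the server: listen on all interfaces instead of localhost only
    sudo systemctl edit ollama.service
    #   [Service]
    #   Environment="OLLAMA_HOST=0.0.0.0:11434"
    sudo systemctl restart ollama

    # On the client: point OpenAI-compatible tooling at the server
    export OPENAI_BASE_URL="http://203.0.113.10:11434/v1"
    export OPENAI_API_KEY="ollama"   # any non-empty value; Ollama ignores it

Ollama has no built-in authentication, so restrict access with a firewall rule, a VPN, or a reverse proxy rather than exposing the port to the whole internet.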
Take control of your dedicated server (settings, data ...) with no limits on which applications you install.
What are you waiting for?
We're waiting for you in the community zone, with more than 70 guides (sysadmin, gaming, DevOps...)!
Let me check