NixOS - running LLM locally using ollama and Docker

2024-02-26

Ollama is now available as an official Docker image, so we can run it on any machine with docker installed. Since I have an NVIDIA RTX 2070 SUPER I would like to use the GPU to accelerate the LLM to be blazingly fast. First we need to enable proprietary NVIDIA drivers on NixOS machine in configuration.nix:

# Enable Nvidia and OpenGL
hardware = common.hardware // {
  opengl = {
    enable = true;
    driSupport = true;
    driSupport32Bit = true;
  };
  nvidia = {
    modesetting.enable = true;
    powerManagement.enable = false;
    powerManagement.finegrained = false;
    open = false;
    nvidiaSettings = true;
    package = config.boot.kernelPackages.nvidiaPackages.stable;
  };
};
services = common.services // {
  xserver = common.services.xserver // {
    videoDrivers = [ "nvidia" ];
  };
};

Then we need to also enable Docker with NVIDIA support enabled in configuration.nix:

# nvidia docker
virtualisation = {
  docker = {
    enable = true;
    enableNvidia = true;
  };
};

Now we need to rebuild the system with nixos-rebuild switch to apply the changes to the NixOS system. After successful install I recommend rebooting the machine. After reboot we can test if the NVIDIA drivers are working using nvidia-smi command:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2070 ...    Off | 00000000:01:00.0  On |                  N/A |
| 24%   30C    P8              23W / 215W |    649MiB /  8192MiB |      8%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Seems to be working so we can pull the ollama container and run it with NVIDIA GPU support:

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Now you can run any LLM model from ollama model library, for example mistral:

$ docker exec -it ollama ollama run mistral
>>> What is 1 + 1? Answer only.
 The answer to the expression "1 + 1" is 2.