
NixOS - running an LLM locally using ollama and Docker

2024-02-26

Ollama is now available as an official Docker image, so we can run it on any machine with Docker installed. Since I have an NVIDIA RTX 2070 SUPER, I would like to use the GPU to make inference blazingly fast. First we need to enable the proprietary NVIDIA drivers on the NixOS machine in configuration.nix:

  # Enable NVIDIA drivers and OpenGL
  # (common.* comes from my shared configuration; // merges the
  # GPU-specific attributes on top of it)
  hardware = common.hardware // {
    opengl = {
      enable = true;
      driSupport = true;
      driSupport32Bit = true; # needed for 32-bit applications such as Steam
    };
    nvidia = {
      modesetting.enable = true; # kernel modesetting, required by most modern setups
      powerManagement.enable = false;
      powerManagement.finegrained = false;
      open = false; # use the proprietary kernel module, not the open one
      nvidiaSettings = true; # install the nvidia-settings tool
      package = config.boot.kernelPackages.nvidiaPackages.stable;
    };
  };
  services = common.services // {
    xserver = common.services.xserver // {
      videoDrivers = [ "nvidia" ];
    };
  };

Then we also need to enable Docker with NVIDIA support in configuration.nix:

  # Docker with NVIDIA GPU support
  virtualisation = {
    docker = {
      enable = true;
      enableNvidia = true; # pulls in the NVIDIA container runtime so --gpus works
    };
  };
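
Note that on NixOS releases newer than this post, virtualisation.docker.enableNvidia has been deprecated in favor of the CDI-based NVIDIA Container Toolkit module. If the option above prints a deprecation warning during the rebuild, something like the following should be the replacement (check the release notes of your nixpkgs version):

  hardware.nvidia-container-toolkit.enable = true;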

Now we need to rebuild the system with nixos-rebuild switch to apply the changes. After a successful rebuild I recommend rebooting the machine so the new kernel modules are loaded.
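
For reference, the whole sequence is just (assuming your configuration lives in the default /etc/nixos):

sudo nixos-rebuild switch
sudo reboot

After the reboot we can test whether the NVIDIA drivers are working using the nvidia-smi command: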

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2070 ...    Off | 00000000:01:00.0  On |                  N/A |
| 24%   30C    P8              23W / 215W |    649MiB /  8192MiB |      8%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
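
The host can see the GPU; to confirm that containers can as well, one quick check is to run nvidia-smi inside a CUDA base image (the image tag below is only an example, any recent one should work):

docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi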

Everything seems to be working, so we can pull the ollama container image and run it with NVIDIA GPU support:

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
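
Before pulling a model it is worth peeking at the container logs; when GPU acceleration is active, ollama reports the detected GPU on startup (the exact log wording varies between versions):

docker logs ollama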

Now you can run any model from the ollama model library, for example mistral:

$ docker exec -it ollama ollama run mistral
>>> What is 1 + 1? Answer only.
 The answer to the expression "1 + 1" is 2.
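
Since the container exposes ollama's HTTP API on port 11434, other programs on the machine can use the model too. A minimal example with curl, using the /api/generate endpoint from ollama's documented REST API:

curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "What is 1 + 1? Answer only.",
  "stream": false
}'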