llama.cpp compilation with CUDA (Linux)

For a more detailed walkthrough, see llama.cpp’s official build guide.

[!IMPORTANT] Prerequisites: install cuda-toolkit¹, cmake, and git before proceeding.
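
A quick way to confirm the prerequisites are actually on your PATH is sketched below. `check_prereqs` is a hypothetical helper name made up for this note, not something shipped with llama.cpp or CUDA.

```shell
# Sketch: report which of the given tools can be found on PATH.
# check_prereqs is a hypothetical helper, not part of llama.cpp.
check_prereqs() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Non-zero exit status if at least one tool was not found.
    return "$missing"
}

# Usage:
#   check_prereqs nvcc cmake git
```

If nvcc is reported missing even though the toolkit is installed, it may simply not be on PATH (for example when it lives under /usr/local/cuda-VERSION/bin).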

Compilation

First, clone the repository:

$ git clone https://github.com/ggml-org/llama.cpp.git
$ cd llama.cpp

Then configure and build:

$ cmake -B build -DGGML_CUDA=ON
$ cmake --build build --config Release

If cmake cannot find nvcc (the CUDA compiler), or CUDACXX / CMAKE_CUDA_COMPILER points to the wrong place, add -DCMAKE_CUDA_COMPILER=cuda/toolkit/path (probably /usr/local/cuda-VERSION/bin/nvcc) and -DCUDAToolkit_ROOT=cuda/toolkit/root/path (probably /usr/local/cuda-VERSION) to the first cmake command.
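
As a rough sketch, both flags can be derived from the nvcc path alone, since the toolkit root is two directories above the binary. `cuda_cmake_flags` is a hypothetical helper name, and the path in the usage comment is only a placeholder for wherever nvcc sits on your machine.

```shell
# Sketch: build the two CUDA-related cmake flags from a given nvcc path.
# cuda_cmake_flags is a hypothetical helper, not part of llama.cpp or CMake.
cuda_cmake_flags() {
    nvcc_path="$1"
    # nvcc lives in <root>/bin/nvcc, so strip two path components to get the root.
    cuda_root="$(dirname "$(dirname "$nvcc_path")")"
    printf '%s %s\n' "-DCMAKE_CUDA_COMPILER=$nvcc_path" "-DCUDAToolkit_ROOT=$cuda_root"
}

# Usage (left unquoted on purpose so the two flags split into two arguments;
# this breaks if the path contains spaces):
#   cmake -B build -DGGML_CUDA=ON $(cuda_cmake_flags /usr/local/cuda-VERSION/bin/nvcc)
```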

If cmake warns that ccache is not found, add -DGGML_CCACHE=OFF (or install ccache).

During cmake --build build --config Release, if nvcc prints a warning saying no GPU was found, redo the configure step (the first cmake command) and add -DCMAKE_CUDA_ARCHITECTURES="COMPUTE_LEVELS_OF_YOUR_SYSTEM", where the compute level is the CUDA compute capability with the dot removed. For example, if you only have compute capability 8.6 devices, use -DCMAKE_CUDA_ARCHITECTURES="86". Use ; to separate different compute levels, e.g. "86;89".
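
If the NVIDIA driver is already working, the value can probably be generated rather than looked up. The sketch below assumes your nvidia-smi is recent enough to support the compute_cap query field; `caps_to_cmake_archs` is a hypothetical helper name.

```shell
# Sketch: turn compute capabilities such as "8.6" (one per line on stdin)
# into a CMAKE_CUDA_ARCHITECTURES value such as "86;89".
# caps_to_cmake_archs is a hypothetical helper, not a llama.cpp tool.
caps_to_cmake_archs() {
    # Drop dots and spaces, de-duplicate, then join the lines with ';'.
    tr -d '. ' | sort -u | paste -sd ';' -
}

# Usage on a machine with a working driver (assumes compute_cap is supported):
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader | caps_to_cmake_archs
```

Keep the quotes around the result when passing it to cmake, otherwise the shell treats ; as a command separator.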

If cmake says OpenSSL is not found, try installing libssl-dev (the OpenSSL development headers and libraries).

You may also need to reboot the machine.

Optional Nice-to-haves

Installing nvtop
Creating symlinks to the ./llama.cpp/build/bin folder and the llama.cpp models folder (on Linux: ~/.cache/llama.cpp/; on macOS: ~/Library/Caches/llama.cpp)
Creating an alias for updating llama.cpp (maybe also set up a cron job for it)
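
The symlink and update ideas could look roughly like this in ~/.bashrc or similar. Everything named here is an assumption: the function names `llama_links` and `llama_update`, the `LLAMA_DIR` variable, the default checkout location `$HOME/llama.cpp`, and the link names `llama-bin` and `llama-models` are made up for illustration. The update is written as a shell function rather than an alias because it chains several commands.

```shell
# Sketch: convenience helpers; all names and default paths are assumptions.

# Create symlinks to the build output and the Linux model cache inside a
# directory of your choice (defaults to $HOME).
llama_links() {
    target_dir="${1:-$HOME}"
    ln -sfn "${LLAMA_DIR:-$HOME/llama.cpp}/build/bin" "$target_dir/llama-bin"
    ln -sfn "$HOME/.cache/llama.cpp" "$target_dir/llama-models"
}

# Pull the latest sources and rebuild with CUDA. The body runs in a subshell
# (parentheses instead of braces) so the cd does not affect the calling shell,
# and the && chain stops at the first failing step.
llama_update() (
    cd "${LLAMA_DIR:-$HOME/llama.cpp}" &&
    git pull --ff-only &&
    cmake -B build -DGGML_CUDA=ON &&
    cmake --build build --config Release -j "$(nproc)"
)
```

For a cron job, the same commands would need to live in a standalone script, since cron does not load functions from your interactive shell config.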

#ai #ai/realworld #linux

¹ At least on Ubuntu you can just run sudo apt install nvidia-cuda-toolkit; it’s easier to install and manage because it comes from apt.²
² If you decide to follow the Nvidia instructions instead (they are obviously more official) and find it annoying to install only one specific version, you can install with something like sudo apt-get -y install cuda (asking for just cuda installs everything).