llama.cpp compilation with CUDA (Linux)
For more detailed guides check llama.cpp’s official build guide
[!IMPORTANT] Prerequisites
Install `cuda-toolkit` [1], `cmake`, and `git` before proceeding.
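A quick way to confirm the prerequisites are on your `PATH` before configuring. This is only a sketch: `missing_tools` is a hypothetical helper name, and checking for `nvcc` assumes the toolkit's `bin` directory is already on `PATH`.

```shell
# Hypothetical helper: print every tool from the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Prints nothing when all three prerequisites are available.
missing_tools nvcc cmake git
```

If `nvcc` shows up as missing even though the toolkit is installed, the compiler is probably just not on `PATH`, which is the case the `CMAKE_CUDA_COMPILER` note in the Compilation section covers.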
Compilation
First, `git clone` the repository:
$ git clone https://github.com/ggml-org/llama.cpp.git
$ cd llama.cpp
Then:
$ cmake -B build -DGGML_CUDA=ON
$ cmake --build build --config Release
If `CUDACXX` or `CMAKE_CUDA_COMPILER` is wrong (no `nvcc` CUDA compiler found), add `-DCMAKE_CUDA_COMPILER=cuda/toolkit/path` (probably `/usr/local/cuda-VERSION/bin/nvcc`) and `-DCUDAToolkit_ROOT=cuda/toolkit/root/path` (probably `/usr/local/cuda-VERSION`).
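As a rough sketch, both flags can be derived from a single toolkit root. `cuda_flags` is a hypothetical helper, and `/usr/local/cuda-12.4` is only an example path; substitute whatever version directory actually exists on your machine.

```shell
# Hypothetical helper: print the two CMake flags for a given CUDA toolkit root.
cuda_flags() {
  root="$1"
  printf '%s %s\n' \
    "-DCMAKE_CUDA_COMPILER=$root/bin/nvcc" \
    "-DCUDAToolkit_ROOT=$root"
}

# Example use (assumes the toolkit lives in /usr/local/cuda-12.4):
#   cmake -B build -DGGML_CUDA=ON $(cuda_flags /usr/local/cuda-12.4)
cuda_flags /usr/local/cuda-12.4
```

The command substitution is left unquoted in the example on purpose so the two flags are passed to `cmake` as separate arguments; that only works if the toolkit path contains no spaces.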
If CMake tells you `ccache` is not found, add `-DGGML_CCACHE=OFF`.
During `cmake --build build --config Release`, if `nvcc` spews out a warning saying no GPU was found, redo the first step (the `cmake -B build` configure command) and add `-DCMAKE_CUDA_ARCHITECTURES="COMPUTE_LEVELS_OF_YOUR_SYSTEM"`. For example, if you only have compute level 8.6 devices, use `-DCMAKE_CUDA_ARCHITECTURES="86"`. Use `;` to separate different compute levels, e.g. `86;89`.
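The architecture list can be built from the compute capabilities of the installed GPUs instead of typed by hand. A minimal sketch: `caps_to_archs` is a hypothetical helper, and the `nvidia-smi` query in the comment assumes a driver recent enough to support the `compute_cap` field.

```shell
# Hypothetical helper: turn lines like "8.6" into a CMake list like "86;89".
# Strips dots and spaces, drops duplicates, and joins the rest with ';'.
caps_to_archs() {
  tr -d '. ' | sort -u | paste -s -d ';' -
}

# Demo with hard-coded values (two 8.6 devices and one 8.9 device):
printf '8.6\n8.9\n8.6\n' | caps_to_archs   # prints 86;89

# On a real system (assumes nvidia-smi supports the compute_cap query):
#   archs="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | caps_to_archs)"
#   cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="$archs"
```

Keep the double quotes around `$archs` on the `cmake` line; otherwise the shell treats the `;` as a command separator.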
If `cmake` says OpenSSL is not found, try installing `libssl-dev` (the development headers and libraries for OpenSSL).
You may need to restart as well.
Optional Nice-to-haves
- Installing `nvtop`
- Creating a symlink to the `./llama.cpp/build/bin` folder and the `llama.cpp` models folder (on Linux: `~/.cache/llama.cpp/`; macOS: `~/Library/Caches/llama.cpp`)
- Creating an alias for updating `llama.cpp` (maybe also set up a `cron` job for that)
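One possible way to set up the symlinks and the update shortcut, e.g. in `~/.bashrc`. Everything named here is an assumption for illustration: the clone location `~/llama.cpp`, the link directory `~/llama`, and the helper names `link_llama_dirs` and `update_llama`. A shell function stands in for the alias because it handles the multi-step command more cleanly.

```shell
# Assumed locations; override by exporting these before sourcing the file.
LLAMA_REPO="${LLAMA_REPO:-$HOME/llama.cpp}"   # where the repo was cloned
LLAMA_LINKS="${LLAMA_LINKS:-$HOME/llama}"     # where the symlinks should live

# Create (or refresh) symlinks to the built binaries and the Linux model cache.
link_llama_dirs() {
  mkdir -p "$LLAMA_LINKS"
  ln -sfn "$LLAMA_REPO/build/bin" "$LLAMA_LINKS/bin"
  ln -sfn "$HOME/.cache/llama.cpp" "$LLAMA_LINKS/models"
}

# Pull the latest sources and rebuild; runs in a subshell so your
# current directory is left unchanged.
update_llama() (
  cd "$LLAMA_REPO" &&
    git pull &&
    cmake -B build -DGGML_CUDA=ON &&
    cmake --build build --config Release
)
```

After sourcing this, run `link_llama_dirs` once and `update_llama` whenever you want a fresh build. If you needed extra flags such as `-DCMAKE_CUDA_ARCHITECTURES` for the original build, add them to the configure line in `update_llama` as well.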
#ai #ai/realworld #linux
1. At least on Ubuntu you can just run `sudo apt install nvidia-cuda-toolkit`; it's easier to install and manage because it comes from `apt` [2].
2. If you decide to follow the Nvidia instructions instead (they are obviously more official) and find it annoying to install only one specific version, you can install with something like `sudo apt-get -y install cuda` (specifying just `cuda` installs everything).