Using libc for GPUs

Building the GPU library

LLVM’s libc GPU support must be built with an up-to-date clang compiler due to heavy reliance on clang’s GPU support. This can be done automatically using the LLVM_ENABLE_RUNTIMES=libc option. To enable libc for the GPU, enable the LIBC_GPU_BUILD option. By default, libcgpu.a will be built using every supported GPU architecture. To restrict the number of architectures build, either set LIBC_GPU_ARCHITECTURES to the list of desired architectures manually or use native to detect the GPUs on your system. A typical cmake configuration will look like this:

$> cd llvm-project  # The llvm-project checkout
$> mkdir build
$> cd build
$> cmake ../llvm -G Ninja                                \
   -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt"        \
   -DLLVM_ENABLE_RUNTIMES="libc;openmp"                  \
   -DCMAKE_BUILD_TYPE=<Debug|Release>   \ # Select build type
   -DLIBC_GPU_BUILD=ON                  \ # Build in GPU mode
   -DLIBC_GPU_ARCHITECTURES=all         \ # Build all supported architectures
   -DCMAKE_INSTALL_PREFIX=<PATH>        \ # Where 'libcgpu.a' will live
$> ninja install

Since we want to include clang, lld and compiler-rt in our toolchain, we list them in LLVM_ENABLE_PROJECTS. To ensure libc is built using a compatible compiler and to support openmp offloading, we list them in LLVM_ENABLE_RUNTIMES to build them after the enabled projects using the newly built compiler. CMAKE_INSTALL_PREFIX specifies the installation directory in which to install the libcgpu.a library and headers along with LLVM. The generated headers will be placed in include/gpu-none-llvm.

Usage

Once the libcgpu.a static archive has been built it can be linked directly with offloading applications as a standard library. This process is described in the clang documentation. This linking mode is used by the OpenMP toolchain, but is currently opt-in for the CUDA and HIP toolchains through the --offload-new-driver` and -fgpu-rdc flags. A typical usage will look this this:

$> clang foo.c -fopenmp --offload-arch=gfx90a -lcgpu

The libcgpu.a static archive is a fat-binary containing LLVM-IR for each supported target device. The supported architectures can be seen using LLVM’s llvm-objdump with the --offloading flag:

$> llvm-objdump --offloading libcgpu.a
libcgpu.a(strcmp.cpp.o):    file format elf64-x86-64

OFFLOADING IMAGE [0]:
kind            llvm ir
arch            gfx90a
triple          amdgcn-amd-amdhsa
producer        none

Because the device code is stored inside a fat binary, it can be difficult to inspect the resulting code. This can be done using the following utilities:

$> llvm-ar x libcgpu.a strcmp.cpp.o
$> clang-offload-packager strcmp.cpp.o --image=arch=gfx90a,file=gfx90a.bc
$> opt -S out.bc
...

Please note that this fat binary format is provided for compatibility with existing offloading toolchains. The implementation in libc does not depend on any existing offloading languages and is completely freestanding.