Building libs for GPUs#

Building the GPU C library#

This document presents recipes for building the LLVM C library for a GPU architecture. The GPU build uses the same cross build support as the other targets. However, the GPU target must be built with an up-to-date clang compiler, because it relies on several compiler extensions for GPU architectures.

The LLVM C library currently supports two GPU targets: nvptx64-nvidia-cuda for NVIDIA GPUs and amdgcn-amd-amdhsa for AMD GPUs. Targeting these architectures is done through clang’s cross-compiling support using the --target=<triple> flag. The following sections describe how to build the GPU support specifically.

Once you have finished building, refer to Using libc for GPUs to get started with the newly built C library.

Bootstrap Build#

The recommended way to build the GPU libc is using a Bootstrap Build (see Build Concepts for an overview of build modes). This approach first builds the compiler (clang) and tools using the host compiler, and then automatically uses that newly-built compiler to build the C library for the GPU targets in a single CMake invocation. This is ideal for building for multiple GPU vendors at once.

Set the environment variables for the build:

export BUILD_DIR=build
export INSTALL_PREFIX=install
export BUILD_TYPE=Release

Configure the project:

cmake -G Ninja -S llvm -B $BUILD_DIR \
   -DLLVM_ENABLE_PROJECTS="clang;lld" \
   -DLLVM_ENABLE_RUNTIMES="openmp;offload" \
   -DCMAKE_BUILD_TYPE=$BUILD_TYPE \
   -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX \
   -DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=libc \
   -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=libc \
   -DLLVM_RUNTIME_TARGETS="default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda"

Build and install the libraries:

ninja -C $BUILD_DIR install

We need clang to build the GPU C library and lld to link AMDGPU executables, so we enable them in LLVM_ENABLE_PROJECTS. We add openmp to LLVM_ENABLE_RUNTIMES so it is built for the default target and provides OpenMP support. We then set RUNTIMES_<triple>_LLVM_ENABLE_RUNTIMES to enable libc for the GPU targets. LLVM_RUNTIME_TARGETS sets the targets to build; in this case we want the default target and the two GPU targets. Note that if libc were included in LLVM_ENABLE_RUNTIMES, it would also be built for the default host environment. Alternatively, you can point your build at the libc/cmake/caches/gpu.cmake cache file with -C.
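The cache-file alternative can be sketched as follows. This is a minimal sketch, assuming the command is run from the root of an llvm-project checkout and that the cache file supplies the project, runtime, and target settings shown in the explicit configure command; the existence check and the error message are illustrative additions, not part of the documented workflow.

```shell
# Assumption: run from the root of an llvm-project checkout.
BUILD_DIR="${BUILD_DIR:-build}"
INSTALL_PREFIX="${INSTALL_PREFIX:-install}"
BUILD_TYPE="${BUILD_TYPE:-Release}"
CACHE_FILE="libc/cmake/caches/gpu.cmake"

if [ -f "$CACHE_FILE" ]; then
  # -C preloads the cache file, so the per-target options are not repeated here.
  cmake -G Ninja -S llvm -B "$BUILD_DIR" \
     -C "$CACHE_FILE" \
     -DCMAKE_BUILD_TYPE="$BUILD_TYPE" \
     -DCMAKE_INSTALL_PREFIX="$INSTALL_PREFIX"
else
  # Illustrative guard: avoid configuring from the wrong directory.
  echo "error: $CACHE_FILE not found; run this from the llvm-project root" >&2
fi
```

The build and install step is unchanged: ninja -C $BUILD_DIR install.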

Two-stage Cross-compiler Build#

Alternatively, you can use a manual Two-stage Cross-compiler Build (see Build Concepts). This separates the build into two distinct CMake invocations: first to build the host compiler and tools (Stage 1), and then to build the target library using those tools (Stage 2). This provides more direct control over the configuration and is useful if you only need to build for a single target or want to avoid the complexity of the combined bootstrap build.

First, build the host compiler. Set up the variables for the host build:

export HOST_BUILD_DIR=build-libc-tools
export HOST_C_COMPILER=clang
export HOST_CXX_COMPILER=clang++

Configure the host build:

cmake -G Ninja -S llvm -B $HOST_BUILD_DIR \
   -DLLVM_ENABLE_PROJECTS="clang" \
   -DCMAKE_C_COMPILER=$HOST_C_COMPILER \
   -DCMAKE_CXX_COMPILER=$HOST_CXX_COMPILER \
   -DLLVM_LIBC_FULL_BUILD=ON \
   -DCMAKE_BUILD_TYPE=Release

Build the host tools:

ninja -C $HOST_BUILD_DIR
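Before moving on, it can be worth confirming that the freshly built compiler includes the GPU backends. The check below is a sketch, not part of the documented workflow: it assumes the host build left the default set of LLVM targets enabled, and it uses clang's --print-targets option to list the registered backends.

```shell
HOST_BUILD_DIR="${HOST_BUILD_DIR:-build-libc-tools}"
HOST_CLANG="$HOST_BUILD_DIR/bin/clang"

if [ -x "$HOST_CLANG" ]; then
  # Look for the AMDGPU and NVPTX backends in the list of registered targets.
  for backend in amdgcn nvptx64; do
    if "$HOST_CLANG" --print-targets | grep -qw "$backend"; then
      echo "$backend: available"
    else
      echo "$backend: missing"
    fi
  done
else
  echo "no compiler at $HOST_CLANG yet; build the host tools first" >&2
fi
```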

Once this has finished, use the newly built compiler to build the C library for the GPU. Select your target architecture (amdgcn-amd-amdhsa or nvptx64-nvidia-cuda).

Set up the variables for the target build:

export TARGET_TRIPLE=amdgcn-amd-amdhsa # or nvptx64-nvidia-cuda
export TARGET_BUILD_DIR=build
export TARGET_C_COMPILER=$HOST_BUILD_DIR/bin/clang
export TARGET_CXX_COMPILER=$HOST_BUILD_DIR/bin/clang++

Configure the target build:

cmake -G Ninja -S runtimes -B $TARGET_BUILD_DIR \
   -DLLVM_ENABLE_RUNTIMES=libc \
   -DCMAKE_C_COMPILER=$TARGET_C_COMPILER \
   -DCMAKE_CXX_COMPILER=$TARGET_CXX_COMPILER \
   -DLLVM_LIBC_FULL_BUILD=ON \
   -DLLVM_DEFAULT_TARGET_TRIPLE=$TARGET_TRIPLE \
   -DCMAKE_BUILD_TYPE=Release

Build and install the target library:

ninja -C $TARGET_BUILD_DIR install

The above steps will result in a build targeting one of the supported GPU architectures. Building for multiple targets requires separate CMake invocations.
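As a sketch of what separate invocations could look like, the loop below repeats the configure and install steps once per triple. The per-triple build directory name (build-<triple>) is only an illustrative choice, and the guard that skips a target when the host compiler is absent is an addition for safety; the CMake options are the ones used in the target build above.

```shell
HOST_BUILD_DIR="${HOST_BUILD_DIR:-build-libc-tools}"

for TARGET_TRIPLE in amdgcn-amd-amdhsa nvptx64-nvidia-cuda; do
  # One build directory per triple; this naming scheme is hypothetical.
  TARGET_BUILD_DIR="build-$TARGET_TRIPLE"
  if [ ! -x "$HOST_BUILD_DIR/bin/clang" ]; then
    echo "skipping $TARGET_TRIPLE: no compiler in $HOST_BUILD_DIR/bin" >&2
    continue
  fi
  # Same configure step as above, then install if configuration succeeded.
  cmake -G Ninja -S runtimes -B "$TARGET_BUILD_DIR" \
     -DLLVM_ENABLE_RUNTIMES=libc \
     -DCMAKE_C_COMPILER="$HOST_BUILD_DIR/bin/clang" \
     -DCMAKE_CXX_COMPILER="$HOST_BUILD_DIR/bin/clang++" \
     -DLLVM_LIBC_FULL_BUILD=ON \
     -DLLVM_DEFAULT_TARGET_TRIPLE="$TARGET_TRIPLE" \
     -DCMAKE_BUILD_TYPE=Release &&
  ninja -C "$TARGET_BUILD_DIR" install
done
```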

Build overview#

Once installed, the GPU build will create several files used for different targets. This section will briefly describe their purpose.

include/<target-triple>

The include directory where all of the generated headers for the target will go. These definitions are strictly for the GPU when being targeted directly.

lib/clang/<llvm-major-version>/include/llvm-libc-wrappers/llvm-libc-decls

These are wrapper headers created for offloading languages like CUDA, HIP, or OpenMP. They contain functions supported in the GPU libc along with attributes and metadata that declare them on the target device and make them compatible with the host headers.

lib/<target-triple>/libc.a

The main C library static archive containing LLVM-IR targeting the given GPU. It can be linked directly or inspected depending on the target support.

lib/<target-triple>/libm.a

The C library static archive providing implementations of the standard math functions.

lib/<target-triple>/libc.bc

An alternate form of the library provided as a single LLVM-IR bitcode blob. This can be used similarly to NVIDIA’s or AMD’s device libraries.

lib/<target-triple>/libm.bc

An alternate form of the library provided as a single LLVM-IR bitcode blob containing the standard math functions.

lib/<target-triple>/crt1.o

An LLVM-IR file containing startup code to call the main function on the GPU. This is used similarly to the standard C library startup object.

bin/amdhsa-loader

A binary utility used to launch executables compiled targeting the AMD GPU. This will be included if the build system found the hsa-runtime64 library either in /opt/rocm or the current CMake installation directory. This is required to build the GPU tests. See the libc GPU usage documentation for more information.

bin/nvptx-loader

A binary utility used to launch executables compiled targeting the NVIDIA GPU. This will be included if the build system found the CUDA driver API. This is required for building tests.

include/llvm-libc-rpc-server.h

A header file containing definitions that can be used to interface with the RPC server.

lib/libllvmlibc_rpc_server.a

The static library containing the implementation of the RPC server. This can be used to enable host services for anyone looking to interface with the RPC client.
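To check an install tree against the per-target entries above, a small helper along the following lines may be handy. This is a hypothetical sketch: check_gpu_libc_install is not a tool shipped with LLVM, and it only looks at the include directory and the lib/<target-triple> files listed in this section, not the loaders or the RPC server files, which depend on what the build system detected.

```shell
# check_gpu_libc_install PREFIX TRIPLE  (hypothetical helper, not part of LLVM)
# Prints one "found"/"missing" line per per-target entry and
# returns non-zero if any entry is missing.
check_gpu_libc_install() {
  prefix=$1
  triple=$2
  status=0
  if [ -d "$prefix/include/$triple" ]; then
    echo "found   include/$triple"
  else
    echo "missing include/$triple"
    status=1
  fi
  for f in libc.a libm.a libc.bc libm.bc crt1.o; do
    if [ -f "$prefix/lib/$triple/$f" ]; then
      echo "found   lib/$triple/$f"
    else
      echo "missing lib/$triple/$f"
      status=1
    fi
  done
  return $status
}
```

For example, check_gpu_libc_install install amdgcn-amd-amdhsa inspects the install prefix used in the bootstrap build.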

CMake options#

This section briefly lists a few of the CMake variables that specifically control the GPU build of the C library. These options can be passed individually to each target using -DRUNTIMES_<target>_<variable>=<value> when using a standard runtime build.

LLVM_LIBC_FULL_BUILD:BOOL

This flag controls whether or not the libc build will generate its own headers. This must always be on when targeting the GPU.

LIBC_GPU_BUILD:BOOL

Shorthand for enabling GPU support. Equivalent to enabling support for both AMDGPU and NVPTX builds for libc.

LIBC_GPU_TEST_ARCHITECTURE:STRING

Sets the architecture for which the GPU tests are built, such as gfx90a for AMD GPUs or sm_80 for NVIDIA GPUs. The default behavior is to detect the system’s GPU architecture using the native option. If this option is not set and no GPU is detected, the tests will not be built.

LIBC_GPU_TEST_JOBS:STRING

Sets the number of threads used to run GPU tests. The GPU test suite will commonly run out of resources if this is not constrained so it is recommended to keep it low. The default value is a single thread.

CMAKE_CROSSCOMPILING_EMULATOR:STRING

Overrides the default loader used for running GPU tests. This is set automatically to llvm-gpu-loader for GPU runtime targets when building via the runtimes build.
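To illustrate the -DRUNTIMES_<target>_<variable>=<value> form described at the top of this section, the sketch below composes per-target flags for two of the options listed. The runtime_opt helper is hypothetical and exists only to show the naming pattern; the architecture values are the examples given above, and the resulting flags are meant to be appended to the configure command of a runtimes build.

```shell
# runtime_opt TRIPLE VARIABLE VALUE  (hypothetical helper)
# Prints the per-target spelling of a CMake option.
runtime_opt() {
  printf '%s%s_%s=%s\n' '-DRUNTIMES_' "$1" "$2" "$3"
}

AMD_ARCH=$(runtime_opt amdgcn-amd-amdhsa LIBC_GPU_TEST_ARCHITECTURE gfx90a)
NVPTX_ARCH=$(runtime_opt nvptx64-nvidia-cuda LIBC_GPU_TEST_ARCHITECTURE sm_80)
AMD_JOBS=$(runtime_opt amdgcn-amd-amdhsa LIBC_GPU_TEST_JOBS 1)

# Show the flags; append them to the cmake configure command.
echo "$AMD_ARCH"
echo "$NVPTX_ARCH"
echo "$AMD_JOBS"
```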