math.h

Source Locations

Implementation Requirements / Goals

  • The highest priority is to be as accurate as possible, according to the C and IEEE 754 standards. By default, we aim to be correctly rounded in all rounding modes. Computations are performed, and final results are produced, in the current rounding mode of the floating-point environment.

    • To test for correctness, we compare the outputs against correctly rounded references such as the GNU MPFR multiple-precision library or the CORE-MATH library.

  • Our next requirement is that the outputs are consistent across all platforms. Notice that the consistency requirement will be satisfied automatically if the implementation is correctly rounded.

  • Our last requirement for the implementations is to have good and predictable performance:

    • The average performance should be comparable to other libc implementations.

    • The worst case performance should be within 10X-20X of the average.

    • Platform-specific implementations or instructions could be added whenever it makes sense and provides a significant performance boost.

  • For other use cases with strict requirements on code size, memory footprint, or latency, such as embedded systems, we will aim to be as accurate as possible within the memory or latency budgets, and consistent across all platforms.

Add a new math function to LLVM libc

Implementation Status

Basic Operations

<Func>

<Func_f> (float)

<Func> (double)

<Func_l> (long double)

<Func_f16> (float16)

<Func_f128> (float128)

<Func_bf16> (bfloat16)

C23 Definition Section

C23 Error Handling Section

ceil

7.12.9.1

F.10.6.1

canonicalize

7.12.11.7

F.10.8.7

copysign

7.12.11.1

F.10.8.1

dadd

N/A

N/A

N/A

*

N/A

7.12.14.1

F.10.11

ddiv

N/A

N/A

N/A

*

N/A

7.12.14.4

F.10.11

dfma

N/A

N/A

N/A

*

N/A

7.12.14.5

F.10.11

dmul

N/A

N/A

N/A

*

N/A

7.12.14.3

F.10.11

dsub

N/A

N/A

N/A

*

N/A

7.12.14.2

F.10.11

f16add

*

*

*

N/A

N/A

7.12.14.1

F.10.11

f16div

*

*

*

N/A

N/A

7.12.14.4

F.10.11

f16fma

*

*

*

N/A

N/A

7.12.14.5

F.10.11

f16mul

*

*

*

N/A

N/A

7.12.14.3

F.10.11

f16sub

*

*

*

N/A

N/A

7.12.14.2

F.10.11

bf16add

*

*

*

N/A

N/A

7.12.14.1

F.10.11

bf16div

*

*

*

N/A

N/A

7.12.14.4

F.10.11

bf16fma

*

*

*

N/A

N/A

7.12.14.5

F.10.11

bf16mul

*

*

*

N/A

N/A

7.12.14.3

F.10.11

bf16sub

*

*

*

N/A

N/A

7.12.14.2

F.10.11

fabs

7.12.7.3

F.10.4.3

fadd

N/A

N/A

N/A

7.12.14.1

F.10.11

fdim

7.12.12.1

F.10.9.1

fdiv

N/A

N/A

*

N/A

7.12.14.4

F.10.11

ffma

N/A

N/A

*

N/A

7.12.14.5

F.10.11

floor

7.12.9.2

F.10.6.2

fmax

7.12.12.2

F.10.9.2

fmaximum

7.12.12.4

F.10.9.4

fmaximum_mag

7.12.12.6

F.10.9.4

fmaximum_mag_num

7.12.12.10

F.10.9.5

fmaximum_num

7.12.12.8

F.10.9.5

fmin

7.12.12.3

F.10.9.3

fminimum

7.12.12.5

F.10.9.4

fminimum_mag

7.12.12.7

F.10.9.4

fminimum_mag_num

7.12.12.11

F.10.9.5

fminimum_num

7.12.12.9

F.10.9.5

fmod

7.12.10.1

F.10.7.1

fmul

N/A

N/A

*

N/A

7.12.14.3

F.10.11

frexp

7.12.6.7

F.10.3.7

fromfp

7.12.9.10

F.10.6.10

fromfpx

7.12.9.11

F.10.6.11

fsub

N/A

N/A

*

N/A

7.12.14.2

F.10.11

getpayload

F.10.13.1

N/A

ilogb

7.12.6.8

F.10.3.8

iscanonical

7.12.3.2

N/A

issignaling

7.12.3.8

N/A

ldexp

7.12.6.9

F.10.3.9

llogb

7.12.6.10

F.10.3.10

llrint

7.12.9.5

F.10.6.5

llround

7.12.9.7

F.10.6.7

logb

7.12.6.17

F.10.3.17

lrint

7.12.9.5

F.10.6.5

lround

7.12.9.7

F.10.6.7

modf

7.12.6.18

F.10.3.18

nan

7.12.11.2

F.10.8.2

nearbyint

7.12.9.3

F.10.6.3

nextafter

7.12.11.3

F.10.8.3

nextdown

7.12.11.6

F.10.8.6

nexttoward

N/A

7.12.11.4

F.10.8.4

nextup

7.12.11.5

F.10.8.5

remainder

7.12.10.2

F.10.7.2

remquo

7.12.10.3

F.10.7.3

rint

7.12.9.4

F.10.6.4

round

7.12.9.6

F.10.6.6

roundeven

7.12.9.8

F.10.6.8

scalbln

7.12.6.19

F.10.3.19

scalbn

7.12.6.19

F.10.3.19

setpayload

F.10.13.2

N/A

setpayloadsig

F.10.13.3

N/A

totalorder

F.10.12.1

N/A

totalordermag

F.10.12.2

N/A

trunc

7.12.9.9

F.10.6.9

ufromfp

7.12.9.10

F.10.6.10

ufromfpx

7.12.9.11

F.10.6.11
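The narrowing operations in the table above (`fadd`, `dadd`, `f16add`, and so on, C23 7.12.14) return the exact result rounded once to the narrower type. Computing `x + y` in `double` and then casting to `float` is not equivalent, because the two roundings can land on the wrong side of a tie.

The sketch below, named `fadd_sketch` (a hypothetical name), avoids this problem with round-to-odd. The TwoSum error term tells whether the `double` sum was inexact. If it was, the sum is nudged to its odd neighbor, and a single conversion to `float` is then correctly rounded. This is a textbook technique and not necessarily how LLVM libc implements `fadd`. The sketch assumes:

- round-to-nearest mode,
- `FLT_EVAL_METHOD == 0` (for example, SSE2 on x86-64),
- finite inputs whose `double` sum does not overflow.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of fadd(x, y): x + y rounded once to float. */
static float fadd_sketch(double x, double y) {
  assert(isfinite(x) && isfinite(y));
  double s = x + y;
  /* TwoSum (Knuth): e is the exact rounding error, so x + y == s + e. */
  double bb = s - x;
  double e = (x - (s - bb)) + (y - bb);
  if (e != 0.0) {
    uint64_t bits;
    memcpy(&bits, &s, sizeof bits);
    if ((bits & 1) == 0) {
      /* Round to odd: the exact sum lies strictly between s and its
         neighbor in the direction of e, and that neighbor is odd.
         Same signs: step away from zero; opposite signs: toward zero. */
      if ((s > 0.0) == (e > 0.0))
        bits += 1;
      else
        bits -= 1;
      memcpy(&s, &bits, sizeof s);
    }
  }
  /* Round-to-odd at 53 bits followed by round-to-nearest at 24 bits equals
     a single rounding to 24 bits, because 53 >= 24 + 2. */
  return (float)s;
}
```

With `y = 0x1p-24 + 0x1p-54`, the naive `(float)(1.0 + y)` first rounds the sum to the exact `float` midpoint `1 + 2^-24`, and the tie then goes to `1.0f`. The single rounding performed by `fadd_sketch(1.0, y)` gives `1.0f + 0x1p-23f`, because the exact sum lies just above that midpoint.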

Higher Math Functions

<Func>

<Func_f> (float)

<Func> (double)

<Func_l> (long double)

<Func_f16> (float16)

<Func_f128> (float128)

<Func_bf16> (bfloat16)

C23 Definition Section

C23 Error Handling Section

acos

7.12.4.1

F.10.1.1

acosh

7.12.5.1

F.10.2.1

acospi

7.12.4.8

F.10.1.8

asin

7.12.4.2

F.10.1.2

asinh

7.12.5.2

F.10.2.2

asinpi

7.12.4.9

F.10.1.9

atan

1 ULP

7.12.4.3

F.10.1.3

atan2

1 ULP

1 ULP

7.12.4.4

F.10.1.4

atan2pi

7.12.4.11

F.10.1.11

atanh

7.12.5.3

F.10.2.3

atanpi

7.12.4.10

F.10.1.10

cbrt

7.12.7.1

F.10.4.1

compoundn

7.12.7.2

F.10.4.2

cos

7.12.4.5

F.10.1.5

cosh

7.12.5.4

F.10.2.4

cospi

7.12.4.12

F.10.1.12

dsqrt

N/A

N/A

N/A

*

7.12.14.6

F.10.11

erf

7.12.8.1

F.10.5.1

erfc

7.12.8.2

F.10.5.2

exp

7.12.6.1

F.10.3.1

exp10

7.12.6.2

F.10.3.2

exp10m1

7.12.6.3

F.10.3.3

exp2

7.12.6.4

F.10.3.4

exp2m1

7.12.6.5

F.10.3.5

expm1

7.12.6.6

F.10.3.6

fma

7.12.13.1

F.10.10.1

f16sqrt

*

*

*

N/A

7.12.14.6

F.10.11

fsqrt

N/A

N/A

*

7.12.14.6

F.10.11

hypot

7.12.7.4

F.10.4.4

lgamma

7.12.8.3

F.10.5.3

log

?

7.12.6.11

F.10.3.11

log10

7.12.6.12

F.10.3.12

log10p1

7.12.6.13

F.10.3.13

log1p

7.12.6.14

F.10.3.14

log2

7.12.6.15

F.10.3.15

log2p1

7.12.6.16

F.10.3.16

logp1

7.12.6.14

F.10.3.14

pow

1 ULP

7.12.7.5

F.10.4.5

powi*

pown

7.12.7.6

F.10.4.6

powr

7.12.7.7

F.10.4.7

rootn

7.12.7.8

F.10.4.8

rsqrt

7.12.7.9

F.10.4.9

sin

7.12.4.6

F.10.1.6

sincos

sinh

7.12.5.5

F.10.2.5

sinpi

7.12.4.13

F.10.1.13

sqrt

7.12.7.10

F.10.4.10

tan

7.12.4.7

F.10.1.7

tanh

7.12.5.6

F.10.2.6

tanpi

7.12.4.14

F.10.1.14

tgamma

7.12.8.4

F.10.5.4

Legends:

  • ✅: correctly rounded for all 4 rounding modes.

  • CR: correctly rounded for the default rounding mode (round-to-nearest, ties-to-even).

  • x ULPs: largest error recorded, in units in the last place.

  • N/A: Not defined in the standard or will not be added.

  • *: LLVM libc extension.

  • ?: Because the name logbf16 would be ambiguous between the float16 logb function and the bfloat16 log function, the latter is implemented as log_bf16.
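To make the default rounding mode in the CR entry concrete: under round-to-nearest, ties-to-even, a value exactly halfway between two adjacent floats rounds to the one whose last significand bit is zero. The small sketch below uses a hypothetical helper name. It builds `double` values that are exact midpoints between consecutive `float`s near 1.0, where the `float` spacing is 2^-23, and converts them in the default mode.

```c
#include <assert.h>

/* Near 1.0, adjacent floats are 2^-23 apart, so 1.0 + k * 2^-24 with odd k
   lies exactly halfway between two floats (and is exact as a double). */
static float midpoint_to_float(int k) {
  assert(k > 0 && k % 2 == 1 && k < (1 << 24));
  double midpoint = 1.0 + k * 0x1p-24;
  /* Default rounding: the tie goes to the float whose last bit is 0. */
  return (float)midpoint;
}
```

Both `1 + 3·2^-24` and `1 + 5·2^-24` round to `1 + 2^-22`, because in each case that is the neighbor with an even significand. Ties therefore do not always round up or always round down.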

GPU Conformance

  • Conformance tests are located in offload/unittests/Conformance.

  • The math functions for GPUs are compiled with the following optimization options: LIBC_MATH_SKIP_ACCURATE_PASS, LIBC_MATH_INTERMEDIATE_COMP_IN_FLOAT, LIBC_MATH_SMALL_TABLES, LIBC_MATH_NO_ERRNO, and LIBC_MATH_NO_EXCEPT.

  • The conformance test results for higher math functions on GPUs are reported in the table below. Each entry is the maximum observed ULP distance between a given GPU implementation and the corresponding correctly rounded LLVM libc implementation, which runs on the host CPU and serves as the reference. For comparison, results for CUDA Math and HIP Math against the same reference are also included.
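A ULP distance counts how many representable values separate a result from the reference. One common way to compute it for `float` is to map each bit pattern to an integer that increases monotonically with the floating-point value, so that +0 and -0 coincide, and then take the absolute difference. The sketch below does this under the assumption that neither input is NaN. The helper names are hypothetical, and the conformance test code may use a different convention.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Map a float onto a signed integer line where consecutive floats are
   consecutive integers and both -0.0 and +0.0 map to 0. */
static int64_t float_to_ordinal(float x) {
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);
  int64_t magnitude = bits & 0x7FFFFFFFu;
  return (bits >> 31) ? -magnitude : magnitude;
}

/* 0 for equal values (including -0.0 vs +0.0), 1 for adjacent floats, and
   so on. Assumes neither argument is NaN. */
static uint64_t ulp_distance(float a, float b) {
  assert(!isnan(a) && !isnan(b));
  int64_t d = float_to_ordinal(a) - float_to_ordinal(b);
  return (uint64_t)(d < 0 ? -d : d);
}
```

Because the mapping is monotonic across binade boundaries and across zero, the distance from 1.0f to the float just below it is 1, even though the spacing halves there.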

Function

Test Method

ULP Tolerance

Max ULP Distance

LLVM libc (AMDGPU)

LLVM libc (CUDA)

CUDA Math (CUDA)

HIP Math (AMDGPU)

acos

Randomized

4

6 (FAILED)

6 (FAILED)

1

1

acosf

Exhaustive

4

1

1

1

1

acosf16

Exhaustive

2

1

1

1

acoshf

Exhaustive

4

1

1

2

1

acoshf16

Exhaustive

2

0

0

0

acospif16

Exhaustive

2

0

0

asin

Randomized

4

6 (FAILED)

6 (FAILED)

2

1

asinf

Exhaustive

4

1

1

1

3

asinf16

Exhaustive

2

0

0

2

asinhf

Exhaustive

4

1

1

2

1

asinhf16

Exhaustive

2

1

1

1

atanf

Exhaustive

5

0

0

1

2

atanf16

Exhaustive

2

1

1

1

atan2f

Randomized

6

1

1

2

3

atanhf

Exhaustive

5

0

0

3

1

atanhf16

Exhaustive

2

0

0

1

cbrt

Randomized

2

1

1

1

1

cbrtf

Exhaustive

2

0

0

1

1

cos

Randomized

4

1

1

2

1

cosf

Exhaustive

4

1

1

2

2

cosf16

Exhaustive

2

1

1

1

coshf

Exhaustive

4

0

0

2

1

coshf16

Exhaustive

2

1

0

1

cospif

Exhaustive

4

0

0

1

1

cospif16

Exhaustive

2

0

0

erff

Exhaustive

16

0

0

1

2

exp

Randomized

3

1

1

1

1

expf

Exhaustive

3

0

0

2

1

expf16

Exhaustive

2

1

1

1

exp10

Randomized

3

1

1

1

1

exp10f

Exhaustive

3

0

0

2

1

exp10f16

Exhaustive

2

1

1

1

exp2

Randomized

3

1

1

1

1

exp2f

Exhaustive

3

1

1

2

1

exp2f16

Exhaustive

2

1

1

0

expm1

Randomized

3

0

0

1

2

expm1f

Exhaustive

3

1

1

1

1

expm1f16

Exhaustive

2

1

1

1

hypot

Randomized

4

0

0

2

1

hypotf

Randomized

4

0

0

1

2

hypotf16

Exhaustive

2

0

0

log

Randomized

3

1

1

1

1

logf

Exhaustive

3

1

1

1

2

logf16

Exhaustive

2

1

1

1

log10

Randomized

3

1

1

1

1

log10f

Exhaustive

3

1

1

2

2

log10f16

Exhaustive

2

1

1

1

log1p

Randomized

2

1

1

1

1

log1pf

Exhaustive

2

1

1

1

1

log2

Randomized

3

1

1

1

1

log2f

Exhaustive

3

0

0

1

1

log2f16

Exhaustive

2

1

1

0

powf (integer exp.)

Randomized

16

0

0

2

1

powf (real exp.)

Randomized

16

0

0

2

1

sin

Randomized

4

1

1

1

1

sinf

Exhaustive

4

1

1

1

2

sinf16

Exhaustive

2

1

1

1

sincos (cos part)

Randomized

4

1

1

2

1

sincos (sin part)

Randomized

4

1

1

1

1

sincosf (cos part)

Exhaustive

4

1

1

2

2

sincosf (sin part)

Exhaustive

4

1

1

1

2

sinhf

Exhaustive

4

1

1

3

1

sinhf16

Exhaustive

2

1

1

1

sinpif

Exhaustive

4

0

0

1

1

sinpif16

Exhaustive

2

0

0

tan

Randomized

5

2

2

2

1

tanf

Exhaustive

5

0

0

3

2

tanf16

Exhaustive

2

1

1

2

tanhf

Exhaustive

5

0

0

2

1

tanhf16

Exhaustive

2

0

0

1

tanpif

Exhaustive

6

0

0

tanpif16

Exhaustive

2

1

1

Notes:

  • Exhaustive tests check every representable point in the input space. This method is used for half-precision functions and single-precision univariate functions.

  • Randomized tests check a large, deterministic subset of the input space, typically using \(2^{32}\) samples. This method is used for functions with larger input spaces, such as single-precision bivariate and double-precision functions.

  • ULP tolerances are based on The Khronos Group, The OpenCL C Specification v3.0.19, Sec. 7.4, Khronos Registry [July 10, 2025].

  • The AMD GPU used for testing is AMD Radeon RX 6950 XT.

  • The NVIDIA GPU used for testing is NVIDIA RTX 4000 SFF Ada Generation.

Performance

  • Simple performance tests are located in libc/test/src/math/performance_testing.

  • We also use the perf tool from the CORE-MATH project. The results from CORE-MATH’s perf tool are reported in the table below, using the system library (such as the GNU C library on Linux) as the reference. The fmodf and fmod results were obtained with performance_testing instead.

<Func>

Reciprocal throughput (clk)

Latency (clk)

Testing ranges

Testing configuration

LLVM libc

Reference (glibc)

LLVM libc

Reference (glibc)

CPU

OS

Compiler

Special flags

acosf

24

29

62

77

\([-1, 1]\)

Ryzen 1700

Ubuntu 22.04 LTS x86_64

Clang 14.0.0

FMA

acoshf

18

26

73

74

\([1, 21]\)

Ryzen 1700

Ubuntu 22.04 LTS x86_64

Clang 14.0.0

FMA

asinf

23

27

62

62

\([-1, 1]\)

Ryzen 1700

Ubuntu 22.04 LTS x86_64

Clang 14.0.0

FMA

asinhf

21

39

77

91

\([-10, 10]\)

Ryzen 1700

Ubuntu 22.04 LTS x86_64

Clang 14.0.0

FMA

atanf

27

29

79

68

\([-10, 10]\)

Ryzen 1700

Ubuntu 22.04 LTS x86_64

Clang 14.0.0

FMA

atanhf

18

66

68

133

\([-1, 1]\)

Ryzen 1700

Ubuntu 22.04 LTS x86_64

Clang 14.0.0

FMA

cosf

13

32

53

59

\([0, 2\pi]\)

Ryzen 1700

Ubuntu 20.04 LTS x86_64

Clang 12.0.0

FMA

coshf

14

20

50

48

\([-10, 10]\)

Ryzen 1700

Ubuntu 22.04 LTS x86_64

Clang 14.0.0

FMA

expf

9

7

44

38

\([-10, 10]\)

Ryzen 1700

Ubuntu 20.04 LTS x86_64

Clang 12.0.0

FMA

exp10f

10

8

40

38

\([-10, 10]\)

Ryzen 1700

Ubuntu 22.04 LTS x86_64

Clang 14.0.0

FMA

exp2f

9

6

35

31

\([-10, 10]\)

Ryzen 1700

Ubuntu 22.04 LTS x86_64

Clang 14.0.0

FMA

expm1f

9

44

42

121

\([-10, 10]\)

Ryzen 1700

Ubuntu 20.04 LTS x86_64

Clang 12.0.0

FMA

fmodf

73

263

[MIN_NORMAL, MAX_NORMAL]

i5 mobile

Ubuntu 20.04 LTS x86_64

Clang 12.0.0

9

11

[0, MAX_SUBNORMAL]

i5 mobile

Ubuntu 20.04 LTS x86_64

Clang 12.0.0

fmod

595

3297

[MIN_NORMAL, MAX_NORMAL]

i5 mobile

Ubuntu 20.04 LTS x86_64

Clang 12.0.0

14

13

[0, MAX_SUBNORMAL]

i5 mobile

Ubuntu 20.04 LTS x86_64

Clang 12.0.0

hypotf

25

15

64

49

\([-10, 10] \times [-10, 10]\)

Ryzen 1700

Ubuntu 20.04 LTS x86_64

Clang 12.0.0

logf

12

10

56

46

\([e^{-1}, e]\)

Ryzen 1700

Ubuntu 20.04 LTS x86_64

Clang 12.0.0

FMA

log10f

9

17

35

48

\([e^{-1}, e]\)

Ryzen 5900X

Ubuntu 22.04 LTS x86_64

Clang 15.0.6

FMA

log1pf

16

33

61

97

\([e^{-0.5} - 1, e^{0.5} - 1]\)

Ryzen 1700

Ubuntu 20.04 LTS x86_64

Clang 12.0.0

FMA

log2f

13

10

57

46

\([e^{-1}, e]\)

Ryzen 1700

Ubuntu 20.04 LTS x86_64

Clang 12.0.0

FMA

sinf

12

25

51

57

\([-\pi, \pi]\)

Ryzen 1700

Ubuntu 20.04 LTS x86_64

Clang 12.0.0

FMA

sincosf

19

30

57

68

\([-\pi, \pi]\)

Ryzen 1700

Ubuntu 20.04 LTS x86_64

Clang 12.0.0

FMA

sinhf

13

63

48

137

\([-10, 10]\)

Ryzen 1700

Ubuntu 22.04 LTS x86_64

Clang 14.0.0

FMA

tanf

16

50

61

107

\([-\pi, \pi]\)

Ryzen 1700

Ubuntu 22.04 LTS x86_64

Clang 14.0.0

FMA

tanhf

13

55

57

123

\([-10, 10]\)

Ryzen 1700

Ubuntu 22.04 LTS x86_64

Clang 14.0.0

FMA

Algorithms + Implementation Details

Fixed-point Arithmetics

References