Math Functions

Source Locations

Implementation Requirements / Goals

  • The highest priority is to be as accurate as possible, according to the C and IEEE 754 standards. By default, we aim to be correctly rounded for all rounding modes. Computations are performed, and final results are produced, in the current rounding mode of the floating-point environment.

    • To test for correctness, we compare the outputs against correctly rounded references such as the GNU MPFR multiple-precision library or the CORE-MATH library.

  • Our next requirement is that the outputs are consistent across all platforms. Note that this requirement is satisfied automatically when the implementation is correctly rounded, because a correctly rounded result is fully determined by the input and the rounding mode.

  • Our last requirement for the implementations is to have good and predictable performance:

    • The average performance should be comparable to other libc implementations.

    • The worst case performance should be within 10X-20X of the average.

    • Platform-specific implementations or instructions can be added whenever they make sense and provide a significant performance boost.

  • For other use cases that have strict requirements on the code size, memory footprint, or latency, such as embedded systems, we will aim to be as accurate as possible within the memory or latency budgets, and consistent across all platforms.
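The idea of checking outputs against a correctly rounded reference can be illustrated on a much smaller scale. The sketch below is not LLVM libc's test harness and does not use MPFR. As a stand-in, it relies on the fact that the product of two float values is exact in double, so converting that product to float rounds exactly once. The names mul_under_test, mul_reference, same_bits, and count_mismatches are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical function under test: float multiplication.
   The volatile store forces the result to be rounded to float. */
float mul_under_test(float a, float b) {
  volatile float r = a * b;
  return r;
}

/* Stand-in reference (assumption: double replaces MPFR here).
   A float has a 24-bit significand, so the 48-bit product is exact in
   double (53 bits); the cast to float is then the only rounding step. */
float mul_reference(float a, float b) {
  double exact = (double)a * (double)b;
  return (float)exact;
}

/* Bitwise equality, so +0 and -0 count as different; two NaNs match. */
int same_bits(float x, float y) {
  uint32_t bx, by;
  if (x != x && y != y)
    return 1;
  memcpy(&bx, &x, sizeof bx);
  memcpy(&by, &y, sizeof by);
  return bx == by;
}

/* Counts the input pairs on which the two implementations disagree. */
size_t count_mismatches(const float *inputs, size_t n) {
  size_t mismatches = 0;
  assert(inputs != NULL);
  for (size_t i = 0; i < n; ++i)
    for (size_t j = 0; j < n; ++j)
      if (!same_bits(mul_under_test(inputs[i], inputs[j]),
                     mul_reference(inputs[i], inputs[j])))
        ++mismatches;
  return mismatches;
}
```

On a platform whose float arithmetic follows IEEE 754, count_mismatches should report no disagreements. Functions such as exp or sin have no wider hardware type in which the result is exact, which is why an arbitrary-precision reference is needed for them.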

Add a new math function to LLVM libc

Implementation Status

Basic Operations

<Func> | Linux (x86_64, aarch64, aarch32, riscv64) | Windows (x86_64, aarch64) | MacOS (x86_64, aarch64) | Embedded (aarch32, riscv32) | GPU (AMD, nVidia)

ceil

ceilf

ceill

ceilf128

copysign

copysignf

copysignl

copysignf128

fabs

fabsf

fabsl

fabsf128

fdim

fdimf

fdiml

fdimf128

floor

floorf

floorl

floorf128

fmax

fmaxf

fmaxf128

fmaxl

fmin

fminf

fminf128

fminl

fmod

fmodf

fmodl

frexp

frexpf

frexpl

frexpf128

ilogb

ilogbf

ilogbl

ldexp

ldexpf

ldexpl

ldexpf128

llrint

llrintf

llrintl

llround

llroundf

llroundl

logb

logbf

logbl

lrint

lrintf

lrintl

lround

lroundf

lroundl

modf

modff

modfl

nan

nanf

nanl

nearbyint

nearbyintf

nearbyintl

nextafter

nextafterf

nextafterl

nexttoward

nexttowardf

nexttowardl

remainder

remainderf

remainderl

remquo

remquof

remquol

rint

rintf

rintl

round

roundf

roundl

roundf128

scalbn

scalbnf

scalbnl

trunc

truncf

truncl

truncf128

Higher Math Functions

<Func> | Linux (x86_64, aarch64, aarch32, riscv64) | Windows (x86_64, aarch64) | MacOS (x86_64, aarch64) | Embedded (aarch32, riscv32) | GPU (AMD, nVidia)

acos

acosf

acosl

acosh

acoshf

acoshl

asin

asinf

asinl

asinh

asinhf

asinhl

atan

atanf

atanl

atan2

atan2f

atan2l

atanh

atanhf

atanhl

cbrt

cbrtf

cbrtl

cos

cosf

cosl

cosh

coshf

coshl

erf

erff

erfl

erfc

erfcf

erfcl

exp

expf

expl

exp10

exp10f

exp10l

exp2

exp2f

exp2l

expm1

expm1f

expm1l

fma

fmaf

fmal

hypot

hypotf

hypotl

lgamma

lgammaf

lgammal

log

logf

logl

log10

log10f

log10l

log1p

log1pf

log1pl

log2

log2f

log2l

pow

powf

powl

sin

sinf

sinl

sincos

sincosf

sincosl

sinh

sinhf

sinhl

sqrt

sqrtf

sqrtl

sqrtf128

tan

tanf

tanl

tanh

tanhf

tanhl

tgamma

tgammaf

tgammal

Accuracy of Higher Math Functions

<Func> | <Func_f> (float) | <Func> (double) | <Func_l> (long double) | <Func_f128> (float128)

acos

acosh

asin

asinh

atan

atanh

cos

large

cosh

erf

exp

exp10

exp2

expm1

fma

hypot

log

log10

log1p

log2

pow

sin

large

sincos

large

sinh

sqrt

tan

tanh

Legends:

  • ✓: correctly rounded for all 4 rounding modes.

  • CR: correctly rounded for the default rounding mode (round-to-nearest, ties-to-even).

  • x ULPs: largest errors recorded.
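To make the ULP unit concrete, here is a minimal sketch that counts how many representable float values separate a computed result from a reference result. It is an illustration only: the helper names ordered_key and ulp_distance are hypothetical, float is assumed to be IEEE 754 binary32, and an actual error measurement compares against the exact (higher-precision) value, so it can be fractional rather than an integer count.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Maps a non-NaN float to an integer so that adjacent floats map to
   adjacent integers; +0 and -0 both map to 0. */
static int64_t ordered_key(float x) {
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);
  int64_t magnitude = (int64_t)(bits & 0x7FFFFFFFu);
  return (bits & 0x80000000u) ? -magnitude : magnitude;
}

/* Number of float steps between a and b (0 when they are equal). */
int64_t ulp_distance(float a, float b) {
  assert(a == a && b == b); /* NaN inputs are not supported */
  int64_t d = ordered_key(a) - ordered_key(b);
  return d < 0 ? -d : d;
}
```

For example, ulp_distance(1.0f, 1.0f + 1.1920929e-07f) is 1, because 1 + 2^-23 is the float immediately above 1.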

Performance

  • Simple performance tests are located in libc/test/src/math/differential_testing.

  • We also use the perf tool from the CORE-MATH project: link. The results from the CORE-MATH perf tool are reported in the table below, using the system library (such as the GNU C library on Linux) as the reference. The fmod performance results were obtained with “differential_testing”.

+---------+-------------------------------+-------------------------------+---------------------------------+----------------------------------------------------------------------+
| <Func>  | Reciprocal throughput (clk)   | Latency (clk)                 | Testing ranges                  | Testing configuration                                                |
|         +-----------+-------------------+-----------+-------------------+                                 +-------------+-------------------------+--------------+---------------+
|         | LLVM libc | Reference (glibc) | LLVM libc | Reference (glibc) |                                 | CPU         | OS                      | Compiler     | Special flags |
+=========+===========+===================+===========+===================+=================================+=============+=========================+==============+===============+
| acosf   | 24        | 29                | 62        | 77                | \([-1, 1]\)                     | Ryzen 1700  | Ubuntu 22.04 LTS x86_64 | Clang 14.0.0 | FMA           |
| acoshf  | 18        | 26                | 73        | 74                | \([1, 21]\)                     | Ryzen 1700  | Ubuntu 22.04 LTS x86_64 | Clang 14.0.0 | FMA           |
| asinf   | 23        | 27                | 62        | 62                | \([-1, 1]\)                     | Ryzen 1700  | Ubuntu 22.04 LTS x86_64 | Clang 14.0.0 | FMA           |
| asinhf  | 21        | 39                | 77        | 91                | \([-10, 10]\)                   | Ryzen 1700  | Ubuntu 22.04 LTS x86_64 | Clang 14.0.0 | FMA           |
| atanf   | 27        | 29                | 79        | 68                | \([-10, 10]\)                   | Ryzen 1700  | Ubuntu 22.04 LTS x86_64 | Clang 14.0.0 | FMA           |
| atanhf  | 18        | 66                | 68        | 133               | \([-1, 1]\)                     | Ryzen 1700  | Ubuntu 22.04 LTS x86_64 | Clang 14.0.0 | FMA           |
| cosf    | 13        | 32                | 53        | 59                | \([0, 2\pi]\)                   | Ryzen 1700  | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA           |
| coshf   | 14        | 20                | 50        | 48                | \([-10, 10]\)                   | Ryzen 1700  | Ubuntu 22.04 LTS x86_64 | Clang 14.0.0 | FMA           |
| expf    | 9         | 7                 | 44        | 38                | \([-10, 10]\)                   | Ryzen 1700  | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA           |
| exp10f  | 10        | 8                 | 40        | 38                | \([-10, 10]\)                   | Ryzen 1700  | Ubuntu 22.04 LTS x86_64 | Clang 14.0.0 | FMA           |
| exp2f   | 9         | 6                 | 35        | 31                | \([-10, 10]\)                   | Ryzen 1700  | Ubuntu 22.04 LTS x86_64 | Clang 14.0.0 | FMA           |
| expm1f  | 9         | 44                | 42        | 121               | \([-10, 10]\)                   | Ryzen 1700  | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA           |
| fmodf   | 73        | 263               |           |                   | [MIN_NORMAL, MAX_NORMAL]        | i5 mobile   | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 |               |
|         | 9         | 11                |           |                   | [0, MAX_SUBNORMAL]              | i5 mobile   | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 |               |
| fmod    | 595       | 3297              |           |                   | [MIN_NORMAL, MAX_NORMAL]        | i5 mobile   | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 |               |
|         | 14        | 13                |           |                   | [0, MAX_SUBNORMAL]              | i5 mobile   | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 |               |
| hypotf  | 25        | 15                | 64        | 49                | \([-10, 10] \times [-10, 10]\)  | Ryzen 1700  | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 |               |
| logf    | 12        | 10                | 56        | 46                | \([e^{-1}, e]\)                 | Ryzen 1700  | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA           |
| log10f  | 9         | 17                | 35        | 48                | \([e^{-1}, e]\)                 | Ryzen 5900X | Ubuntu 22.04 LTS x86_64 | Clang 15.0.6 | FMA           |
| log1pf  | 16        | 33                | 61        | 97                | \([e^{-0.5} - 1, e^{0.5} - 1]\) | Ryzen 1700  | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA           |
| log2f   | 13        | 10                | 57        | 46                | \([e^{-1}, e]\)                 | Ryzen 1700  | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA           |
| sinf    | 12        | 25                | 51        | 57                | \([-\pi, \pi]\)                 | Ryzen 1700  | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA           |
| sincosf | 19        | 30                | 57        | 68                | \([-\pi, \pi]\)                 | Ryzen 1700  | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA           |
| sinhf   | 13        | 63                | 48        | 137               | \([-10, 10]\)                   | Ryzen 1700  | Ubuntu 22.04 LTS x86_64 | Clang 14.0.0 | FMA           |
| tanf    | 16        | 50                | 61        | 107               | \([-\pi, \pi]\)                 | Ryzen 1700  | Ubuntu 22.04 LTS x86_64 | Clang 14.0.0 | FMA           |
| tanhf   | 13        | 55                | 57        | 123               | \([-10, 10]\)                   | Ryzen 1700  | Ubuntu 22.04 LTS x86_64 | Clang 14.0.0 | FMA           |
+---------+-----------+-------------------+-----------+-------------------+---------------------------------+-------------+-------------------------+--------------+---------------+
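The two timing columns measure different things. Latency is the time of one call when each call has to wait for the previous result, while reciprocal throughput is the average time per call when the calls are independent and can overlap in the pipeline. The following is a rough sketch of that distinction and not the CORE-MATH perf tool: it reports nanoseconds from clock() rather than clock cycles, and toy_function, ns_per_call_independent, and ns_per_call_dependent are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef double (*unary_fn)(double);

/* Hypothetical stand-in for the math function being measured. */
double toy_function(double x) { return 0.5 * x + 0.25; }

/* Volatile sink so the compiler cannot discard the timed work. */
static volatile double sink;

static double ns_per_call(clock_t start, clock_t end, size_t calls) {
  return 1e9 * (double)(end - start) / (double)CLOCKS_PER_SEC / (double)calls;
}

/* Reciprocal throughput: no input depends on an earlier output. */
double ns_per_call_independent(unary_fn f, const double *in, size_t n,
                               size_t reps) {
  double acc = 0.0;
  assert(f != NULL && in != NULL && n > 0 && reps > 0);
  clock_t start = clock();
  for (size_t r = 0; r < reps; ++r)
    for (size_t i = 0; i < n; ++i)
      acc += f(in[i]);
  sink = acc;
  clock_t end = clock();
  return ns_per_call(start, end, n * reps);
}

/* Latency: each input is tied to the previous output. Adding 0.0 * prev
   leaves a finite input unchanged but keeps the data dependency. */
double ns_per_call_dependent(unary_fn f, const double *in, size_t n,
                             size_t reps) {
  double prev = 0.0;
  assert(f != NULL && in != NULL && n > 0 && reps > 0);
  clock_t start = clock();
  for (size_t r = 0; r < reps; ++r)
    for (size_t i = 0; i < n; ++i)
      prev = f(in[i] + 0.0 * prev);
  sink = prev;
  clock_t end = clock();
  return ns_per_call(start, end, n * reps);
}
```

A dedicated tool would read a cycle counter and control for frequency scaling, so numbers from this sketch are not comparable to the table above.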

Algorithms + Implementation Details

Fixed-point Arithmetics

References