Modular format strings
Introduction
Several C standard library functions (most notably printf and scanf)
expose a large number of related features to the caller, configured via a
format string. This benefits code size at the caller, since format strings are
typically quite dense, and a single call can do the work of many individual
calls. Overall this is a benefit, since calls to a function
typically outnumber the one definition of that function.
However, the implementations of various libc features gated behind aspects of those format strings can be large enough that they completely swamp the programs that call them. Floating point and errno conversion in particular can involve large tables which may be wholly dead. Because the selection happens inside the format string, though, this code is dead in a way previously invisible to the compiler.
To address this, a Clang attribute was introduced: modular_format(<impl_fn>,
<impl_name>, <aspects>...). This adds to the semantics of the existing
format attribute (which must also be present, if only implicitly). The first
argument is a symbol naming a modular version of the implementation; this
version only weakly refers to “aspects” of the implementation that may not be
necessary for certain format strings. The second argument is a general
“implementation name” string, and the remaining arguments are a list of handled
aspects of the format string. When the compiler sees that a given call only
needs a fixed set of aspects of the implementation, it may redirect the call to
the implementation function and emit a series of relocations to symbols named
<impl_name>_<aspect>. These in turn bring the needed aspects of the call
into the link. The default entrypoints call the modular ones, except they bring
in every possible implementation aspect.
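The decision described above can be illustrated with a toy model. This is only a sketch, not the analysis Clang actually performs: the helper names needs_float and aspect_symbol are hypothetical, and the "float" aspect and "__printf" implementation name are assumed purely for illustration. It checks whether a format string contains a floating-point conversion and builds the <impl_name>_<aspect> symbol name that a relocation would target.

```cpp
#include <cstddef>
#include <string>

// Toy check, not the compiler's real analysis: report whether a printf-style
// format string contains a floating-point conversion specifier.
bool needs_float(const std::string &fmt) {
  const std::string modifiers = "-+ #0123456789.*hlLjzt";
  const std::string float_conversions = "aAeEfFgG";
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    if (fmt[i] != '%')
      continue;
    ++i; // step past '%'
    // Skip flags, width, precision, and length modifiers.
    while (i < fmt.size() && modifiers.find(fmt[i]) != std::string::npos)
      ++i;
    // fmt[i] is now the conversion character ("%%" lands here with '%').
    if (i < fmt.size() && float_conversions.find(fmt[i]) != std::string::npos)
      return true;
  }
  return false;
}

// Symbol that pulls one aspect into the link: <impl_name>_<aspect>.
std::string aspect_symbol(const std::string &impl_name,
                          const std::string &aspect) {
  return impl_name + "_" + aspect;
}
```

Under these assumptions, a call whose format string is "%d items\n" would need no float relocation, while "%5.2f%%" would need one against the symbol named by aspect_symbol("__printf", "float").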
Mechanism
This functionality is currently gated behind LIBC_COPT_PRINTF_MODULAR. When
set, the printf-family functions gain modular variants, and the regular
variants are modified to call them and emit NONE relocations against all
implementation aspects.
The implementation aspects are defined in headers using the
LIBC_PRINTF_MODULE((<decl>), { <body> }) macro. If
LIBC_COPT_PRINTF_MODULAR is not defined, then this macro makes these
LIBC_INLINE definitions as per usual. Otherwise, for normal usage, these
become weak declarations, which causes any references to the module to become
weak. The implementations are moved to a dedicated impl file for each group of
modules. These files define the aspect symbol and the module impls by defining
LIBC_PRINTF_DEFINE_MODULES before including the header. This causes the module
implementations to be brought into the link exactly when the aspect symbol is referenced.
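The three expansions of the module macro can be sketched as follows. This is a simplified stand-in, not the real LIBC_PRINTF_MODULE definition: the SKETCH_* macros and sketch_convert_float are hypothetical names, and the weak attribute spelling assumes a GCC- or Clang-compatible compiler. With neither configuration macro defined, as here, the module is an ordinary inline definition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Removes the parentheses wrapped around the declaration argument.
#define SKETCH_UNWRAP(...) __VA_ARGS__

#if defined(SKETCH_MODULAR) && !defined(SKETCH_DEFINE_MODULES)
// Modular build, normal usage: only a weak declaration is visible, so every
// reference to the module from this translation unit is weak.
#define SKETCH_MODULE(decl, ...) __attribute__((weak)) SKETCH_UNWRAP decl;
#elif defined(SKETCH_MODULAR)
// Modular build, dedicated impl file: emit the out-of-line definition. The
// impl file would also define the aspect symbol alongside it.
#define SKETCH_MODULE(decl, ...) SKETCH_UNWRAP decl __VA_ARGS__
#else
// Non-modular build: a plain inline definition, as per usual.
#define SKETCH_MODULE(decl, ...) inline SKETCH_UNWRAP decl __VA_ARGS__
#endif

// A hypothetical module, declared once in a header in all three configurations.
SKETCH_MODULE((int sketch_convert_float(double value, char *out,
                                        std::size_t size)), {
  assert(out != nullptr && size > 0);
  return std::snprintf(out, size, "%.2f", value);
})
```

The body is taken as a variadic argument so that commas inside it do not split the macro arguments; whether the real macro does the same is not stated here.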
Template functions present a special complication: the implementation must
instantiate them for any value that may be used. Since the purpose of the
templates is to implement a fixed interface, the possible arguments should
always be fixed and finite. Accordingly, libc contains def files to enumerate
possible arguments and provide handling for each. Templates are instantiated in
the headers whenever LIBC_PRINTF_DEFINE_MODULES is defined.
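The def-file approach resembles the common X-macro pattern. The sketch below is an assumption about the general shape, not the actual libc files: the Pad enum, write_padding template, and SKETCH_PAD_VALUES list are hypothetical, and the list is written inline where a real tree would keep it in a separate def file. One list drives both the explicit instantiations and the per-value handling.

```cpp
#include <cstddef>

// Hypothetical template parameter with a fixed, finite set of values.
enum class Pad { Space, Zero };

// Stand-in for a def file: one X(...) entry per value that may be used.
#define SKETCH_PAD_VALUES(X)                                                   \
  X(Pad::Space)                                                                \
  X(Pad::Zero)

// Writes n fill characters to out and returns how many were written.
template <Pad P> std::size_t write_padding(char *out, std::size_t n) {
  const char fill = (P == Pad::Space) ? ' ' : '0';
  for (std::size_t i = 0; i < n; ++i)
    out[i] = fill;
  return n;
}

// Explicitly instantiate the template for every listed value.
#define SKETCH_INSTANTIATE(value)                                              \
  template std::size_t write_padding<value>(char *, std::size_t);
SKETCH_PAD_VALUES(SKETCH_INSTANTIATE)
#undef SKETCH_INSTANTIATE

// Provide handling for each listed value by dispatching at run time.
std::size_t write_padding_dyn(Pad p, char *out, std::size_t n) {
  switch (p) {
#define SKETCH_CASE(value)                                                     \
  case value:                                                                  \
    return write_padding<value>(out, n);
    SKETCH_PAD_VALUES(SKETCH_CASE)
#undef SKETCH_CASE
  }
  return 0;
}
```

Adding a value to the list then updates the instantiations and the dispatch together, which is the property that makes a single enumeration attractive.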
libc and the compiler may understand different sets of aspect names, but their understanding of what an aspect name means must be identical. libc reports the set of aspect names that it needs a verdict on, and the compiler will only provide a verdict for those aspects. If libc asks for a verdict on an aspect unknown to the compiler, the aspect must be summarily considered to be required.
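The verdict rule can be written down as a small function. This is a sketch of the rule as stated above, not code from libc or Clang: required_aspects is a hypothetical helper, and aspect names such as "errno" and "wide" in the usage note are made up for illustration.

```cpp
#include <map>
#include <set>
#include <string>

// requested: aspect names libc asks for a verdict on.
// compiler_verdicts: aspects the compiler understands, mapped to whether the
// call needs them (true means needed).
// Returns the aspects that must be linked in for this call.
std::set<std::string>
required_aspects(const std::set<std::string> &requested,
                 const std::map<std::string, bool> &compiler_verdicts) {
  std::set<std::string> required;
  for (const std::string &aspect : requested) {
    auto it = compiler_verdicts.find(aspect);
    // Unknown to the compiler: summarily considered required.
    // Known to the compiler: required only if the verdict says so.
    if (it == compiler_verdicts.end() || it->second)
      required.insert(aspect);
  }
  // Aspects the compiler knows but libc did not ask about are ignored.
  return required;
}
```

For example, if libc requests "float" and "errno" while the compiler knows only "float" (not needed) and "wide" (needed), the result contains just "errno": it is unknown to the compiler and therefore treated as required, and "wide" is dropped because libc never asked about it.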