An informal comparison of the three major implementations of std::string
Raymond Chen
May 10th, 2024
We saw some time ago that the three major implementations of std::string are all quite different. To summarize:
// gcc struct string { char* ptr; size_t size; union { size_t capacity; char buf[16]; }; bool is_large() { return ptr != buf; } auto data() { return ptr; } auto size() { return size; } auto capacity() { return is_large() ? capacity : 15; } }; // msvc struct string { union { char* ptr; char buf[16]; }; size_t size; size_t capacity; bool is_large() { return capacity > 15; } auto data() { return is_large() ? ptr : buf; } auto size() { return size; } auto capacity() { return capacity; } }; // clang union string { struct { size_t capacity; size_t size; char* ptr; } large; struct { unsigned char is_small:1; unsigned char size:7; char buf[sizeof(large) - 1]; } small; bool is_large() { return !small.is_small; } auto data() { return is_large() ? large.ptr : small.buf; } auto size() { return is_large() ? large.size : small.size; } auto capacity() { return is_large() ? large.capacity : sizeof(large) - 2; } };
We’ll compare these versions based on the complexity of some commonly-used operations.
Detecting whether the string is small or large is a single member comparison with msvc and clang, but on gcc, it involves comparing a member against the address of another member, so it will take an extra instruction to calculate that address.
gcc is_large msvc is_large clang is_large lea rdx, [rcx].buf
cmp rdx, [rcx].ptr
jnz large cmp [rcx].capacity, 15
ja large test [rcx].is_small, 1
jz large
Note: gcc could have shaved an instruction by reordering the members so that the buf comes first (thereby avoiding the need to calculate its address). On the other hand, it increases the cost of accessing ptr on some processors: On the x86 family, it forces a larger encoding because the offset is nonzero. On the Itanium, it requires two instructions because the Itanium cannot perform an offset load in a single instruction. On most other processors, the offset can be folded into the load instruction at no extra cost. My guess is that gcc biased their design to optimize for x86.
On the other hand, gcc wins the race to access the data() , since the ptr is always valid, and that’s probably why they chose their design.
gcc data() msvc data() clang data()¹ mov rdx, [rcx].ptr lea rdx, [rcx].buf
cmp [rcx].capacity, 15
cmova rdx, [rdx] lea rdx, [rcx].small.buf
test [rcx].small.is_small, 1
jnz @1
mov rdx, [rcx].large.ptr
@1:
The clang implementation also has extra work to calculate the size.
gcc size() msvc size() clang size()² mov rdx, [rcx].size mov rdx, [rcx].size movzx rax, [rcx].small.is_small
test al, 1
jnz @1
mov rax, [rcx].large.size
jmp @2
@1: shr rax, 1
@2:
A special case of checking the size is checking whether the string is empty.
gcc empty() msvc empty() clang empty() cmp [rcx].size, 0
jz empty cmp [rcx].size, 0
jz empty movzx rax, [rcx].small.is_small
test al, 1
jz @1
mov rax, [rcx].large.size
jmp @2
@1: dec rax
@2: test rax, rax
jz empty
The capacity comes into play behind the scenes when extending the string. For example, append(char) can do a fast-append if there is excess capacity, and delegate to a function call if the capacity needs to be increased. Here, msvc has an edge.
gcc capacity() msvc capacity() clang capacity() lea rax, [rcx].buf
cmp rax, [rcx].ptr
mov eax, 15
cmovnz rax, [rcx].capacity mov rax, [rcx].capacity test [rcx].small.is_small, 1
mov eax, 22
cmovz rax, [rcx].large.capacity
The clang implementation does have an edge in terms of memory usage: Despite an overall smaller size, it has a larger small-string capacity in 64-bit mode.
sizeof / SSO capacity gcc msvc clang 32-bit mode 24 / 16 24 / 16 12 / 11 64-bit mode 32 / 16 32 / 16 24 / 22
If you reserve() a lot of space for a string, but use only a little bit of it, and then call shrink_ to_ fit() , you can potentially get into a mixed state where the string is allocated externally (as if it were a large string), even though the capacity is smaller than the capacity of a small string.
The msvc implementation uses the capacity to detect whether it is using the small string optimization, so this mixed state is illegal for msvc, and it must convert large strings to small strings if shrink_ to_ fit() shrinks the string below the small-string threshold.
The gcc and clang implementations allow external allocations to have a small capacity. Nevertheless, the clang implementation does convert externally-allocated strings to small strings if they shrink below the small-string threshold.
The gcc implementation takes a different approach: With gcc, shrink_ to_ fit() is a nop! This is legal according to the C++ standard: The shrink_ to_ fit() method is an advisory call, and the implementation is permitted to ignore the advice.
One final point of comparison is how the three implementations deal with static initialization.
gcc msvc clang { buf, 15, { 0 } } { { 0 }, 0, 15 } { 1, 0, 0, ... }
A statically-initiaized empty string in gcc consists of a pointer to the internal buffer, a constant 15 (size), and a bunch of zeroes (buf). The presence of a pointer introduces a relocation into the data segment and silently messes up string’s constexpr -ness.
Statically-initiaized empty strings in msvc and clang consist of integer constant data; no pointers. This means no relocations and a better shot at constexpr -ness.
Okay, so let’s summarize all this information into a table.
gcc msvc clang is_large slower faster faster data() fast slower slower size() fast fast much slower empty() fast fast much slower capacity() slowest fast slower 32-bit size 24 24 12 64-bit size 32 32 24 32-bit SSO capacity 16 16 11 64-bit SSO capacity 16 16 22 shrink_to_fit() nop must convert to SSO choose to convert to SSO³ Static initialization relocation no relocation no relocation
¹ I don’t see clang generating this slightly smaller alternative
lea rdx, [rcx].small.buf test [rcx].small.is_small, 1 cmovz rdx, [rcx].large.ptr
perhaps because the cmov instruction always reads from its second parameter even if the value is not used, and there might be a store-forward penalty because in the case of a small string, the read is unlikely to match the size of the previous write.
² I don’t see clang generating this slightly smaller alternative
movzx rax, [rcx].small.is_small shr rax, 1 jc @1 mov rax, [rcx].large.size @1:
probably because “shift right and look at carry” is not something natively expressible in C++. If you really went for it, you could also fold in a cmov .
movzx rax, [rcx].small.is_small shr rax, 1 cmovnc rax, [rcx].large.size
³ Although the ABI allows the implementation to choose whether or not shrinking to an SSO-friendly size actually converts to SSO, the current implementation always converts.