A 1-word sized representation of immutable UTF-8 strings. In-lines up to 1 word bytes. Optimized for memory usage and struct packing.
Use it like a String:
use cold_string::ColdString;
let s = ColdString::new("qwerty");
assert_eq!(s.as_str(), "qwerty");Packs well with other types:
use std::mem;
use cold_string::ColdString;
assert_eq!(mem::size_of::<ColdString>(), mem::size_of::<usize>());
assert_eq!(mem::align_of::<ColdString>(), 1);
assert_eq!(mem::size_of::<(ColdString, u8)>(), mem::size_of::<usize>() + 1);
assert_eq!(mem::align_of::<(ColdString, u8)>(), 1);ColdString is 8-byte tagged pointer (4 bytes on 32-bit machines):
#[repr(packed)]
pub struct ColdString {
/// The first byte of `encoded` is the "tag" and it determines the type:
/// - 10xxxxxx: an encoded address for the heap. To decode, 10 is set to 00 and swapped
/// with the LSB bits of the tag byte. The address is always a multiple of 4 (`HEAP_ALIGN`).
/// - 11111xxx: xxx is the length in range 0..=7, followed by length UTF-8 bytes.
/// - xxxxxxxx (valid UTF-8): 8 UTF-8 bytes.
encoded: *mut u8,
}encoded acts as either a pointer to the heap for strings longer than 8 bytes or is the inlined data itself. The first/"tag" byte indicates one of 3 encodings:
The tag byte has bits 11111xxx, where xxx is the length. self.0[1] to self.0[7] store the bytes of string.
The tag byte is any valid UTF-8 byte. self.0 stores the bytes of string. Since the string is UTF-8, the tag byte is guaranteed to not be 10xxxxx or 11111xxx.
self.0 encodes the pointer to heap, where tag byte is 10xxxxxx. 10xxxxxx is chosen because it's a UTF-8 continuation byte and therefore an impossible tag byte for inline mode. Since a heap-alignment of 4 is chosen, the pointer's least significant 2 bits are guaranteed to be 0 (See more). These bits are swapped with the 10 "tag" bits when de/coding between self.0 and the address value.
On the heap, the data starts with a variable length integer encoding of the length, followed by the bytes.
ptr --> <var int length> <data>
RSS per insertion of various collections containing strings of random lengths 0..=N:
| Vec | 0..=4 | 0..=8 | 0..=16 | 0..=32 | 0..=64 |
|---|---|---|---|---|---|
| cold-string | 8.0 | 8.0 | 23.2 | 33.7 | 53.4 |
| compact_str | 24.0 | 24.0 | 24.0 | 34.6 | 60.6 |
| compact_string | 22.9 | 24.9 | 31.6 | 39.7 | 55.7 |
| smallstr | 24.0 | 24.0 | 38.0 | 50.3 | 68.4 |
| smartstring | 24.0 | 24.0 | 24.0 | 40.4 | 65.4 |
| smol_str | 24.0 | 24.0 | 24.0 | 39.9 | 71.2 |
| std | 35.8 | 37.4 | 45.8 | 54.2 | 70.5 |
| HashMap | 0..=4 | 0..=8 | 0..=16 | 0..=32 | 0..=64 |
|---|---|---|---|---|---|
| cold-string | 35.7 | 35.7 | 63.3 | 88.2 | 125.1 |
| compact_str | 102.8 | 102.8 | 102.8 | 123.7 | 175.5 |
| compact_string | 45.4 | 59.6 | 78.2 | 97.1 | 130.1 |
| smallstr | 102.8 | 102.8 | 129.7 | 155.0 | 191.6 |
| smartstring | 102.8 | 102.8 | 102.8 | 135.9 | 185.8 |
| smol_str | 102.8 | 102.8 | 102.8 | 134.8 | 196.6 |
| std | 112.8 | 123.9 | 143.2 | 161.8 | 195.3 |
| B-Tree Set | 0..=4 | 0..=8 | 0..=16 | 0..=32 | 0..=64 |
|---|---|---|---|---|---|
| cold-string | 10.1 | 18.9 | 49.3 | 79.1 | 117.2 |
| compact_str | 24.8 | 48.4 | 61.5 | 90.5 | 145.7 |
| compact_string | 19.7 | 43.7 | 67.0 | 88.3 | 122.4 |
| smallstr | 24.8 | 48.1 | 89.7 | 121.9 | 162.0 |
| smartstring | 24.5 | 48.6 | 61.1 | 102.3 | 155.8 |
| smol_str | 25.0 | 48.3 | 61.6 | 100.7 | 166.7 |
| std | 35.8 | 70.4 | 102.9 | 128.9 | 165.5 |
Note: Columns represent string length (bytes/chars). Values represent average Resident Set Size (RSS) in bytes per string instance. Measurements taken with 10M iterations.
| Crate | 0..=4 | 0..=8 | 0..=16 | 0..=32 | 0..=64 |
|---|---|---|---|---|---|
| cold-string | 10.0 | 9.2 | 25.3 | 30.0 | 37.2 |
| compact_str | 8.8 | 10.1 | 10.0 | 14.4 | 49.4 |
| compact_string | 34.5 | 34.8 | 37.5 | 34.9 | 38.3 |
| smallstr | 8.9 | 9.4 | 23.1 | 44.9 | 32.7 |
| smartstring | 14.8 | 15.1 | 15.0 | 26.9 | 49.5 |
| smol_str | 19.2 | 19.8 | 20.1 | 23.4 | 33.7 |
| std | 28.6 | 31.4 | 34.9 | 32.0 | 33.1 |
| Crate | 4..=4 | 8..=8 | 16..=16 | 32..=32 | 64..=64 |
|---|---|---|---|---|---|
| cold-string | 6.5 | 4.2 | 34.2 | 34.3 | 36.2 |
| compact_str | 7.5 | 7.5 | 7.6 | 31.0 | 32.4 |
| compact_string | 29.2 | 28.9 | 29.2 | 29.9 | 32.3 |
| smallstr | 4.5 | 2.6 | 28.7 | 28.5 | 29.9 |
| smartstring | 14.7 | 14.8 | 8.6 | 61.6 | 63.4 |
| smol_str | 15.2 | 12.8 | 15.7 | 41.7 | 42.0 |
| std | 28.2 | 27.6 | 28.6 | 29.3 | 30.4 |
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
