Whenever I’ve had to do large-scale string work in Julia, I tend to be less impressed with performance than when doing numerical work. This has produced a niggle, and that niggle has led to some investigation, which has in turn resulted in me prototyping a rewrite of the `String` type.
Performance improvements (compared to the current `String`)
- `==`: 1x to 2.8x faster
- `hash`: 1x to 2.1x faster
- `cmp`: 0.95x to 1.9x faster
- `startswith`: 1.3x to 2.2x faster
- dict `getindex`: 1.3x faster
Now that I (hopefully) have your attention, here’s what’s going on.
Background
This is a project that I started on Slack a few months ago, and since that conversation has fallen into the Slack-hole I thought I should record this work somewhere a little more permanent. Here it is.
For starters, it’s worth realising that:
- All `String`s in Julia take a minimum of 4 words/32 bytes
- Everything other than `length` involves a pointer lookup
The memory layout looks a bit like this (I had a better diagram on Slack, but it’s part of the losses — this might be a bit off but it’s got all the right bits):
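String value               heap allocation
+--------------+           +------------+------------+----------------+------+
| pointer      |  ------>  | object tag | length     | data bytes...  | '\0' |
+--------------+           +------------+------------+----------------+------+
   1 word                     1 word       1 word       >= 1 word (padded)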
There are a few alternative string forms that I’ve become aware of, optimising for space and efficiency of string comparisons. I’d like to draw your attention to:
- Small string optimisations (SSO): GitHub - elliotgoodrich/SSO-23: Memory optimal Small String Optimization implementation for C++, I Wrote A String Type · mcyoung, Inside STL: The string - The Old New Thing, Short String optimization - #29 by mcy - Rust Internals
- German strings: Why German Strings are Everywhere | CedarDB - The All-In-One-Database
I’ve ended up with an idea that seems fun (I think) that’s a bit of a hybrid of these, based around a base size of 2-words/16-bytes:
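Short form (inline):
  bytes 1-14                         byte 15   byte 16
  [ string data, zero-padded ]       0x00      ~len       (~len is 0xff down to 0xf1, never zero)

Long form (heap):
  bytes 1-4    bytes 5-8      bytes 9-16
  prefix       len::UInt32    ptr (to the full null-terminated data)

(A non-zero byte 16 is what marks the short form.)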
Regardless of whether we have a short or long-form string, we can look at the first 4 bytes of the string without any pointer lookups, as well as the length. This makes comparing the first word/8 bytes very informative. The downside is that we limit the size of a single string to ~4 GiB, which seems like it could be an acceptable compromise. By storing the bitwise-negated length in the short form, every length from zero (`0xff`) to fourteen (`0xf1`) is non-zero, and so works as a tag to differentiate the short form from pointers (which `malloc` aligns to the nearest 8 or 16 bytes, leaving the last byte available for an informative tag, and if need be we could make do with 4 bits instead of a whole byte).
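To make the tag concrete, the final byte is just the bitwise complement of the code-unit count (`ntag` below is only an illustrative helper, not part of the prototype):

ntag(len) = ~(len % UInt8) # what the short form stores in byte 16
ntag(0), ntag(14)          # (0xff, 0xf1): every valid short length gives a non-zero tag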
The complexity of the dual-format structure adds some overhead, but you can do a whole bunch of basic operations in the same time as an L1 cache lookup, which made me think this would be worth actually trying.
Unlike the German strings used in databases, we duplicate the prefix in the data section, making both the short and long representations contain complete null-terminated (C-compatible) string forms.
All in all, this strikes me as a rather nice compromise, potentially improving both the footprint and performance of `String`-intensive tasks in Julia.
Implementation
This string type essentially takes two forms:
struct InlineStringForm
data::NTuple{14, UInt8} # Up to 14 bytes of string data, zero-padded
_null::UInt8            # Always 0x00, so the inline data is null-terminated
nlen::UInt8             # Bitwise-negated length (0xff for length 0 down to 0xf1 for length 14), doubling as the non-zero tag
end
struct HeapStringForm
prefix::NTuple{4, UInt8} # Copy of the first 4 bytes of the string data
len::UInt32              # Length in bytes (hence the ~4 GiB limit)
ptr::Ptr{UInt8}          # Pointer to the full null-terminated data
end
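Both forms work out to exactly 16 bytes with no padding (on a 64-bit platform), which is what lets one 128-bit primitive stand in for either:

@assert sizeof(InlineStringForm) == sizeof(HeapStringForm) == 16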
Because of the overhead involved in reinterpreting between struct forms and/or using union types, I’ve instead opted for a primitive 16-byte type with logic to determine how it should be treated. This type is somewhat optimistically called `String2`.
primitive type String2 <: AbstractString 128 end
We can tell if this is an inline string or not by checking the last byte of the `String2`.
const INLINE_LENGTH_MASK = UInt128(0xff) << 120
isinline(s::UInt128) = !iszero(s & INLINE_LENGTH_MASK)
isinline(s::String2) = isinline(reinterpret(UInt128, s))
Without fiddling with the C side of Julia, the garbage collector has no way of knowing when there’s heap-allocated data to collect. That said, to work with this type for prototyping and benchmarking, here’s a constructor that takes a current `String`:
String2/String conversion + deallocation function
function String2(s::String)
if ncodeunits(s) <= 14
data = Memory{UInt8}(undef, 15)
fill!(data, 0x00)
unsafe_copyto!(pointer(data), pointer(s), ncodeunits(s))
reinterpret(String2, (NTuple{15, UInt8}(data)..., ~(ncodeunits(s) % UInt8)))
else
prefix = Memory{UInt8}(undef, 4)
unsafe_copyto!(pointer(prefix), pointer(s), 4)
dataref = Libc.malloc(ncodeunits(s) + 1)
Libc.memcpy(dataref, pointer(s), ncodeunits(s) + 1)
ss_tuple = (NTuple{4, UInt8}(prefix)..., UInt32(ncodeunits(s)), UInt64(dataref))
ss = reinterpret(String2, ss_tuple)
# finalizer(deallocate, ss) # Can't attach finalizers to immutable types
ss
end
end
function deallocate(s::String2) # Can't call automatically 🥲
isinline(s) && return
_, ptr = reinterpret(Tuple{UInt64, UInt64}, s)
Libc.free(ptr)
end
function Base.String(s::String2)
if isinline(s)
data..., _, nlen = reinterpret(NTuple{16, UInt8}, s)
String(collect(data[1:~nlen]))
else
_, ptr = reinterpret(Tuple{UInt64, UInt64}, s)
unsafe_string(Ptr{UInt8}(ptr))
end
end
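For example, round-tripping a couple of strings through the prototype (one short enough to stay inline, one not):

s = String2("hello world")                     # 11 bytes, stored inline
isinline(s), String(s)                         # (true, "hello world")
l = String2("a string of more than 14 bytes")  # 30 bytes, heap-allocated
isinline(l), String(l)                         # (false, and it round-trips back)
deallocate(l)                                  # manual clean-up, since the GC can't do it for us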
There’s a bit more boring work to implement the `AbstractString` interface just so we can see the string in the REPL, etc.
AbstractString interface
function Base.ncodeunits(s::String2)
u = reinterpret(UInt128, s)
if isinline(s)
~(u & INLINE_LENGTH_MASK) >> 120 % Int
else
(u % Int64) >> 32 % Int
end
end
function Base.codeunit(s::String2, i::Int)
if isinline(s)
data..., _, nlen = reinterpret(NTuple{16, UInt8}, s)
@boundscheck i <= ~nlen || throw(BoundsError(s, i))
data[i]
else
_, len, ptr = reinterpret(Tuple{UInt32, UInt32, UInt64}, s)
@boundscheck i <= len || throw(BoundsError(s, i))
unsafe_load(Ptr{UInt8}(ptr), i)
end
end
Base.codeunit(::String2) = UInt8
Base.iterate(s::String2, i::Int=1) = iterate(String(s), i)
Base.isvalid(s::String2) = Base.isvalid(String, codeunits(s))
Base.isvalid(s::String2, i::Int) = checkbounds(Bool, s, i) && thisind(s, i) == i
Base.thisind(s::String2, i::Int) = thisind(String(s), i)
Now we get onto the fun stuff: implementing optimised versions of the performance-critical basic string methods. Checking `Base.infer_effects` was wildly pessimistic, so after a read of `Base.@assume_effects` I think that `:total` may be appropriate here. One or two methods probably need to be downgraded to just `:foldable`, but nothing’s segfaulted so far.
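For reference, this is the sort of check I mean (`Base.infer_effects` is internal, so the exact output varies between Julia versions):

Base.infer_effects(==, (String2, String2)) # what the compiler proves without any @assume_effects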
Implementing equality
Base.@assume_effects :total function Base.:(==)(s1::String2, s2::String2)
u1, u2 = reinterpret(UInt128, s1), reinterpret(UInt128, s2) # For efficiency
s1a, s2a = u1 % UInt64, u2 % UInt64 # Prefix1-8 or Prefix1-4 + Length
s1a == s2a || return false # ∵ Different prefix and/or length
s1b, s2b = u1 >> 64 % UInt64, u2 >> 64 % UInt64 # Prefix9-14 + Length or Ptr
s1b == s2b && return true # ∵ Completely identical
if isinline(s1) || isinline(s2)
false # ∵ We've checked the entire inline form, if either are inline, they're not equal
else
len = s1a >> 32 # `s1a == s2a` implies identical lengths
Libc.memcmp(Ptr{UInt8}(s1b), Ptr{UInt8}(s2b), len) == 0
end
end
Implementing hash
Base.@assume_effects :total function Base.hash(s::String2, h::UInt)
h += hash(String2)
u = reinterpret(UInt128, s)
ua, ub = u % UInt64, u >> 64 % UInt64 # Low word (data or prefix+length), high word (data or pointer)
if isinline(s)
hash(hash(ua, h), ub) # This is faster than `hash(u, h)` for some reason
else
len = ua >> 32
h += Base.memhash_seed
ccall(Base.memhash, UInt, (Ptr{UInt8}, Csize_t, UInt32), Ptr{UInt8}(ub), len, h % UInt32) + h
end
end
Implementing cmp
Base.@assume_effects :total function Base.cmp(s1::String2, s2::String2)
u1, u2 = reinterpret(UInt128, s1), reinterpret(UInt128, s2)
pre1, pre2 = u1 % UInt32, u2 % UInt32
cmp0 = cmp(pre1, pre2)
if cmp0 != 0
cmp0
elseif isinline(s1) && isinline(s2)
cmp(u1, u2)
elseif isinline(s1)
ptr2 = u2 >> 64 % UInt64
s2pre = unsafe_load(Ptr{UInt128}(ptr2), 1) & typemax(UInt128) >> 8
cmp(u1, s2pre)
elseif isinline(s2)
ptr1 = u1 >> 64 % UInt64
s1pre = unsafe_load(Ptr{UInt128}(ptr1), 1) & typemax(UInt128) >> 8
cmp(s1pre, u2)
else
len1, len2 = (u1 % UInt64) >> 32, (u2 % UInt64) >> 32
cmp1 = Base.memcmp(Ptr{UInt8}(u1 >> 64 % UInt64), Ptr{UInt8}(u2 >> 64 % UInt64), min(len1, len2) % Csize_t)
if cmp1 != 0
sign(cmp1)
else
cmp(len1, len2)
end
end
end
Implementing startswith
Base.@assume_effects :total function Base.startswith(s::String2, prefix::String2)
us, up = reinterpret(UInt128, s), reinterpret(UInt128, prefix)
us == up && return true
# Unfortunately for readability, doing `u >> 8 << 8` seems
# to be a little faster than `u | INLINE_LENGTH_MASK`.
if isinline(us) && isinline(up) # => inline
up >> 8 << 8 == (us & up) >> 8 << 8
elseif isinline(up)
usp = unsafe_load(Ptr{UInt128}(us >> 64 % UInt64), 1)
up >> 8 << 8 == (usp & up) >> 8 << 8
else
slen, plen = (us % UInt64) >> 32, (up % UInt64) >> 32
slen >= plen || return false
sptr, pptr = us >> 64 % UInt64, up >> 64 % UInt64
Base.memcmp(Ptr{UInt8}(sptr), Ptr{UInt8}(pptr), plen) == 0
end
end
I’ve also managed to get efficient concatenation of short-form strings working nicely (it’s essentially an OR and a bit shift), but long/mixed cases are a bit more annoying.
Implementing 2-arg concat (WIP/more optimisation needed)
Base.@assume_effects :total function Base.:*(s1::String2, s2::String2)
u1, u2 = reinterpret(UInt128, s1), reinterpret(UInt128, s2)
len1 = if isinline(s1) ~(u1 >> 120 % UInt8) else (u1 % UInt64) >> 32 end
len2 = if isinline(s2) ~(u2 >> 120 % UInt8) else (u2 % UInt64) >> 32 end
newlen = len1 + len2
if newlen <= 14
newdata = u1 | (u2 << (8 * len1)) # Concatenate the data
newdata = newdata & ~INLINE_LENGTH_MASK # Clear length
newdata = newdata | (~newlen % UInt128) << 120 # Apply new length
reinterpret(String2, newdata)
else
concat_large(u1, len1 % UInt64, u2, len2 % UInt64, newlen % UInt64)
end
end
function concat_large(u1::UInt128, len1::UInt64, u2::UInt128, len2::UInt64, newlen::UInt64)
newdata = Libc.malloc(newlen + 1)
s1a = u1 % UInt64
s1b, s2b = u1 >> 64 % UInt64, u2 >> 64 % UInt64
if isinline(u1)
unsafe_store!(Ptr{UInt64}(newdata), s1a, 1)
unsafe_store!(Ptr{UInt64}(newdata), s1b, 2)
else
Libc.memcpy(Ptr{UInt8}(newdata), Ptr{UInt8}(s1b), len1)
end
if isinline(u2)
for b in 1:len2
unsafe_store!(Ptr{UInt8}(newdata), (u2 >> (8 * (b - 1))) % UInt8, len1 + b)
end
unsafe_store!(Ptr{UInt8}(newdata), 0x00, newlen + 1) # Ensure the result is null-terminated
else
Libc.memcpy(Ptr{UInt8}(newdata + len1), Ptr{UInt8}(s2b), len2 + 1)
end
ss_tuple = (unsafe_load(Ptr{UInt32}(newdata), 1),
newlen % UInt32, newdata)
ss = reinterpret(String2, ss_tuple)
# finalizer(deallocate, ss)
ss
end
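As a quick check of both paths (a result that stays inline, and one that spills to the heap):

String(String2("foo") * String2("bar"))                     # "foobar": 6 bytes, stays inline
String(String2("Hello, ") * String2("world! How are you?")) # 26 bytes, takes the concat_large path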
For a serious/complete reimplementation of `String`, some fiddling around on the C side of Julia (in particular with the GC) is needed. For now, I’ve left this alone and focused on seeing whether the fundamental design seems promising using just Julia code.
Simple benchmarking
Current implementation, all in one place
# An experimental short string type for Julia
# This is a proof of concept for a short string type in Julia. The idea is to
# have a string type that can store short strings inline, without heap
# allocation. This is useful for strings that are often short, such as keys in
# dictionaries or identifiers.
#
# This is a hybrid of C++'s small string optimisation, and a German string.
# On little endian systems, we can use a tagged pointer for the length of
# the string, and store the data inline. This allows us to store strings up to
# 14 bytes inline, without heap allocation. For longer strings, we fall back to
# a heap allocated string and make use of the extra space after the pointer
# to store the first 8 bytes of the string.
#
# To handle the two representations, we define an opaque (primitive) type
# that we will reinterpret to the appropriate form as needed.
#
# This is just a proof of concept, and needs adjustment to also work on
# big endian systems. GC is also broken for this type, as it doesn't know
# to deallocate the heap allocated strings.
primitive type String2 <: AbstractString 128 end
#=
struct InlineStringForm
data::NTuple{14, UInt8}
_null::UInt8
nlen::UInt8
end
struct HeapStringForm
prefix::NTuple{4, UInt8}
len::UInt32
ptr::Ptr{UInt8}
end
=#
const INLINE_LENGTH_MASK = UInt128(0xff) << 120
# Really, the only thing that might happen here is memory
# allocation failure. That's basically `:total`, right?
Base.@assume_effects :total function String2(s::String)
if ncodeunits(s) <= 14
data = Memory{UInt8}(undef, 15)
fill!(data, 0x00)
unsafe_copyto!(pointer(data), pointer(s), ncodeunits(s))
reinterpret(String2, (NTuple{15, UInt8}(data)..., ~(ncodeunits(s) % UInt8)))
else
prefix = Memory{UInt8}(undef, 4)
unsafe_copyto!(pointer(prefix), pointer(s), 4)
dataref = Libc.malloc(ncodeunits(s) + 1)
Libc.memcpy(dataref, pointer(s), ncodeunits(s) + 1)
ss_tuple = (NTuple{4, UInt8}(prefix)..., UInt32(ncodeunits(s)), UInt64(dataref))
ss = reinterpret(String2, ss_tuple)
# finalizer(deallocate, ss) # Can't attach finalizers to immutable types
ss
end
end
function deallocate(s::String2) # Can't call automatically 🥲
isinline(s) && return
_, ptr = reinterpret(Tuple{UInt64, UInt64}, s)
Libc.free(ptr)
end
isinline(s::UInt128) = !iszero(s & INLINE_LENGTH_MASK)
isinline(s::String2) = isinline(reinterpret(UInt128, s))
Base.@assume_effects :total function Base.String(s::String2)
if isinline(s)
data..., _, nlen = reinterpret(NTuple{16, UInt8}, s)
String(collect(data[1:~nlen]))
else
_, ptr = reinterpret(Tuple{UInt64, UInt64}, s)
unsafe_string(Ptr{UInt8}(ptr))
end
end
Base.@assume_effects :total @inline function Base.ncodeunits(s::String2)
u = reinterpret(UInt128, s)
if isinline(s)
~(u & INLINE_LENGTH_MASK) >> 120 % Int
else
(u % Int64) >> 32 % Int
end
end
Base.@assume_effects :total function Base.codeunit(s::String2, i::Int)
if isinline(s)
data..., _, nlen = reinterpret(NTuple{16, UInt8}, s)
@boundscheck i <= ~nlen || throw(BoundsError(s, i))
data[i]
else
_, len, ptr = reinterpret(Tuple{UInt32, UInt32, UInt64}, s)
@boundscheck i <= len || throw(BoundsError(s, i))
unsafe_load(Ptr{UInt8}(ptr), i)
end
end
Base.codeunit(::String2) = UInt8
Base.iterate(s::String2, i::Int=1) = iterate(String(s), i)
Base.isvalid(s::String2) = Base.isvalid(String, codeunits(s))
Base.isvalid(s::String2, i::Int) = checkbounds(Bool, s, i) && thisind(s, i) == i
Base.thisind(s::String2, i::Int) = thisind(String(s), i)
# Here I implement faster versions of basic string operations
# to compare with the built-in string type.
Base.@assume_effects :total function Base.:(==)(s1::String2, s2::String2)
u1, u2 = reinterpret(UInt128, s1), reinterpret(UInt128, s2) # For efficiency
s1a, s2a = u1 % UInt64, u2 % UInt64 # Prefix1-8 or Prefix1-4 + Length
s1a == s2a || return false # ∵ Different prefix and/or length
s1b, s2b = u1 >> 64 % UInt64, u2 >> 64 % UInt64 # Prefix9-14 + Length or Ptr
s1b == s2b && return true # ∵ Completely identical
if isinline(s1) || isinline(s2)
false # ∵ We've checked the entire inline form, if either are inline, they're not equal
else
len = s1a >> 32 # `s1a == s2a` implies identical lengths
Libc.memcmp(Ptr{UInt8}(s1b), Ptr{UInt8}(s2b), len) == 0
end
end
Base.@assume_effects :total function Base.hash(s::String2, h::UInt)
h += hash(String2)
u = reinterpret(UInt128, s)
ua, ub = u % UInt64, u >> 64 % UInt64 # Low word (data or prefix+length), high word (data or pointer)
if isinline(s)
hash(hash(ua, h), ub) # This is faster than `hash(u, h)` for some reason
else
len = ua >> 32
h += Base.memhash_seed
ccall(Base.memhash, UInt, (Ptr{UInt8}, Csize_t, UInt32), Ptr{UInt8}(ub), len, h % UInt32) + h
end
end
Base.@assume_effects :total function Base.cmp(s1::String2, s2::String2)
u1, u2 = reinterpret(UInt128, s1), reinterpret(UInt128, s2)
pre1, pre2 = u1 % UInt32, u2 % UInt32
cmp0 = cmp(pre1, pre2)
if cmp0 != 0
cmp0
elseif isinline(s1) && isinline(s2)
cmp(u1, u2)
elseif isinline(s1)
ptr2 = u2 >> 64 % UInt64
s2pre = unsafe_load(Ptr{UInt128}(ptr2), 1) & typemax(UInt128) >> 8
cmp(u1, s2pre)
elseif isinline(s2)
ptr1 = u1 >> 64 % UInt64
s1pre = unsafe_load(Ptr{UInt128}(ptr1), 1) & typemax(UInt128) >> 8
cmp(s1pre, u2)
else
len1, len2 = (u1 % UInt64) >> 32, (u2 % UInt64) >> 32
cmp1 = Base.memcmp(Ptr{UInt8}(u1 >> 64 % UInt64), Ptr{UInt8}(u2 >> 64 % UInt64), min(len1, len2) % Csize_t)
if cmp1 != 0
sign(cmp1)
else
cmp(len1, len2)
end
end
end
Base.@assume_effects :total function Base.startswith(s::String2, prefix::String2)
us, up = reinterpret(UInt128, s), reinterpret(UInt128, prefix)
us == up && return true
# Unfortunately for readability, doing `u >> 8 << 8` seems
# to be a little faster than `u | INLINE_LENGTH_MASK`.
if isinline(us) && isinline(up) # => inline
up >> 8 << 8 == (us & up) >> 8 << 8
elseif isinline(up)
usp = unsafe_load(Ptr{UInt128}(us >> 64 % UInt64), 1)
up >> 8 << 8 == (usp & up) >> 8 << 8
else
slen, plen = (us % UInt64) >> 32, (up % UInt64) >> 32
slen >= plen || return false
sptr, pptr = us >> 64 % UInt64, up >> 64 % UInt64
Base.memcmp(Ptr{UInt8}(sptr), Ptr{UInt8}(pptr), plen) == 0
end
end
# More work-in-progress methods
Base.@assume_effects :total function Base.:*(s1::String2, s2::String2)
u1, u2 = reinterpret(UInt128, s1), reinterpret(UInt128, s2)
len1 = if isinline(s1) ~(u1 >> 120 % UInt8) else (u1 % UInt64) >> 32 end
len2 = if isinline(s2) ~(u2 >> 120 % UInt8) else (u2 % UInt64) >> 32 end
newlen = len1 + len2
if newlen <= 14
newdata = u1 | (u2 << (8 * len1)) # Concatenate the data
newdata = newdata & ~INLINE_LENGTH_MASK # Clear length
newdata = newdata | (~newlen % UInt128) << 120 # Apply new length
reinterpret(String2, newdata)
else
concat_large(u1, len1 % UInt64, u2, len2 % UInt64, newlen % UInt64)
end
end
function concat_large(u1::UInt128, len1::UInt64, u2::UInt128, len2::UInt64, newlen::UInt64)
newdata = Libc.malloc(newlen + 1)
s1a = u1 % UInt64
s1b, s2b = u1 >> 64 % UInt64, u2 >> 64 % UInt64
if isinline(u1)
unsafe_store!(Ptr{UInt64}(newdata), s1a, 1)
unsafe_store!(Ptr{UInt64}(newdata), s1b, 2)
else
Libc.memcpy(Ptr{UInt8}(newdata), Ptr{UInt8}(s1b), len1)
end
if isinline(u2)
for b in 1:len2
unsafe_store!(Ptr{UInt8}(newdata), (u2 >> (8 * (b - 1))) % UInt8, len1 + b)
end
unsafe_store!(Ptr{UInt8}(newdata), 0x00, newlen + 1) # Ensure the result is null-terminated
else
Libc.memcpy(Ptr{UInt8}(newdata + len1), Ptr{UInt8}(s2b), len2 + 1)
end
ss_tuple = (unsafe_load(Ptr{UInt32}(newdata), 1),
newlen % UInt32, newdata)
ss = reinterpret(String2, ss_tuple)
# finalizer(deallocate, ss)
ss
end
function Base.sizeof(s::String2)
if isinline(s)
sizeof(typeof(s))
else
_, len, _ = reinterpret(Tuple{UInt32, UInt32, UInt64}, s)
sizeof(typeof(s)) + len + 1
end
end
Base.summarysize(s::String2) = sizeof(s)
REPL copypasta
julia> using Chairmarks
julia> s1to6 = s1, s2, s3, s4, s5, s6 = "hey", "hey", "woah", "a long string now", "another long string", "another long string"
("hey", "hey", "woah", "a long string now", "another long string", "another long string")
julia> ss1to6 = ss1, ss2, ss3, ss4, ss5, ss6 = map(String2, s1to6)
("hey", "hey", "woah", "a long string now", "another long string", "another long string")
julia> [(a, b) for (a, b) in Iterators.product(1:6, 1:6)]
6×6 Matrix{Tuple{Int64, Int64}}:
(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)
julia> [@b a == b for (a, b) in Iterators.product(s1to6, s1to6)] # Base String
6×6 Matrix{Chairmarks.Sample}:
1.571 ns 4.325 ns 2.947 ns 2.948 ns 2.949 ns 2.948 ns
4.323 ns 1.571 ns 2.947 ns 2.948 ns 2.947 ns 2.947 ns
2.947 ns 2.947 ns 1.570 ns 2.948 ns 2.949 ns 2.947 ns
2.948 ns 2.947 ns 2.947 ns 1.571 ns 2.948 ns 2.949 ns
2.947 ns 2.948 ns 2.948 ns 2.948 ns 1.571 ns 4.337 ns
2.947 ns 2.950 ns 2.946 ns 2.951 ns 4.324 ns 1.571 ns
julia> [@b a == b for (a, b) in Iterators.product(ss1to6, ss1to6)] # My String
6×6 Matrix{Chairmarks.Sample}:
1.571 ns 1.571 ns 1.571 ns 1.571 ns 1.571 ns 1.570 ns
1.571 ns 1.571 ns 1.570 ns 1.570 ns 1.570 ns 1.570 ns
1.570 ns 1.570 ns 1.571 ns 1.571 ns 1.571 ns 1.570 ns
1.571 ns 1.571 ns 1.570 ns 1.571 ns 1.570 ns 1.570 ns
1.571 ns 1.571 ns 1.570 ns 1.571 ns 1.571 ns 2.357 ns
1.570 ns 1.571 ns 1.570 ns 1.571 ns 2.357 ns 1.571 ns
julia> [@b hash(s) for s in s1to6]
6-element Vector{Chairmarks.Sample}:
4.518 ns
4.370 ns
4.322 ns
3.994 ns
4.390 ns
4.399 ns
julia> [@b hash(s) for s in ss1to6]
6-element Vector{Chairmarks.Sample}:
2.143 ns
2.152 ns
2.144 ns
4.104 ns
4.428 ns
4.422 ns
julia> [@b cmp(a, b) for (a, b) in Iterators.product(s1to6, s1to6)]
6×6 Matrix{Chairmarks.Sample}:
2.750 ns 2.946 ns 2.946 ns 2.946 ns 2.947 ns 2.946 ns
2.946 ns 2.750 ns 2.946 ns 2.946 ns 2.946 ns 2.946 ns
2.946 ns 2.947 ns 2.750 ns 2.946 ns 2.946 ns 2.946 ns
2.946 ns 2.946 ns 2.946 ns 2.750 ns 2.946 ns 2.946 ns
2.946 ns 2.912 ns 2.946 ns 2.916 ns 2.750 ns 2.946 ns
2.946 ns 2.946 ns 2.752 ns 2.947 ns 2.946 ns 2.750 ns
julia> [@b cmp(a, b) for (a, b) in Iterators.product(ss1to6, ss1to6)]
6×6 Matrix{Chairmarks.Sample}:
2.160 ns 2.160 ns 1.571 ns 1.571 ns 1.571 ns 1.571 ns
2.160 ns 2.160 ns 1.571 ns 1.571 ns 1.571 ns 1.571 ns
1.571 ns 1.571 ns 2.160 ns 1.571 ns 1.571 ns 1.571 ns
1.571 ns 1.571 ns 1.571 ns 2.946 ns 1.571 ns 1.571 ns
1.571 ns 1.571 ns 1.571 ns 1.571 ns 2.947 ns 2.947 ns
1.571 ns 1.571 ns 1.571 ns 1.571 ns 2.947 ns 2.947 ns
julia> [@b startswith(a, b) for (a, b) in Iterators.product(s1to6, s1to6)]
6×6 Matrix{Chairmarks.Sample}:
3.536 ns 3.536 ns 2.356 ns 2.356 ns 2.356 ns 2.356 ns
3.537 ns 3.536 ns 2.356 ns 2.356 ns 2.356 ns 2.356 ns
3.536 ns 3.536 ns 3.535 ns 2.356 ns 2.356 ns 2.356 ns
3.538 ns 3.536 ns 3.536 ns 3.536 ns 2.356 ns 2.356 ns
3.537 ns 3.536 ns 3.537 ns 3.536 ns 3.536 ns 3.536 ns
3.536 ns 3.537 ns 3.538 ns 3.536 ns 3.536 ns 3.536 ns
julia> [@b startswith(a, b) for (a, b) in Iterators.product(ss1to6, ss1to6)]
6×6 Matrix{Chairmarks.Sample}:
1.571 ns 1.571 ns 1.571 ns 1.571 ns 1.573 ns 1.572 ns
1.571 ns 1.571 ns 1.575 ns 1.572 ns 1.571 ns 1.572 ns
1.574 ns 1.571 ns 1.571 ns 1.572 ns 1.571 ns 1.572 ns
1.571 ns 1.571 ns 1.571 ns 1.571 ns 1.767 ns 1.767 ns
1.571 ns 1.571 ns 1.571 ns 2.750 ns 1.571 ns 2.750 ns
1.571 ns 1.571 ns 1.571 ns 2.750 ns 2.750 ns 1.571 ns
julia> dS = Dict(s1 => 1)
Dict{String, Int64} with 1 entry:
"hey" => 1
julia> dsS = Dict(ss1 => 1)
Dict{String2, Int64} with 1 entry:
"hey" => 1
julia> @b $dS[$s1]
6.314 ns
julia> @b $dsS[$ss1]
5.216 ns
Concluding comments
This isn’t a complete drop-in replacement for `String` yet, but I think it demonstrates the potential of an alternate string layout in Julia. Hopefully it can provoke some collaboration/help to improve this prototype/proposal, do more extensive benchmarking, and, if it still looks good, start work on an actual PR implementing this (eventually).