Hi everyone,
I am trying to wrap my head around why creating a view of a nested StructVector allocates or not depending on the number of fields in the eltype of the StructVector.
Could you help me understand why this is the case?
Here is an MWE where the type with only two fields does not allocate, while the version with three fields allocates:
using StructArrays
using PrettyChairmarks
struct Inner
    a::Float64
    b::Float64
end
abstract type Outer end
struct OuterNOK <: Outer
    a::Inner
    b::Inner
    c::Inner
end
struct OuterOK <: Outer
    a::Inner
    b::Inner
end
_rand(::Type{Inner}) = Inner(rand(), rand())
_rand(T::Type{<:Outer}) = T((_rand(Inner) for _ in 1:fieldcount(T))...)
sa_fine = StructVector([_rand(OuterOK) for _ in 1:100]; unwrap = T -> !(T <: Real))
sa_allocating = StructVector([_rand(OuterNOK) for _ in 1:100]; unwrap = T -> !(T <: Real))
@bs view($sa_fine, 1:10) # This is fast and non-allocating
@bs view($sa_allocating, 1:10) # This allocates and is around 30x slower
with the following benchmark results:
versioninfo()
Julia Version 1.12.2
Commit ca9b6662be4 (2025-11-20 16:25 UTC)
Build Info:
Official https://julialang.org release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 24 × Intel(R) Xeon(R) Platinum 8462Y+
WORD_SIZE: 64
LLVM: libLLVM-18.1.7 (ORCJIT, haswell)
GC: Built with stock GC
Threads: 8 default, 1 interactive, 8 GC (on 24 virtual cores)
Environment:
JULIA_PKG_USE_CLI_GIT = true
JULIA_VSCODE_REPL = 1
JULIA_NUM_THREADS = 8
JULIA_EDITOR = code
@bs view($sa_fine, 1:10)
Chairmarks.Benchmark: 3364 samples with 2196 evaluations.
Range (min … max): 12.195 ns … 54.919 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 12.210 ns ┊ GC (median): 0.00%
Time (mean ± σ): 12.519 ns ± 1.201 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█
██▅▅▅▄▅▄▅▅▅▅▅▄▅▆▆▇▆▆▆▇▇▆▇▇▇▆▇▇▆▇▆▆▄▅▆▆▅▄▅▅▄▄▄▃▄▃▁▁▃▃▄▁▁▅▁▆▆ █
12.2 ns Histogram: log(frequency) by time 17 ns <
Memory estimate: 0.0 bytes, allocs estimate: 0.
@bs view($sa_allocating, 1:10)
Chairmarks.Benchmark: 1603 samples with 72 evaluations.
Range (min … max): 379.069 ns … 326.059 μs ┊ GC (min … max): 0.00% … 99.23%
Time (median): 407.847 ns ┊ GC (median): 0.00%
Time (mean ± σ): 917.265 ns ± 10.238 μs ┊ GC (mean ± σ): 0.12% ± 3.50%
▆█▇▄▂▁ ▁▂▃▂▂▂▂▂▂▁
██████▇█▅▇▅▅▅▆▅▅▇▆▅▇▇▄▁▅▄▆▅▄▄▁▁▁▁▁▁▁▁▁▁▁▁▆█████████████▇▇▅▆▆▆ █
379 ns Histogram: log(frequency) by time 1.1 μs <
Memory estimate: 864.0 bytes, allocs estimate: 6.
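One possible diagnostic, assuming the allocations come from a type-inference failure (which the benchmarks above do not establish), is to ask the compiler which return type it infers for each `view` call. The sketch below uses `Base.return_types`; the helper name `inferred_view_type` is hypothetical and not part of StructArrays.

```julia
using StructArrays

# Hypothetical helper (not part of StructArrays): return the type the compiler
# infers for `view(sa, idx)` without actually executing the call.
function inferred_view_type(sa, idx = 1:10)
    # One entry is returned per matching method; with concrete argument
    # types a single match is expected, so `only` extracts it.
    rts = Base.return_types(view, (typeof(sa), typeof(idx)))
    return only(rts)
end
```

With the MWE definitions in scope, comparing `isconcretetype(inferred_view_type(sa_fine))` against `isconcretetype(inferred_view_type(sa_allocating))` would show whether inference gives up on the three-field case. A non-concrete result there would point towards type instability rather than something specific to `view` itself.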
EDIT: This also seems to happen on Windows, and on Julia 1.11.