Understanding allocations when creating view of nested StructVector

Hi everyone,

I am trying to wrap my head around why creating a view of a nested StructVector sometimes allocates and sometimes does not, depending on the number of fields in the StructVector's eltype.

Could you help me understand why this is the case?

Here is an MWE where the eltype with two fields does not allocate, while the version with three fields does:

using StructArrays
using PrettyChairmarks

struct Inner
	a::Float64
	b::Float64
end

abstract type Outer end

struct OuterNOK <: Outer
	a::Inner
	b::Inner
	c::Inner
end

struct OuterOK <: Outer
	a::Inner
	b::Inner
end

_rand(::Type{Inner}) = Inner(rand(), rand())
_rand(T::Type{<:Outer}) = T((_rand(Inner) for _ in 1:fieldcount(T))...)

sa_fine = StructVector([_rand(OuterOK) for _ in 1:100]; unwrap = T -> !(T <: Real))
sa_allocating = StructVector([_rand(OuterNOK) for _ in 1:100]; unwrap = T -> !(T <: Real))

@bs view($sa_fine, 1:10)  # This is fast and non-allocating

@bs view($sa_allocating, 1:10)  # This allocates and is around 30x slower
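If you want to confirm the allocation without a benchmarking package, `@allocated` is enough, as long as it is called inside a function so that access to non-constant globals is not counted. This is a minimal sketch: it repeats the type definitions so it runs on its own, and the helper names `inner`, `nested` and `view_allocs` are hypothetical. The byte counts it prints will likely depend on the Julia and StructArrays versions.

```julia
using StructArrays

# Same shapes as in the MWE: Inner has two Float64 fields,
# the outer structs hold two or three Inner values.
struct Inner
    a::Float64
    b::Float64
end
struct OuterOK
    a::Inner
    b::Inner
end
struct OuterNOK
    a::Inner
    b::Inner
    c::Inner
end

# Hypothetical helpers: build random elements and a fully unwrapped StructVector.
inner() = Inner(rand(), rand())
nested(v) = StructVector(v; unwrap = T -> !(T <: Real))

# Hypothetical helper: bytes allocated by one call to view(s, I).
function view_allocs(s, I)
    view(s, I)  # run once first so any runtime compilation is not measured
    return @allocated view(s, I)
end

sa_fine       = nested([OuterOK(inner(), inner()) for _ in 1:100])
sa_allocating = nested([OuterNOK(inner(), inner(), inner()) for _ in 1:100])

println("2 fields: ", view_allocs(sa_fine, 1:10), " bytes")
println("3 fields: ", view_allocs(sa_allocating, 1:10), " bytes")
```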

with the following benchmark results:

versioninfo()
Julia Version 1.12.2
Commit ca9b6662be4 (2025-11-20 16:25 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × Intel(R) Xeon(R) Platinum 8462Y+
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, haswell)
  GC: Built with stock GC
Threads: 8 default, 1 interactive, 8 GC (on 24 virtual cores)
Environment:
  JULIA_PKG_USE_CLI_GIT = true
  JULIA_VSCODE_REPL = 1
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = code


@bs view($sa_fine, 1:10)
Chairmarks.Benchmark: 3364 samples with 2196 evaluations.
 Range (min … max):  12.195 ns … 54.919 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     12.210 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.519 ns ±  1.201 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █
  ██▅▅▅▄▅▄▅▅▅▅▅▄▅▆▆▇▆▆▆▇▇▆▇▇▇▆▇▇▆▇▆▆▄▅▆▆▅▄▅▅▄▄▄▃▄▃▁▁▃▃▄▁▁▅▁▆▆ █
  12.2 ns      Histogram: log(frequency) by time        17 ns <

 Memory estimate: 0.0 bytes, allocs estimate: 0.

@bs view($sa_allocating, 1:10)
Chairmarks.Benchmark: 1603 samples with 72 evaluations.
 Range (min … max):  379.069 ns … 326.059 μs  ┊ GC (min … max): 0.00% … 99.23%
 Time  (median):     407.847 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   917.265 ns ±  10.238 μs  ┊ GC (mean ± σ):  0.12% ±  3.50%

  ▆█▇▄▂▁                                    ▁▂▃▂▂▂▂▂▂▁           
  ██████▇█▅▇▅▅▅▆▅▅▇▆▅▇▇▄▁▅▄▆▅▄▄▁▁▁▁▁▁▁▁▁▁▁▁▆█████████████▇▇▅▆▆▆ █
  379 ns        Histogram: log(frequency) by time        1.1 μs <

 Memory estimate: 864.0 bytes, allocs estimate: 6.

EDIT: This also happens on Windows, and on Julia 1.11.

I debugged a bit further, and the allocation seems to come from the call to checkbounds in StructArrays' view method.
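To test that hypothesis in isolation, you can time the bounds check on its own, with nothing else from `view` involved, and compare it against the same check on a single leaf column. This is a diagnostic sketch, not a fix; `check_only`, `check_leaf`, `measure` and `inner` are hypothetical names, and it assumes the property syntax `sa.a.a` reaches the innermost `Vector{Float64}` of the nested StructVector.

```julia
using StructArrays

struct Inner
    a::Float64
    b::Float64
end
struct OuterNOK
    a::Inner
    b::Inner
    c::Inner
end

inner() = Inner(rand(), rand())
sa = StructVector([OuterNOK(inner(), inner(), inner()) for _ in 1:100];
                  unwrap = T -> !(T <: Real))

# Hypothetical: only the bounds check on the whole nested StructVector.
check_only(s, I) = (checkbounds(s, I); nothing)
# Hypothetical: the same check on one leaf column (a plain Vector{Float64}).
check_leaf(s, I) = (checkbounds(s.a.a, I); nothing)

# Call f once, then report the bytes allocated by a second call.
# Measuring inside a function keeps global-variable access out of the count.
function measure(f, s, I)
    f(s, I)
    return @allocated f(s, I)
end

println("checkbounds on the StructVector: ", measure(check_only, sa, 1:10), " bytes")
println("checkbounds on one leaf column:  ", measure(check_leaf, sa, 1:10), " bytes")
```

If the first line reports a nonzero count and the second reports zero, that points at the StructArray-level bounds check rather than at the views themselves.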

Defining a custom view that avoids both checkbounds and @inbounds restores the speed and removes the allocations:

@inline _view(x, I...) = view(x, I...)
@inline function _view(s::StructArray{T, N, C}, I...) where {T, N, C}
    StructArray{T}(map(v -> _view(v, I...), components(s)))
end

@bs _view($sa_allocating, 1:10)
Chairmarks.Benchmark: 3423 samples with 1900 evaluations.
 Range (min … max):  13.981 ns … 34.591 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.991 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   14.273 ns ±  1.013 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                            
  ██▅▆▃▄▃▄▄▅▁▁▆▃▄▃▅▅▅▄▄▄▅▆▅▅▆▅▆▆▆▅▆▇▅▆▆▆▅▆▅▅▅▆▆▇▄▅▅▅▁▄▃▅▄▄▃▅▅ █
  14 ns        Histogram: log(frequency) by time      18.5 ns <

 Memory estimate: 0.0 bytes, allocs estimate: 0.


@bs _view($sa_fine, 1:10)
Chairmarks.Benchmark: 3254 samples with 2219 evaluations.
 Range (min … max):  12.189 ns … 32.763 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     12.208 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.821 ns ±  1.528 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                            
  █▆▆▄▅▄▅▆▅▄▅▅▅▅▆▇▇▇▇▇▇▇▆▆▆▆▇▇▆▆▅▅▄▆▅▅▅▅▆▇▆▆▆▆▅▅▃▃▅▃▅▄▄▄▄▂▃▆▇ █
  12.2 ns      Histogram: log(frequency) by time      18.5 ns <

 Memory estimate: 0.0 bytes, allocs estimate: 0.
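Skipping the top-level checkbounds does not appear to make `_view` unsafe: each leaf call is a plain `view` on a `Vector{Float64}`, which still checks its own bounds. The sketch below reuses the `_view` definitions from above unchanged and checks two things, under the assumption that comparing with `==` against `Base.view` is a fair correctness check: the results contain the same elements, and an out-of-range index still raises. `inner` is a hypothetical helper.

```julia
using StructArrays

struct Inner
    a::Float64
    b::Float64
end
struct OuterNOK
    a::Inner
    b::Inner
    c::Inner
end

# The custom view from the post: recurse into the components and
# call plain `view` on each leaf array, with no checkbounds on the StructArray.
@inline _view(x, I...) = view(x, I...)
@inline function _view(s::StructArray{T, N, C}, I...) where {T, N, C}
    StructArray{T}(map(v -> _view(v, I...), components(s)))
end

inner() = Inner(rand(), rand())
sa = StructVector([OuterNOK(inner(), inner(), inner()) for _ in 1:100];
                  unwrap = T -> !(T <: Real))

# Same elements as Base.view.
println(_view(sa, 1:10) == view(sa, 1:10))

# Out-of-range indices are still rejected by the leaf views.
try
    _view(sa, 95:105)
catch err
    println("caught ", typeof(err))
end
```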

EDIT: On Julia 1.10 it seems to allocate regardless: for both Outer structs, and with both the plain Base.view and the custom _view function.