[ANN] FixedSizeArrays.jl: What Array probably should have been

Is it possible you introduced some type instability while switching to FixedSizeArrays.jl? Perhaps in some branch somewhere you return something like T[], which is not a FixedSizeVector? That could perhaps explain the increase in allocated memory.
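If you want to confirm that quickly, Base.return_types makes such a branch visible. A minimal sketch (f here is a made-up stand-in for the suspect function):

using FixedSizeArrays

# one branch silently falls back to Vector{Float64}
f(v) = isempty(v) ? Float64[] : v

Base.return_types(f, (FixedSizeVectorDefault{Float64},))
# -> Union{Vector{Float64}, FixedSizeVectorDefault{Float64}}, i.e. type-unstable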

If not that, perhaps it’s the issue pointed out by @foobar_lv2, where object identity would sometimes be preferable to value identity for the deduplicating effect.
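(For reference, the distinction at play, as a minimal sketch: Dict keys on value equality and so deduplicates equal-content arrays, while IdDict keys on object identity and keeps both.)

a = FixedSizeVectorDefault{Int}(undef, 2); a .= 1:2
b = FixedSizeVectorDefault{Int}(undef, 2); b .= 1:2
a == b, a === b                 # (true, false): equal values, distinct objects
length(Dict(a => 0, b => 0))    # 1: value equality deduplicates
length(IdDict(a => 0, b => 0))  # 2: object identity keeps both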


was special-cased in the compiler to be (effectively?) immutable.

This isn’t quite true. IIUC the actual “size” field was known by the compiler not to change, but the backing was still mutable (preventing optimization). Also as of 1.11, the compiler dropped this optimization since Array is just a normal Julia object.
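Indeed, on 1.11+ you can inspect Array like any other struct: it is just a mutable struct holding a MemoryRef and a size tuple (a quick check, based on my reading of the 1.11 layout):

julia> fieldnames(Array)
(:ref, :size)

julia> Base.ismutabletype(Array)
true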

julia> using FixedSizeArrays

julia> @noinline f(A::AbstractMatrix) = length(A)
f (generic function with 1 method)

julia> g() = f(FixedSizeMatrixDefault{Float64}(undef, 3, 3))
g (generic function with 1 method)

julia> h() = f(Matrix{Float64}(undef, 3, 3))
h (generic function with 1 method)

julia> code_llvm(g)
; Function Signature: g()
;  @ REPL[92]:1 within `g`
define i64 @julia_g_9806() local_unnamed_addr #0 {
top:
;  @ REPL[92] within `g`
  ret i64 9
}

julia> code_llvm(h)
; Function Signature: h()
;  @ REPL[91]:1 within `h`
define i64 @julia_h_9808() local_unnamed_addr #0 {
top:
  %gcframe1 = alloca [3 x ptr], align 16
  call void @llvm.memset.p0.i64(ptr align 16 %gcframe1, i8 0, i64 24, i1 true)
  %pgcstack = call ptr inttoptr (i64 4377493260 to ptr)(i64 4377493296) #12
  store i64 4, ptr %gcframe1, align 8
  %task.gcstack = load ptr, ptr %pgcstack, align 8
  %frame.prev = getelementptr inbounds nuw i8, ptr %gcframe1, i64 8
  store ptr %task.gcstack, ptr %frame.prev, align 8
  store ptr %gcframe1, ptr %pgcstack, align 8
; ┌ @ boot.jl:651 within `Array`
; │┌ @ boot.jl:604 within `new_as_memoryref`
; ││┌ @ boot.jl:588 within `GenericMemory`
     %ptls_field = getelementptr inbounds nuw i8, ptr %pgcstack, i64 16
     %ptls_load = load ptr, ptr %ptls_field, align 8
     %"Memory{Float64}[]" = call noalias nonnull align 8 dereferenceable(96) ptr @ijl_gc_small_alloc(ptr %ptls_load, i32 664, i32 96, i64 4761282016) #7
     %"Memory{Float64}[].tag_addr" = getelementptr inbounds i8, ptr %"Memory{Float64}[]", i64 -8
     store atomic i64 4761282016, ptr %"Memory{Float64}[].tag_addr" unordered, align 8
     %memory_ptr = getelementptr inbounds nuw i8, ptr %"Memory{Float64}[]", i64 8
     %memory_data = getelementptr inbounds nuw i8, ptr %"Memory{Float64}[]", i64 16
     store ptr %memory_data, ptr %memory_ptr, align 8
     store i64 9, ptr %"Memory{Float64}[]", align 8
     %gc_slot_addr_0 = getelementptr inbounds nuw i8, ptr %gcframe1, i64 16
     store ptr %"Memory{Float64}[]", ptr %gc_slot_addr_0, align 8
; │└└
   %ptls_load11 = load ptr, ptr %ptls_field, align 8
   %"new::Array" = call noalias nonnull align 8 dereferenceable(48) ptr @ijl_gc_small_alloc(ptr %ptls_load11, i32 520, i32 48, i64 4761248240) #7
   %"new::Array.tag_addr" = getelementptr inbounds i8, ptr %"new::Array", i64 -8
   store atomic i64 4761248240, ptr %"new::Array.tag_addr" unordered, align 8
   %0 = getelementptr inbounds nuw i8, ptr %"new::Array", i64 8
   store ptr %memory_data, ptr %"new::Array", align 8
   store ptr %"Memory{Float64}[]", ptr %0, align 8
   %"new::Array.size_ptr" = getelementptr inbounds nuw i8, ptr %"new::Array", i64 16
   call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) %"new::Array.size_ptr", ptr noundef nonnull align 8 dereferenceable(16) @"_j_const#1", i64 16, i1 false)
   store ptr %"new::Array", ptr %gc_slot_addr_0, align 8
; └
  %1 = call i64 @j_f_9812(ptr nonnull %"new::Array")
  %frame.prev14 = load ptr, ptr %frame.prev, align 8
  store ptr %frame.prev14, ptr %pgcstack, align 8
  ret i64 %1
}

Hence the title of the JuliaCon talk :slightly_smiling_face: My personal hope is that at some point we’ll be able to have a fixed-size array in Base, because it’s so useful in numerical applications. Of course, that raises the question of how to handle two different array types in Base.


Cthulhu tells me there was a type instability in my code unrelated to FixedSizeArrays. Once I fixed that, both the number of allocations and the total amount of allocated memory still go up with FixedSizeArray w.r.t. plain Array =(
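(For anyone following along, the check was along these lines; mykernel is a hypothetical stand-in for the real entry point:)

using Cthulhu
@descend mykernel(data)  # red Union{...} annotations in the typed view mark the instability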


I’ve added MutableSmallVector to my benchmarks above.

I think that FixedSizeArrays is a great package, and that the title of this thread sounds exactly right. However, looking at the benchmarks (which do not cover all sizes and element types!), I’m asking myself where, say, FixedSizeVector really shines in the current ecosystem. If you only need a mutable, indexable container, then it’s indeed a lightweight solution.

If you want algebraic operations, then for large vectors the difference from Vector seems negligible. For small vectors whose size is known in advance, MVector is faster (and so is MutableFixedVector). If the size is unknown but has a small upper bound, then MutableSmallVector appears to be the better choice. (MVector and Mutable(Fixed|Small)Vector need isbits elements, but that is usually the case.)

For which sizes and/or element types can FixedSizeVector play to its strengths? Or would it be for higher-dimensional arrays?
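For concreteness, here is the kind of head-to-head I mean, as a rough sketch (the conversion constructors are assumed from the respective packages’ documentation; results will of course vary with size and element type):

using BenchmarkTools, FixedSizeArrays, StaticArrays, SmallCollections

v  = rand(8)
fv = FixedSizeArray(v)          # size fixed at construction, stored as a value
mv = MVector{8}(v)              # size carried in the type
sv = MutableSmallVector{8}(v)   # bounded capacity, runtime length

@btime sum($v); @btime sum($fv); @btime sum($mv); @btime sum($sv)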


In my mind, the main competitor of FixedSizeArray is Base.Array itself, in the role of a general-purpose container. Specialised containers like MArray are different beasts: they can be useful when you want to do something smart in very specific cases (e.g. dispatching on the size, or when you know you only deal with very specific, small sizes), but they may also come at the cost of longer compilation latency.

The performance difference with Base.Array is probably negligible in microbenchmarks, but the benefit lies in enabling compiler optimisations that are simply impossible with the base type: improving the effect inference of an inner function (if, for example, it can be proven not to throw errors anymore) can have positive cascading effects on a larger program as a whole.
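For example, the difference shows up directly in what the compiler can prove about a call (a sketch; the exact effect bits printed vary across Julia versions):

using FixedSizeArrays

Base.infer_effects(length, (Vector{Float64},))                  # the size field of Array can change
Base.infer_effects(length, (FixedSizeVectorDefault{Float64},))  # this size never changes, so stronger effects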

I expect it’ll take some time for the ecosystem to digest a new container array type like FixedSizeArray. Also, FixedSizeArrays.jl is by no means perfect: along the way we found some small missed optimisations in Julia itself, like “inbounds propagation mostly missing in base/genericmemory.jl?” (JuliaLang/julia#56145) and “Memory stores aren’t vectorised in a `for` loop unless explicit `@inbounds` is used” (JuliaArrays/FixedSizeArrays.jl#70), so other small issues like that are possible, but ideally fixing them will have a positive impact on the whole ecosystem (it’s usually a matter of making Memory work efficiently).


How does this compare with SizedArray from StaticArrays.jl? IIRC SizedArray is also backed by normal memory, and the only notable difference I can see is that the size information is stored as a value instead of a type parameter.

Besides the size being a type parameter versus an instance value, SizedArray wraps any AbstractArray, while FixedSizeArray is a DenseArray that wraps a DenseVector (defaulting to Memory). For example, SizedArray can wrap a StepRange, but a FixedSizeArray can’t (though the constructor may copy the elements of an input AbstractArray into a Memory to wrap). Those two differences alone make the use cases fairly different, and there isn’t a straightforward answer as to which is faster. The static size of SizedArray may leverage some optimized StaticArrays code, but it could also be wrapping an AbstractArray whose getindex takes 2 minutes per call.
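A small illustration of the wrap-versus-copy difference (a sketch; the full SizedArray type parameters are spelled out to force wrapping rather than conversion):

using StaticArrays, FixedSizeArrays

r = 1:2:9                                       # a StepRange, not a dense array
s = SizedArray{Tuple{5},Int,1,1,typeof(r)}(r)   # wraps r itself
f = FixedSizeArray(r)                           # copies the elements into a Memory{Int}

s isa DenseArray, f isa DenseArray              # (false, true)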


I think the issue that this package is addressing is fundamentally the following:

julia> foo() = Base.llvmcall("""call void asm ";", "~{memory}"()
       ret void""", Nothing, Tuple{})  # the asm declares a clobber of all memory: an optimization barrier

julia> function bar(x)
       @inbounds a=x[1]
       foo()
       @inbounds b=x[1]
       a+b
       end

julia> function bar2(x)
       a=x.x
       foo()
       b=x.x
       a+b
       end

julia> mutable struct XM x::Int end

julia> @code_llvm bar2(XM(1))
; Function Signature: bar2(Main.XM)
;  @ REPL[52]:1 within `bar2`
define i64 @julia_bar2_5182(ptr noundef nonnull align 8 dereferenceable(8) %"x::XM") local_unnamed_addr #0 {
top:
;  @ REPL[52]:2 within `bar2`
; ┌ @ Base_compiler.jl:54 within `getproperty`
   %"x::XM.x" = load i64, ptr %"x::XM", align 8
; └
;  @ REPL[52]:3 within `bar2`
; ┌ @ REPL[50]:1 within `foo`
   call void asm ";", "~{memory}"() #4
; └
;  @ REPL[52]:4 within `bar2`
; ┌ @ Base_compiler.jl:54 within `getproperty`
   %"x::XM.x1" = load i64, ptr %"x::XM", align 8
; └
;  @ REPL[52]:5 within `bar2`
; ┌ @ int.jl:87 within `+`
   %0 = add i64 %"x::XM.x1", %"x::XM.x"
   ret i64 %0
; └
}

julia> mutable struct XI const x::Int end

julia> @code_llvm bar2(XI(1))
; Function Signature: bar2(Main.XI)
;  @ REPL[52]:1 within `bar2`
define i64 @julia_bar2_5188(ptr noundef nonnull align 8 dereferenceable(8) %"x::XI") local_unnamed_addr #0 {
top:
;  @ REPL[52]:3 within `bar2`
; ┌ @ REPL[50]:1 within `foo`
   call void asm ";", "~{memory}"() #4
; └
;  @ REPL[52]:5 within `bar2`
; ┌ @ int.jl:87 within `+`
   %.unbox = load i64, ptr %"x::XI", align 8
   %0 = shl i64 %.unbox, 1
   ret i64 %0
; └
}

We see that the const allows us to avoid reloading. Now,

julia> @code_llvm bar([1])
; Function Signature: bar(Array{Int64, 1})
;  @ REPL[51]:1 within `bar`
define i64 @julia_bar_5191(ptr noundef nonnull align 8 dereferenceable(24) %"x::Array") local_unnamed_addr #0 {
top:
;  @ REPL[51]:2 within `bar`
; ┌ @ essentials.jl:953 within `getindex`
   %memoryref_data = load ptr, ptr %"x::Array", align 8
   %0 = load i64, ptr %memoryref_data, align 8
; └
;  @ REPL[51]:3 within `bar`
; ┌ @ REPL[50]:1 within `foo`
   call void asm ";", "~{memory}"() #5
; └
;  @ REPL[51]:4 within `bar`
; ┌ @ essentials.jl:953 within `getindex`
   %memoryref_data6 = load ptr, ptr %"x::Array", align 8
   %1 = load i64, ptr %memoryref_data6, align 8
; └
;  @ REPL[51]:5 within `bar`
; ┌ @ int.jl:87 within `+`
   %2 = add i64 %1, %0
   ret i64 %2
; └
}

julia> @code_llvm bar(zeros(Int, 1, 1))
; Function Signature: bar(Array{Int64, 2})
;  @ REPL[51]:1 within `bar`
define i64 @julia_bar_5195(ptr noundef nonnull align 8 dereferenceable(32) %"x::Array") local_unnamed_addr #0 {
top:
;  @ REPL[51]:2 within `bar`
; ┌ @ essentials.jl:953 within `getindex`
   %memoryref_data = load ptr, ptr %"x::Array", align 8
   %0 = load i64, ptr %memoryref_data, align 8
; └
;  @ REPL[51]:3 within `bar`
; ┌ @ REPL[50]:1 within `foo`
   call void asm ";", "~{memory}"() #5
; └
;  @ REPL[51]:4 within `bar`
; ┌ @ essentials.jl:953 within `getindex`
   %memoryref_data5 = load ptr, ptr %"x::Array", align 8
   %1 = load i64, ptr %memoryref_data5, align 8
; └
;  @ REPL[51]:5 within `bar`
; ┌ @ int.jl:87 within `+`
   %2 = add i64 %1, %0
   ret i64 %2
; └
}

That is a missed optimization! Julia could “simply” specialize code emission such that non-Vector Arrays have a constant MemoryRef and size. If that optimization were still present today, we would be able to work around Vector’s resizeability by using n×1 matrices. (In the Matrix case, the load %memoryref_data5 = load ptr, ptr %"x::Array", align 8 is the redundant one: the contents of the array are not constant and must be reloaded since we clobbered memory, but the backing pointer of a non-resizable Matrix never actually changes.)

(The main useful takeaway here is the code snippet with the llvmcall asm memory clobber, which is a handy way to test such optimizations.)
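Applied here, one would hope to see the pointer load survive CSE for a FixedSizeArray, with only the element load repeated after the clobber (a sketch, reusing bar from above):

using FixedSizeArrays
x = FixedSizeVectorDefault{Int}(undef, 1); x[1] = 1
@code_llvm bar(x)  # expect a single load of the backing pointer across foo()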


Is it possible to use, e.g., the 64-byte-aligned memory provided by AlignedAllocs.jl as the backing memory for a FixedSizeVector?

The memory backend can be any DenseVector:
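For example, a plain Vector works as the backing store (a sketch, using the same wrapping constructor as in the next post):

using FixedSizeArrays

backing = zeros(Float64, 4)   # any DenseVector
fv = FixedSizeVector{Float64, Vector{Float64}}(backing)
fv isa DenseVector{Float64}   # true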


What am I missing? Or what do I need to define/specialize for this to work?

using AlignedAllocs, FixedSizeArrays

alignedmem = memalign_clear(Float32, 8)   # aligned buffer of 8 Float32s
alignedmem[:] = 1.0f0:8.0f0
alignment(alignedmem) == 64 # true, or a higher power of 2

# wrapping the aligned buffer seems to lose the extra alignment:
fixedmem = FixedSizeVector{Float32, typeof(alignedmem)}(alignedmem)
alignment(fixedmem) == 32     # expected 64
alignment(fixedmem.mem) == 32
# and likewise for the converting constructor:
alignment(FixedSizeVector{Float32}(alignedmem)) == 32

Sorry, I’m not very familiar with AlignedAllocs, maybe let’s move this discussion to another thread?

Done, split to this topic.


A post was merged into an existing topic: FixedSizeVectors with cache-aligned memory?