HDR pixels: tuples vs. StaticArrays

I’m working with an HDR pixel type, which at the moment I am representing as a tuple. I can make a case for using an SVector or MVector instead, so I wobble back and forth between leaving the code as it is and switching it to static arrays.

ColorTypes, which represents relative rather than absolute luminance values, uses immutable types with fields named R, G, and B, which argues for tuples or SVectors. On the other hand, being able to alter the values of a pixel in place might save storage space and execution time.
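For concreteness, here is a minimal sketch of the tuple representation I have now (the values are illustrative; HDR channels can exceed 1.0):

```julia
# HDR pixel as a plain NTuple{3,Float64}.
pix = (0.5, 1.2, 8.0)         # HDR: channel values may exceed 1.0
halved = map(c -> 0.5c, pix)  # e.g. exposure adjustment; result is still a tuple
println(halved)               # (0.25, 0.6, 4.0)
```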

What do people here think?

You don’t need to alter the values inside a pixel, a pixel is trivial to reconstruct with one changed value. Look at Accessors.jl for example. You’ll have an array of pixels and you can change the pixels in that array. If you make the pixel type mutable your image array can only store pointers, making the whole thing much slower than with contiguous inline storage of the actual pixel values.
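A minimal Base-only sketch of that reconstruction (Accessors.jl wraps the same idea in its @set macro):

```julia
# Rebuild an immutable pixel with one channel changed; no mutation needed.
pix = (0.5, 0.6, 0.7)              # pixel as a plain tuple
pix = Base.setindex(pix, 0.25, 2)  # fresh tuple with channel 2 replaced
println(pix)                       # (0.5, 0.25, 0.7)
```

The compiler sees the whole new value at once, so writing it back into an array element is typically as cheap as a field write.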

Not true. If you use MVectors, the vector itself is stored in the array.

Storage “inline” is guaranteed for static arrays of eltype where isbitstype(eltype) is true, which includes common cases like MVector{N, Float64} or MVector{N, Int}.

From the StaticArrays docs:

Mutable arrays: MVector, MMatrix and MArray

These statically sized arrays are identical to the above, but are defined as mutable structs, instead of immutable structs. Because they are mutable, they allow setindex! to be defined (achieved through pointer manipulation, into a tuple).

As a consequence of Julia’s internal implementation, these mutable containers live on the heap, not the stack. Their memory must be allocated and tracked by the garbage collector.


I would say, the docs are wrong. I wrote large pieces of software using hundreds of MVectors, and I have zero allocations. I also compared the performance with SVectors, and in most cases the performance is the same.

In some situations you need an SVector, though, to achieve zero allocations.

In Julia, MVectors (from the StaticArrays.jl package) are generally allocated on the stack when possible, but this is not a strict guarantee—specific cases, like use in mutable structs or certain compiler contexts, may result in heap allocation instead.

I created a bug-report: Error in documentation · Issue #1316 · JuliaArrays/StaticArrays.jl · GitHub

Huh?

julia> 3*3*sizeof(Int)
72

julia> Base.allocatedinline(SVector{3,Int})
true

julia> sizeof(zeros(SVector{3,Int}, 3))
72

julia> Base.allocatedinline(MVector{3,Int})
false

julia> sizeof(zeros(MVector{3,Int}, 3))
24

Regardless of whether MVectors can be optimized to avoid heap allocation, you can achieve predictable performance with SVectors and setindex:

jl> using StaticArrays

jl> pix = SVector{3, Float64}(0.5, 0.6, 0.7)
3-element SVector{3, Float64} with indices SOneTo(3):
 0.5
 0.6
 0.7

jl> pix = setindex(pix, 0.25, 2)
3-element SVector{3, Float64} with indices SOneTo(3):
 0.5
 0.25
 0.7

Just remember that setindex does not work in place, so you must capture the return value, e.g.:

img[i, j] = setindex(img[i, j], 0.25, 2)  # change second field of pixel (i, j)

You are testing in global scope, so the results are not very relevant.

Scope doesn’t matter there, he’s just showing that a Vector of MVectors is a vector of pointers, while SVectors are stored inline. We’re not talking about eliding allocations for MVectors that never escape a function. We’re talking about MVectors vs. SVectors stored in arrays that persist. Like you usually would have when dealing with images which the original question was about.
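The same inline-vs-pointer distinction can be reproduced without StaticArrays at all, using a mutable struct as a stand-in for MVector (MPix is a hypothetical type, used only for illustration):

```julia
# Immutable isbits elements are stored inline in a Vector;
# mutable elements are stored as pointers to heap objects.
mutable struct MPix  # hypothetical mutable pixel, stand-in for MVector
    r::Float64
    g::Float64
    b::Float64
end

@assert Base.allocatedinline(NTuple{3,Float64})  # inline data
@assert !Base.allocatedinline(MPix)              # pointers

println(sizeof(Vector{NTuple{3,Float64}}(undef, 5)))  # 5 * 24 bytes of data
println(sizeof(Vector{MPix}(undef, 5)))               # 5 pointers (40 bytes on 64-bit)
```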


You are indeed right. AI told me the wrong answer, and I was not critical enough. Here is some code to check:

#!/usr/bin/env julia

# Demonstration of Vector{MVector} memory layout - corrected analysis
# Copyright (c) 2025 Uwe Fechner
# SPDX-License-Identifier: MIT

using StaticArrays

function demonstrate_mvector_memory_layout()
    println("=== Memory Layout Demonstration: Vector{MVector} Storage ===\n")
    
    # Create a Vector of MVectors
    mvec_array = [MVector{3, Float64}([i, i+0.1, i+0.2]) for i in 1:5]
    
    # Create a Vector of regular Vectors (stored by reference)
    vec_array = [Vector{Float64}([i, i+0.1, i+0.2]) for i in 1:5]
    
    println("1. SIZE COMPARISON:")
    println("   Vector{MVector{3,Float64}} size: $(sizeof(mvec_array)) bytes")
    println("   Vector{Vector{Float64}} size:    $(sizeof(vec_array)) bytes")
    println("   MVector element size: $(sizeof(MVector{3, Float64})) bytes")
    println("   Pointer size: $(sizeof(Ptr{MVector{3, Float64}})) bytes\n")
    
    # Both are pointer-based storage, but with different heap allocation patterns
    expected_ptr_size = 5 * sizeof(Ptr{Any})
    
    println("2. STORAGE ANALYSIS:")
    println("   Expected pointer array size: $(expected_ptr_size) bytes")
    println("   Actual Vector{MVector} size: $(sizeof(mvec_array)) bytes")
    println("   Actual Vector{Vector} size:  $(sizeof(vec_array)) bytes")
    println("   → Both use pointer-based storage\n")
    
    println("3. HEAP ALLOCATION PATTERN:")
    
    # Check where the actual MVector data is stored
    for i in 1:3
        mvec_ref = mvec_array[i]
        mvec_addr = pointer_from_objref(mvec_ref)
        
        println("   MVector $i:")
        println("     Object reference: $(mvec_ref)")
        println("     Heap address: $(mvec_addr)")
        println("     Data: [$(mvec_ref[1]), $(mvec_ref[2]), $(mvec_ref[3])]")
    end
    
    println("\n4. STACK vs HEAP ALLOCATION COMPARISON:")
    
    # Note: both MVectors below escape their functions (they are returned),
    # so both end up heap-allocated; stack allocation is only possible for
    # an MVector that never escapes.
    function test_stack_mvector()
        stack_mv = MVector{3, Float64}(1.0, 2.0, 3.0)
        stack_addr = pointer(stack_mv)  # address of the element data
        println("   MVector data address:   $(stack_addr)")
        return stack_mv
    end
    
    function test_heap_mvector()
        heap_mv = MVector{3, Float64}(1.0, 2.0, 3.0)
        heap_addr = pointer_from_objref(heap_mv)  # address of the boxed object
        println("   MVector object address: $(heap_addr)")
        return heap_mv
    end
    
    stack_result = test_stack_mvector()
    heap_result = test_heap_mvector()
    
    println("\n5. PERFORMANCE COMPARISON:")
    
    # Create arrays for benchmarking
    n_elements = 1000
    mvec_large = [MVector{3, Float64}([i, i+0.1, i+0.2]) for i in 1:n_elements]
    vec_large = [Vector{Float64}([i, i+0.1, i+0.2]) for i in 1:n_elements]
    
    # Benchmark access patterns
    n_iterations = 10_000
    
    # MVector access
    mvec_time = @elapsed begin
        sum_val = 0.0
        for _ in 1:n_iterations
            for mv in mvec_large
                sum_val += mv[1] + mv[2] + mv[3]
            end
        end
        sum_val
    end
    
    # Vector access  
    vec_time = @elapsed begin
        sum_val = 0.0
        for _ in 1:n_iterations
            for v in vec_large
                sum_val += v[1] + v[2] + v[3]
            end
        end
        sum_val
    end
    
    println("   MVector access time:  $(round(mvec_time*1000, digits=3)) ms")
    println("   Vector access time:   $(round(vec_time*1000, digits=3)) ms")
    println("   Speedup (Vector time / MVector time): $(round(vec_time/mvec_time, digits=2))x")
    
    println("\n=== CORRECTED UNDERSTANDING ===")
    println("✓ Vector{MVector} stores REFERENCES to heap-allocated MVectors")
    println("✓ Each MVector is a separate heap object (unlike C arrays)")
    println("✓ Still faster than Vector{Vector} due to:")
    println("  - Fixed-size, stack-like allocation pattern")
    println("  - Better memory layout within each MVector")
    println("  - Compiler optimizations for StaticArrays")
    println("✓ For truly contiguous storage, use regular Arrays or Vector{SVector}")
end

function demonstrate_true_contiguous_storage()
    println("\n=== TRUE CONTIGUOUS STORAGE EXAMPLES ===")
    
    # Method 1: flatten all pixels into one contiguous Vector{Float64}
    regular_array = [[i, i+0.1, i+0.2] for i in 1:5]
    flat_data = vec(hcat(regular_array...))  # column-major: each pixel's triple stays adjacent
    
    println("1. FLATTENED ARRAY (truly contiguous):")
    println("   Data: $(flat_data)")
    println("   Size: $(sizeof(flat_data)) bytes")
    println("   Element size: $(sizeof(Float64)) bytes")
    
    # Method 2: Vector{SVector} storage (value-based, contiguous)
    smatrix_array = [SVector{3, Float64}([i, i+0.1, i+0.2]) for i in 1:5]
    
    println("\n2. VECTOR{SVECTOR} (immutable, better layout):")
    println("   Size: $(sizeof(smatrix_array)) bytes")
    println("   Expected if contiguous: $(5 * sizeof(SVector{3, Float64})) bytes")
    
    # Check if SVector is truly stored by value
    is_svector_contiguous = sizeof(smatrix_array) == 5 * sizeof(SVector{3, Float64})
    println("   → SVector stored contiguously: $(is_svector_contiguous ? "✓ YES" : "✗ NO")")
    
    if is_svector_contiguous
        println("   ✓ SVector achieves true value-based storage!")
    else
        println("   ✗ SVector also uses reference-based storage")
    end
    
    println("\n3. PERFORMANCE COMPARISON:")
    n_iter = 100_000
    
    # Benchmark flattened access
    flat_time = @elapsed begin
        sum_val = 0.0
        for _ in 1:n_iter
            for i in 1:3:length(flat_data)-2
                sum_val += flat_data[i] + flat_data[i+1] + flat_data[i+2]
            end
        end
        sum_val
    end
    
    # Benchmark SVector access
    svec_time = @elapsed begin
        sum_val = 0.0
        for _ in 1:n_iter
            for sv in smatrix_array
                sum_val += sv[1] + sv[2] + sv[3]
            end
        end
        sum_val
    end
    
    println("   Flattened array time: $(round(flat_time*1000, digits=3)) ms")
    println("   SVector array time:   $(round(svec_time*1000, digits=3)) ms")
end

# Run the demonstrations
if abspath(PROGRAM_FILE) == @__FILE__
    demonstrate_mvector_memory_layout()
    demonstrate_true_contiguous_storage()
    
    println("\n" * "="^60)
    println("FINAL ANSWER TO THE ORIGINAL QUESTION:")
    println("="^60)
    println("❌ Vector{MVector} is NOT stored in contiguous memory")
    println("✅ Vector{MVector} stores POINTERS to heap-allocated MVectors")
    println("✅ Vector{SVector} IS stored contiguously (value-based)")
    println("✅ For contiguous storage, use:")
    println("   - Vector{SVector{N,T}} (immutable, stack-friendly)")
    println("   - Regular flat arrays: Matrix or Vector{T}")
    println("   - Custom struct with contiguous fields")
    println("="^60)
end

So SVectors are the way to go, and you can mutate them using Accessors.jl for example.


Perhaps the sentence of the year!


Thank you.

I already have tuples working. Is there enough difference between a tuple and an SVector to make switching to SVector worthwhile? Would that be a better choice either for computational or stylistic reasons?

Then, another issue. A second color model has been added, one with 24 spectral bands. I am thinking that MVectors might be a better choice there, since it’s noticeably more computational effort to replace 24 values than three, and all the heuristics to improve performance tend to be aimed at triplets or quadruplets. Any intuitions here?

There’s no difference on a low level, SVectors are tuples with an array interface. A struct with n fields would also be the same in memory.
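That equivalence is easy to check with sizeof and isbitstype; a pure-Base sketch with a hypothetical 3-field struct (the SVector case needs StaticArrays loaded, so it is omitted here):

```julia
# A 3-field struct and an NTuple{3,Float64} lower to the same
# 24 bytes of inline isbits data; SVector{3,Float64} does too.
struct RGBPix  # hypothetical struct pixel for illustration
    r::Float64
    g::Float64
    b::Float64
end

@assert sizeof(RGBPix) == sizeof(NTuple{3,Float64}) == 24
@assert isbitstype(RGBPix) && isbitstype(NTuple{3,Float64})
println("both are 24-byte isbits values")
```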

24 elements should still be OK with tuples, I think; at larger sizes tuples have a much higher compilation cost than arrays, but I’ve often seen around 100 elements cited as a heuristic threshold. You’d still gain from the cache friendliness of the contiguous storage, I’d think. Also, if you write a function that always replaces one specific entry of a 24-tuple in an array, chances are the compiler optimizes that to a single write instead of overwriting the 23 unchanged components as well. In the end you need to benchmark anyway, but these would be my intuitions.
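For the 24-band case, a hedged sketch of replacing one band in a tuple-valued pixel (band count and values are illustrative):

```julia
# 24-band spectral pixel as a tuple; replacing a single band rebuilds
# the whole tuple, which the compiler can often lower to one store
# when the pixel sits inline in an image array.
const NBANDS = 24
pix  = ntuple(i -> i / NBANDS, NBANDS)  # illustrative band values
pix2 = Base.setindex(pix, 0.0, 5)       # new pixel with band 5 zeroed
println(pix2[5])                        # 0.0
```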

Thanks; that means that my current code is pretty good. I suppose having the array interface might be an advantage. I’ll have to look into that more, but other than that I think you’ve answered all of my questions.