HDR pixels: tuples vs. StaticArrays

I’m working with an HDR pixel type, which at the moment I am representing as a tuple. I can make a case for using an SVector or MVector instead, so I wobble back and forth between leaving the code as it is and switching it to static arrays.

ColorTypes, which represents relative rather than absolute luminance values, uses immutable types with fields named R, G, and B, which argues for tuples or SVectors. On the other hand, being able to alter the values of a pixel in place might save storage space and execution time.
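For concreteness, here is a minimal sketch of the tuple representation I have now (the values are illustrative; HDR channels can exceed 1.0):

```julia
# HDR pixel as a plain NTuple{3,Float64}.
pix = (0.5, 1.2, 8.0)         # HDR: channel values may exceed 1.0
halved = map(c -> 0.5c, pix)  # e.g. exposure adjustment; result is still a tuple
println(halved)               # (0.25, 0.6, 4.0)
```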

What do people here think?

You don’t need to alter the values inside a pixel, a pixel is trivial to reconstruct with one changed value. Look at Accessors.jl for example. You’ll have an array of pixels and you can change the pixels in that array. If you make the pixel type mutable your image array can only store pointers, making the whole thing much slower than with contiguous inline storage of the actual pixel values.
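A minimal Base-only sketch of that reconstruction (Accessors.jl wraps the same idea in its @set macro):

```julia
# Rebuild an immutable pixel with one channel changed; no mutation needed.
pix = (0.5, 0.6, 0.7)              # pixel as a plain tuple
pix = Base.setindex(pix, 0.25, 2)  # fresh tuple with channel 2 replaced
println(pix)                       # (0.5, 0.25, 0.7)
```

The compiler sees the whole new value at once, so writing it back into an array element is typically as cheap as a field write.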

Not true. If you use MVectors, the vector itself is stored in the array.

Storage “inline” is guaranteed for static arrays of eltype where isbitstype(eltype) is true, which includes common cases like MVector{N, Float64} or MVector{N, Int}.

From the StaticArrays docs:

Mutable arrays: MVector, MMatrix and MArray

These statically sized arrays are identical to the above, but are defined as mutable structs, instead of immutable structs. Because they are mutable, they allow setindex! to be defined (achieved through pointer manipulation, into a tuple).

As a consequence of Julia’s internal implementation, these mutable containers live on the heap, not the stack. Their memory must be allocated and tracked by the garbage collector.


I would say, the docs are wrong. I wrote large pieces of software using hundreds of MVectors, and I have zero allocations. I also compared the performance with SVectors, and in most cases the performance is the same.

In some situations you need an SVector, though, to achieve zero allocations.

In Julia, MVectors (from the StaticArrays.jl package) are generally allocated on the stack when possible, but this is not a strict guarantee—specific cases, like use in mutable structs or certain compiler contexts, may result in heap allocation instead.

I created a bug-report: Error in documentation · Issue #1316 · JuliaArrays/StaticArrays.jl · GitHub

Huh?

julia> 3*3*sizeof(Int)
72

julia> Base.allocatedinline(SVector{3,Int})
true

julia> sizeof(zeros(SVector{3,Int}, 3))
72

julia> Base.allocatedinline(MVector{3,Int})
false

julia> sizeof(zeros(MVector{3,Int}, 3))
24

Regardless of whether MVectors can be optimized to avoid heap allocation, you can achieve predictable performance with SVectors and setindex:

jl> using StaticArrays

jl> pix = SVector{3, Float64}(0.5, 0.6, 0.7)
3-element SVector{3, Float64} with indices SOneTo(3):
 0.5
 0.6
 0.7

jl> pix = setindex(pix, 0.25, 2)
3-element SVector{3, Float64} with indices SOneTo(3):
 0.5
 0.25
 0.7

Just remember that setindex does not work in place, so you must capture the return value, e.g.:

img[i, j] = setindex(img[i, j], 0.25, 2)  # change second field of pixel (i, j)

You are testing in global scope, so the results are not very relevant.

Scope doesn’t matter there, he’s just showing that a Vector of MVectors is a vector of pointers, while SVectors are stored inline. We’re not talking about eliding allocations for MVectors that never escape a function. We’re talking about MVectors vs. SVectors stored in arrays that persist. Like you usually would have when dealing with images which the original question was about.
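The same inline-vs-pointer distinction can be reproduced without StaticArrays at all, using a mutable struct as a stand-in for MVector (MPix is a hypothetical type, used only for illustration):

```julia
# Immutable isbits elements are stored inline in a Vector;
# mutable elements are stored as pointers to heap objects.
mutable struct MPix  # hypothetical mutable pixel, stand-in for MVector
    r::Float64
    g::Float64
    b::Float64
end

@assert Base.allocatedinline(NTuple{3,Float64})  # inline data
@assert !Base.allocatedinline(MPix)              # pointers

println(sizeof(Vector{NTuple{3,Float64}}(undef, 5)))  # 5 * 24 bytes of data
println(sizeof(Vector{MPix}(undef, 5)))               # 5 pointers (40 bytes on 64-bit)
```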


You are indeed right. AI told me the wrong answer, and I was not critical enough. Here is some code to check:

#!/usr/bin/env julia

# Demonstration of Vector{MVector} memory layout - corrected analysis
# Copyright (c) 2025 Uwe Fechner
# SPDX-License-Identifier: MIT

using StaticArrays

function demonstrate_mvector_memory_layout()
    println("=== Memory Layout Demonstration: Vector{MVector} Storage ===\n")
    
    # Create a Vector of MVectors
    mvec_array = [MVector{3, Float64}([i, i+0.1, i+0.2]) for i in 1:5]
    
    # Create a Vector of regular Vectors (stored by reference)
    vec_array = [Vector{Float64}([i, i+0.1, i+0.2]) for i in 1:5]
    
    println("1. SIZE COMPARISON:")
    println("   Vector{MVector{3,Float64}} size: $(sizeof(mvec_array)) bytes")
    println("   Vector{Vector{Float64}} size:    $(sizeof(vec_array)) bytes")
    println("   MVector element size: $(sizeof(MVector{3, Float64})) bytes")
    println("   Pointer size: $(sizeof(Ptr{MVector{3, Float64}})) bytes\n")
    
    # Both are pointer-based storage, but with different heap allocation patterns
    expected_ptr_size = 5 * sizeof(Ptr{Any})
    
    println("2. STORAGE ANALYSIS:")
    println("   Expected pointer array size: $(expected_ptr_size) bytes")
    println("   Actual Vector{MVector} size: $(sizeof(mvec_array)) bytes")
    println("   Actual Vector{Vector} size:  $(sizeof(vec_array)) bytes")
    println("   → Both use pointer-based storage\n")
    
    println("3. HEAP ALLOCATION PATTERN:")
    
    # Check where the actual MVector data is stored
    for i in 1:3
        mvec_ref = mvec_array[i]
        mvec_addr = pointer_from_objref(mvec_ref)
        
        println("   MVector $i:")
        println("     Object reference: $(mvec_ref)")
        println("     Heap address: $(mvec_addr)")
        println("     Data: [$(mvec_ref[1]), $(mvec_ref[2]), $(mvec_ref[3])]")
    end
    
    println("\n4. STACK vs HEAP ALLOCATION COMPARISON:")
    
    # Note: both MVectors below escape their functions (they are returned),
    # so both end up heap-allocated; stack allocation is only possible for
    # an MVector that never escapes.
    function test_stack_mvector()
        stack_mv = MVector{3, Float64}(1.0, 2.0, 3.0)
        stack_addr = pointer(stack_mv)  # address of the element data
        println("   MVector data address:   $(stack_addr)")
        return stack_mv
    end
    
    function test_heap_mvector()
        heap_mv = MVector{3, Float64}(1.0, 2.0, 3.0)
        heap_addr = pointer_from_objref(heap_mv)  # address of the boxed object
        println("   MVector object address: $(heap_addr)")
        return heap_mv
    end
    
    stack_result = test_stack_mvector()
    heap_result = test_heap_mvector()
    
    println("\n5. PERFORMANCE COMPARISON:")
    
    # Create arrays for benchmarking
    n_elements = 1000
    mvec_large = [MVector{3, Float64}([i, i+0.1, i+0.2]) for i in 1:n_elements]
    vec_large = [Vector{Float64}([i, i+0.1, i+0.2]) for i in 1:n_elements]
    
    # Benchmark access patterns
    n_iterations = 10_000
    
    # MVector access
    mvec_time = @elapsed begin
        sum_val = 0.0
        for _ in 1:n_iterations
            for mv in mvec_large
                sum_val += mv[1] + mv[2] + mv[3]
            end
        end
        sum_val
    end
    
    # Vector access  
    vec_time = @elapsed begin
        sum_val = 0.0
        for _ in 1:n_iterations
            for v in vec_large
                sum_val += v[1] + v[2] + v[3]
            end
        end
        sum_val
    end
    
    println("   MVector access time:  $(round(mvec_time*1000, digits=3)) ms")
    println("   Vector access time:   $(round(vec_time*1000, digits=3)) ms")
    println("   Speedup (Vector time / MVector time): $(round(vec_time/mvec_time, digits=2))x")
    
    println("\n=== CORRECTED UNDERSTANDING ===")
    println("✓ Vector{MVector} stores REFERENCES to heap-allocated MVectors")
    println("✓ Each MVector is a separate heap object (unlike C arrays)")
    println("✓ Still faster than Vector{Vector} due to:")
    println("  - Fixed-size, stack-like allocation pattern")
    println("  - Better memory layout within each MVector")
    println("  - Compiler optimizations for StaticArrays")
    println("✓ For truly contiguous storage, use regular Arrays or Vector{SVector}")
end

function demonstrate_true_contiguous_storage()
    println("\n=== TRUE CONTIGUOUS STORAGE EXAMPLES ===")
    
    # Method 1: flatten all pixels into one contiguous Vector{Float64}
    regular_array = [[i, i+0.1, i+0.2] for i in 1:5]
    flat_data = vec(hcat(regular_array...))  # column-major: each pixel's triple stays adjacent
    
    println("1. FLATTENED ARRAY (truly contiguous):")
    println("   Data: $(flat_data)")
    println("   Size: $(sizeof(flat_data)) bytes")
    println("   Element size: $(sizeof(Float64)) bytes")
    
    # Method 2: Vector{SVector} storage (value-based, contiguous)
    smatrix_array = [SVector{3, Float64}([i, i+0.1, i+0.2]) for i in 1:5]
    
    println("\n2. VECTOR{SVECTOR} (immutable, better layout):")
    println("   Size: $(sizeof(smatrix_array)) bytes")
    println("   Expected if contiguous: $(5 * sizeof(SVector{3, Float64})) bytes")
    
    # Check if SVector is truly stored by value
    is_svector_contiguous = sizeof(smatrix_array) == 5 * sizeof(SVector{3, Float64})
    println("   → SVector stored contiguously: $(is_svector_contiguous ? "✓ YES" : "✗ NO")")
    
    if is_svector_contiguous
        println("   ✓ SVector achieves true value-based storage!")
    else
        println("   ✗ SVector also uses reference-based storage")
    end
    
    println("\n3. PERFORMANCE COMPARISON:")
    n_iter = 100_000
    
    # Benchmark flattened access
    flat_time = @elapsed begin
        sum_val = 0.0
        for _ in 1:n_iter
            for i in 1:3:length(flat_data)-2
                sum_val += flat_data[i] + flat_data[i+1] + flat_data[i+2]
            end
        end
        sum_val
    end
    
    # Benchmark SVector access
    svec_time = @elapsed begin
        sum_val = 0.0
        for _ in 1:n_iter
            for sv in smatrix_array
                sum_val += sv[1] + sv[2] + sv[3]
            end
        end
        sum_val
    end
    
    println("   Flattened array time: $(round(flat_time*1000, digits=3)) ms")
    println("   SVector array time:   $(round(svec_time*1000, digits=3)) ms")
end

# Run the demonstrations
if abspath(PROGRAM_FILE) == @__FILE__
    demonstrate_mvector_memory_layout()
    demonstrate_true_contiguous_storage()
    
    println("\n" * "="^60)
    println("FINAL ANSWER TO THE ORIGINAL QUESTION:")
    println("="^60)
    println("❌ Vector{MVector} is NOT stored in contiguous memory")
    println("✅ Vector{MVector} stores POINTERS to heap-allocated MVectors")
    println("✅ Vector{SVector} IS stored contiguously (value-based)")
    println("✅ For contiguous storage, use:")
    println("   - Vector{SVector{N,T}} (immutable, stack-friendly)")
    println("   - Regular flat arrays: Matrix or Vector{T}")
    println("   - Custom struct with contiguous fields")
    println("="^60)
end

So SVectors are the way to go, and you can mutate them using Accessors.jl for example.


Perhaps the sentence of the year!


Thank you.

I already have tuples working. Is there enough difference between a tuple and an SVector to make switching to SVector worthwhile? Would that be a better choice either for computational or stylistic reasons?

Then, another issue. A second color model has been added, one with 24 spectral bands. I am thinking that MVectors might be a better choice there, since it’s noticeably more computational effort to replace 24 values than three, and all the heuristics to improve performance tend to be aimed at triplets or quadruplets. Any intuitions here?

There’s no difference on a low level, SVectors are tuples with an array interface. A struct with n fields would also be the same in memory.
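That equivalence is easy to check with sizeof and isbitstype; a pure-Base sketch with a hypothetical 3-field struct (the SVector case needs StaticArrays loaded, so it is omitted here):

```julia
# A 3-field struct and an NTuple{3,Float64} lower to the same
# 24 bytes of inline isbits data; SVector{3,Float64} does too.
struct RGBPix  # hypothetical struct pixel for illustration
    r::Float64
    g::Float64
    b::Float64
end

@assert sizeof(RGBPix) == sizeof(NTuple{3,Float64}) == 24
@assert isbitstype(RGBPix) && isbitstype(NTuple{3,Float64})
println("both are 24-byte isbits values")
```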

24 elements should still be OK with tuples, I think; at larger sizes tuples have a much higher compilation cost than arrays, but I’ve often seen around 100 elements cited as a heuristic threshold. You’d still gain from the cache friendliness of the contiguous storage, I’d think. Also, if you write a function that always replaces one specific entry of a 24-tuple in an array, chances are the compiler optimizes that to a single write instead of overwriting the 23 unchanged components as well. In the end you need to benchmark anyway, but these would be my intuitions.
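For the 24-band case, a hedged sketch of replacing one band in a tuple-valued pixel (band count and values are illustrative):

```julia
# 24-band spectral pixel as a tuple; replacing a single band rebuilds
# the whole tuple, which the compiler can often lower to one store
# when the pixel sits inline in an image array.
const NBANDS = 24
pix  = ntuple(i -> i / NBANDS, NBANDS)  # illustrative band values
pix2 = Base.setindex(pix, 0.0, 5)       # new pixel with band 5 zeroed
println(pix2[5])                        # 0.0
```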

Thanks; that means that my current code is pretty good. I suppose having the array interface might be an advantage. I’ll have to look into that more, but other than that I think you’ve answered all of my questions.