Fortran vs Julia stack allocated arrays

Isn’t that heap allocated?
I do not know of any stack-allocated objects in Julia that are mutable. I thought Julia drew a neat divide between immutable structs and mutable structs, where only some of the former are stack allocated (i.e., when they are isbits, meaning they contain no reference types).

MVectors are just a mutable struct wrapping a tuple. Create and destroy them, and you trigger the garbage collector.
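To make the immutable/mutable divide concrete, here is a minimal sketch using only Base. The type names `TupleVec4` and `MutableTupleVec4` are hypothetical stand-ins I made up for illustration; they are not the actual StaticArrays definitions, just the same general shape (a struct wrapping a tuple).

```julia
# Hypothetical types for illustration only (not the StaticArrays definitions).
struct TupleVec4                  # immutable wrapper around a tuple
    data::NTuple{4,Float64}
end

mutable struct MutableTupleVec4   # mutable wrapper around the same tuple
    data::NTuple{4,Float64}
end

t = (1.0, 2.0, 3.0, 4.0)

isbitstype(TupleVec4)             # true: immutable and contains no references
isbitstype(MutableTupleVec4)      # false: a mutable type is never isbits
isbits(TupleVec4(t))              # true
```

Only the isbits variant is a candidate for living entirely on the stack or in registers; the mutable one has an identity, so it generally ends up on the heap unless the compiler can prove it never escapes.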

julia> using StaticArrays, BenchmarkTools

julia> @benchmark @MVector randn(4)
BenchmarkTools.Trial: 
  memory estimate:  48 bytes
  allocs estimate:  1
  --------------
  minimum time:     25.867 ns (0.00% GC)
  median time:      27.715 ns (0.00% GC)
  mean time:        37.415 ns (22.50% GC)
  maximum time:     48.765 μs (99.92% GC)
  --------------
  samples:          10000
  evals/sample:     997

julia> @benchmark @SVector randn(4)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     21.204 ns (0.00% GC)
  median time:      22.369 ns (0.00% GC)
  mean time:        22.478 ns (0.00% GC)
  maximum time:     52.456 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

julia> @benchmark @MVector zeros(4)
BenchmarkTools.Trial: 
  memory estimate:  48 bytes
  allocs estimate:  1
  --------------
  minimum time:     4.744 ns (0.00% GC)
  median time:      8.484 ns (0.00% GC)
  mean time:        18.205 ns (47.74% GC)
  maximum time:     49.892 μs (99.97% GC)
  --------------
  samples:          10000
  evals/sample:     999

julia> @benchmark @SVector zeros(4)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.020 ns (0.00% GC)
  median time:      0.030 ns (0.00% GC)
  mean time:        0.029 ns (0.00% GC)
  maximum time:     26.590 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

pointer_from_objref works just fine on them, as do unsafe_load and unsafe_store!. In fact, setindex! is defined using pointer_from_objref and unsafe_store!.
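As a rough sketch of that pattern, the following writes one element of a mutable tuple wrapper through a raw pointer. `Vec4Box` and `set!` are hypothetical names of mine, and this is a simplified illustration rather than the library's actual setindex! method. It assumes the single tuple field is stored inline at the start of the object.

```julia
# Hypothetical mutable wrapper; the tuple field is assumed to sit at offset 0.
mutable struct Vec4Box
    data::NTuple{4,Float64}
end

# Overwrite element i in place by storing through a pointer to the object.
function set!(v::Vec4Box, x::Float64, i::Int)
    1 <= i <= 4 || throw(BoundsError(v, i))
    # Keep v rooted so the GC cannot free it while we hold a raw pointer.
    GC.@preserve v begin
        p = convert(Ptr{Float64}, pointer_from_objref(v))
        unsafe_store!(p, x, i)    # i is a 1-based element index
    end
    return v
end

v = Vec4Box((1.0, 2.0, 3.0, 4.0))
set!(v, 10.0, 3)
v.data                            # (1.0, 2.0, 10.0, 4.0)
```

This only works because the object is mutable and therefore has a stable address; pointer_from_objref is not meant for immutable isbits values.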

I made the HN comment because neither foobar’s question above nor my question on masked loads and stores had a positive answer.
Neither of us could even achieve what we wanted via llvmcall.

So if this is possible:

“meaning you can’t use masked load/store operations to vectorize code when the array dimensions aren’t a multiple of SIMD-vector-width.”

I’d love to learn how!
