Best practices for including data in a module as a constant array

I’m having trouble finding the best way to include data in a module. Basically I have a small array (5 x 5) of constant data, but to interface with other packages, I want to define a function that just returns the element in the constant array at a given index.

From what I can tell, defining a constant array in the global scope of the module isn’t possible. Is that right? Is the best way to just define the data as a normal global array and then reference it inside my function? I can’t think of any other way to avoid having an array in the global scope of the module.

I ran some mini-benchmarks and it looks way more expensive to define the array inside the function (not surprisingly). Is there a better way to include constant data in a module than using a globally defined array?

Here’s a minimal working example that shows the difference between the global and local approach, with some quick @btime benchmarks.

using BenchmarkTools

# data defined globally
data_outer = [1.0 2.0; 3.0 4.0]
f_outer(x,y) = data_outer[x, y]

# data redefined inside function
function f_inner(x,y)
    data_inner = [1.0 2.0; 3.0 4.0]
    data_inner[x, y]
end

# large data defined globally
data_outer_large = rand(1_000, 1_000)
f_outer_large(x,y) = data_outer_large[x, y]

# large data redefined inside function
function f_inner_large(x,y)
    data_inner_large = rand(1_000, 1_000)
    data_inner_large[x, y]
end

julia> @btime f_outer(1,2)
  26.666 ns (1 allocation: 16 bytes)
2.0

julia> @btime f_inner(1,2)
  41.318 ns (1 allocation: 112 bytes)
2.0

julia> @btime f_outer_large(1,2)
  28.890 ns (1 allocation: 16 bytes)
0.6037250079523222

julia> @btime f_inner_large(1,2)
  4.157 ms (2 allocations: 7.63 MiB)
0.36772602420929323

Huh? It’s definitely possible. Why do you think it isn’t?

const data_outer = [1.0 2.0; 3.0 4.0]

should work just fine. Is there some other piece to this that I’m missing?
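For the module use case in the original question, a minimal sketch might look like the following. The module name `ConstantsDemo`, the constant `TABLE`, and the accessor `lookup` are hypothetical names chosen for illustration, not anything from the thread.

```julia
module ConstantsDemo  # hypothetical module name

# `const` fixes the type of the global binding, so functions that read it
# can be compiled with a known array type instead of an untyped global.
const TABLE = [1.0 2.0; 3.0 4.0]

# Accessor for other packages to call; indexing is bounds-checked by default.
lookup(i::Integer, j::Integer) = TABLE[i, j]

end # module

ConstantsDemo.lookup(1, 2)
```

Callers only see `lookup`, so the storage behind it could later be swapped for something else without changing the interface.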

Okay that definitely works, thanks - I should have benchmarked against that in the first place:

# global data
data = rand(10_000, 10_000)
f(x,y) = data[x, y]

# constant global data
const data_const = rand(10_000, 10_000)
f_const(x,y) = data_const[x, y]

@btime f(1,2)       # 26.256 ns (1 allocation: 16 bytes)
@btime f_const(1,2) # 2.051 ns (0 allocations: 0 bytes)

My confusion came from the fact that redefining a scalar constant throws a warning, whereas changing an element of a constant vector doesn't. I did some googling and came across this (very dated) issue, which made me think that I wouldn't get any additional performance.
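The difference comes down to what `const` applies to: it constrains the global binding, not the contents of the array that the binding refers to. A small sketch, with `LIMITS` as a hypothetical name:

```julia
const LIMITS = [1.0, 2.0, 3.0]   # hypothetical name, for illustration only

# Mutating the contents is allowed and silent: the binding still refers
# to the same Vector{Float64} object.
LIMITS[1] = 10.0

# Reassigning the binding itself is what `const` guards against.
# Depending on the Julia version this warns or throws, so it stays commented out:
# LIMITS = [4.0, 5.0, 6.0]
```

If the data must never change, an immutable container (a tuple, or an `SMatrix` from StaticArrays.jl) rules out element mutation as well.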

This may be a premature optimization, but it’s worth checking out StaticArrays.jl for cases like this:

edit: it’s fast, but wildly unsafe, with a risk of segfaults

using StaticArrays

const stackdata = @SArray(rand(5,5))
const heapdata = rand(5,5)

getstack(i,j) = @inbounds stackdata[i,j]
getheap(i,j) = @inbounds heapdata[i,j]

julia> using BenchmarkTools

julia> @btime getstack(1,2)
  0.042 ns (0 allocations: 0 bytes)

julia> @btime getheap(1,2)
  1.550 ns (0 allocations: 0 bytes)
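A sub-nanosecond timing such as 0.042 ns usually means the compiler constant-folded the call with its literal arguments, so the benchmark did not measure an actual lookup. The BenchmarkTools manual suggests hiding the arguments behind a `Ref` to prevent this. A rough sketch, using the hypothetical stand-ins `heapdata_demo` and `getheap_demo` rather than the definitions above:

```julia
using BenchmarkTools

const heapdata_demo = rand(5, 5)          # hypothetical stand-in for `heapdata`
getheap_demo(i, j) = heapdata_demo[i, j]  # bounds-checked accessor

# Literal arguments may be constant-propagated by the compiler.
@btime getheap_demo(1, 2)

# Interpolating a Ref and dereferencing it inside the benchmarked expression
# keeps the index values opaque to the compiler.
@btime getheap_demo($(Ref(1))[], $(Ref(2))[])
```

The second form is the one to compare across implementations; timings will vary by machine and Julia version.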

Oh, I see, I missed that you were trying to mutate it.

Hmmm… isn’t that a wildly reckless way of using @inbounds, especially in a thread titled “Best practices”? 🙂

Just out of curiosity, is a segfault the worst thing that can happen from this, or can it open the door to arbitrary code execution down the road? Or maybe fry the nearest nuclear power plant?

You’re absolutely right: careless use of @inbounds is dangerous, and even more so in a user-facing function. A segfault’s probably the worst that could happen, though, since the function is read-only.

This is a bad assumption. Out-of-bounds accesses are undefined behavior, so LLVM may happily delete the entire code path that calls this function with out-of-bounds indices.
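A safer pattern for a user-facing accessor is to keep the default bounds check and let callers opt out only where they can guarantee valid indices. The sketch below uses hypothetical names (`SAFEDATA`, `getchecked`, `getpropagated`) and relies on `Base.@propagate_inbounds`, which forwards a caller's `@inbounds` instead of forcing it on everyone:

```julia
const SAFEDATA = [1.0 2.0; 3.0 4.0]   # hypothetical stand-in for the thread's data

# Default indexing is bounds-checked: bad indices raise a BoundsError
# instead of invoking undefined behavior.
getchecked(i, j) = SAFEDATA[i, j]

# Checked by default as well; the check is skipped only when the *caller*
# wraps the call in `@inbounds` and thereby takes responsibility.
Base.@propagate_inbounds getpropagated(i, j) = SAFEDATA[i, j]

getchecked(1, 2)      # valid lookup
getpropagated(2, 1)   # valid lookup, still checked here
```

With this layout, an out-of-range call such as `getchecked(3, 1)` fails loudly with a `BoundsError` rather than reading arbitrary memory.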