Best practices for including data in a module as a constant array


#1

I’m having trouble finding the best way to include data in a module. Basically I have a small array (5 x 5) of constant data, but to interface with other packages, I want to define a function that just returns the element in the constant array at a given index.

From what I can tell, defining a constant array in the global scope of the module isn’t possible. Is that right? Is the best way to just define the data as a normal global array and then reference it inside my function? I can’t think of any other way to avoid having an array in the global scope of the module.

I ran some mini-benchmarks and it looks way more expensive to define the array inside the function (not surprisingly). Is there a better way to include constant data in a module than using a globally defined array?

Here’s a minimal working example that shows the difference between the global and local approach, with some quick @btime benchmarks.

using BenchmarkTools

# data defined globally
data_outer = [1.0 2.0; 3.0 4.0]
f_outer(x,y) = data_outer[x, y]

# data redefined inside function
function f_inner(x,y)
    data_inner = [1.0 2.0; 3.0 4.0]
    data_inner[x, y]
end

# large data defined globally
data_outer_large = rand(1_000, 1_000)
f_outer_large(x,y) = data_outer_large[x, y]

# large data redefined inside function
function f_inner_large(x,y)
    data_inner_large = rand(1_000, 1_000)
    data_inner_large[x, y]
end

julia> @btime f_outer(1,2)
  26.666 ns (1 allocation: 16 bytes)
2.0

julia> @btime f_inner(1,2)
  41.318 ns (1 allocation: 112 bytes)
2.0

julia> @btime f_outer_large(1,2)
  28.890 ns (1 allocation: 16 bytes)
0.6037250079523222

julia> @btime f_inner_large(1,2)
  4.157 ms (2 allocations: 7.63 MiB)
0.36772602420929323

#2

Huh? It’s definitely possible. Why do you think it isn’t?

const data_outer = [1.0 2.0; 3.0 4.0]

should work just fine. Is there some other piece to this that I’m missing?


#3

Okay that definitely works, thanks - I should have benchmarked against that in the first place:

# global data
data = rand(10_000, 10_000)
f(x,y) = data[x, y]

# constant global data
const data_const = rand(10_000, 10_000)
f_const(x,y) = data_const[x, y]

@btime f(1,2)       # 26.256 ns (1 allocation: 16 bytes)
@btime f_const(1,2) # 2.051 ns (0 allocations: 0 bytes)
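
One likely explanation for the gap: a non-constant global can be rebound to a value of any type at any time, so the compiler cannot infer what `data[x, y]` returns, while a `const` binding has a fixed type. A minimal sketch of how to check this with `Base.return_types`, using small stand-in arrays (the names `table`, `TABLE_CONST`, `lookup`, and `lookup_const` are made up for illustration and are not from the posts above):

```julia
# Non-constant global: the binding could later hold a value of any type
table = [1.0 2.0; 3.0 4.0]
lookup(x, y) = table[x, y]

# Constant global: the binding always refers to a Matrix{Float64}
const TABLE_CONST = [1.0 2.0; 3.0 4.0]
lookup_const(x, y) = TABLE_CONST[x, y]

# Ask the compiler what return type it infers for (Int, Int) arguments
inferred = first(Base.return_types(lookup, (Int, Int)))
inferred_const = first(Base.return_types(lookup_const, (Int, Int)))

println(inferred)        # a non-concrete type for the plain global
println(inferred_const)  # Float64
```

The non-concrete return type is what forces the extra allocation seen in the `f(1,2)` benchmark: the result has to be boxed because its type is not known at compile time.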

My confusion came from the fact that redefining a scalar constant throws a warning, but changing an element of a constant vector doesn’t. I did some googling and came across this (very dated) issue, which made me think that I wouldn’t get any additional performance.
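
The behavior described here follows from what `const` actually fixes: the binding (and therefore the type), not the contents of the array. A short sketch of the distinction follows. The tuple-of-tuples variant is an assumption added for illustration, as one dependency-free way to get data that cannot be modified at all; the names `GRID`, `ROWS`, `grid_at`, and `rows_at` are hypothetical.

```julia
const GRID = [1.0 2.0; 3.0 4.0]

# Changing an element is allowed and silent: GRID still refers to the
# same Matrix{Float64}, so the type information the compiler relies on
# is unchanged
GRID[1, 1] = 10.0
grid_at(i, j) = GRID[i, j]

# A tuple of row tuples is immutable, so its elements cannot be changed
const ROWS = ((1.0, 2.0), (3.0, 4.0))
rows_at(i, j) = ROWS[i][j]

println(grid_at(1, 1))  # 10.0
println(rows_at(2, 1))  # 3.0
```

Because the type stays fixed in both cases, lookups through either constant remain type-stable even though only the tuple version is protected against accidental modification.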


#4

This may be a premature optimization, but it’s worth checking out StaticArrays.jl for cases like this:

edit: the @inbounds version below is fast, but wildly unsafe, with a risk of segfaults

using StaticArrays

const stackdata = @SArray(rand(5,5))
const heapdata = rand(5,5)

getstack(i,j) = @inbounds stackdata[i,j]
getheap(i,j) = @inbounds heapdata[i,j]

julia> using BenchmarkTools

julia> @btime getstack(1,2)
  0.042 ns (0 allocations: 0 bytes)

julia> @btime getheap(1,2)
  1.550 ns (0 allocations: 0 bytes)

#5

Oh, I see, I missed that you were trying to mutate it.


#6

Hmmm… isn’t that a wildly reckless way of using @inbounds, especially in a thread titled “Best practices”? :slight_smile:

Just out of curiosity, is a segfault the worst thing that can happen from this or can it open up for arbitrary code execution down the road? Or maybe fry the nearest nuclear power plant?


#7

You’re absolutely right: careless use of @inbounds is dangerous, and even more so for a user-facing function. A segfault’s probably the worst that could happen, though, since the function is read-only.


#8

This is a bad assumption. Out-of-bounds accesses are undefined behavior, so LLVM may happily delete the entire code path that calls this function with out-of-bounds indices.
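
For a user-facing accessor, the safer default is simply to leave `@inbounds` off, since Julia checks array indices unless told otherwise. A minimal sketch, assuming a plain constant matrix rather than the StaticArrays version above (the names `CHECKED_DATA` and `getchecked` are hypothetical):

```julia
const CHECKED_DATA = [1.0 2.0; 3.0 4.0]

# No @inbounds: the index is validated on every call
getchecked(i, j) = CHECKED_DATA[i, j]

println(getchecked(1, 2))  # 2.0

# An out-of-range index raises a BoundsError instead of reading
# whatever happens to sit in memory past the array
caught = try
    getchecked(3, 1)
    false
catch err
    err isa BoundsError
end

println(caught)  # true
```

If the check ever shows up as a bottleneck, `@inbounds` can be applied at a call site where the indices are already known to be valid, rather than inside the public function.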