I’m having trouble finding the best way to include data in a module. Basically I have a small array (5 x 5) of constant data, but to interface with other packages, I want to define a function that just returns the element in the constant array at a given index.
From what I can tell, defining a constant array in the global scope of the module isn’t possible. Is that right? Is the best way to just define the data as a normal global array and then reference it inside my function? I can’t think of any other way to avoid having an array in the global scope of the module.
I ran some mini-benchmarks and it looks way more expensive to define the array inside the function (not surprisingly). Is there a better way to include constant data in a module than using a globally defined array?
Here’s a minimal working example that shows the difference between the global and local approach, with some quick @btime benchmarks.
using BenchmarkTools
# data defined globally
data_outer = [1.0 2.0; 3.0 4.0]
f_outer(x,y) = data_outer[x, y]
# data redefined inside function
function f_inner(x,y)
    data_inner = [1.0 2.0; 3.0 4.0]
    data_inner[x, y]
end
# large data defined globally
data_outer_large = rand(1_000, 1_000)
f_outer_large(x,y) = data_outer_large[x, y]
# large data redefined inside function
function f_inner_large(x,y)
    data_inner_large = rand(1_000, 1_000)
    data_inner_large[x, y]
end
Okay that definitely works, thanks - I should have benchmarked against that in the first place:
# global data
data = rand(10_000, 10_000)
f(x,y) = data[x, y]
# constant global data
const data_const = rand(10_000, 10_000)
f_const(x,y) = data_const[x, y]
@btime f(1,2) # 26.256 ns (1 allocation: 16 bytes)
@btime f_const(1,2) # 2.051 ns (0 allocations: 0 bytes)
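For the original use case (a small constant table inside a module, exposed through an accessor function), a minimal sketch could look like the following. The names MyTables, COEFFS, and coeff are hypothetical and only there for illustration, and the 2 x 2 values are the toy data from above rather than the real 5 x 5 table.

```julia
module MyTables  # hypothetical module name

export coeff

# `const` fixes the type of the binding (Matrix{Float64}),
# so functions that read it can be compiled without dynamic dispatch.
const COEFFS = [1.0 2.0;
                3.0 4.0]

# Accessor that other packages can call; indexing is bounds-checked as usual.
coeff(i::Integer, j::Integer) = COEFFS[i, j]

end # module

using .MyTables
coeff(2, 1)  # element in row 2, column 1
```

The array is only allocated once, when the module is loaded, so this avoids the per-call allocation of the "define it inside the function" version.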
My confusion came from the fact that re-defining a scalar constant throws a warning, but changing an element of a constant array doesn't. I did some googling and came across this (very dated) issue, which made me think that I wouldn’t get any additional performance from declaring the array const.
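To make that distinction concrete, here is a small sketch (TABLE is a made-up name). As far as I understand it, const pins the binding and its type, not the contents of the array, so writing to an element is ordinary mutation and is silently allowed.

```julia
const TABLE = [1.0 2.0; 3.0 4.0]

# Mutating an element does not touch the binding, so no warning is issued.
TABLE[1, 1] = 10.0

# The binding still refers to the same Matrix{Float64}, now with new contents.
TABLE[1, 1]
```

If the data really must not change, that has to be enforced some other way (for example by never exporting the array itself, only the accessor function).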
Hmmm… isn’t that a wildly reckless way of using @inbounds, especially in a thread titled “Best practices”?
Just out of curiosity, is a segfault the worst thing that can happen from this or can it open up for arbitrary code execution down the road? Or maybe fry the nearest nuclear power plant?
You’re absolutely right: careless use of @inbounds is dangerous, and even more so for a user-facing function. A segfault’s probably the worst that could happen, though, since the function is read-only.
This is a bad assumption. Out-of-bounds accesses are undefined behavior, so LLVM may happily delete the entire code path that calls this function with out-of-bounds indices.
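For a user-facing accessor, one way to stay on the safe side is to validate the indices explicitly before skipping the check, or simply to drop @inbounds altogether. A minimal sketch, assuming a constant table called DATA and an accessor called lookup (both names are hypothetical):

```julia
const DATA = [1.0 2.0; 3.0 4.0]

function lookup(i::Integer, j::Integer)
    # Throws a BoundsError for invalid indices instead of invoking undefined behavior.
    checkbounds(DATA, i, j)
    # The indices have just been validated, so skipping the second check is safe here.
    return @inbounds DATA[i, j]
end

lookup(1, 2)    # valid access
# lookup(3, 1)  # would throw a BoundsError
```

In practice the plain DATA[i, j] is usually fast enough for a table this small, so the @inbounds is optional even in this form.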