Best practices for including data in a module as a constant array

ajkeith · May 22, 2018, 4:56pm

I’m having trouble finding the best way to include data in a module. Basically I have a small array (5 x 5) of constant data, but to interface with other packages, I want to define a function that just returns the element in the constant array at a given index.

From what I can tell, defining a constant array in the global scope of the module isn’t possible. Is that right? Is the best way to just define the data as a normal global array and then reference it inside my function? I can’t think of any other way to avoid having an array in the global scope of the module.

I ran some mini-benchmarks and it looks way more expensive to define the array inside the function (not surprisingly). Is there a better way to include constant data in a module than using a globally defined array?

Here’s a minimal working example that shows the difference between the global and local approach, with some quick @btime benchmarks.

using BenchmarkTools

# data defined globally
data_outer = [1.0 2.0; 3.0 4.0]
f_outer(x,y) = data_outer[x, y]

# data redefined inside function
function f_inner(x,y)
    data_inner = [1.0 2.0; 3.0 4.0]
    data_inner[x, y]
end

# large data defined globally
data_outer_large = rand(1_000, 1_000)
f_outer_large(x,y) = data_outer_large[x, y]

# large data redefined inside function
function f_inner_large(x,y)
    data_inner_large = rand(1_000, 1_000)
    data_inner_large[x, y]
end

julia> @btime f_outer(1,2)
  26.666 ns (1 allocation: 16 bytes)
2.0

julia> @btime f_inner(1,2)
  41.318 ns (1 allocation: 112 bytes)
2.0

julia> @btime f_outer_large(1,2)
  28.890 ns (1 allocation: 16 bytes)
0.6037250079523222

julia> @btime f_inner_large(1,2)
  4.157 ms (2 allocations: 7.63 MiB)
0.36772602420929323

ExpandingMan · May 22, 2018, 5:05pm

Huh? It’s definitely possible. Why do you think it isn’t?

const data_outer = [1.0 2.0; 3.0 4.0]

should work just fine. Is there some other piece to this that I’m missing?
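As a minimal sketch of how this could look inside a package: the module name `LookupData`, the constant `TABLE`, and the accessor `lookup` are all hypothetical names chosen for illustration, and the 2 x 2 values are borrowed from the example above rather than from any real data set.

```julia
module LookupData  # hypothetical module name

# `const` fixes the type of the global binding, so functions that read
# TABLE can be compiled against a known Matrix{Float64}.
const TABLE = [1.0 2.0;
               3.0 4.0]

# Accessor that other packages could call; indexing is bounds-checked by default.
lookup(i::Integer, j::Integer) = TABLE[i, j]

end # module

value = LookupData.lookup(1, 2)  # row 1, column 2 of TABLE
```

The array still lives in the module's global scope, but because the binding is `const` the accessor does not pay the usual untyped-global penalty.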

ajkeith · May 22, 2018, 5:25pm

Okay that definitely works, thanks - I should have benchmarked against that in the first place:

# global data
data = rand(10_000, 10_000)
f(x,y) = data[x, y]

# constant global data
const data_const = rand(10_000, 10_000)
f_const(x,y) = data_const[x, y]

@btime f(1,2) # 26.256 ns (1 allocation: 16 bytes)
@btime f_const(1,2) #   2.051 ns (0 allocations: 0 bytes)

My confusion came from the fact that reassigning a scalar constant throws a warning, but modifying an element of a constant vector does not. I did some googling and came across this (very dated) issue, which made me think that I wouldn’t get any additional performance.
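A short sketch of that distinction, using the hypothetical names `SCALE` and `WEIGHTS`: `const` pins the binding (which object the name refers to), not the contents of a mutable object.

```julia
const SCALE = 1.0                 # scalar constant
const WEIGHTS = [1.0, 2.0, 3.0]   # constant binding to a mutable Vector

# Mutating the contents is allowed and raises no warning:
# the name WEIGHTS still refers to the same Vector{Float64}.
WEIGHTS[1] = 10.0

# Rebinding the name is what `const` guards against. Depending on the
# Julia version, a line like the following warns or errors, so it is
# left commented out here:
# SCALE = 2.0
```

Because the binding and its type stay fixed, the compiler can still specialize code that reads `WEIGHTS`, even though individual elements may change.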

stillyslalom · May 22, 2018, 5:43pm

This may be a premature optimization, but it’s worth checking out StaticArrays.jl for cases like this:

edit: it’s fast, but wildly unsafe, with a risk of segfaults

using StaticArrays

const stackdata = @SArray(rand(5,5))
const heapdata = rand(5,5)

getstack(i,j) = @inbounds stackdata[i,j]
getheap(i,j) = @inbounds heapdata[i,j]

julia> using BenchmarkTools

julia> @btime getstack(1,2)
  0.042 ns (0 allocations: 0 bytes)

julia> @btime getheap(1,2)
  1.550 ns (0 allocations: 0 bytes)
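For comparison, a sketch of the same idea with default bounds checking left on. It assumes StaticArrays.jl is installed; the names `STACKDATA` and `getstack_checked` are hypothetical, and the literal 2 x 2 values stand in for real constant data.

```julia
using StaticArrays

# An immutable, statically sized matrix built from literal data.
const STACKDATA = @SMatrix [1.0 2.0;
                            3.0 4.0]

# No @inbounds: an out-of-range index throws a BoundsError
# instead of reading arbitrary memory.
getstack_checked(i::Integer, j::Integer) = STACKDATA[i, j]

element = getstack_checked(1, 2)
```

An `SMatrix` cannot be mutated after construction, which also rules out accidental element-wise changes to the "constant" data.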

ExpandingMan · May 22, 2018, 5:46pm

Oh, I see, I missed that you were trying to mutate it.

NiclasMattsson · May 22, 2018, 10:00pm

Hmmm… isn’t that a wildly reckless way of using @inbounds, especially in a thread titled “Best practices”?

Just out of curiosity, is a segfault the worst thing that can happen from this, or can it open the door to arbitrary code execution down the road? Or maybe fry the nearest nuclear power plant?

stillyslalom · May 22, 2018, 11:22pm

You’re absolutely right: careless use of @inbounds is dangerous, and even more so for a user-facing function. A segfault’s probably the worst that could happen, though, since the function is read-only.

Keno · May 23, 2018, 1:23am

This is a bad assumption. Out-of-bounds accesses are undefined behavior, so LLVM may happily delete the entire code path calling this function with out-of-bounds indices.
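One way to keep `@inbounds` without relying on callers to pass valid indices is to validate them first. This is a sketch with hypothetical names (`HEAPDATA`, `getvalidated`) that uses only `checkbounds` from Base.

```julia
const HEAPDATA = [1.0 2.0;
                  3.0 4.0]

function getvalidated(i::Integer, j::Integer)
    # Throws a BoundsError if (i, j) is not a valid index into HEAPDATA.
    checkbounds(HEAPDATA, i, j)
    # Only reached with indices already known to be valid,
    # so skipping the second check here cannot go out of bounds.
    return @inbounds HEAPDATA[i, j]
end

inside = getvalidated(2, 1)
```

For a single element lookup this is equivalent to plain `HEAPDATA[i, j]`, which is usually the simpler choice; the explicit pattern is mainly useful when one check can cover many accesses.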
