I’d like to use a datastructure with named indexes for optimization models (using JuMP or Optim) or simulation models in ordinary Julia. The main purpose of named indexes is to enhance readability of complex models. I’m willing to sacrifice a bit of performance for this, but naturally as little as possible. I’ve implemented some test code using NamedArrays vs Dictionaries of Tuples, see below. My questions based on the test results:
- Why is “Dict assignment” seven times faster than “Dict comprehension”?
- Why are NamedArrays three times faster than “Dict assignment”. (I understand why arrays with numerical indexes can be faster than dictionaries, but in this case both datastructures are indexed by name. Is it iterator efficiency somehow? I also tried OrderedDicts, but they were just as slow as ordinary Dicts.)
- NamedArrays seem to be the way to go, but they are still four times slower than ordinary arrays. Are there other datastructures I should consider or is this the best I can hope for with named indexes?
- On a side note, out of curiosity: Julia does not currently have syntax for redefining some elements of dictionaries or arrays as follows:
d4[i,j] = d1[i,j]*d2[i,j] for i in subset_of_I, j in subset_of_J
Would that syntax introduce parsing conflicts, or is it more a question of syntactic sugar that people haven’t asked for yet?
Here is my test code:
using NamedArrays, BenchmarkTools
I = [randstring(4) for i=1:100]
J = [randstring(4) for j=1:100]
d1 = Dict((i,j) => rand() for i in I, j in J)
d2 = Dict((i,j) => rand() for i in I, j in J)
println("\nDict comprehension\n", @benchmark d3 = Dict((i,j) => d1[i,j]*d2[i,j] for i in I, j in J))
println("\nDict assignment\n", @benchmark begin
d4 = Dict{Tuple{String,String}, Float64}()
for i in I, j in J
d4[i,j] = d1[i,j]*d2[i,j]
end
end)
n1 = NamedArray([rand() for i in I, j in J], (I,J))
n2 = NamedArray([rand() for i in I, j in J], (I,J))
println("\nNamedArray comprehension\n", @benchmark n3 = NamedArray([n1[i,j]*n2[i,j] for i in I, j in J], (I,J)))
println("\nNamedArray assignment\n", @benchmark begin
n4 = NamedArray(zeros(100,100), (I,J))
for i in I, j in J
n4[i,j] = n1[i,j]*n2[i,j]
end
end)
b1 = [rand() for i=1:100, j=1:100]
b2 = [rand() for i=1:100, j=1:100]
println("\nBasic Array comprehension (no name lookup)\n", @benchmark b3 = [b1[i,j]*b2[i,j] for i=1:100, j=1:100])
And here is the output:
Dict comprehension
BenchmarkTools.Trial:
memory estimate: 3.56 mb
allocs estimate: 120088
--------------
minimum time: 60.430 ms (0.00% GC)
median time: 61.437 ms (0.00% GC)
mean time: 62.008 ms (0.93% GC)
maximum time: 66.993 ms (6.28% GC)
--------------
samples: 81
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
Dict assignment
BenchmarkTools.Trial:
memory estimate: 2.04 mb
allocs estimate: 70163
--------------
minimum time: 7.331 ms (0.00% GC)
median time: 8.639 ms (0.00% GC)
mean time: 8.879 ms (3.19% GC)
maximum time: 13.601 ms (27.86% GC)
--------------
samples: 563
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
NamedArray comprehension
BenchmarkTools.Trial:
memory estimate: 1.93 mb
allocs estimate: 70261
--------------
minimum time: 2.537 ms (0.00% GC)
median time: 2.602 ms (0.00% GC)
mean time: 2.957 ms (10.76% GC)
maximum time: 7.928 ms (64.96% GC)
--------------
samples: 1690
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
NamedArray assignment
BenchmarkTools.Trial:
memory estimate: 882.22 kb
allocs estimate: 40355
--------------
minimum time: 4.068 ms (0.00% GC)
median time: 4.151 ms (0.00% GC)
mean time: 4.271 ms (2.41% GC)
maximum time: 7.848 ms (45.66% GC)
--------------
samples: 1170
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
Basic Array comprehension (no name lookup)
BenchmarkTools.Trial:
memory estimate: 547.14 kb
allocs estimate: 30007
--------------
minimum time: 694.766 μs (0.00% GC)
median time: 713.204 μs (0.00% GC)
mean time: 746.701 μs (3.31% GC)
maximum time: 2.600 ms (68.04% GC)
--------------
samples: 6689
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%