Way to have a function "mutate an immutable" without much performance loss


#1

As the title says, is there a way to pass an immutable into a function, allow it to be mutated, and still keep the overhead low? Wrapping it in some kind of reference seems to make it roughly 2x slower:

using BenchmarkTools

a = Array{Float64}()
a[] = 1.0
function test_dim0_array(a)
  for i = 1:1000
    a[] += rand()
  end
end
test_dim0_array(a)
@benchmark test_dim0_array($a)
BenchmarkTools.Trial: 
  memory estimate:  0.00 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.074 μs (0.00% GC)
  median time:      3.220 μs (0.00% GC)
  mean time:        3.610 μs (0.00% GC)
  maximum time:     28.469 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     8
  time tolerance:   5.00%
  memory tolerance: 1.00%
a = Ref{Float64}()
a[] = 1.0
@benchmark test_dim0_array($a)
BenchmarkTools.Trial: 
  memory estimate:  0.00 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.374 μs (0.00% GC)
  median time:      2.472 μs (0.00% GC)
  mean time:        2.651 μs (0.00% GC)
  maximum time:     16.068 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     9
  time tolerance:   5.00%
  memory tolerance: 1.00%
type Container
  a::Float64
end
a = Container(1.0)
function test_container(a)
  for i = 1:1000
    a.a += rand()
  end
end
test_container(a)
@benchmark test_container($a)
BenchmarkTools.Trial: 
  memory estimate:  0.00 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.374 μs (0.00% GC)
  median time:      2.537 μs (0.00% GC)
  mean time:        2.747 μs (0.00% GC)
  maximum time:     12.490 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     9
  time tolerance:   5.00%
  memory tolerance: 1.00%

For comparison, here is the same loop using just the number:

function test_num(a)
  for i = 1:1000
    a += rand()
  end
end
test_num(1.0)
@benchmark test_num(1.0)
BenchmarkTools.Trial: 
  memory estimate:  0.00 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.288 μs (0.00% GC)
  median time:      1.347 μs (0.00% GC)
  mean time:        1.479 μs (0.00% GC)
  maximum time:     12.149 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10
  time tolerance:   5.00%
  memory tolerance: 1.00%

Is there something better than using a Ref, or is this as good as it gets?


#2

Just for reference, on my machine I get the following mean times:
I tested two versions: a+=rand(), and also a+=i so that the cost of rand() doesn't dominate the timings.
In both cases, the Ref and Container versions are not much slower than the plain Float64 function.

              a+=rand()    a+=i
dim0 array    4.788        2.925
dim0 ref      3.325        0.907
container     3.203        0.902
float64       3.001        0.881

#3

Hmm, is your system image tuned to your setup? You’re getting a pretty high number for pure floats.


#4

Hmm, is your system image tuned to your setup? You’re getting a pretty high number for pure floats.

Probably not. Standard win64 download.

I did modify your code to return a (or the equivalent a[] or a.a) in all functions.
Benchmark for a+=rand() without returning a is 2.246 μs.
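
A rough sketch of what the modified functions might look like, here for the a+=i case. The function names are made up for illustration and this is not the exact code that was benchmarked. It is written in Julia 1.x syntax, so it uses mutable struct where the code earlier in the thread uses the older type keyword.

```julia
# Hypothetical reconstruction of the a+=i variants; all names are illustrative.
mutable struct Container   # written as `type Container` in the original post
    a::Float64
end

# Works for anything indexed with [], e.g. a Ref or a zero-dimensional array.
function sum_ref!(r)
    for i = 1:1000
        r[] += i
    end
    return r[]      # return the value so every variant does comparable work
end

# Same loop, mutating a field of a mutable struct.
function sum_container!(c)
    for i = 1:1000
        c.a += i
    end
    return c.a
end

# Same loop on a plain number; the caller only sees the returned value.
function sum_num(a)
    for i = 1:1000
        a += i
    end
    return a
end

sum_ref!(Ref(1.0))
sum_ref!(fill(1.0))              # fill(1.0) builds a zero-dimensional array
sum_container!(Container(1.0))
sum_num(1.0)
```

With a starting value of 1.0, each call should return 500501.0 (1.0 plus the sum of 1:1000), which is a quick way to check that all variants do the same arithmetic before timing them.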


#5

To be fair, though, there is an easy way for a user to work around the issue I am seeing. If they are passed a Ref and have a bunch of calculations to do, they can dereference it once up front, work on the local value, and write it back at the end. The resulting speed is pretty much the same as not using the Ref.

function test_dim0_array2(a_tmp)
  a = a_tmp[]
  for i = 1:1000
    a += rand()
  end
  a_tmp[] = a
  nothing
end
a = Ref{Float64}()
a[] = 1.0
@benchmark test_dim0_array2($a)
BenchmarkTools.Trial:
  memory estimate:  0.00 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.522 μs (0.00% GC)
  median time:      1.610 μs (0.00% GC)
  mean time:        1.792 μs (0.00% GC)
  maximum time:     7.084 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10
  time tolerance:   5.00%
  memory tolerance: 1.00%
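
The load-once, store-once pattern above could also be wrapped in a small helper so the inner loop is written against a plain number. This is only a sketch: update! and add_rand_local are hypothetical names, not part of Base or any package, and whether it benchmarks the same as the hand-written version should be measured rather than assumed. It uses Julia 1.x syntax.

```julia
# Hypothetical helper: read the value out of `r` once, apply `f` to it,
# and write the result back once.
function update!(f, r)
    r[] = f(r[])
    return nothing
end

# The hot loop only ever touches a local number.
function add_rand_local(a)
    for i = 1:1000
        a += rand()
    end
    return a
end

a = Ref(1.0)
update!(add_rand_local, a)   # a[] now holds the accumulated value
```

Because update! only uses r[] for reading and writing, the same helper also works with a zero-dimensional array.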

Note that, since every test except the number test mutates its argument, only the number test needs to return the value:

function test_num(a)
  for i = 1:1000
    a += rand()
  end
  a
end

and when it returns the value, the number test gets slightly slower:

julia> @benchmark test_num($a)
BenchmarkTools.Trial:
  memory estimate:  0.00 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.552 μs (0.00% GC)
  median time:      1.610 μs (0.00% GC)
  mean time:        1.773 μs (0.00% GC)
  maximum time:     13.993 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10
  time tolerance:   5.00%
  memory tolerance: 1.00%

and ends up matching the single-dereference test.

I’d still like to see if other people’s benchmarks match my results or @greg_plowman's.