Hi,
I am trying to generate the response y (which takes values in {0, 1}) for a logistic regression from X and β. I preallocate the space for y and then wrote functions that compute it in two different ways. I then use TimerOutputs.jl to analyze the time and allocations.
Approach 1: Here the linear combination is first computed as z and is then used for generating samples.
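using TimerOutputs, Distributions
using StatsFuns: logistic  # assumption: `logistic` comes from StatsFuns (LogExpFunctions also exports it)

function generate1!(y, X, β)
    @timeit "X * β" z = X * β              # linear combination in a new vector z
    @timeit "p" p = logistic.(z)           # probabilities from z, as described above
    @timeit "y" y .= rand.(Bernoulli.(p))  # draw the samples into y
    return y
end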
Since z in the above approach is an auxiliary variable that needs a new allocation (?), I thought of reusing y to hold the linear combination, expecting this to save some allocations and hence time.
Approach 2: Here y is reused to store the intermediate result of X * β.
function generate2!(y, X, β)
    @timeit "X * β" y .= X * β  # reuse already allocated "y"
    @timeit "p" p = logistic.(y)
    @timeit "y" y .= rand.(Bernoulli.(p))
    return y
end
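For comparison, here is a minimal sketch of what a fully in-place variant might look like. In `y .= X * β`, the product `X * β` is evaluated into a temporary vector before it is copied into `y`, so the reuse does not by itself avoid that allocation; `mul!` from the LinearAlgebra standard library writes the product directly into its first argument. The names `generate3!`, `sigmoid`, and `draw` are hypothetical and stand in for `logistic` and `rand(Bernoulli(p))` only so that the sketch runs without extra packages. It also assumes `y` is a `Vector{Float64}`.

```julia
using LinearAlgebra  # standard library; provides mul!

# Hypothetical stand-ins so the sketch needs no extra packages.
sigmoid(z) = 1 / (1 + exp(-z))   # plays the role of `logistic`
draw(p) = rand() < p             # plays the role of `rand(Bernoulli(p))`

# Sketch of an in-place variant (assumes y has element type Float64).
function generate3!(y::AbstractVector{Float64}, X, β)
    mul!(y, X, β)                # write X * β directly into y, no temporary
    y .= draw.(sigmoid.(y))      # fused broadcast: probability and draw per element
    return y
end

X = randn(1000, 50)
β = rand(size(X, 2))
y = Vector{Float64}(undef, size(X, 1))
generate3!(y, X, β)
```

Because the broadcast on the second line is fused, no intermediate `p` vector is created either; whether that matters in practice would need to be measured.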
Now I test it using the following code.
# Test data
X = randn(1000, 50)
β = rand(size(X, 2))
y = Vector(undef, size(X, 1))
# Warm-up run to compile both test functions as well as the timers
reset_timer!()
@timeit "" generate1!(y, X, β);
@timeit "" generate2!(y, X, β);
# Actual test
reset_timer!()
for i = 1:100
    @timeit "generate1!()" generate1!(y, X, β);
    @timeit "generate2!()" generate2!(y, X, β);
end
print_timer()
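One detail of the setup above that may be worth checking is the element type of `y`. Calling `Vector(undef, n)` without a type parameter creates a `Vector{Any}`, so any values stored in it are boxed and later broadcasts over it cannot be specialized on a concrete element type. The short sketch below only compares the two constructors; the variable names `y_any` and `y_f64` are made up for illustration.

```julia
# Without a type parameter the element type is Any.
y_any = Vector(undef, 4)
# With an explicit parameter the storage is a flat array of Float64.
y_f64 = Vector{Float64}(undef, 4)

println(eltype(y_any))   # Any
println(eltype(y_f64))   # Float64
```

If the element type does turn out to matter here, allocating `y` with a concrete type such as `Vector{Float64}(undef, size(X, 1))` would be one thing to try before re-running the timings.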
OUTPUT: After running, the second approach takes more time and makes more allocations.
────────────────────────────────────────────────────────────────────────
                              Time                   Allocations
                      ──────────────────────   ───────────────────────
  Tot / % measured:       1.09s / 3.77%           83.5MiB / 8.28%

Section        ncalls     time   %tot      avg     alloc   %tot      avg
────────────────────────────────────────────────────────────────────────
generate1!()      100   11.3ms  27.5%    113μs   1.55MiB  22.4%  15.9KiB
  z = X * β       100   2.72ms  6.62%   27.2μs    794KiB  11.2%  7.94KiB
  p               100   5.16ms  12.6%   51.6μs    794KiB  11.2%  7.94KiB
  y               100   3.17ms  7.72%   31.7μs     0.00B  0.00%    0.00B
generate2!()      100   29.8ms  72.5%    298μs   5.37MiB  77.6%  55.0KiB
  y .= X * β      100   5.05ms  12.3%   50.5μs   2.30MiB  33.3%  23.6KiB  <<< Extra allocation even on reuse of "y"
  p               100   20.3ms  49.6%    203μs   3.06MiB  44.2%  31.3KiB  <<< Extra allocation; don't know why
  y               100   4.17ms  10.1%   41.7μs   7.81KiB  0.11%    80.0B  <<< Extra allocation; don't know why
────────────────────────────────────────────────────────────────────────
Why is this happening?