Hi,
I am trying to better understand how closures work. I’ve read the section of the manual on captured variables and tried inspecting the following functions:
```julia
using InteractiveUtils  # provides @code_llvm (loaded automatically in the REPL)

function abmult(r::Int)
    if r < 0
        r = -r
    end
    f = x -> x * r
    return f
end

function abmult2(r::Int)
    f = x -> x * r
    return f
end

function abmult3(r::Int)
    if r < 0
        r = -r
    end
    f = let r = r
        x -> x * r
    end
    return f
end

mul1 = abmult(3)
mul2 = abmult2(3)
mul3 = abmult3(3)

@code_llvm mul1(5)
@code_llvm mul2(5)
@code_llvm mul3(5)
```
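The difference between these closures can also be seen without reading any LLVM IR, by looking at the closure objects themselves. A minimal sketch, assuming Julia's current lowering (an implementation detail, not a documented API), where each closure is an instance of a compiler-generated struct whose fields are named after the captured variables, and assuming a Julia version where the manual's boxing example still applies:

```julia
# Two of the functions from above, repeated so this snippet runs on its own.
function abmult(r::Int)
    if r < 0
        r = -r
    end
    return x -> x * r
end

abmult2(r::Int) = x -> x * r

mul1 = abmult(3)
mul2 = abmult2(3)

# A closure is an instance of a compiler-generated struct;
# each captured variable becomes a field with the same name.
@show fieldnames(typeof(mul1))
@show typeof(mul1.r)   # a Core.Box, because r is reassigned inside abmult
@show typeof(mul2.r)   # a plain Int, stored directly in the closure object
```

The boxed field in `mul1` is what makes its code type-unstable and allocating, while `mul2` simply holds an `Int`.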
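As explained in the manual, the `@code_llvm` output for `mul1` is a complete mess, because the parser cannot tell that `r` is no longer reassigned once the closure has been created, so it stores `r` in a `Core.Box`. On the other hand, `mul2` and `mul3` produce identical `@code_llvm` output: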
```
define i64 @"julia_#65_2320"([1 x i64]* nocapture nonnull readonly dereferenceable(8), i64) {
top:
  %2 = getelementptr inbounds [1 x i64], [1 x i64]* %0, i64 0, i64 0
  %3 = load i64, i64* %2, align 8
  %4 = mul i64 %3, %1
  ret i64 %4
}
```
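For comparison, a closure behaves like an ordinary callable struct. A minimal sketch of a hand-written equivalent of `mul2`, using the hypothetical name `Mul`: the captured `r` becomes a field, so calling an instance has to read `r` out of the object (roughly the `getelementptr`/`load` pair above) before doing the `mul`.

```julia
# Hypothetical hand-written equivalent of `x -> x * r` with `r` captured by value.
struct Mul
    r::Int          # the captured variable becomes a field
end

# Calling an instance runs this method; `m` plays the role of the closure object,
# which is why the IR above takes an extra pointer argument.
(m::Mul)(x) = x * m.r

m = Mul(3)
m(5)
# `@code_llvm m(5)` would be expected to show a similar load-then-multiply pattern.
```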
This has a lot of keywords I do not understand (`nocapture`, `nonnull`, etc.). Naively, I would have expected the output to be identical to the following:
```julia
mul4 = x -> x * 5
@code_llvm debuginfo=:none mul4(3)
```

```
define i64 @"julia_#71_2323"(i64) {
top:
  %1 = mul i64 %0, 5
  ret i64 %1
}
```
since the captured variable can no longer change. That said, despite the extra steps, my crude benchmarks could not detect a meaningful difference between `mul2` and `mul4`:
```julia
using BenchmarkTools

A = rand(1000)
@btime sum($mul1, $A)
@btime sum($mul2, $A)
@btime sum($mul3, $A)
@btime sum($mul4, $A)
```

```
  24.046 μs (2999 allocations: 46.86 KiB)
  57.928 ns (0 allocations: 0 bytes)
  57.923 ns (0 allocations: 0 bytes)
  56.711 ns (0 allocations: 0 bytes)
```
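One way to probe why `r` is not simply folded into the compiled code for `mul2`: every closure returned by `abmult2` has the same type, and Julia compiles methods per argument type rather than per value, so the code generated for `mul2(5)` has to work for any captured `r`. A minimal sketch under that assumption; the helper `g` is hypothetical, and whether its multiplication is actually constant-folded after inlining depends on the Julia version (check with `@code_llvm g(5)`):

```julia
abmult2(r::Int) = x -> x * r

a = abmult2(3)
b = abmult2(7)

# Same closure type, different captured values: one compiled method serves both,
# so the multiplier must be loaded from the closure object at run time.
same_type = typeof(a) === typeof(b)

# Hypothetical helper: here the literal 3 is visible to the compiler at the call
# site, so after inlining the closure the multiplier may become a constant.
g(x) = abmult2(3)(x)

println(same_type)
println(a(5), " ", b(5), " ", g(5))
```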
My questions:

- In very broad strokes, what does the output of `@code_llvm mul2(3)` mean? What is it doing? (I am very unfamiliar with LLVM IR and only know the most basic instructions like `ret` and `mul`, so a highly dumbed-down explanation is very welcome.)
- It looks like `mul2` and `mul3` carry around an extra variable, even though there is no way to modify it afterward. Why isn't it simply eliminated via constant propagation?
- Should I expect any performance difference between `mul2` and `mul4`? Should I expect any difference in more complicated cases?
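If a captured value really is fixed and it should be baked into the generated code the way the literal `5` is in `mul4`, one option is to make it part of the type instead of a field. A minimal sketch with a hypothetical callable struct `MulBy`; note the trade-off that every distinct value creates a new type and triggers a fresh compilation, so this is only sensible for a small set of values:

```julia
# Hypothetical callable type: the multiplier R is a type parameter, not a field,
# so it is a compile-time constant inside the method body.
struct MulBy{R} end

(::MulBy{R})(x) where {R} = x * R

mul5 = MulBy{5}()
mul5(3)

# `@code_llvm mul5(3)` would be expected to multiply by the literal 5,
# much like `mul4`, with no load from the object.
```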