This might be a silly question, but there is something I do not understand. I was playing around with different implementations of a function to check the performance changes, and sometimes I left an unnecessary where {T} in the function line. Then I was puzzled by weird results in the @btime output and could not reproduce the fastest run, until I figured out that where {T} has an impact, even if no T is defined or used anywhere:
function f1(x) where {T}
return x*2
end
function f2(x)
return x*2
end
function f3(::Type{T}, x) where {T}
return x*2
end
I thought all three functions would compile to the same machine code, but f1 is not playing along:
using BenchmarkTools
a = rand(1000);
@btime f1.($a);
14.736 μs (2001 allocations: 39.19 KiB)
@btime f2.($a);
471.755 ns (1 allocation: 7.94 KiB)
@btime f3.(Bool, $a);
445.223 ns (1 allocation: 7.94 KiB)
What are those allocations and what’s happening here?
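One visible difference between f1 and f3 is whether T can be determined from a call. In f3, T is matched from the first argument (Bool in the benchmark above), while in f1 nothing in the signature fixes T, so it stays unbound. The sketch below makes this observable by returning T from the body. The function names are hypothetical, and linking the unbound T to the extra allocations is a plausible reading of the thread, not something the posts confirm.

```julia
# Hypothetical helpers mirroring the three definitions in the question.

# T is declared but appears nowhere in the signature, so no call can bind it
# (like f1). Newer Julia versions may print a warning for this definition.
unbound_T(x) where {T} = T

# T appears in the argument type, so it is bound to typeof(x) on every call.
bound_T(x::T) where {T} = T

# T is bound through the Type{T} argument (like f3).
type_arg_T(::Type{T}, x) where {T} = T

println(bound_T(1.0))           # prints Float64
println(type_arg_T(Bool, 1.0))  # prints Bool

# Reading the unbound T fails at run time.
try
    unbound_T(1.0)
catch err
    println(typeof(err))        # prints UndefVarError
end
```

In other words, f1 and f3 look alike syntactically, but only in f3 does the static parameter get a concrete value.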
oheil
November 14, 2019, 10:34am
julia> @code_llvm f1(1.0)
; @ REPL[1]:2 within `f1'
; Function Attrs: uwtable
define nonnull %jl_value_t addrspace(10)* @japi3_f1_16234(%jl_value_t addrspace(10)**, %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32) #0 {
top:
%4 = alloca %jl_value_t addrspace(10)**, align 8
store volatile %jl_value_t addrspace(10)** %2, %jl_value_t addrspace(10)*** %4, align 8
%5 = call %jl_value_t*** inttoptr (i64 1801343456 to %jl_value_t*** ()*)() #3
%6 = bitcast %jl_value_t addrspace(10)** %2 to double addrspace(10)**
%7 = load double addrspace(10)*, double addrspace(10)** %6, align 8
; ┌ @ promotion.jl:314 within `*' @ float.jl:399
%8 = load double, double addrspace(10)* %7, align 8
%9 = fmul double %8, 2.000000e+00
; └
%10 = bitcast %jl_value_t*** %5 to i8*
%11 = call noalias nonnull %jl_value_t addrspace(10)* @jl_gc_pool_alloc(i8* %10, i32 1744, i32 16) #1
%12 = bitcast %jl_value_t addrspace(10)* %11 to %jl_value_t addrspace(10)* addrspace(10)*
%13 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(10)* %12, i64 -1
store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 114258656 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)* addrspace(10)* %13
%14 = bitcast %jl_value_t addrspace(10)* %11 to double addrspace(10)*
store double %9, double addrspace(10)* %14, align 8
ret %jl_value_t addrspace(10)* %11
}
julia> @code_llvm f2(1.0)
; @ REPL[2]:2 within `f2'
; Function Attrs: uwtable
define double @julia_f2_16235(double) #0 {
top:
; ┌ @ promotion.jl:314 within `*' @ float.jl:399
%1 = fmul double %0, 2.000000e+00
; └
ret double %1
}
To me this looks like LLVM is creating a Float64 from whatever comes in, but it’s more guessing than understanding.
I don’t understand why the LLVM code differs at all.
I also find it strange, since
julia> @code_warntype f1(1.0)
Variables
#self#::Core.Compiler.Const(f1, false)
x::Float64
Body::Float64
1 ─ %1 = (x * 2)::Float64
└── return %1
julia> @code_warntype f2(1.0)
Variables
#self#::Core.Compiler.Const(f2, false)
x::Float64
Body::Float64
1 ─ %1 = (x * 2)::Float64
└── return %1
I can replicate this problem on v"1.3.0-rc4.1". Also note that broadcasting is not needed for an MWE, as
julia> @btime f1(1.0)
15.680 ns (1 allocation: 16 bytes)
2.0
julia> @btime f2(1.0)
0.027 ns (0 allocations: 0 bytes)
2.0
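For readers without BenchmarkTools, here is a minimal sketch of the same MWE using Base’s @allocated. The wrapper name measure is hypothetical; wrapping the call in a function keeps global-scope dynamic dispatch out of the measurement. The numbers depend on the Julia version (the thread observes 16 bytes for f1 on 1.3, and other versions may compile or inline the call differently), so no specific result is claimed here.

```julia
f1(x) where {T} = x * 2   # unused type parameter, as in the question
f2(x) = x * 2

# Bytes allocated by a single call f(x). Because f is called in the body,
# Julia specializes measure on the concrete function type.
measure(f, x) = @allocated f(x)

# First calls compile f1, f2 and measure, so compilation is not counted below.
measure(f1, 1.0)
measure(f2, 1.0)

println("f1: ", measure(f1, 1.0), " bytes")
println("f2: ", measure(f2, 1.0), " bytes")
```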
Please check if there is an existing issue about this, and if not, open one.
Thanks for the feedback. I was using 1.3rc2.
Yes, the broadcasting was a copy-paste leftover.
I’ll check older Julia versions too and look through the issues then.
I suspect it is because @code_warntype is lying to you:
https://github.com/JuliaLang/julia/pull/32817
Alright, I could not find anything related, so I quickly opened an issue:
https://github.com/JuliaLang/julia/issues/33847
Update: I made a silly copy-and-paste mistake, which made me believe that in Julia 0.7 it’s “OK”. It is not: Julia 0.7 shows the same output.