The compiler did indeed fail to optimize the code because the global variable was not typed.
Full technical details:
I included @karei's suggestion and looked into the runtime behavior using CodeGlass. This is what I noticed:
Full example code:
using CodeGlass

function one()
    for i = 1:2
        n = 1
        for j = 1:100
            n += j
        end
    end
end

function two()
    n_T_local = 100
    for i = 1:2
        n = 1
        for j = 1:n_T_local
            n += j
        end
    end
end

n_T = 100
function three()
    for i = 1:2
        n = 1
        for j = 1:n_T
            n += j
        end
    end
end

@cgprofile one()
@cgprofile two()
@cgprofile three()
CodeGlass reported that functions one() and two() were fully optimized and inlined by the compiler, with no allocations and no calls to other methods.
However, for function three() it reported the following:
# Reconstructed runtime executed code, showing methods and allocations:
function three()
    for i = 1:2
        n = 1
        # Allocation of 2 UnitRange{Int64} for 32 bytes each iteration of i, total 64 bytes.
        n_T_iter::UnitRange{Int64} = n_T
        j_next::Union{Nothing, Tuple{Int64, Int64}} = (dynamic dispatch call) Main.Base.iterate(n_T_iter::UnitRange{Int64}) in range.jl:917
        while j_next !== nothing # Iterated 100 times each iteration of i, 200 iterations in total.
            # Allocation of 200 Tuple{Int64,Int64} for 32 bytes each iteration of j_next, total 6400 bytes
            (item, state)::Tuple{Int64,Int64} = j_next
            n = (dynamic dispatch call) Main.base.+(n::Int64, item::Int64)::Int64 in int.jl:87
            # Allocation of 138 Int64 for 16 bytes each iteration after iteration 31 of j_next, total of 2208 bytes
            ?::Int64;
            j_next = (dynamic dispatch call) Main.Base.iterate(n_T_iter::UnitRange{Int64}, item::Int64)::Union{Nothing, Tuple{Int64, Int64}} in range.jl:919
        end
    end
end
This clearly shows that the compiler was unable to determine the type of n_T at compile time, so it had to insert dynamic dispatch calls that are resolved at runtime.
I also ran @ForceBru's and @sgaure's suggestions:
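The same effect can usually be observed without a profiler by asking Julia what return type it infers. The following is only a minimal sketch under my own assumptions, not CodeGlass output: the names n_T_untyped, n_T_const, sum_untyped and sum_const are hypothetical and do not appear in the code above, and the exact inferred types may differ between Julia versions.

```julia
# Hypothetical names for illustration; they are not part of the original example.
n_T_untyped = 100        # non-constant, untyped global: its type could change at any time
const n_T_const = 100    # constant global: its type is known to the compiler

function sum_untyped()
    n = 1
    for j = 1:n_T_untyped   # type of the range is unknown at compile time
        n += j
    end
    return n
end

function sum_const()
    n = 1
    for j = 1:n_T_const     # type of the range is known at compile time
        n += j
    end
    return n
end

# Base.return_types lists the return types inferred for the given argument types.
inferred_untyped = Base.return_types(sum_untyped, Tuple{})
inferred_const = Base.return_types(sum_const, Tuple{})

println("untyped global: ", inferred_untyped)  # typically a non-concrete type such as Any
println("const global:   ", inferred_const)    # expected to be a concrete integer type
println("same result:    ", sum_untyped() == sum_const())
```

Both functions compute the same value; only the amount of type information available to the compiler differs. Running @code_warntype sum_untyped() in the REPL should highlight the same problem interactively.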
n_T_typed::Int = 100
function four()
    for i = 1:2
        n = 1
        for j = 1:n_T_typed
            n += j
        end
    end
end

function sumit(n_t)
    n = 1
    for j = 1:n_t
        n += j
    end
    return n
end

@cgprofile four()
@cgprofile sumit(100)
Functions four() and sumit() were also fully optimized and inlined by the compiler, with no allocations and no calls to other methods.
Small note on sumit(n_t): the compiler reported Int64 rather than Int. On a 64-bit system Int is just an alias for Int64, so these are the same type:
sumit(n_t::Int64)::Int64
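To check which concrete type Int refers to on a given machine, something like the following sketch can be run in the REPL. It only uses Base, and the printed values depend on the platform, so no fixed output is shown.

```julia
# Int is an alias whose target depends on the platform's word size.
println("word size:     ", Sys.WORD_SIZE)     # 64 on a 64-bit system, 32 on a 32-bit system
println("Int === Int64: ", Int === Int64)     # true on a 64-bit system
println("typeof(100):   ", typeof(100))       # integer literals that fit default to Int
```

Because Int and Int64 are the same type on a 64-bit system, a signature printed as sumit(n_t::Int64)::Int64 there is exactly the specialization one would expect for the call sumit(100).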