Evaluation speed during debugging

Hello,

During debugging with Julia v1.1, Atom 1.37.0 x64, Juno 0.8.3 and the latest Rebugger,Debugger, Revise and JulianInterpreter packages. I have noticed that a snippet of my code takes really long (several minutes) to evaluate in debug mode in comparision to just execution in Jupyter (few milliseconds). I don´t know if I have code something completely unusual for the Julia programming language or that I have forgotten something in the right handling of the debugger, because I´m a beginner in Julia, but maybe it is useful to share this issue. In my experience especially the logical comparisions of the following code snippet need relatively much time to evaluate in debug mode.

Best regards,
Volker

tmp_obj_func_val_of_solutions = Matrix([229.089388704372 56.3149480658474;
        253.587799294809 71.6934594453945;
	174.045446329888 95.4087482984243;
	314.713719005600 16.9789770374431;
	107.270176000269 4.17292201085117;
	409.423507539160 9.98841342722803;
	57.8677783059807 94.3392990341471;
	144.390663233047 26.2182710974064;
	232.138043405738 26.3135844177542;
	242.549084501498 59.9271862760961;
	306.231395638336 75.1473323917022;
	187.695129844772 87.3039104189180;
	154.020969004055 12.8634409242762;
	377.907857967353 40.5525109787078;
	14.1560415745987 27.6619598615001;
	28.0930582180656 84.5969388128942;
	283.752143943893 39.6536470993152;
	7.16623935655875 68.1360805626363;
	131.480007805545 15.3380316354526;
	292.874715190892 70.9747295528459;
	71.1688200182173 86.0581447363156;
	241.257347886778 92.7759035161938;
	133.734769208663 46.2504549481241;
	153.242542284531 2.73103063913056;
	286.538366914618 10.2479325697062;
	297.496939798786 26.1805297405844;
	22.3587162007941 28.5222985249201;
	324.906087262909 53.4568103165390;
	21.6849217203626 47.3065307149836;
	131.298914266567 12.7624597870792;
	24.9396552035601 68.6094218030466;
	302.739935777282 9.61713284182833;
	12.6746233729183 63.8776564259853;
	43.8806968842897 49.7861901104724;
	158.812145399203 13.4940310818536;
	299.973551583323 87.3259275869917;
	142.900640727849 63.2453967134893;
	264.167407316865 57.3064048423519;
	107.152728105518 4.23044621358957;
	122.956480444815 155.542456087301])


cur_pop_size = size(tmp_obj_func_val_of_solutions,1)

tmp_obj_func_val_of_solutions = hcat(tmp_obj_func_val_of_solutions,  zeros(cur_pop_size, num_obj_func + 2))

less_zero = repeat([false], cur_pop_size, num_obj_func)
less_equal_zero = copy(less_zero)
greater_zero = copy(less_zero)
greater_equal_zero = copy(less_zero)

all_solutions = Vector{Vector{Int}}(undef, cur_pop_size)
dominanted_solutions = zeros(cur_pop_size, 1)
for p in 1:cur_pop_size
    for k in 1:num_obj_func
        less_zero[:, k] = (tmp_obj_func_val_of_solutions[p, k] .- tmp_obj_func_val_of_solutions[:, k]) .< 0
        less_equal_zero[:, k] = (tmp_obj_func_val_of_solutions[p, k] .- tmp_obj_func_val_of_solutions[:, k]) .<= 0
        greater_zero[:, k] = .!less_equal_zero[:, k]
        greater_equal_zero[:, k] = .!less_zero[:, k]
    end

    less_zero_all = less_zero[:, 1]
    greater_zero_all = greater_zero[:, 1]
    for k in 2:num_obj_func
        less_zero_all = less_zero_all .& less_zero[:,k]
        greater_zero_all = greater_zero_all .& greater_zero[:,k]
    end
    less_equal_zero_all = repeat([false], cur_pop_size, 1)
    greater_equal_zero_all = copy(less_equal_zero_all)
    for k in 1:num_obj_func
        less_equal_zero_tmp = repeat([false], cur_pop_size, num_obj_func-1)
        greater_equal_zero_tmp = copy(less_equal_zero_tmp)
        aux_index = 1
        for n in 1:num_obj_func
            if n != k
                less_equal_zero_tmp[:, aux_index] = less_equal_zero[:, n]
                greater_equal_zero_tmp[:, aux_index] = greater_equal_zero[:, n]
                aux_index += 1
            end
        end

        less_equal_zero_all = (less_equal_zero_all .| (less_zero[:,k] .& (convert.(Bool, sum(convert.(Int, less_equal_zero_tmp), dims=2) .== num_obj_func-1))))
        greater_equal_zero_all = (greater_equal_zero_all .| (greater_zero[:,k] .& (convert.(Bool, sum(convert.(Int, greater_equal_zero_tmp), dims=2) .== num_obj_func-1))))
    end
    all_solutions[p]= DzBase.findtrue(less_zero_all .| less_equal_zero_all)
    dominanted_solutions[p] = length(DzBase.findtrue(greater_zero_all .| greater_equal_zero_all))
end
´´´

This is because when you use the debugger the code has to be stepped through by an interpreter, which is far slower than simply running compiled code, see here.

2 Likes

I’m surprised it’s thousands times slower (minutes vs milliseconds). Is this expected, or something that will improve greatly as the interpreter is refined?

I knew interpreted languages like python were slower than Julia but I thought maybe 50x slower, not 50000x slower…

1 Like

I have wondered this also. My company has a large package that we’ve been developing for some time. I’ve tried using the new Debugger, but it just hangs each time. After many minutes, I usually exit the process and revert to my former debugging methods. This isn’t a huge deal for me because I’m pretty accustomed to Revise + @show for pretty quick debugging. But I have been curious which factors might slow the interpreter in particular. My code is private, so I can’t share the details here, so just looking for any rules of thumb.

The most substantial extant analysis is in https://github.com/JuliaDebug/JuliaInterpreter.jl/pull/204, particularly https://github.com/JuliaDebug/JuliaInterpreter.jl/pull/204#issuecomment-478566966. Help wanted.

Main tip: debug small test cases, not big data sets.

5 Likes

The next version of Juno will have an option for toggling compiled execution (Debugger.jl already has that, IIRC), which might help in specific cases:
image

Apart from that you could push methods you know are expensive to execute but don’t want to debug into JuliaInterpreter.compilde_methods.

Both of those tips are very situational though, so

is much more effective.

6 Likes

Thanks for your help. I was also wondering why is there such a big difference, because I read about the 50x slower. Furthermore a 40x2 matrix isn´t so huge I would say, or not? Somehow I can´t switch to compiled mode by just pressing c if my julia repl is in debug mode.

Could you please explain your approach in a little more detail? It sounds like it could help me.

Hmm, that is strange. Now that I look at your test case, what exactly are you doing to run this in the interpreter? @interpret or Juno.@enter requires a function call, but all I see is a big loop at top level.

Also, when I try to run your code:

julia> include("debugslow.jl")
ERROR: LoadError: UndefVarError: num_obj_func not defined
Stacktrace:
 [1] top-level scope at /tmp/debugslow.jl:45
 [2] include at ./boot.jl:328 [inlined]
 [3] include_relative(::Module, ::String) at ./loading.jl:1094
 [4] include(::Module, ::String) at ./Base.jl:31
 [5] include(::String) at ./client.jl:432
 [6] top-level scope at REPL[1]:1
in expression starting at /tmp/debugslow.jl:45

With

function main()
    tmp_obj_func_val_of_solutions = Matrix([229.089388704372 56.3149480658474;
        253.587799294809 71.6934594453945;
    174.045446329888 95.4087482984243;
    314.713719005600 16.9789770374431;
    107.270176000269 4.17292201085117;
    409.423507539160 9.98841342722803;
    57.8677783059807 94.3392990341471;
    144.390663233047 26.2182710974064;
    232.138043405738 26.3135844177542;
    242.549084501498 59.9271862760961;
    306.231395638336 75.1473323917022;
    187.695129844772 87.3039104189180;
    154.020969004055 12.8634409242762;
    377.907857967353 40.5525109787078;
    14.1560415745987 27.6619598615001;
    28.0930582180656 84.5969388128942;
    283.752143943893 39.6536470993152;
    7.16623935655875 68.1360805626363;
    131.480007805545 15.3380316354526;
    292.874715190892 70.9747295528459;
    71.1688200182173 86.0581447363156;
    241.257347886778 92.7759035161938;
    133.734769208663 46.2504549481241;
    153.242542284531 2.73103063913056;
    286.538366914618 10.2479325697062;
    297.496939798786 26.1805297405844;
    22.3587162007941 28.5222985249201;
    324.906087262909 53.4568103165390;
    21.6849217203626 47.3065307149836;
    131.298914266567 12.7624597870792;
    24.9396552035601 68.6094218030466;
    302.739935777282 9.61713284182833;
    12.6746233729183 63.8776564259853;
    43.8806968842897 49.7861901104724;
    158.812145399203 13.4940310818536;
    299.973551583323 87.3259275869917;
    142.900640727849 63.2453967134893;
    264.167407316865 57.3064048423519;
    107.152728105518 4.23044621358957;
    122.956480444815 155.542456087301])


    cur_pop_size = size(tmp_obj_func_val_of_solutions,1)
    num_obj_func = 10

    tmp_obj_func_val_of_solutions = hcat(tmp_obj_func_val_of_solutions,  zeros(cur_pop_size, num_obj_func + 2))

    less_zero = repeat([false], cur_pop_size, num_obj_func)
    less_equal_zero = copy(less_zero)
    greater_zero = copy(less_zero)
    greater_equal_zero = copy(less_zero)
    p = 1
    for k in 1:num_obj_func
        less_zero[:, k] = (tmp_obj_func_val_of_solutions[p, k] .- tmp_obj_func_val_of_solutions[:, k]) .< 0
        less_equal_zero[:, k] = (tmp_obj_func_val_of_solutions[p, k] .- tmp_obj_func_val_of_solutions[:, k]) .<= 0
        greater_zero[:, k] = .!less_equal_zero[:, k]
        greater_equal_zero[:, k] = .!less_zero[:, k]
    end
end

I’m getting a pretty bad slowdown:

julia> @btime main()
  34.359 μs (171 allocations: 195.02 KiB)

julia> @btime JuliaInterpreter.@interpret main()
  14.982 s (84683621 allocations: 2.74 GiB)

Seems like most of the time is spent in maybe_evaluate_builtin, get_call_framecode, and prepare_frame_caller (in eval_rhs -> evaluate_call_recurse!).

2 Likes

It is just a snippet of function. Because the other parts of the function evaluate in expected evaluation time, I just copied the part that evaluates slow. I enter my function with Juno.@run

In addition to @profile's ability to attribute time to the interpreter itself, methinks we need an “interpreting profiler,” one that attributes the cost to the statement being interpreted. EDIT: https://github.com/JuliaDebug/JuliaInterpreter.jl/pull/292

5 Likes

So here’s a partial list of the interpreted methods, together with a count of the number of statement evaluations that occurred in each (this is essentially the “flat” view of profiling). I only kept ones with more than 100k executed statements. In increasing order, they are

Integer(x::Integer) in Core at boot.jl:765 => 162400
_indices_sub(S::SubArray) in Base at subarray.jl:398 => 324640
IndexLinear() in Base at indices.jl:30 => 330082
first(r::Base.OneTo{T}) where T in Base at range.jl:563 => 335688
<(x::T, y::T) where T<:Union{Int128, Int16, Int32, Int64, Int8} in Base at int.jl:49 => 347596
==(x::T, y::T) where T<:Number in Base at promotion.jl:403 => 348542
&(x::Bool, y::Bool) in Base at bool.jl:40 => 349364
eachindex(::IndexLinear, A::AbstractArray{T,1} where T) in Base at abstractarray.jl:214 => 491880
(::Type{Base.OneTo})(stop::T) where T<:Integer in Base at range.jl:317 => 498876
max(x::T, y::T) where T<:Real in Base at promotion.jl:414 => 499116
toInt64(x::Int64) in Core at boot.jl:653 => 503400
getindex(t::Tuple, i::Int64) in Base at tuple.jl:24 => 555309
unsafe_indices(r::AbstractRange) in Base at abstractarray.jl:97 => 649280
axes1(A::AbstractArray) in Base at abstractarray.jl:93 => 656324
setindex!(A::Array{T,N} where N, x, i1::Int64) where T in Base at array.jl:767 => 664004
first(r::OrdinalRange{T,S} where S) where T in Base at range.jl:562 => 670744
<=(x::T, y::T) where T<:Union{Int128, Int16, Int32, Int64, Int8} in Base at int.jl:428 => 688450
checkbounds(::Type{Bool}, A::AbstractArray, i) in Base at abstractarray.jl:433 => 811400
axes(S::SubArray) in Base at subarray.jl:397 => 811600
_indices_sub(S::SubArray, i1::AbstractArray, I...) in Base at subarray.jl:401 => 973920
checkbounds(A::AbstractArray, I...) in Base at abstractarray.jl:448 => 976560
checkindex(::Type{Bool}, inds::AbstractUnitRange, i::Real) in Base at abstractarray.jl:506 => 993600
-(x::T, y::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} in Base at int.jl:52 => 996654
zero(::Type{T}) where T<:Number in Base at number.jl:263 => 998044
one(::Type{T}) where T<:Number in Base at number.jl:297 => 1003998
+(x::T, y::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} in Base at int.jl:53 => 1035844
unsafe_length(r::AbstractUnitRange) in Base at range.jl:513 => 1136240
(::Type{Base.OneTo{T}})(stop) where T<:Integer in Base at range.jl:308 => 1164044
step(r::AbstractUnitRange{T}) where T in Base at range.jl:499 => 1330916
oneunit(::Type{T}) where T in Base at number.jl:321 => 1501719
Int64(x::Union{Bool, Int32, Int64, UInt32, UInt64, UInt8, Int128, Int16, Int8, UInt128, UInt16}) in Core at boot.jl:733 => 1510200
iterate(r::OrdinalRange{T,S} where S, i) where T in Base at range.jl:593 => 1879755
last(r::OrdinalRange{T,S} where S) where T in Base at range.jl:567 => 2012492
setindex!(V::SubArray{#s72,1,P,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where P where #s72, x, i::Int64) in Base at subarray.jl:291 => 2109120
fill!(A::AbstractArray{T,N} where N, x) where T in Base at multidimensional.jl:837 => 2109360
convert(::Type{T}, x::T) where T<:Number in Base at number.jl:6 => 2180167
getproperty(x, f::Symbol) in Base at sysimg.jl:18 => 3540033

It doesn’t seem like there’s anything truly unexpected here, though I am slightly amazed at the number of statements executed. Aggregate number of statements executed is 38294027, for an average of ~4μs per statement. That’s pretty much in line with my previous estimates, so unfortunately I don’t think this is a special case.

5 Likes

To give some reference, this is what Julia compiles iterate on a matrix to:

julia> @code_native iterate(a)
        .text
; ┌ @ array.jl:704 within `iterate' @ array.jl:704
        cmpq    $0, 8(%rsi)
        jle     L29
; │┌ @ array.jl:728 within `getindex'
        movq    (%rsi), %rax
        movq    (%rax), %rax
; │└
; │ @ array.jl:704 within `iterate'
        movq    %rax, (%rdi)
        movq    $2, 8(%rdi)
        movb    $2, %dl
        xorl    %eax, %eax
        retq
L29:
        movb    $1, %dl
        xorl    %eax, %eax
; │ @ array.jl:704 within `iterate'
        retq
        nopw    %cs:(%rax,%rax)
; └

It is basically nothing.

And here is an interpreter session stepping through the same

julia> a = rand(2,2); @enter iterate(a)
In iterate(A, i) at array.jl:704
>704  iterate(A::Array, i=1) = (@_inline_meta; (i % UInt) - 1 < length(A) ? (@inbounds A[i], i + 1) : nothing)

About to run: (rem)(1, UInt64)
1|debug> s
In rem(x, #unused#) at int.jl:464
>464              @eval rem(x::($from), ::Type{$to}) = bitcast($to, x)

About to run: (bitcast)(UInt64, 1)
1|debug> 
In rem(x, #unused#) at int.jl:464
>464              @eval rem(x::($from), ::Type{$to}) = bitcast($to, x)

About to run: return 0x0000000000000001
1|debug> 
In iterate(A, i) at array.jl:704
>704  iterate(A::Array, i=1) = (@_inline_meta; (i % UInt) - 1 < length(A) ? (@inbounds A[i], i + 1) : nothing)

About to run: (-)(0x0000000000000001, 1)
1|debug> 
In -(a, b) at int.jl:871
 870      @eval function $op(a::Integer, b::Integer)
>871          T = promote_typeof(a, b)
 872          aT, bT = a % T, b % T
 873          not_sametype((a, b), (aT, bT))
 874          return $op(aT, bT)
 875      end

About to run: (Base.promote_typeof)(0x0000000000000001, 1)
1|debug> 
In promote_typeof(x, xs) at promotion.jl:264
>264  promote_typeof(x, xs...) = (@_inline_meta; promote_type(typeof(x), promote_typeof(xs...)))

About to run: (typeof)(0x0000000000000001)
1|debug> 
In promote_typeof(x, xs) at promotion.jl:264
>264  promote_typeof(x, xs...) = (@_inline_meta; promote_type(typeof(x), promote_typeof(xs...)))

About to run: (Base.promote_typeof)(1)
1|debug> 
In promote_typeof(x) at promotion.jl:263
>263  promote_typeof(x) = typeof(x)

About to run: (typeof)(1)
1|debug> 
In promote_typeof(x) at promotion.jl:263
>263  promote_typeof(x) = typeof(x)

About to run: return Int64
1|debug> 
In promote_typeof(x, xs) at promotion.jl:264
>264  promote_typeof(x, xs...) = (@_inline_meta; promote_type(typeof(x), promote_typeof(xs...)))

About to run: (promote_type)(UInt64, Int64)
1|debug> 
In promote_type(#unused#, #unused#) at promotion.jl:215
 217      # and there is a fallback returning Bottom below, so the common case is
 218      #   promote_type(T, S) =>
 219      #   promote_result(T, S, result, Bottom) =>
 220      #   typejoin(result, Bottom) => result
>221      promote_result(T, S, promote_rule(T,S), promote_rule(S,T))
 222  end

About to run: (promote_rule)(UInt64, Int64)
1|debug> 
In promote_rule(#unused#, #unused#) at int.jl:628
>628  promote_rule(::Type{UInt64},  ::Type{Int64} ) = UInt64

About to run: return UInt64
1|debug> 
In promote_type(#unused#, #unused#) at promotion.jl:215
 217      # and there is a fallback returning Bottom below, so the common case is
 218      #   promote_type(T, S) =>
 219      #   promote_result(T, S, result, Bottom) =>
 220      #   typejoin(result, Bottom) => result
>221      promote_result(T, S, promote_rule(T,S), promote_rule(S,T))
 222  end

About to run: (promote_rule)(Int64, UInt64)
1|debug> 
In promote_rule(#unused#, #unused#) at promotion.jl:233
>233  promote_rule(::Type{<:Any}, ::Type{<:Any}) = Bottom

About to run: return Union{}
1|debug> 
In promote_type(#unused#, #unused#) at promotion.jl:215
 217      # and there is a fallback returning Bottom below, so the common case is
 218      #   promote_type(T, S) =>
 219      #   promote_result(T, S, result, Bottom) =>
 220      #   typejoin(result, Bottom) => result
>221      promote_result(T, S, promote_rule(T,S), promote_rule(S,T))
 222  end

About to run: (Base.promote_result)(UInt64, Int64, UInt64, Union{})
1|debug> 
In promote_result(#unused#, #unused#, #unused#, #unused#) at promotion.jl:239
>239  promote_result(::Type{<:Any},::Type{<:Any},::Type{T},::Type{S}) where {T,S} = (@_inline_meta; promote_type(T,S))

About to run: (promote_type)(UInt64, Union{})
1|debug> 
In promote_type(#unused#, #unused#) at promotion.jl:211
>211  promote_type(::Type{T}, ::Type{Bottom}) where {T} = T

About to run: return UInt64
1|debug> 
In promote_result(#unused#, #unused#, #unused#, #unused#) at promotion.jl:239
>239  promote_result(::Type{<:Any},::Type{<:Any},::Type{T},::Type{S}) where {T,S} = (@_inline_meta; promote_type(T,S))

About to run: return UInt64
1|debug> 
In promote_type(#unused#, #unused#) at promotion.jl:215
 217      # and there is a fallback returning Bottom below, so the common case is
 218      #   promote_type(T, S) =>
 219      #   promote_result(T, S, result, Bottom) =>
 220      #   typejoin(result, Bottom) => result
>221      promote_result(T, S, promote_rule(T,S), promote_rule(S,T))
 222  end

About to run: return UInt64
1|debug> 
In promote_typeof(x, xs) at promotion.jl:264
>264  promote_typeof(x, xs...) = (@_inline_meta; promote_type(typeof(x), promote_typeof(xs...)))

About to run: return UInt64
1|debug> 
In -(a, b) at int.jl:871
 870      @eval function $op(a::Integer, b::Integer)
 871          T = promote_typeof(a, b)
>872          aT, bT = a % T, b % T
 873          not_sametype((a, b), (aT, bT))
 874          return $op(aT, bT)
 875      end

About to run: (rem)(0x0000000000000001, UInt64)
1|debug> 
In rem(x, #unused#) at int.jl:493
>493  rem(x::T, ::Type{T}) where {T<:Integer} = x

About to run: return 0x0000000000000001
1|debug> 
In -(a, b) at int.jl:871
 870      @eval function $op(a::Integer, b::Integer)
 871          T = promote_typeof(a, b)
>872          aT, bT = a % T, b % T
 873          not_sametype((a, b), (aT, bT))
 874          return $op(aT, bT)
 875      end

About to run: (rem)(1, UInt64)
1|debug> 
In rem(x, #unused#) at int.jl:464
>464              @eval rem(x::($from), ::Type{$to}) = bitcast($to, x)

About to run: (bitcast)(UInt64, 1)
1|debug> 
In rem(x, #unused#) at int.jl:464
>464              @eval rem(x::($from), ::Type{$to}) = bitcast($to, x)

About to run: return 0x0000000000000001
1|debug> 
In -(a, b) at int.jl:871
 870      @eval function $op(a::Integer, b::Integer)
 871          T = promote_typeof(a, b)
 872          aT, bT = a % T, b % T
>873          not_sametype((a, b), (aT, bT))
 874          return $op(aT, bT)
 875      end

About to run: (tuple)(0x0000000000000001, 1)
1|debug> 
In -(a, b) at int.jl:871
 870      @eval function $op(a::Integer, b::Integer)
 871          T = promote_typeof(a, b)
 872          aT, bT = a % T, b % T
>873          not_sametype((a, b), (aT, bT))
 874          return $op(aT, bT)
 875      end

About to run: (tuple)(0x0000000000000001, 0x0000000000000001)
1|debug> 
In -(a, b) at int.jl:871
 870      @eval function $op(a::Integer, b::Integer)
 871          T = promote_typeof(a, b)
 872          aT, bT = a % T, b % T
>873          not_sametype((a, b), (aT, bT))
 874          return $op(aT, bT)
 875      end

About to run: (Base.not_sametype)((0x0000000000000001, 1), (0x0000000000000001, 0x0000000000000001))
1|debug> 
In not_sametype(x, y) at promotion.jl:304
>304  not_sametype(x, y) = nothing

About to run: return
1|debug> 
In -(a, b) at int.jl:871
 870      @eval function $op(a::Integer, b::Integer)
 871          T = promote_typeof(a, b)
 872          aT, bT = a % T, b % T
 873          not_sametype((a, b), (aT, bT))
>874          return $op(aT, bT)
 875      end

About to run: (-)(0x0000000000000001, 0x0000000000000001)
1|debug> 
In -(x, y) at int.jl:52
>52  (-)(x::T, y::T) where {T<:BitInteger} = sub_int(x, y)

About to run: (sub_int)(0x0000000000000001, 0x0000000000000001)
1|debug> 
In -(x, y) at int.jl:52
>52  (-)(x::T, y::T) where {T<:BitInteger} = sub_int(x, y)

About to run: return 0x0000000000000000
1|debug> 
In -(a, b) at int.jl:871
 870      @eval function $op(a::Integer, b::Integer)
 871          T = promote_typeof(a, b)
 872          aT, bT = a % T, b % T
 873          not_sametype((a, b), (aT, bT))
>874          return $op(aT, bT)
 875      end

About to run: return 0x0000000000000000
1|debug> 
In iterate(A, i) at array.jl:704
>704  iterate(A::Array, i=1) = (@_inline_meta; (i % UInt) - 1 < length(A) ? (@inbounds A[i], i + 1) : nothing)

About to run: (length)([0.1916999763887164 0.5944419303038921; 0.3440468271194841 0.062447438819570156])
1|debug> 
In length(a) at array.jl:200
>200  length(a::Array) = arraylen(a)

About to run: (arraylen)([0.1916999763887164 0.5944419303038921; 0.3440468271194841 0.062447438819570156])
1|debug> 
In length(a) at array.jl:200
>200  length(a::Array) = arraylen(a)

About to run: return 4
1|debug> 
In iterate(A, i) at array.jl:704
>704  iterate(A::Array, i=1) = (@_inline_meta; (i % UInt) - 1 < length(A) ? (@inbounds A[i], i + 1) : nothing)

About to run: (<)(0x0000000000000000, 4)
1|debug> 
In <(x, y) at int.jl:430
>430  <( x::BitUnsigned, y::BitSigned  ) = (y >= 0) & (x <  unsigned(y))

About to run: (>=)(4, 0)
1|debug> 
In >=(x, y) at operators.jl:341
>341  >=(x, y) = (y <= x)

About to run: (<=)(0, 4)
1|debug> 
In <=(x, y) at int.jl:424
>424  (<=)(x::T, y::T) where {T<:BitSigned}   = sle_int(x, y)

About to run: (sle_int)(0, 4)
1|debug> 
In <=(x, y) at int.jl:424
>424  (<=)(x::T, y::T) where {T<:BitSigned}   = sle_int(x, y)

About to run: return true
1|debug> 
In >=(x, y) at operators.jl:341
>341  >=(x, y) = (y <= x)

About to run: return true
1|debug> 
In <(x, y) at int.jl:430
>430  <( x::BitUnsigned, y::BitSigned  ) = (y >= 0) & (x <  unsigned(y))

About to run: (unsigned)(4)
1|debug> 
In unsigned(x) at essentials.jl:372
>372  unsigned(x::Int) = reinterpret(UInt, x)

About to run: (reinterpret)(UInt64, 4)
1|debug> 
In reinterpret(#unused#, x) at essentials.jl:417
>417  reinterpret(::Type{T}, x) where {T} = bitcast(T, x)

About to run: (bitcast)(UInt64, 4)
1|debug> 
In reinterpret(#unused#, x) at essentials.jl:417
>417  reinterpret(::Type{T}, x) where {T} = bitcast(T, x)

About to run: return 0x0000000000000004
1|debug> 
In unsigned(x) at essentials.jl:372
>372  unsigned(x::Int) = reinterpret(UInt, x)

About to run: return 0x0000000000000004
1|debug> 
In <(x, y) at int.jl:430
>430  <( x::BitUnsigned, y::BitSigned  ) = (y >= 0) & (x <  unsigned(y))

About to run: (<)(0x0000000000000000, 0x0000000000000004)
1|debug> 
In <(x, y) at int.jl:423
>423  (< )(x::T, y::T) where {T<:BitUnsigned} = ult_int(x, y)

About to run: (ult_int)(0x0000000000000000, 0x0000000000000004)
1|debug> 
In <(x, y) at int.jl:423
>423  (< )(x::T, y::T) where {T<:BitUnsigned} = ult_int(x, y)

About to run: return true
1|debug> 
In <(x, y) at int.jl:430
>430  <( x::BitUnsigned, y::BitSigned  ) = (y >= 0) & (x <  unsigned(y))

About to run: (&)(true, true)
1|debug> 
In &(x, y) at bool.jl:40
>40  (&)(x::Bool, y::Bool) = and_int(x, y)

About to run: (and_int)(true, true)
1|debug> 
In &(x, y) at bool.jl:40
>40  (&)(x::Bool, y::Bool) = and_int(x, y)

About to run: return true
1|debug> 
In <(x, y) at int.jl:430
>430  <( x::BitUnsigned, y::BitSigned  ) = (y >= 0) & (x <  unsigned(y))

About to run: return true
1|debug> 
In iterate(A, i) at array.jl:704
>704  iterate(A::Array, i=1) = (@_inline_meta; (i % UInt) - 1 < length(A) ? (@inbounds A[i], i + 1) : nothing)

About to run: (getindex)(<suppressed 80 bytes of output>, 1)
1|debug> 
In getindex(A, i1) at array.jl:728
>728  @eval getindex(A::Array, i1::Int) = arrayref($(Expr(:boundscheck)), A, i1)

About to run: (Core.arrayref)(true, <suppressed 80 bytes of output>, 1)
1|debug> 
In getindex(A, i1) at array.jl:728
>728  @eval getindex(A::Array, i1::Int) = arrayref($(Expr(:boundscheck)), A, i1)

About to run: return 0.1916999763887164
1|debug> 
In iterate(A, i) at array.jl:704
>704  iterate(A::Array, i=1) = (@_inline_meta; (i % UInt) - 1 < length(A) ? (@inbounds A[i], i + 1) : nothing)

About to run: (+)(1, 1)
1|debug> 
In +(x, y) at int.jl:53
>53  (+)(x::T, y::T) where {T<:BitInteger} = add_int(x, y)

About to run: (add_int)(1, 1)
1|debug> 
In +(x, y) at int.jl:53
>53  (+)(x::T, y::T) where {T<:BitInteger} = add_int(x, y)

About to run: return 2
1|debug> 
In iterate(A, i) at array.jl:704
>704  iterate(A::Array, i=1) = (@_inline_meta; (i % UInt) - 1 < length(A) ? (@inbounds A[i], i + 1) : nothing)

About to run: (tuple)(0.1916999763887164, 2)
1|debug> 
In iterate(A, i) at array.jl:704
>704  iterate(A::Array, i=1) = (@_inline_meta; (i % UInt) - 1 < length(A) ? (@inbounds A[i], i + 1) : nothing)

About to run: $(Expr(:inbounds, :pop))
1|debug> 
(0.1916999763887164, 2)

That is 62 statements, each that need to be matched against. For each function call, data structures for local variables and temporary variables etc need to be setup. For each of the intrinsics a ccall into the runtime is made to evaluate it.

Julia code is designed with the fact that there is an optimizing compiler in mind. No one worries about cost of function calls or stuff like typeof that are computed statically. On the other hand, the interpreter (with its current implementation) deals with it all. What if someone put a breakpoint in the convert(x, ::Any) = x method? We have put in some work to make the cost per instruction not too big but as Tim already said, when there is a huge number of instructions it will take a long time no matter what. So to get to the next order of magnitude speedups, more sophisticated optimization techniques are needed.

Personally, I feel it is a bit silly to compare a ~6 man month effort into interpreting a compiled language against Python which was designed from the start for interpretation, has decades of work done on it with support of many very large companies etc.

8 Likes