Adding `@time` to a function gets rid of type-instability and makes code run 6x fast?!


#1

I am developing a function and I find it hard to boil down to MWE. But the weird thing is the function runs in about 13 seconds, so I try to @time various part of the program and the code runs in 2 seconds.

There is nothing different about the code except I have placed @time at 2 places. See

Pkg.clone("https://github.com/xiaodaigh/FastGroupBy.jl")
using FastGroupBy
byveccv = (rand(1:100,10_000_000), rand(1:100, 10_000_000))
    val = rand(1:5,10_000_000)

@time FastGroupBy.fgroupreduce!(+, byveccv, val, 0) # 2 seconds
@time FastGroupBy.fgroupreduce2!(+, byveccv, val, 0) # 13 seconds

and here is the code for the two functions


function fgroupreduce!(fn::F, byveccv::Tuple, val::Vector{Z}, v0::T = zero(T)) where {F<:Function, Z, T}
    l = length(val)
    index = collect(1:l)
    lb = length(byveccv)
    grouptwo!(byveccv[lb], index)
    @inbounds val .= val[index]
    
    @time @inbounds for i = lb-1:-1:1
        byveccv[i] .= byveccv[i][index]
    end

    @time @inbounds for i = lb-1:-1:1
        index .= collect(1:l)
        grouptwo!(byveccv[i], index)
        for j = lb:-1:i+1
            byveccv[j] .= byveccv[j][index]
        end
        val .= val[index]
    end

    diff2prev = da(byveccv)
    n_uniques = sum(diff2prev)

    upto::UInt = 0
    res = fill(v0, n_uniques)

    res[1] = v0
    resby = ((bv[diff2prev] for bv in byveccv)...)
    @inbounds for (vali, dp) in zip(val, diff2prev)
        # increase upto by 1 if it's different to previous value
        upto += UInt(dp)
        res[upto] = fn(res[upto], vali)
    end
    (resby..., res)
end

function fgroupreduce2!(fn::F, byveccv::Tuple, val::Vector{Z}, v0::T = zero(T)) where {F<:Function, Z, T}
    l = length(val)
    index = collect(1:l)
    lb = length(byveccv)
    grouptwo!(byveccv[lb], index)
    @inbounds val .= val[index]
    
    @inbounds for i = lb-1:-1:1
        byveccv[i] .= byveccv[i][index]
    end

    @inbounds for i = lb-1:-1:1
        index .= collect(1:l)
        grouptwo!(byveccv[i], index)
        for j = lb:-1:i+1
            byveccv[j] .= byveccv[j][index]
        end
        val .= val[index]
    end

    diff2prev = da(byveccv)
    n_uniques = sum(diff2prev)

    upto::UInt = 0
    res = fill(v0, n_uniques)

    res[1] = v0
    resby = ((bv[diff2prev] for bv in byveccv)...)
    @inbounds for (vali, dp) in zip(val, diff2prev)
        # increase upto by 1 if it's different to previous value
        upto += UInt(dp)
        res[upto] = fn(res[upto], vali)
    end
    (resby..., res)
end

#2

Did you run it once before timing it and then using @time?
@time needs the code to have been run once (as does the REPL) to get a fair timing

BenchmarkTools exports @btime which handles that automatically. When using BenchmarkTools put a dollar sign before external variables:

using BenchmarkTools

a = 5
b = 17

@btime divrem($b, $a)

#3

This is something different, @code_warntype shows different variables being inferred between the two functions.


#4

good catch


#5

I ran it multiple times. 13 seconds vs 2 seconds. Not due to compilation


#6

Yeah. Quite weird. The functions are exactly the same except for two places where @time is used. Might be something for the core developer to look into.


#7

Tuple size and closure bugs. This does the trick:

resby = let diff2prev=diff2prev, byveccv=byveccv
        resby = ntuple((x)->(byveccv[x][diff2prev]), Val{lb})
end

Edit: the tuple size is not a bug in Julia but in your code since you are making a generator then splatting it into a tuple, so the size is unknown at compile time.


#8

Changing @time @inbounds for... to foo = @inbounds for.. also reproduces. Not sure why that would avoid boxing diff2prev.