How to loop in a script properly? (Perhaps CUDA-specific)

I have some code that is slow on the first run. For example, my script looks like this:

function loop(x)
    # something very complicated.
end

function main()
    t = time()
    x = rand(100)
    for i in 1:3
        loop(x)
        println(time() - t)
        t = time()
    end
end

main()

I found that the first run of main() is very slow, even after the loop has moved past the first call of loop():

56.61500000953674
43.842000007629395
47.085999965667725

but if I rerun main() in the same REPL, it's fast:

1.872999906539917
1.8619999885559082
1.7840001583099365

But since the main function contains the loop, and the whole first run is very slow (every iteration, not just the first), I can't simply run it once to warm up.
How do I write a loop in a Julia script properly, so that I can just run julia --project=./ script.jl and it runs as fast as possible (ideally only the first iteration of the loop is slow)?
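The usual advice for hiding compilation latency is a warm-up call on a small input before the timed loop. This is only a sketch (it assumes loop() accepts any array of the same element type, and it only helps when the slowness really is compilation):

```julia
function loop(x)
    # something very complicated.
end

function main()
    x = rand(100)
    loop(rand(1))  # warm-up call on a tiny input, so compilation happens before timing
    for i in 1:3
        t = time()
        loop(x)
        println(time() - t)
    end
end

main()
```

If every iteration of the first run is slow, as reported above, a single warm-up call like this won't be enough, which suggests something beyond ordinary compilation is going on.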

function loop(x)
    # something very complicated.
end

t = time()
x = rand(100)
for i in 1:3
    loop(x)
    println(time() - t)
    global t = time()  # `global` is needed here when this runs as a script
end

how about this?

But then it leaks more variables into the global scope, and using global variables in a loop is discouraged in Julia, isn't it?

By the way, I tried running it, and it shows:

17.370999813079834
1.753000020980835
1.6850001811981201

Even the first run is much faster than putting it in main().
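If leaking variables into global scope is the concern, one way to keep the code top-level without introducing globals is a `let` block (a sketch; `loop` here is a stand-in for the real function):

```julia
function loop(x)
    # something very complicated.
end

let
    t = time()
    x = rand(100)  # t and x are locals of the let block, not globals
    for i in 1:3
        loop(x)
        println(time() - t)
        t = time()  # assigns the enclosing local, so no `global` annotation needed
    end
end
```

This keeps the top-level timing behavior while avoiding the non-const-global performance pitfall.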

Just out of curiosity: is loop(x) self-contained, or does it use global variables?

It uses variables defined in main(), so no global variables.

Another question: there must be many other functions involving loops, so in those functions, is the whole loop slow on the first run of the function?
And because the time is consistently over 40 s, it doesn't look like compile time to me…
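One way to check whether the time really is compilation: on Julia 1.6 and later, `@time` also reports the percentage of time spent compiling. A minimal sketch, with a trivial stand-in body for `loop`:

```julia
function loop(x)
    # stand-in for the complicated body
    sum(abs2, x)
end

function main()
    x = rand(100)
    for i in 1:3
        @time loop(x)  # on Julia 1.6+, also prints "% compilation time" when nonzero
    end
end

main()
```

If the compilation percentage is low but the first run is still slow, the cost is coming from somewhere else (e.g. GPU-side initialization or kernel compilation).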

Can you explain why putting it into main() could make things so slow?
(56 s vs 17 s, and afterwards 43 s vs 1.8 s)

I tried passing the loop length as a parameter of main(), and strangely, the loop time is now 10 s per iteration and does not speed up to 1 s on the second run…

I tried to write an MWE, but I found it hard to replicate this problem with a simple function. I don't know what causes this: in which cases an extra-slow compilation happens, and in which cases the for loop is slowed down permanently… I checked common problems like type instability and non-const globals, and that doesn't seem to be the case here.

By the way, maybe I should mention that this loop function contains complicated CUDA functions and custom CUDA kernels?

Oh well, then you should ask the CUDA people.

using CUDA
using Statistics

function some_func(x, y)
    a = CUDA.rand(1000)
    b = CUDA.rand(1000)
    c = a .* b
    return c .* x .* y
end

function main()
    result = []
    count = 0
    for i in 1:3
        t = time()
        for j in 0:20
            for k in 1:20
                count += 1
                for m in 1:20
                    a = CUDA.rand(1000)
                    b = CUDA.rand(1000)
                    c = some_func(a, b)
                    push!(result, mean(c))
                end
            end
        end
        println(time() - t)
    end
end

main()

I successfully replicated the problem with an MWE. The timings are:
First run:

42.087000131607056
33.35199999809265
34.365999937057495

Second run:

1.1009998321533203
0.8480000495910645
1.4190001487731934

Without main():

10.562000036239624
0.935999870300293
1.316999912261963
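A side note on measuring: GPU operations in CUDA.jl are asynchronous, so wrapping them in plain `time()` calls can measure launch overhead rather than actual execution. `CUDA.@sync` blocks until the GPU has finished, and `CUDA.@elapsed` returns GPU-synchronized seconds. A sketch reusing the MWE's `some_func`:

```julia
using CUDA

function some_func(x, y)
    a = CUDA.rand(1000)
    b = CUDA.rand(1000)
    c = a .* b
    return c .* x .* y
end

a = CUDA.rand(1000)
b = CUDA.rand(1000)
t = CUDA.@elapsed some_func(a, b)  # elapsed seconds, synchronized with the GPU
println(t)
CUDA.@sync some_func(a, b)         # blocks until all GPU work is done
```

This doesn't change the overall finding here (the timing gaps are far too large to be mere asynchrony), but it makes the per-run numbers more trustworthy.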

Changed title to attract CUDA experts.

This may be a regression. On CUDA.jl v3.1.0,

julia> main() # first run
29.039000034332275
1.7300000190734863
1.61899995803833

On v3.3.0,

julia> main() # first run
80.84599995613098
61.09299993515015
63.625

julia> main() # second run
1.5520000457763672
1.7239999771118164
1.375999927520752

xref Performance issue with complicated loops in function · Issue #984 · JuliaGPU/CUDA.jl · GitHub
