Benchmark function that uses CUDA.jl

Hello all,

I have written a function that computes the eigenvalues of large sparse matrices, with both a CPU and a GPU implementation. I used CUDA.jl for the computationally expensive parts. Now I want to compare the two implementations, but I am not sure how to handle timing the GPU version.

I have written the following function to benchmark the operation:

using MAT, CUDA, SparseArrays, TimerOutputs

function bench()
    file = matopen("path_to_file")
    Problem = read(file, "Problem")
    close(file)
    # FLOAT is an element-type alias (e.g. Float64) defined elsewhere in my code
    A::SparseMatrixCSC{FLOAT} = Problem["A"]
    @timeit to "RBL_gpu" CUDA.@time d, _ = RBL_gpu(A, 25, 10)
end

I call it like this:

# warm-up call on a small random matrix (currently commented out):
# d, _ = RBL_gpu(sprandn(FLOAT, 50, 50, 0.5), 1, 10);
to = TimerOutput();
bench();
show(to);

I have noticed that if I first call my function with a small matrix, just to warm things up, the execution time improves. For example,

with the warm-up: 12 seconds
without the warm-up: 25 seconds

So my question is: which of the two measurements is the more accurate? Am I cheating with the warm-up, or is it good practice?
I am thinking that a user would only need to call the function once. Maybe I could create an interface and place the warm-up call inside it; a minimal sketch of what I have in mind follows.
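Something like this, where RBL_gpu and FLOAT come from my code above, and the warmed_up flag and the RBL wrapper name are just a hypothetical sketch:

using SparseArrays  # for sprandn

const warmed_up = Ref(false)

# Hypothetical user-facing wrapper: the first call runs a tiny problem to
# trigger compilation; every later call goes straight to the implementation.
function RBL(A, k, q)
    if !warmed_up[]
        RBL_gpu(sprandn(FLOAT, 50, 50, 0.5), 1, 10)  # warm-up call
        warmed_up[] = true
    end
    return RBL_gpu(A, k, q)
end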

Sorry for the long text and thank you in advance.

First of all, welcome to Discourse, and I hope you have had a great experience with Julia so far!

What you are seeing is the typical Julia phenomenon (shared by other JIT-compiled languages): the first call to a function includes the time to compile it. Since you would not count compilation time when benchmarking C, Fortran, etc., it is fair to report the warmed-up measurement.

In principle, you can also ship a precompiled version to your users with PackageCompiler.jl, so they never pay the compilation cost at runtime.
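Roughly like this (a sketch: MyPackage and the file names are placeholders, and the precompile_execution_file should be a script that exercises the functions you want compiled ahead of time, e.g. a small RBL_gpu call):

using PackageCompiler

# Build a custom system image with the compiled code baked in.
create_sysimage(["MyPackage"];
                sysimage_path = "sys_mypackage.so",
                precompile_execution_file = "precompile.jl")

Starting Julia with julia --sysimage sys_mypackage.so then avoids the first-call compilation cost for everything the precompile script exercised.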

That was exactly the answer I was hoping for! The package also looks very useful. Thank you!
