I use a local Windows 10 machine with an NVIDIA RTX 3090 GPU to run models in different programming languages and compare their performance. Comparing Julia (v1.9.3) with MATLAB (R2023a), I find that MATLAB appears to use GPU memory far more efficiently than Julia, which lets it run faster.
To make this easy for anyone to run and test, I provide a minimal example below. In this example, GPU computing in MATLAB is over 31 times faster than in Julia. MATLAB's GPU memory usage (6.4 GB) is also far lower than Julia's (> 32 GB). I am surprised by such a large difference in performance. I would like to know whether Julia's memory management is inefficient or whether the Julia program could be revised to improve performance.
Here is a minimal example in Julia:
    using CUDA

    function DGP(N)
        x = range(0, 1, N^2)
        return reshape(x, (N, N))
    end

    function main(N)
        x = CuArray(DGP(N))
        V0 = CUDA.ones(Float64, N)
        idx = ()
        a = 0.5
        max_iter = 100
        iter = 0
        while iter < max_iter
            V1 = copy(V0)
            V0, idx = findmax(x .+ a * V1', dims=2)
            @show iter += 1
        end
        return V0, idx, iter
    end

    CUDA.@time CUDA.@sync V, idx, iter = main(2^13)
The following is the MATLAB version:
    tic
    [V0, idx, iter] = main(2^13);
    toc

    function [res] = DGP(N)
        x = linspace(0, 1, N^2);
        res = reshape(x, N, N);
    end

    function [V0, idx, iter] = main(N)
        x = gpuArray(DGP(N));
        V0 = gpuArray.ones(N, 1);
        idx = zeros(N, 1);
        a = 0.5;
        max_iter = 100;
        iter = 0;
        while iter < max_iter
            disp(['Start the iteration: ', num2str(iter+1)])
            V1 = V0;
            % Row-wise maximum: the empty [] placeholder selects max along dimension 2
            [V0, idx] = max(x + a * V1', [], 2);
            iter = iter + 1;
        end
    end
With N = 2^13, the Julia version takes 17.5817 seconds and the MATLAB version takes 0.5653 seconds.
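As a quick check on the "over 31 times" figure, the ratio of the two reported runtimes can be computed directly. The variable names below are only illustrative, and the block needs no GPU.

```julia
# Reported wall-clock times for N = 2^13 (taken from the measurements above)
julia_seconds = 17.5817
matlab_seconds = 0.5653

# Ratio of the Julia runtime to the MATLAB runtime
speedup = julia_seconds / matlab_seconds
println("Julia / MATLAB runtime ratio: ", round(speedup, digits=1))
```

The ratio comes out at roughly 31.1.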
Here are Windows Task Manager screenshots when running the program in Julia and MATLAB, respectively.
The screenshots show that Julia uses much more memory on this problem, so it spends more time freeing and reallocating memory. MATLAB uses less memory and stays well below the memory limit, so it spends no time on cleanup.
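For scale, here is a rough estimate of how much GPU memory the Julia loop could allocate. It assumes that `x .+ a * V1'` materializes a fresh N×N `Float64` array on every iteration and that none of these temporaries is freed before the loop ends. That is a worst case; in practice the garbage collector may reclaim some of them along the way. The variable names are illustrative, and the block runs on the CPU without CUDA.

```julia
# Problem size and iteration count from the example
N = 2^13
max_iter = 100

# One N×N Float64 temporary, as assumed for `x .+ a * V1'`
bytes_per_temp = N^2 * sizeof(Float64)
gib_per_temp = bytes_per_temp / 2^30

# Cumulative allocations if no temporary is reclaimed during the loop (assumption)
total_gib = max_iter * gib_per_temp

println("Per-iteration temporary: ", gib_per_temp, " GiB")
println("Cumulative over ", max_iter, " iterations: ", total_gib, " GiB")
```

Under these assumptions each iteration allocates 0.5 GiB and the loop allocates 50 GiB in total, which is more than the 24 GB of memory on an RTX 3090. That would be consistent with time being spent reclaiming memory, although it does not by itself confirm the cause.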