Hello,
I use a local Windows 10 machine with an NVIDIA RTX 3090 GPU to run models in different programming languages and compare their performance. When comparing Julia (v1.9.3) and MATLAB (2023a), I find that MATLAB seems to use GPU memory much more efficiently than Julia, which lets it run faster.
To make the program easy for everyone to run and test, I provide a minimal example below. In this example, GPU computing in MATLAB is over 31 times faster than in Julia. Additionally, MATLAB's GPU memory usage (6.4 GB) is far lower than Julia's (> 32 GB). I am surprised by such large differences in performance. I would like to know whether Julia's memory management is inefficient or whether the Julia program could be revised to improve performance.
Here is the minimal example in Julia:
using CUDA

function DGP(N)
    x = range(0, 1, N^2)
    return reshape(x, (N, N))
end

function main(N)
    x = CuArray(DGP(N))
    V0 = CUDA.ones(Float64, N); idx = ()
    a = 0.5
    max_iter = 100
    iter = 0
    while iter < max_iter
        V1 = copy(V0)
        V0, idx = findmax(x .+ a * V1', dims=2)
        @show iter += 1
    end
    return V0, idx, iter
end

CUDA.@time CUDA.@sync V, idx, iter = main(2^13)
The following is the MATLAB version:
tic
[V0, idx, iter] = main(2^13);
toc

function [res] = DGP(N)
    x = linspace(0, 1, N^2);
    res = reshape(x, N, N);
end

function [V0, idx, iter] = main(N)
    x = gpuArray(DGP(N));
    V0 = gpuArray.ones(N, 1); idx = zeros(N, 1);
    a = 0.5;
    max_iter = 100;
    iter = 0;
    while iter < max_iter
        disp(['Start the iteration: ', num2str(iter+1)])
        V1 = V0;
        [V0, idx] = max(x + a * V1', [], 2);
        iter = iter + 1;
    end
end
For N = 2^13, the Julia version takes 17.5817 seconds and the MATLAB version takes 0.5653 seconds.
Here are Windows Task Manager screenshots when running the program in Julia and MATLAB, respectively.
The figures show that Julia uses much more memory on this problem, which appears to cause extra time spent freeing and reallocating memory. Meanwhile, MATLAB uses less memory and stays far below the memory limit, so it does not appear to spend any time reclaiming memory.
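A possible revision of the Julia loop, given only as an unbenchmarked sketch. In the Julia code above, the expression `x .+ a * V1'` creates a new N×N `Float64` array on every iteration. For N = 2^13 that is 8192 × 8192 × 8 bytes = 512 MiB per iteration, or about 50 GiB of allocations over 100 iterations, and each array stays allocated until the garbage collector reclaims it. The sketch below reuses a single scratch buffer instead. The names `main_prealloc` and `tmp` are hypothetical and not from the code above, and the function takes the matrix `x` as an argument so that it works on a plain `Array` as well as on a `CuArray`.

```julia
# Sketch only: reuse one N×N buffer instead of allocating a new one per iteration.
# `main_prealloc` and `tmp` are hypothetical names introduced for illustration.
function main_prealloc(x::AbstractMatrix, max_iter::Integer = 100; a = 0.5)
    N = size(x, 1)
    V0 = fill!(similar(x, N, 1), 1)   # N×1 array of ones, same array type as x
    tmp = similar(x)                  # N×N scratch buffer, allocated once
    idx = nothing
    for _ in 1:max_iter
        # Fully dotted, in-place broadcast: writes x[i, j] + a * V0[j] into tmp
        # without creating a new N×N array.
        tmp .= x .+ a .* V0'
        # Row-wise maximum; this still allocates the small N×1 outputs.
        V0, idx = findmax(tmp, dims = 2)
    end
    return V0, idx
end

# Small CPU check of the logic on a 4×4 matrix.
x = reshape(collect(range(0, 1, 16)), (4, 4))
V, idx = main_prealloc(x, 3)
```

On the GPU the call would presumably be `CUDA.@time CUDA.@sync main_prealloc(CuArray(DGP(2^13)))`. I have not measured whether this closes the gap with MATLAB; it only removes the per-iteration N×N allocation.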