Here’s a more idiomatic Julia version:
```julia
using CUDA
using BenchmarkTools

A = CUDA.rand(151, 151, 151)
B = CUDA.rand(151, 151, 151)  # B is read below, so it needs real values; `similar` leaves memory uninitialized
C = CUDA.zeros(151, 151, 151)
D = similar(C)
E = similar(C)
F = similar(C)

function math1!(C, A, B)
    @. C = A^2 + B^2 + A * B + A / B - A * B - A / B + A * B + A / B - A * B - A / B
    return C
end

function math2!(D, C)
    @. D = C^2 + C^2 + C * C + C / C - C * C - C / C + C * C + C / C - C * C - C / C
    return D
end

function math3!(E, D)
    @. E = D^2 + D^2 + D * D + D / D - D * D - D / D + D * D + D / D - D * D - D / D
    return E
end

# CUDA.@sync waits for the GPU to finish, so the kernels themselves are timed,
# not just the (asynchronous) kernel launches.
@btime CUDA.@sync for iter = 1:1000
    math1!($C, $A, $B)
    math2!($D, $C)
    math3!($E, $D)
end
```
What time do you get with this code?
Key points:

- If you want to mutate your preallocated arrays `C`, `D`, `E`, you need to pass them to the functions as arguments and apply a mutating operation to them. `C = ...` does not mutate; it creates a new variable `C`. However, `@. C = ...` mutates, because it translates to `C .= ...`, which is the syntax for in-place broadcasting.
- The macro `@.` goes in front of an expression to insert dots (i.e., fused broadcasting) at every subexpression. When you use it, you don't have to insert any dots manually. (I've never seen it placed in front of an entire method definition before, so I'm not sure what it would do there.)
- When benchmarking with `@btime`, you should interpolate global variables with `$`; otherwise the expression can't be type-inferred and compiled to efficient code.
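To see the rebinding-vs-mutation distinction concretely, here's a minimal CPU-only sketch (plain `Array`s instead of `CuArray`s, so it runs without a GPU; the variable names are just for illustration):

```julia
A = ones(3)
B = 2 .* ones(3)

C = zeros(3)     # "preallocated" output
orig = C         # keep a second reference to the same array

C = A .+ B       # rebinds the name C to a brand-new array
@show orig == C  # false: the preallocated array still holds zeros

C = orig         # point C back at the preallocated array
C .= A .+ B      # in-place: writes the result into the existing array
# `@. C = A + B` is equivalent: the macro turns `=` into `.=` and dots every operation
@show orig == C  # true: the original array now holds the result
```

The same logic is why the functions above take `C`, `D`, `E` as arguments and assign with `@. C = ...` rather than `C = ...`.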
Minor point: you don’t need to terminate lines with a semicolon in Julia.
Let us know how the updated code performs!