Hello, I’m just starting to make the switch to Julia from Matlab/C. I’ve been reading up on the speed advantages of Julia and recognize the power that can be wielded in the right hands. I was hoping I could get some pointers on how to optimize my code. I’m comparing the code below to a vectorized C style that performs the same calculation but has been converted to Julia’s syntax. The code below is both slower and requires significantly more memory allocation than the C version. I use the same variables declared outside of each function, with most of them being Vector{Float64} and b1 being a Vector{Complex{Float64}}.
Any recommendations on how I could speed this up and make it significantly less memory intensive would be greatly appreciated!
using LinearAlgebra
x_pts = 1000
num_pts = 5000
Xpos = range(-FOV/2,FOV/2,length = x_pts)
M = zeros(3,num_pts)
M[:,1] = [0;0;1]
function Sim1d(Xpos, num_pts, resol, b1, gradx, offset, M)
    Mout = zeros(3, length(Xpos))
    B1 = [0.0; 0.0; 0.0]
    for n = 1:length(Xpos)
        for tt = 1:num_pts-1
            Δω = gam * Xpos[n] * gradx[tt] + offset[n]
            B1 = [real(b1[tt]); imag(b1[tt]); Δω]
            θ = norm(B1) * resol
            if abs(θ) > 1e-9
                B1 = B1 / norm(B1)
                M[:, tt+1] = M[:, tt] * cos(θ) + cross(B1, M[:, tt]) * sin(θ) +
                             B1 * dot(B1, M[:, tt]) * (1 - cos(θ))
            else
                M[:, tt+1] = M[:, tt]
            end
        end
        Mout[:, n] = M[:, end]
    end
    return Mout
end
The 2 key problems here are that M is a non-const global, and that you need to use @views to prevent the non-scalar indexing on the right-hand side from allocating.
To build off what @Oscar_Smith said, I would personally recommend reading the content below and re-editing parts of your code.
This really helped me when I started and both the suggestions given above are included in it too with a more detailed explanation. Feel free to post an edited version of the above code at some point based on what you tried after reading this link!
Thank you for the link! I’ll start working my way through it and try to make the appropriate changes to my code. It seems @views is an important thing to learn and master.
For this example, putting a const in front of your first use of M, and @views before the outer for loop, should bring it to C speeds. In general, it won’t always be quite that easy.
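Applied to the code above, the two fixes might look like the sketch below. This is a trimmed illustration, not the original setup: gam is given an invented placeholder value here just so the snippet runs, and the broadcasts (.=, .*) are written in-place to pair well with the views.

```julia
using LinearAlgebra

const gam = 1.0  # placeholder value; declare globals used inside functions as const

function Sim1d(Xpos, num_pts, resol, b1, gradx, offset, M)
    Mout = zeros(3, length(Xpos))
    @views for n in 1:length(Xpos)   # @views makes the column slices below non-copying
        for tt in 1:num_pts-1
            Δω = gam * Xpos[n] * gradx[tt] + offset[n]
            B1 = [real(b1[tt]); imag(b1[tt]); Δω]
            θ = norm(B1) * resol
            if abs(θ) > 1e-9
                B1 = B1 / norm(B1)
                # rotate M[:, tt] about B1 by θ, writing in place into M[:, tt+1]
                M[:, tt+1] .= M[:, tt] .* cos(θ) .+ cross(B1, M[:, tt]) .* sin(θ) .+
                              B1 .* dot(B1, M[:, tt]) .* (1 - cos(θ))
            else
                M[:, tt+1] .= M[:, tt]
            end
        end
        Mout[:, n] .= M[:, end]
    end
    return Mout
end
```

With @views, the slices like M[:, tt] are lightweight views instead of freshly allocated copies, which is where most of the per-iteration garbage in the original came from.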
Quick question regarding global variables and const. Coming from Matlab, my workflow would be to do some minor calculations to create arrays (see example code below) and then feed them into functions, like Sim1d up above, which would perform the bulk of the calculations. Would this workflow be frowned upon when using Julia? If I am reading all of the recommended documentation correctly, it seems that this would be bad as it could lead to significantly more computations than needed and potentially large data usage, assuming I am not using @views properly.
Also here is a link to my full code showing how I structured it plus the two different implementations of my algorithm that was discussed above (https://github.com/loki3118/Julia_Code.git)
Yes, see below. From the link I shared, one notes that “REPL/Global scope does not allow for type-specificity”. In sum_bad() we define x, y in the global scope without a const in front so the compiler can’t specialize on the types. In the next two functions, I either pass x, y as arguments or declare them as a const. In both cases, the compiler can generate specialized code (see the output of @code_warntype) which results in a dramatic performance improvement.
Personally, I prefer the style of sum_good(x, y) because it’s easier to read and you know what arguments the function operates on. In sum_also_good(), while there is no performance hit relative to sum_good(), you would have to figure out from the REPL that you declared a, b with a const in front. This becomes cumbersome for large pieces of code.
julia> x = 1
1
julia> y = 2
2
julia> function sum_bad()
           x + y
       end
sum_bad (generic function with 1 method)
julia> @code_warntype sum_bad()
Variables
#self#::Core.Compiler.Const(sum_bad, false)
Body::Any
1 ─ %1 = (Main.x + Main.y)::Any
└── return %1
######################
# Two possible fixes
######################
# 1) passing x, y as arguments
julia> function sum_good(x, y)
           x + y
       end
sum_good (generic function with 1 method)
julia> @code_warntype sum_good(x, y)
Variables
#self#::Core.Compiler.Const(sum_good, false)
x::Int64
y::Int64
Body::Int64
1 ─ %1 = (x + y)::Int64
└── return %1
# 2) declaring globals with 'const' in front
julia> const a = 1
1
julia> const b = 2
2
julia> function sum_also_good()
           a + b
       end
sum_also_good (generic function with 1 method)
julia> @code_warntype sum_also_good()
Variables
#self#::Core.Compiler.Const(sum_also_good, false)
Body::Int64
1 ─ %1 = (Main.a + Main.b)::Core.Compiler.Const(3, false)
└── return %1
#benchmarks
julia> @btime sum_bad()
20.744 ns (0 allocations: 0 bytes)
3
julia> @btime sum_good($x, $y)
0.001 ns (0 allocations: 0 bytes)
3
julia> @btime sum_also_good()
0.001 ns (0 allocations: 0 bytes)
3
You generally want to use a function barrier when dynamic dispatch has to occur. If you really want to use a global variable, you should enclose it in a barrier function to limit the number of times dispatch occurs. In the following example, you can see that the naive use of the global variable results in many allocations. By using a barrier function, you limit the dynamic dispatch cost to a single function call.
using BenchmarkTools
scale = 2.0
a = rand(1000);

foo(a) = sum(x -> x * scale, a)

function foo_caller(a)
    foo_inner(a, scale)
end

function foo_inner(a, scale)
    sum(x -> x * scale, a)
end
@btime foo($a)
37.999 μs (2999 allocations: 46.86 KiB)
@btime foo_caller($a) # The barrier limits the allocations
228.454 ns (2 allocations: 32 bytes)
@btime foo_inner($a,$scale) #No allocations were necessary past the barrier
83.090 ns (0 allocations: 0 bytes)
I have converted a fair amount of Matlab/C code to Julia. If you have existing and tested computational C code, my advice is to keep the code as is and only do the necessary syntactical and indexing changes. Julia will handle loops perfectly fine and converting to a vectorized style is only worth it if it adds clarity without compromising performance.
Threading can certainly be done with loops (in fact, I thought threaded loops were the normal way). I’m not very familiar with GPU processing, but there are things like GitHub - vchuravy/GPUifyLoops.jl
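For what it’s worth, a threaded loop can be as simple as the sketch below (the function and data here are invented placeholders; run Julia with multiple threads, e.g. julia -t 4, for an actual speedup). Each iteration writes to its own slot of the output, so there is no data race.

```julia
using Base.Threads

function threaded_square!(out, xs)
    @threads for i in eachindex(xs)   # iterations are split across threads
        out[i] = xs[i]^2              # placeholder for the real per-element work
    end
    return out
end
```

For example, threaded_square!(zeros(4), [1.0, 2.0, 3.0, 4.0]) returns [1.0, 4.0, 9.0, 16.0].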
Vector/matrix operations can benefit from BLAS, though.
Yes, but in the context of porting code from Matlab/C, chances are that the parts that were implemented in C were so exactly because they would not fit into BLAS calls.
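As a concrete illustration of the BLAS point (a minimal sketch; the sizes are arbitrary), mul! from LinearAlgebra performs an in-place matrix-vector product that dispatches to BLAS for dense Float64 arrays, avoiding the allocation of A * x:

```julia
using LinearAlgebra

A = rand(100, 100)
x = rand(100)
y = similar(x)    # preallocated output

mul!(y, A, x)     # in-place y = A * x; hits BLAS gemv, no result allocation
```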
Thank you everyone for the feedback (and Happy New Year). It has been a big help, though I am still running into issues where I am not utilizing the features of Julia. Per all of the recommendations, I tried to specify the input types to avoid any type instabilities, but I have been struggling to get a sense of how to properly utilize in-place operations (like foo! and .*) and @views. I am trying to optimize the following code to better utilize these operations and cut down on garbage collection. Also, any recommendations on how to track down where I am generating garbage would be greatly appreciated.
function M_Avg(numAvg::Int64, Readout::Int64, x_pts::Int64, y_pts::Int64, M_in_vec::Array{Float64,1})
    M_in_mat = reshape(M_in_vec, Readout, x_pts, y_pts)
    M_out = zeros(Readout, Int(x_pts/numAvg), Int(y_pts/numAvg))
    @progress for n1 in 1:Int(x_pts/numAvg)
        for n2 in 1:Int(y_pts/numAvg)
            M_out[:, n1, n2] =
                sum(sum(M_in_mat[:, 1+numAvg*(n1-1):n1*numAvg, 1+numAvg*(n2-1):n2*numAvg], dims=3), dims=2) / numAvg
        end
    end
    return M_out
end
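One possible direction, as a sketch rather than a drop-in replacement (it keeps the original division by numAvg and assumes x_pts and y_pts are divisible by numAvg): use @views so the block slices don’t copy, and accumulate into a view of the output instead of building the nested sum(..., dims=...) temporaries.

```julia
function M_Avg_inplace(numAvg, Readout, x_pts, y_pts, M_in_vec)
    M_in_mat = reshape(M_in_vec, Readout, x_pts, y_pts)
    M_out = zeros(Readout, x_pts ÷ numAvg, y_pts ÷ numAvg)
    @views for n2 in 1:(y_pts ÷ numAvg), n1 in 1:(x_pts ÷ numAvg)
        acc = M_out[:, n1, n2]                     # a view into the output
        for j in 1+numAvg*(n2-1):n2*numAvg, i in 1+numAvg*(n1-1):n1*numAvg
            acc .+= M_in_mat[:, i, j]              # accumulate in place, no temporaries
        end
        acc ./= numAvg                             # same scaling as the original
    end
    return M_out
end
```

The only allocation left is M_out itself; each inner iteration just broadcasts one readout column into the output view.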
Addressing point #3 first: I mistyped it, sorry about that. It was initialized as all zeros and is then filled during the two for loops.
Regarding the other points, I have this function inside of another function that I then run, i.e. set up like the pseudocode below. I believe placing @btime where I call my outermost function won’t be that informative, as it includes several different functions being called. That might also be true of @code_warntype, but I am not that familiar with that macro. To overcome this I’ve been using @profiler to get a better sense of what’s going on.
# Defined in its own .jl files elsewhere
function Function_to_Calculate_things()
    more calculation stuff
    return more calculation stuff
end

function M_Avg()
    return output
end

function Do_stuff(....)
    Calculated_stuff = Function_to_Calculate_things(...)
    Matrix_avg = M_Avg(Calculated_stuff,...)
    return Matrix_avg
end

# Main .jl file that defines everything required for Do_stuff()
include("Do_stuff.jl")
Final_out = Do_stuff(...)
Hmm, I see. I thought you wanted help in M_Avg() only. Can you give an example of the argument values you are running M_Avg() with?
If you’d like, we could start by looking at M_Avg() and see if we can improve this innermost function. Then maybe you can scale up based on suggestions that I or others might have in the responses.