Optimizing Calculation in Julia compared to C (New to Julia)

Hello, just starting to make the switch to Julia from Matlab/C. I’ve been reading up on the speed advantages of Julia and recognize the power that can be wielded in the right hands. I was hoping I could receive some pointers on how to optimize my code. I’m comparing the code below to a vectorized C style that performs the same calculation but has been converted to julia’s syntax. The code below is both slower and requires significantly more memory allocation than the C version. I use the same variables declared outside of each function with most of the being Vector{Floats64} with b1 being a Vector{Complex{Float64}}.

Any recommendations on how I could speed this up and make it significantly less memory intensive would be greatly appreciated!

using LinearAlgebra
x_pts = 1000
num_pts = 5000
Xpos = range(-FOV/2,FOV/2,length = x_pts)

M = zeros(3,num_pts)
M[:,1] = [0;0;1]

function Sim1d(Xpos, num_pts, resol, b1, gradx, offset, M)

    Mout = zeros(3, length(Xpos))
    B1 = [0.0; 0.0; 0.0]

    for n = 1:length(Xpos)
        for tt = 1:num_pts-1
            Δω = gam * Xpos[n] * grad[tt] + offset[n]
            B1 = [real(b1[tt]); imag(b1[tt]); Δω]
            θ = norm(B1) * resol

            if abs(θ) > 1e-9
                B1 = B1 / norm(B1)
                M[:, tt+1]  = M[:, tt] * cos(θ) + cross(B1, M[:, tt]) * sin(θ) +
                       B1 * dot(B1, M[:, tt]) * (1 - cos(θ))
            else
                M[:, tt+1] = M[:, tt]
            end
        end
        Mout[:, n] = M[:, end]
    end
    return Mout
end

The key 2 problems here are that M is a non const global, and that you need to use @views to prevent right hand side, non-scalar indexing from allocating.

3 Likes

To build off what @Oscar_Smith said, I would personally recommend reading the content below and re-editing parts of your code

This really helped me when I started and both the suggestions given above are included in it too with a more detailed explanation. Feel free to post an edited version of the above code at some point based on what you tried after reading this link!

3 Likes

Thank you for the link! I’ll start working my way through it and try to make the appropriate change into my code. It seems @views is an important thing to learn and master.

1 Like

Also have a look at Julia Performance Tips: https://docs.julialang.org/en/v1/manual/performance-tips/

3 Likes

For this example, putting a const in front of your first use of M, and views before the outer for loop should bring it to C speeds. In general, it won’t always be quite that easy

You’re welcome! And, when you time make sure to do something like the following if you are using global variables inside your function argument.

using BenchMarkTools

julia> A = rand(3,3);

julia> @btime inv($A); # we interpolate the global variable A with $A
  1.191 μs (10 allocations: 2.31 KiB)
1 Like

Quick question regarding global variables and const. Coming from matlab, my work flow would be to do some minor calculations to create arrays (see example code below) then feed them into functions, like Sim1d up above, which would perform the bulk of the calculations. Would this work flow be frowned upon when using Julia? If I am reading all of the recommended documentation correctly, it seems that this would be bad as it could lead to significantly more computations than needed and potentially large data usage assuming I am not using @view properly.

Also here is a link to my full code showing how I structured it plus the two different implementations of my algorithm that was discussed above (https://github.com/loki3118/Julia_Code.git)

Example Code:

resol = 1e-6
tp = 0.005
num_pts = Int(round(tp / resol))
ts = range(-1, 1, length = num_pts)
#--------------------------------------------------------------------------
bw1 = 4.5 / tp
FOV = 10

scale = (2 * pi * bw1) / (gam * FOV)
grad = scale * ones(1, num_pts)

Yes, see below. From the link I shared, one notes that “REPL/Global scope does not allow for type-specificity”. In sum_bad() we define x, y in the global scope without a const in front so the compiler can’t specialize on the types. In the next two functions, I either pass x, y as arguments or declare them as a const. In both cases, the compiler can generate specialized code (see the output of @code_warntype) which results in a dramatic performance improvement.

Personally, I prefer the style of sum_good(x, y) because it’s easy for readability and you know what arguments the function operates on. In sum_also_good(), while there is no performance hit rel. to sum_good(), you would have to figure out from the REPL that you passed x,y with a const in front. This becomes cumbersome for large pieces of code.

julia> x = 1
1

julia> y = 2
2

julia> function sum_bad()
        x + y
       end
sum_bad (generic function with 1 method)

julia> @code_warntype sum_bad()
Variables
  #self#::Core.Compiler.Const(sum_bad, false)

Body::Any
1 ─ %1 = (Main.x + Main.y)::Any
└──      return %1

######################
# Two possible fixes 
######################

# 1)  passing x, y as arguments
julia> function sum_good(x, y)
        x + y
       end
sum_good (generic function with 1 method)

julia> @code_warntype sum_good(x, y)
Variables
  #self#::Core.Compiler.Const(sum_good, false)
  x::Int64
  y::Int64

Body::Int64
1 ─ %1 = (x + y)::Int64
└──      return %1

# 2) declaring globals with 'const' in front
julia> const a = 1
1

julia> const b = 2
2

julia> function sum_also_good()
        a + b
       end
sum_also_good (generic function with 1 method)

julia> @code_warntype sum_also_good()
Variables
  #self#::Core.Compiler.Const(sum_also_good, false)

Body::Int64
1 ─ %1 = (Main.a + Main.b)::Core.Compiler.Const(3, false)
└──      return %1

#benchmarks 
julia> @btime sum_bad()
  20.744 ns (0 allocations: 0 bytes)
3

julia> @btime sum_good($x, $y)
  0.001 ns (0 allocations: 0 bytes)
3

julia> @btime sum_also_good()
  0.001 ns (0 allocations: 0 bytes)
3
1 Like

You generally want to use a barrier function if the dynamic dispatch will have to occur. If you really want to use a global variable, you should enclose it in a barrier function to limit the number of times dispatch has to occur. In the following example, you can see that the normal usage of the global variable results in many allocations. By using a barrier function, you limit to the dynamic dispatch cost to a single function call.

using BenchmarkTools

scale = 2.0
a=rand(1000);

foo(a) = sum(x->x*scale,a)

function foo_caller(a)
    foo_inner(a,scale)
end

function foo_inner(a,scale)
    sum(x->x*scale,a)
end


@btime foo($a)
  37.999 μs (2999 allocations: 46.86 KiB)
@btime foo_caller($a)  # The barrier limits the allocations
  228.454 ns (2 allocations: 32 bytes)
@btime foo_inner($a,$scale)  #No allocations were necessary past the barrier
  83.090 ns (0 allocations: 0 bytes)
2 Likes

I would store M as a vector of static arrays since one dimension (3) is known and fixed. This will probably speed things up quite a lot.

See my previous answer here for a similar situation

2 Likes

That is not a problem here, M is an input argument to the function so it should be just fine.

3 Likes

I have converted a fair amount of Matlab/C code to Julia. If you have existing and tested computational C code, my advice is to keep the code as is and only do the necessary syntactical and indexing changes. Julia will handle loops perfectly fine and converting to a vectorized style is only worth it if it adds clarity without compromising performance.

5 Likes

it also gives access to threading and gpu

1 Like

Threading can certainly be done with loops (in fact, I thought threaded loops were the normal way). I’m not veryfamiliar with gpu processing, but there are things like https://github.com/vchuravy/GPUifyLoops.jl

Vector/matrix operation can benefit from BLAS, though.

2 Likes

Yes, but in the context of porting code from Matlab/C, chances are that the parts that were implemented in C were so exactly because they would not fit into BLAS calls.

Thank you everyone for the feed back (and Happy New Year). It has been a big help getting though I am still running into issues where I am not utilizing the features of Julia. Per all of the recommendations, I tried to specify the input type to avoid any Type-Instabilities but have been struggling getting a sense of how to properly utilize in-place operations (like foo! and .*) and @views. I am trying to optimize the following code to better utilize these operations to cut down on garbage collecting. Also any recommendations on how to track down where I am generating garbage would be greatly appreciated.

function M_Avg(numAvg::Int64, Readout::Int64, x_pts::Int64, y_pts::Int64, M_in_vec::Array{Float64,1})
    M_in_mat = reshape(M_in_vec, Readout, x_pts,y_pts)
    M_out = zeros(Readout,Int(x_pts/numAvg),Int(y_pts/numAvg))

    @progress for n1 in 1:Int(x_pts/numAvg)
        for n2 in 1:Int(y_pts/numAvg)
            M_out[:, n1, n2]=
                sum(sum(M_in_mat[:,1+numAvg*(n1-1):n1*numAvg,1+numAvg*(n2-1):n2*numAvg],dims = 3),dims = 2)/numAvg
        end
    end
    return M_out
end
  1. Can you time your code using @btime and report the output? This will help us see where there might be potential for improvements.

  2. Use @codewarntype to check for type-instabilities in your code and let us know the ouput in the reply. This will also be helpful as a first step.

  3. Where is M_out_RO initialized?

Addressing point #3 first. I miss typed it, sorry about that. It was initialized as all zeros that is then filled during the two for loops.

Regarding the other points, I have this function inside of another function that I then run, i.e. setup like the sudo code below. I believe placing @btime when I call my outer most function wont be that informative as it includes several different functions being called. That might be also true of @codewarntype but I am not that familiar with that package. To over come this I’ve been using @profiler to get a better sense of whats going on.

# Defined in its own .jl files else where
Function Function_to_Calculate_things()
     more calculation stuff
return more calculation stuff
end

Function M_Avg()
return output
end

Function Do_stuff(....)
     calculates_stuff = Function_to_Calculate_things(...)
     Matrix_avg = M_Avg(Calculated_stuff,...)
return Matrix_avg 
end

# Main .jl file that defined everything required for Do_stuff()
include("Do_stuff.jl")
Final_out = Do_stuff(...)

Hmm, I see. I thought you wanted help in M_Avg() only. Can you give an example for the values of arguments that you are running M_Avg() with?

If you’d like, we could start looking at things starting at M_Avg() first and see if we can improve this inner-most function. Then, maybe you can scale up based on suggestions that I or others might have in the responses.