I suspect that M_Avg() is slow and an issue as I know there should be a better/cleaner way to implement what I am trying to do. I do agree that addressing the inner most loop would be the easiest. In words, I’m trying to average adjacent points together in a matrix. In 2d this would be like going from 100x50 to 100x25, assuming I’m averaging 2 points together, hence numAvg.

# Example Input for M_Avg
function M_Avg(numAvg = 2, Readout = 2742, x_pts = 80, y_pts = 80, M_in_vec = Array{Float64,1} )
# Expected Output
size(M_out) = 2742, 40, 40

On a side note, since I’ve already constructed a somewhat complex program that calls all of my functions from an included .jl file. What is considered the best practice when trying to debug to get the most performance? Did I already construct my program in a way that is frowned upon?

It’s hard to say without really understanding what each function in your code is supposed to do. However, in your function definition of M_Avg() your last argument is an Array{Float64, 1} but in your latest reply where you are giving an example for function arguments to use you are passing an integer as your last argument, so you should edit that unless I am mistaken.

To make things easier, pick smaller arguments for your function, run it and post with output below.

Thank you for the suggestion, I made the change. I was feeding in a Array{Float64,1} but wanted to give a sense of the size.

Here is a example of my function with corresponding @btimes. Thinking about my problem more, I might be able to utilize some of Julia’s imresize functions to try to get the results I would like. In a sense I am trying to downsample an 3D image (TxMxN with T >> (M,N)).

Thank you again for the help!

using LinearAlgebra
using BenchmarkTools
## Functions
function M_Avg(numAvg::Int64, Readout::Int64, x_pts::Int64, y_pts::Int64, M_in_vec::Array{Float64,1})
M_in_mat = reshape(M_in_vec, Readout, x_pts,y_pts)
M_out = zeros(Readout,Int(x_pts/numAvg),Int(y_pts/numAvg))
for n1 in 1:Int(x_pts/numAvg)
for n2 in 1:Int(y_pts/numAvg)
M_out[:, n1, n2]=
sum(sum(M_in_mat[:,1+numAvg*(n1-1):n1*numAvg,1+numAvg*(n2-1):n2*numAvg],dims = 3),dims = 2)/numAvg
end
end
return M_out
end
## Example 1
long_dim = 10;
smaller_dim = 10;
A = rand(long_dim,smaller_dim,smaller_dim)
A_smaller = @btime M_Avg(2, long_dim, smaller_dim, smaller_dim, A[:])
@btime:
57.610 μs (554 allocations: 47.67 KiB)
## Example 2
long_dim = 3000;
smaller_dim = 80;
A = rand(long_dim,smaller_dim,smaller_dim)
A_smaller = @btime M_Avg(2, long_dim, smaller_dim, smaller_dim, A[:])
@btime:
267.356 ms (41606 allocations: 477.44 MiB)

You may find the discussion above helpful. Especially, the comment made by @baggepinnen

Re @codewarntype, it’s just a helpful quick-way to check for all type-instabilities like so

julia> function f(x, y)
return x + y
end
f (generic function with 1 method)
julia> @code_warntype f(1, 2)
Variables
#self#::Core.Compiler.Const(f, false)
x::Int64
y::Int64
Body::Int64
1 ─ %1 = (x + y)::Int64
└── return %1
julia> @code_warntype f(1.0, 2.0)
Variables
#self#::Core.Compiler.Const(f, false)
x::Float64
y::Float64
Body::Float64
1 ─ %1 = (x + y)::Float64
└── return %1
julia> x = 1
1
julia> y = 2
2
julia> function g()
return x + y
end
g (generic function with 1 method)
julia> @code_warntype g()
Variables
#self#::Core.Compiler.Const(g, false)
Body::Any
1 ─ %1 = (Main.x + Main.y)::Any
└── return %1

In the above example, f(x, y) is type-stable (all types are known at compile-time) but g() is not as the return type is not known at compile time. The Any in the @code_warntype output for g() will likely show up in your REPL in red. The reason is that in f(x,y) I pass x, y as arguments but they are global variables for the function g().

Notice that for all functions a specialized compiled code is generated depending on your argument type. @code_warntype simply allows you to see that. Please keep in mind that all type-instabilities are not important as mentioned in the link below. There are some functions in Base like Base.getindex which are type-unstable by design. Similarly, the iterator interface for a for loop will also be type-unstable and so on. So don’t worry about all type-instabilities!

Just one hint: you are using sin and cos (twice) of \theta in one expression. Julia has sincos to calculate those utilizing both even and odd exponents. I would try something like

I don’t have experience with imresize but I changed your function into the following:

function M_Avg_new(numAvg::N, Readout::N, x_pts::N, y_pts::N, M_in_vec::Vector{T}) where {N<:Integer,T}
M_in_mat = reshape(M_in_vec, Readout, x_pts, y_pts)
M_out = zeros(T, Readout, x_pts÷numAvg, y_pts÷numAvg)
idx(n) = 1+numAvg*(n-1):n*numAvg
@inbounds for n1 in 1:x_pts÷numAvg
for n2 in 1:y_pts÷numAvg
for r in 1:Readout
s = zero(T)
for x in idx(n1), y in idx(n2)
s += M_in_mat[r,x,y]
end
M_out[r, n1, n2] = s/numAvg
end
end
end
return M_out
end

No need to specify Int64 or Float64 in this case. It will not make your code run faster and will only make your code less “generic”. If you like to annotate types of input arguments for clarity, you can do what I did above to still have “generic” code.

You can use div or ÷ (\div tab in Julia REPL) for integer division.

Inside of your loop you create a 3D subarray of the input array M_in_mat. This will lead to memory allocations and hurts performance. In some situations you can simply fix this with @views but I think in your case that won’t help because the 3D subarray is not a continuous block in memory. Therefore I decided to just write out things explicitly with loops instead.

Just like indexing into an array allocates memory for the subarrays, calling sum(A, dims=3) or sum(A, dims=2) will allocate memory for the output array. Note that sum(A) will not allocate because the output is just a number (assuming the elements of A are themselves just numbers). So again, I just decided to write out a for loop explicitly.

The @inbounds around the for loops prevents the runtime checking of whether the array indices are within bounds. This can save some time especially when there’s a lot of looping and not a lot of computation.

Perhaps it’s possible to use things like map, dot fusion and whatnot to reduce the number of loops without introducing memory allocations. But I thought that - since you’re coming from Matlab - it would maybe be nice to see that just writing out loops is not actually bad for performance.

When $btimeing your code, put a $ in front or the input arguments to “interpolate” them. To illustrate why: when you do @btime foo(rand(1000,1000)), you are timing the creation of the random matrix as well. With @btime foo($(rand(1000,1000))) you are timing what foo does.