I finally got the example running. Before I start optimizing, I mainly wanted to ask whether the code and its style are okay. I expected it to be faster than this, even on a single core, and it also allocates a lot of memory!
```julia
height, width = 187, 746
org_sized = rand(Float32, (2001, 2001)) * 60
shadow_time_hrs = zeros(Float32, size(org_sized))
height_mat = rand(Float32, (height, width)) * 100
# 2 * atand(...) lies in [0, 360], so angle_mat holds segment indices 1..361
angle_mat = round.(Int32, 2 .* atand.(0:height-1, (0:width-1)' .- (width/2 - 1))) .+ 1
# Placeholder (assumption): `weights` is not defined in this snippet; one weight per angular segment
weights = rand(Float32, 361)
enlarged_size = zeros(eltype(org_sized), size(org_sized, 1) + height, size(org_sized, 2) + width)
enlarged_size[1:size(org_sized, 1), range(Int(width/2), length=size(org_sized, 2))] = org_sized

function computeSumHours(org_sized, enlarged_size, angle_mat, height_mat, weights, y, x)
    height, width = size(height_mat)
    short_elevations = enlarged_size[y:y+height, x:x+width]  # this slice allocates a copy
    shadowed_segments = zeros(eltype(weights), 361)
    for x2 in 1:width          # column-major order: the inner loop runs down a column
        for y2 in 1:height
            overshadowed = (short_elevations[y2, x2] - org_sized[y, x]) > height_mat[y2, x2]
            if overshadowed
                angle = angle_mat[y2, x2]
                # the original `!= 0.0` check never fired because the buffer starts at zero;
                # assigning unconditionally marks the segment as shadowed
                shadowed_segments[angle] = weights[angle]
            end
        end
    end
    return sum(shadowed_segments)
end

function computeAllLines(org_sized, enlarged_size, angle_mat, height_mat, shadow_time_hrs, weights)
    for x in 1:size(org_sized, 2) - 1
        for y in 1:size(org_sized, 1) - 1
            # was shadow_time_hrs[x, y]: rows are indexed by y, columns by x
            shadow_time_hrs[y, x] = computeSumHours(org_sized, enlarged_size, angle_mat, height_mat, weights, y, x)
        end
    end
    return shadow_time_hrs
end

@time result = computeAllLines(org_sized, enlarged_size, angle_mat, height_mat, shadow_time_hrs, weights)
```
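The 361-entry buffer in `computeSumHours` relies on how `angle_mat` is built: for `y >= 0`, `atand(y, x)` lies in [0, 180] degrees, so doubling, rounding and adding 1 gives integer indices from 1 to 361, one per half-degree. A quick check of that range, reusing the same `height` and `width`:

```julia
height, width = 187, 746
# atand(y, x) returns degrees in [0, 180] whenever y >= 0, so 2 * atand lies in [0, 360]
angle_mat = round.(Int32, 2 .* atand.(0:height-1, (0:width-1)' .- (width/2 - 1))) .+ 1
# smallest and largest segment index: expected to be 1 and 361
println(extrema(angle_mat))
```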
The timing:

```
1827.429890 seconds (12.02 M allocations: 2.050 TiB, 2.64% gc time, 0.00% compilation time)
```
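The 2.050 TiB figure is almost entirely the window copy `enlarged_size[y:y+height, x:x+width]`, which creates a new (height+1)×(width+1) matrix on every call, plus the `zeros(..., 361)` buffer. A back-of-the-envelope estimate, assuming `weights` is a `Float32` vector (it is not defined in the snippet) and ignoring per-array header overhead:

```julia
height, width = 187, 746
window_elems = (height + 1) * (width + 1)        # y:y+height and x:x+width are inclusive ranges
bytes_per_copy = window_elems * sizeof(Float32)  # 561_744 bytes, about 0.56 MB per call
buffer_bytes = 361 * sizeof(Float32)             # zeros(eltype(weights), 361), assuming Float32 weights
n_calls = 2000 * 2000                            # both loops run over 1:2000 (size 2001 minus 1)
total_tib = n_calls * (bytes_per_copy + buffer_bytes) / 2^40
println(round(total_tib, digits = 2))            # prints 2.05
```

That matches the reported total, so reading `enlarged_size` in place and reusing the segment buffer should remove nearly all of these allocations.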
After fixing the code, what next steps would you take to optimize it?
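One possible direction, sketched under assumptions rather than as a tuned solution: index `enlarged_size` directly instead of copying the window, reuse one segment buffer instead of calling `zeros` per pixel, and split the outer column loop across threads with `Threads.@threads`. The names `compute_sum_hours!` and `compute_all!` are hypothetical; the sketch keeps the same loop bounds and shadowing rule as `computeAllLines`, and checks bounds once up front so the inner loop can use `@inbounds`.

```julia
# Hypothetical allocation-free kernel: same result as computeSumHours, no window copy
function compute_sum_hours!(segments, org_sized, enlarged_size, angle_mat, height_mat, weights, y, x)
    height, width = size(height_mat)
    fill!(segments, zero(eltype(segments)))     # reuse the buffer instead of calling zeros each time
    h0 = org_sized[y, x]
    @inbounds for x2 in 1:width, y2 in 1:height  # y2 innermost: column-major access
        # read the window in place; enlarged_size is never copied
        if enlarged_size[y + y2 - 1, x + x2 - 1] - h0 > height_mat[y2, x2]
            a = angle_mat[y2, x2]
            segments[a] = weights[a]
        end
    end
    return sum(segments)
end

# Hypothetical threaded driver with the same loop bounds as computeAllLines
function compute_all!(shadow_time_hrs, org_sized, enlarged_size, angle_mat, height_mat, weights)
    height, width = size(height_mat)
    ny, nx = size(org_sized, 1) - 1, size(org_sized, 2) - 1
    # validate once so the kernel's @inbounds is safe
    checkbounds(enlarged_size, ny + height - 1, nx + width - 1)
    checkbounds(weights, minimum(angle_mat):maximum(angle_mat))
    size(angle_mat) == size(height_mat) ||
        throw(DimensionMismatch("angle_mat and height_mat must have the same size"))
    Threads.@threads for x in 1:nx
        # one buffer per column, so threads never share it; each x writes its own column
        segments = zeros(eltype(weights), length(weights))
        for y in 1:ny
            shadow_time_hrs[y, x] = compute_sum_hours!(segments, org_sized, enlarged_size,
                                                       angle_mat, height_mat, weights, y, x)
        end
    end
    return shadow_time_hrs
end
```

Start Julia with `julia --threads=auto` to use more than one thread; with a single thread the loop simply runs serially. Timing with BenchmarkTools' `@btime` instead of `@time` also keeps compilation out of the measurement.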