Memory reuse problem in for loop

I was implementing an algorithm to detect edges in RGB images when I noticed very high memory usage:

function coloredge(img::Matrix{RGB{T}})::Matrix{T} where T <: AbstractFloat
    Sy, Sx = size(img)
    
    z = zeros(T, Sy, Sx)
    
    for x=2:Sx-1
        for y=2:Sy-1
            ∂img∂x = img[y, x+1] - img[y, x-1]
            ∂img∂y = img[y+1, x] - img[y-1, x]
            
            
            u = [red(∂img∂x), green(∂img∂x), blue(∂img∂x)]
            v = [red(∂img∂y), green(∂img∂y), blue(∂img∂y)]
            
            g_xx = u'*u
            g_xy = u'*v
            g_yy = v'*v
            
            θ = 1/2 * atan(2*g_xy/(g_xx - g_yy))
            
            if isnan(θ)
                θ = 1/2 * π/2
            end
            
            F_θ = sqrt(max(0, 1/2 * ((g_xx + g_yy) + (g_xx - g_yy)*cos(2*θ) + 2*g_xy*sin(2*θ))))
            
            z[y, x] = F_θ
        end
    end

    return z
end     

I tested it on testimage("mandrill") using @btime and got

  51.996 ms (520202 allocations: 57.56 MiB)

Then I changed the code to

function coloredge_noalloc(img::Matrix{RGB{T}})::Matrix{T} where T <: AbstractFloat
    Sy, Sx = size(img)
    
    z = zeros(T, Sy, Sx)
    
    u = Vector{T}(undef, 3)
    v = Vector{T}(undef, 3)
    
    for x=2:Sx-1
        for y=2:Sy-1
            ∂img∂x =  img[y, x+1] -  img[y, x-1]
            ∂img∂y =  img[y+1, x] -  img[y-1, x]
            
            u[1] = red(∂img∂x)
            u[2] = green(∂img∂x)
            u[3] = blue(∂img∂x)
            
            v[1] = red(∂img∂y)
            v[2] = green(∂img∂y)
            v[3] = blue(∂img∂y)
            
            g_xx = u'*u
            g_xy = u'*v
            g_yy = v'*v
            
            θ = 1/2 * atan(2*g_xy/(g_xx - g_yy))
            
            if isnan(θ)
                θ = 1/2 * π/2
            end
            
            F_θ = sqrt(max(0, 1/2 * ((g_xx + g_yy) + (g_xx - g_yy)*cos(2*θ) + 2*g_xy*sin(2*θ))))
            
            
            z[y, x] = F_θ
        end
    end

    return z
end           

and now it performs as expected:

  32.185 ms (4 allocations: 2.00 MiB)

Why do I have to do it component-wise? I tried u .= [red(∂img∂x), green(∂img∂x), blue(∂img∂x)], but that did not work, and neither did u = ....

(Required imports: Images, TestImages, BenchmarkTools.)

The thing is that those lines create a new mutable vector (on the right-hand side) at every iteration.
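A minimal Base-only sketch of the difference (the function names here are made up for illustration), comparing both patterns with @allocated:

```julia
# A Vector literal on the right-hand side builds a new heap array on every call.
make_fresh() = [1.0, 2.0, 3.0]

# Broadcasting into a preallocated buffer writes in place instead.
const buf = Vector{Float64}(undef, 3)
fill_buf!(u) = (u .= (1.0, 2.0, 3.0); nothing)

make_fresh(); fill_buf!(buf)    # warm up, so compilation is not measured

@allocated(make_fresh())        # > 0: a new heap array each call
@allocated(fill_buf!(buf))      # should be 0: writes in place
```

Multiply the per-call allocation by one call per pixel and you get the tens of MiB in the first benchmark.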

That can be solved using StaticArrays and doing

u = @SVector [red(...), green(...), blue(...)]

(Or sometimes just a tuple may suffice, depending on how you use it afterwards.)
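With plain tuples the dot products just need a different spelling, since adjoint is not defined for tuples (a minimal sketch, with made-up values):

```julia
u = (1.0, 2.0, 3.0)
v = (4.0, 5.0, 6.0)

# Broadcasting over tuples yields another tuple, so nothing is heap-allocated:
g_xy = sum(u .* v)   # elementwise multiply, then sum: 32.0
```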


Thank you very much for the swift response! Using a Tuple worked for me (and requires no additional packages).

P.S. I forgot to mention that I am running Julia 1.6 (in a Jupyter notebook).


I’m intrigued. Did you manage the adjoint and the product by hand there?

julia> u = (1,2,3)
(1, 2, 3)

julia> v = (1,2,3)
(1, 2, 3)

julia> u' * v
ERROR: MethodError: no method matching adjoint(::Tuple{Int64, Int64, Int64})

This is the kind of thing that works with static arrays:

julia> using StaticArrays

julia> u = @SVector [1,2,3];

julia> v = @SVector [1,2,3];

julia> u' * v
14


I preallocated u outside the for-loop

u = Vector{Float64}(undef, 3)

and inside the for loop I used .=, e.g.

u .= (1, 2, 3)
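To spell out the whole pattern (a small sketch): once u and v are ordinary Vectors filled in place, the adjoint-product syntax from the original code works again, unlike with bare tuples:

```julia
u = Vector{Float64}(undef, 3)
v = Vector{Float64}(undef, 3)

u .= (1.0, 2.0, 3.0)   # fill the preallocated buffers in place
v .= (1.0, 2.0, 3.0)

u' * v                 # 14.0; adjoint is defined for Vector, unlike for Tuple
```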

Ah, ok. Note that the static-arrays version will probably be faster (not sure by how much, or whether that will be relevant for your case).


Can confirm! This was about twice as fast as the tuple version:

using StaticArrays

function coloredge_s(img::Matrix{RGB{T}})::Matrix{T} where T <: AbstractFloat
    Sy, Sx = size(img)
    
    z = zeros(T, Sy, Sx)
    
    u = SVector{3, T}(0,0,0)
    v = SVector{3, T}(0,0,0)

    for x=2:Sx-1
        for y=2:Sy-1
            ∂img∂x = img[y, x+1] - img[y, x-1]
            ∂img∂y = img[y+1, x] - img[y-1, x]
            
            
            u =  @SVector [red(∂img∂x), green(∂img∂x), blue(∂img∂x)]  
            v =  @SVector [red(∂img∂y), green(∂img∂y), blue(∂img∂y)]
            
            g_xx = u'*u
            g_xy = u'*v
            g_yy = v'*v
            
            θ = 1/2 * atan(2*g_xy/(g_xx - g_yy))
            
            if isnan(θ)
                θ = 1/2 * π/2
            end
            
            F_θ = sqrt(max(0, 1/2 * ((g_xx + g_yy) + (g_xx - g_yy)*cos(2*θ) + 2*g_xy*sin(2*θ))))
            
            z[y, x] = F_θ
        end
    end

    return z
end           

In this case you don’t need to define the vectors before the loop. Static vectors are like plain numbers: you get a new one every time you bind one to a variable.


Out of curiosity: what is the reason, then, that these arrays avoid the huge memory allocations of regular Arrays? Does the compiler figure out how to optimize them, as it does with the scalar variables I compute in every iteration?

Yes, exactly. The compiler does not figure out on its own that the regular array is created inside the loop and unused afterwards, and therefore that it does not need to be an actually mutable array; by using static arrays you tell it that. When the array lives only in the scope of the loop, it does not need to be allocated in the (slow) heap memory at all: it can live on the stack or in processor registers, or may not be created at all if the compiler can simplify the sequence of operations.
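The same effect can be seen in Base alone, with a hand-rolled immutable type (a sketch of essentially what an SVector{3} is; the names here are made up): a fresh immutable value per iteration never touches the heap.

```julia
struct Vec3            # immutable and fixed-size: can live on the stack / in registers
    x::Float64
    y::Float64
    z::Float64
end

dot3(a::Vec3, b::Vec3) = a.x * b.x + a.y * b.y + a.z * b.z

function sum_norms(n)
    acc = 0.0
    for i in 1:n
        v = Vec3(float(i), 2.0, 3.0)   # a "new" Vec3 every iteration, yet no allocation
        acc += dot3(v, v)
    end
    acc
end

sum_norms(10)                # warm up (compilation itself allocates)
@allocated sum_norms(1000)   # should be 0 bytes
```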

In principle the compiler probably could figure that out and optimize the array away itself; why it does not in Julia is beyond my understanding.

More generally, I have these notes, written when I was learning this stuff.


My non-expert understanding is that the compiler just doesn’t know enough about arrays to perform this level of optimization on them. Certainly it’s much harder to handle escape analysis for a mutable reference type like Array than for an immutable value type (e.g. Int). That said, there has been more talk of working on escape analysis recently, so perhaps this will all be out of date soon 🙂
