Using views slows down my code

Several people on Discourse have recently suggested that I use views to improve my code’s performance. However, using them actually slows my code down. To test this, I wrote the following MWE:

using FFTW, BenchmarkTools

function with_views(N, M)
    ψ = Array{Complex{Float64}}(undef, M, N)
    ψ[1, :] = rand(N) + im*rand(N)
    F = plan_fft!(@view ψ[1, :]) # Plan
    F̃ = plan_ifft!(@view ψ[1, :]) # Plan

    W = ifftshift(cis.(rand(512)))

    for i = 1:M-1
        @views ψ[i+1, :] = T2(ψ[i,:],W, 1.0/M, F, F̃)
    end #for
    return nothing
end # with_views

function without_views(N, M)
    ψ = Array{Complex{Float64}}(undef, M, N)
    ψ[1, :] = rand(N) + im*rand(N)
    F = plan_fft!(ψ[1, :]) # Plan
    F̃ = plan_ifft!(ψ[1, :]) # Plan

    W = ifftshift(cis.(rand(512)))

    for i = 1:M-1
        ψ[i+1, :] = T2(ψ[i,:],W, 1.0/M, F, F̃)
    end #for
    return nothing
end # without_views

function T2(ψ, W, dx, F, F̃)
    @inbounds for i in 1:length(ψ)
        ψ[i] *= cis(dx/2 * (-1*abs2(ψ[i]))) 
    end

    F*ψ
    @inbounds for i in 1:length(W)
        ψ[i] *= W[i]
    end
    F̃*ψ

    @inbounds for i in 1:length(ψ)
        ψ[i] *= cis(dx/2 * (-1*abs2(ψ[i]))) 
    end

    return ψ
end #T2

Now to benchmark:

@benchmark with_views(512, 10000)
BenchmarkTools.Trial: 
  memory estimate:  157.51 MiB
  allocs estimate:  10072
  --------------
  minimum time:     227.541 ms (1.81% GC)
  median time:      235.816 ms (4.57% GC)
  mean time:        241.442 ms (6.53% GC)
  maximum time:     302.875 ms (25.67% GC)
  --------------
  samples:          21
  evals/sample:     1

And without views:

@benchmark without_views(512, 10000)
BenchmarkTools.Trial: 
  memory estimate:  157.53 MiB
  allocs estimate:  10074
  --------------
  minimum time:     165.236 ms (1.71% GC)
  median time:      172.742 ms (6.32% GC)
  mean time:        176.176 ms (8.19% GC)
  maximum time:     236.072 ms (31.66% GC)
  --------------
  samples:          29
  evals/sample:     1

I am not quite sure what’s going on here. Any advice?

Using views didn’t reduce the number of allocations, so I’m guessing the slice copies weren’t the bottleneck. I’ve heard that FFTW allocates a lot in the background because it calls into a C library, so that could be one reason for this. Have you run your code (or this MWE) with --track-allocation to see where the allocations come from?
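For a quick check that doesn't require restarting Julia with a flag, `@allocated` can compare individual calls. A minimal sketch using only Base Julia; `copy_row` and `view_row` are hypothetical helpers for illustration, not part of the MWE:

```julia
# Hypothetical helpers for illustration; they are not part of the MWE above.
copy_row(A, i) = sum(A[i, :])         # plain slicing copies the row into a new Vector
view_row(A, i) = sum(@view A[i, :])   # a view reads the row in place

A = rand(100, 1000)

# Call each function once first so compilation is not counted.
copy_row(A, 1); view_row(A, 1)

bytes_copy = @allocated copy_row(A, 1)
bytes_view = @allocated view_row(A, 1)

println("slice copy: ", bytes_copy, " bytes, view: ", bytes_view, " bytes")
```

For a line-by-line picture, starting Julia with `--track-allocation=user` writes per-line `.mem` files next to the source files when Julia exits.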

Just an idea: maybe @view works better on data that is contiguous in memory? Julia arrays are column-major, so can you try swapping rows and columns in your code? In the end it should be @view ψ[:, 1] instead of @view ψ[1, :].

Since a C library is called it’s likely that a copy is made unless the array is contiguous. So this would definitely be something to try, and should improve performance anyway.
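To see the layout difference concretely, `strides` reports how far apart consecutive elements of a view sit in memory. A small sketch in Base Julia (the array size is arbitrary and chosen only for illustration):

```julia
A = zeros(ComplexF64, 4, 3)   # 4 rows, 3 columns, stored column by column

row = @view A[1, :]   # consecutive elements are one full column (4 entries) apart
col = @view A[:, 1]   # consecutive elements are adjacent in memory

println(strides(row))   # (4,)
println(strides(col))   # (1,)
```

A stride of 1 means the view is a contiguous block, which is the layout the column version of the MWE would give.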

I don’t think the allocations are coming from FFTW. Try

@views ψ[i+1, :] .= T2(ψ[i,:],W, 1.0/M, F, F̃)

(note the .=)
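Why the `.=` matters, as far as I can tell: with plain `=`, the statement becomes a `setindex!` call on the whole of `ψ`, and because the right-hand side is a view into that same array, Julia first makes a defensive copy of it. With `.=`, the result is broadcast into a view of the destination slice, which does not overlap the source slice, so no temporary is needed. A minimal sketch of the effect in Base Julia, where `step!` is a hypothetical stand-in for `T2`:

```julia
# Hypothetical stand-in for T2: mutates its argument in place and returns it.
function step!(v)
    for i in eachindex(v)
        v[i] *= 2
    end
    return v
end

function fill_plain!(A)
    for j in 1:size(A, 2)-1
        @views A[:, j+1] = step!(A[:, j])    # setindex! on A with a view of A as the source
    end
    return A
end

function fill_broadcast!(A)
    for j in 1:size(A, 2)-1
        @views A[:, j+1] .= step!(A[:, j])   # in-place broadcast into a view of A
    end
    return A
end

A = ones(1024, 50)
B = ones(1024, 50)

# Warm up so compilation is not counted, then reset the data.
fill_plain!(A); fill_broadcast!(B)
fill!(A, 1.0); fill!(B, 1.0)

bytes_plain = @allocated fill_plain!(A)
bytes_broadcast = @allocated fill_broadcast!(B)

println("=  : ", bytes_plain, " bytes")
println(".= : ", bytes_broadcast, " bytes")
```

Both versions compute the same values; only the amount of temporary memory differs.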

This was the solution to the problem at hand, thank you.

This has also led to some small performance gains. I knew Julia is column-major, but I must have gotten confused with the indexing. Thanks for catching that.

Combining the two recommendations, this is a new MWE:

using FFTW, BenchmarkTools

function attempt2(N, M)
    ψ = Array{Complex{Float64}}(undef, N, M)
    ψ[:, 1] = rand(N) + im*rand(N)
    F = plan_fft!(@view ψ[:, 1]) # Plan
    F̃ = plan_ifft!(@view ψ[:, 1]) # Plan

    W = ifftshift(cis.(rand(512)))

    for i = 1:M-1
        @views ψ[:, i+1] .= T2(ψ[:,i],W, 1.0/M, F, F̃)
    end #for
    return nothing
end # attempt2

function attempt1(N, M)
    ψ = Array{Complex{Float64}}(undef, M, N)
    ψ[1, :] = rand(N) + im*rand(N)
    F = plan_fft!(ψ[1, :]) # Plan
    F̃ = plan_ifft!(ψ[1, :]) # Plan

    W = ifftshift(cis.(rand(512)))

    for i = 1:M-1
        ψ[i+1, :] = T2(ψ[i,:],W, 1.0/M, F, F̃)
    end #for
    return nothing
end # attempt1

function T2(ψ, W, dx, F, F̃)
    @inbounds for i in 1:length(ψ)
        ψ[i] *= cis(dx/2 * (-1*abs2(ψ[i]))) 
    end

    F*ψ
    @inbounds for i in 1:length(W)
        ψ[i] *= W[i]
    end
    F̃*ψ

    @inbounds for i in 1:length(ψ)
        ψ[i] *= cis(dx/2 * (-1*abs2(ψ[i]))) 
    end

    return ψ
end #T2

Benchmarking this, we get:

julia> @benchmark attempt1(512, 10000)
BenchmarkTools.Trial: 
  memory estimate:  157.53 MiB
  allocs estimate:  10074
  --------------
  minimum time:     167.911 ms (4.22% GC)
  median time:      175.678 ms (8.73% GC)
  mean time:        176.204 ms (8.71% GC)
  maximum time:     187.611 ms (10.80% GC)
  --------------
  samples:          29
  evals/sample:     1

julia> @benchmark attempt2(512, 10000)
BenchmarkTools.Trial: 
  memory estimate:  78.17 MiB
  allocs estimate:  73
  --------------
  minimum time:     138.274 ms (1.23% GC)
  median time:      143.694 ms (4.12% GC)
  mean time:        144.101 ms (4.02% GC)
  maximum time:     154.579 ms (3.71% GC)
  --------------
  samples:          35
  evals/sample:     1

A nice performance gain of roughly 20%!
