Using views slows down my code

ash · October 23, 2020, 5:47am

Several people on Discourse have recently suggested I use views to improve my code’s performance. However, for some reason, using them actually slows my code down. To test this, I wrote the following MWE:

using FFTW, BenchmarkTools

function with_views(N, M)
    ψ = Array{Complex{Float64}}(undef, M, N)
    ψ[1, :] = rand(N) + im*rand(N)
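    F = plan_fft!(@view ψ[1, :])  # in-place forward FFT plan for a (strided) row view
    F̃ = plan_ifft!(@view ψ[1, :]) # in-place inverse FFT plan

    W = ifftshift(cis.(rand(N)))   # random unit-modulus phases, one per Fourier mode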

    for i = 1:M-1
        @views ψ[i+1, :] = T2(ψ[i,:],W, 1.0/M, F, F̃)
    end #for
    return nothing
end # with_views

function without_views(N, M)
    ψ = Array{Complex{Float64}}(undef, M, N)
    ψ[1, :] = rand(N) + im*rand(N)
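    F = plan_fft!(ψ[1, :])  # in-place forward FFT plan, made on a copy of row 1
    F̃ = plan_ifft!(ψ[1, :]) # in-place inverse FFT plan

    W = ifftshift(cis.(rand(N)))   # random unit-modulus phases, one per Fourier mode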

    for i = 1:M-1
        ψ[i+1, :] = T2(ψ[i,:],W, 1.0/M, F, F̃)
    end #for
    return nothing
end # without_views

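function T2(ψ, W, dx, F, F̃)
    # Nonlinear half-step: multiply by the phase exp(-i dx/2 |ψ|²)
    @inbounds for i in eachindex(ψ)
        ψ[i] *= cis(-dx/2 * abs2(ψ[i]))
    end

    F*ψ                # forward FFT, in place (F was made with plan_fft!)
    @inbounds for i in eachindex(ψ, W)   # errors if ψ and W differ in length
        ψ[i] *= W[i]   # linear step in Fourier space
    end
    F̃*ψ                # inverse FFT, in place (plan_ifft! includes the 1/N normalization)

    # Second nonlinear half-step
    @inbounds for i in eachindex(ψ)
        ψ[i] *= cis(-dx/2 * abs2(ψ[i]))
    end

    return ψ           # ψ is modified in place and also returned
end # T2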

@benchmark with_views(512, 10000)

@benchmark without_views(512, 10000)
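For readers following along: as far as we can tell from the code, each call to `T2` is one symmetric split-step update (that name is our reading; the thread does not state it). Writing $\mathcal{F}$ for the in-place forward FFT applied by `F` and $\mathcal{F}^{-1}$ for the normalized inverse applied by `F̃`, the three stages compute, in order:

```latex
\psi \leftarrow e^{-i\,\frac{dx}{2}\,|\psi|^{2}}\,\psi, \qquad
\psi \leftarrow \mathcal{F}^{-1}\!\left[\,W \odot \mathcal{F}[\psi]\,\right], \qquad
\psi \leftarrow e^{-i\,\frac{dx}{2}\,|\psi|^{2}}\,\psi
```

Every stage reads and writes all $N$ entries of the slice it is handed, so the memory layout of that slice (a row or a column of ψ) affects every stage, not just the FFTs.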

Now to benchmark:

julia> @benchmark with_views(512, 10000)
BenchmarkTools.Trial: 
  memory estimate:  157.51 MiB
  allocs estimate:  10072
  --------------
  minimum time:     227.541 ms (1.81% GC)
  median time:      235.816 ms (4.57% GC)
  mean time:        241.442 ms (6.53% GC)
  maximum time:     302.875 ms (25.67% GC)
  --------------
  samples:          21
  evals/sample:     1

And without views:

julia> @benchmark without_views(512, 10000)
BenchmarkTools.Trial: 
  memory estimate:  157.53 MiB
  allocs estimate:  10074
  --------------
  minimum time:     165.236 ms (1.71% GC)
  median time:      172.742 ms (6.32% GC)
  mean time:        176.176 ms (8.19% GC)
  maximum time:     236.072 ms (31.66% GC)
  --------------
  samples:          29
  evals/sample:     1

I am not quite sure what’s going on here. Any advice?

Sukera · October 23, 2020, 6:38am

Using views didn’t reduce the number of allocations, so I’m guessing that wasn’t the problem here. I’ve heard that FFTW does a lot of allocating in the background since it’s calling into a C library, so that could be one reason for this. Have you checked your code (or this MWE) with --track-allocation to see where the allocations are coming from?
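To compare the allocations of single calls without rerunning the whole program under `julia --track-allocation=user` (which writes per-line `.mem` files when Julia exits), Base's `@allocated` macro can be used. This is only an illustrative sketch: `row_copy`, `row_view` and `measure` are hypothetical helpers, and the exact byte counts depend on the Julia version.

```julia
# Hypothetical helpers for illustration; they are not from the thread.
row_copy(A, i) = sum(A[i, :])          # A[i, :] allocates a new Vector
row_view(A, i) = sum(view(A, i, :))    # view(A, i, :) wraps A without copying

# Measure inside a function so global-scope effects don't skew the numbers.
function measure(A)
    bytes_copy = @allocated row_copy(A, 1)
    bytes_view = @allocated row_view(A, 1)
    return bytes_copy, bytes_view
end

A = rand(ComplexF64, 100, 512)
measure(A)                             # first call compiles everything
bytes_copy, bytes_view = measure(A)    # second call measures only the work
println("copy: $bytes_copy bytes, view: $bytes_view bytes")
```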

Skoffer · October 23, 2020, 6:43am

Just an idea: maybe @view works better when the data it wraps is contiguous in memory? Can you try swapping rows and columns in your code? In the end it should be @view ψ[:, 1] instead of @view ψ[1, :].

kristoffer.carlsson · October 23, 2020, 6:52am

Since a C library is called, it’s likely that a copy is made unless the array is contiguous. So this would definitely be something to try, and it should improve performance anyway.
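Both suggestions come down to Julia's column-major layout: entries of a column are adjacent in memory, while entries of a row are a full column length apart. `strides` makes this visible. The sketch below is scaled down from the MWE (M = 1_000 instead of 10_000) so it runs quickly; `ψ_rows` and `ψ_cols` are illustrative names.

```julia
# Julia arrays are column-major: consecutive elements of a column are adjacent in memory.
M, N = 1_000, 512
ψ_rows = zeros(ComplexF64, M, N)   # layout of the first MWE: one step per row
ψ_cols = zeros(ComplexF64, N, M)   # swapped layout: one step per column

row = view(ψ_rows, 1, :)   # N elements, each M elements apart in memory
col = view(ψ_cols, :, 1)   # N consecutive elements

println(strides(row))   # (1000,)
println(strides(col))   # (1,)
```

A stride of 1 is generally the cache-friendliest case, both for the FFT and for the elementwise loops in `T2`.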

pablosanjose · October 23, 2020, 6:54am

I don’t think the allocations are coming from FFTW. Try

@views ψ[i+1, :] .= T2(ψ[i,:],W, 1.0/M, F, F̃)

(note the .=)
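One plausible explanation for why the `.=` matters (our reading of Base's aliasing checks, not something stated in the thread): plain `=` calls `setindex!`, which copies the right-hand side via `Base.unalias` whenever it might share memory with the destination array, and a view into ψ always does. With `.=`, the destination is itself a view, and broadcasting's aliasing check can compare the two views' indices and see that slices `i` and `i + 1` do not overlap. The toy sketch below uses a hypothetical `step!` in place of `T2`, so it does not need FFTW.

```julia
# Hypothetical stand-in for T2: modifies its argument in place and returns it.
step!(v) = (v .*= 2; v)

function assign_eq!(A)
    for i in 1:size(A, 2) - 1
        @views A[:, i+1] = step!(A[:, i])    # `=` lowers to setindex!(A, ...)
    end
    return A
end

function assign_dot!(A)
    for i in 1:size(A, 2) - 1
        @views A[:, i+1] .= step!(A[:, i])   # `.=` broadcasts into a view of column i+1
    end
    return A
end

A = ones(64, 100)
assign_eq!(A); assign_dot!(A)            # warm up so compilation is not measured
bytes_eq  = @allocated assign_eq!(A)
bytes_dot = @allocated assign_dot!(A)
println("= : $bytes_eq bytes, .= : $bytes_dot bytes")
```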

ash · October 23, 2020, 7:32am

This was the solution to the problem at hand, thank you.

This has also led to some small performance gains. I knew Julia is column-major, but I must have gotten confused with the indexing. Thanks for catching that.

Combining the two recommendations, this is a new MWE:

using FFTW, BenchmarkTools

function attempt2(N, M)
    ψ = Array{Complex{Float64}}(undef, N, M)
    ψ[:, 1] = rand(N) + im*rand(N)
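    F = plan_fft!(@view ψ[:, 1])  # in-place forward FFT plan for a contiguous column view
    F̃ = plan_ifft!(@view ψ[:, 1]) # in-place inverse FFT plan

    W = ifftshift(cis.(rand(N)))   # random unit-modulus phases, one per Fourier mode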

    for i = 1:M-1
        @views ψ[:, i+1] .= T2(ψ[:,i],W, 1.0/M, F, F̃)
    end #for
    return nothing
end # attempt2

function attempt1(N, M)
    ψ = Array{Complex{Float64}}(undef, M, N)
    ψ[1, :] = rand(N) + im*rand(N)
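    F = plan_fft!(ψ[1, :])  # in-place forward FFT plan, made on a copy of row 1
    F̃ = plan_ifft!(ψ[1, :]) # in-place inverse FFT plan

    W = ifftshift(cis.(rand(N)))   # random unit-modulus phases, one per Fourier mode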

    for i = 1:M-1
        ψ[i+1, :] = T2(ψ[i,:],W, 1.0/M, F, F̃)
    end #for
    return nothing
end # attempt1

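function T2(ψ, W, dx, F, F̃)
    # Nonlinear half-step: multiply by the phase exp(-i dx/2 |ψ|²)
    @inbounds for i in eachindex(ψ)
        ψ[i] *= cis(-dx/2 * abs2(ψ[i]))
    end

    F*ψ                # forward FFT, in place (F was made with plan_fft!)
    @inbounds for i in eachindex(ψ, W)   # errors if ψ and W differ in length
        ψ[i] *= W[i]   # linear step in Fourier space
    end
    F̃*ψ                # inverse FFT, in place (plan_ifft! includes the 1/N normalization)

    # Second nonlinear half-step
    @inbounds for i in eachindex(ψ)
        ψ[i] *= cis(-dx/2 * abs2(ψ[i]))
    end

    return ψ           # ψ is modified in place and also returned
end # T2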

Benchmarking this, we get:

julia> @benchmark attempt1(512, 10000)
BenchmarkTools.Trial: 
  memory estimate:  157.53 MiB
  allocs estimate:  10074
  --------------
  minimum time:     167.911 ms (4.22% GC)
  median time:      175.678 ms (8.73% GC)
  mean time:        176.204 ms (8.71% GC)
  maximum time:     187.611 ms (10.80% GC)
  --------------
  samples:          29
  evals/sample:     1

julia> @benchmark attempt2(512, 10000)
BenchmarkTools.Trial: 
  memory estimate:  78.17 MiB
  allocs estimate:  73
  --------------
  minimum time:     138.274 ms (1.23% GC)
  median time:      143.694 ms (4.12% GC)
  mean time:        144.101 ms (4.02% GC)
  maximum time:     154.579 ms (3.71% GC)
  --------------
  samples:          35
  evals/sample:     1

A nice 20% performance gain!
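A back-of-the-envelope check of these numbers (our arithmetic, not from the thread): the 78.17 MiB left in `attempt2` is almost exactly the size of ψ itself, and the extra ~79 MiB in `attempt1` matches one 512-element row copy per loop iteration. The small remainder presumably comes from the FFT plans and the random initial data. In median time the gain is about 18%, i.e. a speedup of roughly 1.22×.

```julia
# Rough accounting for the benchmark numbers above (N = 512, M = 10_000).
N, M = 512, 10_000
elbytes = sizeof(ComplexF64)              # 16 bytes per element
mib(bytes) = bytes / 2^20                 # bytes -> MiB

psi_mib  = mib(N * M * elbytes)           # the ψ matrix itself
rows_mib = mib((M - 1) * N * elbytes)     # one N-element copy per loop iteration

println(psi_mib)              # 78.125
println(psi_mib + rows_mib)   # 156.2421875

median_before, median_after = 175.678, 143.694   # ms, attempt1 vs attempt2
println(median_before / median_after)            # speedup factor, roughly 1.22
```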
