I have been struggling a few hours to understand a massive slow down (x1000) of some computations. I finally spotted the problem to be caused by the use of array views in a context where one (at least myself) could think that it should cause no arm…
I join the following MWE
using BenchmarkTools
function test()
nx=10
ny=20
nz=30
nxyz=nx*ny*nz
x1D=zeros(Float64,nxyz)
y1D=zeros(Float64,nxyz)
copyto!(x1D,1:nxyz)
x3D=reshape(x1D,(nx,ny,nz))
y3D=reshape(y1D,(ny,nz,nx))
t=@belapsed permutedims!($y3D,$x3D,(2,3,1))
println("time permutedims: ",t)
x3Dv=reshape(view(x1D,1:nxyz),(nx,ny,nz))
y3Dv=reshape(view(y1D,1:nxyz),(ny,nz,nx))
tv=@belapsed permutedims!($y3Dv,$x3Dv,(2,3,1))
println("time permutedims (view): ",tv)
println(" slow down factor:",tv/t)
end
test()
I wonder about the opportunity to reduce the risk of this kind of mistake like a warning when views are used with linear indexing or, a different kind of view when it refers to a contiguous piece of data. What is your opinion ?
Although your example has a one-to-one correspondence between the array index and the view index, that’s not generally the case. You’re asking permutedims!to calculate an index-to-index correspondence on structures whose indices are two degrees removed from the underlying data, and the inflated time reflects that. The damage could be abated by collect(reshape(view(...))).
Thanks for your explanation.
I still wonder about the opportunity to find a way to spot (or avoid) this kind of performance issue.
For example, using linear indexing on subarrays could emit performance warnings… Another option would be to optimize views when they actually refer to contiguous piece of array data.
I am still a newbie in Julia and, although I appreciate very much the tools like BenchmakTools or @code_xxx that help to eliminate performance issues, it still took me a long time to understand what was wrong in this case…
Note that reshape does not copy the data; instead, it makes a new indexing structure for the same data. Similarly, transpose does not copy the data. Therefore, in a setting where view refers to the entire array, reshape(view(...),...) would be redundant and involve extra index computations; reshape(...) alone would probably be preferable.
This is clear.
What is not clear is my oversimplified MWE: in my case I wanted to permute the dims of an array slice (a contiguous slice).
What I wanted to point out in this thread, is that my mistake of using a view in this case was difficult to find (in my larger code). The code produces the correct result but was very slow.
I hope that other Julia (newbie) users will not loose hours debugging the same performance errors.
Thank you again.
I think such things should better be handled by a linter tool for Julia. In addition, a beginner striving for high performance (who already knows usage of views) should read the Performance Tips in the documentation first.
The source file reshapedarray.jl uses a lot of splatting (that is, the … operation). The compiler uses a number of heuristics in decide when to specialize ... versus when to punt and delay the type-matching until run-time. (The issue is that compile-time type-matching can be slow for complicated nested types like views of reshaped arrays.) It is possible that you have hit one of those cases when the compiler’s heuristic decides to punt the type-matching of indices to run-time. As far as I understand, there is no easy way with the available instrumentation tools to determine whether you are in that case. There are some techniques to prevent it. In another thread, Tim Holy said that you can sometimes force specialization of splatting by using Varargs instead. However, since the code with the splatting is in Base rather than your own file, this is not an option for you unless you decided to implement your own version of reshape and view specialized for your situation.
That’s not the problem here — and in general you shouldn’t worry or fear that the base library uses splatting. In cases like ReshapedArrays and SubArrays, we’ve taken great care to make sure that the splatting is “free.” Please don’t use such certainty about where you think the problem is when you’re not sure.
The problem is that we have two different versions of permutedims!:
julia> @which permutedims!(y3D,x3D,(2,3,1)) # This output is manually edited to use the StridedArray alias:
permutedims!(P::Array{T,N}, B::StridedArray{T,N}) where {T, N} in Base at multidimensional.jl:1273
julia> @which permutedims!(y3Dv,x3Dv,(2,3,1))
permutedims!(dest, src::AbstractArray, perm) in Base.PermutedDimsArrays at permuteddimsarray.jl:192
I think the multidimensional signature can actually be extended to be (::StridedArray, ::StridedArray), and that’d fix the issue here:
julia> test()
time permutedims: 9.587e-6
time permutedims (view): 0.006955433
slow down factor:725.5067278606446
# @eval Base permutedims!(::StridedArray, ::StridedArray)
julia> test()
time permutedims: 9.614e-6
time permutedims (view): 9.74e-6
slow down factor:1.0131058872477636
Edit: I don’t think it even requires strided-ness. Just linear indexing is sufficient.
I didn’t mean to express any degree of certainty in my diagnosis of the problem-- indeed, I was just offering up a guess. Thank you for finding the correct diagnosis.
Following up on your comment, I’m glad to hear that splats in Base will all specialize as expected. I’m curious how this is achieved-- is there something in the Julia code itself to insist that these particular splats gets specialized? Or are the splats checked during the continuous-integration process via test sets? The reason I ask is because in other threads, posters have wondered about splats in their own codes.
There are two potential sources for the “splatting penalty”:
It might not be type stable. If you have a collection like xs::Array where the length isn’t encoded in the type itself, then Julia needs to use dynamic dispatch at runtime to figure out which method of f (how many args) to call in f(xs…). This is by far the most common case that people run into.
If the function f doesn’t get inlined or specialized, then Julia might need to gather up the elements of xs into a temporary heap-allocated object in order to call the function.
Both of these are carefully avoided with type-stable Tuples (which know their length in their type) and with @inline or sometimes Vararg annotations throughout Base (like SubArray and ReshapedArray). There are times where we miss an inline or somesuch, but those are often easy to fix.
Thank you very much for these explanations.
If I have correctly understood, the view use is correct and do not cause performance penalty if the permutedim! signature is extended.