Why does THIS allocate? No hairs were harmed in this experiment ('cause I am bald...)

One more puzzle:

                                                                                                                                
julia> function cross3!(                                                                                                        
           result::AbstractVector{T1},                                                                                          
               theta::AbstractVector{T2},                                                                                       
           v::Tuple{T3, T3, T3},                                                                                                
       ) where {T1, T2, T3}                                                                                                     
           @assert (length(theta) == 3) && (length(v) == 3) "Inputs must be 3-vectors"                                          
           result[1] = -theta[3] * v[2] + theta[2] * v[3]                                                                       
           result[2] = theta[3] * v[1] - theta[1] * v[3]                                                                        
           result[3] = -theta[2] * v[1] + theta[1] * v[2]                                                                       
           return result                                                                                                        
       end                                                                                                                      
cross3! (generic function with 1 method)                                                                                        
                                                                                                                                
julia>                                                                                                                          
                                                                                                                                
julia> @time cross3!(rand(3), rand(3), (1.0, 0.0, 0.0))                                                                         
  0.000003 seconds (2 allocations: 160 bytes)                                                                                   
3-element Vector{Float64}:                                                                                                      
  0.00000e+00                                                                                                                   
  9.03374e-01                                                                                                                   
 -2.03397e-01                                                                                                                   
                                                                                                                                
julia> @code_warntype cross3!(rand(3), rand(3), (1.0, 0.0, 0.0))                                                                
MethodInstance for cross3!(::Vector{Float64}, ::Vector{Float64}, ::Tuple{Float64, Float64, Float64})                            
  from cross3!(result::AbstractVector{T1}, theta::AbstractVector{T2}, v::Tuple{T3, T3, T3}) where {T1, T2, T3} @ Main REPL[3]:1 
Static Parameters                                                                                                               
  T1 = Float64                                                                                                                  
  T2 = Float64                                                                                                                  
  T3 = Float64                                                                                                                  
Arguments                                                                                                                       
  #self#::Core.Const(cross3!)                                                                                                   
  result::Vector{Float64}                                                                                                       
  theta::Vector{Float64}                                                                                                        
  v::Tuple{Float64, Float64, Float64}                                                                                           
Body::Vector{Float64}                                                                                                           
1 ─ %1  = Main.length(theta)::Int64                                                                                             
│   %2  = (%1 == 3)::Bool                                                                                                       
└──       goto #4 if not %2                                                                                                     
2 ─ %4  = Main.length(v)::Core.Const(3)                                                                                         
│   %5  = (%4 == 3)::Core.Const(true)                                                                                           
└──       goto #4 if not %5                                                                                                     
3 ─       goto #5                                                                                                               
4 ┄ %8  = Base.AssertionError("Inputs must be 3-vectors")::Core.Const(AssertionError("Inputs must be 3-vectors"))               
└──       Base.throw(%8)                                                                                                        
5 ┄ %10 = Base.getindex(theta, 3)::Float64                                                                                      
│   %11 = -%10::Float64                                                                                                         
│   %12 = Base.getindex(v, 2)::Float64                                                                                          
│   %13 = (%11 * %12)::Float64                                                                                                  
│   %14 = Base.getindex(theta, 2)::Float64                                                                                      
│   %15 = Base.getindex(v, 3)::Float64                                                                                          
│   %16 = (%14 * %15)::Float64                                                                                                  
│   %17 = (%13 + %16)::Float64                                                                                                  
│         Base.setindex!(result, %17, 1)                                                                                        
│   %19 = Base.getindex(theta, 3)::Float64                                                                                      
│   %20 = Base.getindex(v, 1)::Float64                                                                                          
│   %21 = (%19 * %20)::Float64                                                                                                  
│   %22 = Base.getindex(theta, 1)::Float64                                                                                      
│   %23 = Base.getindex(v, 3)::Float64                                                                                          
│   %24 = (%22 * %23)::Float64                                                                                                  
│   %25 = (%21 - %24)::Float64                                                                                                  
│         Base.setindex!(result, %25, 2)                                                                                        
│   %27 = Base.getindex(theta, 2)::Float64                                                                                      
│   %28 = -%27::Float64                                                                                                         
│   %29 = Base.getindex(v, 1)::Float64                                                                                          
│   %30 = (%28 * %29)::Float64                                                                                                  
│   %31 = Base.getindex(theta, 1)::Float64                                                                                      
│   %32 = Base.getindex(v, 2)::Float64                                                                                          
│   %33 = (%31 * %32)::Float64                                                                                                  
│   %34 = (%30 + %33)::Float64                                                                                                  
│         Base.setindex!(result, %34, 3)                                                                                        
└──       return result                                                                                                         
                                                                                                                                
                  ```
Can you see any place where 80 bytes will need to be allocated?

I think these are your two 80-byte allocations:

julia> @time rand(3)
  0.000002 seconds (1 allocation: 80 bytes)
2 Likes
julia> a = rand(3); b = rand(3); @time cross3!(a, b, (1.0, 0.0, 0.0))
  0.000002 seconds
3-element Vector{Float64}:
  0.0
  0.5477837953337921
 -0.3329147734612631

Sorry, bad MWE.
I have had something like this:

    m = zeros(3, 3)
    @time begin @views let
            cross3!(m[:, 2], m[:, 1], (1.0, 0.0, 0.010))
        end
    end

which gave me

julia>     @time begin @views let                                                                                               
                   cross3!(m[:, 2], m[:, 1], (1.0, 0.0, 0.010))                                                                 
               end                                                                                                              
           end                                                                                                                  
  0.000010 seconds (4 allocations: 128 bytes)                                                                                   
3-element view(::Matrix{Float64}, :, 2) with eltype Float64:                                                                    
 0.00000e+00                                                                                                                    
 0.00000e+00                                                                                                                    
 0.00000e+00      

Looks like the view allocates

julia> m = zeros(100, 200);

julia> @time @view m[:, 2];
  0.000024 seconds (2 allocations: 64 bytes)

Don’t know why or where though …

PS: No, its the global, i.e., using let or const there are no allocations:

julia> let A = zeros(100, 200)
           @time @view A[:, 1]
           :tada
       end
  0.000002 seconds
:tada

Note to myself: Always use @btime and interpolation, i.e., @btime @view $(m)[:, 1];

3 Likes

That shaves off one of the allocations, the other is because @time itself was called in the global scope

julia> m = zeros(100, 200);

julia> @view(m[:,2]); @time @view(m[:,2]);
  0.000007 seconds (2 allocations: 64 bytes)

julia> function foo(m,i) @view(m[:,i]) end;

julia> foo(m, 2); @time foo(m, 2);
  0.000006 seconds (1 allocation: 48 bytes)

julia> function timefoo(m,i) @time @view(m[:,i]) end;

julia> timefoo(m, 2);
  0.000001 seconds

I can’t find the more recent thread anymore because allocating views was so much of a problem before v1.5, but I think there was a case where a view reshaped the input array thus causing a real allocation. But I’m not sure what the circumstance is.

2 Likes

My problem is actually still somewhere (as @Benny just demonstrated cross3 has been falsely blamed by me). My full MWE is

module mcs002xxx
using LinearAlgebra
using Statistics
using InteractiveUtils

function cross3!(
    result::AbstractVector{T1},
    theta::AbstractVector{T2},
    v::Tuple{T3, T3, T3},
) where {T1, T2, T3}
    @assert (length(theta) == 3) && (length(v) == 3) "Inputs must be 3-vectors"
    result[1] = -theta[3] * v[2] + theta[2] * v[3]
    result[2] = theta[3] * v[1] - theta[1] * v[3]
    result[3] = -theta[2] * v[1] + theta[1] * v[2]
    return result
end

@views function gen_iso_csmat!(csmatout::Matrix{T}, XYZ::Matrix{T}, tangents::Matrix{T}, feid::IT, qpid::IT) where {T, IT}
    sdim, mdim = size(tangents)
    if sdim == mdim # finite element embedded in space of the same dimension
        for i in 1:size(csmatout, 1), j in 1:size(csmatout, 2)
            csmatout[i, j] = (i == j ? one(T) : zero(T))
        end
    else # lower-dimensional finite element embedded in space of higher dimension
        @assert 0 < mdim < sdim
        @assert 0 < sdim
        csmatout[:, 1] = tangents[:, 1]
        csmatout[:, 1] /= norm(csmatout[:, 1])
        if mdim == 1 # curve-like finite element in 2d or 3d
            # all done
        elseif mdim == 2 # surface-like finite element in 3d
            e2 = (tangents[1, 2], tangents[2, 2], tangents[3, 2])
            cross3!(csmatout[:, 2], csmatout[:, 1], e2)
            csmatout[:, 2] /= norm(csmatout[:, 2])
        end
    end
    return csmatout
end

function doit()
    XYZ = reshape([0.2, 0.3, 0.4], 1, 3)
    tangents = reshape([1.0, 0.0, 0.0], 3, 1)
    m = zeros(3, 3)

    @time begin @views let
            cross3!(m[:, 2], m[:, 1], (1.0, 0.0, 0.010))
        end
    end

    @time gen_iso_csmat!(m, XYZ, tangents, 0, 0)

    true
end
doit()
nothing
end

I get

julia> include("C:\\Users\\pkonl\\Documents\\00WIP\\FinEtools.jl\\playn.jl")                                                    
WARNING: replacing module mcs002xxx.                                                                                            
  0.000001 seconds                                                                                                              
  0.000002 seconds (1 allocation: 80 bytes)                                                                                     
Main.mcs002xxx    

which means gen_iso_csmat! allocates. The path actually does not involve cross3!: the empty block # all done is hit.

Solved: please see below.

This code manages not to allocate anything. (As opposed to that mentioned in the thread Why does THIS allocate? No hairs were harmed in this experiment ('cause I am bald…) - General Usage - Julia Programming Language (julialang.org).)

module mcs006xxx
using LinearAlgebra
using Statistics
using InteractiveUtils

function cross3!(
    result::AbstractVector{T1},
    theta::AbstractVector{T2},
    v::Tuple{T3, T3, T3},
) where {T1, T2, T3}
    @assert (length(theta) == 3) && (length(v) == 3) "Inputs must be 3-vectors"
    result[1] = -theta[3] * v[2] + theta[2] * v[3]
    result[2] = theta[3] * v[1] - theta[1] * v[3]
    result[3] = -theta[2] * v[1] + theta[1] * v[2]
    return result
end

function gen_iso_csmat!(csmatout::Matrix{T}, XYZ::Matrix{T}, tangents::Matrix{T}, feid::IT, qpid::IT) where {T, IT}
    sdim, mdim = size(tangents)
    if sdim == mdim # finite element embedded in space of the same dimension
        for i in 1:size(csmatout, 1), j in 1:size(csmatout, 2)
            csmatout[i, j] = (i == j ? one(T) : zero(T))
        end
    else # lower-dimensional finite element embedded in space of higher dimension
        @assert 0 < mdim < sdim
        @assert 0 < sdim
        @view(csmatout[:, 1]) .= @view(tangents[:, 1])
        @view(csmatout[:, 1]) ./= norm(@view(csmatout[:, 1]))
        if mdim == 1 # curve-like finite element in 2d or 3d
             # all done
        elseif mdim == 2 # surface-like finite element in 3d
            e2 = (tangents[1, 2], tangents[2, 2], tangents[3, 2])
            @views cross3!(csmatout[:, 2], csmatout[:, 1], e2)
            # @views csmatout[:, 2] /= norm(csmatout[:, 2])  ALLOCATES!
            @view(csmatout[:, 2]) ./= norm(@view(csmatout[:, 2]))
        end
    end
    return csmatout
end

function doit()
    XYZ = reshape([0.2, 0.3, 0.4], 1, 3)
    tangents = reshape([1.0, 0.0, 0.0, 1.0, 0.0, 0.0], 3, 2)
    m = zeros(3, 2)

    @time gen_iso_csmat!(m, XYZ, tangents, 0, 0)

    true
end
doit()
nothing
end

Surprisingly, there seems to be a difference between @views and individual applications of the @view macro.

So, for instance

@views csmatout[:, 2] /= norm(csmatout[:, 2])

is found to allocate, whereas

@view(csmatout[:, 2]) ./= norm(@view(csmatout[:, 2]))

does not. Does anyone have an explanation? (Is it the dot?)

Edit: Yes, it is. There is NO difference between the effect of those two macros. It was the dot (./= vs. /=). Well, it took me a while, but I did learn something. :wink:

2 Likes

It still isn’t clear to me why @views csmatout[:, 2] /= norm(csmatout[:, 2]) would allocate. Any suggestions?

Simplifying that a bit, the expression A /= b is simply doing A = A / b. The best mental model for assignment in Julia is that is a naming exercise — you’re just choosing what name you want to use for whatever happens on the right hand side. In this case, you’re changing what you’re using the name A for — and making it the name for whatever the result of A/b is. If A / b creates a new array, then this line allocates.

1 Like

I do understand the transformation A /= bA = A / b. But I would think that that would create a view, which should not allocate!? That’s why I used @views, no?

1 Like

@views only transforms indexing expressions; there’s no indexing syntax in A / b.

1 Like

I think this is what you’re saying: @views csmatout[:, 2] /= norm(csmatout[:, 2]) will become

A = csmatout[:, 2]
@view(csmatout[:, 2]) = A / norm(@view(csmatout[:, 2]))

For some reason my mental picture was the transformation

@view(csmatout[:, 2]) = @view(csmatout[:, 2]) / norm(@view(csmatout[:, 2]))

Is the above consistent with your understanding?

That is not at all what I’m saying. The @views macro does indeed transform all indexing expressions into views. What it doesn’t change is the / operator. And dividing a vector by a number allocates a new vector unless you a fused broadcasted assignment. That’s your allocation. That’s why I’m simplifying this to A / b. It doesn’t matter where you got A and b from.

3 Likes

So what will this @views csmatout[:, 2] /= norm(csmatout[:, 2]) be transformed into? Will there still be views?

A = view(csmatout, :, 2)
b = norm(A)
temp = A / b
setindex!(csmatout, temp, :, 2)

The allocation is happening in the temporary created by the division of A / b.

1 Like

You can see the answer for yourself:

julia> @macroexpand(@views csmatout[:, 2] /= norm(csmatout[:, 2]))
:(let var"##a#292" = csmatout, var"##i#293" = (:), var"##i#294" = 2
      var"##a#292"[var"##i#293", var"##i#294"] = (Base.maybeview)(var"##a#292", var"##i#293", var"##i#294") / norm((Base.maybeview)(csmatout, :, 2))
  end)

or making it a bit prettier:

julia> using MacroTools

julia> MacroTools.prettify(@macroexpand(@views csmatout[:, 2] /= norm(csmatout[:, 2])))
:(let mink = csmatout, porpoise = (:), gorilla = 2
      mink[porpoise, gorilla] = maybeview(mink, porpoise, gorilla) / norm(maybeview(csmatout, :, 2))
  end)
2 Likes