Adding vectors of different lengths

performance

#1

I am porting code from Matlab to Julia, and I am having trouble finding an efficient way of adding vectors that differ in length. Here are the essential lines, from inside a recursive function:

% Matlab code
t1 = recursive_function(a+1,b-1,c,d,params);
t2 = recursive_function(a  ,b-1,c,d,params);  % numel(t2)==numel(t1)-1
result = t1 + [params.AB*t2 0];               % one example
result = t1 + [0 params.AB*t2];               % another example

result is what the function returns. I translated this to Julia in the obvious (and likely overly naïve) way:

# Julia code
t1 = recursive_function(a+1,b-1,c,d,params)
t2 = recursive_function(a  ,b-1,c,d,params)    # length(t2) == length(t1) - 1
result = t1 + push!(params.AB*t2, 0.0)         # one example
result = t1 + unshift!(params.AB*t2, 0.0)      # another example

With this, I get a lot of memory allocations. @code_warntype doesn’t highlight any type instabilities.

Is there a more efficient way to calculate result?


#2

Are you able to do this in-place, i.e., destroy t1? If so, you should try

@view(t1[1:end-1]) .+= t2
@view(t1[2:end]) .+= t2

#3

My attempt on version 0.5:

julia> t1, t2 = rand(10^3), rand(10^3-1);
julia> @view(t1[1:end-1]) .+= t2
syntax: invalid assignment location "true&&Base.view(t1,1:-(Base.endof(t1),1))"

while

julia> t1[1:end-1] .+= t2

works fine. Very odd.


#4

I’m not sure why @fengyang.wang’s solution isn’t working for me, but here is a pretty speedy alternative:

# equivalent of @view(t1[2:end]) .+= t2
for i in eachindex(t2)
    @inbounds t1[i+1] += t2[i]
end

# equivalent of @view(t1[1:end-1]) .+= t2
for i in eachindex(t2)
    @inbounds t1[i] += t2[i]
end

#5

You can’t use @view on the lhs of an assignment. In 0.6, you can do:

@views t1[1:end-1] .+= t2

to update t1 in-place, but writing out the loop is a fast and reasonable option as well.
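
Applied to the original snippet, the first example would look roughly like this (a sketch for 0.6, reusing the names from the first post; it assumes t1 may be overwritten):

# Matlab's  t1 + [params.AB*t2 0]  without building the padded vector
t1 = recursive_function(a+1,b-1,c,d,params)
t2 = recursive_function(a  ,b-1,c,d,params)    # length(t2) == length(t1) - 1
@views t1[1:end-1] .+= params.AB .* t2         # fused broadcast!, updates t1 in-place
result = t1                                    # the second example would use t1[2:end] instead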


#6

I hadn’t realized that. Is there any reason why @view cannot be used on the LHS? I was under the impression that .+= lowered to a broadcast! call, so the LHS could be any object, evaluated greedily.


#7

Yes. @view a[i] transforms the expression to view(a, i). So, if you did @view a[i] = 3 it would transform to view(a, i) = 3, which is a function declaration and not an assignment.

As discussed in https://github.com/JuliaLang/julia/pull/16564, the @view macro was therefore purposely designed in a funny way to prevent you from using it on the lhs of an assignment. Unfortunately, this prevents it from being used on the lhs of .=, too, where a view call would be fine.

Note that doing a slice on the lhs of = actually does not produce a slice object: a[1:2] = ... turns into a call to setindex!(a, ..., 1:2). So, it already acts in-place and there is no need for @view. The problem comes with an updating operator like +=, where the lhs is actually used on both the left and right-hand sides of the expression.

The solution is to use @views on the whole expression (or a block containing the expression); this looks at the whole expression and does the right thing to prevent any slices from allocating copies. @views is a new feature of 0.6, but works on 0.4 and 0.5 via the latest Compat.
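
For example (a rough illustration; the exact allocations reported will vary):

a = zeros(5)

a[1:3] = [1.0, 2.0, 3.0]     # setindex!(a, [1.0, 2.0, 3.0], 1:3): writes into a, no slice copy

a[1:3] += ones(3)            # lowers to a[1:3] = a[1:3] + ones(3);
                             # the a[1:3] on the right-hand side is a copy

@views a[1:3] .+= ones(3)    # 0.6 (or Compat): the slice becomes a view on both sides,
                             # so the update happens in-place with no slice copy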


#8

I still do not fully understand why it is prevented from being on the LHS of .+=. To be precise in where I am confused, here is an annotated REPL session

julia> A = rand(5);

julia> (true && A) = 1;  # assignment, expected to fail
ERROR: syntax: invalid assignment location "true&&A"

julia> (true && A) += 1;  # assignment, expected to fail
ERROR: syntax: invalid assignment location "true&&A"

julia> (true && A) .= 1;  # broadcast!, expected to work

julia> (true && A) .+= 1;  # broadcast!, expected to work
ERROR: syntax: invalid assignment location "true&&A"

It is this last line that confuses me. To borrow some not-100%-accurate terminology from C++, my understanding was that what goes on the LHS of = and += must be an “lvalue”, which (true && A) is not. But what goes on the LHS of .= and .+= is an “rvalue”, which (true && A) is. Indeed it seems I am correct for .=, but not for .+=.


#9

After thinking about it some more, I understand now why the .+= case is more difficult than the .= case, although I think it could be supported with a little more effort. The idea is that

a += x

expands to

a = a + x

and the a is duplicated onto both the left- and right-hand sides. For the standard = operator this duplication is not an issue, because the only kinds of expressions a for which it makes sense are names, fields, and refs, and for each of these "setting" is a fundamentally different operation from "getting".

In the case of

A .= x

considerably greater flexibility is available for the left-hand side A, and indeed it can be anything that could normally appear on the right-hand side. (Except, perhaps, that as a special case values like A[x:y] are rewritten to be views, because otherwise modifying a newly-allocated object and then throwing away the result would be nonsensical and error-prone. This seems to be the case on 0.5 at least; I don't know if it has been changed in 0.6.)

In the case of

A .+= x

it is expanded to

A .= A .+ x

and it is here that A is restricted to being something simple. Suppose it were not; then we might have

somefunction(A) .= somefunction(A) .+ x

and somefunction would be called twice, which could be expensive and could have additional unwanted side effects. Instead, this would need to be expanded to

let tmp = somefunction(A)
    tmp .= tmp .+ x
end

which is not currently supported by the update operator expander.
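
(The expansion can of course be written by hand today, binding the left-hand expression to a name once and updating through that name:)

tmp = view(A, x:y)   # or somefunction(A), evaluated a single time
tmp .+= x            # fine, because tmp is now a plain name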

Hence, if my understanding is correct, it is not for a semantic reason that @view(A[x:y]) .+= x is not allowed; it is simply that the code to do it properly has not been written yet, so it is prohibited. Since alternatives like @views exist, presumably this is a low priority and unlikely to make it into 0.6.

Does this sound right?


#10

This works great, giving a 50x speedup. Wonderful, thanks!

Question concerning the usage of @inbounds: What’s the difference between the above and the following?

@inbounds for i in eachindex(t2)
    t1[i+1] += t2[i]
end

(The documentation gives both. This post doesn't quite clarify this.) I don't see any noticeable performance difference for my use case, but there are differences in where $(Expr(:inbounds, true)) shows up in the @code_warntype output. Here is my test code:

function test1!(t1,t2,M)
  for k in 1:M
    for i in eachindex(t2)
        @inbounds t1[i+1] += t2[i]
    end
  end
end

function test2!(t1,t2,M)
  for k in 1:M
    @inbounds for i in eachindex(t2)
        t1[i+1] += t2[i]
    end
  end
end

N = 1000
t1 = zeros(N+1)
t2 = rand(N)
M = 10000
@code_warntype test1!(t1,t2,M)
@code_warntype test2!(t1,t2,M)
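
(A rough way to compare them, after a first call to force compilation, is something like:)

test1!(t1,t2,1); test2!(t1,t2,1)   # warm-up / compilation
@time test1!(t1,t2,M)
@time test2!(t1,t2,M)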

#11

As far as I know, there is no difference. I chose this for aesthetic reasons.

Be aware, though, that you should be very certain before you use @inbounds. If there is any uncertainty about the lengths of the vectors, you should drop it, or perhaps explicitly perform a check before the loop.
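
For example, something along these lines (a sketch; the helper name is made up):

function add_shifted!(t1, t2)
    # check the length relationship once up front, so skipping the bounds
    # checks inside the loop is safe
    length(t1) == length(t2) + 1 || throw(DimensionMismatch("need length(t1) == length(t2) + 1"))
    @inbounds for i in eachindex(t2)
        t1[i+1] += t2[i]
    end
    return t1
end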


#12

You’re right, I think it’s a bug in the parser. In 0.4, x .+= y corresponded to x = x .+ y. When that was changed to mean x .= x .+ y in 0.5, I guess we forgot to eliminate the syntax error for the LHS being an rvalue. Can you file an issue?


#13

Yes, but it would be straightforward, in principle, to eliminate the double call during lowering.