The operation of multiplication by a scalar and translation can be done with `Base.LinAlg.axpy!(α, x, y)`

. But I can also use `y .+= α*x`

.

I was wondering: is it recommended to use one or the other for writing library code? What does `.+=`

actually do? Does it call `axpy!`

under the hood?

In microbenchmarks (see below), the first one allocates no memory at all, although their speed is “very very close”.

```
using Compat, BenchmarkTools, DataFrames
blas_1(α, x, y) = Base.LinAlg.axpy!(α, x, y)
broadcast_partial(α, x, y) = y .+= α*x
broadcast_full(α, x, y) = @. y += α*x
n = [10^k for k in 1:6]
x, y = [rand(ni) for ni in n], [rand(ni) for ni in n]
z = copy(y); w = copy(y)
α = sqrt(pi)
table = DataFrame(Algorithm=["blas_1", "broadcast_partial", "broadcast_full"])
for (i, ni) in enumerate(n)
blas_1_bench = @benchmark blas_1($α, $x[$i], $y[$i])
broadcast_partial_bench = @benchmark broadcast_partial($α, $x[$i], $z[$i])
broadcast_full_bench = @benchmark broadcast_full($α, $x[$i], $w[$i])
times_ms = 1e-6 * [time(blas_1_bench),
time(broadcast_partial_bench),
time(broadcast_full_bench)]
table[Symbol("n = $ni (ms)")] = [signif(ti, 3) for ti in times_ms]
end
```

```
julia> table
3×7 DataFrames.DataFrame
│ Row │ Algorithm │ n = 10 (ms) │ n = 100 (ms) │ n = 1000 (ms) │ n = 10000 (ms) │ n = 100000 (ms) │ n = 1000000 (ms) │
├─────┼─────────────────────┼─────────────┼──────────────┼───────────────┼────────────────┼─────────────────┼──────────────────┤
│ 1 │ "blas_1" │ 3.0e-5 │ 0.00142 │ 0.00153 │ 0.00516 │ 0.01729 │ 0.31935 │
│ 2 │ "broadcast_partial" │ 6.0e-5 │ 0.00011 │ 0.00078 │ 0.00964 │ 0.11502 │ 2.34353 │
│ 3 │ "broadcast_full" │ 3.0e-5 │ 4.0e-5 │ 0.00027 │ 0.00332 │ 0.05153 │ 0.75011 │
```

*Edit 1:* added the full loop-fusion version.

*Edit 2:* added a benchmark function (use IJulia to see a cute table). Benchmark data on a MacBook Pro (see details below).

*Edit 3:* round to three significative digits using `signif(ti, 3)`

.

```
julia> versioninfo()
Julia Version 0.6.0
Commit 903644385b (2017-06-19 13:05 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin13.4.0)
CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
```