While upgrading this package to Julia 0.7, I noticed a fairly large performance regression in vectorized (broadcast) operations. This is the script I used for benchmarking:

```julia
using BenchmarkTools

n = 12500
Ch = rand(n)
Sh = rand(n)
C2 = rand(n)
S2 = rand(n)
C = rand(n)
S = rand(n)

# Fills P in place; every intermediate below allocates a temporary vector.
function test!(P, Ch, Sh, C2, S2, C, S, YY)
    tan_2ωτ = @. (S2 - 2 * S * C) / (C2 - (C * C - S * S))
    C2w = @. 1 / (sqrt(1 + tan_2ωτ * tan_2ωτ)) # = cos(2 * ωτ)
    S2w = @. tan_2ωτ * C2w # = sin(2 * ωτ)
    Cw = @. sqrt((1 + C2w) / 2) # = cos(ωτ)
    Sw = @. sign(S2w) * sqrt((1 - C2w) / 2) # = sin(ωτ)
    return P .= @. ((Ch * Cw + Sh * Sw) ^ 2 /
                    ((1 + C2 * C2w + S2 * S2w) / 2 - (C * Cw + S * Sw) ^ 2) +
                    (Sh * Cw - Ch * Sw) ^ 2 /
                    ((1 - C2 * C2w - S2 * S2w) / 2 - (S * Cw - C * Sw) ^ 2)) / YY
end

# Vector{Float64}(n) is the 0.6 constructor; on 0.7 the non-deprecated form is Vector{Float64}(undef, n).
@benchmark test!(P, $Ch, $Sh, $C2, $S2, $C, $S, 3.14) setup=(P = Vector{Float64}(n))
```

On Julia 0.6.4:

```
BenchmarkTools.Trial:
  memory estimate:  489.09 KiB
  allocs estimate:  15
  --------------
  minimum time:     447.034 μs (0.00% GC)
  median time:      457.194 μs (0.00% GC)
  mean time:        477.651 μs (2.20% GC)
  maximum time:     1.305 ms (52.66% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

On master (updated yesterday):

```
BenchmarkTools.Trial:
  memory estimate:  491.78 KiB
  allocs estimate:  137
  --------------
  minimum time:     750.140 μs (0.00% GC)
  median time:      765.811 μs (0.00% GC)
  mean time:        788.548 μs (2.37% GC)
  maximum time:     42.660 ms (98.12% GC)
  --------------
  samples:          5519
  evals/sample:     1

julia> versioninfo()
Julia Version 0.7.0-beta2.98
Commit 77a4cb5a07 (2018-07-24 21:03 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, haswell)
```

This is a slowdown of more than 60%, and the function also performs many more allocations (137 vs. 15), even though the total memory estimate is almost unchanged. The regression can be reproduced with any value of `n`, including 1. Is this a known issue? I found other regressions in my package, but this one looks serious to me.