Symv producing incorrect results

Can anyone reproduce the following bug?

julia> A = randn(10,20) |> X -> X * X';

julia> x = rand(10);

julia> sA = Symmetric(A);

julia> all( A .== sA )
true

julia> (A * x)'
1×10 RowVector{Float64,Array{Float64,1}}:
 11.538  7.90415  -1.03592  -3.84546  15.5033  3.37954  7.7948  4.26935  6.29426  5.57295

julia> (sA * x)'
1×10 RowVector{Float64,Array{Float64,1}}:
 21.5629  15.7696  -4.14023  -10.6005  28.2271  5.50057  17.9761  6.60855  6.29426  5.57295

julia> Base.LinAlg.BLAS.symv('U', A, x)'
1×10 RowVector{Float64,Array{Float64,1}}:
 21.5629  15.7696  -4.14023  -10.6005  28.2271  5.50057  17.9761  6.60855  6.29426  5.57295

I do on both v0.6.1 and v0.7.0. Both are linked to a local BLAS build. Additionally, symv is slow:

julia> y = similar(x);

julia> @benchmark Base.LinAlg.BLAS.symv!('U', 1.0, $A, $x, 0.0, $y)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.014 μs (0.00% GC)
  median time:      1.278 μs (0.00% GC)
  mean time:        1.426 μs (0.00% GC)
  maximum time:     187.736 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

julia> @benchmark A_mul_B!($y, $A, $x)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     75.414 ns (0.00% GC)
  median time:      75.662 ns (0.00% GC)
  mean time:        77.367 ns (0.00% GC)
  maximum time:     144.184 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     971

On a laptop, with the JuliaPro v0.6.0.1 MKL version (Julia v0.6), I cannot reproduce the bug. Besides getting the correct answer, symv was also much faster – while A_mul_B took 200 ns, symv took only 150 ns.
(EDIT: v0.7.0 on my laptop also gets the correct answer / can’t reproduce the bug.)

Perhaps this is a clue? Is it calling the wrong function somehow?

Also, the last two elements of the vector match.

No.

Yes, symv is generally slower than gemv.

No

Sigh… Seems OpenBLAS, not Julia is at fault:

program symvtest
implicit none
integer, parameter :: n = 20, p = 12
real, dimension(n, p) :: Xb
real, dimension(p, p) :: A
real, dimension(p) :: x, y, ys

call random_number(Xb)
call random_number(x)

A = matmul( transpose(Xb), Xb )
y = matmul( A, x )
call ssymv('U', p, 1.0, A, p, x, 1, 1.0, ys, 1)

print *, 'Correct:', y
print *, 'ssymv:', ys

end program symvtest
$ gfortran-7.2 symvtest.f08 -o symvt2 -pthread /opt/OpenBLAS/lib/libopenblas.a

$ ./symvt2
 Correct:   21.2189522       18.4703217       23.1896820       17.4168549       24.2416267       19.9038563       19.7339249       18.0757847       21.3725090       25.3771706       24.4452839       22.0691833    
 ssymv:   34.6876221       30.6038113      -1435453.88       28.6646252       40.5542870       32.9165344      -1692865.25       30.3047485       21.3725128       25.3771706       24.4452820       22.0691833 

Reproducing the results, down to the first 8 answers being wrong (I get the same thing when changing 10 → 12 in the Julia example). For reference, OpenBLAS issue.

Yes, symv is generally slower than gemv.

symv is faster on my laptop at n=10 and n=2000, but perhaps that’s MKL.