You’re missing a . before the + in the last line (it should be .+ y), and you’re not interpolating the variables with $:
julia> @btime $y .= $a.*$x .+ $y
12.048 ms (0 allocations: 0 bytes)
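
To see why the missing dot matters, here is a minimal sketch that compares the fully dotted update with the version that uses a plain +. The helper names fused! and unfused!, the array size, and the value of a are all made up for illustration; they are not from the original benchmark. It uses Base's @allocated instead of @btime so that it runs without BenchmarkTools.

```julia
# Hypothetical helpers for illustration only (names are assumptions).
# Every operator dotted: the whole right-hand side fuses into one in-place loop.
fused!(y, a, x) = (y .= a .* x .+ y; y)
# Plain `+` breaks fusion: `a .* x` and the sum are materialized as temporary arrays.
unfused!(y, a, x) = (y .= a .* x + y; y)

# Illustrative inputs (assumed sizes and values).
n = 10^6
a = 2.0
x = rand(n)
y = rand(n)

# Call each once first so compilation is not counted in the measurement.
fused!(y, a, x)
unfused!(y, a, x)

println("fused:   ", @allocated(fused!(y, a, x)), " bytes")
println("unfused: ", @allocated(unfused!(y, a, x)), " bytes")
```

The unfused version should report allocations on the order of the array size, while the fused one should report little or nothing. With @btime, the $ interpolation serves a related purpose: it keeps the cost of accessing non-constant globals out of the measurement.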
I don’t think it’s a fair comparison to use threads in Julia. What are the timings without using threads?