What's wrong with this Flux model definition?

I am on the latest master branch of Flux.jl

using Flux
using CuArrays
CuArrays.allowscalar(false)

x = gpu(rand(Float32, 1_000_000, 2))
y = x*gpu([2, 2]) + gpu(rand(Float32, 1_000_000))

w = gpu(rand(2, 1))
loss(x, y) = Flux.mse(x*w,y)

Flux.train!(loss, w, ((x,y),), ADAM())

I am trying to train a simple linear model, but I keep getting a scalar indexing error:

ERROR: scalar getindex is disallowed
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] assertscalar(::String) at /home/zj/.julia/packages/GPUArrays/0lvhc/src/indexing.jl:14
 [3] getindex at /home/zj/.julia/packages/GPUArrays/0lvhc/src/indexing.jl:54 [inlined]
 [4] iterate at ./abstractarray.jl:914 [inlined]
 [5] iterate(::CuArray{Float32,2,Nothing}) at ./abstractarray.jl:912
 [6] Params at /home/zj/.julia/packages/Zygote/8dVxG/src/compiler/interface.jl:73 [inlined]
 [7] #train!#12(::Flux.Optimise.var"#16#22", ::typeof(Flux.Optimise.train!), ::Function, ::CuArray{Float32,2,Nothing}, ::Tuple{Tuple{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}, ::ADAM) at /home/zj/.julia/dev/Flux/src/optimise/train.jl:64
 [8] train!(::Function, ::CuArray{Float32,2,Nothing}, ::Tuple{Tuple{CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}}}, ::ADAM) at /home/zj/.julia/dev/Flux/src/optimise/train.jl:64
 [9] top-level scope at REPL[26]:100:

It seems to be saying that iterate is the issue, but where is it being called? Is it iterating through x? How do I debug this?

The error seems to come from not wrapping w in params, so this works:

using Flux
using CuArrays
CuArrays.allowscalar(false)

x = gpu(rand(Float32, 1_000_000, 2))
y = x*gpu([2, 2]) + gpu(rand(Float32, 1_000_000))

w = gpu(rand(Float64, 2, 1))
loss(x, y) = Flux.mse(x*w,y) |> gpu

Flux.train!(loss, params(w), ((x,y),), ADAM())
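
As for where iterate comes from: frames [5] to [7] of the stack trace suggest that train! hands its second argument to Params, which iterates over whatever it receives. My reading (an assumption, not something I checked in the Flux source) is that the iteration is over w rather than x, and that iterating a bare array visits its scalar elements, which is exactly what allowscalar(false) forbids on the GPU. A minimal plain-Julia sketch of the difference, using a CPU array as a stand-in for w and a tuple as a stand-in for the collection that params(w) builds:

```julia
# CPU stand-in for the weight matrix; no Flux or GPU is needed for this sketch.
w = rand(Float32, 2, 1)

# Iterating the bare array yields its scalar elements one at a time.
# On a CuArray, each of these steps would be a scalar getindex.
bare_first = first(w)
bare_count = length(w)

# Iterating a collection that contains the array yields the array itself.
# The tuple here is only an illustration of what wrapping in params achieves.
wrapped = (w,)
wrapped_first = first(wrapped)
wrapped_count = length(wrapped)

println(typeof(bare_first))     # element type of w
println(wrapped_first === w)    # the array comes back as a single item
```

So a quick way to read this kind of trace is to look for the first frame above iterate that belongs to Flux or Zygote and check which of your arguments ends up there.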