Using Flux, I would like to train a regression model with both continuous and categorical features. Since the categorical variable(s) may have many levels, I would prefer to map them into a few new continuous variables (a learned embedding) before concatenating them with the remaining continuous features. Is there a better or simpler way to achieve this than the code example below? Any comments on the code are welcome.
using Flux
n = 10_000
x = vcat(rand(1:10, 1, n), rand(Float32, 5, n)) # 1st row categorical with 10 levels
y = rand(Float32, n)
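# Assumes Flux >= 0.11, where DataLoader takes one (possibly nested) tuple: ((onehot, continuous), y)
trdata = Flux.Data.DataLoader(((Flux.onehotbatch(x[1,:], 1:10), x[2:end,:]), y),
                              batchsize = 100)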
function create_model(embedding, main_model)
    return function(x)
        x1 = embedding(x[1])
        x2 = cat(x1, x[2], dims=1)
        return main_model(x2)
    end, params(embedding, main_model)
end
m, prm = create_model(Dense(10,3), Chain(Dense(8,5), Dense(5,1)))
loss(x, y) = Flux.mse(vec(m(x)), y) # m(x) is 1×batch; vec makes it match the vector y instead of broadcasting
@time Flux.train!(loss, prm, collect(trdata), ADAM())
## 2nd time: 0.028417 seconds (52.83 k allocations: 24.259 MiB, 33.36% gc time)
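
As a side note on what the embedding step computes: applying a linear layer to a one-hot column amounts to picking one column of its weight matrix, so the `Dense(10,3)` above acts like a lookup table with one 3-dimensional vector per level (plus the bias that `Dense` adds). The following is a minimal sketch in plain Julia, without Flux. The names `W`, `levels`, `onehot`, `via_matmul` and `via_lookup` are made up for illustration, and `W` is only a stand-in for the layer's weights, not taken from a trained model.

```julia
# Hypothetical sizes matching the question: 10 levels mapped to 3 dimensions.
n_levels, emb_dim = 10, 3

# Stand-in for the weight matrix of a bias-free linear layer (emb_dim × n_levels).
W = reshape(Float32.(1:(emb_dim * n_levels)), emb_dim, n_levels)

# A small batch of categorical levels, one-hot encoded by hand.
levels = [3, 7, 3, 10]
onehot = Float32.((1:n_levels) .== permutedims(levels))  # n_levels × batch

via_matmul = W * onehot    # linear layer applied to the one-hot input
via_lookup = W[:, levels]  # plain column indexing, i.e. a table lookup

@assert via_matmul == via_lookup
```

If this reading is right, indexing into the weights directly would avoid building the one-hot matrix at all, which may matter when the number of levels is large.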