Using Flux, I would like to train a regression model with both continuous and categorical features. Since the categorical variable(s) may have many levels, I would prefer to map them into a few new continuous variables before they are concatenated with the remaining continuous features. Are there better or simpler ways to achieve this than the code example below? Any comments on the code are welcome.
using Flux

n = 10_000
x = vcat(rand(1:10, 1, n), rand(Float32, 5, n)) # 1st row categorical with 10 levels
y = rand(Float32, 1, n) # 1×n so the targets match the 1×batch model output in Flux.mse

# Inputs are the pair (one-hot categorical, continuous features); targets are y
trdata = Flux.Data.DataLoader((Flux.onehotbatch(x[1,:], 1:10), x[2:end,:]), y, batchsize = 100)

function create_model(embedding, main_model)
    return function(x)
        x1 = embedding(x[1])         # embed the one-hot categorical part
        x2 = cat(x1, x[2], dims=1)   # append the continuous features
        return main_model(x2)
    end, params(embedding, main_model)
end

m, prm = create_model(Dense(10,3), Chain(Dense(8,5), Dense(5,1)))
loss(x, y) = Flux.mse(m(x), y)

@time Flux.train!(loss, prm, collect(trdata), ADAM())
## 2nd time: 0.028417 seconds (52.83 k allocations: 24.259 MiB, 33.36% gc time)
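
For reference, here is what I understand the embedding step to be doing. Ignoring the bias, applying `Dense(10,3)` to a one-hot column should be the same as picking one column of its 10×3-transposed weight matrix, so the one-hot matrix may not be strictly needed. Below is a minimal sketch in plain Julia (no Flux), assuming hypothetical sizes that mirror the example above; the names `W`, `levels` and `x_cont` are illustrative only and not part of any Flux API.

```julia
# Hypothetical sizes mirroring the post: 10 levels mapped to 3 dimensions.
n_levels, emb_dim, n_obs = 10, 3, 6

W = rand(Float32, emb_dim, n_levels)   # stands in for the weight matrix of Dense(10, 3)
levels = rand(1:n_levels, n_obs)       # integer-coded categorical feature

# One-hot encode by hand: column j has a single 1 in row levels[j].
onehot = zeros(Float32, n_levels, n_obs)
for (j, l) in enumerate(levels)
    onehot[l, j] = 1f0
end

via_matmul = W * onehot     # what a bias-free dense layer with identity activation computes
via_lookup = W[:, levels]   # plain column lookup, no one-hot matrix required

# Concatenate the 3 embedded dimensions with 5 continuous features, as in the model above.
x_cont = rand(Float32, 5, n_obs)
features = vcat(via_lookup, x_cont)   # 8 × n_obs input for the main model
```

If that reading is right, both routes give the same 3×n_obs matrix, and the lookup avoids materialising the one-hot matrix when the number of levels is large.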