I train my MLJ model (EvoTrees.jl) on a Julia DataFrame with the scalars x1, x2, and x3 as features and y as the label. However, at inference time I have higher-dimensional arrays instead of tabular data. Any suggestions on how to code this nicely?
using MLJ
using DataFrames

df = DataFrame()
# features: nearest-neighbor values taken from a grid
df.x1 = rand(100)
df.x2 = rand(100)
df.x3 = rand(100)
# target: measured at a point within the grid
df.y = rand(100)
ETR = MLJ.@load EvoTreeRegressor pkg=EvoTrees
evotree = ETR()
mach = machine(evotree, df[:, Not(:y)], df.y)
fit!(mach)
######## inference ########
X1_inf = rand(100, 100, 10)
X2_inf = rand(100, 100, 10)
X3_inf = rand(100, 100, 10)
# this is what I want to do;
# I can come up with ways that work but feel wrong,
# like three nested for loops and casting, but how can I do it in a smart way?
Y_inf = MLJ.predict(mach, [X1_inf, X2_inf, X3_inf])
The simplest way would be to train on arrays instead of a DataFrame and then broadcast, or, during inference, to loop over all indices and cast the result to a DataFrame (roughly the sketch below). But is there a smarter way?
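To make that concrete, here is roughly the kind of workaround I mean, assuming the inference table has to carry the same column names x1, x2, x3 used in training (the variable names here are just illustrative):

# flatten the grids into one row per grid point, predict, then reshape back
rows = [(x1 = X1_inf[i], x2 = X2_inf[i], x3 = X3_inf[i]) for i in eachindex(X1_inf)]
df_inf = DataFrame(rows)                      # 100*100*10 rows, columns x1, x2, x3
Y_inf = reshape(MLJ.predict(mach, df_inf), size(X1_inf))

Linear indexing and reshape both follow Julia's column-major order, so Y_inf[i, j, k] should line up with X1_inf[i, j, k]. It works, but it still feels like fighting the tabular interface.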
For context, if it helps: I have a gridded climate model (latitude, longitude, height) and a plane trajectory going through the grid. For training, I take the weather (e.g. temperature, air pressure, and wind direction) at the nearest grid neighbor as features and the measurements from the plane (e.g. relative humidity) as the target, so that is tabular data. For inference, I want to feed in all grid points of the weather model and get a grid back out.