How to train MLJ model on DataFrame, but apply on Vector of Arrays?

I train my MLJ model (EvoTrees.jl) on a Julia DataFrame with the scalars x1, x2, and x3 as features and y as the label. At inference time, however, I have higher-dimensional arrays instead of tabular data. Any suggestions on how to code this in a nice way?

using MLJ, DataFrames

df = DataFrame()
# input features are the nearest-neighbor values from a grid
df.x1 = rand(100)
df.x2 = rand(100)
df.x3 = rand(100)
# target is measured at a point within the grid
df.y = rand(100)

ETR = MLJ.@load EvoTreeRegressor pkg=EvoTrees
evotree = ETR()
mach = machine(evotree, df[:, Not(:y)], df.y)
fit!(mach)

######## inference ########
X1_inf = rand(100, 100, 10)
X2_inf = rand(100, 100, 10)
X3_inf = rand(100, 100, 10)

# This is what I want to do. I can come up with ways that work
# but feel wrong, like three nested for loops and casting,
# but how do I do it in a smart way?
Y_inf = MLJ.predict(mach, [X1_inf, X2_inf, X3_inf])

The simplest way would be to train on arrays instead of a DataFrame and then broadcast, or, at inference time, to loop over all indices and cast the data to a DataFrame. But is there a smarter way?
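
For example, the loop version I have in mind would look roughly like this (an untested sketch, using the machine and arrays defined above, with a single eachindex loop instead of three nested ones):

# rough sketch of the "loop over all indices and cast to a DataFrame" approach
Y_inf = similar(X1_inf)
for i in eachindex(X1_inf, X2_inf, X3_inf)
    # build a one-row DataFrame with the training column names
    row = DataFrame(x1 = [X1_inf[i]], x2 = [X2_inf[i]], x3 = [X3_inf[i]])
    Y_inf[i] = only(MLJ.predict(mach, row))
end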

For context, in case it helps: I have a gridded climate model (latitude, longitude, height) and a plane trajectory going through the grid. For training, I take the weather (e.g. temperature, air pressure, and wind direction) at the nearest neighbor on the grid as features and the measurements from the plane (e.g. relative humidity) as the target, so the data is tabular. For inference, I want to feed in all grid points of the weather model and get a grid back out.

It seems that predict already works on plain arrays, i.e.,

julia> MLJ.predict(mach, rand(4, 3))  # Just pass 3 columns instead of a data frame
4-element Vector{Float32}:
 0.354614
 0.889316
 0.60572267
 0.23449774

Given that, you have several options for running over the grid:

  1. Apply the model elementwise and collect the results:
    julia> @time stack((x1,x2,x3) -> only(MLJ.predict(mach, [x1 x2 x3])), X1_inf, X2_inf, X3_inf) |> size
    18.615153 seconds (65.36 M allocations: 6.271 GiB, 4.73% gc time, 0.48% compilation time)
    (100, 100, 10)
    
    This is rather slow, though.
  2. Create a suitable array, e.g., via reshaping:
    function predict_grid(mach, x1, x2, x3)
        s = size(x1)
        @assert s == size(x2) == size(x3)
        # flatten the three grids into a (prod(s), 3) feature matrix
        x = reshape(stack([x1, x2, x3]), prod(s), 3)
        # predict on the matrix and reshape back to the original grid shape
        reshape(MLJ.predict(mach, x), s...)
    end
    
    which runs much faster (a column-table variant is also sketched below this list):
    julia> @time predict_grid(mach, X1_inf, X2_inf, X3_inf)  |> size
    0.235947 seconds (664 allocations: 3.402 MiB)
    (100, 100, 10)
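
A variant of option 2 (an untested sketch; the name predict_grid_table is made up here): since predict accepts any Tables.jl-compatible table, you could also pass a NamedTuple of vec'd arrays, which keeps the same column names x1, x2, x3 that the machine was trained on instead of handing it an anonymous matrix:

    function predict_grid_table(mach, x1, x2, x3)
        s = size(x1)
        @assert s == size(x2) == size(x3)
        # a NamedTuple of vectors is a valid Tables.jl column table;
        # vec returns a reshaped view, so the grids are not copied here
        x = (x1 = vec(x1), x2 = vec(x2), x3 = vec(x3))
        # predict on the table and reshape back to the grid shape
        reshape(MLJ.predict(mach, x), s)
    end

Performance should be in the same ballpark as option 2; the difference is only in how the data is handed to predict.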
    

@bertschi @brandon698sherrick
Thanks a lot for the help. I will reshape my data in a separate function.