Right way of applying `inverse_transform`

ctrebbau · June 21, 2022, 1:19pm

Hi, I have a setup like this:

dftrain, dftest = partition(df, 0.7, shuffle=true, rng=123)

datapipe = ContinuousEncoder() |> Standardizer()
datatrans_mach = machine(datapipe, dftrain) |> fit!
normalized_train = MLJ.transform(datatrans_mach, dftrain)
normalized_test = MLJ.transform(datatrans_mach, dftest)

normalizer = fitted_params(datatrans_mach).machines[2]

ytrain, Xtrain = normalized_train.target, select(normalized_train, Not(:target))
ytest, Xtest = normalized_test.target, select(normalized_test, Not(:target))

knn = KNNRegressor()
knnM = machine(knn, Xtrain, ytrain) |> fit!

All well and good, but when I do

predict(knn, inverse_transform(normalizer, Xtest))

I get
ERROR: Attempting to transform data with incompatible feature labels.
Since the Standardizer was trained on dftrain, which includes the target, I then tried

predict(knn, inverse_transform(normalizer, hcat(Xtest, ytest)))

But that is evidently not it, since I get a more fundamental incompatibility:
ERROR: ArgumentError: dimension of input points:44 and tree data:43 must agree

And if I first predict and then inverse transform, i.e.

inverse_transform(normalizer, predict(knn, Xtest))

I get ERROR: type Nothing has no field names

So, how can I get the predictions on the original scale?

svilupp · June 22, 2022, 12:51pm

What are you hoping to do?

If I understood your example, Xtrain is your normalized training dataset and Xtest is your normalized testing dataset.
So if you trained your model on Xtrain, you should be able to call predict() on Xtest like this:

# get predictions for your test dataset
ytest_hat=predict(knnM, Xtest)

It is possible that your error is simply a typo, because your fitted machine is called knnM whereas your predict() call is against knn.

In general, would it be possible for you to change your workflow and separate your X and y early on (à la Common MLJ Workflows)?

That way your target transformations would be separate and would be easy to debug if you have any problems. I’d argue it’s the more common way, because there are transforms that could introduce leakage from target into your features, so you tend to separate those early on.

E.g., changing your code to:

y, X = unpack(df, ==(:target), rng=123);
(Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.7, shuffle=true, multi=true, rng=123)

datapipe = ContinuousEncoder() |> Standardizer()
datatrans_mach = machine(datapipe, Xtrain) |> fit!

normalized_train = MLJ.transform(datatrans_mach, Xtrain)
normalized_test = MLJ.transform(datatrans_mach, Xtest);

knn = KNNRegressor()
knnM = machine(knn, normalized_train, ytrain) |> fit!

# out of sample predictions that you can evaluate performance on
ytest_hat=predict(knnM, normalized_test)
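
With the target kept out of the pipeline like this, ytest_hat is already in the original units, so no inverse_transform is needed. If you do standardize the target as well, mapping predictions back is just the reverse of the standardization arithmetic. Here is a minimal hand-rolled sketch using only the Statistics standard library, not MLJ's own API; the data, the names mu, sigma, standardize and unstandardize, and the predicted values are all made up for illustration:

```julia
using Statistics  # standard library: mean, std

# Hypothetical target values standing in for ytrain
ytrain = [10.0, 12.0, 14.0, 16.0, 18.0]

# Parameters learned from the training target only
mu = mean(ytrain)
sigma = std(ytrain)

# Forward transform: what standardizing a single column amounts to
standardize(y) = (y .- mu) ./ sigma

# Inverse transform: maps standardized values back to original units
unstandardize(z) = z .* sigma .+ mu

ztrain = standardize(ytrain)

# Hypothetical standardized predictions from a model trained on ztrain
zhat = [-0.5, 0.0, 0.5]

# Predictions on the original scale
yhat = unstandardize(zhat)
```

The point is that the inverse has to use the mean and standard deviation of the target column alone, which is why a standardizer fitted on the whole table (features plus target) cannot be applied directly to a bare vector of predictions.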

samuel_okon · June 23, 2022, 3:36pm

@ctrebbau I’m not sure I get your question, but check whether the following code does what you want. You can use whatever workflow you wish, but using the MLJ workflows, as @svilupp pointed out, makes things easier conceptually.

# This assumes that the name of your target feature is `:target`
# You can replace this with the actual name of your target feature
knnp = (X -> select(X, Not(:target))) |> KNNRegressor
knnM = machine(knnp, normalized_train, ytrain) |> fit!
predict(knnM, inverse_transform(normalizer, normalized_test))

ctrebbau · June 25, 2022, 6:27pm

Hi, thank you for your prompt help; sorry for my late response. I’m sorry I wasn’t able to explain myself more clearly. I’ve adhered more closely to the more standard workflow, separating target and features earlier, even before normalizing, and I’m happy to report I’m getting more sensible predictions now.
