Problem standardizing data with MLJ + NaN predictions in Flux

I’m new to Julia and Flux, so I suspect this may be something obvious that I’m missing…

I was trying to do some pre-processing in order to train a neural network using Flux and got stuck on two problems:

  1. The predictions of the neural network were all NaN. I suspected that the parameters were not being updated, but it turns out they were; even so, I still get only NaNs.

  2. The Standardizer() transformer from MLJ is not able to apply inverse_transform to the normalized arrays during the evaluation of the NN; it raises LoadError: type Nothing has no field names (see the stack trace below). The odd thing is that the forward transformation seems to have been carried out as expected when the values of the normalized DataFrames are printed.

I also had problems with Standardizer() not scaling the Int64 features in the DataFrame, which I resolved by just converting them manually (the convert.(Float64, …) loop inside preprocess below). I wonder if there’s a better way to do it.
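From what I can tell from the docs (so take this as an assumption on my part), Standardizer only standardizes features with the Continuous scientific type, and Int64 columns carry the Count scitype, so coercing scitypes might be cleaner than converting machine types by hand:

using MLJ  # coerce is re-exported from ScientificTypes

# treat every integer (Count) column as Continuous so Standardizer will scale it
x = coerce(x, Count => Continuous)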

The data was downloaded from this Kaggle dataset page

The whole script that reproduces these two problems is pasted below:

using Flux
using CSV
using DataFrames
using MLJ
using ScientificTypes
using Tables


function preprocess(x, y; train_size = 0.8)

    columns_to_delete = [
        "FLUVENTS",
        "DYSTROPEPTS",
        "ORTHENTS",
        "UDALFS",
        "USTALFS",
    ]

    select!(x, Not(columns_to_delete))

    # Standardizer ignores Int64 (Count) columns, so convert everything to Float64
    for feature in names(x)
        x[!, feature] = convert.(Float64, x[:, feature])
    end

    limit = trunc(Int64, size(x, 1) * train_size)

    xtrain = x[begin:limit, :]
    xtest  = x[limit+1:end, :]

    ytrain = y[begin:limit, :]
    ytest  = y[limit+1:end, :]


    xtrain_std_mach = machine(Standardizer(), xtrain)
    ytrain_std_mach = machine(Standardizer(), ytrain)

    xtest_std_mach  = machine(Standardizer(), xtest)
    ytest_std_mach  = machine(Standardizer(), ytest)


    norm_xtrain = MLJ.transform(fit!(xtrain_std_mach), xtrain)
    norm_xtest  = MLJ.transform(fit!(xtest_std_mach), xtest)

    norm_ytrain = MLJ.transform(fit!(ytrain_std_mach), ytrain)
    norm_ytest  = MLJ.transform(fit!(ytest_std_mach), ytest)


    # Flux expects arrays of shape (features, observations), hence the transposes
    norm_xtrain = Array(norm_xtrain)'
    norm_xtest  = Array(norm_xtest)'
    norm_ytrain = Array(norm_ytrain)'
    norm_ytest  = Array(norm_ytest)'


    return norm_xtrain, norm_ytrain, norm_xtest, norm_ytest, ytest_std_mach
end


function train(xtrain, ytrain; epochs::Int64)

    dataloader = Flux.DataLoader((xtrain, ytrain), batchsize = 4, shuffle = true)

    model = Chain(
        Dense(16, 100, relu),
        Dense(100, 100, relu),
        Dense(100, 1, sigmoid),
    )

    loss(x, y) = Flux.Losses.mse(model(x), y)
    optimizer = Flux.ADAM()

    for current_epoch in 1:epochs
        println("Epoch $current_epoch/$epochs")
        Flux.train!(loss, Flux.params(model), dataloader, optimizer)
    end

    return model
end


function evaluate(model, xtest, ytest, target_scaler)

    norm_preds = model(xtest) # all preds are NaN

    real = inverse_transform(target_scaler, ytest) # can't inverse transform data
    pred = inverse_transform(target_scaler, norm_preds) 

    println(Flux.Losses.mse(pred, real))
end


function main()
    x = CSV.read("./X1.csv", DataFrame)
    y = CSV.read("./y1.csv", DataFrame)

    xtrain, ytrain, xtest, ytest, target_scaler = preprocess(x, y)

    model = train(xtrain, ytrain, epochs = 15)

    evaluate(model, xtest, ytest, target_scaler)

    return 
end

main()

The corresponding stack trace:

[ Info: Training Machine{Standardizer,…}.
[ Info: Training Machine{Standardizer,…}.
┌ Warning: Extremely small standard deviation encountered in standardization.
└ @ MLJModels ~/.julia/packages/MLJModels/4sRmw/src/builtins/Transformers.jl:500
┌ Warning: Extremely small standard deviation encountered in standardization.
└ @ MLJModels ~/.julia/packages/MLJModels/4sRmw/src/builtins/Transformers.jl:500
┌ Warning: Extremely small standard deviation encountered in standardization.
└ @ MLJModels ~/.julia/packages/MLJModels/4sRmw/src/builtins/Transformers.jl:500
┌ Warning: Extremely small standard deviation encountered in standardization.
└ @ MLJModels ~/.julia/packages/MLJModels/4sRmw/src/builtins/Transformers.jl:500
[ Info: Training Machine{Standardizer,…}.
[ Info: Training Machine{Standardizer,…}.
ERROR: LoadError: type Nothing has no field names
Stacktrace:
  [1] getproperty
    @ ./Base.jl:42 [inlined]
  [2] _standardize
    @ ~/.julia/packages/MLJModels/4sRmw/src/builtins/Transformers.jl:876 [inlined]
  [3] inverse_transform(#unused#::Standardizer, fitresult::NamedTuple{(:is_univariate, :is_invertible, :fitresult_given_feature), Tuple{Bool, Bool, Dict{Symbol, Tuple{Float64, Float64}}}}, X::LinearAlgebra.Adjoint{Float64, Matrix{Float64}})
    @ MLJModels ~/.julia/packages/MLJModels/4sRmw/src/builtins/Transformers.jl:861
  [4] inverse_transform(mach::Machine{Standardizer, true}, Xraw::LinearAlgebra.Adjoint{Float64, Matrix{Float64}})
    @ MLJBase ~/.julia/packages/MLJBase/u6vLz/src/operations.jl:88
  [5] evaluate(model::Chain{Tuple{Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(σ), Matrix{Float32}, Vector{Float32}}}}, xtest::LinearAlgebra.Adjoint{Float64, Matrix{Float64}}, ytest::LinearAlgebra.Adjoint{Float64, Matrix{Float64}}, target_scaler::Machine{Standardizer, true})
    @ Main ~/Documents/projetos/minicurso-ML/texte/model.jl:84
  [6] main()
    @ Main ~/Documents/projetos/minicurso-ML/texte/model.jl:99
  [7] top-level scope
    @ ~/Documents/projetos/minicurso-ML/texte/model.jl:104
  [8] include
    @ ./client.jl:451 [inlined]
  [9] top-level scope
    @ ./timing.jl:210 [inlined]
 [10] top-level scope
    @ ./REPL[11]:0
 [11] top-level scope
    @ ~/.julia/packages/CUDA/YpW0k/src/initialization.jl:52
in expression starting at /var/home/enzo/Documents/projetos/minicurso-ML/texte/model.jl:104

Some info about the package versions:

julia> versioninfo()
Julia Version 1.7.0-beta4.2
Commit d0c90f37ba (2021-08-24 12:35 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)

(@v1.7) pkg> status
      Status `~/.julia/environments/v1.7/Project.toml`
  [27a7e980] Animations v0.4.1
  [6e4b80f9] BenchmarkTools v1.2.0
  [336ed68f] CSV v0.9.10
  [5ae59095] Colors v0.12.8
  [a93c6f00] DataFrames v1.2.2
  [864edb3b] DataStructures v0.18.10
  [b4f34e82] Distances v0.10.6
  [587475ba] Flux v0.12.8
  [78b212ba] Javis v0.7.1
  [add582a8] MLJ v0.16.11
  [91a5bcdd] Plots v1.23.4
  [321657f4] ScientificTypes v2.3.3
  [bd369af6] Tables v1.6.0

Any thoughts on what I may be missing or suggestions on how this workflow could be improved?

Thanks a lot in advance!!

@lfenzo Thanks for reporting. I may be able to help out with the MLJ Standardizer, but it would help if you could construct a minimal working example for me, with an explicit small dataset, that trips the inverse_transform call in your evaluate function. You can construct synthetic data using one of the tools documented here if that helps.
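For example, if I recall the API correctly, something like

X, y = make_regression(100, 16)  # 100 rows, 16 Continuous features, Continuous target

gives you a small table-plus-target pair to experiment with.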

On an unrelated point, I may misunderstand, but you appear to be normalising your test data using a scaling learned from the test data, instead of applying the scaling learned from the train data to both train and test data, which would be better data hygiene.

That is, instead of

norm_xtrain = MLJ.transform(fit!(xtrain_std_mach), xtrain)
norm_xtest  = MLJ.transform(fit!(xtest_std_mach), xtest)

shouldn’t it be

fit!(xtrain_std_mach)
norm_xtrain = MLJ.transform(xtrain_std_mach, xtrain)
norm_xtest  = MLJ.transform(xtrain_std_mach, xtest)

?

@ablaom Thanks for such a quick reply!

Unfortunately I wasn’t able to reproduce this behavior without manually selecting the training instances from this particular dataset.

The good thing, though, is that I managed to solve problem #1 by simply not scaling all the features in x. I took a look at describe(x) (something I should have done earlier) and noticed that most of the features already had values ranging from 0 to 1; somehow the dataset had already been preprocessed in that sense. Since NaN values propagate through operations, I believe what happened is that this “second normalization” divided near-constant columns by a standard deviation of (almost) zero (hence the warnings in the stack trace), introducing NaNs into some features of norm_xtest and turning every prediction into NaN once the matrix multiplications of inference were applied.
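To illustrate the mechanism (a minimal sketch, not the actual dataset): standardizing a constant column means dividing by a standard deviation of zero, which yields NaN for every entry, and a single NaN then contaminates every downstream matrix product:

using Statistics

col = fill(0.5, 10)                 # a constant feature
z = (col .- mean(col)) ./ std(col)  # std(col) == 0, so z is all NaN

W = rand(3, 10)
W * z                               # NaN propagates: the result is all NaN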

Now the problem seems a little easier: converting the transposed arrays created during preprocessing back to DataFrames so that I can inverse_transform them inside evaluate() and get the predictions back in the original scale (something I haven’t figured out how to do yet). Is there a better way to do this whole evaluation process?
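For the record, here is roughly what I have in mind (a hypothetical sketch; :target is a placeholder, and the column name would have to match the one the machine was fit on in y1.csv):

using DataFrames

# norm_preds is the 1×N matrix returned by model(xtest);
# permutedims turns it into the N×1 shape a DataFrame expects
pred_table = DataFrame(permutedims(norm_preds), [:target])

pred = MLJ.inverse_transform(target_scaler, pred_table)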

On an unrelated point, I may misunderstand, but you appear to be normalising your test data using a scaling learned from the test data, instead of applying the scaling learned from the train data to both train and test data, which would be better data hygiene.
That is, instead of (…) shouldn’t it be (…) ?

Yes! As I was copying and pasting I didn’t pay attention to this and it was left untouched (kind of embarrassing…). Thanks for spotting that!

Thank you!

You could try using MLJ for the entire workflow via MLJFlux and MLJ’s built-in evaluate! apparatus.

It looks like you’re doing regression, right? Then you can check out this Boston dataset example. In your case, the builder is defined by this code:

builder = MLJFlux.@builder Chain(
    Dense(n_in, 100, relu),
    Dense(100, 100, relu),
    Dense(100, 1, sigmoid),
)
The example also shows how to incorporate standardisation using an MLJ @pipeline.
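Roughly along these lines (a sketch only; the hyperparameter values are illustrative and assume y1.csv holds a single target column):

using MLJ, MLJFlux

NeuralNetworkRegressor = @load NeuralNetworkRegressor pkg=MLJFlux

model = NeuralNetworkRegressor(builder = builder, epochs = 15, batch_size = 4)

# standardize the inputs, and the target via the `target` keyword
pipe = @pipeline Standardizer model target = Standardizer()

mach = machine(pipe, x, vec(Matrix(y)))  # y needs to be a vector for regression
evaluate!(mach, resampling = Holdout(fraction_train = 0.8), measure = mae)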

The same repos also have classification examples.

If you’re using very large datasets (e.g., images) that do not fit into memory, then MLJFlux is not currently an option. You might check out the FastAI.jl project, which is specific to “deep learning” (MLJ is multiparadigm).