Hi all,
We are building an ML application with MLJ and LightGBM to produce forecasts on complex time series, for a fairly critical business need that runs daily in production.
We have a blocking issue at the training stage of the model tuning: about one hundred models, each based on roughly 50k observations and 10 features, tuned over a relatively small hyper-parameter grid of 768 combinations.
The associated memory usage is far too high and brings down the servers (virtual machines).
It is already high for a single tuning, and it keeps growing rapidly across successive tuning batches (one per model).
We took care not to enable caching in MLJ (cache=false both in the machine and in the TunedModel definition).
To help with diagnostics and to ask for your help, here is a minimal working example with two tests that show the memory-usage problem.
- FIRST TEST: the script performs a single hyper-parameter search. RAM usage jumps by 9 GB.
- SECOND TEST: the script performs multiple batches of tunings. RAM usage jumps by 35 GB.
What explains such memory usage, both for a single tuning and for the sequence of tunings? Shouldn't most of the memory be released after each individual train and after each tuning? What am I missing here?
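(The memory numbers quoted below come from `free -h` on the VM. As a complementary in-process check, and assuming nothing about MLJ internals, we can also watch the Julia process's peak resident set size directly; `Sys.maxrss` is a Base function and the helper name below is just for illustration.)
# Peak resident set size of the Julia process, in GiB. Note that maxrss only ever
# grows, so it tracks the worst case rather than the current usage.
peak_rss_gib() = round(Sys.maxrss() / 1024^3, digits=2)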
# LOAD DEPENDENCIES ############################################################
using Pkg
Pkg.activate(".")
using MLJ
using Random
using DataFrames
using LightGBM
# My Project.toml file
# Status `~/Project.toml`
# [a93c6f00] DataFrames v1.6.1
# [7acf609c] LightGBM v0.6.2
# [add582a8] MLJ v0.20.3
# [03970b2e] MLJTuning v0.8.4
# [9a3f8284] Random
# With Julia 1.10.2 on Ubuntu 23.10
# SOME REPRESENTATIVE TEST DATA ################################################
# A ten-column DataFrame of regressors (continuous variables)
df = DataFrame(
    map(x -> rand(50_000), 1:10),
    ["x_$i" for i in 1:10]
)
# A target variable that is the sum of the columns plus Gaussian noise
DataFrames.transform!(df, AsTable(1:10) => (x -> sum(x) + randn(50_000)) => :y)
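# For scale (and assuming the real data is of comparable size): the raw table here
# is tiny relative to the RAM jumps reported below, about 11 Float64 columns ×
# 50_000 rows ≈ 4.4 MB, so the data itself is not what fills the memory.
Base.summarysize(df) # ≈ 4.4e6 bytes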
# PREPARATION OF THE MODEL, PIPELINE AND BASIC HYPER PARAMS TUNING STRATEGY ####
# We can pick a small grid or a more granular one via the grid resolution parameter
function prepare_gb_tuned_model(grid_resolution::Int64)
    # Get LightGBM.MLJInterface.LGBMRegressor
    Tree = LightGBM.MLJInterface.LGBMRegressor
    # Instantiate the model
    ml_model = Tree()
    # Pipeline definition
    pipe = OneHotEncoder(ordered_factor=false) |> ml_model # In my project I would have preprocessing in the pipeline (for categorical variables)
    # Ranges for the grid (LGBMRegressor hyper-parameters)
    ml_ranges = [
        range(pipe, :(lgbm_regressor.num_iterations), lower=50, upper=300),
        range(pipe, :(lgbm_regressor.max_depth), lower=5, upper=20),
        range(pipe, :(lgbm_regressor.feature_fraction), values=[0.7, 0.85, 1.0]),
        range(pipe, :(lgbm_regressor.learning_rate), lower=0.02, upper=0.2, scale=:log),
        range(pipe, :(lgbm_regressor.min_data_in_leaf), lower=10, upper=30, scale=:log),
    ]
    # Tuning strategy
    tuned_model = TunedModel(
        model=pipe,
        tuning=Grid(resolution=grid_resolution),
        ranges=ml_ranges,
        measure=mae,
        train_best=true,
        cache=false # to make sure I reduce the memory footprint
    )
    return tuned_model
end
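# Sanity check on the grid size: with resolution 4, each of the four numeric ranges
# gets 4 points and feature_fraction keeps its 3 explicit values, so one tuning
# trains on the order of 4 * 4 * 3 * 4 * 4 = 768 boosters (times the folds of the
# outer CV used in evaluate! below).
@assert 4 * 4 * 3 * 4 * 4 == 768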
mach = machine(
    prepare_gb_tuned_model(4), # A grid of 768 parameter sets
    select(df, Not(:y)),       # X: the 10 feature columns
    df.y,                      # y: the target
    cache=false # to make sure I reduce the memory footprint
)
# RUN THE TRAIN AND SEARCH FOR HYPER PARAMS ####################################
# FIRST TEST ===================================================================
# MEMORY STATUS
# free -h
# total used free shared buff/cache available
# Mem: 62Gi 8.0Gi 53Gi 2.0Gi 3.9Gi 54Gi
MLJ.evaluate!(
    mach,
    resampling=CV(nfolds=3),
    measure=mae,
    acceleration=CPU1(),
    verbosity=2,
    # Do not record the per-observation measurements
    per_observation=false # it may reduce the memory footprint
)
# MEMORY STATUS
# total used free shared buff/cache available
# Mem: 62Gi 17Gi 43Gi 2.0Gi 4.0Gi 45Gi
# => RAM usage jumped by ~9 GB
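# DIAGNOSTIC SKETCH (an assumption to try, not something we have run): force full
# garbage collections and, since this is glibc/Linux, ask the allocator to return
# freed pages to the OS, then re-run `free -h`. If "used" drops back, the memory is
# being retained by Julia's GC / libc rather than leaked by MLJ or LightGBM.
GC.gc(true)
GC.gc(true)
ccall(:malloc_trim, Cint, (Csize_t,), 0) # glibc-specific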
# SECOND TEST ==================================================================
# MEMORY STATUS
# free -h
# total used free shared buff/cache available
# Mem: 62Gi 17Gi 43Gi 2.0Gi 4.0Gi 45Gi
# => Repeat it 5 times with a map
outputs = map(
    x -> MLJ.evaluate!(
        mach,
        resampling=CV(nfolds=3),
        measure=mae,
        acceleration=CPU1(),
        verbosity=2,
        # Do not record the per-observation measurements
        per_observation=false # it may reduce the memory footprint
    ),
    1:5 # do it 5 times
)
# MEMORY STATUS
# free -h
# total used free shared buff/cache available
# Mem: 62Gi 52Gi 8.3Gi 2.0Gi 4.1Gi 9.6Gi
# => RAM usage jumped by ~35 GB
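# VARIANT TO TEST (a sketch under the assumption that state accumulating in the
# machine across refits is part of the problem, not a known fix): rebuild the
# machine inside the loop, drop the reference, and force a GC between batches.
outputs = map(1:5) do _
    m = machine(
        prepare_gb_tuned_model(4),
        select(df, Not(:y)),
        df.y,
        cache=false
    )
    e = MLJ.evaluate!(
        m,
        resampling=CV(nfolds=3),
        measure=mae,
        acceleration=CPU1(),
        verbosity=2,
        per_observation=false
    )
    m = nothing
    GC.gc(true)
    e
end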