Hello,
I’m using Surrogates.jl to build a surrogate model from data in a CSV file. The data has 24 input parameters and 1 output variable, with 500 samples in total: 80% of the samples are used for training and the remaining 20% for testing.
using CSV
using DataFrames
using Plots
using Surrogates
using SurrogatesPolyChaos
using StatsPlots
using Statistics
default()
using Dates
using LinearAlgebra
using Lathe.preprocess: TrainTestSplit
df = DataFrame(CSV.File("result.csv"))
train, test = TrainTestSplit(df, .8)
dim = 24
x_train = train[:, 1:dim]
x_train = values.(eachrow(x_train))
y_train = train[:, end]
# y_train = values.(eachrow(y_train))
x_test = test[:, 1:dim]
x_test = values.(eachrow(x_test))
y_test = test[:, end]
lower_bound = [
    200, 200, 200, 200, 200, 200,
    200, 200, 200, 200, 200, 200,
    180, 180, 180, 180, 180, 180,
    300, 300, 300, 300, 300, 300
]
upper_bound = [
    230, 230, 230, 230, 230, 230,
    275, 275, 275, 275, 275, 275,
    200, 200, 200, 200, 200, 200,
    350, 350, 350, 350, 350, 350
]
p = ones(dim) * 2
theta = [0.5 / max(1e-6 * norm(upper_bound .- lower_bound), std(x_i[i] for x_i in x_train))^p[i] for i in 1:length(x_train[1])]
mymodel = Kriging(x_train, y_train, lower_bound, upper_bound, p=p, theta=theta)
# Prediction
ys_test = mymodel.(x_test)
ys_train = mymodel.(x_train)
# Model assessment criteria
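# In all functions below, x is the prediction and y is the observation.
function mae(x, y)
    return sum(abs.(x .- y)) / length(x)
end
# Relative error must be element-wise: `/` between two vectors is a matrix division.
function mape(x, y)
    return sum(abs.((x .- y) ./ y)) / length(x)
end
function rmse(x, y)
    return sqrt(sum((x .- y).^2) / length(x))
end
function mse(x, y)
    return sum((x .- y).^2) / length(x)
end
function r2(x, y)
    sse = sum((x .- y).^2)          # residual sum of squares
    sst = sum((y .- mean(y)).^2)    # total sum of squares around the mean
    return 1 - sse / sst
end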
println(" ASSESSMENT CRITERIA TRAIN TEST")
println(" Mean Absolute Error ", mae(ys_train, y_train), " ", mae(ys_test, y_test))
println("Mean Absolute Percentage Error ", mape(ys_train, y_train), " ", mape(ys_test, y_test))
println(" Root Mean Square Error ", rmse(ys_train, y_train), " ", rmse(ys_test, y_test))
println(" Mean Square Error ", mse(ys_train, y_train), " ", mse(ys_test, y_test))
println(" R Square ", r2(ys_train, y_train), " ", r2(ys_test, y_test))
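To make the `theta` expression in the script easier to read, here is a minimal sketch that evaluates the same formula on a made-up 2-D sample. The names `xs`, `lb`, `ub`, and `floor_scale` and all the numbers are illustrative assumptions, not values from result.csv.

```julia
using LinearAlgebra, Statistics

# Hypothetical 2-D samples and bounds, only to show what the expression computes.
xs = [(1.0, 10.0), (3.0, 30.0)]
lb = [0.0, 0.0]
ub = [4.0, 40.0]
p  = [2.0, 2.0]

# Floor on the spread so that a constant input column cannot give theta = Inf.
floor_scale = 1e-6 * norm(ub .- lb)

# theta_i = 0.5 / max(floor_scale, std of input i)^p_i
theta = [0.5 / max(floor_scale, std(x[i] for x in xs))^p[i] for i in 1:length(xs[1])]

println(theta)  # roughly [0.25, 0.0025]
```

With this rule, an input with a larger spread gets a smaller `theta`, so the second input here ends up with a value about 100 times smaller than the first.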
           ASSESSMENT CRITERIA   TRAIN   TEST
           Mean Absolute Error   0.0     8.600517533587887
Mean Absolute Percentage Error   0.0     0.0032085499707193293
        Root Mean Square Error   0.0     10.959428095549947
             Mean Square Error   0.0     120.10906418152953
                      R Square   1.0     -0.018184443821864793
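As a reference point for reading the R Square row, here is a minimal sketch of the two extreme cases under the same definition as `r2` in the script: a prediction that is always the mean of the observations gives 0, and a prediction that reproduces the observations exactly gives 1. The vectors are made up for illustration and are not taken from the data.

```julia
using Statistics

# Same definition as r2 in the script: first argument is the prediction,
# second argument is the observation.
r2(pred, obs) = 1 - sum((pred .- obs).^2) / sum((obs .- mean(obs)).^2)

obs = [1.0, 2.0, 3.0, 4.0]                 # hypothetical observations
mean_pred  = fill(mean(obs), length(obs))  # baseline: always predict the mean
exact_pred = copy(obs)                     # prediction that matches every observation

println(r2(mean_pred, obs))   # 0.0
println(r2(exact_pred, obs))  # 1.0
```

Read this way, a test value slightly below zero means the test predictions are about as accurate as predicting the mean of `y_test` for every sample.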
The training errors are all zero, while on the test set R² is slightly negative (about -0.018), so the model appears to be overfitting. Can someone help me fix this? Thanks a lot!