Hi,
I am seeing different training behaviour on GPU and on CPU. On the GPU, the loss goes completely off, sometimes within a single call to the training function. On the CPU, it converges and stays there; it does not drift away afterwards.
What I also don't understand: when I call train2() again, training starts close to where the previous run ended. It should not, because as far as I can tell the variables are not global.
On the CPU I can run it as many times as I want and the result is always similar to the first one: each run starts from different values and then converges somewhere. On the GPU it just goes off. Here are three consecutive runs on the GPU:
julia> m, argsout, resout = train2(tr_data, te_data, modelfct;...);
[ Info: Training on GPU
[ Info: Dataset: 1600 train and 400 test examples
[ Info: Model: 1185 trainable params. Chain(Dense(1, 64, relu), Dense(64, 16, relu), Dropout(0.5), Dense(16, 1), #879)
[ Info: Start Training . . .
Epoch: 0 Train: 1.0487 Test: 43.2849
Epoch: 30 Train: 1.1291 Test: 1.1006
Epoch: 60 Train: 1.1291 Test: 1.1006
Epoch: 90 Train: 1.1291 Test: 1.1006
Epoch: 120 Train: 1.1291 Test: 1.1006
Epoch: 150 Train: 1.1291 Test: 1.1006
julia> m, argsout, resout = train2(tr_data, te_data, modelfct;...);
[ Info: Training on GPU
[ Info: Dataset: 1600 train and 400 test examples
[ Info: Model: 1185 trainable params. Chain(Dense(1, 64, relu), Dense(64, 16, relu), Dropout(0.5), Dense(16, 1), #879)
[ Info: Start Training . . .
Epoch: 0 Train: 1.1291 Test: 1.1006
Epoch: 30 Train: 1.1291 Test: 1.1006
Epoch: 60 Train: 1.1291 Test: 1.1006
Epoch: 90 Train: 1.1291 Test: 1.1006
Epoch: 120 Train: 131.7943 Test: 169.8546
Epoch: 150 Train: 46.2814 Test: 97.1106
julia> m, argsout, resout = train2(tr_data, te_data, modelfct;...);
[ Info: Training on GPU
[ Info: Dataset: 1600 train and 400 test examples
[ Info: Model: 1185 trainable params. Chain(Dense(1, 64, relu), Dense(64, 16, relu), Dropout(0.5), Dense(16, 1), #879)
[ Info: Start Training . . .
Epoch: 0 Train: 160.3743 Test: 143.1688
Epoch: 30 Train: 160.3743 Test: 152.41
Epoch: 60 Train: 116.2331 Test: 95.1814
Epoch: 90 Train: 107.7877 Test: 159.5695
Epoch: 120 Train: 105.0404 Test: 54.0994
Epoch: 150 Train: 99.2565 Test: 69.0809
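To make my suspicion about shared state concrete, here is a minimal sketch in plain Julia. It is not my actual code and does not use Flux. The names SHARED, shared_modelfct, fresh_modelfct and toy_train are hypothetical stand-ins, and I am only assuming that something similar could happen if modelfct hands back the same model object on every call:

```julia
# Hypothetical stand-in for a model: a one-element parameter vector.
# This is NOT Flux code; it only illustrates aliasing between runs.
const SHARED = [10.0]

shared_modelfct() = SHARED   # returns the SAME array on every call
fresh_modelfct()  = [10.0]   # builds a NEW array on every call

# Toy training loop: gradient descent on (p - 3)^2,
# mutating the parameters in place.
function toy_train(modelfct; epochs = 5, lr = 0.1)
    p = modelfct()
    start = p[1]                        # value at "epoch 0"
    for _ in 1:epochs
        p[1] -= lr * 2 * (p[1] - 3.0)   # one gradient step
    end
    return start, p[1]
end

# Shared parameters: the second run resumes where the first one stopped.
s1, e1 = toy_train(shared_modelfct)
s2, e2 = toy_train(shared_modelfct)
println("shared: run 1 ended at ", e1, ", run 2 started at ", s2)

# Fresh parameters: every run starts from the same initial value.
f1, g1 = toy_train(fresh_modelfct)
f2, g2 = toy_train(fresh_modelfct)
println("fresh: run 1 started at ", f1, ", run 2 started at ", f2)
```

In the shared case the second call starts exactly at the value where the first call ended, which looks like my second GPU run starting at the same losses the first one finished with. If that is what is going on, it would explain the carry-over between runs, but not by itself why the GPU loss blows up.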
Any ideas are more than welcome.