It is a convention that the optimizers always minimize the target function. Therefore, in order to maximize the reward, you have to multiply it with -1.
1 Like
It is a convention that the optimizers always minimize the target function. Therefore, in order to maximize the reward, you have to multiply it with -1.