It is a convention that the optimizers always minimize the target function. Therefore, in order to maximize the reward, you have to multiply it with -1.
It is a convention that the optimizers always minimize the target function. Therefore, in order to maximize the reward, you have to multiply it with -1.