How to apply a decay to the learning rate?

Hello,

I am very new to Julia, so I apologize if I do not explain the problem correctly (feel free to ask me).

I am trying to use the QLearningSolver from TabularTDLearning.jl. For the exploration_policy I can use a decay with exploration_policy = EpsGreedyPolicy(mdp, LinearDecaySchedule(start=1.0, stop=0.01, steps=10000)), but when I use the same schedule for learning_rate::Float64 it says that it can't convert it to Float64.

I saw ParameterSchedulers.jl but I do not know if I can use it and how.

Thank you :slight_smile:

Generally speaking, Flux works with Float32. Try learning_rate::Float32.

Please provide the full command and error message. Ideally a minimal working example too.

This works, for example:
julia> exppolicy = EpsGreedyPolicy(mdp,LinearDecaySchedule(start=1.0, stop=0.01, steps=10000))
EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, NTuple{4, Symbol}}(LinearDecaySchedule{Float64}(1.0, 0.01, 10000) (function of type LinearDecaySchedule{Float64})
  start: Float64 1.0
  stop: Float64 0.01
  steps: Int64 10000
, Random._GLOBAL_RNG(), (:up, :down, :left, :right))

julia> solver = QLearningSolver(exploration_policy=exppolicy, learning_rate=0.1, n_episodes=5000, max_episode_length=50, eval_every=50, n_eval_traj=100)

Hello,

I want the decay for the learning rate as well:

#Q-Learning solver
q_learning_solver = QLearningSolver(n_episodes=1000, 
                                max_episode_length = 1000,
                                learning_rate= LinearDecaySchedule(start=1.0, stop=0.0, steps=1000),
                                exploration_policy= EpsGreedyPolicy(mdp,LinearDecaySchedule(start=1.0, stop=0.0, steps=1000)), 
                                eval_every = 10000, 
                                n_eval_traj = 20, 
                                verbose=true) 

Error:

ERROR: MethodError: Cannot `convert` an object of type LinearDecaySchedule{Float64} to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::ColorTypes.Gray24) where T<:Real at C:\Users\X\.julia\packages\ColorTypes\6m8P7\src\conversions.jl:114
  convert(::Type{T}, ::ColorTypes.Gray) where T<:Real at C:\Users\X\.julia\packages\ColorTypes\6m8P7\src\conversions.jl:113
  convert(::Type{T}, ::Unitful.Gain) where T<:Real at C:\Users\X\.julia\packages\Unitful\SUQzL\src\logarithm.jl:62    
  ...
Stacktrace:
 [1] QLearningSolver{EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, Vector{Action}}}(n_episodes::Int64, max_episode_length::Int64, learning_rate::Function, exploration_policy::EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, Vector{Action}}, Q_vals::Nothing, eval_every::Int64, n_eval_traj::Int64, rng::Random._GLOBAL_RNG, verbose::Bool)
   @ TabularTDLearning C:\Users\X\.julia\packages\Parameters\MK0O4\src\Parameters.jl:503
 [2] QLearningSolver(n_episodes::Int64, max_episode_length::Int64, learning_rate::Function, exploration_policy::EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, Vector{Action}}, Q_vals::Nothing, eval_every::Int64, n_eval_traj::Int64, rng::Random._GLOBAL_RNG, verbose::Bool)
   @ TabularTDLearning C:\Users\X\.julia\packages\Parameters\MK0O4\src\Parameters.jl:526
 [3] QLearningSolver(; n_episodes::Int64, max_episode_length::Int64, learning_rate::Function, exploration_policy::EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, Vector{Action}}, Q_vals::Nothing, eval_every::Int64, n_eval_traj::Int64, rng::Random._GLOBAL_RNG, verbose::Bool)
   @ TabularTDLearning C:\Users\X\.julia\packages\Parameters\MK0O4\src\Parameters.jl:545
 [4] top-level scope
   @ c:\Users\X\Desktop\X\X\VS Code Projects\Algortihms Test\X\X\X_v2.jl:103

It appears this is currently not possible: the solver declares learning_rate as a plain Float64 (a number), not a function, so the LinearDecaySchedule you are trying to pass cannot be stored in it.
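
Here is a tiny, self-contained reproduction of the same failure; the ToySolver and DecaySchedule types are made up for illustration and come from neither package:

# Hypothetical types, only to show why the conversion fails.
struct DecaySchedule
    start::Float64
    stop::Float64
    steps::Int
end

mutable struct ToySolver
    learning_rate::Float64   # same constraint as the solver's field
end

ToySolver(0.1)                            # fine: a Float64 fits
ToySolver(DecaySchedule(1.0, 0.0, 10))    # ERROR: MethodError: Cannot `convert` an object of
                                          # type DecaySchedule to an object of type Float64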

I guess it is not too difficult to make it work. It might be worth opening an issue with TabularTDLearning.jl.

As a workaround, maybe you could run the solver for a chunk of episodes, lower the learning rate, restart the solve from the resulting Q table, and so on; a rough sketch follows.
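
A minimal, untested sketch of that loop. It assumes mdp is your MDP, that a LinearDecaySchedule can be called with a step index (the output above shows it is a function object), that QLearningSolver accepts an initial Q table through the Q_vals keyword seen in the stacktrace, and that the returned policy stores its table in a value_table field — please check those names against your versions of TabularTDLearning.jl and POMDPPolicies.jl:

using POMDPs, POMDPPolicies, TabularTDLearning

lr_schedule = LinearDecaySchedule(start=0.1, stop=0.01, steps=10)
Q = nothing
policy = nothing
for chunk in 1:10
    solver = QLearningSolver(n_episodes=100,   # 10 chunks x 100 = 1000 episodes total
                             max_episode_length=1000,
                             learning_rate=lr_schedule(chunk),  # constant within a chunk, decayed between chunks
                             # note: this epsilon decay restarts at every chunk; shorten its
                             # steps or decay it manually if that is not what you want
                             exploration_policy=EpsGreedyPolicy(mdp, LinearDecaySchedule(start=1.0, stop=0.0, steps=100)),
                             Q_vals=Q,          # warm-start from the previous chunk (assumed keyword)
                             eval_every=10000,
                             n_eval_traj=20)
    global policy = solve(solver, mdp)
    global Q = policy.value_table               # assumed field name; carries the Q table forward
end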

Is it sensible? I don’t have enough practical knowledge of POMDPs to answer that, but here is an AI Stack Exchange question in that direction:

https://ai.stackexchange.com/questions/12268/in-q-learning-shouldnt-the-learning-rate-change-dynamically-during-the-learnin

Hello,

Thank you so much for your help, I will take a look at it!

Thanks :slight_smile: