How to apply a decay to the learning rate?

Hello,

I am very new to Julia, so I apologize if I do not explain the problem correctly (feel free to ask me).

I am trying to use the QLearningSolver from TabularTDLearning.jl. For the exploration_policy I can use a decay with exploration_policy = EpsGreedyPolicy(mdp, LinearDecaySchedule(start=1.0, stop=0.01, steps=10000)), but when I use the same schedule for learning_rate::Float64 it says that it can't convert it to Float64.

I saw ParameterSchedulers.jl but I do not know if I can use it and how.

Thank you :slight_smile:

Generally speaking, Flux works with Float32. Try learning_rate::Float32.

Please provide the full command and error message. Ideally a minimal working example too.

This works, for example:
julia> exppolicy = EpsGreedyPolicy(mdp,LinearDecaySchedule(start=1.0, stop=0.01, steps=10000))
EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, NTuple{4, Symbol}}(LinearDecaySchedule{Float64}(1.0, 0.01, 10000) (function of type LinearDecaySchedule{Float64})
  start: Float64 1.0
  stop: Float64 0.01
  steps: Int64 10000
, Random._GLOBAL_RNG(), (:up, :down, :left, :right))

julia> solver = QLearningSolver(exploration_policy=exppolicy, learning_rate=0.1, n_episodes=5000, max_episode_length=50, eval_every=50, n_eval_traj=100)

Hello,

I want the decay for the learning rate as well:

#Q-Learning solver
q_learning_solver = QLearningSolver(n_episodes=1000, 
                                max_episode_length = 1000,
                                learning_rate= LinearDecaySchedule(start=1.0, stop=0.0, steps=1000),
                                exploration_policy= EpsGreedyPolicy(mdp,LinearDecaySchedule(start=1.0, stop=0.0, steps=1000)), 
                                eval_every = 10000, 
                                n_eval_traj = 20, 
                                verbose=true) 

Error:

ERROR: MethodError: Cannot `convert` an object of type LinearDecaySchedule{Float64} to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::ColorTypes.Gray24) where T<:Real at C:\Users\X\.julia\packages\ColorTypes\6m8P7\src\conversions.jl:114
  convert(::Type{T}, ::ColorTypes.Gray) where T<:Real at C:\Users\X\.julia\packages\ColorTypes\6m8P7\src\conversions.jl:113
  convert(::Type{T}, ::Unitful.Gain) where T<:Real at C:\Users\X\.julia\packages\Unitful\SUQzL\src\logarithm.jl:62    
  ...
Stacktrace:
 [1] QLearningSolver{EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, Vector{Action}}}(n_episodes::Int64, max_episode_length::Int64, learning_rate::Function, exploration_policy::EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, Vector{Action}}, Q_vals::Nothing, eval_every::Int64, n_eval_traj::Int64, rng::Random._GLOBAL_RNG, verbose::Bool)
   @ TabularTDLearning C:\Users\X\.julia\packages\Parameters\MK0O4\src\Parameters.jl:503
 [2] QLearningSolver(n_episodes::Int64, max_episode_length::Int64, learning_rate::Function, exploration_policy::EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, Vector{Action}}, Q_vals::Nothing, eval_every::Int64, n_eval_traj::Int64, rng::Random._GLOBAL_RNG, verbose::Bool)
   @ TabularTDLearning C:\Users\X\.julia\packages\Parameters\MK0O4\src\Parameters.jl:526
 [3] QLearningSolver(; n_episodes::Int64, max_episode_length::Int64, learning_rate::Function, exploration_policy::EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, Vector{Action}}, Q_vals::Nothing, eval_every::Int64, n_eval_traj::Int64, rng::Random._GLOBAL_RNG, verbose::Bool)
   @ TabularTDLearning C:\Users\X\.julia\packages\Parameters\MK0O4\src\Parameters.jl:545
 [4] top-level scope
   @ c:\Users\X\Desktop\X\X\VS Code Projects\Algortihms Test\X\X\X_v2.jl:103

It appears this is currently not possible: the solver declares learning_rate as a plain Float64 (a number), not a function, so the LinearDecaySchedule you are trying to pass cannot be stored in it.
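
Here is a tiny, self-contained reproduction of the same failure; the ToySolver and DecaySchedule types are made up for illustration and come from neither package:

# Hypothetical types, only to show why the conversion fails.
struct DecaySchedule
    start::Float64
    stop::Float64
    steps::Int
end

mutable struct ToySolver
    learning_rate::Float64   # same constraint as the solver's field
end

ToySolver(0.1)                            # fine: a Float64 fits
ToySolver(DecaySchedule(1.0, 0.0, 10))    # ERROR: MethodError: Cannot `convert` an object of
                                          # type DecaySchedule to an object of type Float64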

I guess it is not too difficult to make it work. It might be worth opening an issue with TabularTDLearning.jl.

As a workaround, maybe you could run the solver for a chunk of episodes, lower the learning rate, restart the solve from the resulting Q table, and so on; a rough sketch follows.
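
A minimal, untested sketch of that loop. It assumes mdp is your MDP, that a LinearDecaySchedule can be called with a step index (the output above shows it is a function object), that QLearningSolver accepts an initial Q table through the Q_vals keyword seen in the stacktrace, and that the returned policy stores its table in a value_table field — please check those names against your versions of TabularTDLearning.jl and POMDPPolicies.jl:

using POMDPs, POMDPPolicies, TabularTDLearning

lr_schedule = LinearDecaySchedule(start=0.1, stop=0.01, steps=10)
Q = nothing
policy = nothing
for chunk in 1:10
    solver = QLearningSolver(n_episodes=100,   # 10 chunks x 100 = 1000 episodes total
                             max_episode_length=1000,
                             learning_rate=lr_schedule(chunk),  # constant within a chunk, decayed between chunks
                             # note: this epsilon decay restarts at every chunk; shorten its
                             # steps or decay it manually if that is not what you want
                             exploration_policy=EpsGreedyPolicy(mdp, LinearDecaySchedule(start=1.0, stop=0.0, steps=100)),
                             Q_vals=Q,          # warm-start from the previous chunk (assumed keyword)
                             eval_every=10000,
                             n_eval_traj=20)
    global policy = solve(solver, mdp)
    global Q = policy.value_table               # assumed field name; carries the Q table forward
end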

Is it sensible? I don’t have enough practical knowledge of POMDPs to answer that, but here is an AI Stack Exchange question in that direction:

https://ai.stackexchange.com/questions/12268/in-q-learning-shouldnt-the-learning-rate-change-dynamically-during-the-learnin

Hello,

Thank you so much for your help, I will take a look at it!

Thanks :slight_smile: