Suggestions for improving learning in a neural-network-based controller for a rotary inverted pendulum

Hello,

I am new to Julia and am trying to use neural-network-based controllers for robotics applications. Here I am trying to balance a rotary inverted pendulum with a neural-network-based controller. The code is given below. Note that I am importing all the parameters from a Python object.

# Load the physical parameters from the Python settings object
using PyCall
s = pyimport("experiments.STR74.RotInvPend_Settings")
ss = s.settings


using DiffEqFlux,
      DifferentialEquations,
      Flux,
      Plots,
      Interpolations,
      DataFrames,
      CSV,
      JSON,
      Dates,
      Dierckx,
      FiniteDifferences,
      Optim,
      Statistics,
      DiffEqSensitivity,
      Distributions,
      ODEInterfaceDiffEq

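function ri_pendulum(du, u, p, t)

    # State: arm angle alpha, pendulum angle beta, their rates, motor current,
    # applied voltage and disturbance torque T_D
    alpha, beta, alpha_dot, beta_dot, current, voltage, T_D = u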

    du[1] = alpha_dot

    du[2] = beta_dot

    du[3] = ((ss.Ixz_2 - ss.m_2*(ss.l_1 + ss.x_m2)*(ss.y_m2*sin(beta) + ss.z_m2*cos(beta)))*(ss.Iyy_2*(alpha_dot^2)*sin(2*beta) + 2*ss.Iyz_2*(alpha_dot^2)*cos(2*beta) - ss.Izz_2*alpha_dot^2*sin(2*beta) + 2*T_D - alpha_dot^2*ss.m_2*ss.y_m2^2*sin(2*beta) + 2*alpha_dot^2*ss.m_2*ss.y_m2*ss.z_m2*cos(2*beta) + alpha_dot^2*ss.m_2*ss.z_m2^2*sin(2*beta) - 2*beta_dot*ss.d_damp2 - 2*ss.g_0*ss.m_2*(ss.y_m2*cos(beta) - ss.z_m2*sin(beta))) - 2*(ss.Ixx_2 + ss.m_2*ss.y_m2^2 + ss.m_2*ss.z_m2^2)*(-alpha_dot*ss.d_damp1 + beta_dot*ss.m_2*(2*alpha_dot*(ss.y_m2*sin(beta) + ss.z_m2*cos(beta)) + beta_dot*(ss.l_1 + ss.x_m2))*(ss.y_m2*cos(beta) - ss.z_m2*sin(beta)) + current*ss.k_t))/(2*((ss.Ixz_2 - ss.m_2*(ss.l_1 + ss.x_m2)*(ss.y_m2*sin(beta) + ss.z_m2*cos(beta)))*(ss.Ixy_2*sin(beta) + ss.Ixz_2*cos(beta) - ss.l_1*ss.m_2*(ss.y_m2*sin(beta) + ss.z_m2*cos(beta)) + ss.m_2*ss.x_m2*ss.y_m2*sin(beta) + ss.m_2*ss.x_m2*ss.z_m2*cos(beta)) - (ss.Ixx_2 + ss.m_2*ss.y_m2^2 + ss.m_2*ss.z_m2^2)*(ss.Izz_1 + ss.Izz_2 + ss.m_1*(ss.x_m1^2 + ss.y_m1^2) + ss.m_2*((ss.l_1 + ss.x_m2)^2 + (ss.y_m2*cos(beta) - ss.z_m2*sin(beta))^2))))

    du[4] = ((-alpha_dot*ss.d_damp1 + beta_dot*ss.m_2*(2*alpha_dot*(ss.y_m2*sin(beta) + ss.z_m2*cos(beta)) + beta_dot*(ss.l_1 + ss.x_m2))*(ss.y_m2*cos(beta) - ss.z_m2*sin(beta)) + current*ss.k_t)*(ss.Ixy_2*sin(beta) + ss.Ixz_2*cos(beta) - ss.l_1*ss.m_2*(ss.y_m2*sin(beta) + ss.z_m2*cos(beta)) + ss.m_2*ss.x_m2*ss.y_m2*sin(beta) + ss.m_2*ss.x_m2*ss.z_m2*cos(beta)) - (ss.Izz_1 + ss.Izz_2 + ss.m_1*(ss.x_m1^2 + ss.y_m1^2) + ss.m_2*((ss.l_1 + ss.x_m2)^2 + (ss.y_m2*cos(beta) - ss.z_m2*sin(beta))^2))*(ss.Iyy_2*alpha_dot^2*sin(2*beta) + 2*ss.Iyz_2*alpha_dot^2*cos(2*beta) - ss.Izz_2*alpha_dot^2*sin(2*beta) + 2*T_D - alpha_dot^2*ss.m_2*ss.y_m2^2*sin(2*beta) + 2*alpha_dot^2*ss.m_2*ss.y_m2*ss.z_m2*cos(2*beta) + alpha_dot^2*ss.m_2*ss.z_m2^2*sin(2*beta) - 2*beta_dot*ss.d_damp2 - 2*ss.g_0*ss.m_2*(ss.y_m2*cos(beta) - ss.z_m2*sin(beta)))/2)/((ss.Ixz_2 - ss.m_2*(ss.l_1 + ss.x_m2)*(ss.y_m2*sin(beta) + ss.z_m2*cos(beta)))*(ss.Ixy_2*sin(beta) + ss.Ixz_2*cos(beta) - ss.l_1*ss.m_2*(ss.y_m2*sin(beta) + ss.z_m2*cos(beta)) + ss.m_2*ss.x_m2*ss.y_m2*sin(beta) + ss.m_2*ss.x_m2*ss.z_m2*cos(beta)) - (ss.Ixx_2 + ss.m_2*ss.y_m2^2 + ss.m_2*ss.z_m2^2)*(ss.Izz_1 + ss.Izz_2 + ss.m_1*(ss.x_m1^2 + ss.y_m1^2) + ss.m_2*((ss.l_1 + ss.x_m2)^2 + (ss.y_m2*cos(beta) - ss.z_m2*sin(beta))^2)))

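    du[5] = (-ss.R*current + voltage - alpha_dot*ss.k_e)/ss.L

    # voltage and T_D are only changed by the controller callback, so they are
    # held constant between updates; set their derivatives explicitly
    du[6] = 0.0
    du[7] = 0.0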

end

# Policy network: (alpha, beta, alpha_dot, beta_dot) -> voltage command
controller = FastChain((x, p) -> x, FastDense(4, 32, tanh), FastDense(32, 16, tanh), FastDense(16, 1))
nn_weights = initial_params(controller)

# Problem formulation

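# u0 = [alpha, beta, alpha_dot, beta_dot, current, voltage, T_D]
u0 = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
params = nn_weights
tspan = (0.0, 5.0)
dt = 0.01    # controller sample time (100 Hz)
tsteps = tspan[1]:dt:tspan[2]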


prob = ODEProblem(ri_pendulum, u0, tspan, params)

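# Controller callback: fires at every sample instant in timesteps
# (tstops in solve makes the integrator step exactly onto these times)
timesteps = collect(tsteps)
condition(u,t,integrator) = t ∈ timesteps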

function control_loop!(integrator)

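    # Network input: (alpha, beta, alpha_dot, beta_dot)
    voltage = controller([integrator.u[1], integrator.u[2], integrator.u[3], integrator.u[4]], integrator.p)[1]
    # Saturate the command to ±12 V
    if abs(voltage) >= 12
        voltage = 12* (abs(voltage)/voltage)
    end

    integrator.u[6] = voltage   # applied until the next controller update
    integrator.u[7] = 0         # no disturbance torque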

end

cb_controller = DiscreteCallback(condition, control_loop!)


function predict_neuralode(p)
    tmp_prob = remake(prob, p = p)
    solve(tmp_prob, Tsit5(), saveat = tsteps, callback=cb_controller, tstops=timesteps, sensealg = ReverseDiffAdjoint())
end

function loss_neuralode(p)
    pred = predict_neuralode(p)

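    beta = pred[2,:]
    voltage = pred[6,:]   # currently unused in the loss
    # Squared error to upright (beta = π); mod wraps beta into [0, 2π), unlike
    # % (rem), which returns negative values for negative beta
    loss = sum((mod.(beta, 2*pi) .- pi).^2)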
    
    return loss, pred
end

loss, pred = loss_neuralode(nn_weights)


index = 0

callback = function (p, loss, pred)
    global index += 1

    # output every 50 iterations
    if index % 50 == 0
        println("loss: ", loss)
        display(plot(pred.t, rad2deg.(pred[2,:]), label = ["beta"]))
        display(plot(pred.t, pred[6,:], label = ["voltage"]))
    end

    return false   # returning true would stop training early
end

result = DiffEqFlux.sciml_train(
  loss_neuralode,
  nn_weights,
  ADAM(0.01),
  cb = callback,
  maxiters = 1500,
  save_best=true
)

Here a voltage-controlled motor drives the system, and the controller runs at 100 Hz. The problem is that the network starts to learn, but not well enough for the rotary inverted pendulum to balance. Any suggestions for improving the learning are highly appreciated.
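One possible factor in the slow learning: whenever the network output exceeds ±12 V, the hard saturation in `control_loop!` returns a constant, so the gradient of the applied voltage with respect to the weights is zero for those samples. The sketch below compares that hard saturation with a smooth `tanh`-based alternative. `hard_sat`, `soft_sat` and `slope` are hypothetical helper names, and the slopes are finite differences used only for illustration, not the AD used during training.

```julia
# Hard saturation, equivalent to the if-branch in control_loop!:
# beyond ±v_max the output is constant, so its slope is exactly zero.
hard_sat(v; v_max = 12.0) = abs(v) >= v_max ? v_max * sign(v) : v

# Hypothetical smooth alternative: bounded by ±v_max but strictly increasing,
# so its derivative is mathematically positive for every finite v.
soft_sat(v; v_max = 12.0) = v_max * tanh(v / v_max)

# Central finite difference, used here only to compare the two slopes
slope(f, v; h = 1e-6) = (f(v + h) - f(v - h)) / (2h)

for v in (1.0, 11.0, 20.0, 30.0)
    println("v = $v: hard slope = $(slope(hard_sat, v)), ",
            "soft slope = $(round(slope(soft_sat, v), digits = 4))")
end
```

Replacing the if-branch with `voltage = soft_sat(voltage)` would keep the command within ±12 V while the gradient never vanishes exactly, although it still becomes small deep in saturation. Whether this speeds up training for this particular model is untested.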

Hello,

You are not penalizing control effort. This will lead to an extremely aggressive controller that will likely perform poorly or fail to stabilize a physical process.
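A minimal sketch of such a penalized loss, assuming the same state layout as in the question (row 2 of `pred` is beta, row 6 is the applied voltage). The helper names `angle_error` and `penalized_loss` and the weight `λ = 1e-3` are hypothetical and would need tuning; the angle error is wrapped with `mod`, which, unlike `%` (`rem`), never returns a negative value.

```julia
# Error from the upright position beta = π, wrapped into [-π, π).
# mod always returns a value in [0, 2π); % (rem) keeps the sign of beta.
angle_error(beta) = mod(beta, 2pi) - pi

# Tracking cost plus a quadratic control-effort penalty.
# λ is a hypothetical weight, not a value from the original post.
function penalized_loss(beta::AbstractVector, voltage::AbstractVector; λ = 1e-3)
    tracking = sum(angle_error.(beta) .^ 2)
    effort = sum(voltage .^ 2)
    return tracking + λ * effort
end

# Example: pendulum upright with zero voltage vs. hanging with 10 V applied
println(penalized_loss([pi, pi], [0.0, 0.0]))      # prints 0.0
println(penalized_loss([0.0], [10.0]; λ = 0.01))    # ≈ π^2 + 1 ≈ 10.87
```

Inside `loss_neuralode`, the loss line would then become `loss = penalized_loss(pred[2,:], pred[6,:])`. Because `pred[6,:]` holds the already saturated voltage, the penalty only sees values within ±12 V; penalizing the raw network output instead is another option.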


Thanks for the suggestion, I will penalise the control effort. The main problem, though, is that training takes too long to reduce the loss (i.e., to get the pendulum balanced).