Decision-dependent randomness in SDDP.jl

Hi everyone,
I’m quite new to Julia and SDDP, so this may be a silly question.

I’m trying to optimize the preventive replacement of some devices over several stages. I assume that I know the lifetime distribution of the devices, which suffer from aging. Aging means that as the device gets older, it becomes more and more prone to failure (e.g., a Weibull distribution, but not an exponential one). The randomness here represents the failure of the device (e.g., 1 if we observe a failure, 0 if the device survives).
I tried to figure out how to represent this randomness, and since the device suffers from aging, I cannot assume that the randomness is stagewise independent. I’m therefore thinking of a Markovian representation using the 2D random element (failure, age of the device). This leads to a rather large (but that’s another problem) and very sparse transition matrix that evolves with time to reflect aging. There is one problem with that representation: the transition matrix at stage t not only changes with time, it also depends on the decision taken at stage t-1. Indeed, if I replace my device at stage t-1, I’m now dealing with a fresh new device with the corresponding transition matrix (encouraging survival until the next step and, as a direct consequence, incrementing the age of the device by one), whereas if I choose to keep my device in place, the transition matrix keeps on “aging”. A sketch of the “keep” matrix is given below.
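
For concreteness, here is a minimal sketch of the “keep the device in place” transition matrix described above, assuming a placeholder hazard p(s) = s / s_max (the hazard and the sizes are illustrative, not part of the original post):

using SparseArrays

s_max = 10
p(s) = s / s_max                  # placeholder increasing hazard, p(s_max) == 1

# States 1..s_max+1 encode (alive, age s) for s = 0..s_max; state s_max+2 is "failed".
n = s_max + 2
P_keep = spzeros(n, n)
for s in 0:s_max-1
    P_keep[s+1, s+2] = 1 - p(s)   # survive: the age increments by one
    P_keep[s+1, n] = p(s)         # fail
end
P_keep[s_max+1, n] = 1.0          # p(s_max) == 1: failure is certain
P_keep[n, n] = 1.0                # stays failed until replaced
# "Replace at stage t-1" instead maps every state to (alive, age 0), i.e. to
# column 1: this is exactly the decision-dependence described above.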

Here is my question: does SDDP.jl handle decision-dependent randomness? Or is there a more clever way to formulate the problem? Should I rely on other algorithms to solve my problem?

Thanks and sorry if the description of the problem is rather vague.

does SDDP.jl handle decision-dependent randomness?

Nope.

I often find that “decision-dependent” really just means that there is some part of the problem that you are not modeling.

I would reframe the question as a decision-independent problem, where the transition is not failure/age; instead, you measure the wear of the device, and the device cannot be used past a certain level of wear.

The wear likely happens independently, or depends on your actions, so the wear can be a state variable in the model.
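
For instance, here is a minimal sketch of what that reframing could look like in SDDP.jl; the wear dynamics, the cost, and all parameter values are assumptions for illustration:

using SDDP, HiGHS  # HiGHS is one solver choice, not a requirement

wear_model = SDDP.LinearPolicyGraph(;
    stages = 10,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do sp, t
    c_wear_max = 1.0
    @variable(sp, 0 <= x_wear <= c_wear_max, SDDP.State, initial_value = 0.0)
    @variable(sp, u_replace, Bin)
    @variable(sp, u_reset >= 0)
    # Wear can only be reset by a replacement.
    @constraint(sp, u_reset <= c_wear_max * u_replace)
    # The exogenous wear increment ω arrives on the right-hand side.
    @constraint(sp, con_wear, x_wear.out - x_wear.in + u_reset == 0.0)
    SDDP.parameterize(sp, [0.05, 0.1, 0.2]) do ω
        set_normalized_rhs(con_wear, ω)
    end
    @stageobjective(sp, u_replace)
    return
end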

Whether this approach makes sense depends on the specifics of the problem, so it’s hard to say more without more detail.

You’ll probably find that you end up with a non-convexity somewhere, and that SDDP.jl cannot find a provably optimal policy (but it can still find a feasible policy). That might be sufficient for you, or you might need to look into some heuristic policies.

Thanks for your reply.
Actually, there might be wear, but it is not observable in my application. The only observable proxy for wear we have is the age of the device. I could certainly consider it as a state variable, but still, the probability of failure depends on the decision: if I decide to replace the device, there is a kind of reset of the aging process.
I found an INFORMS paper using a heuristic based on progressive hedging, but it’s not SDDP-related anymore.

Thanks again

What is your model for failure against age?

For the sake of clarity, let’s consider a system of devices i = 1, …, I, a set of stages or timestamps t = 1, …, T, and define x[i,t] as a binary variable equal to 1 if we choose to replace device i at stage t. To give a better picture of the problem, there are constraints on the total cost of replacements per stage, and we can mutualise some of the costs by replacing several devices at the same stage (formulation not detailed here).

The objective is the total cost over a finite horizon: at stage t, we can choose a preventive replacement (x[i,t] = 1), which has a given known cost, or keep the current device in place. However, if the device fails (randomly), we need to replace it and at the same time incur a much higher cost. The goal is to find a trade-off between preventive replacements that occur too early (hence too often) and too many replacements on failure over the horizon, which would occur less often but at a prohibitive cost.
To answer your question, the failure model is rather simple: if the age is s, the probability of failure is p_s, for s in [0, s_max], an increasing function of s with p_smax = 1. Now I realize that, given the age state s, we have an independent sequence of Bernoulli random variables. However, if at some stage t we decide to replace the device (x[i,t] = 1, or a failure occurs), then we reset the age state to 0 and the probability distribution to p_0. Actually, between two replacements/failures, the state equation is linear and the randomness is exogenous (although not stationary), but when we replace the device or incur a failure, there is a “nonlinear branching” in the state and in the randomness. A sketch of such a hazard is given below.
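
For concreteness, one way to obtain such an increasing p_s is as the discrete-time hazard of the Weibull lifetime mentioned earlier (shape k > 1 models aging); the parameter values here are made up for illustration:

# Discrete-time hazard: P(fail during age s | survived to age s).
k, λ, s_max = 2.0, 5.0, 10
p(s) = s >= s_max ? 1.0 : 1 - exp((s / λ)^k - ((s + 1) / λ)^k)
p.(0:s_max)   # increasing in s, with p(s_max) == 1.0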

Does that help?

I’d model your problem as something like this. You’ll probably need to tweak things, but it should point you in the right direction.

using SDDP, HiGHS  # HiGHS is one MIP solver choice; any JuMP MIP solver works
model = SDDP.LinearPolicyGraph(;
    stages = 10,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do sp, t
    c_age_max = 10
    # The age of the device is the state carried between stages.
    @variable(sp, 0 <= x_age <= c_age_max, Int, SDDP.State, initial_value = 0)
    @variable(sp, u_buy, Bin)    # preventive replacement
    @variable(sp, u_force, Bin)  # forced replacement on failure
    @variable(sp, u_reset >= 0)  # amount by which the age is reset
    @constraint(sp, x_age.out == x_age.in + 1 - u_reset)
    # Big-M: the age can be reset only if the device is replaced.
    @constraint(sp, u_reset <= c_age_max * (u_buy + u_force))
    @stageobjective(sp, u_buy + 10 * u_force)
    # Given ω ~ U[0, 1]
    # Want to model:
    #     if ω < p_s(x_age.in)
    #         u_force = 1
    #     end
    # Assume p_s(s) = x_age.in / c_age_max
    #     if ω < x_age.in / c_age_max
    #         u_force = 1
    #     end
    # becomes
    #     if ω * c_age_max < x_age.in
    #         u_force = 1
    #     end
    # becomes
    #     x_age.in - ω * c_age_max <= c_age_max * u_force
    # becomes
    #     x_age.in - c_age_max * u_force <= ω * c_age_max
    # Why?
    #  If x_age.in > ω * c_age_max, then the LHS is > 0, and so RHS must be
    #  > 0, and so u_force = 1.
    #  If x_age.in <= ω * c_age_max, then LHS is <= 0, and so RHS can be
    #  anything, but we'd rather not pay for `u_force` if we can help it, so
    #  the solution will take u_force = 0.
    Ω = 0.0:0.1:1.0
    @constraint(sp, con_age, x_age.in - c_age_max * u_force <= 1.0 * c_age_max)
    SDDP.parameterize(sp, Ω) do ω
        set_normalized_rhs(con_age, ω * c_age_max)
        return
    end
    return
end

I didn’t try to execute it, so there might be typos, but you get the idea. Also, because of the integrality, SDDP.jl can find only a locally optimal policy, so there are no guarantees on the quality of the solution it will converge to (it might be arbitrarily bad).
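
If the model above runs, training and simulating the resulting policy would look something like this (standard SDDP.jl calls; the iteration limit and number of replications are arbitrary):

SDDP.train(model; iteration_limit = 100)
simulations = SDDP.simulate(model, 100, [:x_age, :u_buy, :u_force])
# Average cost of the simulated policy:
objectives = [sum(stage[:stage_objective] for stage in sim) for sim in simulations]
println(sum(objectives) / length(objectives))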

You’re probably better off tuning some decision rule, such as “buy new once the age is X”. Warren Powell’s book might be a good read if you haven’t already: Sequential Decision Analytics and Modeling – CASTLE
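
For what it’s worth, here is a minimal Monte Carlo sketch of tuning such a “replace at age X” rule, assuming the simple placeholder hazard p_s = s / s_max; the horizon, costs, and sample size are made up:

function simulate_threshold(X; T = 10, s_max = 10, c_buy = 1.0, c_fail = 10.0, n = 10_000)
    total = 0.0
    for _ in 1:n
        s, cost = 0, 0.0
        for t in 1:T
            if rand() < s / s_max   # failure: forced, expensive replacement
                cost += c_fail
                s = 0
            elseif s >= X           # decision rule: preventive replacement
                cost += c_buy
                s = 0
            else
                s += 1              # keep the device; it ages by one stage
            end
        end
        total += cost
    end
    return total / n                # average cost of the rule over n runs
end

costs = [simulate_threshold(X) for X in 1:10]
best_X = argmin(costs)              # the threshold with the lowest simulated cost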

Thanks a lot, I’ll give it a try.