Target Variable is a Deterministic Function of Random Variables

Crossposting this from “New to Julia” as I received no responses.

Hi, I am new to Julia and struggling to formulate my model correctly. My model consists of:

  • A regressor x_t
  • An intermediate variable y_t = k * x_t + ϵ, ϵ ~ Normal(0, σ)

And finally a target variable z that is a function of y_t and its lags:

  • z_t = λ * y_t + λ^2 * y_(t-1) + λ^3 * y_(t-2) + …, λ ~ Uniform(0, 1)

The variables of interest are k, σ and λ.

My issue is that I am unable to tell Julia that z_t is distributed as above, because by the time I get to defining it, the terms are numbers instead of distributions.

It might be possible to re-parametrize and define z_t as a normally distributed variable whose mean is the expression above and whose variance is some function of σ and λ. However, I’d rather not derive that function by hand, as I’m sure there is a better way.
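
(For reference, assuming the ϵ terms are i.i.d. across periods, the function does have a closed form, since a weighted sum of independent normals is itself normal: z_t \sim \text{Normal}\big(k \sum_{i=1}^{t} \lambda^i x_{t-i+1},\ \sigma \sqrt{\sum_{i=1}^{t} \lambda^{2i}}\big). Still, hard-coding this feels like working around the real question.)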

Below is a version of my code that produces semi-reasonable results, using 4 lags. I’ve attempted to use Distributions.convolve() repeatedly to combine distributions, but I’m not sure how correct this is, and again I’m sure there is a better way.

using Turing, StatsPlots, Random
Random.seed!(123)

function lag_transform(λ, y)
    # apply the lag transformation: z_i = λ*y_i + λ^2*y_{i-1} + ... + λ^i*y_1
    T = promote_type(eltype(λ), eltype(y))
    z = zeros(T, length(y))
    for i in eachindex(y)
        z[i] = sum(λ^(i - j + 1) * y[j] for j in 1:i)
    end
    return z
end
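
# Quick sanity check with illustrative values (not from the original post):
# z_1 = λ*y_1 = 0.5 and z_2 = λ^2*y_1 + λ*y_2 = 0.25 + 1.0 = 1.25
@assert lag_transform(0.5, [1.0, 2.0]) ≈ [0.5, 1.25]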

# generate some test data
n_samples = 200
σ = 0.5
k = 2.3
λ = 0.7

x = 1:n_samples
y = [x_val * k + rand(Normal(0, σ)) for x_val in x]
z = lag_transform(λ, y)

@model function model_func(x, z)
    k ~ Uniform(0.5, 3)
    λ ~ Uniform(0.4, 1)
    σ ~ truncated(Normal(0, 1), 0, Inf)

    y = TArray([convert(Real, k * x_val) for x_val in x])

    powers = TArray([convert(Real, λ ^ i) for i in 1:5])

    for n in 5:length(y) # start at 5, because we use the previous 4 lags

        normals = [
            Normal(powers[1] * y[n], powers[1] * σ),
            Normal(powers[2] * y[n - 1], powers[2] * σ),
            Normal(powers[3] * y[n - 2], powers[3] * σ),
            Normal(powers[4] * y[n - 3], powers[4] * σ),
            Normal(powers[5] * y[n - 4], powers[5] * σ)
        ]

        z[n] ~ reduce(convolve, normals)
    end
end

model = model_func(x, z[5:end]) # don’t pass all of z_t, because we use the previous 4 lags

chain = sample(model, NUTS(), 5000)
plot(chain)
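
For what it’s worth on the convolve question: for two Normals, convolve returns the exact distribution of the sum of independent draws, i.e. Normal(μ1 + μ2, sqrt(σ1^2 + σ2^2)). So the reduce over the five terms gives z[n] a standard deviation of σ * sqrt(λ^2 + λ^4 + ... + λ^10), which is exactly what the truncated lag sum should have if the ϵ terms are independent. A minimal check, with illustrative values:

using Distributions

d = convolve(Normal(1.0, 0.5), Normal(2.0, 0.5))
# Normal with mean 3.0 and std sqrt(0.5^2 + 0.5^2) ≈ 0.707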

I’ve also tried some other approaches (looking into @submodel, @addlogprob!, etc.) but have been completely unsuccessful. Thanks!


Apologies, z[n] ~ reduce(convolve, normals) should be z[n - 4] ~ reduce(convolve, normals).

I think the problem is in the model formulation. It looks like z_t = \lambda z_{t-1} + \lambda y_t (factor \lambda out of every term after the first in your geometric sum), and this is an AR(1) model. If that is the case, then it is easy to find the conditional distribution of z_t, namely z_t \sim \text{Normal}(\lambda (z_{t-1} + k x_t), \lambda \sigma).

Thanks for the response. Unfortunately the model isn’t z_t = \lambda z_{t-1} + \lambda y_t. The transformation I want to apply is the Adstock decay effect, where the decay function decreases geometrically: z_t = \lambda y_t + \lambda^2 y_{t-1} + \lambda^3 y_{t-2} + \dots (don’t worry about the lack of normalization, I do this later).

Do I understand correctly that you believe z = f(SomeRandomStuff) + no_error?

You will never get good sampling, but another way to put this is

z ~ Dirac(f(SomeRandomStuff))

If this is the case, it’s similar to ABC (approximate Bayesian computation) methods, where you have a simulation model and are trying to match data, but you can’t match it exactly, so you have to tolerate some error. I’d suggest

z ~ Normal(f(SomeRandomStuff), epsilon)

where epsilon is small. The size of epsilon is now something you can tune to compromise between the Dirac-type situation (where you will never get samples at all) and the large-epsilon situation (where you will get samples, but most of them will be too far off your target to be acceptable).
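
A minimal sketch of the pattern in Turing, with a made-up simulator f standing in for f(SomeRandomStuff):

using Turing

f(θ, x) = θ * x^2   # toy deterministic simulator, purely illustrative

@model function soft_match(x, z)
    θ ~ Normal(0, 1)
    ϵ = 0.01                           # small fixed tolerance; could also be given a prior
    for t in eachindex(z)
        z[t] ~ Normal(f(θ, x[t]), ϵ)   # "z[t] ≈ f(θ, x[t])" up to order ϵ
    end
end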


Yes, that’s correct: z = f(random_variable1, random_variable2, ...), with no error. If the values of the random variables are known, then the exact value of z is known.

Thanks for the suggestion. I’ve tried using Dirac before, and yes, the lack of proper sampling is an issue. Regarding z ~ Normal(f(SomeRandomStuff), epsilon): are you suggesting I should remove the error term \epsilon from the equation y_t = k * x_t + \epsilon, and instead add it to the z_t equation, z_t \sim \text{Normal}(\text{lag\_transform}(\lambda, y_t),\, \epsilon_\text{new})? This could potentially work; however, knowing the actual distribution of the original \epsilon is desirable.

Otherwise, if you meant to add an extra \epsilon term to the model, I’m not sure how that would be helpful. Is the idea to give it a prior concentrated on very small values, so that the sampling works as intended and the added \epsilon has a negligible impact on the estimation of the other parameters?

Basically this: let’s say z is typically O(1); then give epsilon a prior with scale of, say, O(0.01) and let it loose. The sampled distribution will not be exactly the one you want, but it will sample. You can also take a large sample and then adjust the sample afterwards; there are techniques for such things (importance reweighting, for example).

Excellent, this is producing very decent results! Are there any resources that go deeper into this? I would like to know more about how to choose an appropriate epsilon prior, as well as the other technique you mentioned.

For reference, this is my working model now:

@model function model_func(x, z)
    k ~ Uniform(0.5, 3)
    λ ~ Uniform(0.4, 1)
    σ ~ truncated(Normal(0, 1), 0, Inf)
    ϵ ~ Uniform(0.0001, 0.001)

    # simulate the latent y series (note: this noise is redrawn on every model evaluation)
    y = [k * x_val + rand(Normal(0, σ)) for x_val in x]

    powers = [λ ^ i for i in 1:5]

    for n in 5:length(x)
        # mean of z[n - 4]: the 4-lag Adstock sum; no ::Float32 annotation here,
        # since that would break ForwardDiff's dual numbers under NUTS
        z_val = sum(powers[j] * y[n - j + 1] for j in 1:5)

        z[n - 4] ~ Normal(z_val, ϵ)
    end
end
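
Sampled the same way as before:

model = model_func(x, z[5:end])
chain = sample(model, NUTS(), 5000)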

You could take a look at the CRC Handbook of Approximate Bayesian Computation (Sisson, Fan, and Beaumont).

Glad this is working for you!

From the perspective of Bayes as continuous-valued logic, the Normal(z_val, eps) likelihood is essentially saying that samples in which z_val is very close to z are very credible, and as you get farther away the credibility drops off quickly. So you can think of it as an approximation to the model where they are exactly equal, or you can think of it as exactly a model in which z and z_val differ by at most a very small amount. If you’re willing to take this last view, then there’s no adjustment required.

Great, thanks again!