Target Variable is a Deterministic Function of Random Variables

Crossposting this from “New to Julia” as I received no responses.

Hi, I am new to Julia and struggling to formulate my model correctly. My model consists of:

  • A regressor x_t
  • An intermediate variable y_t = k * x_t + ϵ, ϵ ~ Normal(0, σ)

And finally a target variable z that is a function of y_t and its lags:

  • z_t = λ * y_t + λ^2 * y_(t-1) + λ^3 * y_(t-2) + …, λ ~ Uniform(0, 1)

The variables of interest are k, σ and λ.

My issue is that I am unable to tell Julia that z_t is distributed as above, because by the time I get to defining it, the terms are numbers instead of distributions.

It might be possible to re-parametrize and define z_t as a normally distributed variable whose mean is the expression above and whose variance is some function of σ and λ. However, I’d rather not derive that function by hand, as I’m sure there is a better way.
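
(For reference, assuming the ϵ terms are i.i.d. across periods, the function does have a closed form, since a weighted sum of independent normals is itself normal: z_t \sim \text{Normal}\big(k \sum_{i=1}^{t} \lambda^i x_{t-i+1},\ \sigma \sqrt{\sum_{i=1}^{t} \lambda^{2i}}\big). Still, hard-coding this feels like working around the real question.)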

Below is a version of my code that produces semi-reasonable results, using 4 lags. I’ve attempted to use Distributions.convolve() repeatedly to combine distributions, but I’m not sure how correct this is, and again I’m sure there is a better way.

using Turing, StatsPlots, Random
Random.seed!(123)

function lag_transform(λ, y)
    # apply the lag transformation: z_i = λ*y_i + λ^2*y_{i-1} + ... + λ^i*y_1
    T = promote_type(eltype(λ), eltype(y))
    z = zeros(T, length(y))
    for i in eachindex(y)
        z[i] = sum(λ^(i - j + 1) * y[j] for j in 1:i)
    end
    return z
end
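
# Quick sanity check with illustrative values (not from the original post):
# z_1 = λ*y_1 = 0.5 and z_2 = λ^2*y_1 + λ*y_2 = 0.25 + 1.0 = 1.25
@assert lag_transform(0.5, [1.0, 2.0]) ≈ [0.5, 1.25]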

# generate some test data
n_samples = 200
σ = 0.5
k = 2.3
λ = 0.7

x = 1:n_samples
y = [x_val * k + rand(Normal(0, σ)) for x_val in x]
z = lag_transform(λ, y)

@model function model_func(x, z)
    k ~ Uniform(0.5, 3)
    λ ~ Uniform(0.4, 1)
    σ ~ truncated(Normal(0, 1), 0, Inf)

    y = TArray([convert(Real, k * x_val) for x_val in x])

    powers = TArray([convert(Real, λ ^ i) for i in 1:5])

    for n in 5:length(y) # start at 5, because we use the previous 4 lags

        normals = [
            Normal(powers[1] * y[n], powers[1] * σ),
            Normal(powers[2] * y[n - 1], powers[2] * σ),
            Normal(powers[3] * y[n - 2], powers[3] * σ),
            Normal(powers[4] * y[n - 3], powers[4] * σ),
            Normal(powers[5] * y[n - 4], powers[5] * σ)
        ]

        z[n] ~ reduce(convolve, normals)
    end
end

model = model_func(x, z[5:end]) # don’t pass all of z_t, because we use the previous 4 lags

chain = sample(model, NUTS(), 5000)
plot(chain)
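
For what it’s worth on the convolve question: for two Normals, convolve returns the exact distribution of the sum of independent draws, i.e. Normal(μ1 + μ2, sqrt(σ1^2 + σ2^2)). So the reduce over the five terms gives z[n] a standard deviation of σ * sqrt(λ^2 + λ^4 + ... + λ^10), which is exactly what the truncated lag sum should have if the ϵ terms are independent. A minimal check, with illustrative values:

using Distributions

d = convolve(Normal(1.0, 0.5), Normal(2.0, 0.5))
# Normal with mean 3.0 and std sqrt(0.5^2 + 0.5^2) ≈ 0.707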

I’ve also tried some other approaches (looking into @submodel, @addlogprob!, etc.) but have been completely unsuccessful. Thanks!


Apologies, z[n] ~ reduce(convolve, normals) should be z[n - 4] ~ reduce(convolve, normals).

I think the problem is in the model formulation. It looks like z_t = \lambda z_{t-1} + \lambda y_t (factor \lambda out of every term after the first in your geometric sum), and this is an AR(1) model. If that is the case, then it is easy to find the conditional distribution of z_t, namely z_t \sim \text{Normal}(\lambda (z_{t-1} + k x_t), \lambda \sigma).

Thanks for the response. Unfortunately the model isn’t z_t = \lambda z_{t-1} + \lambda y_t. The transformation I want to apply is the Adstock decay effect, where the decay function decreases geometrically: z_t = \lambda y_t + \lambda^2 y_{t-1} + \lambda^3 y_{t-2} + \dots (don’t worry about the lack of normalization, I do this later).

Do I understand correctly that you believe z = f(SomeRandomStuff) + no_error?

You will never get good sampling, but another way to put this is

z ~ Dirac(f(SomeRandomStuff))

If this is the case, it’s similar to ABC (approximate Bayesian computation) methods, where you have a simulation model and are trying to match data, but you can’t match it exactly, so you have to tolerate some error. I’d suggest

z ~ Normal(f(SomeRandomStuff), epsilon)

where epsilon is small. The size of epsilon is now something you can tune to compromise between the Dirac-type situation (where you will never get samples at all) and the large-epsilon situation (where you will get samples, but most of them will be too far off your target to be acceptable).
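
A minimal sketch of the pattern in Turing, with a made-up simulator f standing in for f(SomeRandomStuff):

using Turing

f(θ, x) = θ * x^2   # toy deterministic simulator, purely illustrative

@model function soft_match(x, z)
    θ ~ Normal(0, 1)
    ϵ = 0.01                           # small fixed tolerance; could also be given a prior
    for t in eachindex(z)
        z[t] ~ Normal(f(θ, x[t]), ϵ)   # "z[t] ≈ f(θ, x[t])" up to order ϵ
    end
end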


Yes, that’s correct: z = f(random_variable1, random_variable2, ...), with no error. If the values of the random variables are known, then the exact value of z is known.

Thanks for the suggestion. I’ve tried using Dirac before, and yes, the lack of proper sampling is an issue. Regarding z ~ Normal(f(SomeRandomStuff), epsilon): are you suggesting I should remove the error term \epsilon from the equation y_t = k * x_t + \epsilon, and instead add it to the z_t equation, z_t \sim \text{Normal}(\text{lag\_transform}(\lambda, y_t),\, \epsilon_\text{new})? This could potentially work; however, knowing the actual distribution of the original \epsilon is desirable.

Otherwise, if you meant to add an extra \epsilon term to the model, I’m not sure how that would be helpful. Is the idea to give it a prior concentrated on very small values, so that the sampling works as intended and the added \epsilon has a negligible impact on the estimation of the other parameters?

Basically this: let’s say z is typically O(1); then give epsilon a prior with scale of, say, O(0.01) and let it loose. The sampled distribution will not be exactly the one you want, but it will sample. You can also take a large sample and then adjust the sample afterwards; there are techniques for such things (importance reweighting, for example).

Excellent, this is producing very decent results! Are there any resources that go deeper into this? I would like to know more about how to choose an appropriate epsilon prior, as well as the other technique you mentioned.

For reference, this is my working model now:

@model function model_func(x, z)
    k ~ Uniform(0.5, 3)
    λ ~ Uniform(0.4, 1)
    σ ~ truncated(Normal(0, 1), 0, Inf)
    ϵ ~ Uniform(0.0001, 0.001)

    # simulate the latent y series (note: this noise is redrawn on every model evaluation)
    y = [k * x_val + rand(Normal(0, σ)) for x_val in x]

    powers = [λ ^ i for i in 1:5]

    for n in 5:length(x)
        # mean of z[n - 4]: the 4-lag Adstock sum; no ::Float32 annotation here,
        # since that would break ForwardDiff's dual numbers under NUTS
        z_val = sum(powers[j] * y[n - j + 1] for j in 1:5)

        z[n - 4] ~ Normal(z_val, ϵ)
    end
end
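
Sampled the same way as before:

model = model_func(x, z[5:end])
chain = sample(model, NUTS(), 5000)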

You could take a look at the CRC Handbook of Approximate Bayesian Computation (Sisson, Fan, and Beaumont).

Glad this is working for you!

From the perspective of Bayes as continuous-valued logic, the Normal(z_val, eps) likelihood is essentially saying that samples in which z_val is very close to z are very credible, and as you get farther away the credibility drops off quickly. So you can think of it as an approximation to the model where they are exactly equal, or you can think of it as exactly a model in which z and z_val differ by at most a very small amount. If you’re willing to take this last view, then there’s no adjustment required.

Great, thanks again!