What kind of distribution should I use for binary variables and others

RickandMortyforever · March 5, 2022, 4:11pm

Hi, Guys.

I have a question about building a linear model.

I’ve read some examples of linear models with continuous predictors.
But I don’t know how to set prior distributions for binary (boolean), categorical, or ordinal variables.

Let’s say I have dataframes x_train and y_train (IQ score).
x_train consists of 4 variables: age, sex, self-esteem, and favorite-fruit.

age is a continuous variable, sex is binary, self-esteem is ordinal (1 to 5), and fruit is a categorical variable (1 to 6).

How should I build a linear model with non-informative or weak priors?

@model function lin_reg(x, y)
    # priors
    α ~ Normal(mean(y), 10) # intercept
    σ ~ Exponential(1) # sigma
    beta1 ~ Normal(0, 10) # age
    beta2 ~ Bernoulli(0.5) # sex
    beta3 ~ DiscreteUniform(1, 5) # self-esteem
    beta4 ~ DiscreteUniform(1, 6) # fruit

    μ = α .+ beta1 * x[:, :age] .+ beta2 * x[:, :sex] .+ beta3 * x[:, :esteem] .+ beta4 * x[:, :fruit]

    y ~ MvNormal(μ, σ)
end

Thank you!

JesperMartinsson · March 5, 2022, 9:40pm

If x is binary or ordinal, the effect of x on y (i.e. the beta) is often not binary or ordinal itself. Often the betas are continuous, and their priors can be set accordingly. Try beta ~ Uniform(-a, a) with a large a, or beta ~ Normal(0, sigma) with a large sigma, or something similar, and see how it goes.
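
To make the suggestion above concrete, here is a minimal sketch of the model with continuous Normal priors on every coefficient. Some parts are my assumptions and not from the posts above: the helper name dummies is made up, the categorical and ordinal predictors are dummy-coded with level 1 as the reference category (a common choice, though an ordinal predictor could also get a single numeric slope), the columns are passed as separate vectors, and the Normal(0, 10) scale is just a weak-prior placeholder.

```julia
using Turing, LinearAlgebra, Statistics

# Hypothetical helper: one-hot encode integer levels 1..k,
# dropping level 1 so that it acts as the reference category.
dummies(v, k) = Float64[v[i] == j for i in eachindex(v), j in 2:k]

@model function lin_reg(age, sex, esteem, fruit, y)
    # Priors: every coefficient is continuous, whatever the type of its predictor.
    α ~ Normal(mean(y), 10)                 # intercept
    σ ~ Exponential(1)                      # residual standard deviation
    β_age ~ Normal(0, 10)                   # slope for age
    β_sex ~ Normal(0, 10)                   # shift for sex == 1 relative to sex == 0
    β_esteem ~ filldist(Normal(0, 10), 4)   # esteem levels 2..5 relative to level 1
    β_fruit ~ filldist(Normal(0, 10), 5)    # fruit levels 2..6 relative to level 1

    # Linear predictor: dummy matrices times their coefficient vectors.
    μ = α .+ β_age .* age .+ β_sex .* sex .+
        dummies(esteem, 5) * β_esteem .+ dummies(fruit, 6) * β_fruit

    y ~ MvNormal(μ, σ^2 * I)
end
```

Assuming x_train has columns named age, sex, esteem and fruit as in the question, this could be sampled with something like sample(lin_reg(x_train.age, x_train.sex, x_train.esteem, x_train.fruit, y_train), NUTS(), 1000). The discrete information lives in the data (the 0/1 columns), not in the priors, which is why no Bernoulli or DiscreteUniform prior is needed.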
