Bell shaped?

rocco_sprmnt21 · September 28, 2023, 6:13pm

Not a specific question about julia, but I’m sure someone can help me get a correct idea about the following problem.

I know that the average tax applied in the various US states on the price of cigarettes is mu=73cents, while the standard deviation is sigma=48cents.
Can one argue on the basis of this information that the distribution has a bell shape?

dlakelan · September 28, 2023, 6:16pm

No. The existence of a mean and standard deviation doesn’t imply much about the shape (it only really implies an upper bound on how heavy the tails are).

Also, there are 50 states exactly, so the actual distribution is 50 point masses.
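To make the point concrete, here is a minimal sketch with made-up numbers (not the real tax rates): 50 values split evenly between 25 cents and 121 cents. This hypothetical data set has exactly the stated mean and (uncorrected) standard deviation, yet it is two spikes with nothing near the mean.

```julia
using Statistics

# Hypothetical data, NOT the real state taxes:
# 25 "states" at 25 cents and 25 "states" at 121 cents.
taxes = vcat(fill(25.0, 25), fill(121.0, 25))

μ = mean(taxes)                    # arithmetic mean
σ = std(taxes; corrected=false)    # population (uncorrected) standard deviation

println("mean = ", μ, " cents, std = ", σ, " cents")

# No value lies within 47 cents of the mean, so the shape is bimodal, not a bell.
println("values within 47 cents of the mean: ", count(t -> abs(t - μ) < 47, taxes))
```

Any other arrangement that satisfies the same two summary numbers (skewed, uniform-like, clustered) is equally compatible with what the question states.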

Christopher_Fisher · September 28, 2023, 6:32pm

It is possible to compute the mean and standard deviation of any distribution.

julia> using Distributions

julia> mean(Uniform(0, 1))
0.5

julia> std(Uniform(0, 1))
0.28867513459481287

Edit

Well, most distributions

julia> std(Cauchy(0, 1))
NaN

Dan · September 28, 2023, 6:45pm

The data behind these numbers is the tax in each state. If we write the tax rates as x_i, where 1 <= i <= 50 (supposing there are 50 states), then what the OP says amounts to the following two constraints:
\sum_{i=1}^{50} x_i = 50*73
\sum_{i=1}^{50} x_i^2 = 50*(73^2+48^2)
where the units are cents in the first equation and cents^2 in the second.

This has 50 unknowns with two equations, and that’s most of what the data guarantees.

(*) the second equation assumes an un-corrected std-dev calculation.

The answer just illustrates how little the mean and std-dev constrain the data, and any conclusion would need more input from common sense. For example, one might assume no state subsidizes smoking and therefore x_i >= 0 for all i.
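As a rough illustration of how the common-sense constraint x_i >= 0 interacts with the stated numbers, the sketch below asks how much probability a normal curve with mean 73 and standard deviation 48 would place below zero. It assumes the Distributions package is available, and the normal model is only a hypothesis being checked, not something the data implies.

```julia
using Distributions

μ, σ = 73.0, 48.0   # the summary statistics from the question, in cents

# Probability mass that a Normal(μ, σ) model assigns to negative taxes.
p_negative = cdf(Normal(μ, σ), 0.0)

println("P(tax < 0) under Normal(73, 48) ≈ ", round(p_negative, digits=3))
println("implied number of states out of 50 ≈ ", round(50 * p_negative, digits=1))
```

Zero is only about 1.5 standard deviations below the mean, so a bell curve with these parameters would put roughly 6% of its mass on negative taxes, which is about three states out of 50. If no state has a negative tax, the left tail has to be cut off, and the distribution is more likely skewed to the right than symmetric.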

Christopher_Fisher · September 28, 2023, 6:51pm

One thing you could do is leverage the central limit theorem and argue that the tax is the sum of many random factors, such as demographics and values of people in different states, legislative process, etc. Of course, it may not be a very strong argument, but it might be plausible.

rocco_sprmnt21 · September 28, 2023, 6:58pm

This was one of the reflections I made (the problem was posed to me by my daughter, who studies psychology), but I would like to see it developed with some formalism that would perhaps help me understand it better.

rocco_sprmnt21 · September 28, 2023, 7:05pm

I’m not very familiar with statistical formulas, but I have some doubts about this one

Dan · September 28, 2023, 7:06pm

Those 50 states would probably include a few with a round number for a tax, say 50c or 0c. Even a few states sharing exactly the same tax makes a model in which all states are drawn from a single underlying unimodal (i.e. bell-like) distribution statistically very unlikely.

Dan · September 28, 2023, 7:09pm

julia> using Statistics

julia> X = rand(50).+1;

julia> σ = std(X; corrected=false)
0.2903419039234251

julia> μ = mean(X)
1.5242881190315183

julia> sum(X.*X)
120.38763454972622

julia> 50*(μ^2 + σ^2)
120.38763454972616
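The numerical check above agrees with a short derivation. The following is a sketch of the algebra, using the uncorrected (divide by n) standard deviation and the same symbols x_i, μ and σ as the earlier post:

```latex
\begin{align*}
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2
          = \frac{1}{n}\sum_{i=1}^{n}x_i^2 \;-\; 2\mu\cdot\frac{1}{n}\sum_{i=1}^{n}x_i \;+\; \mu^2
          = \frac{1}{n}\sum_{i=1}^{n}x_i^2 \;-\; \mu^2 \\[4pt]
\Longrightarrow\quad \sum_{i=1}^{n}x_i^2 &= n\,(\mu^2+\sigma^2)
\end{align*}
```

The middle step uses \frac{1}{n}\sum x_i = \mu. With n = 50, μ = 73 and σ = 48 this gives 50*(5329 + 2304) = 381650 cents^2 for the sum of squares.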

Christopher_Fisher · September 28, 2023, 7:10pm

For a formal treatment, I recommend researching the central limit theorem and the Berry-Esseen theorem, which bounds how fast the convergence to a normal distribution happens under a stronger assumption (a finite third absolute moment). If a demonstration will suffice, you can do something like the following:

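using Plots
using Distributions
using Random

Random.seed!(90)

n_vars = 5          # number of uniform variables added together
n_samples = 10_000  # number of simulated sums

# each row holds n_vars independent Uniform(0, 1) draws
samples = rand(n_samples, n_vars)
sums = sum(samples, dims=2)

# histogram of the sums, normalized to a density
histogram(sums, norm=true)

# overlay a normal density with the same mean and standard deviation
x = range(0, 5, length=100)
dens = pdf.(Normal(mean(sums), std(sums)), x)
plot!(x, dens, leg=false, grid=false, color=:black)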

[Image: example]

Dan · September 28, 2023, 7:18pm

I’m sure the psychology students will be delighted to learn about this theorem and all the conditions and history of the central limit theorem, especially going over the several right (and wrong) proofs in history.

cchderrick · September 28, 2023, 8:01pm

I would have thought the random factors boil down to the two polarized political parties (not so random on a continuous spectrum).

Just out of curiosity, I went searching for the tax stats online, and you can almost tell the political party of the state by looking at the tax. (That suggests strong correlation in the sample data, and doesn’t the CLT assume independent data?)

rocco_sprmnt21 · September 28, 2023, 8:20pm

With the following data you get almost the same values as the example on which the question is asked.
In some cases the sale of cigarettes is subsidized. What political party would this be in the USA?
How does the histogram function work?
How do you use 10000 values?

n_vars = 50
n_samples = 10_000

# each row holds n_vars integers drawn uniformly from -510:657
samples = rand(-510:657, n_samples, n_vars)
# mean of each row
sums = sum(samples, dims=2) ./ n_vars

m, s = mean(sums), std(sums)
histogram(sums, norm=true)

# evaluate the normal density over mean ± 4 standard deviations
l, r = m - 4*s, m + 4*s
x = range(l, r, length=trunc(Int, r - l))
dens = pdf.(Normal(m, s), x)
plot!(x, dens, leg=false, grid=false, color=:black)

Christopher_Fisher · September 28, 2023, 10:48pm

These are good points. CLT holds under “weak” dependence according to this paper: https://www.sciencedirect.com/science/article/pii/0047259X81901287

I did not go through it in sufficient detail to understand what weak dependence means. My guess is that convergence will take longer with correlated random variables. If they are maximally correlated, the CLT will not apply.
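The extreme case mentioned above can be checked with a small simulation. This is only a sketch of the two endpoints (fully independent versus perfectly correlated uniform variables) and says nothing about the weak-dependence conditions in the linked paper. The helper `excess_kurtosis` is defined here for illustration; it is 0 for a normal distribution and -1.2 for a uniform one.

```julia
using Random, Statistics

Random.seed!(1)

# Excess kurtosis of a sample: 0 for a normal shape, -1.2 for a uniform shape.
excess_kurtosis(v) = mean(((v .- mean(v)) ./ std(v; corrected=false)) .^ 4) - 3

n_vars, n_samples = 5, 100_000

# Independent case: each sum adds n_vars separate Uniform(0, 1) draws.
indep = vec(sum(rand(n_samples, n_vars), dims=2))

# Perfectly correlated case: every term of the sum is the same draw,
# so the sum is just n_vars times one uniform variable.
u = rand(n_samples)
corr = n_vars .* u

k_indep = excess_kurtosis(indep)
k_corr = excess_kurtosis(corr)

println("excess kurtosis, independent sum:          ", round(k_indep, digits=2))
println("excess kurtosis, perfectly correlated sum: ", round(k_corr, digits=2))
```

For independent terms the excess kurtosis of the sum shrinks like -1.2/n_vars (so about -0.24 here) and heads toward the normal value of 0 as more terms are added. With perfect correlation it stays at the uniform value of about -1.2 however many terms are summed, so the sum never becomes bell shaped.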

mthelm85 · September 28, 2023, 11:07pm

The CDC makes excise tax rates on cigarettes by state available publicly:

EDIT

I couldn’t resist digging into the data at least a little bit. The following is a histogram of the excise tax rate:
