I have to work on a problem that requires me to use characteristic functions to represent random variables, as these particular rv’s have no close form distribution. The last time I did something remotely similar was at least 4 years ago and it was on matlab, so my recollection of this is quite fuzzy. I am having particular trouble understanding the convention of the Fourier transform used and how I might need to take it into account.
First off, in the documentation at API · AbstractFFTs.jl, the convention used for the transform/inverse transform seems to indicate that the C.F. is supposed to be calculated with ifft, so to use fft to get the pdf from C.F. This is not consistent with the arguments being real valued arrays or complex arrays, so in my test below I will pretend I didn’t read these docs and will use ifft to go from C.F. to pdf, as my initial intuition would be.
Here’s a short attempt using the C.F. of the Gamma distribution (Gamma distribution - Wikipedia). I chose this one because it’s easy and short xto write.
using FFTW
n_grid = 2^13 # total number of grid points
N = n_grid / 2 # support for R- and R+
x_width = 2^3
Δx = x_width / n_grid # in original space
x_grid = Δx .* vec(-N:1:(N-1)) # in original space
Δxi = 2*π / x_width # in fourier space
xi_grid = Δxi .* vec(-N:1:(N-1)) # in fourier space
k = 9.0
θ = 0.5
# CF of the Gamma distribution
Φ(u) = (1 - 1im * u * θ) ^ (-k)
Φ_ongrid = map(Φ, xi_grid)
test_pdf = real(fftshift(ifft(Φ_ongrid)))
plot(x_grid, test_pdf)
I get some oscillating weird thing. I vaguely remember something about having to multiply the C.F. by its complex conjugate and something about the Plancherel theorem, but I can’t seem to find any references about it at the moment. Anyone used to doing this?
I went searching around but this was the only post that I found that was somewhat related, but since the author does the transformation and then the inverse, it doesn’t answer my questions, I believe
related post:
The transposition of the matlab code would look like that :
using FFTW, Distributions, Plots
function build_pdf(chf; xmin=-10,xmax=10,N=2^8)
rng = xmax - xmin
dx = rng/N
k = 0:(N-1)
w = (0.5 - N/2 .+ k) * ( 2π / rng)
ϕ = chf.(w[Int.((N/2+1)):end])
ϕ = vcat(conj.(ϕ[end:-1:1]),ϕ)
C = (-1+0im).^((1-1/N)*(xmin/dx .+ k))/rng
D = (-1+0im).^(-2*(xmin/rng) .* k)
return xmin .+ k * dx, real(C .* fft(D .* ϕ))
end
X = Gamma(9.0,0.5)
x, f = build_pdf(t -> cf(X,t); xmin = -10, xmax = 10,N = 2^8)
plot(x,f,label="reconstructed")
plot!(x, pdf.(X,x),label="true")
Which works. However more complicated characteristic functions do not work very well, as this code shows (a simple convolution of 4 gamma distributions)
Xs = (Gamma(rand(),rand()) for i in 1:4)
x, f = build_pdf(t -> prod(cf(X,t) for X in Xs); xmin = -10, xmax = 10,N = 2^8)
histogram(sum(rand(X,1000) for X in Xs),normalize=:pdf,bins=100)
plot!(x,f,label="reconstructed")
I had seen some books vaguely alluding to using quadratures and fft’ing the bins as well, but I assumed those were the author’s specific techniques for the application they were discussing.
Many thanks @lrnv , I was also in the process on doing that and reading the author’s own referenced paper, to understand the reasoning a bit better.
The x_min and x_max values seem crucial, as the types of distributions that require this kind of technique are most often than not fat tailed, and truncating the pdf doesn’t seem trivial to me. Additionally, maybe we can try to brute force for more precision by increasing the sampling points (I was initially trying with 2^13, for example).
I’ll try to play around with those parameters a bit to see if there’s any improvement as well.
The problem here seems to be to find a way of automatically calibrating those. The author suggest a max width of 6 sigma, but some of these distributions don’t even have defined 2nd moment.
Yes increasing the sampling points would help, but not that much for more complex distributions (such as the simple convolution of gammas i proposed).
X_min and x_max are just the zone of evaluation of the pdf : they would not really change the results, but of course if your distribution is fat_tailed, you might want x_max to be really big.
Another direction is to use a better integration technique than fft, but it of course depends on what you are trying to do. If you know the characteristic function (i mean, analytically), you might try Broomwitch integration directly, which usually gives better results with the right tools.
Another idea would be to concentrate on what you really want : why the pdf ? If the pdf is a step towards something else (e.g., maybe you want to sample the distribution ?) then there might be a better way of doing it than precomputing the pdf.
Yep, in that convolution case it does fail absolutely spectacularly, even with increased sampling.
In this case, I am working with (asymmetrical) tempered alpha stable distributions, which do have an analytical closed form for the characteristic function; the goal is to fit the tail tempering parameters. I am contemplating a few methods but they’re basically numerically optimizing something, like the MLE, ADS-test-statistics or QQplot L2 loss, and all of them involve either the pdf or the cdf.
To give an idea of references, Ranchev, Kim, Bianchi (2011) mention in section 6.2.1 using this type of integration to obtain the pdf and cdf. Other references refer to this one when vaguely mentioning parameter calibration techniques.
Regarding sampling, I have that semi covered, but I would like to avoid brute forcing this with Monte Carlo; trying to find an easier way first, the FFT approach seemed to be easier at first…
I had never heard of Broomwitch integration, I will check it, thanks for the tip.
EDIT: Ah, it’s the inverse laplace transform mentioned in the famous Arfen book!
This makes a new distribution for each t. Try making Xs a vector, to materialize the distributions once.
Although your example is actually fairly benign, your point holds - constructing distributions this way does sometimes require fancy quadrature schemes (and high-precision arithmetic).
Indeed, making it a vector is far superior as the distribution is not randomly picked at each t anymore:
Xs = [Gamma(10*rand(),rand()) for i in 1:10]
x, f = build_pdf(t -> prod(cf(X,t) for X in Xs); xmin = 0, xmax = 30,N = 2^12)
histogram(sum(rand(X,1000) for X in Xs),normalize=:pdf,bins=100)
plot!(x,f,label="reconstructed")
Thanks @Ralph_Smith for the remark.
I will try applying this to my particular problem. I will also compare the quadrature methods I see mentioned by authors and see if I can somehow use the ones in QuadGK. I’ll see if I can also report some findings here.
So, I was reading the source suggested by @rafael.guerra a bit, and for my purposes, the initial author’s methodology seems to be useful, so I wrote a minimal interpretation of it.