Adapt code to Julia

I’m trying to adapt the following matlab code to julia

% MATLAB code to perform the cross validation
lambda = 1;
n = 1000;
X = exprnd(1/lambda,[n,1]);
m = 6:200;
J = zeros(1,195);

for i=1:195
    [num,binc] = hist(X,m(i));
    h = n/m(i);
    J(i) = 2/((n-1)*h)-((n+1)/((n-1)*h))*sum( (num/n).^2 );
end

plot(m,J,'LineWidth',4,'Color',[0.9,0.2,0.0]);

Similar implemetation in Python:

# Python code to perform the cross validation
import numpy as np
import matplotlib.pyplot as plt
lambd = 1
n     = 1000
X     = np.random.exponential(1/lambd, size=n)
m     = np.arange(5,200)
J     = np.zeros((195))

for i in range(0,195):
  hist,bins = np.histogram(X,bins=m[i])
  h = n/m[i]
  J[i] = 2/((n-1)*h)-((n+1)/((n-1)*h))*np.sum((hist/n)**2)

plt.plot(m,J);

Which output is:

image

With the Julia code:

using Plots,Random,Distributions,StatsBase
λ = 1
n = 1_000
d = Exponential(1/λ)
X = rand(d,n)
m = 6:200
J = zeros(195)

for i = 1:195
    hist = fit(Histogram,X,nbins = m[i])
    h = n/m[i]
    J[i] = 2/((n-1)*h)-((n+1)/((n-1)*h))*sum( (hist.weights/n).^2 )
end

plot(m,J)

But the ouput is :

image

Am I missing something?

The problem is that nbins is not being respected by fit:

julia> h = fit(Histogram, rand(100), nbins=7)
Histogram{Int64, 1, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}}
edges:
  0.0:0.2:1.0
weights: [15, 26, 15, 23, 21]
closed: left
isdensity: false

julia> h = fit(Histogram, rand(100), nbins=8)
Histogram{Int64, 1, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}}
edges:
  0.0:0.2:1.0
weights: [24, 17, 23, 15, 21]
closed: left
isdensity: false

Seems wrong to me. Maybe someone has something to say about that.

4 Likes

It seems StatsBase.histrange has some extra logic in it, but I don’t know the reason why.

The documentation states

nbins : if no edges argument is supplied, the approximate number of bins to use along each dimension (can be either a single integer, or a tuple of integers).

One can, however, provide the edges directly to avoid this.

julia> fit(Histogram, rand(100), range(0.0,1.0,length=7+1))
Histogram{Int64, 1, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}}
edges:
  0.0:0.14285714285714285:1.0
weights: [13, 9, 13, 18, 16, 19, 12]
closed: left
isdensity: false

julia> fit(Histogram, rand(100), range(0.0,1.0,length=10+1))
Histogram{Int64, 1, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}}
edges:
  0.0:0.1:1.0
weights: [8, 10, 9, 8, 7, 16, 11, 10, 13, 8]
closed: left
isdensity: false

I also found an issue regarding this topic:

4 Likes

Changing fit(Histogram,X,nbins = m[i]) to fit(Histogram,X,range(minimum(X),maximum(X),length = m[i]+1)) did exactly what I needed, thank you!

3 Likes