Histogram fit error

error

#1

Hello,
I sometimes get an InexactError() when calling fit(Histogram, data, ws, 0:0.25:4, closed=:left).
The failure is intermittent: the same code sometimes runs fine, and sometimes
I get the following error:

ERROR: LoadError: InexactError()
Stacktrace:
 [1] trunc(::Type{Int64}, ::Float64) at ./float.jl:672
 [2] searchsortedlast(::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}, ::Float64, ::Base.Order.ForwardOrdering) at ./sort.jl:157
 [3] _edge_binindex at /home/user/.julia/v0.6/StatsBase/src/hist.jl:183 [inlined]
 [4] (::StatsBase.##80#81{StatsBase.Histogram{Float64,1,Tuple{StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}})(::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}, ::Float64) at /home/user/.julia/v0.6/StatsBase/src/hist.jl:177
 [5] push!(::StatsBase.Histogram{Float64,1,Tuple{StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}, ::Tuple{Float64}, ::Float64) at /home/user/.julia/v0.6/StatsBase/src/hist.jl:247
 [6] append!(::StatsBase.Histogram{Float64,1,Tuple{StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}, ::Tuple{Array{Float64,1}}, ::Array{Float64,1}) at /home/user/.julia/v0.6/StatsBase/src/hist.jl:267
 [7] #fit#93(::Symbol, ::Function, ::Type{StatsBase.Histogram{Float64,N,E} where E where N}, ::Tuple{Array{Float64,1}}, ::StatsBase.Weights{Float64,Float64,Array{Float64,1}}, ::Tuple{StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}) at /home/user/.julia/v0.6/StatsBase/src/hist.jl:289
 [8] (::StatsBase.#kw##fit)(::Array{Any,1}, ::StatsBase.#fit, ::Type{StatsBase.Histogram{Float64,N,E} where E where N}, ::Tuple{Array{Float64,1}}, ::StatsBase.Weights{Float64,Float64,Array{Float64,1}}, ::Tuple{StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}) at ./<missing>:0 (repeats 2 times)
 [9] #fit#88(::Array{Any,1}, ::Function, ::Type{StatsBase.Histogram}, ::Array{Float64,1}, ::StatsBase.Weights{Float64,Float64,Array{Float64,1}}, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}, ::Vararg{StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}},N} where N) at /home/user/.julia/v0.6/StatsBase/src/hist.jl:233
 [10] (::StatsBase.#kw##fit)(::Array{Any,1}, ::StatsBase.#fit, ::Type{StatsBase.Histogram}, ::Array{Float64,1}, ::StatsBase.Weights{Float64,Float64,Array{Float64,1}}, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}) at ./<missing>:0
 [11] include_from_node1(::String) at ./loading.jl:576
 [12] include(::String) at ./sysimg.jl:14
 [13] process_options(::Base.JLOptions) at ./client.jl:305
 [14] _start() at ./client.jl:371
while loading /home/user/.problems/histogram.jl, in expression starting on line 730

I tried to reproduce the error with some simple data, but without success. I am not sure whether the problem depends on the data or whether I was just unlucky.
I created a gist with some data which sometimes reproduces the error: https://gist.github.com/SebastianM-C/06b66daa66d05fde19538fc4b1f07b25

My versioninfo() is:

julia> versioninfo()
Julia Version 0.6.1
Commit 0d7248e2ff (2017-10-24 22:15 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

I have StatsBase version 0.19.0:

julia> Pkg.status("StatsBase")
 - StatsBase                     0.19.0

Am I missing something?


#2

I can’t reproduce using your gist (on 0.6.1).


#3

Actually I can reproduce it from time to time on Julia 0.6 and 0.7. Thanks for posting an MWE, that’s very useful!

The InexactError is caused by a NaN, which appears in append! when _multi_getindex is called with index 716. This seems to happen because the weights vector contains 716 elements while the data contains only 715. Because of @inbounds, the out-of-bounds read returns whatever happens to be in memory: depending on the memory layout, that can be a (semi-random) valid float value or an invalid one. Either way something is wrong, and it can affect the result even when no error is thrown.
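
A minimal sketch of the two ingredients, using only Base Julia. The arrays are made-up stand-ins with the same lengths as described above, not the data from the gist, and the variable names are purely illustrative:

```julia
# Truncating NaN to an integer is the operation that fails in frame [1]
# of the stack trace (trunc(::Type{Int64}, ::Float64)).
nan_fails = try
    trunc(Int, NaN)
    false
catch err
    err isa InexactError
end

# Stand-in arrays (assumption: random values, only the lengths matter here).
data = rand(715)
ws = rand(716)

# With bounds checking enabled, reading element 716 of a 715-element
# vector raises a BoundsError instead of returning stray memory.
oob_fails = try
    data[length(ws)]
    false
catch err
    err isa BoundsError
end
```

Both flags end up true. Under @inbounds the second check is skipped, which is why the bad read can go unnoticed until a NaN happens to reach trunc.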

Could you file an issue against StatsBase? We should ensure the array sizes are compatible. Maybe Julia itself should do that with eachindex, which would have caught the problem (I’ve filed an issue).
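
Until such a check exists in the package, a guard on the caller's side would catch the mismatch early. This is only a sketch: check_same_length is a hypothetical helper name, not part of StatsBase or Base.

```julia
# Hypothetical helper (not a StatsBase function): refuse to continue when
# the data and the weights do not have the same number of elements.
function check_same_length(data::AbstractVector, ws::AbstractVector)
    if length(data) != length(ws)
        throw(DimensionMismatch(
            "data has $(length(data)) elements, weights have $(length(ws))"))
    end
    return nothing
end
```

Calling check_same_length(data, ws) right before the fit(Histogram, data, ws, 0:0.25:4, closed=:left) call from the first post would then fail with a DimensionMismatch on every run, instead of an occasional InexactError.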


#4

Thank you very much! I opened an issue: https://github.com/JuliaStats/StatsBase.jl/issues/315