Estimate Empirical CDF using ecdf() on sample

If I generate a sample from some distribution, for instance the Exponential distribution, then there is no issue using ecdf to plot the empirical CDF. However, if I apply a function and take a sample of the result, I get the error:

ERROR: LoadError: MethodError: no method matching ecdf(::Vector{Any})
Closest candidates are:
ecdf(::AbstractVector{T} where T<:Real; weights) at ~/.julia/packages/StatsBase/n494Y/src/empirical.jl:56

Why can’t I pass a sample of this sort?

Your vector has element type Any (it is a Vector{Any}), and the method requires a vector whose element type is a subtype of Real. You should convert your vector to Float64 or something similar.

convert(Vector{Float64}, data), where data is your vector, should do it.
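As a minimal sketch of that suggestion (the values in `data` are made up for illustration; only `ecdf` and the `convert` call come from the thread):

```julia
using StatsBase  # provides ecdf

# Hypothetical sample that ended up with element type Any
data = Any[0.3, 1.2, 0.7, 2.5]

# Option 1: explicit conversion, as suggested above
x = convert(Vector{Float64}, data)

# Option 2: broadcast the Float64 constructor over the elements
y = Float64.(data)

F = ecdf(x)   # callable object representing the empirical CDF
F(1.0)        # fraction of sample values <= 1.0
```

Both options throw if some element cannot be converted to Float64 (a string, say), so the failure happens before any sorting starts.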

Likely the package should relax this restriction though.

That worked. Interesting that it doesn’t accept the vector as is. As you said, perhaps the restriction should be relaxed.

The ECDF is only defined for real numbers and is often implemented with sorting, which you don’t want to fail 80% of the way in because it ran into an errant complex number or string. That said, a method for AbstractVector{Any} could attempt a conversion step before passing a Vector{<:Real} to the usual method.
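A rough sketch of what such a conversion step could look like. `ecdf_checked` is a hypothetical helper, not part of StatsBase, and converting everything to Float64 is an assumption that would lose precision for element types such as BigFloat or Rational:

```julia
using StatsBase

# Hypothetical helper: validate every element up front, then hand a
# concretely typed vector to the existing ecdf method.
function ecdf_checked(data::AbstractVector)
    for (i, v) in pairs(data)
        v isa Real || throw(ArgumentError("element $i is a $(typeof(v)), expected a Real"))
    end
    return ecdf(convert(Vector{Float64}, data))
end

F = ecdf_checked(Any[1, 2.5, 3])
F(2.5)   # fraction of values <= 2.5
```

The check runs in a single pass before any sorting, so a stray string or complex number is reported immediately rather than partway through.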

The YASGuide also says it should be relaxed. As it puts it:

For example, AbstractArray{<:MyType} does not describe "any value of type <:AbstractArray containing elements of type <:MyType"; it only describes a subset of such values (e.g. typeof(Any[1]) <: AbstractArray{<:Number} is false).
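The quoted point can be checked directly in the REPL. In the sketch below, `strict_total` and `relaxed_total` are hypothetical functions, used only to contrast a restricted signature with a relaxed one:

```julia
# Dispatch looks at the container's element type, not at the values inside:
# Vector{Any} is not a subtype of AbstractArray{<:Number}.
a = typeof(Any[1]) <: AbstractArray{<:Number}   # false
b = typeof([1]) <: AbstractArray{<:Number}      # true, [1] is a Vector{Int}

# Hypothetical functions comparing a strict and a relaxed signature
strict_total(x::AbstractVector{<:Real}) = sum(x)
relaxed_total(x::AbstractVector) = sum(x)

total = relaxed_total(Any[1, 2.5])   # works: dispatch ignores the eltype
# strict_total(Any[1, 2.5])          # would throw a MethodError
```

With the relaxed signature, a non-numeric element still fails, just later, inside `sum`, rather than at dispatch.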

In my opinion, this kind of error is really annoying. To me, getting that error 80% of the way through because it hits a string or whatever is understandable: that would be a really unusual case, where I’m calling it on some very long input (otherwise 80% vs. 0% doesn’t matter) that also hasn’t been validated or tested at all (otherwise there wouldn’t be a string).

Yeah, the problem here is that the vector was erroneously of type Vector{Any}, which is likely the most common reason people pass a Vector{Any} to methods like this anyway (perhaps because they read it from a file and did not check that the result was correctly typed). The elements really are floats and all of the computation will work just fine, so it should not error.
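For illustration, here is one way a vector of floats can end up typed as Any, and two ways to avoid or undo it. This is an assumed scenario (collecting into an untyped `[]`), not necessarily what happened in the original post, and the transform inside the loop is made up:

```julia
# Starting from an untyped literal gives a Vector{Any}
samples = []
for u in (0.1, 0.5, 0.9)
    push!(samples, -log(1 - u))   # hypothetical transform of each draw
end
eltype(samples)    # Any, although every element is a Float64

# Fix at the source: give the container a concrete element type
typed = Float64[]
for u in (0.1, 0.5, 0.9)
    push!(typed, -log(1 - u))
end

# Or narrow after the fact: broadcasting identity infers the tightest eltype
narrowed = identity.(samples)
eltype(narrowed)   # Float64
```

Either `typed` or `narrowed` can then be passed to methods that require an AbstractVector{<:Real}.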

The restriction on the vector type is most likely there for historical reasons (lots of code in StatsBase does that). PR welcome to fix it!