Estimate Empirical CDF using ecdf() on sample

Max_Beez · June 26, 2022, 9:13pm

If I generate a sample from some distribution, for instance the Exponential distribution, there is no issue using ecdf to plot the empirical CDF. However, if I apply a function and take a sample, I get the error:

ERROR: LoadError: MethodError: no method matching ecdf(::Vector{Any})
Closest candidates are:
ecdf(::AbstractVector{T} where T<:Real; weights) at ~/.julia/packages/StatsBase/n494Y/src/empirical.jl:56

Why can’t I pass a sample of this sort?

tbeason · June 26, 2022, 10:07pm

Your vector has element type Any (it is a Vector{Any}), and the method requires a vector whose element type is a subtype of Real. You should convert your vector to Vector{Float64} or something similar.

convert(Vector{Float64},data) where data is your vector should do it.
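
As a minimal sketch of that conversion: the data below is made up for illustration, and the way the Vector{Any} is built (pushing into an untyped `[]`) is only an assumption about how such a vector might arise, not necessarily what happened in the original code.

```julia
using StatsBase  # provides ecdf

# Hypothetical sample: an untyped literal `[]` creates a Vector{Any},
# even though every element pushed into it is a Float64.
data = []
for x in (0.3, 1.2, 0.7, 2.5)
    push!(data, x)
end
typeof(data)                             # Vector{Any}

# Convert to a concretely typed vector before calling ecdf.
fdata = convert(Vector{Float64}, data)   # Float64.(data) also works here
F = ecdf(fdata)

F(1.0)   # fraction of samples <= 1.0; here 2 of 4, i.e. 0.5
```

The conversion throws if any element cannot be converted to Float64, which is usually the failure you want to see early.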

Likely the package should relax this restriction though.

Max_Beez · June 26, 2022, 10:16pm

That worked. Interesting that it doesn’t accept the vector as is. As you said, perhaps the restriction should be relaxed.

Benny · June 26, 2022, 10:37pm

The ECDF is only defined for real numbers and is often implemented with sorting, which you don’t want to fail 80% of the way in because it ran into an errant complex number or string. That said, an ::AbstractVector{Any} method could attempt a conversion step before passing the resulting Vector{<:Real} to the usual method.
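
A rough sketch of that idea, written as a standalone function rather than as a new method on StatsBase's ecdf. The name `simple_ecdf` is hypothetical, and the sorting-based body is only a simplified stand-in, not StatsBase's actual implementation.

```julia
# Usual method: element type is already known to be real.
function simple_ecdf(x::AbstractVector{<:Real})
    sorted = sort(x)
    n = length(sorted)
    # In a sorted vector, searchsortedlast gives the count of elements <= t.
    return t -> searchsortedlast(sorted, t) / n
end

# Fallback for loosely typed vectors such as Vector{Any}:
# validate up front, convert, then dispatch to the method above.
function simple_ecdf(x::AbstractVector)
    all(v -> v isa Real, x) ||
        throw(ArgumentError("all elements must be real numbers"))
    return simple_ecdf(convert(Vector{Float64}, x))
end

F = simple_ecdf(Any[0.3, 1.2, 0.7, 2.5])
F(1.0)   # 0.5
```

Because the check runs before any sorting, a stray string or complex number is rejected at the start instead of partway through.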

ericphanson · June 27, 2022, 1:45am

YASGuide says it should be relaxed also. As it says,

For example, AbstractArray{<:MyType} does not describe "any value of type <:AbstractArray containing elements of type <:MyType"; it only describes a subset of such values (e.g. typeof(Any[1]) <: AbstractArray{<:Number} is false).
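
The quoted example can be checked directly at the REPL. This is just a small illustration using Base Julia; the variable names are made up.

```julia
loose = Any[1]   # element type Any, although the only element is an Int
tight = [1]      # element type Int64 (on a 64-bit machine)

typeof(loose) <: AbstractArray{<:Number}   # false: Any is not a subtype of Number
typeof(tight) <: AbstractArray{<:Number}   # true
typeof(loose) <: AbstractArray             # true: an unrestricted signature accepts it

all(v -> v isa Number, loose)              # true: the values themselves are numbers
```

So a signature like f(x::AbstractArray{<:Number}) rejects `loose` based on its declared element type, regardless of what it actually contains.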

In my opinion this kind of error is really annoying. To me it’s understandable to get that error 80% through if it hits a string or whatever; this would be a really unusual case where I’m calling it on some very long input (otherwise 80% or 0% doesn’t matter) that’s also not validated/tested at all (otherwise there wouldn’t be a string).

tbeason · June 27, 2022, 12:27pm

Yeah, the problem here is that the vector was erroneously of type Vector{Any}, which is likely the most common reason people pass a Vector{Any} to methods like this anyway (perhaps because they read it from a file and did not check that the result was correctly typed). The elements really are floats and all of the computation will work just fine, so it should not error.

nalimilan · June 27, 2022, 12:42pm

The restriction on the vector type is most likely there for historical reasons (lots of code in StatsBase does that). PR welcome to fix it!
