sample() from a data vector with NA values

statistics

#1

Is there a way to use the sample() function on a DataArray so that it deals gracefully with NA values?

What I mean is that sample(a) works fine, sometimes returning a Float64 and sometimes returning NA. But when I ask for several samples (e.g. sample(a, 5)), it works if all the samples are Float64s or if they're all NAs, but if it gets a mix (I assume), I get:

MethodError: cannot `convert` an object of type DataArrays.NAtype to an object of type Float64

Is there any way around this (so that my sample can contain both Float64 and NA values)?

EDIT: whole error message:

MethodError: Cannot `convert` an object of type DataArrays.NAtype to an object of type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods.
 in setindex!(::Array{Float64,1}, ::DataArrays.NAtype, ::Int64) at array.jl:415
 in direct_sample!(::DataArrays.DataArray{Float64,1}, ::Array{Float64,1}) at sampling.jl:35
 in #sample!#66(::Bool, ::Bool, ::Function, ::DataArrays.DataArray{Float64,1}, ::Array{Float64,1}) at sampling.jl:280
 in (::StatsBase.#kw##sample!)(::Array{Any,1}, ::StatsBase.#sample!, ::DataArrays.DataArray{Float64,1}, ::Array{Float64,1}) at <missing>:0
 in sample(::DataArrays.DataArray{Float64,1}, ::Int64) at sampling.jl:318
 in include_string(::String, ::String) at loading.jl:441
 in include_string(::String, ::String, ::Int64) at eval.jl:28
 in include_string(::Module, ::String, ::String, ::Int64, ::Vararg{Int64,N}) at eval.jl:32
 in (::Atom.##53#56{String,Int64,String})() at eval.jl:40
 in withpath(::Atom.##53#56{String,Int64,String}, ::String) at utils.jl:30
 in withpath(::Function, ::String) at eval.jl:46
 in macro expansion at eval.jl:57 [inlined]
 in (::Atom.##52#55{Dict{String,Any}})() at task.jl:60

#2

You didn’t report the whole error message. To which type is it trying to convert?
What should sample() return if it hits an NA?


#3

I don’t think so, but you can do it with the new NullableArrays data format.


#4

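sample constructs an output Vector{T} using the eltype of the input, so for a DataArray{Float64,1} the output is a plain Array{Float64,1}, which cannot hold NA (hence the setindex! failure in your stack trace). You can use the in-place mutating sample! instead, and provide your own DataArray: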

julia> sample!(a, similar(a, 5)) # Note the argument order; output comes second here
5-element DataArrays.DataArray{Int64,1}:
…

Perhaps changing sample to use similar internally would be generally useful.
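
For the Float64 case from the original question, a minimal sketch of the same workaround might look like the following. It assumes the DataArrays and StatsBase versions in use in this thread, and the values in `a` are made up purely for illustration:

```julia
using DataArrays, StatsBase

# Hypothetical input: a Float64 DataArray with some NA entries
a = @data([1.0, NA, 3.0, NA, 5.0])

# Allocate an output that can hold NA; similar() on a DataArray
# returns another DataArray with the same element type
out = similar(a, 5)

# In-place sampling: source first, destination second
sample!(a, out)
```

Because `out` is itself a DataArray, a draw that mixes Float64 and NA values can be stored without any conversion to Float64.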


#5

You can use the in-place mutating sample! instead, and provide your own DataArray

This works perfectly, thanks!

I agree that this would be useful generally, but I'm not sure what other implications it has. Perhaps a note in the docs mentioning this workaround would be sufficient; this strikes me as something that could trip up more than just me. It was particularly odd that it worked sometimes (when none of the samples were NA).