How to find min/max by column from a matrix with missing values?

This doesn’t (obviously) work:

X = [1 10.5;1.5 missing; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.3 38; missing -2.3; 5.2 -2.4]
minX = minimum(X,dims=1)
maxX = maximum(X,dims=1)
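To make the failure concrete, here is a small sketch using the same X. The names colmin and globalmin are only illustrative. As far as I understand it, missing propagates through the reduction, so every column that contains a missing reduces to missing, while skipmissing applied to the whole matrix iterates over all elements and gives a single global value instead of one per column.

```julia
X = [1 10.5; 1.5 missing; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.3 38; missing -2.3; 5.2 -2.4]

# Column-wise reduction keeps the 1×2 shape, but both columns
# contain a missing, so both results are missing.
colmin = minimum(X, dims=1)

# skipmissing on the whole matrix drops the shape: it is a flat
# iterator, so this is the minimum over all non-missing entries.
globalmin = minimum(skipmissing(X))
```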

The problem is that skipmissing() doesn’t keep the shape of the matrix, so how should I find the min/max of each column?

minX = [minimum(skipmissing(X[:, col])) for col in 1:size(X, 2)]

Drawback: returns a

2-element Array{Float64,1}

and not a

1×2 Array{Float64,2}

as expected. But maybe this doesn’t matter.
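If the 1×2 shape does matter, two possible ways to get it are sketched below. This is only a sketch: the names minvec, minrow and minrow2 are made up for illustration. The first reshapes the vector with permutedims, the second uses mapslices to apply the reduction along dimension 1.

```julia
X = [1 10.5; 1.5 missing; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.3 38; missing -2.3; 5.2 -2.4]

# Per-column minimum as a plain vector (same idea as the comprehension above).
minvec = [minimum(skipmissing(X[:, j])) for j in axes(X, 2)]

# Option 1: turn the 2-element vector into a 1×2 matrix.
minrow = permutedims(minvec)

# Option 2: let mapslices reduce each column and keep the reduced dimension.
minrow2 = mapslices(col -> minimum(skipmissing(col)), X; dims=1)
```

Both should give a matrix with one row and one entry per column of X, matching the shape that minimum(X, dims=1) would return for a matrix without missing values.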


Broadcasting solution:

julia> minimum.(skipmissing.(eachcol(X)))
2-element Array{Float64,1}:

I must stop thinking in comprehensions :grin:

FWIW, I made a pull request to support this some time ago. For Arrays it should be faster than eachrow (though probably not faster than eachcol), since it processes contiguous blocks of memory, but it also requires quite a bit of code.

Not necessarily. This is faster than the broadcasted version, with less allocation:

[minimum(skipmissing(col)) for col in eachcol(X)]

Also try extrema, which returns the minimum and maximum together as a tuple.
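A minimal sketch of the extrema variant, reusing the X from the question. The names ext, minX and maxX are just for illustration.

```julia
X = [1 10.5; 1.5 missing; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.3 38; missing -2.3; 5.2 -2.4]

# One (min, max) tuple per column, ignoring missing values.
ext = [extrema(skipmissing(col)) for col in eachcol(X)]

# Split the tuples into separate vectors if that is more convenient.
minX = first.(ext)
maxX = last.(ext)
```

This gives both bounds per column from a single comprehension instead of calling minimum and maximum separately.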