Replacing missing values in a matrix is super slow

I have a matrix of size 360x180x1200. It contains lots of values equal to 100000002004087734272, which I assume are missing-value indicators. I tried to use the command below to replace them with NaNs, but it takes forever to complete.
A[A .> 100] .= NaN;

In comparison, when I did the same thing in Matlab (see below), it was blazingly fast and took no time at all. It finished as soon as I pressed Enter.
A(A>100) = NaN;

What am I missing? Is there a faster way to do this in Julia?

Many thanks!

Here’s what I see:

using BenchmarkTools

A = 200*rand(360, 180, 1200);
@btime $A[$A .> 100] .= NaN
  55.450 ms (7 allocations: 9.27 MiB)

That’s pretty fast 🙂 so please give a complete MWE (minimal working example) that reproduces the slowdown you are seeing.


Hi!

I wouldn’t call that super slow.

julia> x = randn((360, 180, 1200));

julia> f(x) = x[x .> 100] .= NaN;

julia> @time f(x);
  0.640111 seconds (2.93 M allocations: 143.932 MiB, 1.21% gc time, 89.95% compilation time)

julia> @time f(x);
  0.063607 seconds (8 allocations: 9.274 MiB)

Do you see something different on your machine? Does your array hold Integer or Float values?


As a side note: using replace! would allocate less.
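A minimal sketch of the replace! approach, assuming a plain Float32 array with no missing values (the small size is arbitrary, chosen to keep the example quick). The function form of replace! updates elements in place, so it never builds the temporary Boolean mask that A .> 100 creates:

```julia
# Hypothetical test data: a smaller Float32 array with values in [0, 200).
A = 200 .* rand(Float32, 36, 18, 120)

# The function is applied to each element; values above 100 become NaN.
# oftype(x, NaN) gives NaN32 for Float32 elements, so the element type is kept.
replace!(x -> x > 100 ? oftype(x, NaN) : x, A)

@assert eltype(A) == Float32
@assert !any(x -> x > 100, A)   # NaN > 100 is false, so nothing above 100 remains
```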

I guess a relevant question might be: what is typeof(A)? Given that

julia> 100000002004087734272 > typemax(Int64)
true

maybe there’s a BigInt issue?
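For reference, a quick check of how Julia parses that literal: an integer literal too large for Int64 but small enough for Int128 becomes an Int128, not a BigInt (BigInt is only used for even larger literals).

```julia
n = 100000002004087734272
@assert n > typemax(Int64)
@assert typeof(n) == Int128     # parsed as Int128, not BigInt
```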


I tried the randomly generated matrix and it is indeed not that slow. However, for my own matrix, it is still taking forever.

It is a model output matrix of Float32, with missing values as 1e20.
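That explains the odd-looking number from the first post: 1e20 is not exactly representable as a Float32, and the nearest Float32 is exactly 100000002004087734272. A small sketch (the three-element array is hypothetical test data):

```julia
fill32 = 1f20                                   # Float32 literal for 1e20
@assert Int128(fill32) == 100000002004087734272 # exact integer value of the Float32
@assert Float64(fill32) != 1e20                 # rounding to Float32 moved it off 1e20

# On a plain Float32 array, the original masked assignment works as expected.
A = Float32[1.5, 1f20, 3.0]
A[A .> 100] .= NaN
@assert isnan(A[2]) && A[1] == 1.5f0
```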

Please show the type of your matrix, or better, an MWE.

If the type is not concrete, the way Matrix{Float64} is, that’s bad for performance.

Maybe you have a Matrix{Real}, whose elements can be of many different types.
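A sketch of how to check this, using isconcretetype on the element type (the arrays here are made-up examples):

```julia
a = [1.0, 2.0]                              # Vector{Float64}
b = Real[1, 2.5]                            # Vector{Real}: elements may be Int, Float64, ...
c = Union{Missing, Float32}[1f0, missing]   # element type is a small Union

@assert isconcretetype(eltype(a))
@assert !isconcretetype(eltype(b))
@assert !isconcretetype(eltype(c))  # not concrete, though Julia stores small unions like this inline
```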


Thanks! Here you go:

typeof(A) = Array{Union{Missing, Float32}, 3}

I don’t know anything about NetCDF, but maybe it is a memory-mapped format? You could try collect(A) and see if that turns it into a plain old Array{Float32, 3}.
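One caveat worth knowing: collect materializes a lazy or wrapped array into a plain Array, but it keeps the element type, so Missing stays in the Union. A sketch, where a view is a hypothetical stand-in for a lazily loaded variable:

```julia
A = Union{Missing, Float32}[1f0 missing; 3f0 4f0]
B = collect(view(A, :, :))   # the view stands in for a lazy, file-backed array
@assert B isa Matrix{Union{Missing, Float32}}   # plain Array, same element type
```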


Unfortunately, A = collect(A) is also taking forever to execute.

It has now finished, and the type of the new A is:

typeof(A) = Array{Union{Missing, Float32}, 3}

I suspect the issue is related to the NCDatasets package not loading the values cleanly.

In Julia, plain for loops are faster than Matlab-style vectorized tricks (which Julia can also do, and faster than Matlab).

Try for example (improved with input from @DNF):

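function capnan!(A, v)
    # eachindex picks the most efficient way to visit every element of A
    for i in eachindex(A)
        # Short-circuit: assign NaN only when the element exceeds the threshold v.
        # Note: for a missing entry, A[i] > v returns missing, which && rejects.
        (A[i] > v) && (A[i] = NaN)
    end
    return A
end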

Thanks! Unfortunately, I got the following error:

LoadError: TypeError: non-boolean (Missing) used in boolean context

Stacktrace:

[1] capnan!(A::NCDatasets.CFVariable{Union{Missing, Float32}, 3, NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Float32, Nothing, Nothing, Nothing, Nothing, Nothing}}}, v::Int64)

&& and || error with missing values. Actually, now I’m confused about why your original code works if you have missings.
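A sketch of why this happens and one way around it. Comparing missing with a number gives missing, not a Bool, so && cannot use it. The variant below (the name capnan_skipmissing! is hypothetical) checks ismissing first and leaves missing entries untouched:

```julia
@assert ismissing(missing > 100)   # comparison propagates missing

function capnan_skipmissing!(A, v)
    for i in eachindex(A)
        x = A[i]
        # ismissing is checked first, so x > v only runs on real numbers
        if !ismissing(x) && x > v
            A[i] = NaN
        end
    end
    return A
end

A = Union{Missing, Float32}[1f0, missing, 1f20]
capnan_skipmissing!(A, 100)
@assert A[1] == 1f0 && ismissing(A[2]) && isnan(A[3])
```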

The package MissingsAsFalse.jl is helpful here.

This is not the same as a missing value.
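To make the distinction concrete, a short sketch: a fill value like 1f20, NaN, and missing are three different things in Julia.

```julia
@assert !ismissing(1f20)                # just a large Float32 number
@assert !ismissing(NaN32)               # NaN is a floating-point value, not missing
@assert isnan(NaN32)
@assert ismissing(missing)
@assert ismissing(missing == missing)   # comparisons with missing return missing
@assert !(NaN32 == NaN32)               # NaN compares unequal even to itself
```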


This is what eachindex is for😉


I keep getting the error below. Maybe Julia considers that large value a missing value?

unable to check bounds for indices of type Missing

typeof(A) = Array{Union{Missing, Float32}, 3}

No, you have missing values in your array. missing is different from NaN.

julia> x = [1 2; missing 4]
2×2 Matrix{Union{Missing, Int64}}:
 1         2
  missing  4

julia> x[x .> 1] .= 100
ERROR: ArgumentError: unable to check bounds for indices of type Missing
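One possible fix, sketched on the same small matrix: coalesce turns each missing comparison result into false, giving a plain Bool mask that can be used for indexing.

```julia
x = [1 2; missing 4]

# x .> 1 has a missing entry; coalesce.(x .> 1, false) replaces it with false.
mask = coalesce.(x .> 1, false)
x[mask] .= 100
@assert isequal(x, [1 100; missing 100])   # isequal treats missing == missing
```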

I think this is exactly the problem I’m having; I’m trying to replace the missing values with NaNs.

typeof(A) = Array{Union{Missing, Float32}, 3}

Any idea how to do that? I tried the following, and it did not work:
replace!(A, missing=>NaN);

Are you trying to replace missing values with NaN? Earlier you said you were trying to replace the number 100000002004087734272 with NaN.

Use replace!

julia> x = [1 2; missing 4]
2×2 Matrix{Union{Missing, Int64}}:
 1         2
  missing  4

julia> replace!(x, missing => 5)
2×2 Matrix{Union{Missing, Int64}}:
 1  2
 5  4

My code:

@show typeof(A);
replace!(A, missing=>NaN);
@show typeof(A);

Results:

typeof(A) = Array{Union{Missing, Float32}, 3}
typeof(A) = Array{Union{Missing, Float32}, 3}

Does that mean it did not work?
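A sketch of what is likely going on, using a small made-up array: an Array's element type is fixed when it is created, so replace! changes the values but cannot narrow Union{Missing, Float32} to Float32. Checking the values, rather than typeof, shows whether the replacement worked; a new array is needed to get a concrete element type.

```julia
A = Union{Missing, Float32}[1f0 missing; 3f0 4f0]
replace!(A, missing => NaN32)
@assert eltype(A) == Union{Missing, Float32}   # type unchanged...
@assert !any(ismissing, A)                     # ...but no missing values remain

# Build a concretely typed copy (this would throw if a missing were left):
B = convert(Array{Float32}, A)
@assert B isa Matrix{Float32}

# Alternatively, replace missings and narrow the type in one step:
C = coalesce.(Union{Missing, Float32}[1f0, missing], NaN32)
@assert C isa Vector{Float32} && isnan(C[2])
```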