# Replacing missing values in a matrix is super slow

I have a matrix with the size of 360x180x1200. It contains lots of values that are equal to 100000002004087734272. I assume those are missing value indicators. I tried to use the below command to replace them with NaNs, but it is taking forever to complete.
`A[A .> 100] .= NaN;`

In comparison, when I tried to do the same thing in Matlab (see below), it was blazingly fast and literally took no time at all. It was done as soon as I finished clicking Enter.
`A(A>100) = NaN;`

What am I missing? Is there a faster way to do this in Julia?

Many thanks!

Here’s what I see:

``````
using BenchmarkTools

A = 200*rand(360, 180, 1200);
@btime $A[$A .> 100] .= NaN
# 55.450 ms (7 allocations: 9.27 MiB)
``````

That’s pretty fast so please give a complete MWE that reproduces the slow thing you are seeing…

Hi!

I wouldn’t call that super slow.

``````
julia> x = randn((360, 180, 1200));

julia> f(x) = x[x .> 100] .= NaN;

julia> @time f(x);
  0.640111 seconds (2.93 M allocations: 143.932 MiB, 1.21% gc time, 89.95% compilation time)

julia> @time f(x);
  0.063607 seconds (8 allocations: 9.274 MiB)
``````

Do you see something different on your machine? Do you have an array with Integer or Float values?

as a side note: using `replace!` would allocate less.
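
To make the `replace!` suggestion concrete, here is a minimal sketch on a small made-up array (the name `B` and the threshold of 100 are only for illustration). The function form of `replace!` rewrites the elements in place, so it does not need to build a boolean mask the size of the array:

```julia
# Small stand-in for the real data; 1f20 plays the role of the fill value.
B = Float32[1.5, 1f20, 42.0, 1f20]

# The function form of `replace!` maps every element in place,
# so no temporary boolean mask is allocated.
replace!(x -> x > 100 ? NaN32 : x, B)
```

Using `NaN32` rather than `NaN` keeps the anonymous function returning `Float32` on both branches.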

I guess a relevant question might be: what is `typeof(A)`? Given that

``````
julia> 100000002004087734272 > typemax(Int64)
true
``````

maybe there’s a `BigInt` issue?

I tried the randomly generated matrix and it is indeed not that slow. However, for my own matrix, it is still taking forever.

It is a model output matrix of Float32, with missing values as 1e20.
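
That matches the odd-looking number from the first post: `Float32` cannot represent 1e20 exactly, and the nearest representable value is 100000002004087734272 when written out as an integer. A quick check using only Base Julia (the variable names are illustrative):

```julia
fill_value = Float32(1e20)   # what a fill value of 1e20 becomes when stored as Float32

# The long integer from the first post is exactly this Float32 value.
is_same = fill_value == 100000002004087734272

# The literal is too big for Int64, so Julia parses it as Int128 (not BigInt).
literal_type = typeof(100000002004087734272)
```

So the large number is an ordinary `Float32` value, and comparing against it with `A .> 100` does not involve `BigInt` arithmetic.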

If the element type is not concrete, unlike `Matrix{Float64}`, that's bad for performance.

Maybe you have a `Matrix{Real}`, which can hold elements of many different types.
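
A quick way to check whether the element type is concrete, sketched with two small throwaway arrays (the names `tight` and `loose` are made up):

```julia
tight = rand(Float64, 2, 2)        # Matrix{Float64}: every element has the same concrete type
loose = Real[1, 2.5, 3//4]         # Vector{Real}: each element may have a different type

isconcretetype(eltype(tight))      # true
isconcretetype(eltype(loose))      # false
```

Note that `isconcretetype` also returns `false` for a small union such as `Union{Missing, Float32}`, even though Julia stores such arrays compactly, so a `false` here is a hint to look closer rather than proof of a problem.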

Thanks! Here you go:

typeof(A) = Array{Union{Missing, Float32}, 3}

I don’t know anything about NetCDF, but maybe it is a memory-mapped format? You could try `collect(A)` and see if that turns it into a plain old `Array{Float32, 3}`.

Unfortunately, the command `A = collect(A)` is taking forever to execute as well.

It is now done, and the type of my new A is shown below:

typeof(A) = Array{Union{Missing, Float32}, 3}

I guess the issue could be related to the NCDatasets package that did not load the values out cleanly.

In Julia, a plain `for` loop is typically faster than Matlab-style vectorized tricks, although Julia can do those too (and do them fast).

Try for example (improved with input from @DNF):

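``````
function capnan!(A, v)
    # Visit every element in place; no temporary mask array is allocated.
    for i in eachindex(A)
        # Assumes plain numbers: `missing > v` is `missing`, which `&&` rejects.
        (A[i] > v) && (A[i] = NaN)
    end
    return A
end
``````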

Thanks! Unfortunately, I got the below error:

``````
LoadError: TypeError: non-boolean (Missing) used in boolean context

Stacktrace:

capnan!(A::NCDatasets.CFVariable{Union{Missing, Float32}, 3, NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Float32, Nothing, Nothing, Nothing, Nothing, Nothing}}}, v::Int64)
``````

`&&` and `||` error with missing values. Actually, now I’m confused about why your original code works if you have `missing`s.

The package MissingsAsFalse.jl is helpful here.
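
Without an extra package, one option is `coalesce` from Base, which turns a `missing` comparison result into `false`. A minimal sketch of a loop that tolerates `missing` entries (the name `capnan_skipmissing!` and the small test array are hypothetical, not from the thread):

```julia
# Hypothetical variant of the loop above that leaves `missing` entries alone.
function capnan_skipmissing!(A, v)
    for i in eachindex(A)
        # `missing > v` evaluates to `missing`; coalesce maps that to `false`.
        if coalesce(A[i] > v, false)
            A[i] = NaN
        end
    end
    return A
end

A = Union{Missing, Float32}[1f0, 1f20, missing, 50f0]
capnan_skipmissing!(A, 100)
```

Here the second element becomes `NaN` while the `missing` entry is left untouched.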

This is not the same as a `missing` value.

This is what `eachindex` is for😉

I keep getting the error below. Maybe Julia considers that large value to be a missing value?

`unable to check bounds for indices of type Missing`

typeof(A) = Array{Union{Missing, Float32}, 3}

No, you have `missing` values in your array. `missing` is different from `NaN`.

``````
julia> x = [1 2; missing 4]
2×2 Matrix{Union{Missing, Int64}}:
 1         2
  missing  4

julia> x[x .> 1] .= 100
ERROR: ArgumentError: unable to check bounds for indices of type Missing
``````
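
The comparison can be made to work by turning the `missing` entries of the mask into `false`, for example with `coalesce` from Base. A minimal sketch on the same small matrix (the name `mask` is just illustrative):

```julia
x = [1 2; missing 4]

# `x .> 1` is `missing` wherever x is missing; coalesce maps those to false,
# which gives a mask of plain Bools that is valid for indexing.
mask = coalesce.(x .> 1, false)
x[mask] .= 100
```

After this, the entries greater than 1 are set to 100 and the `missing` entry is unchanged.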

I think this is the exact problem I’m having and I’m trying to replace the missing values with NaNs.

typeof(A) = Array{Union{Missing, Float32}, 3}

Any idea about how to do that? I tried the below and it did not work:
`replace!(A, missing=>NaN);`

Are you trying to replace `missing` values with `NaN`? Earlier you said you were trying to replace the number `100000002004087734272` with `NaN`.

Use `replace!`

``````
julia> x = [1 2; missing 4]
2×2 Matrix{Union{Missing, Int64}}:
 1         2
  missing  4

julia> replace!(x, missing => 5)
2×2 Matrix{Union{Missing, Int64}}:
 1  2
 5  4
``````

My code:

``````
@show typeof(A);
replace!(A, missing=>NaN);
@show typeof(A);
``````

Results:

``````
typeof(A) = Array{Union{Missing, Float32}, 3}
typeof(A) = Array{Union{Missing, Float32}, 3}
``````

Does that mean it did not work?
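
Not necessarily. `replace!` works in place, so it can change the values but not the array's declared element type: `Union{Missing, Float32}` stays even when no `missing` is left. A sketch on a small stand-in array (the names `A` and `B` here are illustrative, not the real data) that checks the contents rather than the type, and then builds a plain `Float32` copy with `coalesce`:

```julia
A = Union{Missing, Float32}[1.0, missing, 3.0]

replace!(A, missing => NaN)      # in place: eltype(A) is unchanged
any(ismissing, A)                # false: the missing entries are gone

# A new array with a narrower element type, if a plain Float32 array is wanted.
B = coalesce.(A, NaN32)
eltype(B)                        # Float32
```

The copy `B` costs one extra allocation, but its element type no longer admits `missing`.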