leon
December 16, 2021, 4:22pm
1
I have a matrix with the size of 360x180x1200. It contains lots of values that are equal to 100000002004087734272. I assume those are missing value indicators. I tried to use the below command to replace them with NaNs, but it is taking forever to complete.
A[A .> 100] .= NaN;
In comparison, when I tried to do the same thing in Matlab (see below), it was blazingly fast and literally took no time at all. It was done as soon as I finished clicking Enter.
A(A>100) = NaN;
What am I missing? Is there a faster way to do this in Julia?
Many thanks!
sijo
December 16, 2021, 4:44pm
2
Here’s what I see:
using BenchmarkTools
A = 200*rand(360, 180, 1200);
@btime $A[$A .> 100] .= NaN
55.450 ms (7 allocations: 9.27 MiB)
That’s pretty fast so please give a complete MWE that reproduces the slow thing you are seeing…
Hi!
I wouldn’t call that super slow.
julia> x = randn((360, 180, 1200));
julia> f(x) = x[x .>100] .= NaN;
julia> @time f(x);
0.640111 seconds (2.93 M allocations: 143.932 MiB, 1.21% gc time, 89.95% compilation time)
julia> @time f(x);
0.063607 seconds (8 allocations: 9.274 MiB)
Do you see something different on your machine? Do you have an array with Integer or Float values?
bkamins
December 16, 2021, 4:52pm
4
As a side note: using replace! would allocate less.
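A minimal sketch of the replace! idea, assuming a plain in-memory floating-point array with no missing entries. The helper name fill_to_nan! and the sample values are illustrative and not from the thread. Passing a function to replace! rewrites the array in place, so it should avoid the temporary Boolean mask that A .> 100 builds.

```julia
# Hypothetical helper: set every element above `threshold` to NaN, in place.
# Assumes A holds plain floating-point values (no `missing`).
function fill_to_nan!(A::AbstractArray{T}, threshold) where {T<:AbstractFloat}
    # replace!(f, A) applies f to each element and stores the result back into A.
    replace!(x -> x > threshold ? T(NaN) : x, A)
    return A
end

A = Float32[1.0 2.0; 1f20 4.0]
fill_to_nan!(A, 100)
```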
nilshg
December 16, 2021, 4:56pm
5
I guess a relevant question might be: what is typeof(A)? Given that
julia> 100000002004087734272 > typemax(Int64)
true
maybe there’s a BigInt issue?
leon
December 16, 2021, 4:57pm
6
I tried the randomly generated matrix and it is indeed not that slow. However, for my own matrix, it is still taking forever.
It is a model output matrix of Float32, with missing values as 1e20.
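The long integer from the first post is consistent with a Float32 fill value of 1e20: the Float32 nearest to 1e20 is exactly that integer. A quick check, offered as a sketch; the variable names are illustrative. Note that the integer literal itself should parse as Int128 rather than BigInt, since it still fits in 128 bits.

```julia
fill_value = 1f20                      # Float32 literal for 1e20

# Convert exactly to an arbitrary-precision integer to see every digit.
exact = BigInt(fill_value)
println(exact)

# The literal is too large for Int64, so Julia parses it as a wider integer type.
println(typeof(100000002004087734272))
```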
roflmaostc:
x[x .>100] .= NaN;
Please show the type of your matrix, or better, an MWE.
If the element type is not concrete (the way it is for Matrix{Float64}), that’s bad.
Maybe you have Matrix{Real}, whose elements can be of many different types.
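One way to check this, sketched with made-up arrays (the variable names are illustrative): isconcretetype(eltype(A)) reports whether the element type is concrete. Union{Missing, Float32} is not concrete either, but Julia generally handles such small unions far better than a fully abstract element type like Real.

```julia
A_concrete = rand(Float32, 4, 3, 2)                      # Array{Float32, 3}
A_abstract = Array{Real}(A_concrete)                     # same values, abstract eltype
A_union    = Array{Union{Missing, Float32}}(A_concrete)  # small union eltype

for A in (A_concrete, A_abstract, A_union)
    println(typeof(A), " => concrete eltype? ", isconcretetype(eltype(A)))
end
```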
leon
December 16, 2021, 4:59pm
8
Thanks! Here you go:
typeof(A) = Array{Union{Missing, Float32}, 3}
nilshg
December 16, 2021, 5:00pm
9
I don’t know anything about NetCDF, but maybe it is a memory mapped format? You could try collect(A) and see if that turns it into a plain old Array{Float32, 3}.
leon
December 16, 2021, 5:03pm
10
Unfortunately, A = collect(A) is taking forever to execute as well.
It is now done, and the type of my new A is shown below:
typeof(A) = Array{Union{Missing, Float32}, 3}
I guess the issue could be related to the NCDatasets package that did not load the values out cleanly.
Plain for loops in Julia are often faster than Matlab-style vectorized tricks, although Julia can do those too.
Try for example (improved with input from @DNF ):
function capnan!(A, v)
    for i in eachindex(A)
        (A[i] > v) && (A[i] = NaN)
    end
end
leon
December 16, 2021, 5:10pm
12
Thanks! Unfortunately, I got the below error:
LoadError: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
[1] capnan!(A::NCDatasets.CFVariable{Union{Missing, Float32}, 3, NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Float32, Nothing, Nothing, Nothing, Nothing, Nothing}}}, v::Int64)
&& and || error with missing values. Actually, now I’m confused about why your original code works if you have missings.
The package MissingsAsFalse.jl is helpful here.
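Without an extra package, one option is to guard the comparison with ismissing, so that && only ever sees a Bool. The sketch below is a hypothetical variant of capnan! (the name capnan_skipmissing! and the sample array are illustrative) and assumes the data is already an in-memory array.

```julia
# Hypothetical missing-safe variant of capnan!: `missing` entries are skipped
# instead of being fed to `&&`, which only accepts Bool.
function capnan_skipmissing!(A, v)
    for i in eachindex(A)
        x = A[i]
        # ismissing guards the comparison, so `x > v` is always a Bool here.
        if !ismissing(x) && x > v
            A[i] = NaN
        end
    end
    return A
end

A = Union{Missing, Float32}[1f0 missing; 1f20 4f0]
capnan_skipmissing!(A, 100)
```

The missing entry is left untouched; only the 1f20 fill value becomes NaN.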
leon:
100000002004087734272
This is not the same as a missing value.
DNF
December 16, 2021, 5:14pm
15
rafael.guerra:
@inbounds for k = 1:size(A,3), j = 1:size(A,2), i = 1:size(A,1)
This is what eachindex is for 😉
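To illustrate the point, here is a small sketch comparing explicit nested loops with eachindex. The function names are made up for the example. Both visit every element once; eachindex simply lets Julia pick an efficient iteration order for the array type.

```julia
# Count the elements above a threshold with explicit nested loops.
function count_above_nested(A, v)
    n = 0
    # The last range listed is the innermost loop, so `i` (the first index)
    # varies fastest, matching Julia's column-major layout.
    for k in axes(A, 3), j in axes(A, 2), i in axes(A, 1)
        n += A[i, j, k] > v
    end
    return n
end

# Same count, letting eachindex choose the traversal.
function count_above_eachindex(A, v)
    n = 0
    for i in eachindex(A)
        n += A[i] > v
    end
    return n
end

A = collect(reshape(1:24, 2, 3, 4))
count_above_nested(A, 12) == count_above_eachindex(A, 12)
```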
leon
December 16, 2021, 5:14pm
16
I keep getting the below error. Maybe Julia considers that large value a missing value?
unable to check bounds for indices of type Missing
typeof(A) = Array{Union{Missing, Float32}, 3}
No, you have missing values in your array. missing is different from NaN.
julia> x = [1 2; missing 4]
2×2 Matrix{Union{Missing, Int64}}:
1 2
missing 4
julia> x[x .> 1] .= 100
ERROR: ArgumentError: unable to check bounds for indices of type Missing
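The error arises because comparing against missing yields missing rather than a Bool, so the mask cannot be used as an index. One possible workaround, sketched here with a small made-up array, is to turn the missing comparisons into false with coalesce before indexing.

```julia
x = Union{Missing, Float32}[1f0 2f0; missing 4f0]

# Comparing with `missing` propagates `missing` rather than giving a Bool ...
mask_with_missing = x .> 1

# ... so map those entries to `false` before using the mask as an index.
mask = coalesce.(x .> 1, false)
x[mask] .= 100
```

By contrast, NaN > 1 is simply false, which is why the mask approach works on arrays that hold NaN instead of missing.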
leon
December 16, 2021, 5:17pm
18
I think this is the exact problem I’m having and I’m trying to replace the missing values with NaNs.
typeof(A) = Array{Union{Missing, Float32}, 3}
Any idea about how to do that? I tried the below and it did not work:
replace!(A, missing=>NaN);
Are you trying to replace missing values with NaN? Earlier you said you were trying to replace the number 100000002004087734272 with missing.
Use replace!
julia> x = [1 2; missing 4]
2×2 Matrix{Union{Missing, Int64}}:
1 2
missing 4
julia> replace!(x, missing => 5)
2×2 Matrix{Union{Missing, Int64}}:
1 2
5 4
leon
December 16, 2021, 5:24pm
20
My code:
@show typeof(A);
replace!(A, missing=>NaN);
@show typeof(A);
Results:
typeof(A) = Array{Union{Missing, Float32}, 3}
typeof(A) = Array{Union{Missing, Float32}, 3}
Does that mean it did not work?
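An unchanged typeof does not by itself mean the replacement failed. replace! works in place, so it can change the values but not the element type of the existing array; checking any(ismissing, A) is a more direct test. To get a plain Float32 array, a new array has to be built, for example with replace (no !) or coalesce. A sketch on a small made-up array:

```julia
A = Union{Missing, Float32}[1f0 missing; 1f20 4f0]

# In place: values change, but the container type cannot.
replace!(A, missing => NaN)
@show typeof(A)          # element type still allows Missing
@show any(ismissing, A)  # but no missing values remain

# New array: `replace` (no `!`) can narrow the element type.
A2 = Union{Missing, Float32}[1f0 missing; 1f20 4f0]
B = replace(A2, missing => NaN32)
@show typeof(B)

# Broadcasting coalesce is another way to build a Missing-free copy.
C = coalesce.(A2, NaN32)
@show typeof(C)
```

Using NaN32 rather than NaN keeps the new array at Float32 instead of promoting it to Float64.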