Replacing missing values in a matrix is super slow

It did work: replace! changes the array in place, which means it doesn't change its type. Use replace (with no !) to make a new array that is just Array{Float32, 3}:

julia> replace(x, missing => 5)
2×2 Matrix{Int64}:
 1  2
 5  4

But be warned: this matrix can no longer hold missings, so you won't be able to do your imputation 100000002004087734272 => missing in place afterwards.
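If you ever do need the array to accept missings again, one option (a sketch with hypothetical variable names, not from this thread) is to convert back to a wider element type:

```julia
# Sketch: widen a plain Float64 matrix so it can hold `missing` again.
x = [1.0 2.0; 5.0 4.0]                         # Matrix{Float64}
y = convert(Matrix{Union{Missing, Float64}}, x)
y[2, 1] = missing                              # allowed now; would error on x
```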


Note that this is not the problem here, and your approach is even slower and more verbose :smiley:

julia> function capnan!(A, v)
           for i in eachindex(A)
               (A[i] > v) && (A[i] = NaN)
           end
       end
capnan! (generic function with 1 method)

julia> f(x, v) =  x[x .> v] .= NaN;

julia> x = randn((360, 180, 1200)) * 1000000;

julia> using BenchmarkTools

julia> @btime capnan!($x, 100);
  92.596 ms (0 allocations: 0 bytes)

julia> @btime f($x, 100);
  67.427 ms (7 allocations: 9.27 MiB)

julia> @btime replace!(e -> e > 100 ? NaN : e, x);
  55.464 ms (1 allocation: 16 bytes)

Dear All,

Many thanks for your big help. I did the following to replace the missing values, and now it is working. Hooray!

replace!(A, missing => NaN)

A = convert.(Float64, A)

typeof(A) = Array{Float64, 3}

BTW, it is blazingly fast as well.

Happy holidays!


Strange: I benchmarked it here as about 2 times faster (Julia 1.7, Win11). I will check again later; it might have to do with benchmarking in-place functions.
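One way to rule that out (a sketch, assuming the capnan! definition from above) is to hand every evaluation a fresh array via setup with evals=1, so the mutating call never measures a second pass over already-capped data:

```julia
using BenchmarkTools

function capnan!(A, v)
    for i in eachindex(A)
        (A[i] > v) && (A[i] = NaN)
    end
end

# Fresh input for each evaluation; without this, later evaluations find
# nothing left above the threshold and measure a near no-op.
@btime capnan!(A, 100) setup=(A = randn(360, 180, 1200) .* 1_000_000) evals=1
```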


Interesting

julia> versioninfo()
Julia Version 1.7.0
Commit 3bf9d17731 (2021-11-30 12:12 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4

Did you intend to convert it to Float64 instead of Float32? Otherwise it would be better to just directly do

replace(A, missing=>NaN) #or NaN32? 
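To make the NaN32 question concrete (a small sketch, assuming the original data is Union{Missing, Float32}): the replacement value drives the promoted element type, so NaN, being a Float64 literal, silently widens the result:

```julia
A32 = Union{Missing, Float32}[1.0f0, missing]

eltype(replace(A32, missing => NaN))    # Float64: NaN is a Float64 literal
eltype(replace(A32, missing => NaN32))  # Float32: stays single precision
```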

They seem to be doing pretty much the same thing:

julia> @benchmark begin replace!(A, missing => NaN); A = convert.(Float64, A) end setup=(A=rand([missing, 0.3], 1000, 1000))
BenchmarkTools.Trial: 375 samples with 1 evaluation.
 Range (min … max):  3.881 ms …   5.811 ms  ┊ GC (min … max): 0.00% … 21.07%
 Time  (median):     3.936 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.467 ms ± 755.305 μs  ┊ GC (mean ± σ):  5.62% ±  7.53%

  ▆█                                                           
  ██▅▃▂▃▁▁▁▁▁▁▁▂▁▃▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▃▃▂▃▃▂▂▁▁▁▁▁▁▁▃▇▆▃ ▂
  3.88 ms         Histogram: frequency by time         5.7 ms <

 Memory estimate: 7.63 MiB, allocs estimate: 2.

julia> @benchmark begin A=replace(A, missing => NaN); end setup=(A=rand([missing, 0.3], 1000, 1000))
BenchmarkTools.Trial: 391 samples with 1 evaluation.
 Range (min … max):  3.244 ms …   5.190 ms  ┊ GC (min … max): 0.00% … 10.32%
 Time  (median):     3.306 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.823 ms ± 763.539 μs  ┊ GC (mean ± σ):  6.61% ±  8.74%

  ▅█▅▁                                     ▂              ▄▄▁  
  ████▁▅▁▁▁▄▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆█▇█▇▅▄▁▁▁▁▁▁▁▁▁▁▁███ ▆
  3.24 ms      Histogram: log(frequency) by time      5.11 ms <

 Memory estimate: 7.63 MiB, allocs estimate: 2.

And then one should choose the simpler solution. But they’re not quite the same, considering the output type.
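Concretely, the output types differ like this (a sketch):

```julia
A = Union{Missing, Float64}[1.0, missing]

replace!(A, missing => NaN)       # in place: eltype stays Union{Missing, Float64}
B = replace(A, missing => NaN)    # new array: eltype narrows to Float64
```

That is why the in-place version still needs the convert.(Float64, A) step to end up with a plain Float64 array.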

@roflmaostoc, as a parenthesis to this post, and in order to check the efficiency of the proposed for loop against MATLAB-style logical indexing, I performed the following benchmarks for in-place functions using setup:

capnan0!(A,v) = A[A .> v] .= NaN

function capnan1!(A, v)
    for i in eachindex(A)
        (A[i] > v) && (A[i] = NaN)
    end
end

using BenchmarkTools
@btime capnan0!(A, 100)  setup=(A=200*rand(360,180,1200)) evals=1   # 180 ms (8 allocations: 305 MiB)
@btime capnan1!(A, 100)  setup=(A=200*rand(360,180,1200)) evals=1   #  44 ms (0 allocations: 0 bytes)

As you can see, the for loop is ~4x faster, assuming the benchmarks are done correctly.
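A third variant worth mentioning (my own sketch, not benchmarked in this thread) is a fully fused in-place broadcast, which never materialises the BitArray mask at all:

```julia
# Fused broadcast: writes in place with zero allocations, and the
# ifelse keeps the inner loop branch-free.
capnan2!(A, v) = (A .= ifelse.(A .> v, NaN, A))
```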

see versioninfo() details here

julia> versioninfo()
Julia Version 1.7.0
Commit 3bf9d17731 (2021-11-30 12:12 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, icelake-client)
Environment:
  JULIA_PKG_USE_CLI_GIT = true
  JULIA_STACKFRAME_FUNCTION_COLOR = blue
  JULIA_WARN_COLOR = cyan
  JULIA_EDITOR = code.cmd -g
  JULIA_NUM_THREADS = 8


I see what was going on:
Previous benchmarks were affected by branch prediction (or something like it): from the second pass on, the function effectively had nothing to do, because the in-place update had already replaced all values above the threshold.

julia> Ac=copy(A); @time capnan0!(Ac, 100); @time capnan0!(Ac, 100); @time capnan0!(Ac, 100);
  0.264360 seconds (9 allocations: 305.862 MiB)
  0.066205 seconds (8 allocations: 9.274 MiB)
  0.066188 seconds (8 allocations: 9.274 MiB)

julia> Ac=copy(A); @time capnan1!(Ac, 100); @time capnan1!(Ac, 100); @time capnan1!(Ac, 100);
  0.062892 seconds
  0.050146 seconds
  0.053584 seconds

But still, I'm confused why the first call allocates 305.862 MiB while subsequent calls don't :confused:

I mean, it makes sense that there is one allocation for the bit array

julia> x = A1 .> 100;

julia> varinfo(r"x")
  name      size summary                 
  –––– ––––––––– ––––––––––––––––––––––––
  x    9.270 MiB 360×180×1200 BitArray{3}
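That 9.270 MiB is exactly the one-bit-per-element cost of the mask:

```julia
# A BitArray stores one bit per element:
360 * 180 * 1200 / 8 / 2^20   # ≈ 9.27 MiB
```

By the same arithmetic, the extra ≈296 MiB on the first call is suspiciously close to one 8-byte value per replaced element (about half the entries of 200*rand(...) exceed 100), though the thread leaves the actual cause open.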

But why another ~300 MiB?


For the same reason it takes about 4x the time of the other runs. It is counting both time and allocations from the compilation.

It’s definitely not compilation.
See

julia> Ac=copy(A); @time capnan0!(Ac, 100); @time capnan0!(Ac, 100); @time capnan0!(Ac, 100);
  0.548873 seconds (9 allocations: 305.963 MiB)
  0.099174 seconds (8 allocations: 9.274 MiB)
  0.084230 seconds (8 allocations: 9.274 MiB)

julia> Ac=copy(A); @time capnan0!(Ac, 100); @time capnan0!(Ac, 100); @time capnan0!(Ac, 100);
  0.525876 seconds (9 allocations: 305.963 MiB, 15.80% gc time)
  0.068308 seconds (8 allocations: 9.274 MiB)
  0.073729 seconds (8 allocations: 9.274 MiB)

This agrees with my findings that the for-loop solution is way faster. For example, in this function I left a note that it is 5x faster (searching for zeros):

https://github.com/GenericMappingTools/GMT.jl/blob/master/src/common_options.jl#L3924

I see now, sorry for the misunderstanding. The pattern suggests some lazy behaviour: maybe copy is not really copying until the object is forced to? Or the OS hands out a pointer and a promise of memory but doesn't actually allocate it until first use? (I have seen this behaviour a lot in C++ with very large arrays of undefined values.) The problem with this last hypothesis is that the array already has values inside, no?

Yeah, maybe

But that memory should already be allocated and really be there:

julia> A .= 200*rand(360,180,1200);

julia> A .+= 1;

julia> A .+= 1.0;

julia> @time capnan0!(A, 100); @time capnan0!(A, 100); @time capnan0!(A, 100);
  0.267676 seconds (9 allocations: 311.902 MiB)
  0.072806 seconds (8 allocations: 9.274 MiB)
  0.074731 seconds (8 allocations: 9.274 MiB)

julia> @time capnan0!(A, 100); @time capnan0!(A, 100); @time capnan0!(A, 100); # values are changed
  0.072465 seconds (8 allocations: 9.274 MiB)
  0.072763 seconds (8 allocations: 9.274 MiB)
  0.072659 seconds (8 allocations: 9.274 MiB)

Doesn’t the first run perform all the assignments by replacing the values with NaNs, so that the following runs have “nothing” to do?

But even the first run should not allocate an array of the full size.
Everything is done in-place.

See this, which works but does the same thing!

julia> capnan0!(A,B,v) = B[A .> v] .= NaN
capnan0! (generic function with 2 methods)

julia> A = randn((128, 512, 256));

julia> B = copy(A);

julia> capnan0!(A, B, 100);

julia> @time capnan0!(A, B, 100); A .= B; @time capnan0!(A, B, 100); A .= B; @time capnan0!(A, B, 100); #
  0.018046 seconds (8 allocations: 2.004 MiB)
  0.014085 seconds (8 allocations: 2.004 MiB)
  0.014415 seconds (8 allocations: 2.004 MiB)

Could some aliasing bug be occurring here as well?

EDIT: Maybe the mechanism only applies on the first run, since we perform no assignments from the second run on.
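On the aliasing guess, a quick probe (a sketch using the internal Base.mightalias, so not a stable API) suggests the alias detector does not consider a BitArray mask to overlap the source array:

```julia
A = randn(4, 4)
mask = A .> 0

Base.mightalias(A, A)      # true: same memory
Base.mightalias(A, mask)   # false: the BitArray owns separate chunks
```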


Yes, but the problem is: why does the first run allocate a lot of memory again when you refill the array with random values? What is the extra ~300 MiB allocation?

julia> A = 200 .* rand(360,180,1200);

julia> @time capnan0!(A, 100); @time capnan0!(A, 100); @time capnan0!(A, 100);
  0.733982 seconds (2.07 M allocations: 398.443 MiB)
  0.062742 seconds (8 allocations: 9.274 MiB)
  0.062664 seconds (8 allocations: 9.274 MiB)

julia> A .= 200 .* rand(360,180,1200);

julia> @time capnan0!(A, 100); @time capnan0!(A, 100); @time capnan0!(A, 100);
  0.266805 seconds (9 allocations: 305.897 MiB, 17.05% gc time)
  0.063220 seconds (8 allocations: 9.274 MiB)
  0.066316 seconds (8 allocations: 9.274 MiB)

julia> capnan0!(A,v) = A[A .> v] .= NaN;

julia> A .= 200 .* rand(360,180,1200);

julia> @time capnan0!(A, 100); @time capnan0!(A, 100); @time capnan0!(A, 100);
  0.231621 seconds (9 allocations: 305.903 MiB)
  0.063112 seconds (8 allocations: 9.274 MiB)
  0.062455 seconds (8 allocations: 9.274 MiB)

I have now created this issue.

Thanks for helping :slight_smile:
