Replacing some elements of an array with corresponding elements from another array

My goal is to replace any elements in array A that are either NaN or < 0 with the corresponding elements of array B.

Below is my code:

A = [1, 2.1, -5, 6, -3, 8, NaN];
B = [8, 8.2,  8, 8,  8, 8, 8];
C = replace!(x -> (x<0 || isnan(x)) ? B : x, A)

Here is the error:

MethodError: Cannot `convert` an object of type Vector{Float64} to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::T) where T<:Number at number.jl:6
  convert(::Type{T}, ::Number) where T<:Number at number.jl:7
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
  ...

Why isn’t it working? Many thanks!

Here is one way to do that.

A = [1, 2.1, -5, 6, -3, 8, NaN];
B = [8, 8.2,  8, 8,  8, 8, 8];

isreplaceable(x) = x < 0 || isnan(x)

function replacewith!(A::AT, B::AT, predicate) where
                      {N, T, AT<:AbstractArray{T,N}}
    here = map(predicate, A)
    A[here] .= B[here]
    A
end

# and use this way
replacewith!(A,B, isreplaceable);

# A becomes [1.0 , 2.1,  8.0,  6.0,  8.0,  8.0,  8.0]

Be that as it may, consider the @DataFrames approach (next).


Use a loop:

isreplaceable(x) = x < 0 || isnan(x)

function replacewith!(A, B, predicate)
    for i in eachindex(A)
        if predicate(A[i])
            A[i] = B[i]
        end
    end
    A
end

Possibly because x iterates over the scalar elements of A and that code tries to replace such elements by full B arrays.
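
As a cross-check of that explanation: since x is a scalar, the predicate has no way to index into B, but broadcasting ifelse over both arrays pairs up corresponding elements (a minimal sketch, not from the original posts):

```julia
A = [1, 2.1, -5, 6, -3, 8, NaN]
B = [8, 8.2,  8, 8,  8, 8, 8]

# Broadcast ifelse over both arrays: each element of A is paired with
# the corresponding element of B, so no Vector-to-Float64 conversion
# is ever attempted.
A .= ifelse.((A .< 0) .| isnan.(A), B, A)
# A is now [1.0, 2.1, 8.0, 6.0, 8.0, 8.0, 8.0]
```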

A comprehension could be used:

C = [a < 0 || isnan(a) ? b : a for (a,b) in zip(A,B)]

Another option:

inds = findall(x->x<0 || isnan(x), A)
A[inds] .= B[inds]

I wish we could write findall(<(0) || isnan, A) 🙂


One could even get rid of findall:

inds = (A .< 0) .| isnan.(A)
A[inds] .= B[inds]

I wonder though if it’s maybe less efficient, since it then has to loop over all the false values too, in the assignment?
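
One way to explore this is to time both strategies side by side (a Base-only sketch with made-up data; for trustworthy numbers use BenchmarkTools, and expect the answer to depend on how many elements actually match):

```julia
# Base-only timing sketch; for reliable numbers, prefer
# BenchmarkTools' @btime over a single @elapsed sample.
A = rand(10_000) .- 0.05          # sprinkle in some negative values
B = rand(10_000)

# findall materializes an index vector, then touches only the hits.
replace_findall!(a, b) = (inds = findall(<(0), a); a[inds] .= b[inds]; a)

# A Bool mask is cheap to build, but logical indexing still scans
# the whole mask to locate the true positions.
replace_mask!(a, b) = (m = a .< 0; a[m] .= b[m]; a)

# Both strategies give identical results; only the indexing differs.
@assert replace_findall!(copy(A), B) == replace_mask!(copy(A), B)

t1 = @elapsed replace_findall!(copy(A), B)
t2 = @elapsed replace_mask!(copy(A), B)
println("findall: $t1 s, mask: $t2 s")
```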


Many thanks for all the solutions. My data has millions of rows, so I was able to test their performance:

This is the fastest way of doing this (106 seconds):

B= replace!(x -> (x<0 || isnan(x)) ? 0 : x, B);
A = [a < 0 || isnan(a) ? b : a for (a, b) in zip(A, B)];

This is the 2nd fastest (196 seconds):

inds1 = findall(x -> x<0 || isnan(x), B);
B[inds1] .= 0;

inds2 = findall(x -> x<0 || isnan(x), A);
A[inds2] .= B[inds2]

This is the slowest method (330 seconds):

B[B.<0 .| isnan.(B)] .= 0;

Ind2 = map(A) do a
    a < 0 || isnan(a)
end
Ind2 = vec(Ind2);
if any(Ind2)
    A[Ind2] .= B[Ind2];
end

@leon, it looks like you’ve changed the original problem and now B also has negative values and NaNs. This may confuse the logic of the post a bit.
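
If B can itself contain negatives and NaNs, the two cleanup steps in the benchmarks above can be fused into a single pass (a sketch of what that combined logic might look like; the function name is made up):

```julia
isreplaceable(x) = x < 0 || isnan(x)

# One pass over both arrays: sanitize B to 0 first, then use the
# (already sanitized) B value wherever A needs replacing.
function fixboth!(A, B)
    for i in eachindex(A, B)
        if isreplaceable(B[i])
            B[i] = 0
        end
        if isreplaceable(A[i])
            A[i] = B[i]
        end
    end
    return A, B
end

A = [1, 2.1, -5, 6, -3, 8, NaN];
B = [8, 8.2, -1, 8, NaN, 8, 8];
fixboth!(A, B)
# A == [1.0, 2.1, 0.0, 6.0, 0.0, 8.0, 8.0]
```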

The fastest way is quite certainly a loop, as suggested above. Maybe something like this is close to optimal:

julia> function replacewith!(A, B, isreplaceable)
           for i in eachindex(A)
               @inbounds A[i] = ifelse(isreplaceable(A[i]), B[i], A[i])
           end
           return A
       end
replacewith! (generic function with 1 method)

julia> @btime replacewith!(A,B,x -> x < 0 || isnan(x)) setup=(A = [1, 2.1, -5, 6, -3, 8, NaN]; B = [8, 8.2,  8, 8,  8, 8, 8]);
  7.139 ns (0 allocations: 0 bytes)

To compare with, for example:

julia> function rep(A,B)
       B= replace!(x -> (x<0 || isnan(x)) ? 0 : x, B);
       A = [a < 0 || isnan(a) ? b : a for (a, b) in zip(A, B)];
       end
rep (generic function with 1 method)

julia> @btime rep(A,B) setup=(A = [1, 2.1, -5, 6, -3, 8, NaN]; B = [8, 8.2,  8, 8,  8, 8, 8]);
  39.938 ns (1 allocation: 144 bytes)


Or maybe this is a better benchmark:

julia> A = rand(10000); B = rand(10000);

julia> @btime replacewith!(a,b,x -> x < 0.1) setup=(a = copy(A); b = copy(B)) evals=1;
  2.080 μs (0 allocations: 0 bytes)

julia> @btime rep(a,b) setup=(a = copy(A); b = copy(B)) evals=1; # adapted inner conditional
  23.956 μs (2 allocations: 78.20 KiB)


Wow, this is fascinating! I was always told that a loop will be the slowest way of handling anything. This is so interesting to know.

Loops are the lowest abstraction here, i.e. closest to the actual hardware process.
Loops are only slow when you compare interpreted loops (e.g. CPython or MATLAB) with compiled libraries.
Because Julia loops are compiled, there should be no way to get faster with more abstract expressions (comparing sequential counterparts).

This does not imply that abstract expressions should be systematically avoided: they can be more compact, less error-prone, and even allow for parallel implementations on various hardware.
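
To illustrate, the loop and the broadcast below express the same replacement, and both are compiled to tight code in Julia (a sketch reusing the thread's running example):

```julia
isreplaceable(x) = x < 0 || isnan(x)

# Explicit loop: closest to the hardware, nothing hidden.
function replace_loop!(A, B)
    for i in eachindex(A, B)
        if isreplaceable(A[i])
            A[i] = B[i]
        end
    end
    return A
end

# Broadcast: more abstract, same result, also compiled.
replace_bcast!(A, B) = (A .= ifelse.(isreplaceable.(A), B, A); A)

A = [1, 2.1, -5, 6, -3, 8, NaN]
B = [8, 8.2,  8, 8,  8, 8, 8]
@assert replace_loop!(copy(A), B) == replace_bcast!(copy(A), B)
```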


Getting used to loops is also sort of liberating, because you don't need to remember, learn, or search for a function that does what you want every time. Just write what is logically natural and move forward. The impossibility of doing this (and getting fast code) is what always kept me away from other high-level languages; here you don't need to adapt the logic to the language.


And you can enjoy another 7X speedup using @Elrod’s magic (aka LoopVectorization):


using LoopVectorization

function replacewith!(A, B)
    @tturbo for i in eachindex(A) 
        A[i] = (A[i] < 0) | isnan(A[i]) ? B[i] : A[i]
    end
    return A
end

A = rand(10000); B = rand(10000);

@btime replacewith!(a,b) setup=(a = copy(A); b = copy(B));
  956.098 ns (0 allocations: 0 bytes)

Using IfElse.ifelse here may be even faster.


Very cool. Thanks for sharing.

Is this kind of like parallel computing? BTW, my computer has an 8-core Intel Core i9 processor. What is the best way to make sure all 8 cores will be put to good use without crashing my computer?

Yes, that’s multithreading, a lightweight form of parallelization. My results are from a 4-core machine with the number of threads set to 8; for your processor the number of threads can be set to 16, and you may get more than a 10X speedup. The number of threads can be set at the Julia command line with -t auto. You can find this info in the Julia manual:

-t, --threads {N|auto} Enable N threads; auto currently sets N to the number of local CPU threads but this might change in the future
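
For example, you can start Julia with a chosen thread count and verify it from within the session; a plain Base Threads.@threads version of the replacement loop (a sketch; this is ordinary multithreading, separate from LoopVectorization's @tturbo) might look like:

```julia
# Start Julia with, e.g.:  julia -t auto   or   julia -t 16
using Base.Threads
println("Threads available: ", nthreads())

# Multithreaded replacement using only Base's Threads.@threads;
# each thread handles a chunk of the index range.
function replacewith_threads!(A, B)
    @threads for i in eachindex(A)
        if A[i] < 0 || isnan(A[i])
            A[i] = B[i]
        end
    end
    return A
end

A = [1, 2.1, -5, 6, -3, 8, NaN];
B = [8, 8.2,  8, 8,  8, 8, 8];
replacewith_threads!(A, B)
# A becomes [1.0, 2.1, 8.0, 6.0, 8.0, 8.0, 8.0]
```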


LoopVectorization will do that automatically.
