Fastest way to check if condition is true

Hello,
I’m looking for the fastest way to check whether this condition is true:
minimum(broadcast(abs,broadcast(-,P,Ds[i]))) == 0

here P is an Array of integers and Ds[i] an integer of another array inside a loop I want to check P against. I first used size(findall(x->x==Ds[i],P))[1] > 0, but it seemed the first method is faster.

Now I’m wondering if it can be even done better.
Thanks

1 Like

Don’t be afraid to use a for loop.


function test_equal(dsi, p)
    for p in P
        p - dsi != 0 && return false  
    end
    return true
end

You can also use any which probably has the same, or slightly more efficient, implementation as above

any(t -> t > Ds[i], P)
2 Likes

you probably meant
any(t-> t==Ds[i],P)
or?

yes, depending on the function you want.

Also note that you can put code in mono-spaced font by using ` around it for short snippets. and blocks, use three ` characters.

```
your code here
```

Thanks. I tried your example with any(), but

minimum(broadcast(abs,broadcast(-,P,Ds[i]))) == 0

seems faster.

How are you benchmarking? Bake you sure you aren’t benchmarking in global scope, use the @btime macro from the BenchmarkTools package.

I’m using @time. Any difference to @btime? What do you mean by global scope?

A correct way to do benchmarking is to use @btime. It evaluates the function many times and reports the minimum time taken.

The REPL is the Global Scope. See #1 here

Ideally, you want to be wrapping whatever you want to time in a function and then timing that function. For example, to time the function written by @pdeffebach you should do

@btime test_equal($dsi, $p)

The $ sign is a convention to interpolate external variables in the benchmark expression. See below for more details

1 Like

Hi! First of all, you shouldn’t write broadcasting like this. This is what the dot-notation is for: More Dots: Syntactic Loop Fusion in Julia

Instead, you should write it like this:

minimum(abs.(P .- Ds[i])) == 0

This is more readable, and also faster, since your code first creates a temporary array with the Ds[i] subtracted from P, and then a new one with the absolute values. Then finally your code iterates over the entire result vector to find the minimum, and compares it to zero.

So this is clearly inefficient (and the example with the dots is also inefficient, though slightly less so.)

The correct answer is to use

any(==(Ds[i]), p)  # ==(y) is the same as x -> (x==y)

This does not create any temporary arrays, and will stop the moment it finds a match.

Benchmarking: Always use BenchmarkTools for serious benchmarking, and especially for code with short run-times. Here are some timing examples:

julia> p = rand(1:1000, 1000);

julia> dsi = rand(1:1000);

julia> @btime minimum(broadcast(abs, broadcast(-, $p, $dsi))) == 0
  1.467 μs (2 allocations: 15.88 KiB)
false

julia> @btime minimum(abs.($p .- $dsi)) == 0
  904.152 ns (1 allocation: 7.94 KiB)
false

julia> @btime any(==($dsi), $p)
  402.750 ns (0 allocations: 0 bytes)
false

The code with dots fuses the operations, and you get half the number of allocations, and also better performance. The any code allocates zero memory.

As you can see there is no match here, so this is the worst case performance scenario for the any code. Let’s insert a match early in the vector to see how that affects performance:

julia> p[5] = dsi;  # early in the vector

julia> @btime minimum(broadcast(abs, broadcast(-, $p, $dsi))) == 0
  1.499 μs (2 allocations: 15.88 KiB)
true

julia> @btime minimum(abs.($p .- $dsi)) == 0
  898.548 ns (1 allocation: 7.94 KiB)
true

julia> @btime any(==($dsi), $p)
  3.386 ns (0 allocations: 0 bytes)
true

No improvement for the broadcasting versions, but the any version finds the match almost immediately and bails out.

This example showcases some general performance guidelines in Julia: Don’t create unnecessary arrays, try to iterate over the data as few times as possible, and bail out once you have the answer.

(Actually, this is mostly true for all programming languages, though there are some, like Matlab or numpy, where calling into a fast library function offsets the time-waste of creating redundant temporary arrays. In Matlab, for example, I would have implemented this as any(p == dsi), even though it makes an extra temporary vector.)

7 Likes

Thanks. I will try when Julia is properly working again. I just updated to 1.3 and SharedArrays doesn’t work anymore? Any idea?

when I type
p=SharedArray{Int64}(10)

it gets stuck loading.

Besides some minor questions: Why are you using brackets as in

any(==(Ds[i]), p)

is there a specific reason? Also: In your code you prefix it with $. Why? I can not open the Julia.pdf manual since it somehow doesn’t work with the adobe I’m using.

Sorry, I haven’t used SharedArrays in a long time.

Are you referring to the parentheses after the equal signs? ==(x) creates a function that checks for equality with x, so you do need the parentheses. Check out the documentation by typing

julia> ?==

It’s the same as writing v -> v==x.

The $ are there to avoid the problems with global variables in benchmarking. Check out the docs of BenchmarkTools here: https://github.com/JuliaCI/BenchmarkTools.jl/blob/master/doc/manual.md#interpolating-values-into-benchmark-expressions

1 Like

Are the problems with global variables only there when benchmarking? I mean why is the problem not present when just running the code without benchmarking anything? Shouldn’t this be the relevant case to test?

It’s the same problem also outside of benchmarking. You should avoid non-constant global variables, for example by wrapping your code in functions.

It can be relevant to measure how long it takes to run some code on globals from the command line. But mostly, code snippets like any(==(Ds[i]), p), will be called numerous times from inside some other piece of code, somewhere in a function, and you want to know how long it will under those circumstances. When you run code interactively from the command line, it doesn’t much matter if it takes 1.4 μs or 4 ns.

1 Like

Just to understand it right: So when calling any(==(Ds[i]),P) inside a for loop, the variables Ds[i] and P do not need a prefix $, because they are not globals? But P and Ds are globally defined?! I’m confused.
Why are globals so slow?

Inside a function, not a loop. And this way of using $ for variable interpolation is specific to the BenchmarkTools package.

The problem is with non-const globals, because the compiler cannot be sure about their types and has to produce code that can handle any type.

Take a look at the performance tips in the manual, it’s highly informative: https://docs.julialang.org/en/v1/manual/performance-tips/index.html

1 Like

Thanks for the link.

So when Ds and P are constants why do they need a $prefix in benchmarking, wasn’t it only for non-constant globals?

Lets say the array P contains values from a function p(n) with n=1…N. Is it possible to call something like
any(==(Ds[i]),p) with the function p?

Yeah, it’s not necessary for const variables. That’s why it’s not necessary to interpolate in the the function names, such as any, ==, findall etc.

You could write

any(n -> p(n)==Ds[i], 1:N)

hm yeah, but that is actually slower.

How do I check if P or Ds are global or constant?

Should I always prefix any array with “const” ?

Slower than what? Than using an already pre-computed array of the values p(n)? I would expect so. Depends on how expensive the function evaluations are, but it’s likely to be more expensive than a simple array lookup.

Yes, but you said I should use functions and no global variables, thats why I tried.