Fastest way to check if condition is true

Diger · December 9, 2019, 5:27pm

Hello,
I’m looking for the fastest way to check whether this condition is true:
minimum(broadcast(abs,broadcast(-,P,Ds[i]))) == 0

here P is an Array of integers and Ds[i] an integer of another array inside a loop I want to check P against. I first used size(findall(x->x==Ds[i],P))[1] > 0, but it seemed the first method is faster.

Now I’m wondering if it can be even done better.
Thanks

pdeffebach · December 9, 2019, 5:41pm

Don’t be afraid to use a for loop.


function test_equal(dsi, p)
    for p in P
        p - dsi != 0 && return false  
    end
    return true
end

You can also use any which probably has the same, or slightly more efficient, implementation as above

any(t -> t > Ds[i], P)

Diger · December 9, 2019, 5:47pm

you probably meant
any(t-> t==Ds[i],P)
or?

pdeffebach · December 9, 2019, 7:19pm

yes, depending on the function you want.

Also note that you can put code in mono-spaced font by using ` around it for short snippets. and blocks, use three ` characters.

```
your code here
```

Diger · December 9, 2019, 10:04pm

Thanks. I tried your example with any(), but

minimum(broadcast(abs,broadcast(-,P,Ds[i]))) == 0

seems faster.

pdeffebach · December 9, 2019, 10:38pm

How are you benchmarking? Bake you sure you aren’t benchmarking in global scope, use the @btime macro from the BenchmarkTools package.

Diger · December 9, 2019, 11:01pm

I’m using @time. Any difference to @btime? What do you mean by global scope?

bashonubuntu · December 9, 2019, 11:54pm

A correct way to do benchmarking is to use @btime. It evaluates the function many times and reports the minimum time taken.

The REPL is the Global Scope. See #1 here

Ideally, you want to be wrapping whatever you want to time in a function and then timing that function. For example, to time the function written by @pdeffebach you should do

@btime test_equal($dsi, $p)

The $ sign is a convention to interpolate external variables in the benchmark expression. See below for more details

https://github.com/JuliaCI/BenchmarkTools.jl/blob/master/doc/manual.md#interpolating-values-into-benchmark-expressions

DNF · December 10, 2019, 8:07am

Hi! First of all, you shouldn’t write broadcasting like this. This is what the dot-notation is for: More Dots: Syntactic Loop Fusion in Julia

Instead, you should write it like this:

minimum(abs.(P .- Ds[i])) == 0

This is more readable, and also faster, since your code first creates a temporary array with the Ds[i] subtracted from P, and then a new one with the absolute values. Then finally your code iterates over the entire result vector to find the minimum, and compares it to zero.

So this is clearly inefficient (and the example with the dots is also inefficient, though slightly less so.)

The correct answer is to use

any(==(Ds[i]), p)  # ==(y) is the same as x -> (x==y)

This does not create any temporary arrays, and will stop the moment it finds a match.

Benchmarking: Always use BenchmarkTools for serious benchmarking, and especially for code with short run-times. Here are some timing examples:

julia> p = rand(1:1000, 1000);

julia> dsi = rand(1:1000);

julia> @btime minimum(broadcast(abs, broadcast(-, $p, $dsi))) == 0
  1.467 μs (2 allocations: 15.88 KiB)
false

julia> @btime minimum(abs.($p .- $dsi)) == 0
  904.152 ns (1 allocation: 7.94 KiB)
false

julia> @btime any(==($dsi), $p)
  402.750 ns (0 allocations: 0 bytes)
false

The code with dots fuses the operations, and you get half the number of allocations, and also better performance. The any code allocates zero memory.

As you can see there is no match here, so this is the worst case performance scenario for the any code. Let’s insert a match early in the vector to see how that affects performance:

julia> p[5] = dsi;  # early in the vector

julia> @btime minimum(broadcast(abs, broadcast(-, $p, $dsi))) == 0
  1.499 μs (2 allocations: 15.88 KiB)
true

julia> @btime minimum(abs.($p .- $dsi)) == 0
  898.548 ns (1 allocation: 7.94 KiB)
true

julia> @btime any(==($dsi), $p)
  3.386 ns (0 allocations: 0 bytes)
true

No improvement for the broadcasting versions, but the any version finds the match almost immediately and bails out.

This example showcases some general performance guidelines in Julia: Don’t create unnecessary arrays, try to iterate over the data as few times as possible, and bail out once you have the answer.

(Actually, this is mostly true for all programming languages, though there are some, like Matlab or numpy, where calling into a fast library function offsets the time-waste of creating redundant temporary arrays. In Matlab, for example, I would have implemented this as any(p == dsi), even though it makes an extra temporary vector.)

Diger · December 10, 2019, 1:22pm

Thanks. I will try when Julia is properly working again. I just updated to 1.3 and SharedArrays doesn’t work anymore? Any idea?

when I type
p=SharedArray{Int64}(10)

it gets stuck loading.

Besides some minor questions: Why are you using brackets as in

any(==(Ds[i]), p)

is there a specific reason? Also: In your code you prefix it with $. Why? I can not open the Julia.pdf manual since it somehow doesn’t work with the adobe I’m using.

DNF · December 10, 2019, 1:36pm

Sorry, I haven’t used SharedArrays in a long time.

Are you referring to the parentheses after the equal signs? ==(x) creates a function that checks for equality with x, so you do need the parentheses. Check out the documentation by typing

julia> ?==

It’s the same as writing v -> v==x.

The $ are there to avoid the problems with global variables in benchmarking. Check out the docs of BenchmarkTools here: https://github.com/JuliaCI/BenchmarkTools.jl/blob/master/doc/manual.md#interpolating-values-into-benchmark-expressions

Diger · December 10, 2019, 8:32pm

Are the problems with global variables only there when benchmarking? I mean why is the problem not present when just running the code without benchmarking anything? Shouldn’t this be the relevant case to test?

DNF · December 10, 2019, 8:57pm

It’s the same problem also outside of benchmarking. You should avoid non-constant global variables, for example by wrapping your code in functions.

It can be relevant to measure how long it takes to run some code on globals from the command line. But mostly, code snippets like any(==(Ds[i]), p), will be called numerous times from inside some other piece of code, somewhere in a function, and you want to know how long it will under those circumstances. When you run code interactively from the command line, it doesn’t much matter if it takes 1.4 μs or 4 ns.

Diger · December 10, 2019, 9:00pm

Just to understand it right: So when calling any(==(Ds[i]),P) inside a for loop, the variables Ds[i] and P do not need a prefix $, because they are not globals? But P and Ds are globally defined?! I’m confused.
Why are globals so slow?

DNF · December 10, 2019, 9:16pm

Inside a function, not a loop. And this way of using $ for variable interpolation is specific to the BenchmarkTools package.

The problem is with non-const globals, because the compiler cannot be sure about their types and has to produce code that can handle any type.

Take a look at the performance tips in the manual, it’s highly informative: Performance Tips · The Julia Language

Diger · December 10, 2019, 9:19pm

Thanks for the link.

So when Ds and P are constants why do they need a $prefix in benchmarking, wasn’t it only for non-constant globals?

Lets say the array P contains values from a function p(n) with n=1…N. Is it possible to call something like
any(==(Ds[i]),p) with the function p?

DNF · December 10, 2019, 9:24pm

Yeah, it’s not necessary for const variables. That’s why it’s not necessary to interpolate in the the function names, such as any, ==, findall etc.

You could write

any(n -> p(n)==Ds[i], 1:N)

Diger · December 10, 2019, 9:38pm

hm yeah, but that is actually slower.

How do I check if P or Ds are global or constant?

Should I always prefix any array with “const” ?

DNF · December 10, 2019, 9:41pm

Slower than what? Than using an already pre-computed array of the values p(n)? I would expect so. Depends on how expensive the function evaluations are, but it’s likely to be more expensive than a simple array lookup.

Diger · December 10, 2019, 9:42pm

Yes, but you said I should use functions and no global variables, thats why I tried.

Topic		Replies	Views
Push! versus preallocation New to Julia	17	2783	June 11, 2020
Detect if constant General Usage constant-propagation	49	1719	June 16, 2023
How to speed up array comparisons? Performance question	12	1105	May 20, 2018
How to make a function 120 times faster General Usage performance , scope , parameters	15	826	July 7, 2022
Substituting elements of array is very slow General Usage	23	1365	July 22, 2021

Fastest way to check if condition is true

Related topics