I know there are a number of topics already open on this subject, but I haven’t found this specific pattern yet. I think I’m looking for the `map`-`do` equivalent of a conditional array comprehension.
Here’s a small example and why it isn’t working:
```julia
a = rand(10)
f = exp # or any other function
b = map(a) do x
    c = f(x)
    if c > 2
        return c
    end
end
filter(!isnothing, b)
```
The problem with this approach is that the final type is Vector{Union{Nothing, Float64}} when I need it to simply be Vector{Float64} for subsequent steps.
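Here is a minimal sketch of one way to narrow the element type after filtering. It is not from the thread, and it uses a fixed input instead of `rand(10)` so the result is reproducible. Broadcasting `identity` over the filtered vector lets Julia recompute the element type from the values actually present.

```julia
f = exp
a = [0.1, 0.8, 0.9, 0.2]      # fixed stand-in for rand(10)

b = map(a) do x
    c = f(x)
    if c > 2
        return c
    end                       # falls through to `nothing` otherwise
end

kept = filter(!isnothing, b)  # eltype is still Union{Nothing, Float64}
narrowed = identity.(kept)    # broadcast recomputes the eltype: Vector{Float64}
```

This costs one extra array. If `kept` is empty, broadcasting keeps the `Union` element type, so when the target type is known, `convert(Vector{Float64}, kept)` is the more predictable choice.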
If I use a conditional array comprehension, the type is correct, but I need to repeat the calculation of the intermediate variable c = f(x), which is more complicated in my use case.
```julia
b = [f(x) for x in a if f(x) > 2]
```
Is there a third way that returns the correct type without requiring me to calculate c = f(x) twice?
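One more option is sketched below. It is not from the thread and assumes Julia 1.6 or later for `Iterators.map`. The idea is to keep the `do` block, but have it return a one-element tuple for kept values and an empty tuple otherwise, then flatten. `f(x)` runs once per element, and `collect` widens the element type from the values it actually sees, so the result is a `Vector{Float64}`.

```julia
f = exp
a = [0.1, 0.8, 0.9, 0.2]   # fixed stand-in for rand(10)

b = collect(Iterators.flatten(Iterators.map(a) do x
    c = f(x)               # computed exactly once per element
    c > 2 ? (c,) : ()      # 1-tuple keeps the value, empty tuple drops it
end))
```

Because the output length is not known up front, `collect` grows the vector as it goes, so this is not necessarily faster than a comprehension over a generator. Its main appeal is keeping the `do`-block style.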
```julia
using BenchmarkTools
a = rand(1000000)
f = exp # or any other function
b1 = @btime [$f(x) for x in $a if $f(x) > 2]
b2 = @btime [fx for fx in ($f(x) for x in $a) if fx > 2]
```
yielding:
```
14.796 ms (14 allocations: 5.23 MiB)
14.368 ms (14 allocations: 5.23 MiB)
```
But maybe we need to use a more expensive function to see an effect.
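Another quick way to see the difference between the two comprehensions is to count calls instead of timing them. The counter `ncalls` and the wrapper `g` below are hypothetical instrumentation, not part of the thread, and the inputs are fixed so the counts are reproducible.

```julia
const ncalls = Ref(0)           # hypothetical call counter

g(x) = (ncalls[] += 1; exp(x))  # exp, instrumented to count its calls

a = [0.1, 0.8, 0.9, 0.2]

ncalls[] = 0
b1 = [g(x) for x in a if g(x) > 2]              # g in the filter, then again for each kept x
calls1 = ncalls[]

ncalls[] = 0
b2 = [fx for fx in (g(x) for x in a) if fx > 2] # g once per element
calls2 = ncalls[]
```

With two of the four inputs passing the test, `b1` calls `g` six times (four in the filter plus two recomputations) while `b2` calls it four times. Both results are `Vector{Float64}`.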
Thanks for the help, everyone. I realize that my MWE was a little too minimal. Here’s a better one:
```julia
function foo(x) # expensive function
    for _ in 1:1000
        x = sqrt(x^2)
    end
    return x
end

function bar(x) # more expensive function
    for _ in 1:10000
        x = sqrt(x^2)
    end
    return x
end

# desired functionality
b0 = map(a) do x
    c = foo(x)
    if c > 0.5
        return bar(c)
    end
end
filter(!isnothing, b0)
```
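If a plain loop is acceptable, here is a sketch of the desired functionality that is not taken from the thread. It calls `foo` once per element, calls `bar` only on the survivors, and returns a concretely typed vector. The name `filtermap` is hypothetical, and the output element type defaults to `Float64`, matching the requirement in the question.

```julia
# Hypothetical helper: y = f(x) once per element; keep g(y) only where pred(y) holds.
# The output element type T is fixed up front.
function filtermap(f, pred, g, a, ::Type{T}) where {T}
    out = T[]
    sizehint!(out, length(a))   # at most length(a) values can survive
    for x in a
        y = f(x)
        if pred(y)
            push!(out, g(y))    # push! converts g(y) to T
        end
    end
    return out
end

# Default element type: Float64
filtermap(f, pred, g, a) = filtermap(f, pred, g, a, Float64)
```

With the functions above, the call would be `filtermap(foo, >(0.5), bar, a)`. The `sizehint!` reserves room for every element, which trades some memory for fewer reallocations; drop it if most elements are expected to be rejected.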
@goerch’s and @Seif_Shebl’s suggestions still apply and give the following:
```julia
a = rand(100)
b1 = @btime [$bar($foo(c)) for c in $a if $foo(c) > 0.5]
b2 = @btime [$bar(c) for c in ($foo(x) for x in $a) if c > 0.5]
b3 = @btime $bar.(filter!(>(0.5), $foo.($a)))
```
with the following timings:
```
3.304 ms (5 allocations: 1.98 KiB)
3.097 ms (5 allocations: 1.98 KiB)
3.031 ms (2 allocations: 1.41 KiB)
```
Unfortunately, I can’t assume my function is invertible, @rocco_sprmnt21.
Is there a more efficient way to use filter! in this context?
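One possible refinement, sketched here and not taken from the thread, is to reuse the array that `foo.(a)` already allocates to hold `bar`'s results. The pipeline then allocates one array instead of two and leaves `a` untouched. The helper name `filter_then_map` is hypothetical, and the sketch assumes `bar` returns values that fit the element type of `foo`'s output (here `Float64`).

```julia
# Hypothetical helper: one buffer holds f's output, is filtered in place,
# then overwritten with g's output.
function filter_then_map(f, pred, g, a)
    buf = f.(a)          # the only array allocation: f applied to every element
    filter!(pred, buf)   # drop rejected entries in place; buf shrinks
    buf .= g.(buf)       # fused in-place broadcast; each slot depends only on itself
    return buf
end
```

With the functions above, the call would be `filter_then_map(foo, >(0.5), bar, a)`. Since calling `foo` and `bar` dominates the run time here, the saving is likely to be small.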
If you can’t filter out data before applying foo, this is about as efficient as it gets, since the time spent calling those functions vastly dominates the time it takes to make a few allocations along the way:
```julia
# since we want only the time necessary for execution but not for allocating, we need something to prevent calls from being compiled away entirely
julia> @noinline noop(x) = x
noop (generic function with 1 method)

# time it takes for the foo and bar calls only:
julia> @btime for x in $a
           noop(foo(x))
       end
  511.200 μs (0 allocations: 0 bytes)

julia> barinputs = filter!(>(0.5), foo.(a));

julia> @btime for x in $barinputs
           noop(bar(x))
       end
  2.552 ms (0 allocations: 0 bytes)
```
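Adding the two call-only timings gives a rough estimate of the compute time that any version calling `foo` on every element and `bar` on the survivors cannot avoid. It is rough because the two numbers come from separate benchmark runs:

$$
t_{\text{foo}} + t_{\text{bar}} \approx 0.511\ \text{ms} + 2.552\ \text{ms} = 3.063\ \text{ms}
$$

That sum is already slightly above the 3.031 ms measured for `b3`, so the extra allocation time is within run-to-run noise.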
In case you worry about the allocations: if you are fine with overwriting your input a, you can get rid of them, but unless the intermediate values take up a lot of memory, the net gain is negligible:
```julia
julia> function foowithoutallocations!(a)
           @inbounds for i in eachindex(a)
               a[i] = foo(a[i])
           end
           a
       end
foowithoutallocations! (generic function with 1 method)

julia> function barwithoutallocations!(a)
           @inbounds for i in eachindex(a)
               a[i] = bar(a[i])
           end
           a
       end
barwithoutallocations! (generic function with 1 method)

julia> b4 = @btime barwithoutallocations!(filter!(>(0.5), foowithoutallocations!(acopy))) setup = (acopy = copy(a)) evals = 1;
  3.025 ms (0 allocations: 0 bytes)
```
For readability, I’d also throw in this one:
```julia
julia> b5 = @btime [bar(c) for c in foo.($a) if c > 0.5];
  3.083 ms (6 allocations: 2.86 KiB)
```