What's the "right" way to broadcast vector-valued functions?

jlchan · October 23, 2019, 4:09pm

I’m trying to figure out the “right” way to combine broadcasting with functions for vector-valued inputs/outputs. For example, the following function defines foo through broadcasted operations.

using LinearAlgebra

function foo(a,b)
    return a.*b, a.+b
end

c,d = foo(collect(1:3),collect(0:2))

Is there a way to perform the same operation using the non-broadcast version of foo? Using foo.(a,b) with the scalar version of foo returns an array of tuples, instead of the tuple of arrays in the broadcast version.
Is it OK to just define all functions using broadcast operations (so that they work for both scalar and vectorized inputs), or will there be some performance hit?

Thanks!

Elrod · October 24, 2019, 4:35am

julia> function foo(a, b)
           a .* b, a .+ b
       end
foo (generic function with 1 method)

julia> bar(a, b) = a * b, a + b
bar (generic function with 1 method)

julia> foo(2.3, 4.5)
(10.35, 6.8)

julia> bar(2.3, 4.5)
(10.35, 6.8)

julia> @code_llvm optimize=true debuginfo=:none foo(2.3, 4.5)

define void @julia_foo_17899([2 x double]* noalias nocapture sret, double, double) {
top:
  %3 = fmul double %1, %2
  %4 = fadd double %1, %2
  %.sroa.0.0..sroa_idx = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 0
  store double %3, double* %.sroa.0.0..sroa_idx, align 8
  %.sroa.2.0..sroa_idx1 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 1
  store double %4, double* %.sroa.2.0..sroa_idx1, align 8
  ret void
}

julia> @code_llvm optimize=true debuginfo=:none bar(2.3, 4.5)

define void @julia_bar_17908([2 x double]* noalias nocapture sret, double, double) {
top:
  %3 = fmul double %1, %2
  %4 = fadd double %1, %2
  %.sroa.0.0..sroa_idx = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 0
  store double %3, double* %.sroa.0.0..sroa_idx, align 8
  %.sroa.2.0..sroa_idx1 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 1
  store double %4, double* %.sroa.2.0..sroa_idx1, align 8
  ret void
}

They’ll have the same runtime performance.

ranocha · October 24, 2019, 4:47am

If you want to use only the scalar version of foo, you have to destruct the Array{Tuple{Int64,Int64},1} returned by foo_sc.(a, b), e.g. like this:

julia> function foo_sc(a, b)
           a*b, a+b
       end
foo_sc (generic function with 1 method)

julia> function foo_bc(a, b)
           a.*b, a.+b
       end
foo_bc (generic function with 1 method)

julia> a, b = collect(1:3), collect(0:2)
([1, 2, 3], [0, 1, 2])

julia> function foo_destruct(a, b)
           tmp = foo_sc.(a, b)
           last.(tmp), first.(tmp)
       end
foo_destruct (generic function with 1 method)

julia> foo_bc(a, b)
([0, 2, 6], [1, 3, 5])

julia> foo_destruct(a, b)
([1, 3, 5], [0, 2, 6])

Since the data is traversed multiple times, you pay some performance penalty:

julia> using BenchmarkTools

julia> @benchmark foo_bc($a, $b)
BenchmarkTools.Trial: 
  memory estimate:  256 bytes
  allocs estimate:  3
  --------------
  minimum time:     57.775 ns (0.00% GC)
  median time:      61.215 ns (0.00% GC)
  mean time:        74.469 ns (13.24% GC)
  maximum time:     38.152 μs (99.73% GC)
  --------------
  samples:          10000
  evals/sample:     983

julia> @benchmark foo_destruct($a, $b)
BenchmarkTools.Trial: 
  memory estimate:  384 bytes
  allocs estimate:  4
  --------------
  minimum time:     79.169 ns (0.00% GC)
  median time:      81.857 ns (0.00% GC)
  mean time:        97.842 ns (12.35% GC)
  maximum time:     38.397 μs (99.64% GC)
  --------------
  samples:          10000
  evals/sample:     969

As shown be @Elrod, you won’t have a performance penalty if you use the broadcasted function for simple scalar arguments.
If I remember correctly, there has been a performance penalty for using broadcasting with StaticArrays in OrdinaryDiffEq. That’s why the non-inplace versions don’t use broadcasting while the inplace versions do use it, e.g.
https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/fac6a61a827b9f25ffe3a93299904bfb48861c4a/src/perform_step/low_order_rk_perform_step.jl#L581-L582
vs.
https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/fac6a61a827b9f25ffe3a93299904bfb48861c4a/src/perform_step/low_order_rk_perform_step.jl#L631-L633

jlchan · October 24, 2019, 1:34pm

Thanks @Elrod and Hendrik!

jlchan · November 6, 2019, 4:15pm

Found a related question from 2018: Destructuring and broadcast. The motivation was similar (avoid storage and allocation of intermediate terms).

Adding broadcasting to tuple functions to Base is still under development. In the meantime, I’ve found that Destruct.jl is fast and achieves what I’m looking for. The “unzip” function from this StackExchange post

unzip(a) = map(x->getfield.(a, x), fieldnames(eltype(a)))
x = [1 2 3]
y = [2 3 4]
bar = (x,y)->(y,x)
f,g = unzip(bar.(x,y)) # usage

also seems pretty efficient.

Topic		Replies	Views
Best way to vectorize a function General Usage package	6	2576	July 6, 2022
How to broadcast over only certain function arguments? General Usage question , broadcast	15	12587	March 1, 2022
Broadcasting call to a vector-valued function Performance broadcast	4	319	November 11, 2023
Broadcasting a scalar New to Julia broadcast	4	1076	May 8, 2017
Destructuring and broadcast General Usage question	12	4246	May 2, 2021

What's the "right" way to broadcast vector-valued functions?

Related topics