Best way to vectorize a function

Hello,

I have a function like this in a package I’m currently developing:

function jtheta3(z::Number, tau::Number)
  if imag(tau) <= 0
    ArgumentError("Invalid `tau`.")
  end
  return _jtheta3(z, tau)
end

Imagine the user wants to use it for numerous values of z but always the same value of tau. Then the check of tau will be repeated and this will consume some unnecessary time. Therefore I want to vectorize this function in z. What is the best way? Doing another function with the same name but taking a vector z as argument, or using a single function with the type of z being the union of the two possible types?

I think I would go with the function that accepts both types, trying to use broadcasting inside. Maybe the function that does not check tau and can be broadcasted can be useful, if tau is checked somewhere else before. Something like:

julia> function _f(z,tau)
           z + tau
       end
_f (generic function with 1 method)

julia> function f(z, tau)
           @assert tau > 0
           _f.(z,tau)
       end
f (generic function with 1 method)

julia> f(1,3)
4

julia> f([1,2],3)
2-element Vector{Int64}:
 4
 5

julia> _f.(1,2)
3

julia> _f.([1,2],2)
2-element Vector{Int64}:
 3
 4


Looks nice. Thank you.

Is there a standard terminology for a function like the function f you defined? With R, we would say this function is vectorized. Does one use this terminology in the Julia world? Or broadcasted function? (I never saw the word “broadcast” before, I don’t know its French translation yet).

The broadcasting is specifically the . between the function and it’s args, which turns a function that takes single elements into one that takes collections. It’s very similar to what other langs call vectorization.
Unlike other languages, it’s not needed for performance, explicit for loops can be as fast or faster than broadcasted calls. (broadcasting becomes a loop anyway)

2 Likes

Some more discussion. The broadcast (.) in Julia is a very powerful syntax. For example, given this simple function that takes three arguments:

julia> f(x,y,z) = x + y + sin(z)
f (generic function with 1 method)

julia> f(1.0,2.0,3.0)
3.1411200080598674

You can apply it to three vectors just by using the .:

julia> f.([1.0,2.0],[3.0,4.0],[5.0,6.0])
2-element Vector{Float64}:
 3.0410757253368614
 5.720584501801074

Which roughly equivalent to the hand written loop:

julia> function g(x,y,z)
           w = similar(x)
           for i in eachindex(x)
               w[i] = x[i] + y[i] + sin(z[i])
           end
           return w
       end
g (generic function with 1 method)

julia> g([1.0,2.0],[3.0,4.0],[5.0,6.0])
2-element Vector{Float64}:
 3.0410757253368614
 5.720584501801074

Note

The broadcasting is actually more general because it promotes types for you - given vectors of integers it recognizes that the output is a float - and it only works if all the arrays have the same dimensions, which guarantees that all memory access are inbounds, so that the broadcast may be faster than the loop that does not have that guarantee.

So you can “broadcast” any function you write yourself, and you get the “vectorized” version of the operation.

A “vectorized” function in other languages, R for example, is something like the loop above, except that the loop is written not in R, but in a lower-level language. In Julia the loop is fine, and the broadcasted or vectorized operations are mostly written in Julia under the hood.

Finally, note that a function like the one of this example does not even need to be written explicitly, because you can just use broadcasting from the beginning:

julia> x = [1.0,2.0]; y = [3.0,4.0]; z = [5.0,6.0];

julia> x .+ y .+ sin.(z)
2-element Vector{Float64}:
 3.0410757253368614
 5.720584501801074

julia> @. x + y + sin(z)
2-element Vector{Float64}:
 3.0410757253368614
 5.720584501801074

Just add the . to every operation and you get the same result of the “vectorized” f, and that will be lowered to a fused loop as f is. The @. macro helps one to broadcast everything (for long expressions missing a . is easy in the first case).

The complication of your case is that you do not want to broadcast every operation of your function (you don’t want to check tau for every element). Yet, a small test does not show any performance difference between the two cases:

julia> function f1(z, tau)
           @assert tau > 0 ArgumentError("tau < 0")
           z + tau
       end
f1 (generic function with 1 method)

julia> function f2(z, tau)
           z + tau
       end
f2 (generic function with 1 method)

julia> z = rand(1000);

julia> @btime f1.($z, 1.0);
  437.112 ns (1 allocation: 7.94 KiB)

julia> @btime f2.($z, 1.0);
  438.706 ns (1 allocation: 7.94 KiB)

It is possible that the compiler is smart enough and only checks tau once.

2 Likes

There is a very informative post on dot syntax by its originator stevengj.

4 Likes