Julia's Broadcast vs Jax's vmap

darsnack · May 7, 2020, 11:14pm

I ran the benchmarks on both the CPU and the GPU. Setup:

using BenchmarkTools, CuArrays
using LinearAlgebra: dot

D = 10^3
BS = 10^2

x = randn(D)
X = randn(D, BS)
y = randn(D)
cX = cu(X)
cy = cu(y)
Xt = permutedims(X)
cXt = cu(Xt)

dot(x, y)
dot(cu(x), cy)

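# One dot product per column of X (each column is one sample in the batch)
broadcast_dot(X, y) = [dot(x, y) for x in eachslice(X; dims = 2)]
# The same batch of dot products as a single matrix-vector product
matmul_dot(Xt, y) = Xt * y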
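
As a sanity check that the two definitions compute the same thing, here is a minimal CPU-only sketch. The sizes and the names `Xs`, `ys`, `a`, `b` are made up for illustration and are not part of the benchmark:

```julia
using LinearAlgebra: dot

# Same definitions as in the post
broadcast_dot(X, y) = [dot(x, y) for x in eachslice(X; dims = 2)]
matmul_dot(Xt, y) = Xt * y

# Small illustrative inputs (assumed sizes, not the benchmark sizes)
Xs = randn(5, 3)   # 5 features, batch of 3 columns
ys = randn(5)

a = broadcast_dot(Xs, ys)            # one dot product per column
b = matmul_dot(permutedims(Xs), ys)  # rows of the transpose are the columns of Xs

println(a ≈ b)
```

Both return a length-3 vector here, so the timings compare two ways of computing the same batch of dot products.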

Now running on the CPU:

@btime broadcast_dot($X, $y)
16.652 μs (108 allocations: 6.56 KiB)

@btime matmul_dot($Xt, $y)
13.867 μs (1 allocation: 896 bytes)

And on the GPU:

@btime CuArrays.@sync broadcast_dot($cX, $cy)
321.091 ms (208 allocations: 8.45 KiB)

@btime CuArrays.@sync matmul_dot($cXt, $cy)
238.609 μs (8 allocations: 208 bytes)

Note that the following definition, which uses dot-broadcasting directly, did not work:

broadcast_dot(X, y) = dot.(eachslice(X; dims = 2), Ref(y))
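
For comparison, one way to keep a broadcast formulation without slicing at all is to multiply elementwise and sum down each column. This is only a sketch: `fused_dot` is a hypothetical name, it was not benchmarked in this thread, and it is checked below on CPU arrays only. Because it uses just broadcasting and a reduction, I would expect it to run on GPU arrays as well, but that is an assumption.

```julia
using LinearAlgebra: dot

# Hypothetical helper (not from the original post): X .* y scales row i of X
# by y[i]; summing over dims = 1 then yields dot(X[:, j], y) for each column j.
fused_dot(X, y) = vec(sum(X .* y; dims = 1))

# Small illustrative inputs (assumed sizes)
X = randn(10, 4)
y = randn(10)

reference = [dot(x, y) for x in eachslice(X; dims = 2)]
result = fused_dot(X, y)

println(result ≈ reference)
```

The trade-off is that `X .* y` materializes a temporary array the same size as `X`, which the matrix-vector product avoids.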
