Blog post: Loop fusion and vectorization in Julia 0.6

Stephen_Vavasis · January 26, 2017, 5:54pm

Steven, I certainly agree that it is much better to allocate many small temporary objects on the heap than many large ones. The new broadcast operations go a long way toward eliminating heap temporaries, and the lazy-operation/smart-getindex trick apparently complements the new syntax, making it possible in some use cases to replace large temporary allocations with small ones. But it would be even better to eliminate heap allocations for temporaries altogether. Performance is one reason. Another reason to eliminate heap allocations, even small ones, is that a user hunting for the cause of a large memory allocation would have an easier job if many small allocations weren't acting as distractions.

Since the core developers are apparently already putting some effort into finding cases where it is safe to allocate temporaries on the stack instead of the heap, I am suggesting that this is one case (a small temporary whose purpose is to support a smart getindex operation inside a broadcast) that deserves attention.
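The lazy-operation/smart-getindex idea can be sketched as follows, assuming Julia 1.x. `LazyDiff` is a hypothetical wrapper (not part of Base) that behaves like `diff(x)` but computes each element inside `getindex`, so a fused broadcast over it never materializes the length-(n-1) array of differences; only the small wrapper object is created. Whether that small object ends up on the heap or the stack is decided by the compiler, which is exactly the point under discussion here.

```julia
# Hypothetical lazy wrapper (not in Base): acts like diff(x), but each
# element is computed on demand in getindex instead of being stored.
struct LazyDiff{T,V<:AbstractVector} <: AbstractVector{T}
    x::V
end
LazyDiff(x::AbstractVector{T}) where {T} = LazyDiff{T,typeof(x)}(x)

Base.size(d::LazyDiff) = (max(length(d.x) - 1, 0),)
Base.IndexStyle(::Type{<:LazyDiff}) = IndexLinear()
Base.getindex(d::LazyDiff, i::Int) = d.x[i+1] - d.x[i]

# Fused and in place: y[i] = abs(x[i+1] - x[i]), with no temporary array
# of differences (only the small LazyDiff wrapper is constructed).
absdiff!(y, x) = (y .= abs.(LazyDiff(x)); y)

x = [1.0, 4.0, 9.0, 16.0]
y = absdiff!(zeros(3), x)

# For comparison, abs.(diff(x)) first allocates the full array diff(x).
```

The wrapper only needs `size` and `getindex` to take part in broadcasting; everything else comes from the generic `AbstractVector` machinery.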

stevengj · January 26, 2017, 6:51pm

We want to have a generic solution for allocating small temporary objects on the stack (https://github.com/JuliaLang/julia/pull/12205). On the one hand, that means that the eventual solution should fix your particular problem. On the other hand, I don’t think we’ll want to implement any special-case solution just for this particular problem in the short term.

stevengj · January 30, 2017, 6:38am

I’ve just pushed a PR to implement @. (https://github.com/JuliaLang/julia/pull/20321). Undotted calls are supported via $ splicing syntax.
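A minimal illustration of how `@.` and `$` splicing behave, assuming current Julia 1.x, where `@.` is available in Base: the macro dots every call and assignment in the expression, while a call prefixed with `$` is left undotted and evaluated once.

```julia
x = [1.0, 4.0, 9.0]
y = similar(x)

# @. adds a dot to every call and to the assignment, so this is
# y .= sqrt.(x) .+ 2 .* x, a single fused in-place loop.
@. y = sqrt(x) + 2x

# Prefixing a call with $ leaves it undotted: sum runs once on all of x,
# and only the division is broadcast, i.e. z .= x ./ sum(x).
z = similar(x)
@. z = x / $sum(x)
```

Without the `$`, `sum` would be dotted as well and applied to each scalar element separately, which is not what is wanted here.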

jgreener64 · February 2, 2017, 1:29pm

Nice!

The section “broadcast vs. map” is useful. But I’m wondering whether there are any simple rules of thumb for the performance of broadcast vs. map, e.g. if your array or function has a particular complexity, use one or the other to get the fastest code.

stevengj · February 2, 2017, 1:38pm

In the cases where they do the same thing, e.g. abs.(x) vs map(abs, x), the performance should be essentially the same.
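A small sketch of this point, assuming Julia 1.x semantics: when `broadcast` and `map` both apply a function elementwise to a single array they compute the same thing, so no systematic speed difference is expected. `broadcast` differs in that it also expands scalars and singleton dimensions.

```julia
x = [-1.5, 2.0, -3.0]

# Same elementwise operation, same result.
a = abs.(x)          # dot syntax, equivalent to broadcast(abs, x)
b = map(abs, x)

# broadcast additionally expands scalars and singleton dimensions:
c = broadcast(+, [1, 2], 3)   # the scalar 3 is reused for every element
d = [1, 2] .+ [10 20]         # 2-element column with 1×2 row gives a 2×2 matrix
```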

bobportmann · April 11, 2017, 2:57pm

Nice blog post, and I agree this is a killer feature. The “Broadcast vs. Map” section says that “map requires all arguments to have the same length”, but I noticed this in the 0.6-beta REPL:

julia> map(+, (1,2), 3)
1-element Array{Int64,1}:
 4

julia> length((1,2))
2

julia> length(3)
1

which violates this rule. Is this expected or a bug?

Thanks,
Bob

PS In case it matters:


julia> versioninfo()
Julia Version 0.6.0-pre.beta.0
Commit bd84fa1bad (2017-03-31 12:58 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

stevengj · April 11, 2017, 3:29pm

This stems from the fallback method:

map(f, iters...) = collect(Generator(f, iters...))

and the Generator type allows iterators to have unequal lengths. I think this was designed to allow infinite iterators to be combined with finite ones, but the silent truncation seems undesirable to me when all the iterators have known lengths.
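To make the suggestion concrete, here is a hypothetical `strictmap` (not a Base function), assuming Julia 1.x. It falls back to the same `Generator`-based `collect` as the method quoted above, but first checks that all arguments with a known length agree, throwing a `DimensionMismatch` otherwise. Arguments whose `Base.IteratorSize` is neither `HasLength` nor `HasShape`, such as infinite iterators, are exempt from the check, which preserves the use case of combining an infinite iterator with a finite one. This is a sketch of the idea, not the fix adopted upstream.

```julia
# The plain Generator fallback zips its arguments, so it silently stops
# at the shortest one: here a 2-tuple and a scalar (length 1).
truncated = collect(Base.Generator(+, (1, 2), 3))

# Hypothetical length-checking variant of the fallback.
function strictmap(f, iters...)
    # Only compare lengths when every argument reports a known length.
    if all(it -> Base.IteratorSize(it) isa Union{Base.HasLength, Base.HasShape}, iters)
        lens = map(length, iters)
        all(==(first(lens)), lens) ||
            throw(DimensionMismatch("strictmap: argument lengths $lens differ"))
    end
    return collect(Base.Generator(f, iters...))
end

ok = strictmap(+, [1, 2], [10, 20])
mixed = strictmap(+, 1:3, Iterators.countfrom(10))  # infinite iterator: no check
threw = try
    strictmap(+, (1, 2), 3)
    false
catch err
    err isa DimensionMismatch
end
```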

This is closely related to the discussion in JuliaLang/julia issue #20499, “Do not truncate zip inputs”.

benninkrs · May 4, 2017, 2:18am

@stevengj Just came across your blog post. It is very clear and enlightening. Thank you!

purplishrock · May 4, 2017, 5:39am

Just saw this today. Many people on the list probably saw it too…
