Blog post: Loop fusion and vectorization in Julia 0.6

Steven, I certainly agree that it is much better to allocate many small temporary objects on the heap than many large ones. The new broadcast operations go a long way toward eliminating heap temporaries. The lazy-operation/smart-getindex trick can apparently boost the new syntax further, so that in some use cases large allocations for temporaries can be replaced with small ones. But it would be even better to eliminate heap allocations for temporaries altogether. Performance is one reason. Another reason to eliminate heap allocations, even small ones, is that a user trying to find the cause of a large memory allocation would have an easier job if there weren’t many small allocations acting as distractions.

Since the core developers are apparently already putting some effort into finding cases where it is safe to allocate temporaries on the stack instead of the heap, I am suggesting that this case (a small temporary whose purpose is to support a smart getindex operation inside a broadcast) deserves attention.

We want to have a generic solution for allocating small temporary objects on the stack (https://github.com/JuliaLang/julia/pull/12205). On the one hand, that means that the eventual solution should fix your particular problem. On the other hand, I don’t think we’ll want to implement any special-case solution just for this particular problem in the short term.

I’ve just pushed a PR to implement @. (https://github.com/JuliaLang/julia/pull/20321). Undotted calls are supported via $ splicing syntax.
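
For anyone who wants to try it, here is a small sketch of how @. and $ behave in released versions of Julia (the array x and the variable names are made up for illustration, and the syntax in the PR as first pushed may have differed in details):

```julia
x = [3.0, -1.0, 2.0]

# @. puts a dot on every call and operator in the expression
a = @. sqrt(abs(x)) + 1
b = sqrt.(abs.(x)) .+ 1      # the hand-dotted equivalent

# $ marks a call that should stay undotted: sort is applied to the whole array
c = @. sqrt(abs($sort(x)))
d = sqrt.(abs.(sort(x)))     # the hand-written equivalent
```

Here a and b should compare equal, and so should c and d, since the macro only rewrites the expression before it is evaluated.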

Nice!

The section “broadcast vs. map” is useful. But are there any easy rules for the performance of broadcast vs. map? For example, if your array or function is of a particular complexity, should you use one or the other to get the fastest-running code?

In the cases where they do the same thing, e.g. abs.(x) vs map(abs, x), the performance should be essentially the same.
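
A quick way to check the equivalence yourself is sketched below (the input array and the names are only for illustration, and no timings are shown because they depend on the machine and Julia version):

```julia
x = [-1.5, 2.0, -3.0]

# The same elementwise operation written both ways
y_broadcast = abs.(x)
y_map = map(abs, x)

# A fused dotted expression and the corresponding single map call
z_broadcast = abs.(x) .+ 1
z_map = map(v -> abs(v) + 1, x)
```

Wrapping each form in a function and timing it with @time (after a warm-up call to exclude compilation) is one way to compare them on your own data.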

Nice blog post, and I agree this is a killer feature. The “Broadcast vs. Map” section says, “map requires all arguments to have the same length”, but I noticed the following in the 0.6-beta REPL:

julia> map(+, (1,2), 3)
1-element Array{Int64,1}:
 4

julia> length((1,2))
2

julia> length(3)
1

which violates this rule. Is this expected or a bug?

Thanks,
Bob

PS In case it matters:


julia> versioninfo()
Julia Version 0.6.0-pre.beta.0
Commit bd84fa1bad (2017-03-31 12:58 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

This stems from the fallback method:

map(f, iters...) = collect(Generator(f, iters...))

and the Generator type allows iterators to have unequal lengths. I think this was designed to allow infinite iterators to be combined with finite ones, but the silent truncation seems undesirable to me when all the iterators have lengths.
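
To make the truncation concrete, here is a sketch that uses Base.Generator and zip directly (written against Julia 1.x as an assumption; the variable names are illustrative, and the behavior of map itself on mismatched arrays may differ between versions, so it is not shown):

```julia
# A Generator over several iterators zips them, and zip stops at the shortest
g = Base.Generator(+, [1, 2, 3], [10, 20])
truncated = collect(g)        # the third element of the first array is silently dropped

# The case that truncation makes possible: a finite range paired with an infinite counter
paired = collect(zip(1:3, Iterators.countfrom(10)))

# broadcast, by contrast, rejects arrays of mismatched length
broadcast_threw = try
    [1, 2, 3] .+ [10, 20]
    false
catch err
    err isa DimensionMismatch
end
```

The first collect yields two elements rather than an error, which is the silent truncation in question, while the broadcast version reports the mismatch.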

Closely related to the discussion in JuliaLang/julia issue #20499, “Do not truncate zip inputs”.

@stevengj Just came across your blog post. It is very clear and enlightening. Thank you!

Just saw this today. Many people on the list probably saw it too…
