I’m excited to get started with Julia as a long-time Python user, tired of typing np.array. One thing that is really exciting to me is the ability to avoid writing my own ufuncs, especially avoiding the slow but easy to write frompyfunc ones.
One thing that is confusing me right now is the difference between, say, * and .*. Say I let x = randn(10), and I want to compute the outer product: x * x', aka x .* x'. These are of course the same, but from what I can tell they have different performance characteristics when combined with other broadcasted operations – the .* form can be fused with other broadcasted operations while * cannot.
However, to me, it seems like both of these are doing broadcasting, and that there is no room to interpret them differently from a mathematical point of view. So, is there an intuition for what underlying factor makes the difference for loop fusion? Is there just some loop fusion switch that gets turned on only when a . is present?
Is there just some loop fusion switch that gets turned on only when a . is present?
Yes, that’s precisely it. Broadcast’s loop fusion is a syntax thing. While Julia’s compiler could maybe someday get smart enough to look inside the * method to see that it’s “just” a broadcast — and thus can avoid allocating its temporary, it requires a rather complex escape analysis to make sure that this intermediate isn’t stored/used/re-used/changed/etc and a complex function analysis to ensure that it actually is doing something that is broadcast-compatible.
On the other hand, just looking at an expression and combining the .s is a very simple pass.
Thanks all for the interesting responses! I can tell this is a nice community. The “More Dots” article was especially helpful.
To make sure I’m getting it: that article says that dot-calls are sugar for broadcast. But, if I’m understanding right, it also seems to indicate that dots are explicitly required in order for loop fusion to kick in. So, although (as @klaff writes) x .* x' and x * x' both end up mapping onto calls to broadcast, only the former will trigger loop fusion (if there were more vectorized operations involved). Is that the right idea?
Yes, that’s exactly right. For example: v .* v' .+ 2 is effectively the same as:
vt = v'
broadcast((x, xt)->x*xt + 2, v, vt)
Had I written this as v * v' .+ 2, then we’d first compute the result of * (regardless of the implementation of *) and then do the broadcasted addition, effectively:
temp = v * v'
broadcast(x->x+2, temp)
The consecutively-connected dots are how we define what goes into that inner “kernel” (the anonymous function in the cartoon equivalence above). To make this even more clear, you can compare the result of zeros(5) .+ rand() vs. zeros(5) .+ rand.().
You’re welcome! But also, you asked a very astute question. I thought I understood dot fusion but didn’t really understand the purely syntactical nature of it.
This is still making my brain hurt so here’s a question:
I think zeros(5) .+ rand() is equivalent to (+).(zeros(5), rand()) is equivalent to broadcast(+, zeros(5), rand()).
Is this the right way to think about this one? zeros(5) .+ rand.() is equivalent to (+).(zeros(5),rand.()) is equivalent to (x -> x+rand()).(zeros(5)) is equivalent to [Edit - added this line] broadcast(x -> x+rand(), zeros(5))?