At present, broadcasting in Julia is always lazy, regardless of the inputs’ axes.
I think that, at least for numeric calculations, if all the inputs and the output are scalars, we could perform the calculation immediately, since storing the intermediate value costs almost nothing. In other words, we could merge the scalars while the broadcasted chain is being built.
Currently, a predefined function merges constant scalars well, for example:
julia> a = randn(1000); b = randn(1000); c = similar(a,(1000,1000));
julia> f(x,y) = sin(exp(2pi)*x*y);
julia> @btime @. $c = f($a,$b');
16.042 ms (0 allocations: 0 bytes)
whereas a nested Broadcasted object does not:
julia> @btime @. $c = sin(exp(2pi)*$a*$b');
38.451 ms (0 allocations: 0 bytes)
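To make the difference concrete, here is a minimal sketch (not part of the benchmarks above) that builds the nested chain by hand. It assumes only `broadcasted`, `materialize`, and the `Broadcasted` type from `Base.Broadcast`; the variable names are illustrative. It shows that `exp(2pi)` stays inside the chain as a lazy zero-dimensional node rather than being turned into a number when the chain is built.

```julia
using Base.Broadcast: Broadcasted, broadcasted, materialize

# The scalar sub-expression is wrapped lazily instead of being evaluated.
inner = broadcasted(exp, 2pi)
@show inner isa Broadcasted   # a lazy node, not a Float64
@show axes(inner)             # zero-dimensional: no axes at all

# Build the same chain that `@. sin(exp(2pi)*a*b')` produces.
a = randn(4); b = randn(4)
outer = broadcasted(sin, broadcasted(*, inner, a, b'))

# The lazy scalar node is still nested inside the outer chain.
@show outer.args[1].args[1] isa Broadcasted

# The result is nevertheless the same as with a precomputed scalar.
c = materialize(outer)
@show c ≈ sin.(exp(2pi) .* a .* b')
```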
Of course, we can avoid this by merging the scalars ourselves:
julia> temp = exp(2pi); @btime @. $c = sin($temp*$a*$b');
16.034 ms (0 allocations: 0 bytes)
But I think this could be done while the broadcasted chain is being built, just by adding a method like:
const AbstractScalar = Union{Number,AbstractArray{<:Number,0}}
broadcasted(::S, f, args::Vararg{AbstractScalar}) where {S<:BroadcastStyle} =
    combine_eltypes(f, args) <: Number ? f(map(first, args)...) : Broadcasted{S}(f, args)
With this method defined, the nested form runs about as fast as the hand-merged version:
julia> @btime @. $c = sin(exp(2pi)*$a*$b');
16.306 ms (0 allocations: 0 bytes)
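The same idea can be tried without adding a method to Base. The following is only a sketch under assumptions: `fold_scalars` is a hypothetical helper name, and `combine_eltypes` is an internal, unexported function of `Base.Broadcast` whose behavior may change between Julia versions. The helper evaluates eagerly when every argument is a scalar and the inferred result type is a `Number`, and otherwise falls back to the usual lazy `broadcasted`.

```julia
using Base.Broadcast: Broadcasted, broadcasted, combine_eltypes, materialize

const AbstractScalar = Union{Number,AbstractArray{<:Number,0}}

# Hypothetical helper: all arguments are scalars, so evaluate immediately
# if the inferred result is a Number; otherwise stay lazy.
fold_scalars(f, x::AbstractScalar, rest::AbstractScalar...) =
    combine_eltypes(f, (x, rest...)) <: Number ?
        f(map(first, (x, rest...))...) :
        broadcasted(f, x, rest...)

# Fallback: at least one argument is not a scalar, so keep the lazy node.
fold_scalars(f, args...) = broadcasted(f, args...)

a = randn(4); b = randn(4)

k = fold_scalars(exp, 2pi)               # evaluated now, a plain Float64
s = fold_scalars(+, fill(1.0), 2)        # zero-dimensional arrays are unwrapped
lazy = fold_scalars(+, [1, 2], 1)        # non-scalar input stays lazy

# The folded scalar can be used in an ordinary lazy chain.
c = materialize(broadcasted(sin, broadcasted(*, k, a, b')))
```

Keeping the helper separate from `Base.Broadcast.broadcasted` avoids type piracy while experimenting; the method shown above would have to live in Base itself to affect `@.` expressions.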
Since a * b * c is parsed as the single call *(a,b,c), the broadcasted call for such n-ary operators should be expanded into pairwise calls right away, i.e.
for op in (:+, :*, :&, :|, :xor, :min, :max, :kron)
    @eval begin
        @inline broadcasted(::typeof($op), x, y) = begin
            x′ = broadcastable(x)
            y′ = broadcastable(y)
            broadcasted(combine_styles(x′, y′), $op, x′, y′)
        end
        broadcasted(::typeof($op), x, y, args...) = begin
            temp = broadcasted($op, x, y)
            broadcasted($op, temp, args...)
        end
    end
end
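As an illustration of the expansion, the sketch below first checks that `a * b * c` really parses as one three-argument call, then left-folds an n-ary operator into nested binary `Broadcasted` nodes. `pairwise` is a hypothetical helper name used here so that nothing in Base is overloaded; it only mirrors the recursion of the loop above.

```julia
using Base.Broadcast: Broadcasted, broadcasted, materialize

# `a * b * c` is a single call with three arguments, not two nested calls.
ex = Meta.parse("a * b * c")
@show ex.head
@show ex.args

# Hypothetical helper: expand an n-ary call into nested binary calls.
pairwise(op, x, y) = broadcasted(op, x, y)
pairwise(op, x, y, rest...) = pairwise(op, pairwise(op, x, y), rest...)

# The leading pair (2.0, 3.0) now forms its own scalar-only node,
# which is the kind of node a scalar-merging method could evaluate eagerly.
nested = pairwise(*, 2.0, 3.0, [1.0, 2.0])
@show nested.args[1] isa Broadcasted
@show materialize(nested)
```

Without the pairwise expansion, the two leading scalars sit in the same three-argument node as the array, so a scalars-only method never sees them on their own.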