# What is the future of sin(::Vector) when there is sin.(::Vector)?

Since, as I understand it, the new broadcast sugar is the intended way of computing a function element-wise, I wonder what the plan is for the special vectorized methods, such as sin(::Vector) for example. Will they continue to work as-is, or are there thoughts about repurposing the signature?

I am asking because I am contemplating how to go about this with LossFunctions.jl, and I do try to keep things “julian”. Personally, at least for my use case, I am considering making the old vectorized signature return a generator expression (along the lines of (sin(x[i]) for i in eachindex(x)), to stay with the sine example), which would allow me to write mean(sin(x)), which for loss functions would be quite nice. So maybe my second question, if you grant me this little digression: would that be considered un-julian?

All vectorized functions like sin(::Vector) are going to be deprecated, except for very specific cases. You can write mean(sin(x) for x in X) to avoid allocating a new vector.
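For example, the two spellings agree numerically but differ in allocation (a small sketch; `X` is hypothetical sample data):

```julia
using Statistics  # `mean` lives in the Statistics stdlib since Julia 1.0

X = [0.0, pi/2, pi]                # hypothetical sample data

m_gen   = mean(sin(x) for x in X)  # generator: no temporary vector allocated
m_bcast = mean(sin.(X))            # broadcast: allocates sin.(X) first
```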


Those specific cases being the cases where it’s faster to have a specific method for vector computations (aren’t special functions able to be computed faster in vectorized forms using SIMD and other tricks? Will these tricks be added to Amal.jl and be part of the standard math library?)? Will there be a good documented way to know what the exceptions will be?

Wow, that ended up being a lot of questions that are a bit more general than the title.

I can work with that. Thanks.

If people aren’t expecting the function without the . to be vectorized, then I have an easier time repurposing the signature (for the functions of the package, that is). Admittedly, the one-parameter sin example isn’t the best fit for my use case. For my use case it would be mean(loss(y,yhat) for (y,yhat) in zip(Y, Yhat)) or something like that, which is a handful and a bit much to expect a user to bother with.
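The repurposing idea could be sketched like this (a hypothetical example, not the LossFunctions.jl API: `sqloss` and its signatures are made up to illustrate returning a lazy generator from the non-dot method):

```julia
using Statistics

# Hypothetical scalar loss:
sqloss(y, yhat) = (y - yhat)^2

# Repurposed "vectorized" signature: returns a lazy generator instead of
# an eagerly computed vector, so mean/sum can consume it without allocating.
sqloss(Y::AbstractVector, Yhat::AbstractVector) =
    (sqloss(y, yhat) for (y, yhat) in zip(Y, Yhat))

m = mean(sqloss([1.0, 2.0, 3.0], [1.0, 3.0, 5.0]))  # mean of (0, 1, 4)
```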

Edit: I am not quite sure how to “accept” your answer or I would do so.

See also Deprecate functions vectorized via @vectorize_(1|2)arg in favor of compact broadcast syntax by Sacha0 · Pull Request #17302 · JuliaLang/julia · GitHub and subsequent PRs cited in this one.

I hope not: sin([1,2,3]) makes no sense from the Taylor series definition, regardless of whether there’s a fast way to broadcast.

function Base.broadcast(::typeof(sin), x::Vector)
    # fast vectorized implementation goes here
end

so that sin.([1,2,3]) is automatically fast.

Unfortunately, you have to be careful when overloading broadcast based on functions, since dot-call fusion can produce any number of complex signatures involving the unknown type of some anonymous function created by the parser. So you need to be sure your optimized method produces output identical to what you got before you defined it (e.g. the output storage container can’t depend on the function in any robust way), and be aware that your optimization will only be used in a vanilla call to broadcast.
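The "vanilla call only" caveat can be demonstrated with a toy function (everything here is hypothetical: `double` and the sentinel return value exist only to make visible when the specialization runs; a fused chain is lowered to a combined element-wise kernel and never consults the per-function method):

```julia
double(x) = 2x  # hypothetical scalar function

# An "optimized" method specialized on the function, returning a
# sentinel value so we can see when it is actually called:
Base.broadcast(::typeof(double), v::Vector{Int}) = fill(0, length(v))

a = broadcast(double, [1, 2, 3])  # vanilla call: the specialization runs
b = double.([1, 2, 3]) .+ 1       # fused chain: lowered to a combined
                                  # kernel, the specialization is skipped
```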

I’d love to see a more modular system, but I have no way of going about it.

I don’t think you’d want the optimization used in fused broadcasts to avoid allocation, so the desired behaviour is that exp.(sin.([1,2,3])) would revert to the default broadcast behaviour.

If a user is expected to be aware that sin([1,2,3]) exists and is faster than sin.([1,2,3]), then they’d be capable of unfusing a call if desired: exp.((sin.([1,2,3]))) would use the fast broadcast. (Assuming here that fusing doesn’t dig through parentheses.)

I don’t think you’d want the optimization used in fused broadcasts to avoid allocation

Probably true in any practical case.

I think I’ve taken a slight tangent here: what I am saying is that broadcast can dispatch on a function, but usually shouldn’t (or should only in a restricted set of circumstances). However, the logic of broadcast depends only on the storage containers inputted (and the return type of the function). Base has internal functions like broadcast_c (“broadcast container”) and so on to perform this logic, but I’m still working my way through it to see if I can find ways to hook into this system more neatly than StaticArrays.jl and Switches.jl currently do. Something modular and well-documented (like the promotion system & promote_rule) here would be fantastic!

To the topic - it would be even better if the modular system specified how to specialize your (scalar) function to run faster over collections of inputs (e.g. could become stateful to eliminate redundant computations, or have some SIMD friendly signature, or whatever). EDIT: a good system would compose correctly such that SIMD friendliness is maintained even if automatic fusion occurs.

My primary point is that f([1,2,3]) should only be implemented when it makes sense for the function in question. So the ‘except for very specific cases’ should not include something like exp([1,2,3]) as the following makes no sense:

$$\sum_{k=0}^\infty \frac{[1,2,3]^k}{k!}$$

which is the reasonable definition for exp (well, not counting Cauchy integral formulas).


What about if a pre-allocated version exists, then it won’t fuse?

sin!(out,[1 3 5])

The reason I’m thinking about this is because things like GPU libraries will definitely want to use functions on vectors, and a pre-allocated version probably works in any of these cases?

FWIW, I suspect this line of thinking is eventually going to lead to the conclusion that many optimizations are not possible unless your full vector-level computation is represented as a plan for a computation that you can optimize as a whole since you need to exploit the way in which functions interact for maximum performance.

Building such a system would be an amazing win for Julia, but I feel like the simplicity of how broadcast works now is a virtue for functionality that goes in Base since it’s reasonable for a good programmer to learn how broadcast works in full detail.

Isn’t sin!(out,[1 3 5]) equivalent to

out .= sin.([1 3 5])

It seems like with the right overrides this syntax would translate to GPU arrays.

Yes, that’s the issue though. As pointed out, your solution of:

function Base.broadcast(::typeof(sin), x::Vector)
    # fast vectorized implementation goes here
end

doesn’t necessarily work because, if I understand correctly, when broadcasts are fused, as in y .= sin.(x) .+ cos.(x), the f that is called is an anonymous function f = (x)->sin(x)+cos(x), and so writing an override for out .= sin.([1 3 5]) doesn’t easily generalize to cases where broadcast fusing will take place.

Here’s a way of looking at it. When you don’t have optimized vector forms, on the scalar form what you’d want to do is:

f = (x) -> sin(x) + cos(x)
y .= f.(x)

But you can also avoid temporaries by using non-allocating sin! and cos!, and then add them. (But this gets difficult because you need a cache). So in the case where you want to use optimized vector functions, you’d want to do something like:

sin!(cache1,x)
cos!(cache2,x)
y .= cache1 .+ cache2

Somehow we want something like y .= sin.(x) .+ cos.(x) to mean both of these, depending on whether a “fast vectorized form” exists. I am not sure that this could be done automatically (since how else could caches be specified?), so I am suggesting that

y .= sin!(cache1,x) .+ cos!(cache2,x)

would have to be the way to go. The RNG functions already do something like this, so there’s precedent. So I would suggest that, since any optimized vector form is for performance, the only form that really needs to be present is the mutating version. It would be easy to document too, because it would be a different function. At least placeholders for common functions like sin! should be exported from Base so that libraries like GPU libraries could extend them. I think that’s still pretty clean and very Julian syntax.
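The proposal could be sketched as follows (hypothetical: `sin!` and `cos!` are not actual Base functions; here they simply fall back to broadcast!, whereas a GPU library could override them with fast device kernels):

```julia
# Hypothetical mutating placeholders; these just fall back to broadcast!.
sin!(out, x) = broadcast!(sin, out, x)
cos!(out, x) = broadcast!(cos, out, x)

x = [0.0, pi/2, pi]
cache1, cache2, y = similar(x), similar(x), similar(x)

sin!(cache1, x)                # fill caches in place
cos!(cache2, x)
y .= cache1 .+ cache2          # same result as sin.(x) .+ cos.(x),
                               # with no hidden temporaries
```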

@johnmyleswhite I also agree with that - I’ve been experimenting with using the Transformation of CoordinateTransformations.jl more generally for stating, planning and executing computations outside of geometry. I feel you could do something like the planning of Query.jl using higher-order functions rather than macros using this approach, and yes, you could break up a vectorized computation into a bunch of introspection-friendly steps.

@ChrisRackauckas I guess my point is that it’s equivalent to

y .= cache1 .+ cache2

Therefore, overriding broadcast!(::typeof(sin),::GPUVector,::GPUVector) and broadcast!(::typeof(cos),::GPUVector,::GPUVector) would be all that’s necessary, with no need for sin! and cos!.
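A toy sketch of that override (everything here is hypothetical: `GPUVector` is a made-up stand-in backed by host memory, and the explicit broadcast! call is what dispatches to the specialized method):

```julia
# Hypothetical stand-in for a device array type:
struct GPUVector{T} <: AbstractVector{T}
    data::Vector{T}   # pretend this lives on the GPU
end
Base.size(v::GPUVector) = size(v.data)
Base.getindex(v::GPUVector, i::Int) = v.data[i]
Base.setindex!(v::GPUVector, x, i::Int) = (v.data[i] = x)

# Specialized in-place broadcast, as suggested above:
function Base.broadcast!(::typeof(sin), out::GPUVector, x::GPUVector)
    map!(sin, out.data, x.data)  # imagine a fast device kernel here
    out
end

x   = GPUVector([0.0, 1.0])
out = GPUVector(zeros(2))
broadcast!(sin, out, x)          # dispatches to the specialized method
```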

If a user called

sin!(out, [1 3 5])

on GPU arrays, it would just error out.

But isn’t this what broadcast fusion will automatically do?

So the user should write broadcast!(cos, cache1, x) if they don’t want broadcast fusion.

On the one hand there is multiple dispatch, which prefers

sin(x::TypeOfValue) = ...
sin(xs::Vector{TypeOfValue}) = ...
# and the uses
sin(value)
sin(values)

On the other hand there is broadcasting, which prefers

sin(x::TypeOfValue) = ...
# and the uses
sin(value)
sin.(values)

Sometimes I do not know this:
Does my value stand alone, or is it a vector of values, or a fixed-length array of values?
In that case I would have been inclined to define sin{T}(values::Vector{T}) = sin.(values); but that is frowned upon (and, apparently, not costless even if it were accepted).

Is there a [relatively] new perspective that things that are self-contained (as if length([thing…]) == 1) should not be used interchangeably with things that are a sequence of things (as if length([thing…]) > 1)?

sin.(1) returns sin(1), so always use sin.(x) if you want it to work on both Vector and Number interchangeably.
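That is, the dotted call degrades gracefully to the scalar case, so one spelling covers both:

```julia
s = sin.(1)          # broadcasting over a scalar just applies the function
v = sin.([1, 2, 3])  # ... while over an array it maps element-wise
```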