It looks strange to me, too, and I think it was a particularly poor choice in the design of the Julia language. And, quite frankly, I consider the “explanation” borderline offensive… but the important point is
That is, don’t try to bend the language to existing idioms in languages you may already be familiar with. Just learn the syntax and the idioms native to Julia, and take them as a given.
And, not really directly related but even more important (hence my original curt response): everyone should be maximally allergic to defining methods for functions they don’t own on a set of arguments they don’t fully own (aka “type piracy”). Never, ever do it unless you really understand the implications of what you’re doing. Without that understanding, it can easily lead to catastrophic correctness bugs, and it continues to be the source of major latency issues within the language by causing invalidations.
Honestly I didn’t get how troublesome + would be for strings in Julia until recently — when I started really hammering on reductions. It’s not all that unusual to consider concatenating a bunch of strings with a reduction. But if we used + then folks could (and I’d wager would) end up writing sum(strs) (or more straightforwardly reduce(+, strs)) to do this. And that might jumble up the orderings!
This is in contrast to using prod or reduce(*, strs), which is guaranteed to preserve the orderings. I still wouldn’t encourage doing this, but at least it won’t do the wrong thing!
Why? reduce assumes that the operator is associative, but not commutative — the implementation can change the associativity, but not the ordering. And * is associative.
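To make the associativity-versus-ordering point concrete, here is a small sketch (the sample strings are made up for illustration). Regrouping the operands, as `foldl` and `foldr` do in opposite directions, gives the same string, while swapping operands does not:

```julia
# Sample data, chosen only for illustration.
strs = ["alpha", "beta", "gamma", "delta"]

# foldl groups as ((a*b)*c)*d, foldr as a*(b*(c*d)).
left  = foldl(*, strs)
right = foldr(*, strs)

# Associativity: the grouping does not matter for concatenation.
@assert left == right == "alphabetagammadelta"

# reduce may pick any grouping, but it keeps the operands in order.
@assert reduce(*, strs) == left
@assert prod(strs) == left

# Commutativity does NOT hold: swapping operands changes the result.
@assert "alpha" * "beta" != "beta" * "alpha"

# join is the usual way to concatenate a collection of strings.
@assert join(strs) == left
```

In practice `join(strs)` is what I would reach for; the reductions are here only to show which property they rely on.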
Why, what am I missing here? The only thing I could object to is that in my opinion * carries no connotation of (non)commutativity at all, i.e. it could go either way (e.g. matrix multiplication, convolution).
It better not! There is nothing in the Julia language that would enforce some kind of commutativity of +. I can easily imagine a type for which a direct sum is implemented via + in lieu of the more correct ⨁ – and direct sums are not commutative in the sense most people would expect (only up to isomorphisms: For two matrices A and B, A ⨁ B is a different matrix than B ⨁ A, even if they’re isomorphic, with a change of basis). And there are certainly many colloquial uses of “plus” that are inherently non-commutative (including string concatenation: “put your first name plus last name in this field”)
The docstrings of neither sum nor prod say anything about this explicitly, but reduce guarantees preserving the order (assuming the underlying collection is ordered). It only makes assumptions about associativity. I would expect this behavior to translate back to sum and prod, as I’m sure it does.
I don’t think there would be anything wrong if we had chosen + for string concatenation, or if people were using sum(strings) to concatenate strings. At the end of the day it should boil down to “Julia uses * for string concatenation”, and leave it at that. I wish Julia weren’t the only language using *, and had instead chosen any of the “prior art” of +, ., .., ~, &, \\. But okay. Just that ridiculous “abstract algebra” justification should be stricken from the record. Strings are not a field of vectors with a bilinear product! None of that mathematical structure applies even remotely in this situation. And of course, most fields are commutative, so 99.9% of all * symbols are just as commutative as + symbols (despite, as a quantum physicist, non-commutative algebras being my bread and butter). I can guarantee that nobody would ever have been confused by a non-commutative + for string concatenation. Alas. /end rant
For better or worse, there are a number of implementations that do take advantage of the commutativity of +, even with arbitrary element types! And while the compiler transformation underpinning @simd is called reassoc, it actually does a reordering! Now, yes, Julia won’t do this @simd-based reordering for non-simdy datatypes, and + truly is commutative for the simdy datatypes. I don’t think it’s possible, but I’m still not fully convinced it’s impossible to devise a non-commutative struct for which @simd would do the wrong thing. But even without SIMD instructions, doing a SIMD-like reordering can be significantly advantageous for pipelining processors.
Well, if true, that’s a major correctness bug waiting to blow up in our collective faces. You can’t just assume that every + everywhere is commutative for non-numbers.
Julia won’t do this @simd-based reordering for non-simdy datatypes
That’s fine then… if you do it in a context where you know the operands of + and can guarantee that + is commutative in this context, optimize away.
But just blindly assuming the operator + is a member of some CommutativeOps is a code smell, at the very least, and we shouldn’t be nonchalant about it.
Just a simple warning to not do type piracy. By example. John von Neumann’s definition of natural number is very simple. 0 is the empty set, a positive integer is the set of smaller natural numbers.
I once illustrated this live as follows (This messes up your julia session!)
Base.iterate(i::Int, state::Int=0) = (state ≥ i) ? nothing : (state, state+1)
Base.length(i::Int) = i
Base.IteratorSize(::Type{Int}) = Base.HasLength()
Base.eltype(::Type{Int}) = Int
julia> for i in 4
    println(i)
end
0
1
2
3
julia> collect(5)
5-element Vector{Int64}:
0
1
2
3
4
Neat.
Then, I forgot to restart julia. Now, my new definition is at odds with the julia default, where collect(5)[] == 5, and for i in 4 ... end is a single iteration with i == 4. Some parts of julia, and some packages, depend on this behaviour. And my change is in Base, so it’s global. Chaos followed, until I remembered what I had done.
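For comparison, a piracy-free way to get the same demo is to wrap the integer in a type you own and define the iteration methods on that. This is only a sketch; the name `VonNeumann` is made up for this example and is not from any package:

```julia
# A wrapper type owned by this code, so extending Base functions
# for it does not change the behaviour of plain Int.
struct VonNeumann
    n::Int
end

# The natural number n iterates over the smaller naturals 0, 1, ..., n-1.
Base.iterate(v::VonNeumann, state::Int=0) =
    state ≥ v.n ? nothing : (state, state + 1)
Base.length(v::VonNeumann) = v.n
Base.eltype(::Type{VonNeumann}) = Int

@assert collect(VonNeumann(5)) == [0, 1, 2, 3, 4]

# Plain integers keep their default behaviour: a single iteration.
@assert collect(5)[] == 5
```

Because the methods dispatch on `VonNeumann` rather than `Int`, nothing else in the session sees a change.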
I don’t think vectors or fields are mentioned in the explanation, but monoids are.
For all s,t,u :: String, we have
"" * s == s
s * "" == s
s * (t * u) == (s * t) * u
which seems like a monoid.
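These laws can be spot-checked in the REPL. The sketch below only tests a handful of sample strings (picked arbitrarily), so it illustrates the laws rather than proving them:

```julia
# A few arbitrary sample strings, including the empty string and non-ASCII.
samples = ["", "a", "bc", "déf"]

# Identity laws: "" is a left and right identity for *.
for s in samples
    @assert "" * s == s
    @assert s * "" == s
end

# Associativity: grouping does not matter.
for s in samples, t in samples, u in samples
    @assert s * (t * u) == (s * t) * u
end

# Base already treats "" as the multiplicative identity for strings.
@assert one(String) == ""

# No commutativity is required of a monoid, and strings do not have it.
@assert "ab" * "c" != "c" * "ab"
```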
When library authors make an assumption that library users are unaware of, we’re at risk of correctness problems. Is there any chance we could put “implementers should be commutative” in the Base.+ docstring? Or “implementers may or may not be commutative”? Either way I think it would help reduce the risk of miscommunication.
That’s true. However, if we did that then we couldn’t say “+ implementers should be commutative”, which is a nice property for + to have because + is also used commutatively for numbers and arrays. Using * for Strings we can say “* implementers should be (approximately) associative”. Giving contracts to generic functions is a good idea, imo.
Of course, those restrictions are not currently in the docstrings for those operators, but I’d like them to be.
So I think * is fine (albeit unusual), but I’d be also happy with ++ or append(...) which could be implemented by any sequence type, such as Vectors. I just want generic functions to have strong generic contracts that both library authors and users can count on (and any violations are done intentionally not accidentally).
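As a rough sketch of what a generic `++` could look like: Julia parses `++` as an infix operator but Base gives it no methods, so defining it as a new function in your own module is not type piracy. The two methods below are my own illustration, not an existing API:

```julia
# `++` is a new function owned by this code, so adding methods for
# Base types here is not type piracy.
++(a::AbstractString, b::AbstractString) = a * b
++(a::AbstractVector, b::AbstractVector) = vcat(a, b)

@assert "foo" ++ "bar" == "foobar"
@assert [1, 2] ++ [3] == [1, 2, 3]

# Like any concatenation, it is ordered, not commutative.
@assert "foo" ++ "bar" != "bar" ++ "foo"
```

A contract such as “`++` implementers should be associative” could then live in that function's docstring.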
For example, addition of ordinal numbers is not commutative, and digging deeper can unearth more niche non-commutative generalizations of the addition of natural or real numbers. I’d agree that commutativity isn’t a property of numerical types or operations in isolation like a supertype or interface, it often must take both into account like a multiply-dispatched Holy trait. That could also help optimize * as well; non-commutative multiplication of matrices and quaternions (or concatenation of strings) don’t need to prevent optimizations for commutative multiplication of other types.
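A minimal sketch of such a trait, assuming hypothetical names throughout (`Commutativity`, `commutativity`, and `trait_reduce` are invented for this example and exist nowhere in Base). The trait depends on both the operator and the element type, and defaults to the conservative answer:

```julia
# Hypothetical Holy trait: is `op` known to commute on element type T?
abstract type Commutativity end
struct Commutative <: Commutativity end
struct NotKnownCommutative <: Commutativity end

# Conservative default: assume nothing about unknown operator/type pairs.
commutativity(op, ::Type) = NotKnownCommutative()

# Opt in only where commutativity is certain.
commutativity(::typeof(+), ::Type{<:Integer}) = Commutative()
commutativity(::typeof(*), ::Type{<:Integer}) = Commutative()

# A toy reduction that reorders only when the trait allows it.
trait_reduce(op, xs) = trait_reduce(commutativity(op, eltype(xs)), op, xs)
trait_reduce(::NotKnownCommutative, op, xs) = foldl(op, xs)
# Stand-in for a real optimization: any operand order is acceptable here.
trait_reduce(::Commutative, op, xs) = foldl(op, reverse(xs))

@assert trait_reduce(+, [1, 2, 3]) == 6
@assert trait_reduce(*, ["a", "b", "c"]) == "abc"   # order preserved
```

With this shape, string concatenation or matrix multiplication would not block reordering for types that opt in.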
Pretty sure implementing + commutativity is unrelated to aesthetically replacing * for concatenation, I’d say this should be split to a separate thread.
I can only speak for myself, but the blurb about free monoids just reeks of This Decision Is Correct Because I Am Smarter Than You
like yes, I do know what a monoid is, and yes I see that String endowed with * is one, but it just feels pretty pretentious to be appealing to a definition from mathematical objects that are super irrelevant. I would have much preferred that it just said “* is used for string concatenation because + often implies commutativity” and left it at that.
I can sympathize with that, especially since this is the only place the term monoid is used in the manual. Nevertheless, I don’t think that the intention was to offend anyone.
Generally, I am not easily offended, especially by explanations, so I don’t mind this being in the manual, but perhaps a PR removing this and rewording would make sense.
Should we mark some answer as answering this question? The top one saying “yes”, or the one above with “no”…? Or even mine here?
The discussion that followed is interesting (to me); I hope we’re not scaring new users away in the “New to Julia” group. I believe the answer is yes, you CAN define +, and using that + alone in your code will always be ok, but seemingly not with sum (who would use it for strings?). But that doesn’t mean you really should do this for your own code, let alone for packages, i.e. help others do this.
This is interesting, but even more interesting to me would be an operator that seemingly concatenates strings but doesn’t actually: it returns a concat type holding a prefix and a suffix, and only lazily does the actual concatenation, maybe never in practice. If you concat again, you might naively get a tree of that type, but then I think we would have gone too far; we could only ever have two parts, not doing the concat until we really need it, e.g. for a regex, or to call C code that expects the string to be linear in memory.
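A toy sketch of that two-part idea, with all names (`LazyConcat`, `lazycat`, `materialize`) made up for illustration. It keeps at most a prefix and a suffix, and a second concatenation collapses the existing pair instead of building a tree:

```julia
# Hypothetical lazy concatenation: holds two parts, joins on demand.
struct LazyConcat
    prefix::String
    suffix::String
end

# The actual concatenation happens only here.
materialize(c::LazyConcat) = c.prefix * c.suffix

# Concatenating two plain strings defers the work.
lazycat(a::AbstractString, b::AbstractString) = LazyConcat(a, b)

# Concatenating again materializes the old pair, so there are never
# more than two parts (no tree of nested LazyConcat values).
lazycat(a::LazyConcat, b::AbstractString) = LazyConcat(materialize(a), b)

# Some queries can be answered without materializing.
Base.length(c::LazyConcat) = length(c.prefix) + length(c.suffix)

c = lazycat("foo", "bar")
@assert length(c) == 6
@assert materialize(c) == "foobar"

d = lazycat(c, "baz")
@assert d.prefix == "foobar"
@assert materialize(d) == "foobarbaz"
```

Passing the result to a regex or to C code would call `materialize` first; whether the deferral pays off depends on how often that happens.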
I’m still not sure from the discussion whether sum would be a problem in reality, even if it is used…
captures it quite well. “Offended” was a bit of rhetorical hyperbole, more like “eye-roll”, as in, with absolute respect for the Julia inventors, “people had their heads way too far up their mathematical asses when this was decided”. I think we should be wary of that kind of mathematically pedantic mindset. While it may have some charm, that wears off pretty quickly, and then it can be rather off-putting, especially to new users. A more recent and more egregious example (not to veer too far off-topic in this thread again) was the opposition to using / to join paths, cf.
This is not to say I’m opposed to rigorous and precise interfaces. Quite the opposite. But operators can have different meanings in different contexts. Nobody is going to be confused with + or / having a different meaning for strings or paths than for numbers.