Is it possible to define the '+' operator for strings, the way '*' is defined for them?

For better or worse, there are a number of implementations that do take advantage of the commutativity of +, even with arbitrary element types! And while the compiler transformation underpinning @simd is called reassoc, it actually does a reordering! Now, yes, Julia won’t do this @simd-based reordering for non-simdy datatypes, and + truly is commutative for the simdy datatypes. I don’t think it’s possible, but I’m still not fully convinced it’s impossible to devise a non-commutative struct for which @simd would do the wrong thing. But even without SIMD instructions, doing a SIMD-like reordering can be significantly advantageous for pipelining processors.
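
To make the concern concrete, here is a minimal sketch of what a SIMD-like reordering does to a reduction. The helper `two_lane_reduce` is hypothetical (it is not what `@simd` actually emits); it only interleaves two accumulators the way a two-lane vectorized loop might, and it assumes the input has at least two elements.

```julia
# Hypothetical helper, not part of Base: reduce with two interleaved
# accumulators, mimicking a 2-lane SIMD-style loop. Assumes length(xs) >= 2.
function two_lane_reduce(op, xs)
    acc1, acc2 = xs[1], xs[2]
    i = 3
    while i + 1 <= length(xs)
        acc1 = op(acc1, xs[i])      # lane 1 sees elements 1, 3, 5, ...
        acc2 = op(acc2, xs[i + 1])  # lane 2 sees elements 2, 4, 6, ...
        i += 2
    end
    if i <= length(xs)              # an odd leftover element goes to lane 1
        acc1 = op(acc1, xs[i])
    end
    return op(acc1, acc2)           # combine the lanes at the end
end

two_lane_reduce(+, [1, 2, 3, 4, 5])       # 15, same as foldl(+, ...)
foldl(*, ["a", "b", "c", "d"])            # "abcd"
two_lane_reduce(*, ["a", "b", "c", "d"])  # "acbd": the order changed
```

The interleaving is only valid when `op` is both associative and commutative, which is why string concatenation gives a different answer here.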

1 Like

Well, if true, that’s a major correctness bug waiting to blow up in our collective faces. You can’t just assume that every + everywhere is commutative for non-numbers.

Julia won’t do this @simd -based reordering for non-simdy datatypes

That’s fine then… if you do it in a context where you know the operands of + and can guarantee that + is commutative in this context, optimize away.

But just blindly treating the operator + as a member of some CommutativeOps is a code smell, at the very least, and we shouldn’t be nonchalant about it.

1 Like

Just a simple warning to not do type piracy. By example. John von Neumann’s definition of natural number is very simple. 0 is the empty set, a positive integer is the set of smaller natural numbers.

I once illustrated this live as follows (This messes up your julia session!)

Base.iterate(i::Int, state::Int=0) = (state ≥ i) ? nothing : (state, state+1)
Base.length(i::Int) = i
Base.IteratorSize(::Type{Int}) = Base.HasLength()
Base.eltype(::Type{Int}) = Int

julia> for i in 4
           println(i)
       end
0
1
2
3

julia> collect(5)
5-element Vector{Int64}:
 0
 1
 2
 3
 4

Neat.

Then, I forgot to restart julia. Now, my new definition is at odds with the julia default, where collect(5)[] == 5, and for i in 4 ... end is a single iteration with i == 4. Some parts of julia, and some packages, depend on this behaviour. And my change is in Base, so it’s global. Chaos followed, until I remembered what I had done.
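
For comparison, a minimal sketch of the same demonstration without piracy: wrap the integer in a type you own and define the iteration methods on that type instead. The name `VonNeumann` is made up for this illustration.

```julia
# Hypothetical wrapper type: we own it, so extending Base functions
# for it does not change the behaviour of plain Int anywhere else.
struct VonNeumann
    n::Int
end

# Iterate over the "elements" of n, i.e. the smaller naturals 0, 1, ..., n-1.
Base.iterate(v::VonNeumann, state::Int=0) =
    state ≥ v.n ? nothing : (state, state + 1)
Base.length(v::VonNeumann) = v.n
Base.eltype(::Type{VonNeumann}) = Int

collect(VonNeumann(5))  # [0, 1, 2, 3, 4]
collect(5)[]            # still 5: Base's behaviour for Int is untouched
```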

3 Likes

this also bugs me occasionally. another example is the docs for /

Right division operator: multiplication of x by the inverse of y on the right.

seemingly motivated by traditional algebraic definitions. but this does not correctly describe the behavior of floating point arithmetic
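
A small check of that mismatch, using ordinary Float64 values (the particular numbers are just a convenient example):

```julia
x, y = 3.0, 10.0

x / y               # 0.3
x * inv(y)          # 0.30000000000000004
x / y == x * inv(y) # false: dividing is not "multiply by the inverse" in floating point
```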

1 Like

I don’t think vectors or fields are mentioned in the explanation, but monoids are.

For all s,t,u :: String, we have

  • "" * s == s
  • s * "" == s
  • s * (t * u) == (s * t) * u

which seems like a monoid.
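
These laws can be spot-checked on sample values; this is only a sanity check on three arbitrary strings, not a proof:

```julia
s, t, u = "foo", "bar", "baz"

@assert "" * s == s                    # left identity
@assert s * "" == s                    # right identity
@assert s * (t * u) == (s * t) * u     # associativity
@assert s * t != t * s                 # but not commutative: "foobar" vs "barfoo"
```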

When library authors make an assumption that library users are unaware of, we’re at risk of correctness problems. Is there any chance we could put “implementers should be commutative” in the Base.+ docstring? Or “implementers may or may not be commutative”? Either way I think it would help reduce the risk of miscommunication.

4 Likes

sure, but + could have been the symbol for that just fine as monoids only have one operation

1 Like

That’s true. However, if we did that then we couldn’t say “+ implementers should be commutative”, which is a nice property for + to have because + is also used commutatively for numbers and arrays. Using * for Strings we can say “* implementers should be (approximately) associative”. Giving contracts to generic functions is a good idea, imo.

Of course, those restrictions are not currently in the docstrings for those operators, but I’d like them to be.

So I think * is fine (albeit unusual), but I’d be also happy with ++ or append(...) which could be implemented by any sequence type, such as Vectors. I just want generic functions to have strong generic contracts that both library authors and users can count on (and any violations are done intentionally not accidentally).
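
As a rough sketch of the ++ idea: Julia parses `++` as an infix operator but, as far as I know, Base does not define it, so defining it yourself creates a new function rather than extending someone else's. The methods below are hypothetical, and the contract ("associative, not necessarily commutative") is my assumption.

```julia
# Hypothetical generic concatenation operator, not defined by Base.
# Intended contract for this sketch: associative, not necessarily commutative.
++(a::AbstractString, b::AbstractString) = string(a, b)
++(a::AbstractVector, b::AbstractVector) = vcat(a, b)

"foo" ++ "bar"   # "foobar"
[1, 2] ++ [3]    # [1, 2, 3]
```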

3 Likes

For example, addition of ordinal numbers is not commutative, and digging deeper can unearth more niche non-commutative generalizations of the addition of natural or real numbers. I’d agree that commutativity isn’t a property of numerical types or operations in isolation like a supertype or interface, it often must take both into account like a multiply-dispatched Holy trait. That could also help optimize * as well; non-commutative multiplication of matrices and quaternions (or concatenation of strings) don’t need to prevent optimizations for commutative multiplication of other types.
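
A minimal sketch of what such a trait could look like. Nothing like this exists in Base as far as I know; `Commutativity`, `IsCommutative`, `NotCommutative`, `commutativity`, and `reorderable_reduce` are all hypothetical names, and reversing the input merely stands in for a real reordering optimization.

```julia
# Hypothetical Holy trait: commutativity depends on the (operation, type) pair.
abstract type Commutativity end
struct IsCommutative <: Commutativity end
struct NotCommutative <: Commutativity end

# Conservative default: assume nothing about an unknown pair.
commutativity(op, ::Type) = NotCommutative()
# Opt in only where the property is known to hold.
commutativity(::typeof(+), ::Type{<:Integer}) = IsCommutative()
commutativity(::typeof(*), ::Type{<:Integer}) = IsCommutative()

# Dispatch on the trait to pick a reduction strategy.
reorderable_reduce(op, xs::AbstractVector{T}) where {T} =
    _reduce(commutativity(op, T), op, xs)

_reduce(::NotCommutative, op, xs) = foldl(op, xs)          # strict left-to-right
_reduce(::IsCommutative, op, xs) = foldl(op, reverse(xs))  # free to reorder

reorderable_reduce(+, [1, 2, 3])        # 6, computed in reversed order
reorderable_reduce(*, ["a", "b", "c"])  # "abc", order preserved
```

With this shape, matrices, quaternions, and strings keep the ordered path under * simply by not opting in, while integer * can still be reordered.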

Pretty sure implementing + commutativity is unrelated to aesthetically replacing * for concatenation, I’d say this should be split to a separate thread.

3 Likes

I’m curious, which part of the explanation are you referring to?

2 Likes

I can only speak for myself, but the blurb about free monoids just reeks of This Decision Is Correct Because I Am Smarter Than You

like yes, I do know what a monoid is, and yes I see that String endowed with * is one, but it just feels pretty pretentious to be appealing to a definition from mathematical objects that are super irrelevant. I would have much preferred that it just say “* is used for string concatenation because + often implies commutativity” and leave it at that.

7 Likes

I can sympathize with that, especially since this is the only place the term monoid is used in the manual. Nevertheless, I don’t think that the intention was to offend anyone.

Generally, I am not easily offended, especially by explanations, so I don’t mind this being in the manual, but perhaps a PR removing this and rewording would make sense.

2 Likes

Should we mark some answer as answering this question? The top one saying “yes”, or the one above with “no”…? Or even mine here?

The discussion that followed is interesting (to me); I hope we’re not scaring new users away in the “New to Julia” group. I believe the answer is yes, you CAN define +, and using that + alone in your code will always be OK, but seemingly not with sum (who would use it for strings?). But that doesn’t mean you really should do this in your own code, let alone in packages, i.e. help others do this.

This is interesting, but even more interesting to me would be an operator that seemingly concatenates strings but doesn’t actually: it returns a concat type holding a prefix and a suffix, only lazily doing an actual concatenation, maybe never in practice. If you concat again, you might naively get a tree of that type, but then I think we’ve gone too far; we could only ever have two parts, not doing the concat until we really need it, e.g. for a regex, or to call C code that expects the string to be linear in memory.
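
A rough sketch of one way to read that idea. Every name here (`LazyConcat`, `lazycat`, `materialize`) is hypothetical, and collapsing the left side on a second concatenation is just one possible policy for staying at two parts.

```julia
# Hypothetical lazy concatenation: hold two parts, join only on demand.
struct LazyConcat
    prefix::String
    suffix::String
end

# Do the real concatenation, e.g. before handing the data to a regex or C code.
materialize(c::LazyConcat) = c.prefix * c.suffix

lazycat(a::String, b::String) = LazyConcat(a, b)
# Concatenating again collapses the left side, so we never build a tree.
lazycat(a::LazyConcat, b::String) = LazyConcat(materialize(a), b)

# Printing writes both parts in order without building a joined String first.
Base.print(io::IO, c::LazyConcat) = print(io, c.prefix, c.suffix)

c = lazycat("foo", "bar")   # no concatenation has happened yet
string(c)                   # "foobar"
materialize(lazycat(c, "baz"))  # "foobarbaz"
```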

I’m still not sure from the discussion whether sum would be a problem… in reality, if it is even used…

Don’t take the comment too seriously, but

captures it quite well. “Offended” was a bit of rhetorical hyperbole, more like “eye-roll”, as in, with absolute respect for the Julia inventors, “people had their heads way too far up their mathematical asses when this was decided” :wink: I think we should be wary of that kind of mathematically pedantic mindset. While it may have some charm, that wears off pretty quickly, and then can be rather off-putting, especially to new users. A more recent and more egregious example (not to veer too far off-topic in this thread again) was the opposition to using / to join paths, cf.

This is not to say I’m opposed to rigorous and precise interfaces. Quite the opposite. But operators can have different meanings in different contexts. Nobody is going to be confused with + or / having a different meaning for strings or paths than for numbers.

9 Likes

I think “an operator” has an ambiguity that’s important: a symbol vs a function.

There are some examples of Julia functions having different properties on different types and I am often confused and bothered by it. I want a generic function to have a generic contract, not a contract that changes all its invariants depending on the argument types.

The whole point of namespaces is so that the same name can be used for different purposes without changing the meaning of a shared object. If an author wants to use the name + for functionality that isn’t commutative, or the name map for functionality that doesn’t preserve keys(_), I much much prefer that they use their own MyModule.+, and MyModule.map rather than overloading Base.+ and Base.map.

So in the case of this thread, if Base.+ were used for String concatenation, I wouldn’t like that because I want Base.+ to be always commutative; If MyModule.+ – which is a different function, with a different contract, but the same symbol + – were used for String concatenation, that would be fine by me — go nuts.
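
A minimal sketch of that distinction. The module name `StringOps` is made up; the point is that defining `+` inside a module that has not imported or already used `Base.+` creates a separate function that merely shares the symbol.

```julia
module StringOps
# A new function named `+`, owned by this module. It does NOT extend Base.+,
# so its contract (concatenation, non-commutative) is its own.
+(a::AbstractString, b::AbstractString) = a * b
# Everything else falls back to the usual addition.
+(a, b) = Base.:+(a, b)
end

StringOps.:+("foo", "bar")   # "foobar"
StringOps.:+(1, 2)           # 3
# Base.+ is unaffected: it still has no method for two Strings.
hasmethod(Base.:+, Tuple{String, String})   # false
```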

Imo, if a function has a generic contract, then no specialization of it to specific types should violate that contract. The specialization should semantically subtype the generic function. Otherwise – if specializations can lawfully violate the generic interface – there is no generic interface.

2 Likes

I think this is an important point. When reading it now I find myself nodding and thinking “fair”, but I vividly remember being new to Julia, coming from Python, and finding this explanation not only rather grating (like, if all the other language designers are too pea-brained to realize this, how come + is working just fine for them?), but also suggesting a lack of pragmatism in the design that I assumed would make the language unpleasant to use in general. This was wrong, of course, but it wasn’t a good first impression. Simply saying “in Julia, we use *” would be plenty, no explanation or defense needed.

4 Likes

I can remember seeing the choice of * and reading the explanation, and thinking “oh, well, I guess that makes sense.” It’s not like + is incredibly natural either.

There’s a somewhat arbitrary choice. There’s a somewhat reasonable explanation. This looks like the 1- vs 0-based debate again, only less important.

Getting annoyed, not just with the choice, but with the presence of a short explanation, and still thinking about it years later? Isn’t that … excessive?

3 Likes

When I first started learning French (I am still not anywhere near fluent) I used to think “Why is this word conjugated in such a strange fashion?” Funny, I never thought that when learning English, nor when learning my first language.

After a while I realized I would be far better off just accepting French being the way it was, that I would have no influence in changing anything and be done with it. Reading this thread made me realize that this mindset should apply to computer languages (after version 1.0) as well. At best it may refine the documents.

This is kind of the long way of saying what @stevengj said in post 4.

P.S. The post did make me look up again the long forgotten definitions of commutative, and associative. Things I use when I do algebra but no longer had a name for.

3 Likes

The free monoid bit is kind of silly—many operations could be used for a monoid and it just feels like pointless jargon. Taking offense at it feels a little bit extra, but ok.

The really interesting question is: If concatenation is * then what would + be? If you consider strings to be patterns that match themselves, then + would be alternation of patterns in the regex sense. In other words, if a and b are patterns, then a * b is a pattern that matches a followed by b and a + b is a pattern that matches a or b. And this makes regular expressions, with strings embedded in them as constant patterns, into a semiring: the additive identity is the impossible pattern (matches no strings) and the multiplicative identity is the empty string. You can easily check that the semiring properties hold; and yes, + is commutative while * is not. I suspect that the observation that * makes strings a free monoid is an echo of this, which I explained at least once when someone asked about this. The docs appropriately don’t want to get into this whole semiring of regular expressions business, but the part of that definition that’s just about strings is that they form a free monoid under *.
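
The semiring claim can be spot-checked with a toy model. This sketch assumes a simplification: a pattern is represented by the finite set of strings it matches, so there is no repetition operator. `Pattern` and `pat` are hypothetical names.

```julia
# Toy model: a pattern is the finite set of strings it matches.
struct Pattern
    matches::Set{String}
end
pat(strs::String...) = Pattern(Set{String}(strs))

Base.:(==)(a::Pattern, b::Pattern) = a.matches == b.matches
Base.hash(p::Pattern, h::UInt) = hash(p.matches, h)

# Alternation: matches whatever a or b matches.
Base.:+(a::Pattern, b::Pattern) = Pattern(union(a.matches, b.matches))
# Concatenation: matches an a-match followed by a b-match.
Base.:*(a::Pattern, b::Pattern) =
    Pattern(Set{String}(x * y for x in a.matches for y in b.matches))

Base.zero(::Type{Pattern}) = pat()    # impossible pattern: matches nothing
Base.one(::Type{Pattern}) = pat("")   # matches only the empty string

a, b, c = pat("a"), pat("b"), pat("c")
a + b == b + a                  # true: alternation commutes
a * b == b * a                  # false: "ab" vs "ba"
a * (b + c) == a * b + a * c    # true: concatenation distributes over alternation
```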

So we could have used a different operator than *, like ++, but using + still feels strictly wrong. There’s the whole non-commutativity issue, but also the fact that if you’ve made + concatenation and then want an operator for alternation, what would you use? I guess |? Then you’re really punning your operators kind of badly. Having | and + be the operators for a semiring feels messy. In languages that have + for concatenation, it’s sometimes allowed to repeat a string using multiplication, but which side do you multiply on? 5*"xo" or "xo"*5? It’s unclear, because when + is commutative it doesn’t matter. Maybe that’s fine and you could allow both. But when you use * for concatenation it’s obvious that you want "xo"^5 for repeating xo five times, not 5^"xo". Then there’s the whole bit about using str/suffix to remove a suffix and prefix\str to remove a prefix. People didn’t want that exported from Base Julia (too funky) but at least you can define it yourself and it is coherent with *. With + and - this doesn’t work nearly as nicely because you have str - suffix for removing a suffix, but there’s no “left subtraction” for removing a prefix, again because addition is presumed to be commutative.
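
The repetition part works today in stock Julia, and the usual exponent laws carry over:

```julia
"xo"^5                      # "xoxoxoxoxo"
"xo"^5 == repeat("xo", 5)   # true: ^ on strings is repetition
"xo"^2 * "xo"^3 == "xo"^5   # true: exponents add under *
"xo"^0 == ""                # true: the zeroth power is the identity for *
```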

3 Likes

only semi-serious: should strings support left and right division…?

I proposed that on GitHub long ago and it was shot down as too weird, but I still quite like it. My main question is what str/suffix should do if str doesn’t actually end with suffix? Error? Return str unmodified?
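
The two policies can be compared side by side. This is a sketch with hypothetical function names, deliberately not overloading / or \ on strings (that would be type piracy), and it assumes Julia 1.8 or later for `chopsuffix` and `chopprefix`.

```julia
# Lenient policy: return `str` unchanged when it does not end with `suffix`.
strip_suffix(str::AbstractString, suffix::AbstractString) =
    String(chopsuffix(str, suffix))

# Strict policy: treat a missing suffix as an error.
function strip_suffix_strict(str::AbstractString, suffix::AbstractString)
    endswith(str, suffix) ||
        throw(ArgumentError("string does not end with the given suffix"))
    return String(chopsuffix(str, suffix))
end

# The "left division" counterpart, lenient version.
strip_prefix(prefix::AbstractString, str::AbstractString) =
    String(chopprefix(str, prefix))

strip_suffix("report.txt", ".txt")   # "report"
strip_suffix("report.txt", ".csv")   # "report.txt" (unchanged)
strip_prefix("un", "undo")           # "do"
```

The strict version makes (s * t) / t == s the only case that succeeds, which is closer to how division behaves; the lenient version is more convenient for cleanup code.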

2 Likes