Is it possible to define the '+' operator for strings, the way '*' is defined for them?

make a Path :eyes:

Yep, that’s one reason why having str1/str2 do path concatenation isn’t great. Yes, it looks nice, but there’s a different sensible meaning for that operation. If there were a path type, having path1/path2 do path joining would be better, but then it’s very different from strings. It seems better to avoid that and use a different syntax. There’s also always the option of a non-standard string literal for paths, e.g. path"foo/$part/bar.txt", which could normalize to the right kind of path and raise an error if part is something like / or /root.
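
A minimal sketch of what a dedicated path type could look like, so that / means joining only for paths and never for plain strings. `Path` here is a hypothetical type (not something in Base, and not the design from any Julep); it only handles relative, '/'-separated paths and does no normalization.

```julia
# Hypothetical path type: a path is a sequence of segments, so `/` can mean
# "join" for paths without giving `/` any meaning on plain strings.
struct Path
    segments::Vector{String}
end

# Parse a relative, '/'-separated path; empty segments (from "a//b") are dropped.
Path(s::AbstractString) = Path(String.(filter(!isempty, split(s, '/'))))

# Joining is just appending segments.
Base.:/(a::Path, b::Path) = Path(vcat(a.segments, b.segments))
Base.:/(a::Path, b::AbstractString) = a / Path(b)

# Printing (and therefore `string(p)`) rejoins the segments with '/'.
Base.print(io::IO, p::Path) = join(io, p.segments, '/')

p = Path("user_content") / "notes" / "todo.txt"
string(p)   # "user_content/notes/todo.txt"
```

Keeping segments as a vector rather than a single string is what lets / stay unambiguous: it is only ever defined on `Path`, so `"a" / "b"` on plain strings remains free for other meanings (or an error).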

The String operations Wikipedia page is linked from that thread and, oh, hey, would you look at that: the mathematical syntax used for all the string operations is exactly the same as Julia’s syntax for string operations. What a coincidence!

I had almost forgotten that: actual, literal concatenation/juxtaposition looks like implicit multiplication. I think that’s actually what made me think * was OK.
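
The multiplicative reading can be checked directly in the REPL with Base’s own string methods: * concatenates, it associates but does not commute, ^ is repeated *, and prod folds a collection with *, just as it does for numbers.

```julia
x, y = "ab", "cd"

x * y                              # "abcd"
y * x                              # "cdab": concatenation does not commute
(x * y) * "ef" == x * (y * "ef")   # true: it does associate
x^3                                # "ababab": `^` is repeated `*`
prod(["ab", "cd", "ef"])           # "abcdef": `prod` folds with `*`
```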

If this was directed to me, I just want to note that I don’t harbor the slightest annoyance about the choice, I haven’t thought about the explanation in years, and when reading it now, my reaction is “neat, I learned something”. @StefanKarpinski’s comment above extending this idea to a semiring is downright fascinating.

This thread just reminded me that when I was first learning Julia 6-7 years ago and encountered this explanation, I walked over and showed it to my colleague so we could roll our eyes in disbelief together. First impressions matter, so I thought that was worth sharing here. (With my sincere apologies to whoever wrote that paragraph, which, again, I now find educational.)

Partly, I guess, but not mainly.

Same experience here – heavy eye roll, general indifference, acceptance, fascination. I’ll let others decide what it means about me that program syntax can make me go through something like the stages of grief.

Julia makes enough Big Design Choices that I would still back + if we were deciding today, because I don’t think another layer of discomfort for newcomers is worth the benefits, but it’s definitely neat.

Another worthwhile Wikipedia page on the subject is

Or we could have a + b = min(a, b), using lexicographic order. We then compactify the space of strings with a -Inf and a +Inf: +Inf is a neutral element for addition, whereas -Inf is an annihilator. I’m sure the ideal structure can be figured out, but I’d need a cup of coffee.
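
Taken literally, the construction fits in a few lines. `XStr`, `fin`, `NEG_INF`, and `POS_INF` are hypothetical names for this illustration: each value carries a rank (-1 for -Inf, 0 for an ordinary string, +1 for +Inf), so ordinary strings compare lexicographically, as `isless` on `String` already does, and the two infinities sit at the ends.

```julia
# Hypothetical extended strings: rank -1 is -Inf, 0 an ordinary string, +1 is +Inf.
struct XStr
    rank::Int
    s::String
end

fin(s::AbstractString) = XStr(0, String(s))
const NEG_INF = XStr(-1, "")
const POS_INF = XStr(1, "")

# Order by rank first, then lexicographically via String's own `isless`.
Base.isless(a::XStr, b::XStr) = isless((a.rank, a.s), (b.rank, b.s))
Base.:(==)(a::XStr, b::XStr) = a.rank == b.rank && a.s == b.s
Base.hash(a::XStr, h::UInt) = hash((a.rank, a.s), h)

# "Addition" is the minimum under that order.
Base.:+(a::XStr, b::XStr) = min(a, b)

fin("apple") + fin("banana")   # same value as fin("apple")
fin("apple") + POS_INF         # same value as fin("apple"): +Inf is neutral
fin("apple") + NEG_INF         # NEG_INF: -Inf annihilates
```

If concatenation is meant to be the multiplication, it distributes over this min from the left (a common prefix preserves lexicographic order) but not from the right: "a" < "ab", yet "ab" * "c" < "a" * "c". So the coffee would still be needed.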

Languages that don’t worry about algebra or generic operations can pun like crazy. :slight_smile:

$ pike
Pike v8.0 release 1738 running Hilfe v3.5 (Incremental Pike Frontend)
> "hello world" / " ";
(1) Result: ({ /* 2 elements */
                "hello",
                "world"
            })
> "hello world" / 2.8;
(2) Result: ({ /* 4 elements */
                "hel",
                "lo ",
                "wo",
                "rld"
            })
> "hello world" * 2.8;
(3) Result: "hello worldhello worldhello wor"

(Reference documentation)

Pike uses division of a string for splitting: a/b is the equivalent of Julia’s split(a, b). a*b, with a an array, is then join(a, b).
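
For comparison, the Pike operations above map onto plain named functions in Julia. This sketch only covers splitting on a separator, joining, and whole-number repetition; Pike’s fractional forms (/ 2.8 and * 2.8) have no direct counterpart in Julia’s Base and are left out.

```julia
s = "hello world"

parts = split(s, " ")         # ["hello", "world"]: Pike's s / " "
rejoined = join(parts, " ")   # "hello world": Pike's array * " "
twice = repeat(s, 2)          # "hello worldhello world": same as s^2
```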

I can see an aesthetic and possibly a logical advantage of * over + there, but it’s not strong enough, to me, to warrant much more of an official explanation than “we know it’s unorthodox for programming, but it’s rooted in string function notation from formal language theory”.

  1. In practice, people haven’t had a problem with the symbol + denoting both a commutative operation (addition) and a non-commutative one (concatenation) when the argument types disambiguate the call. We already do the same with * for integer and floating-point multiplication versus matrix multiplication. In that context, arguments for preserving commutativity seem like a minor aesthetic preference that isn’t broadly applied.
  2. Framing addition as the commutative operation and multiplication as the non-commutative one doesn’t pass muster in general, since both are usually commutative. The element of truth is that when only one of them is non-commutative, it’s multiplication: matrix addition and multiplication generalize scalar addition and multiplication (1x1 matrices reproduce the scalar case), and only multiplication loses commutativity. Strings don’t come with both an addition and a multiplication (though some languages define both), so the stated rationale of picking * because it’s non-commutative is out of its usual context.
  3. String operations don’t need to generalize or be associated with addition or multiplication, notation aside. The identity element could be a zero for addition or a one for multiplication (Julia does define one(String) == ""), but a string x doesn’t really have an additive inverse, x + (-x) == "", or a multiplicative inverse, x * inv(x) == "". We do have chopsuffix(x, x) == chopprefix(x, x) == "", but there’s no standalone -x or inv(x) to pass in, and both functions return the string unchanged when it lacks the given suffix or prefix (endswith/startswith is false), which is far more unusual for addition or multiplication than non-commutative addition is.
  4. Pairing / and \ (suffix and prefix stripping) with concatenation, the way right and left division are paired with multiplication, is aesthetically pleasing. But stripping as chopsuffix/chopprefix do it doesn’t match the formal right/left quotient, which returns an empty string on a mismatch, and what-ifs aren’t nearly as strong an argument as the existing language. If anything, chopsuffix/chopprefix suggest that string operations should all be named functions: repeat does what ^ does, and string can do concatenation. Languages with + for concatenation mostly use named functions for string operations, and even Pike has some on top of its fuller “arithmetic.”
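
The Base functions mentioned in points 3 and 4 can be tried directly (chopprefix and chopsuffix need Julia 1.8 or later). For contrast, `rquot` below is a hypothetical helper, not a Base function, that follows the formal-language convention from point 4 of returning an empty string on a mismatch.

```julia
one(String)                         # "": the identity element for `*`
chopsuffix("abc", "abc")            # "": strips a matching suffix
chopprefix("abc", "abc")            # "": strips a matching prefix
chopsuffix("abc", "xyz")            # "abc": a mismatch leaves the string unchanged
repeat("ab", 3) == "ab"^3           # true
string("ab", "cd") == "ab" * "cd"   # true

# Hypothetical right quotient (not in Base): empty string on a mismatch,
# following the formal-language convention.
rquot(s::AbstractString, t::AbstractString) =
    endswith(s, t) ? String(chopsuffix(s, t)) : ""

rquot("abc", "c")   # "ab"
rquot("abc", "x")   # ""
```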

I’m sure some variant of the Grothendieck construction can be applied to create string inverses, both additive and multiplicative. Sooner or later we’ll end up with a free module of strings over the integers, hopefully with a countable set of generators.
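
Taking the joke one step further, the free module over the integers that this lands on can be sketched as formal integer combinations of strings, with concatenation extended bilinearly (the monoid ring of strings over the integers). `FreeZ` and `prune` are hypothetical names, and this is a toy representation, not an efficient one.

```julia
# Hypothetical: a formal integer combination of strings, stored as
# string => coefficient, with zero coefficients never kept.
struct FreeZ
    coeffs::Dict{String,Int}
end

FreeZ(s::AbstractString) = FreeZ(Dict(String(s) => 1))
Base.zero(::Type{FreeZ}) = FreeZ(Dict{String,Int}())

# Drop zero coefficients so equal elements have identical dictionaries.
prune(d::Dict{String,Int}) = FreeZ(filter(kv -> kv.second != 0, d))

Base.:+(a::FreeZ, b::FreeZ) = prune(mergewith(+, a.coeffs, b.coeffs))
Base.:-(a::FreeZ) = FreeZ(Dict{String,Int}(k => -v for (k, v) in a.coeffs))
Base.:-(a::FreeZ, b::FreeZ) = a + (-b)

# Concatenation extended bilinearly: (Σ xᵢ sᵢ)(Σ yⱼ tⱼ) = Σ xᵢ yⱼ (sᵢ * tⱼ).
function Base.:*(a::FreeZ, b::FreeZ)
    d = Dict{String,Int}()
    for (s, x) in a.coeffs, (t, y) in b.coeffs
        d[s * t] = get(d, s * t, 0) + x * y
    end
    return prune(d)
end

Base.:(==)(a::FreeZ, b::FreeZ) = a.coeffs == b.coeffs
Base.hash(a::FreeZ, h::UInt) = hash(a.coeffs, h)

x = FreeZ("ab")
x + (-x) == zero(FreeZ)                                             # true
(FreeZ("a") + FreeZ("b")) * FreeZ("c") == FreeZ("ac") + FreeZ("bc")   # true
```

Additive inverses come for free, but multiplicative ones mostly don’t: only FreeZ("") and its negative are invertible, because the longest string in a product is exactly as long as the longest strings of the two factors combined.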

Programming languages are full of notation that was initially pretty arbitrary, it’s just that we got used to it over decades.

It’s nice if operators have some internal consistency and/or resemblance to a familiar notation, but that is just one aspect. Brevity, clarity, etc. are also important.

Since Julia is not primarily a string-processing language, I am not sure we should introduce operators for a lot of string-related API. Sure, * is fine, and arguably ^ is useful too, but after that, diminishing returns kick in. Packages can define /, \, ++, // in their own namespace if they need them for DSLs, while Base and the standard libraries can just expose functions that have no operator equivalent.
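
A package that wants a dedicated concatenation operator can define one in its own namespace. In this sketch `ConcatOps` is a hypothetical module: `++` already parses as a binary operator in Julia, but Base gives it no methods, so defining it here creates a new function owned by the module rather than pirating methods on String or Vector.

```julia
# Hypothetical package-local concatenation operator.
module ConcatOps

export ++

# Strings concatenate into a string; vectors concatenate into a vector.
(++)(a::AbstractString, b::AbstractString) = string(a, b)
(++)(a::AbstractVector, b::AbstractVector) = vcat(a, b)

end

using .ConcatOps

"ab" ++ "cd"    # "abcd"
[1, 2] ++ [3]   # [1, 2, 3]
```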

I imagine you may well be thinking of the (dormant, but not forgotten/abandoned) thread Designing a Paths Julep, but in case you or any interested readers have forgotten, that’s exactly what I’m proposing there :slight_smile:

Another upside of this sort of approach is that based on whether a $string or $path is being interpolated, you can catch and handle unsafe input like:

# from reading config/user input/etc.
untrusted_file = "../../../../../../etc/passwd"
read_path = path"user_content/$untrusted_file"
                            # ^ interpolation error thrown
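
The proposed path"..." literal isn’t available today, but the check it would run on an interpolated value can be sketched as an ordinary function. `check_segment` and `safe_join` are hypothetical names, and the two rules used here (no absolute paths, no `..` segments) are an assumed minimal policy, not necessarily what the Julep specifies.

```julia
# Hypothetical check: reject anything that could escape the base directory.
function check_segment(part::AbstractString)
    isabspath(part) &&
        throw(ArgumentError("absolute path not allowed: $(repr(part))"))
    ".." in splitpath(part) &&
        throw(ArgumentError("parent directory reference not allowed: $(repr(part))"))
    return part
end

# Join only after every untrusted part has passed the check.
safe_join(base::AbstractString, parts::AbstractString...) =
    joinpath(base, map(check_segment, parts)...)

safe_join("user_content", "notes.txt")   # "user_content/notes.txt" on Unix
# safe_join("user_content", "../../../../../../etc/passwd") throws an ArgumentError
```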

Eh, IEEE floats may muddy the waters, but + is conventionally the Abelian group operator. That the convention isn’t perfectly adhered to is no reason to abandon it.

That said, I do see the confusion around *, and I think there’s a solid argument that, instead of choosing between punning on addition or multiplication, it would be sensible to dedicate an operator like ++ to concatenating container-like types (vectors, strings, etc.). Until Julia 2.0, though, we’ll just have to live with the status quo, and this niggle is fairly low on my list of breaking changes I wish we could make.
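
A quick REPL check of how closely Julia’s built-in + follows the group laws appealed to above: Int addition is commutative and associative (overflow wraps around), while Float64 addition is commutative but rounding breaks associativity, and NaN is not even equal to itself.

```julia
1 + 2 == 2 + 1                # true
typemax(Int) + 1              # typemin(Int): overflow wraps around
0.1 + 0.2 == 0.2 + 0.1        # true: Float64 `+` commutes
(0.1 + 0.2) + 0.3             # 0.6000000000000001
0.1 + (0.2 + 0.3)             # 0.6: but it does not associate
NaN == NaN                    # false
```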

I’m not sure we want that. Containers aren’t all sequences, so concatenation can work very differently. Multidimensional arrays have special array literal syntax backed by vect/hcat/vcat/hvcat, and that syntax doesn’t concatenate input strings (or other non-AbstractArrays) because we really want to write strings as separate elements. We often want to write AbstractArrays as separate elements, too, which requires seemingly superfluous brackets. As for sequences, concatenation may not even be useful enough to warrant implementation: [(1:2);(4:5)] cannot be another inlinable UnitRange, so for type stability [(1:2);(3:4)] should not be one either.
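
The behavior described for ranges and strings can be checked directly with Base’s array literals (Int is Int64 on a 64-bit machine).

```julia
r = [(1:2); (4:5)]        # [1, 2, 4, 5]
typeof(r)                 # Vector{Int}: not a UnitRange
s = [(1:2); (3:4)]        # [1, 2, 3, 4]: contiguous, yet still a Vector, not 1:4
typeof(s) == typeof(r)    # true: the result type depends only on the input types
["ab"; "cd"]              # 2-element Vector{String}: strings stay separate elements
```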

We also wouldn’t need to invent a new syntax for strings; we already have string interpolation. "$(a)b$(c)d" is just a more flexible alternative to a * "b" * c * "d", and it lowers directly to a string call instead of *.
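
The equivalence, and the extra flexibility, show up in a few lines: all three spellings produce the same string, and interpolation also accepts non-string values because it converts them with string, whereas * has no method for a String and an Int.

```julia
a, c = "x", "y"

"$(a)b$(c)d" == a * "b" * c * "d" == string(a, "b", c, "d")   # true

n = 3
"n = $(n)"     # "n = 3": interpolation calls `string`, so non-strings work
# "n = " * n   # MethodError: no `*` method for a String and an Int
```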