Why are there all these strange stumbling blocks in Julia?

For the same reason that these wats don’t break regular Python code. Most are just weird edge cases that you pretty much never see in the wild.

Edit: I did not want to dismiss the possibility that there are bugs happening because of these behaviors. But it’s not worse than for any other language IMHO. To elaborate, from the list linked by Tim Holy I consider only one entry (precedence of the range operator) a common footgun & two more entries as possible candidates for bugs that may appear in the wild


(a typo changing _ to - and accidentally shadowing Base.:-, and maybe ambiguous float literal juxtaposition, if somebody manages to combine questionable variable naming, questionable whitespace placement and complete unawareness of scientific notation)

As always, it’s good to be aware of the edge cases of any language you work with.

5 Likes

The parsing precedence of : and of &/== causes bugs all the time. Others like fld/div mentioned above may cause bugs as well.
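To make these cases concrete, here is a small sketch with made-up values. It relies on `:` binding more loosely than arithmetic and on `&` binding more tightly than `==`, so the unparenthesized lines parse differently from how they read at a glance.

```julia
# Range operator: `:` has lower precedence than `+` / `.+`
a = collect(1:3 .+ 1)      # parsed as 1:(3 .+ 1)
b = collect((1:3) .+ 1)    # what was probably intended

# Bitwise `&` has multiplication-level precedence, higher than `==`
x, y = 1, 2
c = x == 1 & y == 2        # parsed as the chained comparison x == (1 & y) == 2
d = (x == 1) & (y == 2)    # explicit grouping

# div truncates toward zero, fld rounds toward negative infinity
e = div(-7, 2)
f = fld(-7, 2)

println((a, b, c, d, e, f))
```

Here `a` has four elements while `b` has three, `c` is false while `d` is true, and `e` and `f` differ by one.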

I tend to be very liberal in my use of parentheses for this reason. I prefer being explicit.

8 Likes

The answer by @DNF is straightforward: every language makes its own choices and doesn’t have to be a copy of any other language. However, I find two valid points in the OP:

I find this truly annoying. I don’t mean parsing specifically, I mean something like truncated integer division. In C and Fortran, I could write, for example, int x = 5.2 / 2 to get x = 2. To do the same thing in Julia, I have to write x = trunc(Int, 5.2 / 2); there is no straightforward function/symbol for truncated integer conversion, and Int(2.6) doesn’t help here. In C and Fortran you get it implicitly for free, based on the types of the variables.
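For reference, a minimal sketch of the options being compared here (variable names are only illustrative): `trunc(Int, …)` truncates and converts in one step, `÷` truncates but keeps the floating-point type, and `Int(…)` throws rather than silently dropping the fractional part.

```julia
x = trunc(Int, 5.2 / 2)    # truncate toward zero and convert to Int
q = 5.2 ÷ 2                # div: truncated quotient, but still a Float64

# Int(2.6) refuses to silently drop the fractional part
threw = try
    Int(2.6)
    false
catch err
    err isa InexactError
end

println((x, q, threw))
```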

Again, this is also a valid point. Using * for concatenating two strings is completely unintuitive, regardless of “commutativity” or any other algebraic property. No one (I assume) would think of stacking two strings beside each other using * instead of +. Simplicity matters, but Julia often chooses pickiness over simplicity.

3 Likes

This is only about familiarity, not intuition. Intuitively, concatenation is closer to multiplication than to addition, in mathematics that is even how you normally write multiplication: by concatenating symbols. xy is the product of x and y.

21 Likes

I think this is an “abstract algebra” vs. colloquial usage issue.

For non-mathematicians “x” concatenated with “x” gives “xx” and it’s natural to think of this as “two x’s”, hence + feels right.

In abstract algebra * is just an operation that satisfies different properties (I think “free group” and “word” are relevant terms). Since concatenation satisfies these properties it makes sense to use *.

Since Julia was designed with scientific computing (or numerical mathematics) in mind it makes a lot of sense they went with the mathematical version.
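A short sketch of the properties that make * a plausible fit for strings: the operation is not commutative, it has the empty string as its identity, and repeated concatenation is spelled as a power. The variable names are just for illustration.

```julia
s, t = "abc", "def"

concat   = s * t           # "abcdef"
swapped  = t * s           # "defabc", so order matters (not commutative)
repeated = s^3             # "abcabcabc", repetition as exponentiation
unit     = one(String)     # multiplicative identity for strings: ""

identity_ok = (s * unit == s) && (unit * s == s)

println((concat, swapped, repeated, identity_ok))
```

Non-commutativity is one of the usual arguments for *: addition is conventionally commutative, while multiplication (of matrices, say) need not be.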

11 Likes

Isn’t it unusual for a * to have measure(s * t) == measure(s) + measure(t), where measure is length?

The most famous example, of course, is a logarithm.

I hadn’t expected length(::String) to behave logarithmically but I guess maybe I should.
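The analogy can be checked directly. This is only a sketch with ASCII strings and arbitrary numbers: length turns * into + and ^ into multiplication, the same way a logarithm does for numbers.

```julia
s, t = "abc", "de"

# length maps concatenation to addition and repetition to multiplication
additive = length(s * t) == length(s) + length(t)
scaled   = length(s^4) == 4 * length(s)

# the logarithm does the same for numeric multiplication and powers
a, b = 8.0, 32.0
log_additive = log2(a * b) ≈ log2(a) + log2(b)
log_scaled   = log2(a^4) ≈ 4 * log2(a)

println((additive, scaled, log_additive, log_scaled))
```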

4 Likes

A free group (as mentioned above) is literally an example… though because a free group has “inverse letters”, the length of a*b may be less than length(a) + length(b).

We can make alternation work in practice by specializing the + operator:

Boring code
using Base: PCRE, wrap_string

# copypasta from regex * method, regex.jl line 804

function Base.:+(r1::Union{Regex,AbstractString,AbstractChar}, rs::Union{Regex,AbstractString,AbstractChar}...)
    mask = PCRE.CASELESS | PCRE.MULTILINE | PCRE.DOTALL | PCRE.EXTENDED # imsx
    match_opts   = nothing # all args must agree on this
    compile_opts = nothing # all args must agree on this
    shared = mask
    for r in (r1, rs...)
        r isa Regex || continue
        if match_opts === nothing
            match_opts = r.match_options
            compile_opts = r.compile_options & ~mask
        else
            r.match_options == match_opts &&
                r.compile_options & ~mask == compile_opts ||
                throw(ArgumentError("cannot alternate regexes: incompatible options"))
        end
        shared &= r.compile_options
    end
    unshared = mask & ~shared
    Regex(join((wrap_string(r1, unshared), wrap_string.(rs, Ref(unshared))...), "|"), compile_opts | shared, match_opts)
end

Base.:+(r::Regex) = r

julia> x, y = r"abc", r"def"
(r"abc", r"def")

julia> match(x+y, "abc"), match(x+y, "def")
(RegexMatch("abc"), RegexMatch("def"))

julia> x*(x+y)  # r"abc(?:abc|def)", equivalent to r"abcabc|abcdef"
r"(?:abc)(?:(?:abc)|(?:def))"

Could be nice to get it into Base. As you can see, concatenation is distributive over alternation.

The unfortunate thing about these implementations of concatenation and alternation is that they’re not very efficient: they basically just convert the regex into a string, do string manipulation, and then recompile a new regex from the string. Furthermore, this can’t be folded by the compiler. In theory it could probably be much more efficient, but that might require deeper hooks into the PCRE library than we have (I have no clue). Maybe somebody will one day write a pure Julia PCRE library, in which case the sky would be the limit.

Also, I’m not the happiest about this:

julia> x+y, y+x
(r"(?:abc)|(?:def)", r"(?:def)|(?:abc)")

julia> ans[1] == ans[2] # lies!
false

The two regular expressions above are equivalent, in that they describe the same regular language and the same set of finite automata, but the equality test for Regex objects is not very sophisticated.
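The same effect can be seen without the + method by writing the two patterns as literals. This sketch only spot-checks a handful of made-up inputs, so it illustrates the point rather than proving equivalence.

```julia
r1 = r"(?:abc)|(?:def)"
r2 = r"(?:def)|(?:abc)"

# Both patterns agree on every sample input...
inputs = ["abc", "def", "xabcx", "ghi", ""]
same_behavior = all(occursin(r1, s) == occursin(r2, s) for s in inputs)

# ...but == does not recognize them as the same regular language
structurally_equal = (r1 == r2)

println((same_behavior, structurally_equal))
```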

Furthermore, we don’t currently have a way to express the Kleene star. We could possibly overload our only unary postfix operator, Base.adjoint, but that doesn’t feel right. Perhaps one day the parser will be modified so that (x*) will call Base.star(x); then Kleene algebra will be fully supported, IIUC. (Oh how I wish our keyboards had a multiplication dot (⋅) though, to disambiguate multiplication from convolution and Kleene stars.)

It’s definitely unintuitive, but to a compsci nerd it should be straightforward because it aligns with first-semester CS computability theory stuff.

I think the best way to grok it is to think of the operators’ use in finite automata and regular expressions. This is a really good tutorial. And then, instead of thinking of * as numerical multiplication, think of it more like logical multiplication (“and”, & or ∧), and instead of thinking of + as numerical addition, think of it like logical addition (“or”, | or ∨).

In other words, a finite state machine which accepts the regular language described by the regular expression r"abc" * r"def" will only accept strings that have "abc" and "def" (in that order). And (if we have alternation implemented like I’ve done above) the FSM described by the regular expression r"abc" + r"def" will accept strings that have "abc" or "def".
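A minimal sketch of that reading, assuming a Julia version where * is defined for Regex. Alternation is written directly in regex syntax here, so the example does not depend on any + method.

```julia
both   = r"abc" * r"def"   # concatenation: "abc" and then "def"
either = r"abc|def"        # alternation: "abc" or "def"

both_hits   = [occursin(both, s)   for s in ["abcdef", "abc", "defabc"]]
either_hits = [occursin(either, s) for s in ["abc", "def", "xyz"]]

# The same operator applied to the strings themselves
joined = "abc" * "def"

println((both_hits, either_hits, joined))
```

Only "abcdef" satisfies the concatenated pattern, while either "abc" or "def" satisfies the alternation.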

Then take the same operator rules for regular expressions and apply them to the strings they process. Alternation doesn’t have an analogous meaning for strings as for regular expressions, but concatenation does.

It’s arguable we should’ve used & for concatenation and | for alternation, but that ship has probably sailed. And Stephen Kleene isn’t around, so we can’t prod him to request the Kleene algebra to use these operators instead.

At the end of the day though, just like anything else, you get used to it.

2 Likes

+ and * already have meaning in regex too, which makes it more confusing.

@uniment, if alternation is +, shouldn’t it have an additive inverse and identity?

2 Likes

Yes, this is a point pretty strongly in favor of & and | for concatenation and alternation.

Or we could build a time machine and stop whoever decided to put asterisks on keyboards but not the multiplication dot.

3 Likes

The concatenation identity (the empty string) is written with a 1, and the alternation identity (the empty language, which matches nothing) is written with a 0. See algebraic properties here.

Because this algebra is a semiring, an additive inverse isn’t necessary. I don’t know who makes these rules. :sweat_smile:
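As a rough sketch of those two identities in regex terms: the empty pattern plays the role of 1, and a pattern that can never match plays the role of 0. The choice of `(?!)` (an empty negative lookahead) as the never-matching pattern is my assumption for illustration, and the inputs are made up.

```julia
x       = r"abc"
one_re  = r""       # "1": matches the empty string, identity for concatenation
zero_re = r"(?!)"   # "0": never matches, identity for alternation (assumed idiom)

x_times_one  = x * one_re                  # behaves like x
x_plus_zero  = Regex("(?:abc)|(?:(?!))")   # x | 0, written out by hand
x_times_zero = x * zero_re                 # 0 annihilates under concatenation

inputs = ["abc", "xabc", "ab", ""]
same(r) = all(occursin(x, s) == occursin(r, s) for s in inputs)

mul_identity_ok = same(x_times_one)
add_identity_ok = same(x_plus_zero)
annihilates     = !any(occursin(x_times_zero, s) for s in inputs)

println((mul_identity_ok, add_identity_ok, annihilates))
```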

4 Likes

Perhaps you are aware of this, but there is a nice package that may be of interest to you: GitHub - BioJulia/Automa.jl: A julia code generator for regular expressions

2 Likes

I wasn’t aware. Interesting that they introduce & as an intersection operator. They dropped exponentiation though, and don’t have a Kleene star operation.

Pretty neat library. I’ll play around with it, thanks for bringing it up!

A small anecdote as to “what is natural”.

A friend of mine was a team leader at a software company making accounting software or something like that. She had difficulty explaining to her programmer that their customers DO NOT want paragraph numbering to start from zero.

21 Likes

Maybe those customers should just learn about pointer offset. :slight_smile:

2 Likes

I do agree with the point, but…
What is intuition if not familiarity?

A bit of a philosophical question, I guess. But I think it includes predicting something new from something familiar. Perhaps applying familiar knowledge in new ways, or combining several familiar elements that are not normally used together. I definitely don’t think it’s seeing the plain old familiar thing; there has to be something new that you recognize as ‘making sense’.

1 Like