Why are there all these strange stumbling blocks in Julia?

The most famous example of course is a logarithm

I hadn’t expected length(::String) to behave logarithmically but I guess maybe I should.

4 Likes

free group (as mentioned above) is literally an example… though because a free group has “inverse letters” the length of a*b may be less than length(a) + length(b)

We can make alternation work in practice by specializing the + operator:

Boring code
using Base: PCRE, wrap_string

# copypasta from regex * method, regex.jl line 804

function Base.:+(r1::Union{Regex,AbstractString,AbstractChar}, rs::Union{Regex,AbstractString,AbstractChar}...)
    mask = PCRE.CASELESS | PCRE.MULTILINE | PCRE.DOTALL | PCRE.EXTENDED # imsx
    match_opts   = nothing # all args must agree on this
    compile_opts = nothing # all args must agree on this
    shared = mask
    for r in (r1, rs...)
        r isa Regex || continue
        if match_opts === nothing
            match_opts = r.match_options
            compile_opts = r.compile_options & ~mask
        else
            r.match_options == match_opts &&
                r.compile_options & ~mask == compile_opts ||
                throw(ArgumentError("cannot multiply regexes: incompatible options"))
        end
        shared &= r.compile_options
    end
    unshared = mask & ~shared
    Regex(join((wrap_string(r1, unshared), wrap_string.(rs, Ref(unshared))...), "|"), compile_opts | shared, match_opts)
end

Base.:+(r::Regex) = r 
julia> x, y = r"abc", r"def"
(r"abc", r"def")

julia> match(x+y, "abc"), match(x+y, "def")
(RegexMatch("abc"), RegexMatch("def"))

julia> x*(x+y)  # r"abc(?:abc|def)", equivalent to r"abcabc|abcdef"
r"(?:abc)(?:(?:abc)|(?:def))"

Could be nice to get it into Base. As you can see, concatenation is distributive over alternation.

The unfortunate thing about these implementations for concatenation and alternation is, it’s not very efficient: it’s basically just converting the regex into a string, doing string manipulation, and then recompiling a new regex from the string. Furthermore this can’t be folded by the compiler. Probably in theory it could be much more efficient, but that might require deeper hooks into the PCRE library than we have (I have no clue). Maybe somebody will one day write a pure Julia PCRE library, in which case the sky would be the limit.

Also, I’m not the happiest about this:

julia> x+y, y+x
(r"(?:abc)|(?:def)", r"(?:def)|(?:abc)")

julia> ans[1] == ans[2] # lies!
false

the two regular expressions above are equivalent, in that they describe the same regular language and the same set of finite automata, but the equality tests for the PCRE library’s regular expressions are not very sophisticated.

Furthermore, we don’t currently have a way to express the Kleene star. We could possibly overload our only unary postfix operator Base.adjoint, but that doesn’t feel right. Perhaps one day the parser will be modified so that (x*) will call Base.star(x)—then Kleene algebra will be fully supported IIUC. (oh how I wish our keyboards had though, to disambiguate multiplication from convolution and Kleene stars.)

It’s definitely unintuitive, but to a compsci nerd it should be straightforward because it aligns with first-semester CS computability theory stuff.

I think the best way to grok it, is to think of the operators’ use in finite automata and regular expressions. This is a really good tutorial. And then, instead of thinking of * as numerical multiplication, think of it more like logical multiplication “and,” & or , and instead of thinking of + as numerical addition, think of it like logical addition “or,” | or .

In other words, a finite state machine which accepts the regular language described by the regular expression r"abc" * r"def" will only accept strings that have "abc" and "def" (in that order). And (if we have alternation implemented like I’ve done above) the FSM described by the regular expression r"abc" + r"def" will accept strings that have "abc" or "def".

Then take the same operator rules for regular expressions and apply them to the strings they process. Alternation doesn’t have an analogous meaning for strings as for regular expressions, but concatenation does.

It’s arguable we should’ve used & for concatenation and | for alternation, but that ship has probably sailed. And Stephen Kleene isn’t around, so we can’t prod him to request the Kleene algebra to use these operators instead.

At the end of the day though, just like anything else, you get used to it.

2 Likes

+ and * already have meaning in regex too, which makes it more confusing.

@uniment , if alternation is + shouldn’t it have an additive inverse and identity?

2 Likes

Yes, this is a point pretty strongly in favor of & and | for concatenation and alternation.

Or we could build a time machine and stop whoever decided to put asterisks on keyboards but not the multiplication dot.

3 Likes

The concatenation identity (empty string) is written with a 1, and the alternation identity (null string) is written with a 0. See algebraic properties here.

Because this algebra is a semiring, an additive inverse isn’t necessary. I don’t know who makes these rules. :sweat_smile:

4 Likes

Perhaps you are aware of this, but there is a nice package that may be of interest to you: GitHub - BioJulia/Automa.jl: A julia code generator for regular expressions

2 Likes

I wasn’t aware. Interesting that they introduce & as an intersection operator. They dropped exponentiation though, and don’t have a Kleene star operation.

Pretty neat library. I’ll play around with it, thanks for bringing it up!

A small anecdote as to “what is natural”.

A frend of mine was a team leader in software company making accounting software or like that. She had difficulty to explain to her programmer that their customers DO NOT want to have paragraphs numbering starting from zero.

21 Likes

Maybe those customers should just learn about pointer offset. :slight_smile:

2 Likes

I do agree with the point, but…
What is intuition if not familiarity?

A bit of a philosophical question, I guess. But I think it includes predicting something new from something familiar. Perhaps applying familiar knowledge in news ways, or combining several familiar elements that are not normally used together. I definitely don’t think it’s seeing the plain old familiar thing, there has to be something new that you recognize as ‘making sense’.

1 Like

Talking about personal tastes, I have no problem adapting to 0-based indexing in C and 1-based indexing in Julia, but Python’s semi-open range always appears weird and unintuitive to me. And a Julia programmer learning Python will be shocked to find out that perfectly working scripts cannot run when pasted into the command-line REPL, due to indentation and blank lines…

9 Likes

First language I learned was APL (traumatic :grimacing:). I put parentesis EVERYWHERE I just can’t type
return x
it has to be
return(x)

I will religiously put
return(nothing)
if there is nothing to return. I get comments from my colleagues all the time for my weird syntax. Too old to change.

5 Likes

Hehe. Not a fan of this one. Does it help if you consider the point that return is a keyword, but you are using it with function call syntax, which can be confusing for the reader? I feel it obfuscates the meaning of the statement, giving the impression that it calls into some other function.

And do you do this with all keywords? Even function, struct, using, module, else, export?

Personally, I think explicit

return nothing

is a good thing (without the parens, though.)

5 Likes

oh dear lord, what fury hath thou wrought

julia> if(false) return(1) else(return(2)) end
2
3 Likes

Or even better

julia> if(false) return(1) else(return(2)) end()
2
4 Likes

I don’t think this is correct for R:

> "a" + "b"
Error in "a" + "b" : non-numeric argument to binary operator
> paste0("a","b")
[1] "ab"
3 Likes

good catch, sorry. I verified some of the weird ones but I had assumed all the + were correct.