The most famous example of course is a logarithm
I hadnât expected length(::String)
to behave logarithmically but I guess maybe I should.
free group (as mentioned above) is literally an example⌠though because a free group has âinverse lettersâ the length of a*b may be less than length(a) + length(b)
We can make alternation work in practice by specializing the +
operator:
Boring code
using Base: PCRE, wrap_string
# copypasta from regex * method, regex.jl line 804
function Base.:+(r1::Union{Regex,AbstractString,AbstractChar}, rs::Union{Regex,AbstractString,AbstractChar}...)
mask = PCRE.CASELESS | PCRE.MULTILINE | PCRE.DOTALL | PCRE.EXTENDED # imsx
match_opts = nothing # all args must agree on this
compile_opts = nothing # all args must agree on this
shared = mask
for r in (r1, rs...)
r isa Regex || continue
if match_opts === nothing
match_opts = r.match_options
compile_opts = r.compile_options & ~mask
else
r.match_options == match_opts &&
r.compile_options & ~mask == compile_opts ||
throw(ArgumentError("cannot multiply regexes: incompatible options"))
end
shared &= r.compile_options
end
unshared = mask & ~shared
Regex(join((wrap_string(r1, unshared), wrap_string.(rs, Ref(unshared))...), "|"), compile_opts | shared, match_opts)
end
Base.:+(r::Regex) = r
julia> x, y = r"abc", r"def"
(r"abc", r"def")
julia> match(x+y, "abc"), match(x+y, "def")
(RegexMatch("abc"), RegexMatch("def"))
julia> x*(x+y) # r"abc(?:abc|def)", equivalent to r"abcabc|abcdef"
r"(?:abc)(?:(?:abc)|(?:def))"
Could be nice to get it into Base
. As you can see, concatenation is distributive over alternation.
The unfortunate thing about these implementations for concatenation and alternation is, itâs not very efficient: itâs basically just converting the regex into a string, doing string manipulation, and then recompiling a new regex from the string. Furthermore this canât be folded by the compiler. Probably in theory it could be much more efficient, but that might require deeper hooks into the PCRE library than we have (I have no clue). Maybe somebody will one day write a pure Julia PCRE library, in which case the sky would be the limit.
Also, Iâm not the happiest about this:
julia> x+y, y+x
(r"(?:abc)|(?:def)", r"(?:def)|(?:abc)")
julia> ans[1] == ans[2] # lies!
false
the two regular expressions above are equivalent, in that they describe the same regular language and the same set of finite automata, but the equality tests for the PCRE libraryâs regular expressions are not very sophisticated.
Furthermore, we donât currently have a way to express the Kleene star. We could possibly overload our only unary postfix operator Base.adjoint
, but that doesnât feel right. Perhaps one day the parser will be modified so that (x*)
will call Base.star(x)
âthen Kleene algebra will be fully supported IIUC. (oh how I wish our keyboards had â
though, to disambiguate multiplication from convolution and Kleene stars.)
Itâs definitely unintuitive, but to a compsci nerd it should be straightforward because it aligns with first-semester CS computability theory stuff.
I think the best way to grok it, is to think of the operatorsâ use in finite automata and regular expressions. This is a really good tutorial. And then, instead of thinking of *
as numerical multiplication, think of it more like logical multiplication âand,â &
or â§
, and instead of thinking of +
as numerical addition, think of it like logical addition âor,â |
or â¨
.
In other words, a finite state machine which accepts the regular language described by the regular expression r"abc" * r"def"
will only accept strings that have "abc"
and "def"
(in that order). And (if we have alternation implemented like Iâve done above) the FSM described by the regular expression r"abc" + r"def"
will accept strings that have "abc"
or "def"
.
Then take the same operator rules for regular expressions and apply them to the strings they process. Alternation doesnât have an analogous meaning for strings as for regular expressions, but concatenation does.
Itâs arguable we shouldâve used &
for concatenation and |
for alternation, but that ship has probably sailed. And Stephen Kleene isnât around, so we canât prod him to request the Kleene algebra to use these operators instead.
At the end of the day though, just like anything else, you get used to it.
+
and *
already have meaning in regex too, which makes it more confusing.
@uniment , if alternation is +
shouldnât it have an additive inverse and identity?
Yes, this is a point pretty strongly in favor of &
and |
for concatenation and alternation.
Or we could build a time machine and stop whoever decided to put asterisks on keyboards but not the multiplication dot.
The concatenation identity (empty string) is written with a 1, and the alternation identity (null string) is written with a 0. See algebraic properties here.
Because this algebra is a semiring, an additive inverse isnât necessary. I donât know who makes these rules.
Perhaps you are aware of this, but there is a nice package that may be of interest to you: GitHub - BioJulia/Automa.jl: A julia code generator for regular expressions
I wasnât aware. Interesting that they introduce &
as an intersection operator. They dropped exponentiation though, and donât have a Kleene star operation.
Pretty neat library. Iâll play around with it, thanks for bringing it up!
A small anecdote as to âwhat is naturalâ.
A frend of mine was a team leader in software company making accounting software or like that. She had difficulty to explain to her programmer that their customers DO NOT want to have paragraphs numbering starting from zero.
Maybe those customers should just learn about pointer offset.
I do agree with the point, butâŚ
What is intuition if not familiarity?
A bit of a philosophical question, I guess. But I think it includes predicting something new from something familiar. Perhaps applying familiar knowledge in news ways, or combining several familiar elements that are not normally used together. I definitely donât think itâs seeing the plain old familiar thing, there has to be something new that you recognize as âmaking senseâ.
Talking about personal tastes, I have no problem adapting to 0-based indexing in C and 1-based indexing in Julia, but Pythonâs semi-open range
always appears weird and unintuitive to me. And a Julia programmer learning Python will be shocked to find out that perfectly working scripts cannot run when pasted into the command-line REPL, due to indentation and blank linesâŚ
First language I learned was APL (traumatic ). I put parentesis EVERYWHERE I just canât type
return x
it has to be
return(x)
I will religiously put
return(nothing)
if there is nothing to return. I get comments from my colleagues all the time for my weird syntax. Too old to change.
Hehe. Not a fan of this one. Does it help if you consider the point that return
is a keyword, but you are using it with function call syntax, which can be confusing for the reader? I feel it obfuscates the meaning of the statement, giving the impression that it calls into some other function.
And do you do this with all keywords? Even function
, struct
, using
, module
, else
, export
?
Personally, I think explicit
return nothing
is a good thing (without the parens, though.)
oh dear lord, what fury hath thou wrought
julia> if(false) return(1) else(return(2)) end
2
Or even better
julia> if(false) return(1) else(return(2)) end()
2
I donât think this is correct for R:
> "a" + "b"
Error in "a" + "b" : non-numeric argument to binary operator
> paste0("a","b")
[1] "ab"
good catch, sorry. I verified some of the weird ones but I had assumed all the +
were correct.