I wondered why you can’t search for a substring with “in”, like “za” in “pizza”.
While it would be nice, and would match Python, after searching I can also understand the distinction: “in” searches for an element, not a sequence of elements, and it should stay consistent with how it is used with arrays.
Though you can make it work now by doing: Base.in( pattern::AbstractString, src::AbstractString ) = occursin( pattern, src)
What I wonder is: couldn’t the already existing “contains” function be defined as an infix operator? It seems like this wouldn’t break anything.
So, the currently working: if contains( “pizza”, “za” ) …
It is not a good idea to redefine Base.in for AbstractString. Since it is existing Base code, you could break other code in Base that is built upon the old definition.
Yeah, defining it was just a test; I mostly process numbers, not strings.
I wasn’t asking about user-defined custom operators, but rather proposing specifically “contains” as one.
Part of the reason that thread died down is that Julia nowadays has so many infix operators to choose from (infinitely many, in fact), thanks to Unicode + suffixing of existing operators. e.g.
julia> ∈ₛ(a,b) = occursin(a,b)
∈ₛ (generic function with 1 method)
julia> "za" ∈ₛ "pizza"
true
You can even define an operator ∈ᵒᶜᶜᵘʳˢⁱⁿ, in fact.
It’s a lot easier for the parser and you this way. contains seems unambiguous, but without the method call’s mandatory parentheses, you have to specify its precedence in the massive hierarchy of operators so something like "pizza" contains "iz" * "za" works. Then you have to communicate an unfamiliar operator and its precedence to everyone you share your code with.
Operator suffixing preserves familiarity. ∈ᵒᶜᶜᵘʳˢⁱⁿ looks like an ∈ and inherits its characteristics like precedence and associativity. in is not exactly like ∈ in this way, though: you can’t suffix it to make a derived operator, and you can’t dot-broadcast it either.
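To make the precedence point concrete, here is a small sketch that assumes the ∈ₛ definition from the earlier post (the name is only that poster's illustration, nothing in Base). Since a suffixed operator parses like its base operator, the * on the right-hand side should bind tighter and no extra parentheses are needed:

```julia
# Hypothetical operator from earlier in the thread: a suffixed ∈,
# so it is parsed with the same precedence as ∈.
∈ₛ(needle, haystack) = occursin(needle, haystack)

# `*` (string concatenation) binds tighter than ∈-like operators,
# so this parses as "za" ∈ₛ ("pi" * "zza").
found = "za" ∈ₛ "pi" * "zza"

# The same call written in ordinary prefix form, for comparison.
same = ∈ₛ("za", "pi" * "zza")

println(found, " ", same)
```

A new keyword-style operator such as an infix contains would need its own slot in that precedence table, which is the extra work described above.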
This query comes from a language discussion about the value of syntactic sugar and obviousness for readability and learning, and how Python does that well for basic things, with substring search as an example.
A problem with both “occursin” and “contains” is some mental ambiguity about the argument order (they take their arguments in opposite orders). In a language with “.method()” syntax like C++, there’s no ambiguity as you would use a syntax like
if ( something.contains( pattern ))
which matches English grammar. That’s why the infix operator version used in Python is also clear, and would be in Julia.
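For reference, a minimal sketch of the two argument orders being compared here. The variable names are only illustrative, and the two-argument contains assumes Julia 1.5 or later:

```julia
haystack = "pizza"
needle = "za"

# occursin takes the pattern first, then the string being searched.
a = occursin(needle, haystack)

# contains takes the string being searched first, then the pattern.
b = contains(haystack, needle)

# Swapping the arguments silently asks a different question:
# is "pizza" a substring of "za"?
c = contains(needle, haystack)

println(a, " ", b, " ", c)
```

Nothing warns you when the arguments are swapped, since both are strings, which is where the mental ambiguity comes from.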
The fact that someone defined Base.in(str,str) as a function which throws an exception telling you to use “occursin” demonstrates how intuitive it is as an infix operator to people.
While the unicode ones are interesting, I’d hesitate to advocate for Julia over Python by saying “the simplest way to check for a substring is to write your own personal unicode-named function (that only you will know, and won’t appear in any sample code) and call that”.
No… In English you say “the oven contains the pizza”, not “contains the pizza, the oven” :).
Not trying to start a fight, this example was just an observation that came up compared to Python, when discussing computer languages for a technical game artist to learn for data processing.
The goal of any programming language shouldn’t be “it literally reads like correct, complete English sentences”; the Python example should be plenty clear.
While "piz" in "pizza" throws an error, 'z' in "pizza" works just fine. Julia treats Strings as a sequence of Chars, and in is consistently used to check if an element is present in a collection.
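A quick sketch of that difference, as I understand the current behavior in Julia 1.x (the variable names are just for illustration):

```julia
# A Char is an element of a String, so `in` works as for any collection.
has_char = 'z' in "pizza"

# A String is not an element of a String; Base defines this method to throw
# an error that points you at occursin.
threw = try
    "piz" in "pizza"
    false
catch
    true
end

# Substring search goes through occursin instead.
has_sub = occursin("piz", "pizza")

println(has_char, " ", threw, " ", has_sub)
```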
Python’s in actually treats strings as special. (1,2) in (1,2,3) seems just as natural as "piz" in "pizza", but in only searches for substrings, not subsequences in general. I think the reason Julia’s in(::String, ::String) throws an error is more because many new users had used Python and expect this special treatment of strings. Habits are often mistaken for intuition.
This of course is not relevant to the merits of making occursin or contains infix operators, it’s just a comment on the error for in(::String, ::String).
It’s not hard to sympathize with the idea that code is more readable if it resembles normal speech. After all, that is why people use pseudocode. However, I don’t think special-casing contains in the parser is a good solution, syntax-wise. It will just cause confusion as to why contains is infix, when occursin isn’t. And then startswith, endswith etc…
I don’t know why in is special-cased - perhaps because it’s used in for-loop syntax, so it needs to be special-cased anyway.
The solution to this probably isn’t to figure out how to hack Julia with clever mechanisms to allow infix operations. The simplest and best solution here is just to not have it be infix and live with the fact that here, the syntax doesn’t match English.
I think you should just ignore occursin, and use contains. If I don’t misremember, the latter simply replaces the former, and occursin is mostly around for backwards compatibility.
I’m not so sure in should work on substrings. I would rather see a dedicated operator for this.
Oh, BTW, this post: Please read: make it easier to help you has some good tips on writing posts. In particular, it shows you how to include formatted and inline code, so you don’t have to use e.g. bold.
No, that’s not true. The main reason contains was introduced was because the curried version contains(needle) was deemed useful. There is no intention to remove occursin any time soon.
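A small sketch of what the curried form buys you, assuming Julia 1.5 or later (the word list is made up for illustration):

```julia
words = ["pizza", "pasta", "calzone"]

# contains("za") returns a one-argument function,
# equivalent to s -> contains(s, "za").
has_za = contains("za")

# That makes it convenient to pass to higher-order functions such as filter.
with_za = filter(has_za, words)

println(with_za)
```

The haystack-first argument order is what makes this partial application read naturally, which fits the explanation above.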
This custom-operator infix notation doesn’t work for me, using Julia 1.7.0 on both Windows and Linux. (Regular invocation using parentheses works, however.) Wonder why . . .
update 1
Actually, it does work, if I copy-paste @stevengj’s code-snippet above. But for some reason when I type it into the REPL, I get an error:
ERROR: syntax: extra token """ after end of expression
Stacktrace:
[1] top-level scope
@ none:1
I don’t get this error when I type directly using prefix notation: