"contains" as operator?

I wondered why you can’t search for a substring with “in”, like
"za" in "pizza"

While it would be nice, and would match Python, after searching I can also understand the distinction: “in” searches for an element, not a sequence of elements, and it should stay consistent with its usage with arrays.

Though you can make it work now by doing:
Base.in( pattern::AbstractString, src::AbstractString ) = occursin( pattern, src)

What I wonder is: couldn’t the already existing “contains” function be defined as an infix operator? It seems like this wouldn’t break anything.

So, the currently working:
if contains( "pizza", "za" ) …

could also be written as

if "pizza" contains "za"


It is not a good idea to redefine Base.in for AbstractString. Since it is existing Base code, you could break other code in Base that is built upon the old definition.

Regarding arbitrary infix operators, there is this long thread (without any explicit conclusion, AFAIK): https://github.com/JuliaLang/julia/issues/16985


yeah defining it was just a test, I mostly process numbers, not strings.
I wasn’t asking about user-defined custom operators, but rather proposing specifically “contains” as one.


The closest we get is piping the pizzas:

"za" |> occursin("pizza")
"pizza" |> contains("za")

Part of the reason that thread died down is that Julia nowadays has so many infix operators to choose from (infinitely many, in fact), thanks to Unicode + suffixing of existing operators. e.g.

julia> ∈ₛ(a,b) = occursin(a,b)
∈ₛ (generic function with 1 method)

julia> "za" ∈ₛ "pizza"
true

You can even define an operator ∈ᵒᶜᶜᵘʳˢⁱⁿ, in fact.


It’s a lot easier for the parser and for you this way. contains seems unambiguous, but without the method call’s mandatory parentheses, you have to specify its precedence in the massive hierarchy of operators so that something like "pizza" contains "iz" * "za" works. Then you have to communicate an unfamiliar operator and its precedence to everyone you share your code with.

Operator suffixing preserves familiarity. ∈ᵒᶜᶜᵘʳˢⁱⁿ looks like an ∈ and inherits its characteristics, like precedence and associativity. in is not exactly like ∈ in this way, though: you can’t suffix it to make a derived operator. You can’t dot-broadcast it, either.
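
A minimal sketch of the precedence point, assuming the ∈ₛ definition from the earlier post (that name is the poster's own choice, not something in Base). Meta.parse is used only to show how the parser groups the expression; the suffixed operator appears to need no extra precedence rules because it sits at the same level as ∈.

```julia
# ∈ₛ is a user-defined name borrowed from the earlier post, not a Base operator.
∈ₛ(needle, haystack) = occursin(needle, haystack)

# Inspect how the parser groups the expression without evaluating it.
ex = Meta.parse("\"iz\" * \"za\" ∈ₛ \"pizza\"")

# The outermost call is ∈ₛ, so the concatenation is grouped first.
outer_is_suffixed_in = ex.args[1] == :∈ₛ
left_is_concatenation = ex.args[2] == :("iz" * "za")

# Evaluating the same expression searches for "izza" in "pizza".
found = "iz" * "za" ∈ₛ "pizza"

# The plain ∈ can be dot-broadcast; Ref keeps the string from being iterated.
flags = ['z', 'q'] .∈ Ref("pizza")
```

The word form in has no dotted or suffixed spelling, which is why the symbol form is used in the last line.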

This query comes from a language discussion, about the value of syntactic sugar and obviousness for readability+learning, and how python does that well for basic things, with substring search as an example.

A problem with both “occursin” and “contains” is some mental ambiguity about the argument order (they are opposites). In a language with “.method()” syntax like C++, there’s no ambiguity, as you would use a syntax like

if ( something.contains( pattern ))

which matches English grammar. That’s why the infix operator version used in python is also clear, and would be in Julia.

The fact that someone defined Base.in(str,str) as a function which throws an exception telling you to use “occursin” demonstrates how intuitive it is as an infix operator to people :)

While the Unicode ones are interesting, I’d hesitate to advocate for Julia over Python by saying “the simplest way to check for a substring is to write your own personal unicode-named function (that only you will know, and won’t appear in any sample code) and call that”


Well, that is just one piece of syntactic sugar where Julia loses (and there are probably others). In others, Julia wins, e.g.:

>>> f(x) = x + 1
  File "<stdin>", line 1
SyntaxError: cannot assign to function call

or, not being a python user, this:

>>> def f(x) :
...     x + 1
... 
>>> a = f(1)
>>> print(a)
None

One gets used to the language, and both have pretty natural syntax, each with its quirks.

I like Julia syntax in general better than python’s, but I don’t think that is the greatest selling point in comparison to it.


Approach 1:

julia> struct Contained end;

julia> contained = Contained();

julia> Base.in(::Contained, haystack) = occursin(haystack);

julia> →(x, f) = f(x);

julia> "za" → contained in "pizza"
true

Approach 2:

julia> macro the_pattern_str(str, _)
           :(ThePattern($str))
       end;

julia> struct ThePattern{T}
           value::T
       end

julia> Base.in(p::ThePattern, haystack) = occursin(p.value, haystack);

julia> the_pattern"za"is_contained in "pizza"
true

but that’s exactly how Julia does it…

occursin(a,b) <-> "a occurs in b"
contains(a,b) <-> "a contains b"
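
A small sketch of the two argument orders, in case it helps to see them side by side. The variable names haystack and needle are only illustrative, and contains assumes Julia 1.5 or later.

```julia
haystack, needle = "pizza", "za"

# occursin takes the needle first: "za occurs in pizza".
a = occursin(needle, haystack)

# contains takes the haystack first: "pizza contains za".
b = contains(haystack, needle)

# Swapping the arguments is not an error; it quietly asks the opposite question.
c = contains(needle, haystack)
```

The last line is where the ambiguity discussed in this thread tends to bite: both strings are valid arguments in either position, so a swapped call fails silently rather than loudly.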

it’s silly to chase some hypothetical pure, perfect infix order/syntax for something, because then python would have:

1.+(2)

instead of 1 + 2.

also the

string.find("substring")

can’t be read “like english” either, maybe it should be find(str1) in str2? /s


No… In English you say “the oven contains the pizza”, not “contains the pizza, the oven” :).

Not trying to start a fight; this example was just an observation that came up in comparison to Python, when discussing computer languages for a technical game artist to learn for data processing.


the goal of any programming language shouldn’t be “it literally reads like correct, complete English sentences”, the python example should be plenty clear.


Straw, meet man. I’m done


While "piz" in "pizza" throws an error, 'z' in "pizza" works just fine. Julia treats Strings as a sequence of Chars, and in is consistently used to check if an element is present in a collection.

Python’s in actually treats strings as special. (1,2) in (1,2,3) seems just as natural as "piz" in "pizza", but in only searches for substrings, not subsequences in general. I think the reason Julia’s in(::String, ::String) throws an error is more because many new users had used Python and expect this special treatment of strings. Habits are often mistaken for intuition.

This of course is not relevant to the merits of making occursin or contains infix operators, it’s just a comment on the error for in(::String, ::String).
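
The element-versus-subsequence distinction can be sketched roughly as follows. The variable names are illustrative only, and the exact exception type and message are deliberately not checked here.

```julia
# A Char is an element of a String, so `in` works as it does for any collection.
char_found = 'z' in "pizza"

# A String is not an element of a String; Base defines this method to throw.
string_in_threw = try
    "piz" in "pizza"
    false
catch
    true
end

# Tuples get no special treatment either: (1, 2) is not an element of (1, 2, 3).
tuple_found = (1, 2) in (1, 2, 3)

# Substring search goes through occursin instead.
substring_found = occursin("piz", "pizza")
```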


It’s not hard to sympathize with the idea that code is more readable if it resembles normal speech. After all, that is why people use pseudocode. However, I don’t think special-casing contains in the parser is a good solution, syntax-wise. It will just cause confusion as to why contains is infix, when occursin isn’t. And then startswith, endswith etc…

I don’t know why in is special-cased - perhaps because it’s used in for-loop syntax, so it needs to be special-cased anyway.

The solution to this probably isn’t to figure out how to hack Julia with clever mechanisms to allow infix operations. The simplest and best solution here is just to not have it be infix and live with the fact that here, the syntax doesn’t match English.


Another good solution IMO would be to find a unicode operator for contains or occursin that makes sense mathematically and add it to the language.

I think you should just ignore occursin, and use contains. If I don’t misremember, the latter simply replaces the former, and occursin is mostly around for backwards compatibility.

I’m not so sure in should work on substrings. I would rather see a dedicated operator for this.

Oh, BTW, this post: Please read: make it easier to help you has some good tips on writing posts. In particular, it shows you how to include formatted and inline code, so you don’t have to use e.g. bold.


No, that’s not true. The main reason contains was introduced was because the curried version contains(needle) was deemed useful. There is no intention to remove occursin any time soon.
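
As a rough illustration of why the curried form is handy (assuming Julia 1.5 or later, where contains(needle) is available; the word list and variable names are made up for the example):

```julia
# contains(needle) returns a one-argument function that tests a haystack.
has_za = contains("za")

words = ["pizza", "pasta", "plaza"]

# The curried form slots straight into higher-order functions.
matches = filter(has_za, words)

# It is also what makes the piping form from earlier in the thread work.
piped = "pizza" |> contains("za")

# occursin has a curried form too, fixing the haystack instead of the needle.
in_pizza = occursin("pizza")
piped_occursin = "za" |> in_pizza
```

The two curried forms fix different arguments, so which one reads better depends on whether the needle or the haystack is the constant in a given piece of code.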


This custom-operator infix notation doesn’t work for me, using Julia 1.7.0 on both Windows and Linux. (Regular invocation using parentheses works, however.) Wonder why . . .

update 1

Actually, it does work, if I copy-paste @stevengj’s code-snippet above. But for some reason when I type it into the REPL, I get an error:

ERROR: syntax: extra token """ after end of expression
Stacktrace:
  [1] top-level scope
    @ none:1

I don’t get this error when I type directly using prefix notation:

∈ₛ("za","pizza")

Curious.

update 2

If I use triple-quotes, the infix notation works:

julia> """za""" ∈ₛ """pizza"""
true

Curiouser.

It works for me. Can this be a problem with copy-pasting? Have you tried typing the code directly into the REPL?