Note: This was originally meant to be a question, but I solved it along the way. However, I want to post it here anyway because it feels like a common beginner mistake, and I wonder what could be done to improve the error message so that the next person can have a better experience.
–
If I have a dataframe like this
4×2 DataFrame
Row │ a b
│ Int64 String
─────┼───────────────
1 │ 1 foo
2 │ 2 bar
3 │ 3 foo
4 │ 4 baz
and I have a vector bs = ["bar", "baz", "zap", "zip"]
And I want to take a subset using the @subset macro: “give me all rows where :b is one of the elements in the vector”.
My intuition (as for many others, probably) is to do
@subset df :b .∈ bs
but that gives an error message
use occursin(needle, haystack) for string containment
error(::String)@error.jl:35
in(::String, ::String)@search.jl:644
_broadcast_getindex_evalf@broadcast.jl:670[inlined]
_broadcast_getindex@broadcast.jl:643[inlined]
getindex@broadcast.jl:597[inlined]
copy@broadcast.jl:899[inlined]
materialize@broadcast.jl:860[inlined]
...
Okay…
So I try "foo" ∈ ["foo", "bar"], and that works!
Well, whatever, I alter my code and try to follow the “advice” of the error
@subset df occursin.(:b, bs)
And that gives an empty DataFrame as a result, but no error message.
Hmm… have I misunderstood occursin?
I try: occursin("foo", bs)
But that doesn’t work! Aha!
no method matching occursin(::String, ::Vector{String})
Closest candidates are:
occursin(::Union{AbstractChar, AbstractString}
...
Alright, so occursin is not even the function I am looking for! The error gave me misleading advice. occursin seems to be about finding substrings in strings. Well well. I look through the docs and find a contains function as well, but that seems to be pretty much the same function with the arguments in the other order.
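To make the empty result above concrete, here is a minimal sketch using plain vectors instead of the DataFrame (the name `b` is my stand-in for the `:b` column; it is not from the original code). It shows what occursin tests, and why broadcasting it over two equal-length vectors returns all false rather than raising an error:

```julia
b  = ["foo", "bar", "foo", "baz"]   # stand-in for column :b
bs = ["bar", "baz", "zap", "zip"]

# occursin(needle, haystack) tests whether one string is a substring of another.
@show occursin("oo", "foo")
@show contains("foo", "oo")         # same test, haystack first

# Broadcasting over two equal-length vectors pairs elements by position:
# occursin("foo", "bar"), occursin("bar", "baz"),
# occursin("foo", "zap"), occursin("baz", "zip")
mask = occursin.(b, bs)
@show mask                          # every pair is false
```

Since every positional pair fails the substring test, the mask is all false and the subset comes back empty.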
Well, I return to the in operator and remember seeing something about broadcasting being tricky in some cases, and that you might have to “protect” an argument, whatever that means.
After looking through the in docs I see there is this Ref thing, and voilà, it works.
Solved!
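For anyone landing here with the same problem, this is roughly what the fix looks like, again sketched with plain vectors (`b` stands in for the `:b` column and is my own naming). As far as I understand it, the original error came from `.∈` pairing the two vectors element by element, so Julia ended up calling in("foo", "bar"), a String-in-String test, which is exactly the call that raises the occursin hint.

```julia
b  = ["foo", "bar", "foo", "baz"]   # stand-in for column :b
bs = ["bar", "baz", "zap", "zip"]

# Ref(bs) tells broadcasting to treat bs as a single value,
# so each element of b is tested against the whole vector.
mask = b .∈ Ref(bs)

# Equivalent spellings that avoid the positional pairing:
mask_fn      = in.(b, Ref(bs))
mask_curried = in(bs).(b)           # in(bs) behaves like x -> x in bs
mask_comp    = [x in bs for x in b]

@show mask
```

With DataFramesMeta the same expression should go straight into the macro, i.e. `@subset df :b .∈ Ref(bs)`. I believe the row-wise variant `@rsubset df :b ∈ bs` sidesteps broadcasting altogether, but I have not checked that against every package version.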
Okay, so why am I posting this?
- Could the error message be improved so that a) it is not as misleading and b) it maybe gives a better hint? Not using Ref seems to me to be a common error “pattern” for a new user, whereas occursin sounds more specific.
- Could this be added as an example in the DataFramesMeta documentation?
On a related note: I have mixed feelings about Julia. The syntax is very readable in many cases and multiple dispatch is a fun concept, but error messages can be very hard to get clues from as a new user, it is hard to guess how to correct code, and it can be hard to discover how to do things in packages. E.g. using Plots, there is no way to “autocomplete” and find how to do different things; you have to return to the docs and find the right symbol. The latter seems like a hard problem to solve since there is no way to do something like plot.{PRESS TAB} to discover properties. I literally used OpenAI ChatGPT today to try to figure out how to do things.
(If I used Julia on a daily basis, I guess that many things would become second nature and discoverability would not be so much of a problem. But that is not my use case. I think of Julia as a language in my toolbelt to use from time to time whenever suitable.)
It is as if the simplest things are very easy in Julia, but after that there is a big jump from beginner to advanced. The learning curve feels discontinuous. Maybe my experience is coloured by inheriting an existing code base (without a colleague) with a lot of implicit imports (using XYZ populating the scope with things) and a lot of DataFrame stuff with its own mini-language. I have not solved the problem of how to guide myself in Julia in an effective way. Rust and rust-analyzer are at the opposite end of the spectrum: the syntax is more intricate, but when there are errors it is often easy to figure out what the next step is.