Confusing/misleading error message for a beginner

Note: This was originally meant to be a question, but I solved it along the way. I want to post it here anyway because it feels like a common beginner mistake, and I wonder what could be done to improve the error message so that the next person has a better experience.

If I have a dataframe like this

4×2 DataFrame
 Row │ a      b      
     │ Int64  String 
─────┼───────────────
   1 │     1  foo
   2 │     2  bar
   3 │     3  foo
   4 │     4  baz

and I have a vector bs = ["bar", "baz", "zap", "zip"]

And I want to take a subset using the @subset macro, “give me all rows where :b is one of the elements in the vector”

My intuition (as for many others, probably) is to do

@subset df :b .∈ bs

but that gives an error message

use occursin(needle, haystack) for string containment

error(::String)@error.jl:35
in(::String, ::String)@search.jl:644
_broadcast_getindex_evalf@broadcast.jl:670[inlined]
_broadcast_getindex@broadcast.jl:643[inlined]
getindex@broadcast.jl:597[inlined]
copy@broadcast.jl:899[inlined]
materialize@broadcast.jl:860[inlined]
...

Okay… :confused: :confounded:

So I try "foo" ∈ ["foo", "bar"] - that works!

Well, whatever, I alter my code and try to follow the “advice” of the error

@subset df occursin.(:b, bs)

And that gives an empty DataFrame as a result, but no error message.

Hmm… have I misunderstood occursin?

I try: occursin("foo", bs)

But that doesn’t work! Aha!

 no method matching occursin(::String, ::Vector{String})

Closest candidates are:
occursin(::Union{AbstractChar, AbstractString}
...

Alright, so occursin is not even the function I am looking for! The error gave me misleading advice. occursin seems to be about finding substrings in strings. Well well. I look through the docs and find a contains function as well, but that seems to be pretty much the same function with the arguments in the opposite order.
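
To make the difference concrete, here is a small sketch in plain Julia (no DataFrames needed; the strings are made up for illustration). occursin and contains both test whether one string is a substring of another, while in tests whether a value is an element of a collection.

```julia
# Substring tests: both arguments are strings.
sub_a = occursin("oo", "foo")      # needle first, haystack second
sub_b = contains("foo", "oo")      # same test, haystack first

# Membership tests: is the whole string an element of the collection?
member = "foo" in ["foo", "bar"]
not_member = "oo" in ["foo", "bar"]

println((sub_a, sub_b, member, not_member))
```

So the hint in the error only applies when both sides really are strings, which is what happens when broadcasting pairs up the elements of two string vectors.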

Well, I return to the in operator and remember seeing something about broadcasting being tricky in some cases, and that you might have to “protect” an argument, whatever that means.

After looking through the in docs I see there is this Ref thing, and voilà, it works.

Solved!
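
For anyone landing here with the same problem, this is roughly what the working version looks like. It is a minimal sketch on plain vectors, where b stands in for the column df.b. Inside the macro the same idea would be @subset df :b .∈ Ref(bs).

```julia
b  = ["foo", "bar", "foo", "baz"]      # stands in for the column df.b
bs = ["bar", "baz", "zap", "zip"]

# Ref(bs) makes broadcasting treat bs as a single value, so each element
# of b is tested against the whole vector instead of being paired up
# with the element of bs at the same position.
mask = b .∈ Ref(bs)

println(b[mask])
```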

Okay, so why am I posting this?

  1. Could the error message be improved so that a) it is not as misleading and b) it maybe gives a better hint? Forgetting Ref seems to me to be a common error “pattern” for a new user, whereas occursin sounds more specific.

  2. Could this be added as an example in the DataFramesMeta documentation?


On a related note: I have mixed feelings about Julia. The syntax is very readable in many cases and multiple dispatch is a fun concept, but error messages can be very hard to get clues from as a new user, it is hard to guess how to correct code, and it can be hard to discover how to do things with packages. E.g. using Plots, there is no way to “autocomplete” and find how to do different things; you have to return to the docs and find the right symbol. The latter seems like a hard problem to solve since there is no way to do something like plot.{PRESS TAB} to discover properties. I literally used OpenAI ChatGPT today to try to figure out how to do things :joy:

(If I used Julia on a daily basis, I guess that many things would become second nature and discoverability would not be so much of a problem. But that is not the use case for me. I think of Julia as a language in my toolbelt to use from time to time whenever suitable.)

It is as if the simplest things are very easy in Julia, but after that there is a big jump from beginner to advanced. The learning curve feels discontinuous. Maybe my experience is coloured by inheriting an existing code base (without a colleague) with a lot of implicit imports (using XYZ populating the scope with things) and a lot of DataFrame stuff with its own mini-language. I have not solved the problem of how to guide myself in Julia in an effective way. Rust and rust-analyzer are at the opposite end of the spectrum. The syntax is more intricate, but when there are errors it is often easy to figure out what the next step is.


For a reference:

This example is given in Working with DataFrames · DataFrames.jl as last example (in a bit different form but essentially the same).

It is also covered in Julia for Data Analysis book in chapter 5 here. (of course there is no need to buy the book - you can run through code or notebooks to see the examples)


Right, I guess you learn to read the error messages over time and understand them better, but I want to point out JuliaSyntax.jl, the new parser, which actually does have better error messages!

I haven’t actually tried it myself, since I’m used to Julia’s errors and OK enough with them by now, but let me know if it helps. I also see in its readme a link to:

The paper P2429 - Concepts Error Messages for Humans is C++ centric, but has a nice review of quality error reporting in various compilers including Elm, ReasonML, Flow, D and Rust.

Julia (with or without that package) may not have the best error messages, but they are by far not the worst of all languages. C++ has terrible error messages, at least if you use templates (which is its generic feature, and Julia is generic by default, so it competes with that).

There are also other tools:

that I haven’t tried (might conflict with the other package?), and JET.jl, which I’ve just barely tried, and Aqua.jl; a linter is also available. More to be aware of in this context?

Well and I forget:

Not sure if it supports VS Code too; if it does and is useful, maybe it and some of the best tools should be bundled with it?

And how did that work for you? See my post on it and question in the off-topic category. I was asking about AI tools for writing code, but I can see such tools, even that one, helping for decoding error messages. Feel free to answer here about it and/or there. Maybe we should add a new topic about such tools to use with error messages. Or I’m happy to add a question to my post there. I kind of regret putting it under off-topic, I think AI tools are very much on-topic for Julia, or will be, and at least already relevant for other languages.

Potentially tab-completion is also an area for AI. There are already topics here about that and new possible syntax (in part to help with TAB-completion), in the internal category, and maybe elsewhere. I look forward to what’s possible with or without AI, or learning what’s maybe already possible. There is e.g. a package ObjectOriented.jl for single-dispatch OO, as in Python, but I don’t want to use a package/alternative syntax to get TAB-completion (I believe it doesn’t offer that, and I doubt someone would make a tool just for a specific package/style). Or at least I would want it for both styles (at least for the default idiomatic Julia code).

I only sort of follow the syntax posts (this is the 3rd and latest proposal; tab-completion may have been discussed more in the older two posts, and there is actually even one more, linked from some of them…):


What’s funny, is that we already do have autocomplete for it because property dot-autocomplete works in our tooling. What we don’t have is autocomplete for generic methods, which is Julia’s idiomatic style.

In other words, we are being encouraged by our tools to betray Julian style in favor of OO style. Thankfully the community hasn’t done that, so there is instead, broadly speaking, poor method discoverability. :sweat_smile:

And these are some of the better-documented language features; some of the other packages are always under such rapid development, that their documentation is in a continuously-mildly-broken state and you have to experiment to find what you’re after. Some have set up RSS feeds so users could keep abreast of updates. And some of the most sophisticated packages use args...; kwargs... for their arguments so you can’t even figure out which arguments are valid from their method signatures :stuck_out_tongue_winking_eye:

We discussed some ideas for how method autocomplete could work here, in the context of how a chaining syntax could help the issue; @Palli linked the latest chaining syntax proposal.
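
In the meantime, one partial workaround that already exists is methodswith from the InteractiveUtils standard library, which lists methods that take an argument of a given type. A minimal sketch follows; Point, norm2 and shift are made-up names for illustration, and restricting the search to the current module is just a choice to keep the output short.

```julia
using InteractiveUtils  # standard library; loaded automatically in the REPL

# Hypothetical type and functions, only for illustration.
struct Point
    x::Float64
    y::Float64
end

norm2(p::Point) = p.x^2 + p.y^2
shift(p::Point, dx) = Point(p.x + dx, p.y)

# List the methods defined in this module that accept a Point argument.
found = methodswith(Point, @__MODULE__)
foreach(println, found)
```

It is a lookup you have to ask for rather than autocomplete, so it does not really solve the discoverability problem, but it can save a trip to the docs.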


Hi Bogumił, thanks for answering!

Yes, you are right - it is there.

Two quick documentation suggestions from a newcomer:

  1. Add one extra example with this case here since what a person in my situation will do is to look at the @subset documentation. Just one more row?
    @subset(df, :category .∈ Ref(categories))

  2. Add a short sentence explaining why Ref is needed (and provide link for further reading), in the place above and also in the place where you linked (in the DataFrame-docs).

You are the expert regarding this package, but I am more expert in being a newcomer than you are :stuck_out_tongue_winking_eye:


Glad to hear about JuliaSyntax.jl.

You are right that there are many languages with terrible error messages. I guess that in Julia’s case, it is more that I… expected it to be even nicer :)

Regarding ChatGPT… well, I recommend you try it yourself while it is free. It is pretty impressive, but not perfect. What is most fascinating is the possibility to give feedback and “help it help you”. It will often provide code that is not really right, but if you describe what you think is going wrong it may figure it out.

Regarding Plots and discoverability, here is an example:




Fixed in Improve examples in the manual in basics.md by bkamins · Pull Request #3236 · JuliaData/DataFrames.jl · GitHub and improve subsetting explanations by bkamins · Pull Request #345 · JuliaData/DataFramesMeta.jl · GitHub.

Also notice that the intended way to write it is to use @rsubset, which is the cleanest:

@rsubset(df, :category ∈ categories)

Cool! That was quick!:slight_smile:

You’ve already had some good feedback, and your detailed original post helped improve the documentation (thank you), but just two more cents from me (both of which may be slightly opinionated):

For one, I believe you’re only likely to have a good Julia experience if you get reasonably comfortable with the base language (which includes things like broadcasting, “protecting” collections from being iterated over in broadcasting etc.) Related to that I think it is most of the time detrimental when working with DataFrames to try to reason about issues with specific macro calls or think about a “DataFrames issue”, when what you’re really doing is calling a function on vectors. I consider myself a reasonably proficient DataFrames user but I struggled to reason about what @subset df :b .∈ bs would actually do (tbf I don’t tend to use the macros). In contrast, writing it like this:

["foo", "bar"] .∈ ["bar", "baz"]

makes it immediately obvious what is going on (and btw also reveals that you only got the “misleading” error message by chance, because in your example df.b and bs have the same length - although I can’t say whether you would have found the DimensionMismatch error thrown if length(bs) != nrow(df) more helpful!)
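
A small sketch of both cases, catching the errors so the types can be compared (the vectors are just the ones from above, plus one extra element for the unequal-length case):

```julia
# Same length: broadcasting pairs the elements up, so the first call is
# in("foo", "bar"), i.e. String in String, which throws the occursin hint.
same_len = try
    ["foo", "bar"] .∈ ["bar", "baz"]
catch err
    err
end

# Different lengths (neither of length 1): the shapes cannot be
# broadcast together, so the error is about dimensions instead.
diff_len = try
    ["foo", "bar", "foo"] .∈ ["bar", "baz"]
catch err
    err
end

println(typeof(same_len))
println(typeof(diff_len))
```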

Which brings me to my second point: in my opinion the way to use in is almost always as a Fix2 function like this:

julia> in(bs).(df.b)
4-element BitVector:
 0
 1
 0
 1

(although I have no idea how that would work in the @subset macro)
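
Outside the macro, the Fix2 form does plug straight into the plain DataFrames.jl functions. A minimal sketch, assuming the DataFrames package is installed and rebuilding the example data from the original post:

```julia
using DataFrames  # third-party package, must be installed

df = DataFrame(a = 1:4, b = ["foo", "bar", "foo", "baz"])
bs = ["bar", "baz", "zap", "zip"]

# in(bs) is a one-argument function x -> x in bs; ByRow applies it to
# each element of column b, so no Ref and no dot are needed.
result = subset(df, :b => ByRow(in(bs)))

println(result)
```

filter(:b => in(bs), df) should give the same rows, since filter already works row by row.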


And this example is also given in the DataFrames.jl manual :smile:.


Guys, newbie here.

The “Factorizations” lesson from the “Introduction to Julia (for programmers)” course teaches you about the lu(matrix) function. When I tried it, I got this message:

julia> A = randn(3,3)
3×3 Matrix{Float64}:
0.755352 -1.38128 0.0734177
0.507111 2.30187 1.01948
-0.833079 1.09402 -2.687

julia> l,u,p = lu(A)
ERROR: UndefVarError: lu not defined
Stacktrace:
[1] top-level scope
@ REPL[188]:1


The lesson didn’t mention that lu is a function from the LinearAlgebra package, which has to be loaded first. I just wanted to add this somewhere as I couldn’t find a topic related to this course. This might be obvious, but I needed to look it up on the internet to proceed.
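
For reference, a minimal sketch of what worked for me. LinearAlgebra ships with Julia as a standard library, so nothing needs to be installed; the final check is my own addition, not something from the course.

```julia
using LinearAlgebra  # standard library; provides lu

A = randn(3, 3)
l, u, p = lu(A)

# The factors satisfy L * U == A with its rows permuted by p.
println(l * u ≈ A[p, :])
```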

Anyway, I hope this helps