Julia equivalent of 'rm *' (almost)

compleat · May 14, 2020, 3:43pm

Hello, let me begin by saying this is not a gripe, but rather a warning to users.

I have lost lots of time on the following ‘fat finger’ error in Julia.

Consider below.

julia> x=[1;2;3];

julia> y=x[x.==1]
1-element Array{Int64,1}:
1

This is fine. But what if I slip and do

julia> y=x[x.=1]
3-element Array{Int64,1}:
1
1
1

julia> x
3-element Array{Int64,1}:
1
1
1

x is all gone!

A typo of 1 character (a missing ‘=’) and all my data is gone!

It took me a long time to figure out what was going wrong.

In unix, you can erase everything with 4 characters ‘rm *’, and many
worried about that.

Julia beats it. One missing character and your matrix (or DataFrame)
of data is gone!

[Just a warning to other users who may have made this mistake and wondered why all their data was gone]

Henrique_Becker · May 14, 2020, 8:53pm

I have to say that in many (most?) C-like languages, confusing variable == constant with variable = constant will cost you your data. It is not very Julia-specific. Some programmers have advocated so-called Yoda conditions, which basically means: never write variable == constant if you can write constant == variable. This way a missing = gives a compile-time error (in most C-like languages).

Not that I think your cautionary tale is not useful (the existence of something like Yoda conditions corroborates it), but it is a somewhat more general problem, and one of the few things that make me wonder whether the = (assign) vs == (test) syntax was a major mistake by earlier language designers.
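
For what it is worth, the Yoda idea seems to carry over to the broadcast case from the original post. The following is a minimal sketch, not code from this thread; it assumes that broadcasting into a literal fails at run time, and the exact exception type may differ between Julia versions.

```julia
x = [1, 2, 3]

# Constant on the left: same result as x[x .== 1]
y = x[1 .== x]

# With the typo, the left-hand side is the literal 1, so there is nothing
# to overwrite. Assumption: `1 .= x` throws instead of modifying x.
threw = try
    x[1 .= x]
    false
catch
    true
end

# x is expected to be unchanged at this point
```

The price is readability, and it only helps when one side of the comparison is a literal or some other non-array value.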

compleat · May 14, 2020, 9:01pm

Hi. Thanks for the suggestion. I will try to remember it when coding. It seems safer.

hendri54 · May 15, 2020, 1:09am

The question in my mind is: Is there a good reason not to disallow assignment inside an indexing expression, such as y=x[x.=1]?
I find it hard to think of a case where this would be intended.

Henrique_Becker · May 15, 2020, 2:27am

Hmm, maybe it is my inner C programmer talking, but vector[a+=1] is something I have written on occasion and am fond of.

To be fair, I remember seeing a Julia issue thread discussing the fact that they would need a deprecation cycle to use the end keyword like vector[end] == last(vector), just because there was code out there that had things like vector[begin ... multiple lines of computation ... end] = x and it would break with the parser change, XD.

compleat · May 15, 2020, 8:27am

That was in my mind, but I was afraid to ask it on here.
You are very brave.

Tamas_Papp · May 15, 2020, 8:43am

Technically, .= is not assignment (that is =) but broadcasting. That said, both are valid expressions so one can use their values (even if that would be considered bad style by some people in some contexts).

Generally, even with the best intentions, the parser can’t really protect you from typos like this. The typo could be inside a function, e.g.

mask(x) = x .= 1 # I meant x .== 1
x[mask(x)]       # ouch
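
To spell out what “one can use their values” means for the example in the original post, here is a small sketch (not part of the code above). It relies on the fact that x .= 1 broadcasts in place and evaluates to x itself, so the result is then used as a vector of integer indices rather than as a logical mask.

```julia
x = [1, 2, 3]

mask = (x .== 1)   # comparison: allocates a new BitVector, x is untouched
y = x[mask]        # logical indexing keeps the elements where mask is true

r = (x .= 1)       # in-place broadcast: overwrites every element of x
                   # and evaluates to x itself (r === x)
z = x[r]           # integer indexing with [1, 1, 1], i.e. [x[1], x[1], x[1]]
```

So the “3-element” result in the original post is x[1] repeated three times, taken after x has already been overwritten.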

compleat · May 15, 2020, 9:11am

But even if the parser could only protect you from the more likely scenario (such as the one I mentioned), wouldn’t it be worth considering?

pfitzseb · May 15, 2020, 9:22am

A parser can’t protect you from valid code. A linter can, though.
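
As a rough illustration of what such a lint rule could look like, here is a minimal sketch that walks the expression tree returned by Meta.parse and reports a .= that appears inside an indexing expression. The helper names contains_dot_assign and has_dot_assign_in_index are hypothetical and do not belong to any existing linter.

```julia
# Head symbol the parser uses for in-place broadcast assignment `a .= b`
const DOT_ASSIGN = Symbol(".=")

# Hypothetical helper: does `ex` contain a `.=` anywhere inside it?
contains_dot_assign(ex) =
    ex isa Expr && (ex.head === DOT_ASSIGN || any(contains_dot_assign, ex.args))

# Hypothetical helper: is there a `.=` inside the indices of some `a[...]`?
function has_dot_assign_in_index(ex)
    ex isa Expr || return false
    if ex.head === :ref
        # args[1] is the indexed object; the remaining args are the indices
        any(contains_dot_assign, ex.args[2:end]) && return true
    end
    return any(has_dot_assign_in_index, ex.args)
end

flag_typo = has_dot_assign_in_index(Meta.parse("y = x[x .= 1]"))
flag_ok   = has_dot_assign_in_index(Meta.parse("y = x[x .== 1]"))
flag_fun  = has_dot_assign_in_index(Meta.parse("mask(x) = x .= 1"))
```

With this sketch, flag_typo comes out true and flag_ok false. A purely syntactic check like this one does not see the typo when it is hidden in a function such as mask, and it would also flag any intentional use of .= inside an index.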

compleat · May 15, 2020, 9:27am

That’s if the linter is programmed to look for it. The one in Atom (where I lost my data a few times already) obviously isn’t.

pfitzseb · May 15, 2020, 9:28am

Well, Juno doesn’t ship with a linter.
But yes, you’re right, of course.

orialb · May 15, 2020, 9:30am

What do you mean by “gone”?
Presumably, if this data was the result of a long and expensive computation and you are now analyzing it interactively, you loaded it from some file, so you can just reload it from the file.

If it was a result of a cheap computation you can just re-run the computation.

(I agree that it can be annoying if “cheap” here means ~15 minutes, but still I wouldn’t call this “gone” as in rm *.)

StefanKarpinski · May 15, 2020, 1:40pm

Please quote your code. See “Please read: make it easier to help you” (#11).

compleat · May 15, 2020, 2:22pm

As I said above

[a slip when intending y=x[x.==1] ]

Sets all components of x to 1.

It might be convenient if this were disallowed. I have overwritten datasets which took a long time to compile with this error.
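
In the meantime, one way to sidestep this particular slip might be to select elements with a predicate function instead of a broadcast comparison. This is only a sketch using filter and findall from Base, not something proposed elsewhere in this thread; the point is that dropping one = from ==(1) leaves =(1), which does not parse, so the typo fails loudly instead of overwriting x.

```julia
x = [1, 2, 3]

# ==(1) creates a predicate function equivalent to v -> v == 1
y1 = filter(==(1), x)      # new array with the matching elements

idx = findall(==(1), x)    # indices of the matching elements
y2 = x[idx]

# Neither call writes to x
```

This covers selection only; an explicit x .= 1 typed by mistake on its own line still overwrites the data.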

Tamas_Papp · May 15, 2020, 3:27pm

The same argument applies to just

x .= 1  # typo, in place of x .== 1

which can also overwrite data accidentally.

In general there is a trade-off between compact syntax and typos doing something unexpected. E.g., the following could be a valid use case (if somewhat contrived, and also bad style):

a = rand(Bool, 50)              # want a .| b for flags
b = rand(Bool, 50)
x = rand(Int, 50)
y = rand(Int, 50)
x[a .= a .| b]                  # save result in a
y[a]                            # reuse

mbauman · May 15, 2020, 3:43pm

Note that this can also be used to reuse storage for the computed result — which can sometimes be beneficial. E.g., you can easily transform f(x .== y) to f(cache .= (x .== y)) to save on allocations in some loops.

julia> A = rand(1:10000, 10, 10);

julia> function f(A)
           x = 0
           for i in 1:10000
               x += sum(A[A .> i])
           end
           return x
       end
f (generic function with 1 method)

julia> f(A); @time f(A);
  0.003767 seconds (30.00 k allocations: 5.840 MiB)

julia> function g(A)
           cache = similar(A, Bool)
           x = 0
           for i in 1:10000
               x += sum(A[cache .= (A .> i)])
           end
           return x
       end
g (generic function with 1 method)

julia> g(A); @time g(A);
  0.002144 seconds (10.00 k allocations: 4.467 MiB)

Here it didn’t completely help us, as indexing (and even views of logical indices) requires allocations itself, but reusing a cache like this can be helpful at times. That’s actually the primary reason why this is supported at all.

hendri54 · May 15, 2020, 3:57pm

Personally, I would prefer

for i in 1:10000
  cache .= (A .> i)
  x += sum(A[cache])
end

over

for i in 1:10000
  x += sum(A[cache .= (A .> i)])
end

because (for me) it is too easy to overlook the assignment inside the [].

orialb · May 15, 2020, 7:22pm

If you are using a DataFrame you can use the @where macro from DataFramesMeta:

y = @where(df, :x .== 1) # @where(df,:x.=1) errors

I actually make this typo when using @where quite a lot.
(I also accidentally overwrote a DataFrame by running the wrong Jupyter notebook cell several times today and had to re-create it, which took a few minutes every time I made this mistake. I guess that is my punishment for making the comment above.)

Palli · May 20, 2020, 6:51pm

The first linter I have ever tried (for any language) flagged it, but for the wrong reason:

(@v1.5) pkg> add https://github.com/tonyhffong/Lint.jl

lintstr("x=[1;2;3]; y=x[x .= 1]")
1-element Array{LintMessage,1}:
 none:23 E321 .: use of undeclared symbol

It seems like it doesn’t understand broadcasting (but at least the linter has been updated to “run” in Julia 1.0; it just needs to be improved).

It doesn’t really matter whether code is flagged for the right or the wrong reason, or whether the linter flags a bit too much. But it probably flags way too much, e.g. the Yoda conditions. And I’m not up to speed on the linter situation in Atom (where it is available) or VSCode (I think it is not, but that editor seems to be the future). Microsoft has some new linter support where code gets underlined as you write it.
