Julia equivalent of 'rm *' (almost)

Hello, let me begin by saying this is not a gripe, but rather a warning to users.

I have lost lots of time on the following ‘fat finger’ error in Julia.

Consider below.

julia> x=[1;2;3];

julia> y=x[x.==1]
1-element Array{Int64,1}:
1

This is fine. But what if I slip and do

julia> y=x[x.=1]
3-element Array{Int64,1}:
1
1
1

julia> x
3-element Array{Int64,1}:
1
1
1

x is all gone!

A typo of 1 character (a missing ‘=’) and all my data is gone!

It took me a long time to figure out what was going wrong.

In unix, you can erase everything with 4 characters ‘rm *’, and many
worried about that.

Julia beats it. One missing character and your matrix (or DataFrame)
of data is gone!

[Just a warning to other users who may have made this mistake and wondered why all their data was gone]

I have to say that in many (most?) C-like languages, confusing variable == constant with variable = constant will cost you your data. It is not Julia-specific. Some programmers used to advocate so-called Yoda conditions: never write variable == constant if you can write constant == variable, so that a missing = gives you a compile-time error (in most C-like languages).

Not that I think your cautionary tale is not useful (the existence of something like Yoda conditions corroborates it), but it is a more general problem, and one of the few things that make me wonder whether the = (assign) vs == (test) syntax was a major mistake by early language designers.
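For what it is worth, here is a minimal sketch of how the Yoda ordering would play out for the broadcast case from the original post. It assumes a plain Vector{Int}; the eval call is only there so the failure can be caught and shown, it is not something you would write in real code.

```julia
x = [1, 2, 3]

# Constant on the left: the comparison still works as a mask.
y = x[1 .== x]

# The same typo as in the original post, but in Yoda order. There is nothing
# to write into on the left-hand side, so this throws instead of mutating x.
# eval is used here only so the failure can be caught and inspected.
threw = try
    eval(:(1 .= $x))
    false
catch
    true
end

println("y = ", y, ", typo threw: ", threw, ", x = ", x)
```

The typo becomes an error and x keeps its values, at the cost of the less natural reading order.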

Hi. Thanks for the suggestion. I will try to remember your suggestion for coding. It seems safer.

The question in my mind is: Is there a good reason not to disallow assignment inside an indexing expression, such as y=x[x.=1]?
I find it hard to think of a case where this would be intended.

Hmmm, maybe it is my inner C programmer talking, but vector[a+=1] is something I have written on occasion and am fond of.

To be fair, I remember seeing a Julia issue thread discussing the fact that they would need a deprecation cycle to use the end keyword as in vector[end] == last(vector), just because there was code out there with things like vector[begin ... multiple lines of computation ... end] = x that would break with the parser change, XD.

That was in my mind, but I was afraid to ask it on here.
You are very brave. :slight_smile:

Technically, .= is not assignment (that is =) but broadcasting. That said, both are valid expressions so one can use their values (even if that would be considered bad style by some people in some contexts).
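To make that concrete, here is a small sketch of what happens in the example from the original post, assuming x is a plain Vector{Int}. The name r is introduced only for illustration.

```julia
x = [1, 2, 3]

# Broadcast assignment writes 1 into every slot of x, in place,
# and the whole expression evaluates to x itself.
r = (x .= 1)
same_object = (r === x)

# r is now the integer vector [1, 1, 1], so indexing with it is ordinary
# integer indexing: it picks x[1] three times and returns a new array.
y = x[r]

println("same object: ", same_object, ", x = ", x, ", y = ", y)
```

So no element is selected by a mask at all; x has already been overwritten by the time the indexing happens.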

Generally, even with the best intentions, the parser can’t really protect you from typos like this. It could be in a function, e.g.

mask(x) = x .= 1 # I meant x .== 1
x[mask(x)]       # ouch

But even if the parser could only protect you from the more likely scenario (such as the one I mentioned), wouldn’t it be worth considering?

A parser can’t protect you from valid code. A linter can, though.
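As a rough sketch of what such a lint rule might look like, the following walks a parsed expression and collects every indexing expression that has a broadcast assignment directly inside the brackets. The function name find_broadcast_assign_in_index is hypothetical and not part of any existing linter; only Meta.parse and the standard Expr fields are used.

```julia
# Hypothetical helper, not taken from any existing linter package.
# Collects every indexing expression a[...] whose index list directly
# contains a broadcast assignment such as `x .= 1`.
function find_broadcast_assign_in_index(ex, hits = Expr[])
    if ex isa Expr
        if ex.head === :ref
            # args[1] is the indexed object, the rest are the indices
            for idx in ex.args[2:end]
                if idx isa Expr && idx.head === :.=
                    push!(hits, ex)
                end
            end
        end
        # Recurse into all sub-expressions
        for arg in ex.args
            find_broadcast_assign_in_index(arg, hits)
        end
    end
    return hits
end

typo_hits = find_broadcast_assign_in_index(Meta.parse("y = x[x .= 1]"))
ok_hits   = find_broadcast_assign_in_index(Meta.parse("y = x[x .== 1]"))

println("typo flagged: ", length(typo_hits), ", correct code flagged: ", length(ok_hits))
```

A check like this would also flag deliberate uses of broadcast assignment inside an index, and it would not see the typo once it is hidden inside a function such as mask(x) above, so it is a heuristic at best.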

That’s if the linter is programmed to look for it. The one in Atom (where I lost my data a few times already) obviously isn’t.

Well, Juno doesn’t ship with a linter.
But yes, you’re right, of course.

What do you mean by “gone”?
Presumably, if this data was the result of a long and expensive computation and you are now analyzing it interactively, you loaded it from some file, so you can just reload it from the file.

If it was a result of a cheap computation you can just re-run the computation.

(I agree that it can be annoying if “cheap” here means ~15 minutes, but I still wouldn’t call this “gone” as in rm *.)

Please quote your code: PSA: make it easier to help you

As I said above

[a slip when intending y=x[x.==1] ]

Sets all components of x to 1.

It might be convenient if this were disallowed. I have overwritten datasets which took a long time to compile with this error.

The same argument applies to just

x .= 1  # typo, in place of x .== 1

which can also overwrite data accidentally.

In general there is a trade-off between compact syntax and typos doing something unexpected. E.g., the following could be a valid use case (if somewhat contrived, and also bad style):

a = rand(Bool, 50)              # want a .| b for flags
b = rand(Bool, 50)
x = rand(Int, 50)
y = rand(Int, 50)
x[a .= a .| b]                  # save result in a
y[a]                            # reuse

Note that this can also be used to reuse storage for the computed result — which can sometimes be beneficial. E.g., you can easily transform f(x .== y) to f(cache .= (x .== y)) to save on allocations in some loops.

julia> A = rand(1:10000, 10, 10);

julia> function f(A)
           x = 0
           for i in 1:10000
               x += sum(A[A .> i])
           end
           return x
       end
f (generic function with 1 method)

julia> f(A); @time f(A);
  0.003767 seconds (30.00 k allocations: 5.840 MiB)

julia> function g(A)
           cache = similar(A, Bool)
           x = 0
           for i in 1:10000
               x += sum(A[cache .= (A .> i)])
           end
           return x
       end
g (generic function with 1 method)

julia> g(A); @time g(A);
  0.002144 seconds (10.00 k allocations: 4.467 MiB)

Here it didn’t completely help us, as indexing (and even views of logical indices) requires allocations itself, but reusing a cache like this can be helpful at times. That’s actually the primary reason why this is supported at all.

Personally, I would prefer

for i in 1:10000
  cache .= (A .> i)
  x += sum(A[cache])
end

over

for i in 1:10000
  x += sum(A[cache .= (A .> i)])
end

because (for me) it is too easy to overlook the assignment inside the [].

If you are using a DataFrame you can use the @where macro from DataFramesMeta:

y = @where(df, :x .== 1) # @where(df,:x.=1) errors 

I actually make this typo when using @where quite a lot :slight_smile:
(I also accidentally overwrote a dataframe by running the wrong Jupyter notebook cell several times today and had to re-create it, which took a few minutes every time I made this mistake. I guess that is my punishment for making the comment above :wink: )

The first linter I’ve ever tried (for any language) flagged it, but for the wrong reason:

(@v1.5) pkg> add https://github.com/tonyhffong/Lint.jl

lintstr("x=[1;2;3]; y=x[x .= 1]")
1-element Array{LintMessage,1}:
 none:23 E321 .: use of undeclared symbol

It seems like it doesn’t understand broadcasting (but at least the linter has been updated to run in Julia 1.0; it just needs to be improved).

It doesn’t really matter if code is flagged for the right or wrong reason, or if it flags a bit too much. But it probably flags way too much, e.g. the Yoda conditions. And I’m not up to speed on the linter situation in Atom (where one is available) or VSCode (I think there is none, but that editor seems to be the future). Microsoft has some new linter support where code gets underlined as you write it.