DataFrame colon : vs bang ! indexing

I’m learning about DataFrames right now, and I’ve seen some cases where columns are accessed with a bang ! where I’d expect a colon. Here’s an example:

julia> df = DataFrame(name=["Sally", "Bob", "Alice", "Frank"], age=[48; 51; 36; 43], score = [9; 8; 7; 10])
4×3 DataFrame
 Row │ name    age    score 
     │ String  Int64  Int64 
─────┼──────────────────────
   1 │ Sally      48      9
   2 │ Bob        51      8
   3 │ Alice      36      7
   4 │ Frank      43     10

It’s possible to access a column with a bang:

julia> df[!,2]
4-element Vector{Int64}:
 48
 51
 36
 43

or with a colon:

julia> df[:,2]
4-element Vector{Int64}:
 48
 51
 36
 43

Elsewhere in Julia, it’s standard to access a column with a colon:

julia> a = [1 2 3; 4 5 6; 7 8 9; 10 11 12]
4×3 Matrix{Int64}:
  1   2   3
  4   5   6
  7   8   9
 10  11  12

julia> a[:,2]
4-element Vector{Int64}:
  2
  5
  8
 11

but subbing a bang throws an error:

julia> a[!,2]
ERROR: ArgumentError: invalid index: ! of type typeof(!)
Stacktrace:
 [1] to_index(i::Function)
   @ Base ./indices.jl:300
 [2] to_index(A::Matrix{Int64}, i::Function)
   @ Base ./indices.jl:277
 [3] to_indices(A::Matrix{Int64}, inds::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, I::Tuple{typeof(!), Int64})
   @ Base ./indices.jl:333
 [4] to_indices
   @ ./indices.jl:324 [inlined]
 [5] getindex(::Matrix{Int64}, ::Function, ::Int64)
   @ Base ./abstractarray.jl:1241
 [6] top-level scope
   @ REPL[65]:1

Is there a reason for the inconsistent behavior? Is there a philosophy that I should keep in mind for when to use a colon vs when to use a bang?

From DataFrames.jl’s documentation:

https://dataframes.juliadata.org/stable/man/getting_started/

Columns can be directly (i.e. without copying) extracted using df.col, df."col", df[!, :col] or df[!, "col"] (this rule applies to getting data from a data frame, not writing data to a data frame). The two latter syntaxes are more flexible as they allow passing a variable holding the name of the column, and not only a literal name. Note that column names can be either symbols (written as :col, :var"col" or Symbol("col")) or strings (written as "col"). In the forms df."col" and :var"col" variable interpolation into a string using $ does not work. Columns can also be extracted using an integer index specifying their position.

Since df[!, :col] does not make a copy, changing the elements of the column vector returned by this syntax will affect the values stored in the original df. To get a copy of the column use df[:, :col]: changing the vector returned by this syntax does not change df.
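
As a minimal sketch of what that passage describes (assuming DataFrames.jl 1.x is installed; the variable names `no_copy` and `copied` are only illustrative), `===` can be used to check whether two names refer to the very same vector:

```julia
using DataFrames

df = DataFrame(name=["Sally", "Bob", "Alice", "Frank"], age=[48, 51, 36, 43])

no_copy = df[!, :age]   # the vector stored inside df, no copy made
copied  = df[:, :age]   # a freshly allocated copy of that vector

no_copy === df.age      # true: same object as the stored column
copied === df.age       # false: a different object ...
copied == df.age        # true: ... holding equal values

copied[2] = 99          # only the copy changes
df.age[2]               # 51

no_copy[2] = 99         # writes through to the column stored in df
df.age[2]               # 99
```

So for reading a column, `!` hands back the original and `:` hands back a copy, which is the same copying behavior `a[:, 2]` has on a plain Matrix.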

That seems kind of consistent with the naming convention of using ! to mark functions that mutate. So assigning something to df[!,"col"] follows the same thread, I suppose: the bang warns you that you are working on the original data frame, no?
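
On the assignment side the roles differ a little from reading, as the documentation's parenthetical hints. Here is a small sketch, again assuming DataFrames.jl 1.x, with `old`, `v` and `w` as made-up names: `df[:, col] = v` writes the values of `v` into the column that is already there, while `df[!, col] = w` swaps in `w` itself as the new column without copying it.

```julia
using DataFrames

df = DataFrame(age=[48, 51, 36, 43])
old = df.age            # the vector currently stored in df

v = [1, 2, 3, 4]
df[:, :age] = v         # in place: values of v are written into the existing column
df.age === old          # true: still the same stored vector
df.age === v            # false: v itself was not stored

w = [5, 6, 7, 8]
df[!, :age] = w         # replace: w becomes the stored column, no copy
df.age === w            # true
df.age === old          # false: the old vector is no longer part of df
```

In both cases df itself is modified; the bang only decides whether the existing column vector is reused or replaced.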

Oooh, I get it now. So if I do

a = df[:,2];
a[2] = 99; 

then I have a copy of the second column of df, and I’ve changed one of the values to 99, but nothing in df has changed. But if I do

b = df[!,2];
b[2] = 99; 

then I’ve futzed with the values in the original df itself. Got it.

Thanks for your help, y’all.
