Hello, do you know an example in which a copy, df[:,:a], should be used instead of a view with df[!,:a]?
Thanks !
Please don’t revive a three-year-old thread like this. Make a new post instead.
That said, if you have a function that modifies a vector in-place, you will want to do df[:, :a] instead of df[!, :a].
function make_ten!(x)
    x[1] = 10
end
make_ten!(df[!, :a]) # modifies df
make_ten!(df[:, :a]) # does not modify df
Your statement and the comments in your example contradict each other?
Statement should read
if you have a function that modifies a vector in-place, you will want to do df[!, :a] instead of df[:, :a].
using DataFrames
df = DataFrame(rand(10,3),["a","b","c"])
function make_ten!(x)
    x[1] = 10
end
make_ten!(df[:, :a])
Using colon, df is not updated:
julia> df[!,:a]
10-element Vector{Float64}:
0.6133749814711238
0.9022335210145525
0.5930273630651568
0.25727987475397907
0.5368154177958848
0.9789575335373208
0.07748891516310152
0.8386526410191439
0.9929637176775048
0.5485057586874986
This is assuming you don’t want to modify df. If you want the changes the function makes to persist inside the data frame, then yes, use !.
Thanks @pdeffebach and @cchderrick. Sorry I revived a thread. Is it a problem because it was old? I thought that it was a very related question. From now on I will always create new threads.
Regarding the question, thanks for the answers. Would it be good if the compiler gave me a warning or an error when I try to do the following:
make_ten!(df[:, :a])
?
Because what I should use is either
make_ten(df[:, :a])
or
make_ten!(df[!, :a])
I mean so as to avoid mistakes.
Do you agree?
Not sure what you mean by “mistake”.
There’s nothing wrong with any of the three operations you give.
make_ten!(df[:, :a]) # Saves memory, doesn't modify data frame
make_ten(df[:, :a]) # Ultra-safe, doesn't modify data frame
make_ten!(df[!, :a]) # Saves memory, modifies data frame
There’s no “right answer”, it depends on what you want to do. And no, Julia’s compiler generally does not do this sort of thing.
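To make the three variants concrete, here is a minimal sketch. Note that make_ten (without the !) is not defined anywhere in this thread, so the version below is an assumed non-mutating counterpart that copies its argument first. I also made make_ten! return x (the original returns 10) so the results can be inspected.

```julia
using DataFrames

# Mutating function from earlier in the thread, changed to return x.
function make_ten!(x)
    x[1] = 10
    return x
end

# Hypothetical non-mutating counterpart (an assumption, not from the thread).
make_ten(x) = make_ten!(copy(x))

df = DataFrame(a = [1, 2, 3])

r1 = make_ten!(df[:, :a])   # mutates the fresh copy made by df[:, :a]
r2 = make_ten(df[:, :a])    # copies twice: once in getindex, once in make_ten
a_before = copy(df.a)       # snapshot of the column after the first two calls

r3 = make_ten!(df[!, :a])   # mutates the vector stored inside df
```

After the first two calls the column is unchanged; only the third call changes df.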
Thanks. By mistake I mean that I use
make_ten!(df[:, :a])
thinking that it will mutate df because of the ! in the function name.
Now, with your explanation, I see that there is a memory benefit in using that particular operation.
Thanks!
Once you do df[:, :a], that object (a vector), knows nothing about df. It has no connection at all with the data frame.
The same goes for df[!, :a] in the sense that its behavior does not depend on being from a data frame. However, df still shares the memory with df[!, :a].
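One way to check the memory sharing yourself is with ===, which tests whether two names refer to the same object. A small sketch (the variable names are only for illustration):

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3])

no_copy = df[!, :a]   # the very vector stored in df
a_copy  = df[:, :a]   # a freshly allocated vector with equal contents

shares_memory = no_copy === df[!, :a]   # same object as the stored column
is_detached   = a_copy !== df[!, :a]    # different object from the stored column

a_copy[1] = 100    # df is unaffected
no_copy[2] = 200   # the change is visible through df
```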
So if within the same scope I do:
df = DataFrame(x=[1,2,3], y=[4,5,6])
df[:,:x]=[7,8,9]
is like doing nothing, right? I mean because the new information of df[:,:x] is lost automatically.
No, that’s assigning (setindex) not retrieving (getindex).
df[:, :x] = [7,8,9]
does modify the data frame, since you are using setindex!, i.e. assigning the column.
df[:, :x] = ...
and df[:, :x] on its own do different things. But the intuition is the same with ! and : when you are doing setindex!.
julia> df = DataFrame(a = [1, 2, 3]);
julia> x = [5, 6, 7];
julia> y = [8,9, 10];
julia> df[:, :x] = x;
julia> df[!, :y] = y;
julia> x[1] = 100;
julia> df
3×3 DataFrame
 Row │ a      x      y
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      5      8
   2 │     2      6      9
   3 │     3      7     10
julia> y[1] = 100;
julia> df
3×3 DataFrame
 Row │ a      x      y
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      5    100
   2 │     2      6      9
   3 │     3      7     10
My recommendation is the following (of course it is only a recommendation):
- by default always use df[:, :col] as it is safer
- use df[!, :col] if
  - speed or memory consumption is important for you (and you are sure that if you modify the data you extracted you will not mess up the source data)
  - or you want to modify the contents of the column in-place
This applies to getting the column. For setting a column the difference is that df[!, :col] = ... replaces the column, while df[:, :col] = ... updates it in-place. The difference mostly matters when you want to assign values of a different type than the one originally stored in the given column.
Thanks. What is the meaning of “updates in place” ?
If I do df[!, :col] = y
and then I modify y, then df will be modified also?
This is a bit nuanced, but yes, df[:, :x] = ... will modify the vector stored in column :x directly. The reasoning for this behavior is a bit complicated, but it derives from the fact that this is how base Julia matrices behave.
julia> df = DataFrame(x=[1,2,3], y=[4,5,6])
3×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6
julia> t = df[!, :x];
julia> df[:, :x] = [40, 50, 60];
julia> t
3-element Vector{Int64}:
40
50
60
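For contrast, here is a small sketch of the same experiment followed by an assignment with !. As far as I understand, df[!, :x] = v stores v itself as the new column, so a previously extracted vector is left behind and later changes to v show up in df. The names t, t_after_colon and new_col are only for illustration.

```julia
using DataFrames

df = DataFrame(x = [1, 2, 3])
t = df[!, :x]               # t is the vector currently stored in column :x

df[:, :x] = [40, 50, 60]    # writes into the existing vector, so t sees it
t_after_colon = copy(t)     # snapshot of t at this point

new_col = [7, 8, 9]
df[!, :x] = new_col         # swaps in new_col itself; t is no longer the column

new_col[1] = 100            # visible in df, because df now holds new_col
```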
Thanks, I will have to practice all of this in the REPL because it’s all new to me.
Thanks, I think I am starting to understand but have some problem with the behavior in the assignments:
df[ ,:x] = ...
Could you give an example of this affirmation: “The difference mostly matters when you want to assign values of other type than originally stored in the given column.” ?
Best.
It’s very niche, but the most dramatic example is characters and integers. df[:, :a] = ... converts the new values to the element type the column already has (like Julia arrays), while df[!, :b] = ... replaces the column and keeps the element type of the new values. See:
julia> using DataFrames
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);
julia> x = ['p', 'u', 'z'];
julia> df[:, :x] = x;
julia> df[!, :b] = x;
julia> df[:, :a] = x;
julia> df
3×3 DataFrame
 Row │ a      b     x
     │ Int64  Char  Char
─────┼───────────────────
   1 │   112  p     p
   2 │   117  u     u
   3 │   122  z     z
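A related case is when the conversion is not possible at all. The sketch below assumes that in-place assignment converts each value to the column's element type, so assigning 1.5 into an Int64 column should fail (I would expect an InexactError), while replacing the column with ! just changes its element type. The names vals and colon_failed are only for illustration.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])
vals = [1.5, 2.5, 3.5]

# In-place assignment has to fit each Float64 into the existing Int64 column.
# 1.5 has no exact Int64 representation, so this is expected to throw.
colon_failed = try
    df[:, :a] = vals
    false
catch
    true
end

# Replacing the column has no such constraint: :b simply becomes Float64.
df[!, :b] = vals
```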
Thanks, I see.