Hello, do you know an example in which a copy, df[:,:a], should be used instead of a view with df[!,:a]?
Thanks !
Please don’t revive a three-year-old thread like this. Make a new post instead.
That said, if you have a function that modifies a vector in-place, you will want to do df[:, :a] instead of df[!, :a].
function make_ten!(x)
    x[1] = 10
end
make_ten!(df[!, :a]) # modifies df
make_ten!(df[:, :a]) # does not modify df
Your statement and the comment in your example contradict each other?
The statement should read:
if you have a function that modifies a vector in-place, you will want to do df[!, :a] instead of df[:, :a].
using DataFrames
df = DataFrame(rand(10,3),["a","b","c"])
function make_ten!(x)
    x[1] = 10
end
make_ten!(df[:, :a])
Using the colon, df is not updated:
julia> df[!,:a]
10-element Vector{Float64}:
0.6133749814711238
0.9022335210145525
0.5930273630651568
0.25727987475397907
0.5368154177958848
0.9789575335373208
0.07748891516310152
0.8386526410191439
0.9929637176775048
0.5485057586874986
This is assuming you don’t want to modify df. If you want the changes the function makes to persist inside the data frame, then yes, use !.
Thanks @pdeffebach and @cchderrick . Sorry I revived a thread. Is it a problem because it was old? I thought that it was a very related question. Now I will always create new threads.
Regarding the question, thanks for the answers. Would it be good if the compiler gave me a warning or an error when I try to do the following?
make_ten!(df[:, :a])
Because what I should use is either
make_ten(df[:, :a])
or
make_ten!(df[!, :a])
I mean so as to avoid mistakes.
Do you agree?
Not sure what you mean by “mistake”.
There’s nothing wrong with any of the three operations you give.
make_ten!(df[:, :a]) # Saves memory, doesn't modify data frame
make_ten(df[:, :a]) # Ultra-safe, doesn't modify data frame
make_ten!(df[!, :a]) # Saves memory, modifies data frame
There’s no “right answer”; it depends on what you want to do. And no, Julia’s compiler generally does not do this sort of thing.
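To make the three cases concrete, here is a minimal sketch. It assumes the DataFrames package is installed. The non-mutating make_ten is a hypothetical helper (it is not defined anywhere in this thread), and make_ten! gets an added `return x` so the helper can reuse it.

```julia
using DataFrames

# In-place version from the thread, with an added `return x`.
function make_ten!(x)
    x[1] = 10
    return x
end

# Hypothetical non-mutating counterpart: works on its own copy.
make_ten(x) = make_ten!(copy(x))

df = DataFrame(a = [1, 2, 3])

make_ten!(df[:, :a])        # mutates a fresh copy of the column
@assert df.a == [1, 2, 3]   # df is unchanged

make_ten(df[:, :a])         # copies again inside make_ten
@assert df.a == [1, 2, 3]   # df is unchanged

make_ten!(df[!, :a])        # mutates the vector stored in df
@assert df.a == [10, 2, 3]  # df has changed
```

Only the last call is visible through df, because only df[!, :a] hands the function the stored column itself.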
Thanks. By mistake I mean that I use
make_ten!(df[:, :a])
thinking that it will mutate df because of the ! in the function name.
Now, with your explanation, I see that there is a memory benefit in using that particular operation.
Thanks!
Once you do df[:, :a], that object (a vector) knows nothing about df. It has no connection at all with the data frame.
The same goes for df[!, :a] in the sense that its behavior does not depend on being from a data frame. However, df still shares the memory with df[!, :a].
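One way to see the sharing is to compare object identity with ===. This is a small sketch assuming the DataFrames package is installed; the variable names shared and copied are only for illustration.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3])

shared = df[!, :a]   # the vector stored in df, no copy made
copied = df[:, :a]   # a freshly allocated copy

@assert shared === df.a   # same object as the stored column
@assert copied !== df.a   # a different object...
@assert copied == df.a    # ...with equal contents

shared[1] = 10            # ordinary vector mutation
@assert df.a[1] == 10     # visible through df, since the memory is shared
@assert copied[1] == 1    # the copy is unaffected
```

Neither vector holds a reference back to df; the difference is only whether df and the vector point at the same memory.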
So if within the same scope I do:
df = DataFrame(x=[1,2,3], y=[4,5,6])
df[:,:x]=[7,8,9]
this is like doing nothing, right? I mean, because the new information in df[:,:x] is lost automatically.
No, that’s assigning (setindex!), not retrieving (getindex).
df[:, :x] = [7,8,9]
does modify the data frame, since you are using setindex!, i.e. assigning the column.
df[:, :x] = ... and df[:, :x] on its own do different things. But the intuition for ! and : is the same when you are doing setindex!:
julia> df = DataFrame(a = [1, 2, 3]);
julia> x = [5, 6, 7];
julia> y = [8,9, 10];
julia> df[:, :x] = x;
julia> df[!, :y] = y;
julia> x[1] = 100;
julia> df
3×3 DataFrame
 Row │ a      x      y
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      5      8
   2 │     2      6      9
   3 │     3      7     10
julia> y[1] = 100;
julia> df
3×3 DataFrame
 Row │ a      x      y
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      5    100
   2 │     2      6      9
   3 │     3      7     10
My recommendation is the following (of course it is only a recommendation):
- by default always use df[:, :col] as it is safer
- use df[!, :col] if
  - speed or memory consumption is important for you (and you are sure that if you modify the data you extracted you will not mess up the source data)
  - or you want to modify the contents of the column in-place
This applies to getting the column. For setting a column the difference is that df[!, :col] = ... replaces the column, while df[:, :col] = ... updates it in-place. The difference mostly matters when you want to assign values of a type other than the one originally stored in the given column.
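As a rough sketch of "replaces" versus "updates in-place" (assuming the DataFrames package is installed; the name stored is only for illustration), one can keep a handle on the stored vector and check what each assignment does to it:

```julia
using DataFrames

df = DataFrame(col = [1, 2, 3])
stored = df[!, :col]              # the vector currently held by df

df[:, :col] = [1.0, 2.0, 3.0]     # in place: values are converted to Int
@assert df[!, :col] === stored    # still the same vector object
@assert eltype(df.col) == Int     # element type is preserved

df[!, :col] = [1.0, 2.0, 3.0]     # replacement: df now holds the new vector
@assert df[!, :col] !== stored    # the old vector is no longer in df
@assert eltype(df.col) == Float64 # element type follows the new vector
```

The values here are whole numbers on purpose, so that the conversion to Int in the in-place case is exact.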
Thanks. What is the meaning of “updates in place”?
If I do df[!, :col] = y and then I modify y, then df will be modified also?
This is a bit nuanced, but yes, df[:, :x] = ... will modify the vector stored in column :x directly. The reasoning for this behavior is a bit complicated, but it derives from the fact that this is how Base Julia matrices behave.
julia> df = DataFrame(x=[1,2,3], y=[4,5,6])
3×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6
julia> t = df[!, :x];
julia> df[:, :x] = [40, 50, 60];
julia> t
3-element Vector{Int64}:
40
50
60
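For comparison, the Base Julia matrix behavior mentioned above can be sketched without any packages. The names A and first_col are only for illustration.

```julia
A = [1 4; 2 5; 3 6]            # 3×2 matrix of Int
first_col = view(A, :, 1)      # a view that shares memory with A

A[:, 1] = [40, 50, 60]         # writes into the existing storage
@assert first_col == [40, 50, 60]

A[:, 2] = [7.0, 8.0, 9.0]      # Float64 values are converted to Int
@assert A[:, 2] == [7, 8, 9]
@assert eltype(A) === Int      # the matrix keeps its element type
```

Indexed assignment with : into a matrix never swaps out the underlying storage, which is the behavior df[:, :x] = ... mirrors for an existing column.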
Thanks, I will have to practice all of this in the REPL because it’s all new to me.
Thanks, I think I am starting to understand but have some problem with the behavior in the assignments:
df[ ,:x] = ...
Could you give an example of this affirmation: “The difference mostly matters when you want to assign values of other type than originally stored in the given column.”?
Best.
It’s very niche, but the most dramatic example is characters and integers. df[:, :b] = ... will convert the new values to preserve the column’s existing type (like Julia arrays), while df[!, :b] = ... will preserve the type of the new addition. See:
julia> using DataFrames
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);
julia> x = ['p', 'u', 'z'];
julia> df[:, :x] = x;
julia> df[!, :b] = x;
julia> df[:, :a] = x;
julia> df
3×3 DataFrame
 Row │ a      b     x
     │ Int64  Char  Char
─────┼───────────────────
   1 │   112  p     p
   2 │   117  u     u
   3 │   122  z     z
Thanks, I see.