In the following DataFrame, how do I change the age of Sally without knowing the row number and without changing any other values in the DataFrame?
I have looked at the DataFrames, DataframesMeta and Query documentation and could not figure out a clean way of doing it.
df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.], children=[3,5,2])
3×3 DataFrame
│ Row │ name │ age │ children │
│ │ String │ Float64 │ Int64 │
├─────┼────────┼─────────┼──────────┤
│ 1 │ John │ 23.0 │ 3 │
│ 2 │ Sally │ 42.0 │ 5 │
│ 3 │ Kirk │ 59.0 │ 2 │
1 Like
This should work.
df[(df[:name] .== "Sally"),:age]=99
Wait a bit, here is something strange.
I have a DataFrame dx
julia> dx = collab_matrix()
249×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD │ AE │ AF │ AG │ AI │ AL │ AM │ AO │ AQ │ AR │ AS │ AT │ AU │ AW │
│ │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1 │ AD │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 2 │ AE │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 3 │ AF │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 4 │ AG │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 5 │ AI │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 6 │ AL │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 7 │ AM │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 8 │ AO │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 9 │ AQ │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 10 │ AR │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 11 │ AS │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 12 │ AT │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 13 │ AU │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
⋮
│ 236 │ VA │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 237 │ VC │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 238 │ VE │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 239 │ VG │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 240 │ VI │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 241 │ VN │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 242 │ VU │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 243 │ WF │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 244 │ WS │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 245 │ YE │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 246 │ YT │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 247 │ ZA │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 248 │ ZM │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 249 │ ZW │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
julia> d = first(dx, 5)
5×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD │ AE │ AF │ AG │ AI │ AL │ AM │ AO │ AQ │ AR │ AS │ AT │ AU │ AW │
│ │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1 │ AD │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 2 │ AE │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 3 │ AF │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 4 │ AG │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 5 │ AI │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
julia> d
5×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD │ AE │ AF │ AG │ AI │ AL │ AM │ AO │ AQ │ AR │ AS │ AT │ AU │ AW │
│ │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1 │ AD │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 2 │ AE │ 0 │ 0 │ 0 │ 6 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 3 │ AF │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 4 │ AG │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 5 │ AI │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
This works as expected, but
julia> dx[(dx[:country] .== "AE"),:AG]=6
6
julia> first(dx, 6)
6×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD │ AE │ AF │ AG │ AI │ AL │ AM │ AO │ AQ │ AR │ AS │ AT │ AU │ AW │
│ │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1 │ AD │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 2 │ AE │ 6 │ 6 │ 6 │ 6 │ 6 │ 6 │ 6 │ 6 │ 6 │ 6 │ 6 │ 6 │ 6 │ 6 │
│ 3 │ AF │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 4 │ AG │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 5 │ AI │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 6 │ AL │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
It changes the whole row as soon as I work with the bigger DataFrame!
A bug?
I have worked on quite big dataframes and haven’t noticed such an issue. Check if are using the latest version of packages.
johann.spies:
A bug?
This should not happen. Can you please submit a reproducible example as an Issue in DataFrames.jl?
As a side note - in the future we will probably make DataFrames.jl consistent with Base and assigning a scalar to a collection of row entries will require broadcasting using .=
, but this is still being discussed.
How is collab_matrix()
defined? I think the columns are all pointing to the same array, which happens if you do something like this:
julia> using DataFrames
julia> a = [0, 0, 0];
julia> dx = DataFrame(AD = a, AE = a);
julia> dx[1, 1] = 1;
julia> dx
3×2 DataFrame
│ Row │ AD │ AE │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 1 │
│ 2 │ 0 │ 0 │
│ 3 │ 0 │ 0 │
2 Likes
Thank you kdyrhage , you are on the money!
In my collab_matrix()
I used exactly that method to populate each column with a series of zeros.
I had
l = repeat(0:0, inner = r)
for i in 1:r
_,z = size(dx)
e = c[:code][i]
insertcols!(dx, z+1, Symbol(e) => l))
in my function. Changing insertcols!
to
insertcols!(dx, z+1, Symbol(e) => repeat(0:0, inner = r))
did the trick.
Valuable lesson learnt.
ampbj
February 1, 2021, 11:43am
9
@bkamins I am wondering if there is a faster way to replace value of a cell in place?
I am working on an algorithm that does a lot of dataframes manipulation in place and does not seem to be running fast? or perhaps dataframes is not designed for these operations?
one operation seems like to be taking around 3 ms.
@btime data[data.Timestamp .== "20201231 165800", :Close] = [100]
2.665 ms (8 allocations: 50.03 KiB)
Thank you
bkamins
February 1, 2021, 1:11pm
10
This is not related to DataFrames.jl. Use GitHub - andyferris/AcceleratedArrays.jl: Arrays with acceleration indices for :Timestamp
column for such use cases if you need to be fast.
3 Likes
Expanding on @Ajaychat3 ’s answer for those wondering how to do this with multiple conditions, you can do this:
df[((df.name .== "Sally") .& (df.ancient .== true)), :age] .= 99
You have to use &
because you are creating a BitArray
, and you need to broadcast it because there are two of them. You may also need to broadcast the assignment (.=
) to change multiple values.
nilshg
May 12, 2021, 9:14pm
12
You’ve got one redundant set of brackets there,
df[(df.name .== "Sally") .& (df.ancient .== true), :age] .= 99
will do.
2 Likes
it is allowing me change age to a number. im intersted to know how can i change “sally” age to missing ?
Thank you
nilshg
February 14, 2022, 4:25pm
14
df[(df.name .== "Sally") .& (df.ancient .== true), :age] .= missing
(actually given I complained about extra brackets above, now with broadcasted &&
we can do
df[df.name .== "Sally" .&& df.ancient .== true, :age] .= missing
but for this the age
column has to be a Union{Missing, Int64}
, so one might have to do allowmissing!(df, :age)
first.
2 Likes
bkamins
February 14, 2022, 4:29pm
16
df.age = @. ifelse(df.name == "Sally" && df.ancient, missing, df.age)
or with DataFramesMeta.jl
@rtransform :age = ifelse(:name == "Sally" && :ancient, missing, :age)
2 Likes
ancient column is not given in the initial query, can i ask how it there in the code ?
bkamins
February 14, 2022, 4:37pm
18
I am not sure what you ask about exactly. I have re-written the code that @nilshg proposed in his answer.
1 Like
sorry my mistake. i have got the answer from you and @nilshg replies. @johann.spies has only three columns in the dataframe when he created query. suddenly df.ancient started to appear in solutions. i did not understand the source , from whre df.ancient did come from. so i asked is that any reserved term or any other rationale behind that.
Thank you for clearing my doubt. this is just an curious question. but i got solution for my actual problem from your previous post.
1 Like
Maybe a better solution in terms of performance and how to handle missings.
@rtransform df @passmissing :age = :name == "Sally" ? missing : :age