DataFrame: how to change value of a cell without knowing the row number

johann.spies · March 19, 2019, 8:15am

In the following DataFrame, how do I change the age of Sally without knowing the row number and without changing any other values in the DataFrame?

I have looked at the DataFrames, DataframesMeta and Query documentation and could not figure out a clean way of doing it.

df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.], children=[3,5,2])
3×3 DataFrame
│ Row │ name   │ age     │ children │
│     │ String │ Float64 │ Int64    │
├─────┼────────┼─────────┼──────────┤
│ 1   │ John   │ 23.0    │ 3        │
│ 2   │ Sally  │ 42.0    │ 5        │
│ 3   │ Kirk   │ 59.0    │ 2        │

Ajaychat3 · March 19, 2019, 8:18am

This should work.

df[(df[:name] .== "Sally"),:age]=99

johann.spies · March 19, 2019, 8:34am

Thanks @ Ajaychat3

johann.spies · March 19, 2019, 9:00am

Wait a bit, here is something strange.

I have a DataFrame dx

julia> dx = collab_matrix()
249×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD    │ AE    │ AF    │ AG    │ AI    │ AL    │ AM    │ AO    │ AQ    │ AR    │ AS    │ AT    │ AU    │ AW    │
│     │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ AD      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 2   │ AE      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 3   │ AF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 4   │ AG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 5   │ AI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 6   │ AL      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 7   │ AM      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 8   │ AO      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 9   │ AQ      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 10  │ AR      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 11  │ AS      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 12  │ AT      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 13  │ AU      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
⋮
│ 236 │ VA      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 237 │ VC      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 238 │ VE      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 239 │ VG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 240 │ VI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 241 │ VN      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 242 │ VU      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 243 │ WF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 244 │ WS      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 245 │ YE      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 246 │ YT      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 247 │ ZA      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 248 │ ZM      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 249 │ ZW      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │

julia> d = first(dx, 5)
5×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD    │ AE    │ AF    │ AG    │ AI    │ AL    │ AM    │ AO    │ AQ    │ AR    │ AS    │ AT    │ AU    │ AW    │
│     │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ AD      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 2   │ AE      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 3   │ AF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 4   │ AG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 5   │ AI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │

julia> d
5×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD    │ AE    │ AF    │ AG    │ AI    │ AL    │ AM    │ AO    │ AQ    │ AR    │ AS    │ AT    │ AU    │ AW    │
│     │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ AD      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 2   │ AE      │ 0     │ 0     │ 0     │ 6     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 3   │ AF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 4   │ AG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 5   │ AI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │

This works as expected, but

julia> dx[(dx[:country] .== "AE"),:AG]=6
6
julia> first(dx, 6)
6×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD    │ AE    │ AF    │ AG    │ AI    │ AL    │ AM    │ AO    │ AQ    │ AR    │ AS    │ AT    │ AU    │ AW    │
│     │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ AD      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 2   │ AE      │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │
│ 3   │ AF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 4   │ AG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 5   │ AI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 6   │ AL      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │

It changes the whole row as soon as I work with the bigger DataFrame!

A bug?

Ajaychat3 · March 19, 2019, 10:31am

I have worked on quite big dataframes and haven’t noticed such an issue. Check if are using the latest version of packages.

bkamins · March 19, 2019, 10:35am

This should not happen. Can you please submit a reproducible example as an Issue in DataFrames.jl?

As a side note - in the future we will probably make DataFrames.jl consistent with Base and assigning a scalar to a collection of row entries will require broadcasting using .=, but this is still being discussed.

kdyrhage · March 19, 2019, 11:55am

How is collab_matrix() defined? I think the columns are all pointing to the same array, which happens if you do something like this:

julia> using DataFrames
julia> a = [0, 0, 0];
julia> dx = DataFrame(AD = a, AE = a);
julia> dx[1, 1] = 1;
julia> dx
3×2 DataFrame
│ Row │ AD    │ AE    │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 1     │
│ 2   │ 0     │ 0     │
│ 3   │ 0     │ 0     │

johann.spies · March 19, 2019, 2:26pm

Thank you kdyrhage, you are on the money!

In my collab_matrix() I used exactly that method to populate each column with a series of zeros.

I had

l = repeat(0:0, inner = r)
for i in 1:r
        _,z = size(dx)
        e = c[:code][i]
        insertcols!(dx, z+1, Symbol(e) => l))

in my function. Changing insertcols! to

 insertcols!(dx, z+1, Symbol(e) => repeat(0:0, inner = r))

did the trick.

Valuable lesson learnt.

ampbj · February 1, 2021, 11:43am

@bkamins I am wondering if there is a faster way to replace value of a cell in place?
I am working on an algorithm that does a lot of dataframes manipulation in place and does not seem to be running fast? or perhaps dataframes is not designed for these operations?

one operation seems like to be taking around 3 ms.

@btime data[data.Timestamp .== "20201231 165800", :Close] = [100]
  2.665 ms (8 allocations: 50.03 KiB)

Thank you

bkamins · February 1, 2021, 1:11pm

This is not related to DataFrames.jl. Use GitHub - andyferris/AcceleratedArrays.jl: Arrays with acceleration indices for :Timestamp column for such use cases if you need to be fast.

jakewilliami · May 12, 2021, 9:08pm

Expanding on @Ajaychat3’s answer for those wondering how to do this with multiple conditions, you can do this:

df[((df.name .== "Sally") .& (df.ancient .== true)), :age] .= 99

You have to use & because you are creating a BitArray, and you need to broadcast it because there are two of them. You may also need to broadcast the assignment (.=) to change multiple values.

nilshg · May 12, 2021, 9:14pm

You’ve got one redundant set of brackets there,

df[(df.name .== "Sally") .& (df.ancient .== true), :age] .= 99

will do.

sai_matcha · February 14, 2022, 4:23pm

it is allowing me change age to a number. im intersted to know how can i change “sally” age to missing ?
Thank you

nilshg · February 14, 2022, 4:25pm

df[(df.name .== "Sally") .& (df.ancient .== true), :age] .= missing

(actually given I complained about extra brackets above, now with broadcasted && we can do

df[df.name .== "Sally" .&& df.ancient .== true, :age] .= missing

but for this the age column has to be a Union{Missing, Int64}, so one might have to do allowmissing!(df, :age) first.

sai_matcha · February 14, 2022, 4:29pm

@nilshg thank you

bkamins · February 14, 2022, 4:29pm

df.age = @. ifelse(df.name == "Sally" && df.ancient, missing, df.age)

or with DataFramesMeta.jl

@rtransform :age = ifelse(:name == "Sally" && :ancient, missing, :age)

sai_matcha · February 14, 2022, 4:35pm

ancient column is not given in the initial query, can i ask how it there in the code ?

bkamins · February 14, 2022, 4:37pm

I am not sure what you ask about exactly. I have re-written the code that @nilshg proposed in his answer.

sai_matcha · February 14, 2022, 4:41pm

sorry my mistake. i have got the answer from you and @nilshg replies. @johann.spies has only three columns in the dataframe when he created query. suddenly df.ancient started to appear in solutions. i did not understand the source , from whre df.ancient did come from. so i asked is that any reserved term or any other rationale behind that.

Thank you for clearing my doubt. this is just an curious question. but i got solution for my actual problem from your previous post.

pdeffebach · February 14, 2022, 5:17pm

Maybe a better solution in terms of performance and how to handle missings.

@rtransform df @passmissing :age = :name == "Sally" ? missing : :age

Topic		Replies	Views
How to replace/set/change value in array/dataframe if value is less/greater than? New to Julia	3	5357	December 1, 2020
Mutate a new variable with row numbers Data	4	1385	November 12, 2019
How to add/subtract values in a dataframe or array column? New to Julia	5	2771	August 24, 2020
Why is it so complicated to access a row in a DataFrame? General Usage dataframes	13	1958	August 25, 2023
Change column to row using conditions New to Julia dataframes	4	455	June 23, 2022

DataFrame: how to change value of a cell without knowing the row number

Related topics