DataFrame: how to change value of a cell without knowing the row number

In the following DataFrame, how do I change the age of Sally without knowing the row number and without changing any other values in the DataFrame?

I have looked at the DataFrames, DataframesMeta and Query documentation and could not figure out a clean way of doing it.

df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.], children=[3,5,2])
3×3 DataFrame
│ Row │ name   │ age     │ children │
│     │ String │ Float64 │ Int64    │
├─────┼────────┼─────────┼──────────┤
│ 1   │ John   │ 23.0    │ 3        │
│ 2   │ Sally  │ 42.0    │ 5        │
│ 3   │ Kirk   │ 59.0    │ 2        │

1 Like

This should work.

df[(df[:name] .== "Sally"),:age]=99

Thanks @ Ajaychat3

Wait a bit, here is something strange.

I have a DataFrame dx

julia> dx = collab_matrix()
249×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD    │ AE    │ AF    │ AG    │ AI    │ AL    │ AM    │ AO    │ AQ    │ AR    │ AS    │ AT    │ AU    │ AW    │
│     │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ AD      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 2   │ AE      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 3   │ AF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 4   │ AG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 5   │ AI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 6   │ AL      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 7   │ AM      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 8   │ AO      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 9   │ AQ      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 10  │ AR      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 11  │ AS      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 12  │ AT      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 13  │ AU      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
⋮
│ 236 │ VA      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 237 │ VC      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 238 │ VE      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 239 │ VG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 240 │ VI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 241 │ VN      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 242 │ VU      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 243 │ WF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 244 │ WS      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 245 │ YE      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 246 │ YT      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 247 │ ZA      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 248 │ ZM      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 249 │ ZW      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
julia> d = first(dx, 5)
5×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD    │ AE    │ AF    │ AG    │ AI    │ AL    │ AM    │ AO    │ AQ    │ AR    │ AS    │ AT    │ AU    │ AW    │
│     │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ AD      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 2   │ AE      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 3   │ AF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 4   │ AG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 5   │ AI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │

julia> d
5×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD    │ AE    │ AF    │ AG    │ AI    │ AL    │ AM    │ AO    │ AQ    │ AR    │ AS    │ AT    │ AU    │ AW    │
│     │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ AD      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 2   │ AE      │ 0     │ 0     │ 0     │ 6     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 3   │ AF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 4   │ AG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 5   │ AI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │

This works as expected, but

julia> dx[(dx[:country] .== "AE"),:AG]=6
6
julia> first(dx, 6)
6×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD    │ AE    │ AF    │ AG    │ AI    │ AL    │ AM    │ AO    │ AQ    │ AR    │ AS    │ AT    │ AU    │ AW    │
│     │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ AD      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 2   │ AE      │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │
│ 3   │ AF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 4   │ AG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 5   │ AI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 6   │ AL      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │

It changes the whole row as soon as I work with the bigger DataFrame!

A bug?

I have worked on quite big dataframes and haven’t noticed such an issue. Check if are using the latest version of packages.

This should not happen. Can you please submit a reproducible example as an Issue in DataFrames.jl?

As a side note - in the future we will probably make DataFrames.jl consistent with Base and assigning a scalar to a collection of row entries will require broadcasting using .=, but this is still being discussed.

How is collab_matrix() defined? I think the columns are all pointing to the same array, which happens if you do something like this:

julia> using DataFrames
julia> a = [0, 0, 0];
julia> dx = DataFrame(AD = a, AE = a);
julia> dx[1, 1] = 1;
julia> dx
3×2 DataFrame
│ Row │ AD    │ AE    │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 1     │
│ 2   │ 0     │ 0     │
│ 3   │ 0     │ 0     │
2 Likes

Thank you kdyrhage, you are on the money!

In my collab_matrix() I used exactly that method to populate each column with a series of zeros.

I had

l = repeat(0:0, inner = r)
for i in 1:r
        _,z = size(dx)
        e = c[:code][i]
        insertcols!(dx, z+1, Symbol(e) => l))

in my function. Changing insertcols! to

 insertcols!(dx, z+1, Symbol(e) => repeat(0:0, inner = r))

did the trick.

Valuable lesson learnt.

@bkamins I am wondering if there is a faster way to replace value of a cell in place?
I am working on an algorithm that does a lot of dataframes manipulation in place and does not seem to be running fast? or perhaps dataframes is not designed for these operations?

one operation seems like to be taking around 3 ms.

@btime data[data.Timestamp .== "20201231 165800", :Close] = [100]
  2.665 ms (8 allocations: 50.03 KiB)

Thank you

This is not related to DataFrames.jl. Use GitHub - andyferris/AcceleratedArrays.jl: Arrays with acceleration indices for :Timestamp column for such use cases if you need to be fast.

3 Likes

Expanding on @Ajaychat3’s answer for those wondering how to do this with multiple conditions, you can do this:

df[((df.name .== "Sally") .& (df.ancient .== true)), :age] .= 99

You have to use & because you are creating a BitArray, and you need to broadcast it because there are two of them. You may also need to broadcast the assignment (.=) to change multiple values.

You’ve got one redundant set of brackets there,

df[(df.name .== "Sally") .& (df.ancient .== true), :age] .= 99

will do.

2 Likes

it is allowing me change age to a number. im intersted to know how can i change “sally” age to missing ?
Thank you

df[(df.name .== "Sally") .& (df.ancient .== true), :age] .= missing

(actually given I complained about extra brackets above, now with broadcasted && we can do

df[df.name .== "Sally" .&& df.ancient .== true, :age] .= missing

but for this the age column has to be a Union{Missing, Int64}, so one might have to do allowmissing!(df, :age) first.

2 Likes

@nilshg thank you

df.age = @. ifelse(df.name == "Sally" && df.ancient, missing, df.age)

or with DataFramesMeta.jl

@rtransform :age = ifelse(:name == "Sally" && :ancient, missing, :age)
2 Likes

ancient column is not given in the initial query, can i ask how it there in the code ?

I am not sure what you ask about exactly. I have re-written the code that @nilshg proposed in his answer.

1 Like

sorry my mistake. i have got the answer from you and @nilshg replies. @johann.spies has only three columns in the dataframe when he created query. suddenly df.ancient started to appear in solutions. i did not understand the source , from whre df.ancient did come from. so i asked is that any reserved term or any other rationale behind that.

Thank you for clearing my doubt. this is just an curious question. but i got solution for my actual problem from your previous post.

1 Like

Maybe a better solution in terms of performance and how to handle missings.

@rtransform df @passmissing :age = :name == "Sally" ? missing : :age