DataFrame: how to change value of a cell without knowing the row number

In the following DataFrame, how do I change the age of Sally without knowing the row number and without changing any other values in the DataFrame?

I have looked at the DataFrames, DataframesMeta and Query documentation and could not figure out a clean way of doing it.

df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.], children=[3,5,2])
3×3 DataFrame
│ Row │ name   │ age     │ children │
│     │ String │ Float64 │ Int64    │
├─────┼────────┼─────────┼──────────┤
│ 1   │ John   │ 23.0    │ 3        │
│ 2   │ Sally  │ 42.0    │ 5        │
│ 3   │ Kirk   │ 59.0    │ 2        │

This should work.

df[(df[:name] .== "Sally"),:age]=99

Thanks @ Ajaychat3

Wait a bit, here is something strange.

I have a DataFrame dx

julia> dx = collab_matrix()
249×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD    │ AE    │ AF    │ AG    │ AI    │ AL    │ AM    │ AO    │ AQ    │ AR    │ AS    │ AT    │ AU    │ AW    │
│     │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ AD      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 2   │ AE      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 3   │ AF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 4   │ AG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 5   │ AI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 6   │ AL      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 7   │ AM      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 8   │ AO      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 9   │ AQ      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 10  │ AR      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 11  │ AS      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 12  │ AT      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 13  │ AU      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
⋮
│ 236 │ VA      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 237 │ VC      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 238 │ VE      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 239 │ VG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 240 │ VI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 241 │ VN      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 242 │ VU      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 243 │ WF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 244 │ WS      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 245 │ YE      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 246 │ YT      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 247 │ ZA      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 248 │ ZM      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 249 │ ZW      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
julia> d = first(dx, 5)
5×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD    │ AE    │ AF    │ AG    │ AI    │ AL    │ AM    │ AO    │ AQ    │ AR    │ AS    │ AT    │ AU    │ AW    │
│     │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ AD      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 2   │ AE      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 3   │ AF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 4   │ AG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 5   │ AI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │

julia> d
5×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD    │ AE    │ AF    │ AG    │ AI    │ AL    │ AM    │ AO    │ AQ    │ AR    │ AS    │ AT    │ AU    │ AW    │
│     │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ AD      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 2   │ AE      │ 0     │ 0     │ 0     │ 6     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 3   │ AF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 4   │ AG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 5   │ AI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │

This works as expected, but

julia> dx[(dx[:country] .== "AE"),:AG]=6
6
julia> first(dx, 6)
6×250 DataFrame. Omitted printing of 235 columns
│ Row │ country │ AD    │ AE    │ AF    │ AG    │ AI    │ AL    │ AM    │ AO    │ AQ    │ AR    │ AS    │ AT    │ AU    │ AW    │
│     │ String⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ AD      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 2   │ AE      │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │ 6     │
│ 3   │ AF      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 4   │ AG      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 5   │ AI      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │
│ 6   │ AL      │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │ 0     │

It changes the whole row as soon as I work with the bigger DataFrame!

A bug?

I have worked on quite big dataframes and haven’t noticed such an issue. Check if are using the latest version of packages.

This should not happen. Can you please submit a reproducible example as an Issue in DataFrames.jl?

As a side note - in the future we will probably make DataFrames.jl consistent with Base and assigning a scalar to a collection of row entries will require broadcasting using .=, but this is still being discussed.

How is collab_matrix() defined? I think the columns are all pointing to the same array, which happens if you do something like this:

julia> using DataFrames
julia> a = [0, 0, 0];
julia> dx = DataFrame(AD = a, AE = a);
julia> dx[1, 1] = 1;
julia> dx
3×2 DataFrame
│ Row │ AD    │ AE    │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 1     │
│ 2   │ 0     │ 0     │
│ 3   │ 0     │ 0     │
2 Likes

Thank you kdyrhage, you are on the money!

In my collab_matrix() I used exactly that method to populate each column with a series of zeros.

I had

l = repeat(0:0, inner = r)
for i in 1:r
        _,z = size(dx)
        e = c[:code][i]
        insertcols!(dx, z+1, Symbol(e) => l))

in my function. Changing insertcols! to

 insertcols!(dx, z+1, Symbol(e) => repeat(0:0, inner = r))

did the trick.

Valuable lesson learnt.

@bkamins I am wondering if there is a faster way to replace value of a cell in place?
I am working on an algorithm that does a lot of dataframes manipulation in place and does not seem to be running fast? or perhaps dataframes is not designed for these operations?

one operation seems like to be taking around 3 ms.

@btime data[data.Timestamp .== "20201231 165800", :Close] = [100]
  2.665 ms (8 allocations: 50.03 KiB)

Thank you

This is not related to DataFrames.jl. Use GitHub - andyferris/AcceleratedArrays.jl: Arrays with acceleration indices for :Timestamp column for such use cases if you need to be fast.

1 Like

Expanding on @Ajaychat3’s answer for those wondering how to do this with multiple conditions, you can do this:

df[((df.name .== "Sally") .& (df.ancient .== true)), :age] .= 99

You have to use & because you are creating a BitArray, and you need to broadcast it because there are two of them. You may also need to broadcast the assignment (.=) to change multiple values.

You’ve got one redundant set of brackets there,

df[(df.name .== "Sally") .& (df.ancient .== true), :age] .= 99

will do.

1 Like