I want to filter a dataframe by a column value and then modify another column in the original dataframe based on the filtered condition. How would I do this?

Hi how’s it going? Let me give a simple example

Say we have a dataframe df with columns A,B and C. Column B holds integer values between 1 and 40. We then have a list K of a specific subset of column B values [1,2,3]

I want to modify the values of column C, only where the column B value of that same row is in list K. I’ve tried using filter but the original dataframe df does not get modified. It’s ironic because usually I’m looking for solutions of how to not modify the original dataframe. But can someone help me with this?

Thanks

Can you post an example of the kind of DataFrame you are talking about? It’ll be easier to help you then.

You can do this using a view, as long as you only want to modify existing columns and not create new ones.

julia> df = DataFrame(rand(10,2))
10×2 DataFrame
│ Row │ x1        │ x2       │
│     │ Float64   │ Float64  │
├─────┼───────────┼──────────┤
│ 1   │ 0.671046  │ 0.349475 │
│ 2   │ 0.146962  │ 0.360163 │
│ 3   │ 0.0722192 │ 0.682262 │
│ 4   │ 0.215604  │ 0.640527 │
│ 5   │ 0.108089  │ 0.278778 │
│ 6   │ 0.634965  │ 0.145913 │
│ 7   │ 0.756691  │ 0.687655 │
│ 8   │ 0.269197  │ 0.726351 │
│ 9   │ 0.87518   │ 0.989163 │
│ 10  │ 0.0548732 │ 0.54023  │

julia> sdf = view(df, [1, 3, 5, 7, 9], :)
5×2 SubDataFrame
│ Row │ x1        │ x2       │
│     │ Float64   │ Float64  │
├─────┼───────────┼──────────┤
│ 1   │ 0.671046  │ 0.349475 │
│ 2   │ 0.0722192 │ 0.682262 │
│ 3   │ 0.108089  │ 0.278778 │
│ 4   │ 0.756691  │ 0.687655 │
│ 5   │ 0.87518   │ 0.989163 │

julia> sdf.x2 .= [1, 2, 3, 4, 5]
5-element view(::Array{Float64,1}, [1, 3, 5, 7, 9]) with eltype Float64:
 1.0
 2.0
 3.0
 4.0
 5.0

julia> df
10×2 DataFrame
│ Row │ x1        │ x2       │
│     │ Float64   │ Float64  │
├─────┼───────────┼──────────┤
│ 1   │ 0.671046  │ 1.0      │
│ 2   │ 0.146962  │ 0.360163 │
│ 3   │ 0.0722192 │ 2.0      │
│ 4   │ 0.215604  │ 0.640527 │
│ 5   │ 0.108089  │ 3.0      │
│ 6   │ 0.634965  │ 0.145913 │
│ 7   │ 0.756691  │ 4.0      │
│ 8   │ 0.269197  │ 0.726351 │
│ 9   │ 0.87518   │ 5.0      │
│ 10  │ 0.0548732 │ 0.54023  │

As you say this is pretty much the normal behaviour using standard indexing, so this will work:

julia> using DataFrames

julia> df = DataFrame(a = 1:5, b = rand(5))
5×2 DataFrame
│ Row │ a     │ b        │
│     │ Int64 │ Float64  │
├─────┼───────┼──────────┤
│ 1   │ 1     │ 0.757122 │
│ 2   │ 2     │ 0.858397 │
│ 3   │ 3     │ 0.599429 │
│ 4   │ 4     │ 0.95401  │
│ 5   │ 5     │ 0.838269 │

julia> list_K = [2, 3]

julia> df[in(list_K).(df.a), :b] .= 10;

julia> df
5×2 DataFrame
│ Row │ a     │ b        │
│     │ Int64 │ Float64  │
├─────┼───────┼──────────┤
│ 1   │ 1     │ 0.757122 │
│ 2   │ 2     │ 10.0     │
│ 3   │ 3     │ 10.0     │
│ 4   │ 4     │ 0.95401  │
│ 5   │ 5     │ 0.838269 │
1 Like