I want to filter a dataframe by a column value and then modify another column in the original dataframe based on the filtered condition. How would I do this?

Julia1 · March 25, 2020, 4:09pm

Hi how’s it going? Let me give a simple example

Say we have a dataframe df with columns A,B and C. Column B holds integer values between 1 and 40. We then have a list K of a specific subset of column B values [1,2,3]

I want to modify the values of column C, only where the column B value of that same row is in list K. I’ve tried using filter but the original dataframe df does not get modified. It’s ironic because usually I’m looking for solutions of how to not modify the original dataframe. But can someone help me with this?

Thanks

bashonubuntu · March 25, 2020, 4:26pm

Can you post an example of the kind of DataFrame you are talking about? It’ll be easier to help you then.

pdeffebach · March 25, 2020, 4:38pm

You can do this using a view, as long as you only want to modify existing columns and not create new ones.

julia> df = DataFrame(rand(10,2))
10×2 DataFrame
│ Row │ x1        │ x2       │
│     │ Float64   │ Float64  │
├─────┼───────────┼──────────┤
│ 1   │ 0.671046  │ 0.349475 │
│ 2   │ 0.146962  │ 0.360163 │
│ 3   │ 0.0722192 │ 0.682262 │
│ 4   │ 0.215604  │ 0.640527 │
│ 5   │ 0.108089  │ 0.278778 │
│ 6   │ 0.634965  │ 0.145913 │
│ 7   │ 0.756691  │ 0.687655 │
│ 8   │ 0.269197  │ 0.726351 │
│ 9   │ 0.87518   │ 0.989163 │
│ 10  │ 0.0548732 │ 0.54023  │

julia> sdf = view(df, [1, 3, 5, 7, 9], :)
5×2 SubDataFrame
│ Row │ x1        │ x2       │
│     │ Float64   │ Float64  │
├─────┼───────────┼──────────┤
│ 1   │ 0.671046  │ 0.349475 │
│ 2   │ 0.0722192 │ 0.682262 │
│ 3   │ 0.108089  │ 0.278778 │
│ 4   │ 0.756691  │ 0.687655 │
│ 5   │ 0.87518   │ 0.989163 │

julia> sdf.x2 .= [1, 2, 3, 4, 5]
5-element view(::Array{Float64,1}, [1, 3, 5, 7, 9]) with eltype Float64:
 1.0
 2.0
 3.0
 4.0
 5.0

julia> df
10×2 DataFrame
│ Row │ x1        │ x2       │
│     │ Float64   │ Float64  │
├─────┼───────────┼──────────┤
│ 1   │ 0.671046  │ 1.0      │
│ 2   │ 0.146962  │ 0.360163 │
│ 3   │ 0.0722192 │ 2.0      │
│ 4   │ 0.215604  │ 0.640527 │
│ 5   │ 0.108089  │ 3.0      │
│ 6   │ 0.634965  │ 0.145913 │
│ 7   │ 0.756691  │ 4.0      │
│ 8   │ 0.269197  │ 0.726351 │
│ 9   │ 0.87518   │ 5.0      │
│ 10  │ 0.0548732 │ 0.54023  │

nilshg · March 26, 2020, 6:34am

As you say this is pretty much the normal behaviour using standard indexing, so this will work:

julia> using DataFrames

julia> df = DataFrame(a = 1:5, b = rand(5))
5×2 DataFrame
│ Row │ a     │ b        │
│     │ Int64 │ Float64  │
├─────┼───────┼──────────┤
│ 1   │ 1     │ 0.757122 │
│ 2   │ 2     │ 0.858397 │
│ 3   │ 3     │ 0.599429 │
│ 4   │ 4     │ 0.95401  │
│ 5   │ 5     │ 0.838269 │

julia> list_K = [2, 3]

julia> df[in(list_K).(df.a), :b] .= 10;

julia> df
5×2 DataFrame
│ Row │ a     │ b        │
│     │ Int64 │ Float64  │
├─────┼───────┼──────────┤
│ 1   │ 1     │ 0.757122 │
│ 2   │ 2     │ 10.0     │
│ 3   │ 3     │ 10.0     │
│ 4   │ 4     │ 0.95401  │
│ 5   │ 5     │ 0.838269 │

Topic		Replies	Views
Dataframe filtering based on conditions comparing one column to other columns New to Julia indexing , dataframes	6	2045	November 5, 2021
Dataframe Filter New to Julia question , dataframes	6	6264	March 26, 2022
Is it possible to use filter(:col => myFilter, df) with additional constant input variables? New to Julia question , dataframes	12	788	January 5, 2022
DataFrame: how to change value of a cell without knowing the row number Data dataframes	23	7154	January 25, 2023
Easiest way to do "replace col1 = col2 if col3" in a dataframe Data	3	429	December 16, 2020

I want to filter a dataframe by a column value and then modify another column in the original dataframe based on the filtered condition. How would I do this?

Related topics