Julia1
March 25, 2020, 4:09pm
1
Hi how’s it going? Let me give a simple example
Say we have a dataframe df with columns A,B and C. Column B holds integer values between 1 and 40. We then have a list K of a specific subset of column B values [1,2,3]
I want to modify the values of column C, only where the column B value of that same row is in list K. I’ve tried using filter but the original dataframe df does not get modified. It’s ironic because usually I’m looking for solutions of how to not modify the original dataframe. But can someone help me with this?
Thanks
Can you post an example of the kind of DataFrame you are talking about? It’ll be easier to help you then.
You can do this using a view
, as long as you only want to modify existing columns and not create new ones.
julia> df = DataFrame(rand(10,2))
10×2 DataFrame
│ Row │ x1 │ x2 │
│ │ Float64 │ Float64 │
├─────┼───────────┼──────────┤
│ 1 │ 0.671046 │ 0.349475 │
│ 2 │ 0.146962 │ 0.360163 │
│ 3 │ 0.0722192 │ 0.682262 │
│ 4 │ 0.215604 │ 0.640527 │
│ 5 │ 0.108089 │ 0.278778 │
│ 6 │ 0.634965 │ 0.145913 │
│ 7 │ 0.756691 │ 0.687655 │
│ 8 │ 0.269197 │ 0.726351 │
│ 9 │ 0.87518 │ 0.989163 │
│ 10 │ 0.0548732 │ 0.54023 │
julia> sdf = view(df, [1, 3, 5, 7, 9], :)
5×2 SubDataFrame
│ Row │ x1 │ x2 │
│ │ Float64 │ Float64 │
├─────┼───────────┼──────────┤
│ 1 │ 0.671046 │ 0.349475 │
│ 2 │ 0.0722192 │ 0.682262 │
│ 3 │ 0.108089 │ 0.278778 │
│ 4 │ 0.756691 │ 0.687655 │
│ 5 │ 0.87518 │ 0.989163 │
julia> sdf.x2 .= [1, 2, 3, 4, 5]
5-element view(::Array{Float64,1}, [1, 3, 5, 7, 9]) with eltype Float64:
1.0
2.0
3.0
4.0
5.0
julia> df
10×2 DataFrame
│ Row │ x1 │ x2 │
│ │ Float64 │ Float64 │
├─────┼───────────┼──────────┤
│ 1 │ 0.671046 │ 1.0 │
│ 2 │ 0.146962 │ 0.360163 │
│ 3 │ 0.0722192 │ 2.0 │
│ 4 │ 0.215604 │ 0.640527 │
│ 5 │ 0.108089 │ 3.0 │
│ 6 │ 0.634965 │ 0.145913 │
│ 7 │ 0.756691 │ 4.0 │
│ 8 │ 0.269197 │ 0.726351 │
│ 9 │ 0.87518 │ 5.0 │
│ 10 │ 0.0548732 │ 0.54023 │
nilshg
March 26, 2020, 6:34am
4
As you say this is pretty much the normal behaviour using standard indexing, so this will work:
julia> using DataFrames
julia> df = DataFrame(a = 1:5, b = rand(5))
5×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Float64 │
├─────┼───────┼──────────┤
│ 1 │ 1 │ 0.757122 │
│ 2 │ 2 │ 0.858397 │
│ 3 │ 3 │ 0.599429 │
│ 4 │ 4 │ 0.95401 │
│ 5 │ 5 │ 0.838269 │
julia> list_K = [2, 3]
julia> df[in(list_K).(df.a), :b] .= 10;
julia> df
5×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Float64 │
├─────┼───────┼──────────┤
│ 1 │ 1 │ 0.757122 │
│ 2 │ 2 │ 10.0 │
│ 3 │ 3 │ 10.0 │
│ 4 │ 4 │ 0.95401 │
│ 5 │ 5 │ 0.838269 │
1 Like