Hmz
1
Hi everyone,
I am trying to create a new dataframe column based on numeric conditions, but cannot seem to figure out the correct syntax to achieve this.
For example, to reproduce the issue
Source df
ptable = DataFrame( Number = [1, 2, 6, 8, 26 ])
Goal Add a new column named new_number with the value 7 when number < 10, otherwise value should be be missing
Current attempt
ptable[ptable[:Number] .< 10, :][!, :New_number] .= 7
While this shows 4 rows with the value 7th as an output of this command, this does not create a column named new_number in the ptable dataframe
julia> ptable.New_number = [x < 10 ? 7 : missing for x in ptable.Number]
5-element Array{Union{Missing, Int64},1}:
7
7
7
7
missing
You can also use DataFramesMeta
julia> @transform(ptable, New_number = [x < 10 ? 7 : missing for x in :Number])
can I ask what software you are most familiar with using for data cleaning like this?
Hmz
4
Thanks. I am used to doing this in pandas (in python), where the equivalent syntax would be
ptable.loc[ptable[‘Number’]<10 , ‘New_number’]=7
Hmz
5
@pdeffebach: Can this syntax be extended for checking conditions from multiple columns eg x <10 from Number and x <2 from Number2?
yes. for multiple columns maybe it is best to use eachrow
ptable.New_number = map(eachrow(df)) do r
if r.Number < 10 || r.otherVariable > 50
return missing
else
7
end
end
Making this kind of thing work easier is definitely something DataFrames could work on.
3 Likes
Hmz
7
Great - thank you so much for pointing me in the right direction !