How to conditionally select values in a dataframe and insert to a new column

Hi everyone,

I am trying to create a new dataframe column based on numeric conditions, but cannot seem to figure out the correct syntax to achieve this.

For example, to reproduce the issue

Source df

ptable = DataFrame( Number = [1, 2, 6, 8, 26 ])

Goal Add a new column named new_number with the value 7 when number < 10, otherwise value should be be missing

Current attempt

ptable[ptable[:Number] .< 10, :][!, :New_number] .= 7

While this shows 4 rows with the value 7th as an output of this command, this does not create a column named new_number in the ptable dataframe

julia> ptable.New_number = [x < 10 ? 7 : missing for x in ptable.Number]
5-element Array{Union{Missing, Int64},1}:
 7
 7
 7
 7
  missing

You can also use DataFramesMeta

julia> @transform(ptable, New_number = [x < 10 ? 7 : missing for x in :Number])

can I ask what software you are most familiar with using for data cleaning like this?

Thanks. I am used to doing this in pandas (in python), where the equivalent syntax would be

ptable.loc[ptable[‘Number’]<10 , ‘New_number’]=7

@pdeffebach: Can this syntax be extended for checking conditions from multiple columns eg x <10 from Number and x <2 from Number2?

yes. for multiple columns maybe it is best to use eachrow

ptable.New_number = map(eachrow(df)) do r
    if r.Number < 10 || r.otherVariable > 50 
        return missing
    else
        7 
    end
end

Making this kind of thing work easier is definitely something DataFrames could work on.

3 Likes

Great - thank you so much for pointing me in the right direction !