Replace Values in a DataFrames Column with Another Value

Lamis_Njeh · April 15, 2022, 7:26pm

In my dataframe I have several columns which values are string
I would like to extract those values and then doing it’s preprocessing
Thnak you for your help

TheCedarPrince · April 15, 2022, 7:35pm

Hey @Lamis_Njeh ,

Could you provide an example of what your data frame looks like (don’t need actual values, there - just examples).
Thanks!

Lamis_Njeh · April 15, 2022, 7:41pm

I would like to replace “negative” with 0 and “positive” with 1

TheCedarPrince · April 15, 2022, 7:53pm

You could do something like this:

df["SARS-Cov-2 exam result"] = [item == "negative" ? 0 : 1 for item in df["SARS-Cov-2 exam result"]]

Where df is your DataFrame.

bkamins · April 15, 2022, 8:16pm

df."SARS-Cov-2 exam result" = ifelse.(df."SARS-Cov-2 exam result" .== "negative", 0, 1)

zweiglimmergneis · April 15, 2022, 8:21pm

The package includes functions for data manipulation, e.g. transform

map_to_Int(s) = s == "negativ" ? 0 : (s == "positiv" ? 1 : missing)
col = Symbol("SARS-Cov-2 exam result")
transform!(df, col => ByRow(map_to_Int) => col)

Lamis_Njeh · April 15, 2022, 9:08pm

thanks sir it’s very helpful
sir there is a column of string but it contains a lot of different values and I’m not sure about doing that to all values ( an other column)
how could I be sure about all values in a column ?
thanks sir for your help

TheCedarPrince · April 15, 2022, 9:29pm

No problem!
Could you provide more information on these values?

My solution would be something like this based on what you are saying:

value_map = Dict("positive" => 1, "negative" => 0, "another_string" => 42)
df["SARS-Cov-2 exam result"] = [value_map[item] for item in df["SARS-Cov-2 exam result"]]

This is the most simple, straightforward approach that comes to my mind.
You would have to define the mappings in the dictionary.
If you provide an example of the other column, we could perhaps help better.

zweiglimmergneis · April 15, 2022, 9:44pm

I’m not sure if I understand your question. It’s possible though to pass several arguments to the function that is applied to each row. E.g. the result could depend on some condition to column c2

function map_to_Int(c1, c2)
    if c2 == some_condition
        return c1 == "negative" ? 0 : (c1 == "positive" ? 1 : missing)
    else
        return 9
    end
end

transform!(df, [:c1, :c2] => ByRow(map_to_Int) => :r)

pdeffebach · April 15, 2022, 10:12pm

using DataFramesMeta.jl

@rtransform df @passmissing $"SARS-Cov-2 exam result" == "positive" ? true : false

Lamis_Njeh · April 15, 2022, 10:17pm

this is my goal but with python

TheCedarPrince · April 16, 2022, 12:21am

This seems like a different goal than what you had stated at the start of the post which is fine.
Initially, it appeared you wanted to map the string values found in one columns, “SARS-Cov-2 exam result”, to 1’s and 0’s.
I think the previous answers provided work well for what you were trying to do initially.

If you want to find every single unique item in all columns of a dataframe, you could do something like this:

> df = DataFrame(rand(["Kyrylo", "Layla", "Bob", "Noelle", "Deepak"], 3, 5), :auto)
> 3×5 DataFrame
 Row │ x1      x2      x3      x4      x5
     │ String  String  String  String  String
─────┼────────────────────────────────────────
   1 │ Noelle  Layla   Kyrylo  Bob     Kyrylo
   2 │ Noelle  Noelle  Deepak  Bob     Bob
   3 │ Noelle  Kyrylo  Deepak  Bob     Layla

> for column in propertynames(df)
      println("Column $column:  $(df[:, column])")
  end

Column x1:  ["Noelle", "Noelle", "Noelle"]
Column x2:  ["Layla", "Noelle", "Kyrylo"]
Column x3:  ["Kyrylo", "Deepak", "Deepak"]
Column x4:  ["Bob", "Bob", "Bob"]
Column x5:  ["Kyrylo", "Bob", "Layla"]

So, using my previous answer, you could then replace values as needed:

>value_map = Dict("Kyrylo" => 1, "Layla" => 0, "Bob" => 3, "Noelle" => 5, "Deepak" => 17)
>for column in propertynames(df)
        df[!, column] = [value_map[item] for item in df[:, column]]
end
> df
> 3×5 DataFrame
 Row │ x1     x2     x3     x4     x5
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     5      0      1      3      1
   2 │     5      5     17      3      3
   3 │     5      1     17      3      0

So in summary, if I can make sure I know what you are asking, you want to know:

How to find all unique values in a column of a dataframe
Map values in a dataframe to another value
Replace values in a dataframe given a mapping

Is that right?
I think this should answer all the questions you have had.
Let us know if that helped!

rocco_sprmnt21 · April 16, 2022, 1:40pm

I try a different interpretation of the OP’s request.
He would like to apply the solution Negative => 0, Positive => 1, to other columns (BUT NOT ALL) for “dected” non-detected "strings.
Perhaps he fears that for the columns that have some values in common, changing one column can propagate it on the others, without the possibility of control.

PS
Is it possible (and how?) In julia a syntax of the print () function that produces a result like the one shown in the figure?

Lamis_Njeh · April 17, 2022, 6:48pm

I would like first of all to detect all string values in my dataframe
then I whould like to replace it

Lamis_Njeh · April 18, 2022, 8:28pm

thank you it’s very helpfull

Lamis_Njeh · April 18, 2022, 8:42pm

it’s right but I’m sorry that I have some troubles with english

rocco_sprmnt21 · April 19, 2022, 8:46am

i tried to simulate python output in julia.
I found a db with similar data to yours in this site

and I produced the following output …

julia> H
29×2 Matrix{String}:
 "SARS-Cov-2 exam result----------------------------"  "[negative, positive]"
 "Respiratory Syncytial Virus-----------------------"  "[missing, not_detected, detected]"
 "Influenza A---------------------------------------"  "[missing, not_detected, detected]"
 "Influenza B---------------------------------------"  "[missing, not_detected, detected]"
 "Parainfluenza 1-----------------------------------"  "[missing, not_detected, detected]"
 "CoronavirusNL63-----------------------------------"  "[missing, not_detected, detected]"
 "Rhinovirus/Enterovirus----------------------------"  "[missing, detected, not_detected]"
 "Coronavirus HKU1----------------------------------"  "[missing, not_detected, detected]"
 "Parainfluenza 3-----------------------------------"  "[missing, not_detected, detected]"
 "Chlamydophila pneumoniae--------------------------"  "[missing, not_detected, detected]"
 "Adenovirus----------------------------------------"  "[missing, not_detected, detected]"
 "Parainfluenza 4-----------------------------------"  "[missing, not_detected, detected]"
 "Coronavirus229E-----------------------------------"  "[missing, not_detected, detected]"
 "CoronavirusOC43-----------------------------------"  "[missing, not_detected, detected]"
 "Inf A H1N1 2009-----------------------------------"  "[missing, not_detected, detected]"
 "Bordetella pertussis------------------------------"  "[missing, not_detected, detected]"
 "Metapneumovirus-----------------------------------"  "[missing, not_detected, detected]"
 "Parainfluenza 2-----------------------------------"  "[missing, not_detected]"
 "Influenza B, rapid test---------------------------"  "[missing, negative, positive]"
 "Influenza A, rapid test---------------------------"  "[missing, negative, positive]"
 "Strepto A-----------------------------------------"  "[missing, positive, negative, not_done]"
 "Urine - Esterase----------------------------------"  "[missing, absent, not_done]"
 "Urine - Hemoglobin--------------------------------"  "[missing, absent, present, not_done]"
 "Urine - Bile pigments-----------------------------"  "[missing, absent, not_done]"
 "Urine - Ketone Bodies-----------------------------"  "[missing, absent, not_done]"
 "Urine - Nitrite-----------------------------------"  "[missing, not_done]"
 "Urine - Urobilinogen------------------------------"  "[missing, normal, not_done]"
 "Urine - Protein-----------------------------------"  "[missing, absent, not_done]"
 "Urine - Color-------------------------------------"  "[missing, light_yellow, yellow, orange, citrus_yellow]"

using DataFrames, CSV

df=CSV.read("dataset.csv",decimal=',', DataFrame)

describe(df)

df1=df[:,Not(["Patient ID","Urine - pH"])]

colstr=names(df1, eltype.(eachcol(df1)) .>: String15)

un=unique.(eachcol(df[:,colstr]))

re=r"\[([^]]+)\]"

m= @. replace(getproperty(match(re,string(un)),:match), "\""=>"")

H= hcat(rpad.(colstr,50,"-"),m)

below is the list of unique values present in the “strings” columns.
How do you want to replace these values with numbers?

julia> domain= union(un...)
13-element Vector{Union{Missing, String15}}:
 "negative"
 "positive"
 missing
 "not_detected"
 "detected"
 "not_done"
 "absent"
 "present"
 "normal"
 "light_yellow"
 "yellow"
 "orange"
 "citrus_yellow"

TheCedarPrince · April 19, 2022, 2:31pm

It’s no problem at all!
I am glad it helped.

Also, you may be interested in the package, PrettyTables.jl, as well for printing your tables.
For example, this could be printed:

Lamis_Njeh · April 19, 2022, 6:36pm

Thank you sir it’s very helpful

nikit_goku · May 23, 2023, 9:21am

You could do something like this:

df["SARS-Cov-2 exam result"] = Int64.(replace(df["SARS-Cov-2 exam result"], "negative"=>0, "positive"=>1))