Replace Values in a DataFrames Column with Another Value

In my dataframe I have several columns which values are string
I would like to extract those values and then doing it’s preprocessing
Thnak you for your help

Hey @Lamis_Njeh ,

Could you provide an example of what your data frame looks like (don’t need actual values, there - just examples).
Thanks!

1 Like


I would like to replace β€œnegative” with 0 and β€œpositive” with 1

You could do something like this:

df["SARS-Cov-2 exam result"] = [item == "negative" ? 0 : 1 for item in df["SARS-Cov-2 exam result"]]

Where df is your DataFrame.

1 Like
df."SARS-Cov-2 exam result" = ifelse.(df."SARS-Cov-2 exam result" .== "negative", 0, 1)

The package includes functions for data manipulation, e.g. transform

map_to_Int(s) = s == "negativ" ? 0 : (s == "positiv" ? 1 : missing)
col = Symbol("SARS-Cov-2 exam result")
transform!(df, col => ByRow(map_to_Int) => col)

thanks sir it’s very helpful
sir there is a column of string but it contains a lot of different values and I’m not sure about doing that to all values ( an other column)
how could I be sure about all values in a column ?
thanks sir for your help

No problem!
Could you provide more information on these values?

My solution would be something like this based on what you are saying:

value_map = Dict("positive" => 1, "negative" => 0, "another_string" => 42)
df["SARS-Cov-2 exam result"] = [value_map[item] for item in df["SARS-Cov-2 exam result"]]

This is the most simple, straightforward approach that comes to my mind.
You would have to define the mappings in the dictionary.
If you provide an example of the other column, we could perhaps help better.

I’m not sure if I understand your question. It’s possible though to pass several arguments to the function that is applied to each row. E.g. the result could depend on some condition to column c2

function map_to_Int(c1, c2)
    if c2 == some_condition
        return c1 == "negative" ? 0 : (c1 == "positive" ? 1 : missing)
    else
        return 9
    end
end

transform!(df, [:c1, :c2] => ByRow(map_to_Int) => :r)

using DataFramesMeta.jl

@rtransform df @passmissing $"SARS-Cov-2 exam result" == "positive" ? true : false

this is my goal but with python

This seems like a different goal than what you had stated at the start of the post which is fine.
Initially, it appeared you wanted to map the string values found in one columns, β€œSARS-Cov-2 exam result”, to 1’s and 0’s.
I think the previous answers provided work well for what you were trying to do initially.

If you want to find every single unique item in all columns of a dataframe, you could do something like this:

> df = DataFrame(rand(["Kyrylo", "Layla", "Bob", "Noelle", "Deepak"], 3, 5), :auto)
> 3Γ—5 DataFrame
 Row β”‚ x1      x2      x3      x4      x5
     β”‚ String  String  String  String  String
─────┼────────────────────────────────────────
   1 β”‚ Noelle  Layla   Kyrylo  Bob     Kyrylo
   2 β”‚ Noelle  Noelle  Deepak  Bob     Bob
   3 β”‚ Noelle  Kyrylo  Deepak  Bob     Layla

> for column in propertynames(df)
      println("Column $column:  $(df[:, column])")
  end

Column x1:  ["Noelle", "Noelle", "Noelle"]
Column x2:  ["Layla", "Noelle", "Kyrylo"]
Column x3:  ["Kyrylo", "Deepak", "Deepak"]
Column x4:  ["Bob", "Bob", "Bob"]
Column x5:  ["Kyrylo", "Bob", "Layla"]

So, using my previous answer, you could then replace values as needed:

>value_map = Dict("Kyrylo" => 1, "Layla" => 0, "Bob" => 3, "Noelle" => 5, "Deepak" => 17)
>for column in propertynames(df)
        df[!, column] = [value_map[item] for item in df[:, column]]
end
> df
> 3Γ—5 DataFrame
 Row β”‚ x1     x2     x3     x4     x5
     β”‚ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 β”‚     5      0      1      3      1
   2 β”‚     5      5     17      3      3
   3 β”‚     5      1     17      3      0

So in summary, if I can make sure I know what you are asking, you want to know:

  1. How to find all unique values in a column of a dataframe
  2. Map values in a dataframe to another value
  3. Replace values in a dataframe given a mapping

Is that right?
I think this should answer all the questions you have had.
Let us know if that helped!

1 Like

I try a different interpretation of the OP’s request.
He would like to apply the solution Negative => 0, Positive => 1, to other columns (BUT NOT ALL) for β€œdected” non-detected "strings.
Perhaps he fears that for the columns that have some values in common, changing one column can propagate it on the others, without the possibility of control.

PS
Is it possible (and how?) In julia a syntax of the print () function that produces a result like the one shown in the figure?

I would like first of all to detect all string values in my dataframe
then I whould like to replace it

thank you it’s very helpfull

it’s right but I’m sorry that I have some troubles with english

i tried to simulate python output in julia.
I found a db with similar data to yours in this site

and I produced the following output …

julia> H
29Γ—2 Matrix{String}:
 "SARS-Cov-2 exam result----------------------------"  "[negative, positive]"
 "Respiratory Syncytial Virus-----------------------"  "[missing, not_detected, detected]"
 "Influenza A---------------------------------------"  "[missing, not_detected, detected]"
 "Influenza B---------------------------------------"  "[missing, not_detected, detected]"
 "Parainfluenza 1-----------------------------------"  "[missing, not_detected, detected]"
 "CoronavirusNL63-----------------------------------"  "[missing, not_detected, detected]"
 "Rhinovirus/Enterovirus----------------------------"  "[missing, detected, not_detected]"
 "Coronavirus HKU1----------------------------------"  "[missing, not_detected, detected]"
 "Parainfluenza 3-----------------------------------"  "[missing, not_detected, detected]"
 "Chlamydophila pneumoniae--------------------------"  "[missing, not_detected, detected]"
 "Adenovirus----------------------------------------"  "[missing, not_detected, detected]"
 "Parainfluenza 4-----------------------------------"  "[missing, not_detected, detected]"
 "Coronavirus229E-----------------------------------"  "[missing, not_detected, detected]"
 "CoronavirusOC43-----------------------------------"  "[missing, not_detected, detected]"
 "Inf A H1N1 2009-----------------------------------"  "[missing, not_detected, detected]"
 "Bordetella pertussis------------------------------"  "[missing, not_detected, detected]"
 "Metapneumovirus-----------------------------------"  "[missing, not_detected, detected]"
 "Parainfluenza 2-----------------------------------"  "[missing, not_detected]"
 "Influenza B, rapid test---------------------------"  "[missing, negative, positive]"
 "Influenza A, rapid test---------------------------"  "[missing, negative, positive]"
 "Strepto A-----------------------------------------"  "[missing, positive, negative, not_done]"
 "Urine - Esterase----------------------------------"  "[missing, absent, not_done]"
 "Urine - Hemoglobin--------------------------------"  "[missing, absent, present, not_done]"
 "Urine - Bile pigments-----------------------------"  "[missing, absent, not_done]"
 "Urine - Ketone Bodies-----------------------------"  "[missing, absent, not_done]"
 "Urine - Nitrite-----------------------------------"  "[missing, not_done]"
 "Urine - Urobilinogen------------------------------"  "[missing, normal, not_done]"
 "Urine - Protein-----------------------------------"  "[missing, absent, not_done]"
 "Urine - Color-------------------------------------"  "[missing, light_yellow, yellow, orange, citrus_yellow]"
using DataFrames, CSV

df=CSV.read("dataset.csv",decimal=',', DataFrame)

describe(df)

df1=df[:,Not(["Patient ID","Urine - pH"])]

colstr=names(df1, eltype.(eachcol(df1)) .>: String15)

un=unique.(eachcol(df[:,colstr]))

re=r"\[([^]]+)\]"

m= @. replace(getproperty(match(re,string(un)),:match), "\""=>"")

H= hcat(rpad.(colstr,50,"-"),m)

below is the list of unique values present in the β€œstrings” columns.
How do you want to replace these values with numbers?

julia> domain= union(un...)
13-element Vector{Union{Missing, String15}}:
 "negative"
 "positive"
 missing
 "not_detected"
 "detected"
 "not_done"
 "absent"
 "present"
 "normal"
 "light_yellow"
 "yellow"
 "orange"
 "citrus_yellow"
1 Like

It’s no problem at all!
I am glad it helped. :slight_smile:

Also, you may be interested in the package, PrettyTables.jl, as well for printing your tables.
For example, this could be printed:

1 Like

Thank you sir it’s very helpful

You could do something like this:

df["SARS-Cov-2 exam result"] = Int64.(replace(df["SARS-Cov-2 exam result"], "negative"=>0, "positive"=>1))