In my dataframe I have several columns which values are string
I would like to extract those values and then doing itβs preprocessing
Thnak you for your help
Hey @Lamis_Njeh ,
Could you provide an example of what your data frame looks like (donβt need actual values, there - just examples).
Thanks!
You could do something like this:
df["SARS-Cov-2 exam result"] = [item == "negative" ? 0 : 1 for item in df["SARS-Cov-2 exam result"]]
Where df
is your DataFrame.
df."SARS-Cov-2 exam result" = ifelse.(df."SARS-Cov-2 exam result" .== "negative", 0, 1)
The package includes functions for data manipulation, e.g. transform
map_to_Int(s) = s == "negativ" ? 0 : (s == "positiv" ? 1 : missing)
col = Symbol("SARS-Cov-2 exam result")
transform!(df, col => ByRow(map_to_Int) => col)
thanks sir itβs very helpful
sir there is a column of string but it contains a lot of different values and Iβm not sure about doing that to all values ( an other column)
how could I be sure about all values in a column ?
thanks sir for your help
No problem!
Could you provide more information on these values?
My solution would be something like this based on what you are saying:
value_map = Dict("positive" => 1, "negative" => 0, "another_string" => 42)
df["SARS-Cov-2 exam result"] = [value_map[item] for item in df["SARS-Cov-2 exam result"]]
This is the most simple, straightforward approach that comes to my mind.
You would have to define the mappings in the dictionary.
If you provide an example of the other column, we could perhaps help better.
Iβm not sure if I understand your question. Itβs possible though to pass several arguments to the function that is applied to each row. E.g. the result could depend on some condition to column c2
function map_to_Int(c1, c2)
if c2 == some_condition
return c1 == "negative" ? 0 : (c1 == "positive" ? 1 : missing)
else
return 9
end
end
transform!(df, [:c1, :c2] => ByRow(map_to_Int) => :r)
using DataFramesMeta.jl
@rtransform df @passmissing $"SARS-Cov-2 exam result" == "positive" ? true : false
This seems like a different goal than what you had stated at the start of the post which is fine.
Initially, it appeared you wanted to map the string values found in one columns, βSARS-Cov-2 exam resultβ, to 1βs and 0βs.
I think the previous answers provided work well for what you were trying to do initially.
If you want to find every single unique item in all columns of a dataframe, you could do something like this:
> df = DataFrame(rand(["Kyrylo", "Layla", "Bob", "Noelle", "Deepak"], 3, 5), :auto)
> 3Γ5 DataFrame
Row β x1 x2 x3 x4 x5
β String String String String String
ββββββΌββββββββββββββββββββββββββββββββββββββββ
1 β Noelle Layla Kyrylo Bob Kyrylo
2 β Noelle Noelle Deepak Bob Bob
3 β Noelle Kyrylo Deepak Bob Layla
> for column in propertynames(df)
println("Column $column: $(df[:, column])")
end
Column x1: ["Noelle", "Noelle", "Noelle"]
Column x2: ["Layla", "Noelle", "Kyrylo"]
Column x3: ["Kyrylo", "Deepak", "Deepak"]
Column x4: ["Bob", "Bob", "Bob"]
Column x5: ["Kyrylo", "Bob", "Layla"]
So, using my previous answer, you could then replace values as needed:
>value_map = Dict("Kyrylo" => 1, "Layla" => 0, "Bob" => 3, "Noelle" => 5, "Deepak" => 17)
>for column in propertynames(df)
df[!, column] = [value_map[item] for item in df[:, column]]
end
> df
> 3Γ5 DataFrame
Row β x1 x2 x3 x4 x5
β Int64 Int64 Int64 Int64 Int64
ββββββΌβββββββββββββββββββββββββββββββββββ
1 β 5 0 1 3 1
2 β 5 5 17 3 3
3 β 5 1 17 3 0
So in summary, if I can make sure I know what you are asking, you want to know:
- How to find all unique values in a column of a dataframe
- Map values in a dataframe to another value
- Replace values in a dataframe given a mapping
Is that right?
I think this should answer all the questions you have had.
Let us know if that helped!
I try a different interpretation of the OPβs request.
He would like to apply the solution Negative => 0, Positive => 1, to other columns (BUT NOT ALL) for βdectedβ non-detected "strings.
Perhaps he fears that for the columns that have some values in common, changing one column can propagate it on the others, without the possibility of control.
PS
Is it possible (and how?) In julia a syntax of the print () function that produces a result like the one shown in the figure?
I would like first of all to detect all string values in my dataframe
then I whould like to replace it
thank you itβs very helpfull
itβs right but Iβm sorry that I have some troubles with english
i tried to simulate python output in julia.
I found a db with similar data to yours in this site
and I produced the following output β¦
julia> H
29Γ2 Matrix{String}:
"SARS-Cov-2 exam result----------------------------" "[negative, positive]"
"Respiratory Syncytial Virus-----------------------" "[missing, not_detected, detected]"
"Influenza A---------------------------------------" "[missing, not_detected, detected]"
"Influenza B---------------------------------------" "[missing, not_detected, detected]"
"Parainfluenza 1-----------------------------------" "[missing, not_detected, detected]"
"CoronavirusNL63-----------------------------------" "[missing, not_detected, detected]"
"Rhinovirus/Enterovirus----------------------------" "[missing, detected, not_detected]"
"Coronavirus HKU1----------------------------------" "[missing, not_detected, detected]"
"Parainfluenza 3-----------------------------------" "[missing, not_detected, detected]"
"Chlamydophila pneumoniae--------------------------" "[missing, not_detected, detected]"
"Adenovirus----------------------------------------" "[missing, not_detected, detected]"
"Parainfluenza 4-----------------------------------" "[missing, not_detected, detected]"
"Coronavirus229E-----------------------------------" "[missing, not_detected, detected]"
"CoronavirusOC43-----------------------------------" "[missing, not_detected, detected]"
"Inf A H1N1 2009-----------------------------------" "[missing, not_detected, detected]"
"Bordetella pertussis------------------------------" "[missing, not_detected, detected]"
"Metapneumovirus-----------------------------------" "[missing, not_detected, detected]"
"Parainfluenza 2-----------------------------------" "[missing, not_detected]"
"Influenza B, rapid test---------------------------" "[missing, negative, positive]"
"Influenza A, rapid test---------------------------" "[missing, negative, positive]"
"Strepto A-----------------------------------------" "[missing, positive, negative, not_done]"
"Urine - Esterase----------------------------------" "[missing, absent, not_done]"
"Urine - Hemoglobin--------------------------------" "[missing, absent, present, not_done]"
"Urine - Bile pigments-----------------------------" "[missing, absent, not_done]"
"Urine - Ketone Bodies-----------------------------" "[missing, absent, not_done]"
"Urine - Nitrite-----------------------------------" "[missing, not_done]"
"Urine - Urobilinogen------------------------------" "[missing, normal, not_done]"
"Urine - Protein-----------------------------------" "[missing, absent, not_done]"
"Urine - Color-------------------------------------" "[missing, light_yellow, yellow, orange, citrus_yellow]"
using DataFrames, CSV
df=CSV.read("dataset.csv",decimal=',', DataFrame)
describe(df)
df1=df[:,Not(["Patient ID","Urine - pH"])]
colstr=names(df1, eltype.(eachcol(df1)) .>: String15)
un=unique.(eachcol(df[:,colstr]))
re=r"\[([^]]+)\]"
m= @. replace(getproperty(match(re,string(un)),:match), "\""=>"")
H= hcat(rpad.(colstr,50,"-"),m)
below is the list of unique values present in the βstringsβ columns.
How do you want to replace these values with numbers?
julia> domain= union(un...)
13-element Vector{Union{Missing, String15}}:
"negative"
"positive"
missing
"not_detected"
"detected"
"not_done"
"absent"
"present"
"normal"
"light_yellow"
"yellow"
"orange"
"citrus_yellow"
Itβs no problem at all!
I am glad it helped.
Also, you may be interested in the package, PrettyTables.jl, as well for printing your tables.
For example, this could be printed:
Thank you sir itβs very helpful
You could do something like this:
df["SARS-Cov-2 exam result"] = Int64.(replace(df["SARS-Cov-2 exam result"], "negative"=>0, "positive"=>1))