In my dataframe I have several columns which values are string
I would like to extract those values and then doing it’s preprocessing
Thnak you for your help
Hey @Lamis_Njeh ,
Could you provide an example of what your data frame looks like (don’t need actual values, there - just examples).
Thanks!
You could do something like this:
df["SARS-Cov-2 exam result"] = [item == "negative" ? 0 : 1 for item in df["SARS-Cov-2 exam result"]]
Where df is your DataFrame.
df."SARS-Cov-2 exam result" = ifelse.(df."SARS-Cov-2 exam result" .== "negative", 0, 1)
The package includes functions for data manipulation, e.g. transform
map_to_Int(s) = s == "negativ" ? 0 : (s == "positiv" ? 1 : missing)
col = Symbol("SARS-Cov-2 exam result")
transform!(df, col => ByRow(map_to_Int) => col)
thanks sir it’s very helpful
sir there is a column of string but it contains a lot of different values and I’m not sure about doing that to all values ( an other column)
how could I be sure about all values in a column ?
thanks sir for your help
No problem!
Could you provide more information on these values?
My solution would be something like this based on what you are saying:
value_map = Dict("positive" => 1, "negative" => 0, "another_string" => 42)
df["SARS-Cov-2 exam result"] = [value_map[item] for item in df["SARS-Cov-2 exam result"]]
This is the most simple, straightforward approach that comes to my mind.
You would have to define the mappings in the dictionary.
If you provide an example of the other column, we could perhaps help better.
I’m not sure if I understand your question. It’s possible though to pass several arguments to the function that is applied to each row. E.g. the result could depend on some condition to column c2
function map_to_Int(c1, c2)
if c2 == some_condition
return c1 == "negative" ? 0 : (c1 == "positive" ? 1 : missing)
else
return 9
end
end
transform!(df, [:c1, :c2] => ByRow(map_to_Int) => :r)
using DataFramesMeta.jl
@rtransform df @passmissing $"SARS-Cov-2 exam result" == "positive" ? true : false
This seems like a different goal than what you had stated at the start of the post which is fine.
Initially, it appeared you wanted to map the string values found in one columns, “SARS-Cov-2 exam result”, to 1’s and 0’s.
I think the previous answers provided work well for what you were trying to do initially.
If you want to find every single unique item in all columns of a dataframe, you could do something like this:
> df = DataFrame(rand(["Kyrylo", "Layla", "Bob", "Noelle", "Deepak"], 3, 5), :auto)
> 3×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ String String String String String
─────┼────────────────────────────────────────
1 │ Noelle Layla Kyrylo Bob Kyrylo
2 │ Noelle Noelle Deepak Bob Bob
3 │ Noelle Kyrylo Deepak Bob Layla
> for column in propertynames(df)
println("Column $column: $(df[:, column])")
end
Column x1: ["Noelle", "Noelle", "Noelle"]
Column x2: ["Layla", "Noelle", "Kyrylo"]
Column x3: ["Kyrylo", "Deepak", "Deepak"]
Column x4: ["Bob", "Bob", "Bob"]
Column x5: ["Kyrylo", "Bob", "Layla"]
So, using my previous answer, you could then replace values as needed:
>value_map = Dict("Kyrylo" => 1, "Layla" => 0, "Bob" => 3, "Noelle" => 5, "Deepak" => 17)
>for column in propertynames(df)
df[!, column] = [value_map[item] for item in df[:, column]]
end
> df
> 3×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Int64 Int64 Int64 Int64 Int64
─────┼───────────────────────────────────
1 │ 5 0 1 3 1
2 │ 5 5 17 3 3
3 │ 5 1 17 3 0
So in summary, if I can make sure I know what you are asking, you want to know:
- How to find all unique values in a column of a dataframe
- Map values in a dataframe to another value
- Replace values in a dataframe given a mapping
Is that right?
I think this should answer all the questions you have had.
Let us know if that helped!
I try a different interpretation of the OP’s request.
He would like to apply the solution Negative => 0, Positive => 1, to other columns (BUT NOT ALL) for “dected” non-detected "strings.
Perhaps he fears that for the columns that have some values in common, changing one column can propagate it on the others, without the possibility of control.
PS
Is it possible (and how?) In julia a syntax of the print () function that produces a result like the one shown in the figure?
I would like first of all to detect all string values in my dataframe
then I whould like to replace it
thank you it’s very helpfull
it’s right but I’m sorry that I have some troubles with english
i tried to simulate python output in julia.
I found a db with similar data to yours in this site
and I produced the following output …
julia> H
29×2 Matrix{String}:
"SARS-Cov-2 exam result----------------------------" "[negative, positive]"
"Respiratory Syncytial Virus-----------------------" "[missing, not_detected, detected]"
"Influenza A---------------------------------------" "[missing, not_detected, detected]"
"Influenza B---------------------------------------" "[missing, not_detected, detected]"
"Parainfluenza 1-----------------------------------" "[missing, not_detected, detected]"
"CoronavirusNL63-----------------------------------" "[missing, not_detected, detected]"
"Rhinovirus/Enterovirus----------------------------" "[missing, detected, not_detected]"
"Coronavirus HKU1----------------------------------" "[missing, not_detected, detected]"
"Parainfluenza 3-----------------------------------" "[missing, not_detected, detected]"
"Chlamydophila pneumoniae--------------------------" "[missing, not_detected, detected]"
"Adenovirus----------------------------------------" "[missing, not_detected, detected]"
"Parainfluenza 4-----------------------------------" "[missing, not_detected, detected]"
"Coronavirus229E-----------------------------------" "[missing, not_detected, detected]"
"CoronavirusOC43-----------------------------------" "[missing, not_detected, detected]"
"Inf A H1N1 2009-----------------------------------" "[missing, not_detected, detected]"
"Bordetella pertussis------------------------------" "[missing, not_detected, detected]"
"Metapneumovirus-----------------------------------" "[missing, not_detected, detected]"
"Parainfluenza 2-----------------------------------" "[missing, not_detected]"
"Influenza B, rapid test---------------------------" "[missing, negative, positive]"
"Influenza A, rapid test---------------------------" "[missing, negative, positive]"
"Strepto A-----------------------------------------" "[missing, positive, negative, not_done]"
"Urine - Esterase----------------------------------" "[missing, absent, not_done]"
"Urine - Hemoglobin--------------------------------" "[missing, absent, present, not_done]"
"Urine - Bile pigments-----------------------------" "[missing, absent, not_done]"
"Urine - Ketone Bodies-----------------------------" "[missing, absent, not_done]"
"Urine - Nitrite-----------------------------------" "[missing, not_done]"
"Urine - Urobilinogen------------------------------" "[missing, normal, not_done]"
"Urine - Protein-----------------------------------" "[missing, absent, not_done]"
"Urine - Color-------------------------------------" "[missing, light_yellow, yellow, orange, citrus_yellow]"
using DataFrames, CSV
df=CSV.read("dataset.csv",decimal=',', DataFrame)
describe(df)
df1=df[:,Not(["Patient ID","Urine - pH"])]
colstr=names(df1, eltype.(eachcol(df1)) .>: String15)
un=unique.(eachcol(df[:,colstr]))
re=r"\[([^]]+)\]"
m= @. replace(getproperty(match(re,string(un)),:match), "\""=>"")
H= hcat(rpad.(colstr,50,"-"),m)
below is the list of unique values present in the “strings” columns.
How do you want to replace these values with numbers?
julia> domain= union(un...)
13-element Vector{Union{Missing, String15}}:
"negative"
"positive"
missing
"not_detected"
"detected"
"not_done"
"absent"
"present"
"normal"
"light_yellow"
"yellow"
"orange"
"citrus_yellow"
It’s no problem at all!
I am glad it helped. ![]()
Also, you may be interested in the package, PrettyTables.jl, as well for printing your tables.
For example, this could be printed:
Thank you sir it’s very helpful
You could do something like this:
df["SARS-Cov-2 exam result"] = Int64.(replace(df["SARS-Cov-2 exam result"], "negative"=>0, "positive"=>1))


