I want to replace all the numerical (Int) values of a column in a DataFrame with a string (“ASD”), but I am getting an error saying this isn’t possible. I also tried to use “1”, in which case it doesn’t return an error, but it doesn’t do the conversion either (I suppose because 1 isn’t a string in the first place).
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Int64
Closest candidates are:
convert(::Type{T}, ::T) where T<:Number at number.jl:6
convert(::Type{T}, ::Number) where T<:Number at number.jl:7
convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
...
Stacktrace:
[1] setindex!(A::Vector{Int64}, x::String, i1::Int64)
@ Base ./array.jl:843
[2] _replace!(new::Base.var"#new#295"{Tuple{Pair{Int64, String}}}, res::Vector{Int64}, A::Vector{Int64}, count::Int64)
@ Base ./set.jl:665
[3] replace_pairs!
@ ./set.jl:488 [inlined]
[4] #replace!#294
@ ./set.jl:478 [inlined]
[5] replace!(A::Vector{Int64}, old_new::Pair{Int64, String})
@ Base ./set.jl:478
[6] top-level scope
@ REPL[38]:1
julia> df = DataFrame(DX_GROUP = [1,2,2,2,1,1,1,], TEST = [3,4,6,2,2,3,2]);
julia> typeof(df.DX_GROUP)
Vector{Int64} (alias for Array{Int64, 1})
replace! operates in place (hence the !), so you are trying to store a string in a vector of integers (Vector{Int64}). You need to allocate a new vector which can hold both integers and strings, so use replace without the ! and assign the result to your existing column:
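A minimal sketch of that approach, using the example DataFrame from the question and assuming DataFrames.jl is loaded:

```julia
using DataFrames

df = DataFrame(DX_GROUP = [1, 2, 2, 2, 1, 1, 1], TEST = [3, 4, 6, 2, 2, 3, 2])

# replace (without !) allocates a new vector instead of writing into the
# existing Vector{Int64}; assigning the result back swaps out the whole column.
df.DX_GROUP = replace(df.DX_GROUP, 1 => "ASD")

# The column now mixes strings and integers, so its element type is no longer Int64.
eltype(df.DX_GROUP)
```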
Actually, I should have added that from a performance perspective this isn’t exactly innocuous: as you see from my post above, replacing some integers with strings ends up creating a Vector{Any}. This is a dreaded object in Julia, as it basically means you are hiding the type of your data from the compiler, which prevents it from generating efficient machine code (essentially you are then dropping down to the performance of “true” dynamic languages without type inference, like Python).
If you know that your vector will have to hold both strings and integers, you should probably be explicit about this from the get-go:
julia> df = DataFrame(DX_GROUP = Union{Int64, String}[1,2,2,2,1,1,1], TEST = [3,4,6,2,2,3,2]);
If you do this, you can actually use replace! in place:
The Julia compiler can optimize small union types fairly well, so a Vector{Union{String, Int64}} should have decent performance (although, as always, make sure you benchmark for your use case).
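A short sketch of the in-place version, rebuilding the DataFrame so the snippet runs on its own (again assuming DataFrames.jl):

```julia
using DataFrames

df = DataFrame(DX_GROUP = Union{Int64, String}[1, 2, 2, 2, 1, 1, 1],
               TEST = [3, 4, 6, 2, 2, 3, 2])

# The column's element type already admits strings, so writing "ASD" into the
# existing vector no longer requires a conversion to Int64.
replace!(df.DX_GROUP, 1 => "ASD")

# The declared element type is preserved.
eltype(df.DX_GROUP)
```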
It’s a bit hard to give more concrete advice without knowing what you are looking to do, but I will say that having a mixed integer/string vector seems slightly odd. Are you maybe after something like a CategoricalArray?
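For reference, a categorical column could look roughly like the sketch below. It assumes the CategoricalArrays.jl package is installed, and the labels are made up for illustration:

```julia
using CategoricalArrays

# Hypothetical group labels, for illustration only
dx = categorical(["ASD", "TD", "TD", "TD", "ASD"])

# The distinct categories the array knows about
levels(dx)

# The integer code stored for each entry
levelcode.(dx)
```

Each entry is stored as a small integer code that refers to one of the levels, which tends to suit grouping and plotting.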
Well, that was just a minimal example to replicate the issue I was facing. The actual DataFrame I am using is of mixed type. It has about 1000 rows and 123 columns with various phenotypical descriptions of my dataset.
It is a CSV file that I imported as a DataFrame, where I just wanted to replace the values 1/2 with ASD/TD, and replace every value set to -9999 with missing (the data type), in order to create more descriptive groupings for my analysis and my plots.
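One way to do all three replacements in a single call is to pass several pairs to replace. This is a sketch on a made-up sample vector rather than the real column:

```julia
# Hypothetical sample of the coded column, including the -9999 sentinel
dx_group = [1, 2, 2, -9999, 1]

# Each pair maps an old value to its replacement. replace (no !) returns a new
# vector, so the original integer vector is left untouched.
recoded = replace(dx_group, 1 => "ASD", 2 => "TD", -9999 => missing)
```

For a DataFrame column, the result would be assigned back to the column, for example df.DX_GROUP = replace(df.DX_GROUP, ...).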
Or sometimes, in these situations, I like to write an explicit function to apply, which makes the code more legible:
julia> function encode_dx_group(x)
           if x == 1
               return "ASD"
           elseif x == 2
               return "TD"
           elseif x == -9999
               return missing
           else
               return "Unexpected value encountered!"
           end
       end
encode_dx_group (generic function with 1 method)
julia> encode_dx_group.([1, 2, -9999, 5])
4-element Vector{Union{Missing, String}}:
 "ASD"
 "TD"
 missing
 "Unexpected value encountered!"
The second option also has the advantage of a built-in check for the case where your data contains a value that shouldn’t be there.
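Applied to a DataFrame column, this could look like the sketch below. The function is repeated so the snippet runs on its own, and the small DataFrame is made up for illustration:

```julia
using DataFrames

# Same mapping as encode_dx_group above
function encode_dx_group(x)
    if x == 1
        return "ASD"
    elseif x == 2
        return "TD"
    elseif x == -9999
        return missing
    else
        return "Unexpected value encountered!"
    end
end

# Hypothetical data containing the -9999 sentinel
df = DataFrame(DX_GROUP = [1, 2, -9999, 1], TEST = [3, 4, 6, 2])

# Broadcasting builds a new vector, which then replaces the integer column
df.DX_GROUP = encode_dx_group.(df.DX_GROUP)
```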