Can't convert numerical values to string in a DataFrame (Cannot convert String to Int64)

Hi everyone,

I want to replace all the numerical (Int) values of a column in a dataframe with a string (“ASD”). I am though getting an error that this isn’t possible. I also tried to use “1”, in which case it doesn’t return an error but it doesn’t do the convertion (I suppose because 1 isn’t a string in the first place).

Here’s a minimal example demonstrating my issue:

df = DataFrame(DX_GROUP = [1,2,2,2,1,1,1,], TEST = [3,4,6,2,2,3,2])
replace!(df.DX_GROUP, 1 => "ASD")

I am getting the error

ERROR: MethodError: Cannot `convert` an object of type String to an object of type Int64
Closest candidates are:
  convert(::Type{T}, ::T) where T<:Number at number.jl:6
  convert(::Type{T}, ::Number) where T<:Number at number.jl:7
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
  ...
Stacktrace:
 [1] setindex!(A::Vector{Int64}, x::String, i1::Int64)
   @ Base ./array.jl:843
 [2] _replace!(new::Base.var"#new#295"{Tuple{Pair{Int64, String}}}, res::Vector{Int64}, A::Vector{Int64}, count::Int64)
   @ Base ./set.jl:665
 [3] replace_pairs!
   @ ./set.jl:488 [inlined]
 [4] #replace!#294
   @ ./set.jl:478 [inlined]
 [5] replace!(A::Vector{Int64}, old_new::Pair{Int64, String})
   @ Base ./set.jl:478
 [6] top-level scope
   @ REPL[38]:1

Thanks! :slight_smile:

Your problem is this:

julia> df = DataFrame(DX_GROUP = [1,2,2,2,1,1,1,], TEST = [3,4,6,2,2,3,2]);

julia> typeof(df.DX_GROUP)
Vector{Int64} (alias for Array{Int64, 1})

replace! operates in place (hence the !), so you are trying to store a string in a vector of Floats. You need to allocate a new vector which can hold floats and strings, so replace without ! and assign to your existing column:

julia> df.DX_GROUP = replace(df.DX_GROUP, 1 => "ASD")
7-element Vector{Any}:
  "ASD"
 2
 2
 2
  "ASD"
  "ASD"
  "ASD"
2 Likes

Thanks for the explanation, that works for me :slight_smile: :pray:t3:

Actually I should have added that from a performance perspective this isn’t exactly innocuous: as you see from my post above, replacing some floats with strings ends up creating a Vector{Any}. This is a dreaded object in Julia, as it basically means you are hiding the type of your data from the compiler which prevents it from generating efficient machine code (essentially you are then dropping down to the performance of “true” dynamic languages without type inference like Python).

If you know that your vector will have to hold strings and floats, you should probably be explicit about this from the get-go:

julia> df = DataFrame(DX_GROUP = Union{Int64, String}[1,2,2,2,1,1,1], TEST = [3,4,6,2,2,3,2]);

if you do this, you can actually replace! in place:

julia> replace!(df.DX_GROUP, "1" => "ASD")
7-element Vector{Union{Int64, String}}:
 1
 2
 2
 2
 1
 1
 1

the Julia compiler can optimize small union types fairly well, so a Vector{String, Int64} should have decent performance (although as always ensure you are benchmarking for your use case).

It’s a bit hard to give more concrete advice without knowing what you are looking to do, but I will say that having a mixed integer/string vector seems slightly odd. Are you maybe after something like a CategoricalArray?

1 Like

Well that was just a minimal example to replicate the issue I was facing. The actual DataFrame I am using is of a mixed type. It basically has about 1000 rows and 123 columns with various phenotypical description of my dataset.

It is a csv file that I imported as DataFrame, where I just wanted to replace the values of 1/2 with ASD/TD, and every value set to -9999 to replace with “missing” (data type) in order to create more descriptive groupings for my analysis and my plots.

As an example:

	Column1	Unnamed: 0	SUB_ID	X	subject	SITE_ID	FILE_ID	DX_GROUP	DSM_IV_TR	AGE_AT_SCAN	SEX	HANDEDNESS_CATEGORY	HANDEDNESS_SCORES	FIQ	VIQ	PIQ	FIQ_TEST_TYPE	VIQ_TEST_TYPE	PIQ_TEST_TYPE	ADI_R_SOCIAL_TOTAL_A	ADI_R_VERBAL_TOTAL_BV	ADI_RRB_TOTAL_C	ADI_R_ONSET_TOTAL_D	ADI_R_RSRCH_RELIABLE	ADOS_MODULE	ADOS_TOTAL	ADOS_COMM	ADOS_SOCIAL	ADOS_STEREO_BEHAV	ADOS_RSRCH_RELIABLE	ADOS_GOTHAM_SOCAFFECT	ADOS_GOTHAM_RRB	ADOS_GOTHAM_TOTAL	ADOS_GOTHAM_SEVERITY	SRS_VERSION	SRS_RAW_TOTAL	SRS_AWARENESS	SRS_COGNITION	more
Int64?	Int64?	Int64?	Int64?	Int64?	String15?	String31?	Any	Int64?	Float64?	Int64?	String7?	Float64?	Float64?	Int64?	Int64?	String31?	String15?	String15?	Int64?	Int64?	Int64?	Int64?	Int64?	Int64?	Int64?	Int64?	Int64?	Int64?	Int64?	Int64?	Int64?	Int64?	Int64?	Int64?	Int64?	Int64?	Int64?	
1	0	1	50002	1	50002	"PITT"	"no_filename"	"ASD"	1	16.77	1	"Ambi"	missing	103.0	116	89	"WASI"	"WASI"	"WASI"	16	9	5	4	1	4	12	4	8	3	1	missing	missing	missing	missing	missing	missing	missing	missing	
2	1	2	50003	2	50003	"PITT"	"Pitt_0050003"	"ASD"	1	24.45	1	"R"	missing	124.0	128	115	"WASI"	"WASI"	"WASI"	27	22	5	3	1	4	13	5	8	1	1	missing	missing	missing	missing	missing	missing	missing	missing	
3	2	3	50004	3	50004	"PITT"	"Pitt_0050004"	"ASD"	1	19.09	1	"R"	missing	113.0	108	117	"WASI"	"WASI"	"WASI"	19	12	5	3	1	4	18	6	12	2	1	missing	missing	missing	missing	missing	missing	missing	missing	
4	3	4	50005	4	50005	"PITT"	"Pitt_0050005"	"ASD"	1	13.73	2	"R"	missing	119.0	117	118	"WASI"	"WASI"	"WASI"	23	19	3	4	1	4	12	4	8	1	1	missing	missing	missing	missing	missing	missing	missing	missing	
5	4	5	50006	5	50006	"PITT"	"Pitt_0050006"	"ASD"	1	13.37	1	"L"	missing	109.0	99	119	"WASI"	"WASI"	"WASI"	13	10	4	3	1	4	12	4	8	4	1	missing	missing	missing	missing	missing	missing	missing	missing	
6	5	6	50007	6	50007	"PITT"	"Pitt_0050007"	"ASD"	1	17.78	1	"R"	missing	110.0	106	112	"WASI"	"WASI"	"WASI"	21	14	9	1	1	3	17	5	12	2	1	missing	missing	missing	missing	missing	missing	missing	missing	
7	6	7	50008	7	50008	"PITT"	"Pitt_0050008"	"ASD"	1	32.45	1	"R"	missing	123.0	123	114	"WASI"	"WASI"	"WASI"	24	20	10	2	1	missing	16	4	12	missing	1	missing	missing	missing	missing	missing	missing	missing	missing	
8	7	8	50009	8	50009	"PITT"	"Pitt_0050009"	"ASD"	1	33.86	1	"R"	missing	126.0	118	128	"WASI"	"WASI"	"WASI"	20	11	3	2	1	missing	10	4	6	missing	1	missing	missing	missing	missing	missing	missing	missing	missing	
9	8	9	50010	9	50010	"PITT"	"Pitt_0050010"	"ASD"	1	35.2	1	"L"	missing	81.0	81	93	"WASI"	"WASI"	"WASI"	missing	missing	missing	missing	1	missing	missing	missing	missing	missing	1	missing	missing	missing	missing	missing	missing	missing	missing	
10	9	10	50011	10	50011	"PITT"	"Pitt_0050011"	"ASD"	1	16.93	1	"L"	missing	111.0	101	120	"WASI"	"WASI"	"WASI"	24	14	6	1	1	missing	13	4	9	missing	1	missing	missing	missing	missing	missing	missing	missing	missing	

You are right, I can see that the type of the column is now of “Any”. Is there a way to change that just for this column? :thinking:

1 Like

I would probably do:


julia> df = DataFrame(DX_GROUP = [1, 2, -9999])
3×1 DataFrame
 Row │ DX_GROUP 
     │ Int64    
─────┼──────────
   1 │        1
   2 │        2
   3 │    -9999

julia> df.DX_GROUP = ifelse.(df.DX_GROUP .== 1, "AST", ifelse.(df.DX_GROUP .== 2, "TD", missing)); df
3×1 DataFrame
 Row │ DX_GROUP 
     │ String?  
─────┼──────────
   1 │ AST
   2 │ TD
   3 │ missing 

or sometimes in these situations I like to write an explicit function to apply which makes the code more legible:

julia> function encode_dx_group(x)
           if x == 1
               return "ASD"
           elseif x == 2
               return "TD"
           elseif x == -9999
               return missing
           else
               return "Unexpected value encountered!"
           end
       end
encode_dx_group (generic function with 1 method)

julia> encode_dx_group.([1, 2, -9999, 5])
4-element Vector{Union{Missing, String}}:
 "ASD"
 "TD"
 missing
 "Unexpected value encountered!"

the second option also has the advantage that it has a built-in check for the case where your data has a value that shouldn’t be there.

1 Like