DataFrames: convert column data type

Usor · March 4, 2020, 3:03pm

Was looking around, but didn’t find an answer.

Specifically, I have a DataFrame in which one column has data type Int64; its only values are 0 and 1 (obviously meaning false and true). What is the easiest and quickest way to convert the column, including its entries, to the boolean data type?

I thought about a kind of loop, but are there already existing functions available that I should know of?

More generally, please explain how to best tackle this (especially conceptually), if using a self-made solution.

My thought then might be to take the whole array/column, check every value, make a new array based on set conditions (if 0, make false; if 1, make true, etc.), mutate or add the new array into the dataframe.

nilshg · March 4, 2020, 3:13pm

df.int_col .== 1 will return a BitArray column

Usor · March 4, 2020, 3:20pm

Very useful. Thanks.

tbeason · March 4, 2020, 4:47pm

A more general way to do this is (assuming the column is called x)

df[!,:x] = convert.(Bool,df[!,:x])
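For the conceptual side of the original question, here is a minimal sketch that puts the two one-liners next to the hand-written loop. A plain vector `x` with made-up data stands in for the column so that it runs without any packages; with a data frame the result would be assigned back with `df[!, :x] = ...` as above.

```julia
# Stand-in for an Int64 column holding only 0 and 1 (illustrative data).
x = [0, 1, 1, 0]

# Comparison: true wherever the entry equals 1, false otherwise.
# Any value other than 1 (for example 2) silently becomes false.
by_comparison = x .== 1

# Conversion: broadcast convert over every element.
# Throws an InexactError if an entry is neither 0 nor 1.
by_convert = convert.(Bool, x)

# The hand-written version of the same idea: fill a new array element by element.
by_loop = Vector{Bool}(undef, length(x))
for (i, v) in enumerate(x)
    by_loop[i] = v == 1
end
```

The practical difference is in how unexpected values are treated: the comparison maps them to false, while convert refuses them, which may be preferable if the column is supposed to contain only 0 and 1.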

Usor · March 5, 2020, 10:51am

Thanks for the reply.

Usor · March 13, 2020, 3:57pm

I suppose I can post this here, since it concerns a similar issue.

There is a dataframe. It has a String column with missing values. Its values are actually integers.

What is the most direct and easiest way to convert this whole column of String (with missing) to one of Int64 (with missing)?

I thought of your generic way, @tbeason, but it seems it requires more in this case. I thought there was a function to convert such strings to int, but I could be mistaken.

Usor · March 13, 2020, 4:05pm

I’m making some progress. I found the function parse().

pdeffebach · March 13, 2020, 5:07pm

Unfortunately parse doesn’t work with missings. You are looking for passmissing from Missings.jl.

julia> df.col = passmissing(parse).(Int, df.col)
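To show what this does, here is a rough Base-only equivalent, as a sketch. The vector `col` and the helper `parse_or_missing` are made-up names for illustration, and the plain vector stands in for the column so that the snippet needs no packages.

```julia
# Stand-in for a String column with missing values (illustrative data).
col = ["1", missing, "42"]

# parse(Int, missing) has no method, so missing has to be handled explicitly.
# passmissing(parse) from Missings.jl wraps parse in much the same way.
parse_or_missing(s) = ismissing(s) ? missing : parse(Int, s)

# Broadcast the helper over the column; missing entries stay missing.
ints = parse_or_missing.(col)
```

With a data frame the result would be assigned back in the same way as in the passmissing line above.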

Usor · March 14, 2020, 12:55pm

Thanks.

kailukowiak · May 26, 2020, 10:51pm

I believe that you could just have

df[:,:x] = convert.(Bool,df[!,:x])

(notice that : instead of !) to avoid making two copies.

kailukowiak · July 29, 2020, 5:47pm

My mistake, it has to be the other way around:

df[!,:x] = convert.(Bool,df[:,:x])

wakimchris · January 23, 2021, 7:50pm

When I do
df[!,:x] = convert.(Int64,df[:,:x])
I get

ERROR: MethodError: Cannot 'convert' an object of type String to an object of type Int64

How to get this resolved ?

pdeffebach · January 23, 2021, 7:53pm

You want parse.(Int64, df[:, :x])

wakimchris · January 24, 2021, 6:22pm

My column is called Id and is stored as String; I need to change it to Int64.
The column has 10-digit codes as entries.

df[!,:Id] = parse.(Int64,df[:,:Id])

I get
ERROR: ArgumentError: invalid base 10 digit 'I' in "Id"

pdeffebach · January 24, 2021, 6:32pm

Could you copy and paste part of your column into this thread? You probably want tryparse, which returns nothing when it meets a value like "Id209.4" that can’t be parsed as an integer.
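A minimal sketch of how tryparse could be used to find the offending entries, with a plain vector `raw` of made-up strings standing in for the column:

```julia
# Illustrative strings: two valid 10-digit codes and a stray header name.
raw = ["7483947898", "Id_internal", "7475629104"]

# tryparse returns nothing instead of throwing when a string is not a valid integer.
parsed = tryparse.(Int64, raw)

# Positions of the entries that failed to parse, useful for locating the bad rows.
bad_rows = findall(isnothing, parsed)

# Optionally turn the failures into missing, giving a column of Int64 and missing.
cleaned = something.(parsed, missing)
```

Looking at `bad_rows` first is probably the safer habit, since replacing failures with missing can hide a problem in how the file was read.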

wakimchris · January 24, 2021, 6:41pm

My columns are:

Id_internal   |     Date
7483947898    |     2020-11-28
7475629104    |     2021-01-23
7384881913    |     2020-12-28

Both columns are set as integers but I would want to push the first into an Int64 and the second into a Date

pdeffebach · January 24, 2021, 7:46pm

Your problem is that the first row of your data frame is "Id_internal" and "Date".

How are you reading in your data? Perhaps you can change it so that your data doesn’t accidentally include the names of your variables.

Did you try my tryparse idea? That should fix it in the meantime. You can also do

df = df[2:end, :]

to get rid of the first row.
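Once the stray row is gone, the two conversions could look roughly like this. It is only a sketch: the plain vectors `ids` and `dates` are made-up stand-ins for the two columns, and it assumes the dates are all in yyyy-mm-dd form as in the sample above.

```julia
using Dates  # standard library, provides the Date type

# Illustrative columns where the header names were read in as the first data row.
ids   = ["Id_internal", "7483947898", "7475629104", "7384881913"]
dates = ["Date", "2020-11-28", "2021-01-23", "2020-12-28"]

# Drop the first entry (what df[2:end, :] does for every column), then convert.
ids_int    = parse.(Int64, ids[2:end])
dates_date = Date.(dates[2:end])   # ISO yyyy-mm-dd strings parse with the default format
```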

wakimchris · January 24, 2021, 8:44pm

I tried
df[:2,:Id_internal ] = tryparse.(Int64,df[:2,:Id_internal ])
and
df[:2,:Id_internal ] = parse.(Int64,df[:2,:Id_internal ])

both gave me
ERROR: setindex! not defined for WeakRefStrings.StringArray{String,1}

I read my data as follows:
df_all = CSV.File("file.csv", delim = '\t') |> DataFrame
I then create a df with what I need
df = df_all[[:Id_internal, :Date]]

bkamins · January 24, 2021, 9:44pm

What is the version of CSV.jl and DataFrames.jl you are using?

pdeffebach · January 24, 2021, 9:44pm

That is indeed a very odd error message. To be honest I don’t know exactly why you are getting it. But note that you should be writing df[:, :Id_internal], not df[:2, :Id_internal]

cc @quinnj for why the user might have gotten such an odd error. I can’t replicate it.

df = df_all[[:Id_internal, :Date]] is old, deprecated syntax. It’s a concern that people are still finding this syntax in tutorials. Can you please post a link to the guide you are using to learn DataFrames?
