DataFrames: convert column data type

That syntax is not merely deprecated: it is no longer allowed and throws an error, so most likely @wakimchris is using old versions of CSV.jl and DataFrames.jl. That is why I asked about it.

CSV v0.5.7
DataFrames v0.18.4

When I tried
df[:,:Id_internal ] = parse.(Int64,df[:2,:Id_internal ])
and
df[:,:Id_internal ] = tryparse.(Int64,df[:2,:Id_internal ])

I got
ERROR: MethodError: Cannot `convert` an object of type Int64 to an object of type String
This was a strange message because I want to go the other way around.

I am not following a guide to learn DataFrames but rather trying to debug my code, and I can’t figure it out.

Would my syntax work on a newer version?

Sorry, you need one more change that I should have mentioned earlier:

df[!,:Id_internal ] = tryparse.(Int64,df[:,:Id_internal ])

The ! means you replace the column entirely.
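
To make the difference concrete, here is a minimal sketch, assuming a recent DataFrames.jl and a made-up one-column data frame standing in for the CSV (the values are hypothetical). Assigning with : writes into the existing String column, which is where the "Cannot convert Int64 to String" message comes from; assigning with ! swaps in the new vector.

```julia
using DataFrames

# Hypothetical data standing in for a CSV column that was read as strings.
df = DataFrame(Id_internal = ["101", "102", "103"])

# `df[:, col] = v` stores into the existing Vector{String}, so Ints are rejected.
in_place_failed = try
    df[:, :Id_internal] = parse.(Int64, df[:, :Id_internal])
    false
catch
    true
end

# `df[!, col] = v` replaces the column with the new Vector{Int64}.
df[!, :Id_internal] = parse.(Int64, df[:, :Id_internal])

@show in_place_failed
@show eltype(df.Id_internal)
```

tryparse can be used in place of parse here if some entries might not be valid integers; those entries then come back as nothing.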

On newer versions, the syntax for selecting columns using df[[:col1, :col2]] will not work, but everything else you are doing will be the same.

It is crucial to update DataFrames.jl and CSV.jl to the newest versions. You have very old versions of both that contained many bugs.

The reason df[[:col1, :col2]] no longer works is that a data frame is a 2-dimensional object, so we strictly require both a row index and a column index when subsetting it, e.g. df[:, [:col1, :col2]].
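
As a small illustration of the two-index rule, assuming a recent DataFrames.jl and a hypothetical three-column data frame:

```julia
using DataFrames

# Hypothetical example data; the column names follow the ones used above.
df = DataFrame(col1 = [1, 2], col2 = ["a", "b"], col3 = [1.0, 2.0])

# Row index first, column index second: all rows, two columns.
two_cols = df[:, [:col1, :col2]]

# A single row with the same two columns.
first_row = df[1, [:col1, :col2]]

@show size(two_cols)
@show first_row.col2
```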

I understand that I have to update DataFrames.jl and CSV.jl to the newest versions, but I am limited by the cluster on which I am running.
On my current version,
df[[:col1, :col2]] and df[:, [:col1, :col2]] both work (I changed to the second one, though).

What I fail to do is push rows from one df whose columns are all String into another df whose columns are typed Int64 and Date.

I thought I could change the data type of the two columns in between creating my df and pushing it into my new df_types.

Would pushing into a new df work as a workaround for changing the type?

As I wrote in my answer above, you want to use !

df[!,:Id_internal ] = tryparse.(Int64,df[:,:Id_internal ])

Also, did you fix your initial problem of having the headers being accidentally treated as data?

Why not call parse when you push!?
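
Something along these lines, as a rough sketch. It assumes a recent DataFrames.jl, made-up input rows, and dates written in ISO format (yyyy-mm-dd); the name df_typed is hypothetical, and a different date format would need an explicit DateFormat.

```julia
using DataFrames, Dates

# Hypothetical source data where every column was read as String.
df = DataFrame(Id_internal = ["101", "102"], Date = ["2019-07-01", "2019-07-02"])

# Empty target with the desired column types.
df_typed = DataFrame(Id_internal = Int64[], Date = Date[])

# Parse each field while pushing the row into the typed data frame.
for row in eachrow(df)
    push!(df_typed, (Id_internal = parse(Int64, row.Id_internal),
                     Date = Date(row.Date)))
end

@show eltype(df_typed.Id_internal)
@show eltype(df_typed.Date)
```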

You can have separate Project.toml and Manifest.toml for your project and they can have newest versions of the packages.
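
A minimal sketch of how that could look, assuming the cluster lets you install packages into your own depot and has network access (neither is confirmed in this thread). Run it from the project directory:

```julia
using Pkg

# Create or reuse Project.toml and Manifest.toml in the current directory.
Pkg.activate(".")

# Install the latest versions that resolve for this project only.
Pkg.add(["DataFrames", "CSV"])

# List the versions recorded for this project.
Pkg.status()
```

Starting Julia later with julia --project=. picks up the same environment, independent of whatever is installed system-wide.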

I found a solution:
Changing the data type of an existing column seemed a bit difficult, so I changed the data type as I push the items:

using Dates  # Date lives in the Dates standard library
df_new = DataFrame(Id_internal=String[], Date=Date[])

# copy row by row, converting Id_internal to String on the way
for i in 1:nrow(df)
    push!(df_new, [string(df[i, :Id_internal]), df[i, :Date]])
end

Thank you @pdeffebach and @bkamins

I wanted to convert the type of a column from Int64 to String, but none of the above would work. E.g., for

df[!,:B] = convert.(String,df[:,:B])

I would get

ERROR: MethodError: Cannot convert an object of type Int64 to an object of type String

Surprisingly, I found the solution on an old (2014) Google forum:

df[!,:B] = string.(df[:,:B])
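
For reference, a self-contained version of that line, assuming a recent DataFrames.jl and a hypothetical column B of integers:

```julia
using DataFrames

# Hypothetical integer column.
df = DataFrame(B = [1, 2, 3])

# Broadcast string over the column and replace it with the resulting Vector{String}.
df[!, :B] = string.(df[:, :B])

@show eltype(df.B)
@show df.B
```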

Other types don’t seem to work this way. So you can’t write

df[!,:B] = int64.(df[:,:B])

(Or Int64, etc)

So, to summarise, if I convert a column of type Int64 to one of type String, I need to use
string.(column).

If I want to convert a column of type String to one of type Float32, I need to use
parse.(type, column).

If I want to then convert the column of type Float32 to one of type Int64, then, surprisingly,
parse.(type,column) will not work. Instead, I need to use
convert.(type,column).

What is going on?

This makes sense.

  • string turns things into strings
  • parse turns a string into a Julia type. It only works on strings and not other types.
  • convert converts from one Julia type to another, for things that “behave the same”. Float32 and Int64 are both numbers, for the most part. Whereas a String and an Int64 are different things entirely.
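
The three cases can be tried in plain Julia without DataFrames. This is only an illustrative sketch with made-up values and variable names:

```julia
id_text  = string(42)              # Int64 -> String
id_value = parse(Int64, "42")      # String -> Int64
price    = parse(Float32, "1.5")   # String -> Float32
whole    = convert(Int64, 2.0f0)   # Float32 -> Int64, value is exactly representable

# convert refuses a lossy conversion instead of rounding silently.
lossy_error = try
    convert(Int64, 2.5f0)
catch err
    err
end

# convert also refuses unrelated types such as Int64 -> String.
unrelated_error = try
    convert(String, 42)
catch err
    err
end

@show id_text id_value price whole
@show typeof(lossy_error) typeof(unrelated_error)
```

On a whole column, the same functions are applied element-wise with a dot, e.g. parse.(Float32, column).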

It should also be noted that none of this actually has anything to do with DataFrames, it’s just about how changing from one type representation to another works for certain objects everywhere in Julia.

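This is not quite right: Int.(df[!,:B]) will convert a numeric column, such as Float64 with whole-number values, into an Int64 array. It will not convert a String column, though; that still needs parse.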

Thanks @pdeffebach. One issue is that I don’t know what you mean by “Julia type” in your description of parse.

From a limited reading of the julialang discourse, my understanding is that Base.parse maps strings to numbers (whilst attempting to determine the type of number).

Also, convert(T, x) seems to simply be a conservative version of T(x) (from Conversion and Promotion · The Julia Language).

What remains surprising is the lower case in string(x) vs the upper case of the type String. (String(x) does not work.) In contrast, int(x) does not work whereas Int(x) does. And, Float(x) does not work whereas float(x) does! I mean why should I be expected to remember the exceptional case for integers?

In any case, it can be fixed by defining int = Int. Then the function int behaves like the functions string and float, but I was hoping the default T class of functions would be less quirky. (I guess my hopes were elevated by the beauty of the REPL and the syntax elsewhere in Julia.)
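
For anyone comparing the spellings, a short sketch in plain Julia. Capitalised names are types that double as constructors, lowercase names are ordinary functions, and the int alias is the poster's own suggestion rather than something Base provides:

```julia
from_ctor  = Int(3.0)     # type used as a constructor: exact conversion to Int
as_float   = float(3)     # function: picks a suitable floating-point type
as_string  = string(3)    # function: builds the printed representation

# String(3) has no method for integers, so it throws.
string_ctor_fails = try
    String(3)
    false
catch err
    err isa MethodError
end

# The alias proposed above; afterwards int behaves exactly like Int.
int = Int
from_alias = int(3.0)

@show from_ctor as_float as_string string_ctor_fails from_alias
```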