XLSX and DataFrame column type

Hello everyone :grin:,
I want to load an excel file with the XLSX.jl package and then convert it to DataFrame with DataFrame.jl with the following code:

data = DataFrame(XLSX.readtable(path_data, "Sheet1"))

However, the type of all columns is automatically set to Any. This is very annoying when, for example, you want to plot with Makie, which doesn’t accept Any types (even though the columns used are composed of a single type, such as FLoat64). This means you have to convert the input to the correct type every time, and give it to Makie.jl.
How can I set the correct column type directly when loading an Excel file with XLSX.jl / DataFrame.jl?

Thanks in advance
fdekerm

Check this post and thread.

3 Likes

Thanks, I indeed read the doc too quickly! You need to add the infer_eltypes argument to XLSX.readtable

1 Like

To take advantage of CSV.read’s eltype capabilities (such as conversion to InlineStrings) and other keyword arguments, something similar to this function may be useful:

function exceltodf(path, file, sheetname)
    inp = joinpath(path, file)
    io = IOBuffer()
    CSV.write(io, DataFrame(XLSX.readtable(inp, sheetname)))
    df = CSV.read(seekstart(io), DataFrame; normalizenames = true, dateformat = "m/d/Y", missingstring = "")
    close(io)
    return df
end
1 Like