Reading in xlsx with multiple header rows using the XLSX Pkg

I’m attempting to use API Reference · XLSX.jl to (called from a custom wrapper tool) to read in an excel workbook and create a dataframe.

However, the header is split across the top two rows → is there an alternative package/approach in Julia to specify the double header rows?

Thanks

I don’t think there’s anything pre-built to handle this, but you should be able to pre-process the data to mangle the column headers together. As an example with this XLSX file

image

# f = path to file
julia> parse_dual(f)
4×4 DataFrame
 Row │ a.a    a.b    b.c    b.d   
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 │     1      5     10     15
   2 │     2      6     11     16
   3 │     3      7     12     17
   4 │     4      8     13     18
import XLSX
using DataFrames: DataFrame

function parse_dual(f, sheet=1)
    xf = XLSX.readxlsx(f)
    data = xf[sheet][:] # copy to matrix

    header = Symbol[]
    last_seen_row1 = nothing
    for (row1, row2) in zip(data[1, :], data[2, :])
        if !ismissing(row1)
            last_seen_row1 = row1
        end
        push!(header, Symbol(last_seen_row1, ".", row2))
    end

    columns = AbstractArray[]
    for col in eachcol(data[3:end, :])
        push!(columns, [x for x in col]) # comprehension reinfers type
    end
    return DataFrame(columns, header)
end
3 Likes

Thanks Chris!