Excel data to dataframes

Here is something I think might be a bit more robust

julia> using XLSX, DataFrames;

julia> function num_trailing_missing(x)
           n = length(x)
           s = 0
           while true
               n == 0 && break
               !ismissing(x[n]) && break
               s += 1
               n -= 1
           end
           s
       end;

julia> mat = XLSX.readxlsx("testdata.xlsx")[1][:];

julia> inds = [1:1, 3:3]; # You know the columns but not rows

julia> dfs = map(inds) do is
           data = mat[2:end, is]
           nms = mat[1, is]
           df = DataFrame(data, string.(nms))
           min_num_trailing_missings = minimum(num_trailing_missing.(eachcol(df)))
           df = df[1:(end - min_num_trailing_missings), :]
           # narrow the types
           transform(df, names(df) .=> ByRow(identity); renamecols = false)
       end
2-element Vector{DataFrame}:
 2×1 DataFrame
 Row │ SECTOR  
     │ String  
─────┼─────────
   1 │ IT
   2 │ FINANCE
 4×1 DataFrame
 Row │ INDIVIDUAL 
     │ Int64      
─────┼────────────
   1 │          1
   2 │          2
   3 │          3
   4 │          4

Overall this was harder than I thought. I don’t think it’s too different from @rafael.guerra 's answer, actually. However

  1. I take advantage of the fact that you know the starting and ending indices
  2. I narrow the types of the output so they are no longer Any
  3. I drop trailing missing rather than all missing values in the data frame.
1 Like