Here is something I think might be a bit more robust
julia> using XLSX, DataFrames;
julia> function num_trailing_missing(x)
n = length(x)
s = 0
while true
n == 0 && break
!ismissing(x[n]) && break
s += 1
n -= 1
end
s
end;
julia> mat = XLSX.readxlsx("testdata.xlsx")[1][:];
julia> inds = [1:1, 3:3]; # You know the columns but not rows
julia> dfs = map(inds) do is
data = mat[2:end, is]
nms = mat[1, is]
df = DataFrame(data, string.(nms))
min_num_trailing_missings = minimum(num_trailing_missing.(eachcol(df)))
df = df[1:(end - min_num_trailing_missings), :]
# narrow the types
transform(df, names(df) .=> ByRow(identity); renamecols = false)
end
2-element Vector{DataFrame}:
2×1 DataFrame
Row │ SECTOR
│ String
─────┼─────────
1 │ IT
2 │ FINANCE
4×1 DataFrame
Row │ INDIVIDUAL
│ Int64
─────┼────────────
1 │ 1
2 │ 2
3 │ 3
4 │ 4
Overall this was harder than I thought. I don’t think it’s too different from @rafael.guerra 's answer, actually. However
- I take advantage of the fact that you know the starting and ending indices
- I narrow the types of the output so they are no longer
Any - I drop trailing
missingrather than allmissingvalues in the data frame.