Read data from excel starting from second row

I’m trying to read data in an excel workbook that starts on the second row. Using ExcelFiles, I can I read into a dataframe if the data starts on the first row

DataFrame(load(fname, "sheetname")

But I’m not sure what to do when the data headers begin on the second (or later) row.

If you know the size of your data, you can explicitly specify the input range:

julia> DataFrame(load("test.xlsx", "Sheet1!A2:B5"))
3×2 DataFrames.DataFrame
│ Row │ test1 │ test2 │
├─────┼───────┼───────┤
│ 1   │ 1.0   │ 4.0   │
│ 2   │ 2.0   │ 5.0   │
│ 3   │ 3.0   │ 6.0   │

edit: looks like it might be better to use ExcelReaders.jl, which skips blank starting lines by default, or allows you to skip some number of lines:
readxlsheet("test.xlsx", "Sheet1", skipstartrows=1) |> DataFrame

1 Like

load(fname, "sheetname") should skip empty rows at the beginning. If not, it is a bug and it would be great if you could file an issue with a replication case.

You can also do things like load(fname, "sheetname", skipstartrows=1) to skip a row at the beginning. You would do that if the row actually contains stuff, i.e. if it is not blank.

The whole documentation for this is a bit in flux. I’m starting to consolidate things here, if you want to take a look at an early version of the new documentation.

2 Likes

Thanks all.

@davidanthoff This first row of my spreadsheet does have data, just not actually part of a tabular dataset, so I think this is the intended behavior. However, the skipstartrows keyword only works with readxlsheet, not load in my environment. Is this something that’s available in master but not version 0.3?

Ah, that is a bug! Fixed on master, once https://github.com/JuliaLang/METADATA.jl/pull/14692 is merged it will go out widely. Thanks for reporting!

Thanks @davidanthoff! And thanks for the great packages :slight_smile: