Read data from excel starting from second row


#1

I’m trying to read data in an excel workbook that starts on the second row. Using ExcelFiles, I can I read into a dataframe if the data starts on the first row

DataFrame(load(fname, "sheetname")

But I’m not sure what to do when the data headers begin on the second (or later) row.


#2

If you know the size of your data, you can explicitly specify the input range:

julia> DataFrame(load("test.xlsx", "Sheet1!A2:B5"))
3×2 DataFrames.DataFrame
│ Row │ test1 │ test2 │
├─────┼───────┼───────┤
│ 1   │ 1.0   │ 4.0   │
│ 2   │ 2.0   │ 5.0   │
│ 3   │ 3.0   │ 6.0   │

edit: looks like it might be better to use ExcelReaders.jl, which skips blank starting lines by default, or allows you to skip some number of lines:
readxlsheet("test.xlsx", "Sheet1", skipstartrows=1) |> DataFrame


#3

load(fname, "sheetname") should skip empty rows at the beginning. If not, it is a bug and it would be great if you could file an issue with a replication case.

You can also do things like load(fname, "sheetname", skipstartrows=1) to skip a row at the beginning. You would do that if the row actually contains stuff, i.e. if it is not blank.

The whole documentation for this is a bit in flux. I’m starting to consolidate things here, if you want to take a look at an early version of the new documentation.


#4

Thanks all.

@davidanthoff This first row of my spreadsheet does have data, just not actually part of a tabular dataset, so I think this is the intended behavior. However, the skipstartrows keyword only works with readxlsheet, not load in my environment. Is this something that’s available in master but not version 0.3?


#5

Ah, that is a bug! Fixed on master, once https://github.com/JuliaLang/METADATA.jl/pull/14692 is merged it will go out widely. Thanks for reporting!


#6

Thanks @davidanthoff! And thanks for the great packages :slight_smile: