ExcelReaders package: how to close an .xlsx file after opening it with: "openxl"?

#1

Hello,

I am using ExcelReaders package to read several sheets from a very large .xlsx file.
Thus, instead of repeating foo_n = readxlsheet("data.xlsx", "sheetname_n") for n times, it seemed to me to be more convenient and faster to use:

main_file = openxl("data.xlsx")
foo_1 = readxlsheet(main_file, "sheetname_1")
foo_2 = readxlsheet(main_file, "sheetname_2")
...

Which, if I got it right, keeps the whole large file "data.xlsx" in memory, so to permit a faster access whenever e.g. a new sheet needs to be read. Now I wonder: is there a way to close such a large file once not needed anymore in the program?
I presume that by freeing up some memory, the latter could be used to run faster a very long script for instance.

Thank you in advance!

0 Likes

#2

@davidanthoff would xlfile.workbook[:release_resources]() make any difference, or is memory released through PyCall unavailable to Julia?

0 Likes

#3

I think that would probably help! My understanding right now is that we actually load the entire content of an Excel file into memory on the Python side initially, and while we are holding onto that python object (https://github.com/queryverse/ExcelReaders.jl/blob/301e68d7973cf611e86cab14a78fb65bf6e64c2f/src/ExcelReaders.jl#L23) we are keeping that stuff in memory. I’m not sure how the PyCall story works on that end, but I assume that there might be a finalizer running for that PyObject that then reduces the ref count on the Python side? So essentially when the main_file var goes out of scope and is handled by the GC, it probably also frees the memory on the python side.

I think it might well be worth adding a method to close that accepts main_file, and that would trigger this closing earlier. @ale, do you want to open an issue on the repo? Thanks!

1 Like

#4

Maybe you should give it a try and use XLSX.jl. There’s a specific mode for reading large files. See https://felipenoris.github.io/XLSX.jl/latest/tutorial.html#Reading-Large-Excel-Files-and-Caching-1.

3 Likes

#5

Yeah, I would second XLSX.jl! If you can avoid the Python dependency of ExcelReaders.jl you might also just generally have an easier time.

1 Like