I have not experience with hosting docs on github, but maybe you could add another (temporary ?) shield for your docs if you already build them ?
The changes documented above and a host of others have now been merged into the #master branch of XLSX.jl. The revised docs are also linked as the Dev docs, too.
I have one further PR that is waiting to merge which adds a setFormula function, allowing you to add any Excel function to a cell or cell range from XLSX.jl and properly respects absolute and relative cell addressing.
julia> setFormula(sheet, "B2:J10", "=A2+B1") # Adds simple formula to a range of cells.
julia> setFormula(sheet, "E1:G1", "=sort(unique(A2:A7),,-1)") # adds a dynamic array function with a spill range.
julia> setFormula(sheet, "C1", " = max(B1#)") # Applies max() to the collected values in the spill range of B1.
@felipenoris has tagged the #master branch as v0.11.0 so Iām hoping a new release is coming soon!
Iāve been looking at memory usage in XLSX.jl and have now made a PR to reduce the memory footprint by about 40% compared with the current release (v0.10.4).
I illustrate with a simple spreadsheet of 10,000 rows x 21 columns containing a subset of data downloaded from this site (it downloads as a CSV, which I cut back and convert to an .xlsx file in Excel). This file has no formulas but it does have a large number of large shared strings and many numbers and dates - it has no magical properties! If I use About.jl to compare the memory footprint of the XLSXFile created when this file is loaded in "rw" mode, I see:
v0.10.4
about(f) - 41MB
about(wb.sst) - 13MB (included in f)
about(s.cache.cells) - 27MB (included in f)
However, v0.10.4 uses EzXML.jl to access the .xlsx file, and then stores the xml data it loads into the XLSXFile inside a C wrapper. I believe this is not seen by About.jl but, using simple code suggested by Claude, I calculate this to be an additional 10MB.
This brings the total memory footprint in v0.10.4 to 51MB!
In comparison, v0.11.0 uses XML.jl, so all of its memory use is seen by About.jl. For v0.11.0 + this PR, I see a total memory footprint of 30MB, as follows:
v0.11.0 + PR
about(f) - 30MB
about(wb.sst) - 9MB (included in f)
about(s.cache.cells) - 15MB (included in f)
There are three principal changes that have brought about this reduction. The first was already implemented (in v0.11.0, the current dev branch of XLSX.jl), and reduces the 10MB used previously by EzXML.jl to few hundred KB. The second tackles the sst cache while the third tackles the cell cache and is the biggest and most significant in terms of changes to the code base. These last two are the subject of the latest PR.
- In v0.10.4 (in
"rw"mode), all the internal .xml files are loaded and stored. Mostly these are small but, where there are many cells and/or many ssts, this results in large files which substantially duplicate the information held in the cell cache and sst cache respectively. In v0.11.0, the XML nodes holding the ssts and cell data are stripped out of the respective xml files as they are read in to their caches. On write, these nodes are created from the data in the cache anyway and always have been, so the duplicate data in v0.10.4 serve no purpose. - In v0.10.4 when sharedStrings are read in, they are eagerly converted to plain text strings and both the XML representation and the plain text string representation are stored in the sst cache. The XML representation may contain complex rich text formatting information, so canāt simply be discarded. In v0.11.0+PR, only the XML representation is stored in the sst cache and plain text strings are lazily generated when needed. Since, for most time-critical use cases (eg reading to a
DataFrame), this still results in only one conversion per sst, there is no time penalty, but the memory saving is very significant. - Cells are the largest users of memory in an
XLSXFile. In v0.10.4, the cell cache stores information as strings, so that a number is stored as itās string representation, as are dates and datetimes (using their internal Excel representation). AnXLSX.Cellstructure would typically be around 100B in size, and this is obviously repeated for every non-empty cell. In v0.11.0+PR, Iāve converted anXLSX.Cellto always include only isbits types and to always and consistently take only 24B. The price for this has been the need to create a separate cache of cell formulas outside the cell (it is held in theWorkbookin the same way as the sst cache). Since most cells do not contain formulas, the net result is a considerable saving.
Although these changes have required a major overhaul of the code, I donāt believe there are any changes to any aspect of the public API. Moreover, I donāt think there is any regression in terms of performance compared with v0.11.0-dev, either (no gains though.
)
Iāll be very glad to hear your experiences of using this new version if you want to give it a go, and I will endeavour to fix any issues you uncover!
Here are a couple of small updates.
- Iāve added support for
AnnotatedStringsvia a small package extension. Using the code in this PR, you can now create rich text cell values from annotated strings directly:
julia> using XLSX, StyledStrings
julia> f=XLSX.newxlsx()
XLSXFile("blank.xlsx") containing 1 Worksheet
sheetname size range
-------------------------------------------------
Sheet1 1x1 A1:A1
julia> s=f[1]
1Ć1 Worksheet: ["Sheet1"](A1:A1)
julia> s["A1"] = styled"{yellow:hello} {blue:there}"
"hello there"
julia> s["A2"] = styled"The {bold:{italic:quick {(foreground=#cd853f):brown} fox} jumps over the {(foreground=#FFC000):lazy} dog}"
"The quick brown fox jumps over the lazy dog"

- Iāve created an (
unregisteredupdate: now registered) experimental package to decrypt password protected Excel files so they can be read by XLSX.jl. This package was almost entirely created by Claude with very little input from me but it does seem to work OK. Please treat with due caution.
julia> using XLSXDecrypt, XLSX
julia> buf = decrypt_xlsx("protected.xlsx", "password")
julia> f=openxlsx(buf, mode="rw")
XLSXFile("IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=8554, maxsize=Inf, ptr=8555, mark=-1)") containing 1 Worksheet
sheetname size range
-------------------------------------------------
Sheet1 3x1 A1:A3
Edit: Iām not allowed to make three replies in succession without any intermediate replies from others, so Iām resorting to this edit. Maybe this is a sign that if/when I make a new release I should start a new topic!
XLSX.jl has now successfully moved to the JuliaData org. Thanks to @felipenoris and @quinnj for making that happen.
Iāve merged the latest PRs into the main dev branch on the repo. This includes support for AnnotatedStrings and support to set Excel formulas in cells using a setFormula function.
Iāll wait a few days at least before I attempt a new release, but I donāt think it will be very long unless someone reports a bunch of issues on the dev branch. If/when I make a new release there will be a bunch of issues, some very old, that I will be able to close at the same time.