Issues Importing Excel Files with XLSX.jl Package

Dear all,

I’m currently using the XLSX.jl package in Julia to import several Excel files. Unfortunately, I don’t have access to the original sources of these files and have to work with the existing versions.

I’ve encountered an issue where I must open the files in Excel and save them again before I can import them into Julia. If I don’t, I receive the following error:

AssertionError: Couldn’t find xl/worksheets/sheet1.xml in not_working.xlsx. (see full stacktrace below)

I suspect the files might be corrupted or not properly formatted. Here is a snippet of my code:

using XLSX

# Attempt to open the original file
f1 = XLSX.openxlsx("not_working.xlsx")

# Open and save the file in Excel, then try to open again
f2 = XLSX.openxlsx("working.xlsx")  # This file was opened and saved again in Excel

I’ve uploaded the problematic Excel files here: https://we.tl/t-agL1yFP38l

Has anyone experienced similar issues or have any suggestions on how to handle these files without manually opening and saving them in Excel? Any insights or alternative approaches would be greatly appreciated.

Thank you!

Blockquote
ERROR: AssertionError: Couldn’t find xl/worksheets/sheet1.xml in not_working.xlsx.
Stacktrace:
[1] open_internal_file_stream
@ C:\Users\phil25.julia\packages\XLSX\U2Bcm\src\stream.jl:46 [inlined]
[2] read_worksheet_dimension(xf::XLSX.XLSXFile, relationship_id::String, name::String)
@ XLSX C:\Users\phil25.julia\packages\XLSX\U2Bcm\src\worksheet.jl:32
[3] XLSX.Worksheet(xf::XLSX.XLSXFile, sheet_element::EzXML.Node)
@ XLSX C:\Users\phil25.julia\packages\XLSX\U2Bcm\src\worksheet.jl:8
[4] parse_workbook!(xf::XLSX.XLSXFile)
@ XLSX C:\Users\phil25.julia\packages\XLSX\U2Bcm\src\read.jl:353
[5] open_or_read_xlsx(source::String, read_files::Bool, enable_cache::Bool, read_as_template::Bool)
@ XLSX C:\Users\phil25.julia\packages\XLSX\U2Bcm\src\read.jl:236
[6] openxlsx(source::String; mode::String, enable_cache::Bool)
@ XLSX C:\Users\phil25.julia\packages\XLSX\U2Bcm\src\read.jl:172
[7] openxlsx(source::String)
@ XLSX C:\Users\phil25.julia\packages\XLSX\U2Bcm\src\read.jl:164
[8] top-level scope
@ c:\Users\phil25\Gits\Research\dsa\julia\LoadResults.jl:44

Hello,

I think the issue is somehow “xl/worksheets/sheet1.xml” got renamed to “xl/worksheets/Sheet1.xml” with a capital S

julia> using ZipArchives

julia> r = ZipReader(read("not_working.xlsx"))
6645 byte, 13 entry ZipReader{Vector{UInt8}}
total uncompressed size: 20787 bytes
  "[Content_Types].xml"
  "_rels/"
  "docProps/"
  "xl/"

julia> zip_names(r)
13-element Vector{String}:
 "[Content_Types].xml"
 "docProps/"
 "docProps/app.xml"
 "docProps/core.xml"
 "xl/"
 "xl/sharedStrings.xml"
 "xl/workbook.xml"
 "xl/worksheets/"
 "xl/worksheets/Sheet1.xml"
 "xl/_rels/"
 "xl/_rels/workbook.xml.rels"
 "_rels/"
 "_rels/.rels"

The following function seems to fix the issue, but I’m not sure if this is a robust solution.

julia> function fix_broken_xlsx(broken_filename, new_filename)
           r = ZipReader(read(broken_filename))
           ZipWriter(new_filename) do w
               for name in zip_names(r)
                   zip_isdir(r, name) && continue
                   content = zip_readentry(r, name)
                   if name == "xl/worksheets/Sheet1.xml"
                       name = "xl/worksheets/sheet1.xml"
                   end
                   zip_writefile(w, name, content)
               end
           end
       end
fix_broken_xlsx (generic function with 1 method)

julia> fix_broken_xlsx("not_working.xlsx", "maybe_working.xlsx");

julia> f1 = XLSX.openxlsx("maybe_working.xlsx")
XLSXFile("maybe_working.xlsx") containing 1 Worksheet
            sheetname size          range
-------------------------------------------------
                   de 4x103         A1:CY4
1 Like

Thanks a lot, Nathan. I appreciate your solution. I did not manage to use ZipArchive but followed your approach like this

using ZipFile

function fix_broken_xlsx(broken_filename, new_filename)
    r = ZipFile.Reader(broken_filename)
    w = ZipFile.Writer(new_filename)
    
    for f in r.files
        # Read the content of the file
        content = read(f)
        
        # Check and rename if necessary
        if f.name == "xl/worksheets/Sheet1.xml"
            wf = ZipFile.addfile(w, "xl/worksheets/sheet1.xml")
            write(wf, content)
        else
            wf = ZipFile.addfile(w, f.name)
            write(wf, content)
        end
    end
    
    # Close the readers and writers
    close(r)
    close(w)
end
''