I am generating a report in an excel format based upon a script within Julia. However, when the output in a file is uploaded onto One Drive, the date created and modified are incorrect, sometimes a few years out.
Could this down to the security tool scanning through the Julia code and altering the meta details of the file?
Welcome. It’s really hard to say what’s happening here, but I’d be very surprised if it’s related to Julia at all. I’d just go step by step on each system and make sure that the file’s metadata are correct at every step of the process.
Are you using XLSX.jl to create your Excel files? if so, which version?
I see the same effect. For example, an Excel file that I create from scratch has a date of 22/05/2018 03:42.
Here is one I created this morning (screenshot from Windows File Explorer):
The Date modified is actually the time my Julia script created the file. I don’t know where the other two dates come from.
I also create a range of plots using Makie, saved as .png files. These show as:
Again, the Date modified corresponds to the time my script ran.
The Date created possibly corresponds to the date a file of this name was first created, but I can’t be sure of this. Twice a week, I re-run the script and the files are overwritten.
I’m not at all sure what the significance of the 2018 date is.
XLSX.jl (post v0.10.4) uses ZipArchives.jl to create and write .xlsx files. This package’s Readme lists the following limitations:
Ignores time stamps.
No way to specify the modification time, times are set to 1980-01-01 00:00:00 DOS date time
Looking inside a .xlsx file produced in this way shows all the internal xml files as:
I raised this issue on ZipArchives.jl and I also reported it through my IT department, but so far, no progress on understanding the issue, never mind resolving it. (Edit - on re-reading, this last sentence sounds critical but it is not meant to be at all.)
I’ve created three simple Excel files form XLSX.jl v 0.10.2 (using ZipFiles.jl) and v0.10.4 and (unreleased) v0.11.0, both of which use ZipArchives.jl.
using XLSX
f=XLSX.openxlsx("v0.10.4", mode="w")
s=f[1]
s["B2"]="hello"
XLSX.writexlsx("v0.10.4.xlsx",f, overwrite=true) # file name as appropriate
I also created a file directly in Excel with the same content:
Here is what my organisation’s IT team has concluded:
I appreciate you forwarding the Excel File and link to the Data Processing folder for us to investigate. Investigating this issue further our assessment is as follows:
The “date” column in File Explorer is normally representing the “Date Modified”. You have pointed out that this is incorrect or not matching.
However, File Explorer is not doing anything, nor is Excel, i.e. neither of these are the issue.
That date is being written by the source, i.e. Julia.
Julia, or the library it uses to write Excel files, can explicitly set the file’s timestamp. This could be based on a dataset timestamp, a source file timestamp, or any number of other things. It may even be that the “original” data was created in 2018 and Julia is inheriting the date from that.
The bottom line is that by the time the file is written onto his device (or OneDrive), the date has already been set by Julia. The point here is that our systems are not doing anything with the file or changing it.
To get to the root cause would involve investigation with either Julia and/or their community, but this does not fall under the scope of IT Support.
The files in question are written in Julia using XLSX.jl which in turn relies on ZipArchives.jl and (before that) ZipFiles.jl.
I don’t know how to reconcile IT’s conclusions with the comments of @mbauman and @nhz2, both of whom are vastly more expert than me.
Can anyone suggest a mechanism by which the 2018 date might be being inherited?
Julia has a function futime or touch to change the file times, but I can’t find where this is getting called, so maybe this is unrelated to the issue.
I can touch(v0.10.4.xlsx) and this updates the Date modified but doesn’t affect the other two dates.
Help tells me:
help?> futime
search: mtime ctime @time time function fullname @timed @timev
Couldn't find futime
Perhaps you meant time, @time, ctime, mtime, @timev, @timed or fullname
No documentation found.
Binding futime does not exist.
So i think I didn’t understand what you meant about this.
I’m beginning to wonder whether Windows looks elsewhere than file system metadata for these dates…?
To me it looks like the Explorer shows an extra Date column that is internal file metadata and not filesystem metadata. Like a photo carries its own creation date in exif data. I would take a file from Excel, open it in Julia and resave the contents from there. Then compare the headers of both files with a hexdump tool. Probably the julia file has some field not set or filled with dummy data and that gets interpreted as that same date that every file shows in that column
@jules is right. It turns out this date value is read from the internal docProps\core.xml file in the blank.xlsx file that is used when a new XLSXFile is created.
I’ve added this option to the exported function newxlsx. Its docstring now additionally says:
Use keyword argument `update_timestamp=false` to prevent timestamps in the file properties from being
updated to the current date/time. This ensures bit-for-bit reproducible output when the file is written.
The file `Date` will remain as `2018-05-22T02:41:32Z`.
The default is `update_timestamp=true`, resulting in the `Date` being set to the current time in the new file.
newxlsx exclusively opens a blank file to create a new .xlsx file, so this seems the best place for the option. For simplicity, I have not added this option to functions that also support reading existing files (like openxlsx).
Wow, you learn something new every day. I never would’ve guessed that Windows was looking inside an xml file inside a zip archive masquerading as .xlsx in order to display what I would’ve thought would be filesystem metadata.
Nicely done, @TimG — you should update your priors on who all “are vastly more expert than” you.