I am trying to zip the files in a directory, around 300-400 MB in total, and it takes more than 15 minutes to finish. Initially I was using a custom function written with the ZipFile package (below). I have tried Base.zip, which isn’t helpful, and also the zip_files method from the ZipStreams package. That worked fine, but the zipped file contains sub-directories whenever the input paths contain sub-directories. I have tried zipsink as well, but no luck. Any suggestions?
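using ZipFile

# `files` accepts any vector; the original `Vector{Any}` annotation rejected the `String[]` default.
function zip(archive_name::String, files::AbstractVector=String[])
    @info "Zip all files"
    isempty(files) && return
    w = ZipFile.Writer(archive_name)
    for file in files
        @show file
        # Entry name: keep only the 4th (or 3rd) "/"-separated path component.
        ff = split(file, "/")
        ff = length(ff) > 3 ? ff[4] : ff[3]
        f = ZipFile.addfile(w, ff, method=ZipFile.Deflate)
        write(f, read(file))  # copy the raw bytes
    end
    close(w)
end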
This is some code I have used to call the 7z binary, which comes bundled with Julia.
using p7zip_jll: p7zip
# Explanation of 7z options:
# `a`: Add files to an archive.
# `-tzip`: Create a zip archive instead of a 7z archive.
# `-mm=deflate`: Compress with DEFLATE algorithm.
# `-mx=9`: Set compression level to maximum.
run(pipeline(`$(p7zip()) a -tzip -mm=deflate -mx=9 $(archive_name) .`,
             stdout = devnull))
You can probably adapt it to your needs, but you will need to look up the 7z documentation separately.
@Sandy45 That seems very slow. I use ZipArchives.jl and it will create an archive over 1GB in much less time than that.
using ZipArchives: ZipWriter, zip_newfile, zip_append_archive

function zipfiles(zip, fnames) # write files in fnames to an open zip file.
    if length(fnames) > 0
        for name in fnames
            if !ismissing(name)
                if !isfile(name)
                    throw(LoadError("", 0, "Specified file not found: $name"))
                end
                f = open(name, "r")
                content = read(f, String)
                close(f)
                name = trimpath(name) # store only the file name, not the full path
                zip_newfile(zip, name; compress=true)
                write(zip, content)
            end
        end
    end
    return nothing
end
function addtoZIP(zipname, fnames; append=false) # add or append files to a zip file
    if append
        if isfile(zipname)
            zip_append_archive(zipname) do zip
                zipfiles(zip, fnames)
            end
        else
            throw(LoadError("", 0, "Specified file not found: $zipname"))
        end
    else
        ZipWriter(zipname) do zip
            zipfiles(zip, fnames)
        end
    end
    return nothing
end
I chose ZipArchives specifically because it allows me to append extra files to an existing archive.
function trimpath(file) # remove the leading path to leave just the filename remaining.
    l = findlast("\\", file)
    if l !== nothing
        file = file[nextind(file, first(l)):end]
    end
    return file
end
This may rely on a Windows path separator - but you seem to be on Windows…
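If the separator is a concern, a variant that accepts both `/` and `\` might look like the sketch below. The name `trimpath_portable` is made up for illustration. Base's `basename` is the simpler choice when the paths use the separators of the platform the code runs on.

```julia
# Hypothetical drop-in for `trimpath`: keep only the last path component,
# treating both '/' and '\\' as separators regardless of platform.
function trimpath_portable(file::AbstractString)
    parts = split(file, ['/', '\\'])
    return String(last(parts))
end

# Usage: works for Windows-style and Unix-style paths alike.
println(trimpath_portable("C:\\data\\run1\\results.csv"))
println(trimpath_portable("/home/user/run1/results.csv"))
```

Note that a path ending in a separator yields an empty string, so this is only meant for file paths.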
The level can be 1 to 9 where 1 is fastest and 9 is smallest file size. By default this is 6 as a compromise.
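As a rough sketch of how the level could be passed: this assumes the level is the `compression_level` keyword of `zip_newfile` in ZipArchives.jl (please check the package documentation, since the keyword name is an assumption here). `zip_flat` is a made-up helper name. It stores each file under its base name only, so the archive has no sub-directories.

```julia
using ZipArchives: ZipWriter, zip_newfile

# Hypothetical helper: write `files` into `archive` with flat entry names.
# `level` is forwarded as `compression_level` (assumed keyword name).
function zip_flat(archive::AbstractString, files; level::Integer=1)
    ZipWriter(archive) do w
        for path in files
            # basename drops the leading directories from the entry name.
            zip_newfile(w, basename(path); compress=true, compression_level=level)
            write(w, read(path))  # copy the raw bytes into the current entry
        end
    end
    return archive
end
```

Two input files with the same base name would collide in a flat archive, so this only suits inputs with unique file names.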
Also, yes, it is a very bad idea to save absolute paths in a ZIP archive: extracting such an archive may cause errors, and a zip extractor that isn’t carefully written may delete unexpected files in the filesystem.
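One way to avoid absolute entry names is to derive each name relative to a chosen root folder. A minimal sketch using only Base functions follows; `entry_name` is a hypothetical helper, not part of any zip package.

```julia
# Hypothetical helper: entry name for `path` relative to `root`,
# always joined with '/' as ZIP entry names expect.
function entry_name(path::AbstractString, root::AbstractString)
    rel = relpath(path, root)          # e.g. "sub/file.txt" (or "sub\\file.txt" on Windows)
    return join(splitpath(rel), "/")   # normalise the separator
end

# Usage: a file two levels below the root keeps only its relative part.
root = mktempdir()
println(entry_name(joinpath(root, "sub", "file.txt"), root))
```

If `path` lies outside `root`, the result starts with `..`, which should also be rejected before writing the entry.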
If you really know what you are doing you can disable all entry name checks with for example:
open("test.zip"; write=true) do fileio
    ZipWriter(fileio; check_names=false) do zip
        zip_newfile(zip, ":::")
    end
end
This will lead to the following error when extracting on Windows:
Thank you for the correction! I mostly blindly trusted ZipStreams.jl’s README because I happen to know the maintainer and trust him. I have no benchmarks to substantiate my comment here.
I am trying to run this and want to check whether “.” means that all files in the working directory are zipped. Can you provide an example of how to zip all the files in a directory, or a vector of file paths?
readdir() returns a vector of the names of all files in the current directory.
You can pass this array to the functions given above to put all files in pwd() into the zip.
If you want files in another folder then you might try
fname = joinpath.(dir_name, readdir(dir_name))
where dir_name is a string containing the folder path.
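Since `readdir` also returns sub-directory names, it may be worth filtering them out before zipping. A small sketch, with `list_files` as a made-up helper name:

```julia
# Hypothetical helper: full paths of the regular files directly inside `dir`.
# `join=true` makes readdir return paths already joined with `dir`.
list_files(dir::AbstractString) = filter(isfile, readdir(dir; join=true))

# Usage: sub-directories are skipped, files are kept.
dir = mktempdir()
write(joinpath(dir, "a.txt"), "hello")
mkdir(joinpath(dir, "sub"))
println(list_files(dir))
```

The `join` keyword requires Julia 1.4 or later.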
The code I posted zips all files in the current directory. But it has been some time since I wrote it, and I have never learned more of the 7z functionality than I needed at the time.
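For the two cases asked about (a whole directory, or a vector of file paths), the same command could be wrapped roughly as follows. The helper names `zip_dir_7z` and `zip_files_7z` and the `level` keyword are made up for this sketch. How 7z stores the names of the paths it is given should be checked against the 7z documentation.

```julia
using p7zip_jll: p7zip

# Hypothetical helper: zip everything under `dir` (7z recurses into sub-directories).
function zip_dir_7z(dir::AbstractString, archive::AbstractString; level::Integer=5)
    archive = abspath(archive)  # resolve before the command runs in `dir`
    cmd = `$(p7zip()) a -tzip -mm=deflate -mx=$(level) $(archive) .`
    run(pipeline(Cmd(cmd; dir=dir), stdout=devnull))
    return archive
end

# Hypothetical helper: zip an explicit list of files.
function zip_files_7z(files::AbstractVector{<:AbstractString}, archive::AbstractString;
                      level::Integer=5)
    # With no file arguments 7z would add the whole working directory, so stop early.
    isempty(files) && return nothing
    # Interpolating a vector into a command expands to one argument per element.
    cmd = `$(p7zip()) a -tzip -mm=deflate -mx=$(level) $(archive) $(files)`
    run(pipeline(cmd, stdout=devnull))
    return archive
end
```

Writing the archive outside the directory being zipped avoids any question of the archive trying to include itself.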
I’ve made a simple benchmark, and p7zip_jll is much faster and results in a smaller file on an 8-core AMD Ryzen 7 7800X3D CPU. When restricted to a single thread, ZipArchives with compression level 1 is the fastest, but it creates a larger file. I’m not sure how to set the compression level using ZipStreams.