Best Julian way to copy many files?

Hello.

I’m wondering what the best Julian way to copy files is. My use case: I have a large number of files of varying sizes, and I’m automating their data cleaning and backup. Without going into detail, the writing/copying step of the backup is the bottleneck. A plain broadcast of cp in Julia is much slower than a manual “copy & paste” in Windows (timings below), so I’m looking for alternatives.
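The snippets in this thread assume two matching vectors, source_files and backup_files. Below is a minimal sketch of one way to build them. It uses a hypothetical helper, backup_pairs, which assumes the backup should mirror the source directory tree under a separate root. It also creates the destination directories up front, because cp fails when the target's parent directory does not exist. Doing that once, before any parallel copying, keeps the copy step itself simple.

```julia
# Hypothetical helper: mirror every file under `src_root` into `dst_root`.
# Returns matching vectors (source_files, backup_files) and creates every
# destination directory beforehand so later `cp` calls only copy files.
function backup_pairs(src_root::AbstractString, dst_root::AbstractString)
    source_files = String[]
    backup_files = String[]
    for (dir, _, files) in walkdir(src_root)
        # Recreate this directory under the backup root (no-op if it exists).
        mkpath(normpath(joinpath(dst_root, relpath(dir, src_root))))
        for name in files
            src = joinpath(dir, name)
            push!(source_files, src)
            push!(backup_files, joinpath(dst_root, relpath(src, src_root)))
        end
    end
    return source_files, backup_files
end
```

Usage would be something like `source_files, backup_files = backup_pairs("D:/data", "E:/backup")`, with the paths standing in for your own.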

A manual select-all copy & paste in Windows takes time x.

Broadcasting Julia’s cp takes ~5x as long as the manual copy: cp.(source_files, backup_files)

Julia’s asyncmap takes ~4x as long as the manual copy: asyncmap(cp, source_files, backup_files)

Julia’s Threads.@spawn macro, with one task per file, takes ~1.5x as long as the manual copy: tasks = [Threads.@spawn(cp(src, dst)) for (src, dst) in zip(source_files, backup_files)]; foreach(wait, tasks)
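One variation on the Threads.@spawn approach is to avoid spawning one task per file. Instead, you start a fixed number of worker tasks that pull (source, destination) pairs from a Channel until it is empty. This caps how many copies are in flight at once, which may matter with thousands of files. Whether it actually beats one task per file on a given disk is something to measure. The sketch below assumes the destination directories already exist. threaded_copy and drain_and_copy are hypothetical names.

```julia
# Worker: copy pairs from the shared channel until it is closed and drained.
function drain_and_copy(jobs::Channel{Tuple{String,String}}, force::Bool)
    for (src, dst) in jobs
        cp(src, dst; force = force)  # force = true overwrites an existing backup
    end
    return nothing
end

# Copy srcs[i] to dsts[i] using `ntasks` worker tasks.
# Assumes every destination's parent directory already exists.
function threaded_copy(srcs::AbstractVector{<:AbstractString},
                       dsts::AbstractVector{<:AbstractString};
                       ntasks::Integer = Threads.nthreads(), force::Bool = false)
    length(srcs) == length(dsts) ||
        throw(ArgumentError("srcs and dsts must have the same length"))
    # Fill a buffered channel with all jobs, then close it so workers stop when done.
    jobs = Channel{Tuple{String,String}}(length(srcs))
    for (s, d) in zip(srcs, dsts)
        put!(jobs, (String(s), String(d)))
    end
    close(jobs)
    workers = [Threads.@spawn(drain_and_copy(jobs, force)) for _ in 1:max(1, ntasks)]
    foreach(wait, workers)  # throws a TaskFailedException if any copy failed
    return dsts
end
```

Usage: `threaded_copy(source_files, backup_files; ntasks = 8)`. Trying a few values of ntasks (for example 4, 8, 16) is probably more informative than assuming the CPU count is optimal, since copying mostly waits on the disk.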

For reference, length(Sys.cpu_info()) returns 4 on this machine, and that is also the number of threads I start the script with. So I’d assume it is on an even footing with whatever multithreading the manual copy & paste uses.
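The CPU count and the number of threads Julia actually runs with are separate settings. A quick check like the one below confirms the script really has 4 threads. Threads are set at startup, e.g. `julia -t 4 script.jl` or the JULIA_NUM_THREADS environment variable.

```julia
# Logical CPUs reported by the OS (normally the same count as length(Sys.cpu_info())).
@show Sys.CPU_THREADS
# Threads this Julia session was started with; stays 1 unless set at startup.
@show Threads.nthreads()
```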

Any suggestions for copying many files quickly? I’m happy with the performance of the Threads.@spawn approach but would like to hear other ideas. Thanks.

You can try FileTrees.jl (shashi/FileTrees.jl on GitHub): parallel file processing made easy.


You may also want to consider Windows’ built-in robocopy tool, which allows for over-network compression, restartable transfers, multithreading, incremental file transfer for backups, and other goodies.
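robocopy can also be driven from the Julia script itself. The sketch below is only an illustration: the helper names are hypothetical, and the flag choice is an assumption. It uses /E to copy subdirectories, including empty ones, and /MT:n for n copy threads. Check `robocopy /?` before relying on it. One practical detail: robocopy reports success with exit codes 0 through 7, and codes 8 and above signal failures. Julia's run throws on any nonzero exit code, so the command is wrapped in ignorestatus and the code is checked manually.

```julia
# Hypothetical helper: build a robocopy command copying `src_dir` into `dst_dir`.
# /E copies subdirectories (including empty ones); /MT:n uses n copy threads.
robocopy_cmd(src_dir::AbstractString, dst_dir::AbstractString; threads::Integer = 8) =
    `robocopy $src_dir $dst_dir /E /MT:$threads`

# robocopy exit codes 0-7 mean success (with varying detail); 8 or more means failure.
robocopy_succeeded(code::Integer) = 0 <= code < 8

# Windows only: run robocopy without letting `run` throw on its nonzero success codes.
function run_robocopy(src_dir, dst_dir; threads = 8)
    proc = run(ignorestatus(robocopy_cmd(src_dir, dst_dir; threads = threads)))
    robocopy_succeeded(proc.exitcode) ||
        error("robocopy failed with exit code $(proc.exitcode)")
    return proc.exitcode
end
```

Because each interpolated path becomes a single argument, paths containing spaces need no extra quoting.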
