I am still a novice in Julia. I am writing code to import/read a total of 16 data (.txt) files, each containing a 504×1025 matrix, essentially building a 3D data cube from the 2D datasets using a for loop. But the code takes 6.9 seconds with 22 M allocations. I would really appreciate any suggestions and advice on making it faster. Please see the code below:
using Gtk4
using DelimitedFiles
function LoopScans()
    file_path = open_dialog("My Open dialog")
    file_location, filename = dirname(file_path), basename(file_path)
    Rawdata = readdlm(file_path)'
    file_name, file_ext = split(filename, ".")
    filestring, ScanIndex = match(r"[A-Za-z]+", file_name).match, parse(Int, match(r"\d+", file_name).match)
    files = readdir(file_location)
    # Filter files based on the pattern
    matching_files = filter(f -> occursin(filestring * r"\d+\." * file_ext, f), files)
    isempty(matching_files) && throw(ArgumentError("No files found matching pattern"))
    sample_data = readdlm(joinpath(file_location, matching_files[1]))'
    allScans = Array{Float64, 3}(undef, size(sample_data)..., length(matching_files))
    for (ii, file) in enumerate(matching_files)
        allScans[:, :, ii] = readdlm(joinpath(file_location, file))'
    end
    return allScans
end
@time LoopScans()
6.968473 seconds (23.50 M allocations: 961.212 MiB, 3.03% gc time, 5.37% compilation time)
1025×504×16 Array{Float64, 3}:
That makes no difference in this example. a[:, :, i] = function_returns_array() allocates an array for the right-hand side and then writes it in place into a slice of a. Changing = to .= does the same thing.
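To see this concretely, here is a minimal sketch (rand is a stand-in for readdlm, and the sizes are made up) showing that both assignment forms allocate the right-hand-side array before copying it into the slice:

```julia
# rand(3, 3) stands in for readdlm(...)' from the original code.
f() = rand(3, 3)
a = Array{Float64, 3}(undef, 3, 3, 2)

# Both forms first allocate the 3×3 result of f(), then copy it into the slice,
# so neither avoids the per-file allocation.
alloc_assign = @allocated (a[:, :, 1] = f())
alloc_bcast  = @allocated (a[:, :, 2] .= f())
```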
It depends on where the time is spent. If it is spent on reading from the file system, there is little you can do. If it is spent on parsing the input into floats, you can read in parallel, provided you have more than one CPU core. Remember to start Julia with threads, e.g. $ julia -t auto on Linux.
using Base.Threads
...
...
@threads for i in eachindex(matching_files)
    allScans[:, :, i] = readdlm(joinpath(file_location, matching_files[i]))'
end
(Note that @threads is a bit picky about what you loop over; it should be a vector.)
On my box with 8 CPUs this gets the time down from 2.7 to 0.6 seconds. But this depends on disk speed, whether the files are in the disk cache, CPU speed, etc.
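One quick sanity check before benchmarking the threaded version: if Julia was started without the -t flag, nthreads() is 1 and @threads gives no speedup at all.

```julia
using Base.Threads

# If this prints 1, restart Julia with e.g. `julia -t auto`.
println("Julia is running with $(nthreads()) thread(s)")
```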
You should also try a more optimized file-import package like CSV.jl rather than the DelimitedFiles stdlib (which is simple and convenient but relatively slow).
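As a sketch of what that swap could look like (the headerless, tab-delimited layout here is an assumption about the .txt scan files, and the tiny temp file just stands in for one scan), CSV.File can be materialized straight into a Matrix via Tables.matrix, skipping any DataFrame step:

```julia
using CSV, Tables

# Create a tiny stand-in for one scan file: 2×3, tab-delimited, no header.
path, io = mktemp()
write(io, "1.0\t2.0\t3.0\n4.0\t5.0\t6.0\n")
close(io)

# Materialize directly into a Float64 matrix, then transpose as in the
# original readdlm(...)'  code.
m = Tables.matrix(CSV.File(path; header = false, types = Float64))'
# m is now ready to write into allScans[:, :, i].
```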
Thank you all for your suggestions and advice. Using CSV.jl and threading as suggested by @stevengj and @sgaure does indeed improve the performance. On my Intel Core i7-12700 with 20 cores, the time is reduced from 6.96 seconds to 2.18 seconds (49.92 M allocations: 1.441 GiB). The allocations and memory usage seem too high, though. Below is the latest code:
using DataFrames
using CSV
using Gtk4
using Base.Threads
function LoopScans()::Tuple{Array{Float64,3}, String}
    file_path = open_dialog("Select data scans")
    file_location, filename = dirname(file_path), basename(file_path)
    file_name, file_ext = split(filename, ".")
    filestring, ScanIndex = match(r"[A-Za-z]+", file_name).match, parse(Int, match(r"\d+", file_name).match)
    files = readdir(file_location)
    matching_files = filter(f -> occursin(filestring * r"\d+\." * file_ext, f), files) # Filter files based on the pattern
    ScanNumber = length(matching_files)
    isempty(matching_files) && throw(ArgumentError("No files found matching pattern"))
    Rawdata = DataFrame(CSV.File(joinpath(file_location, matching_files[1])))
    if size(Rawdata, 1) <= size(Rawdata, 2)
        Rawdata = permutedims(Rawdata)
    end
    allScans = Array{Float64, 3}(undef, size(Rawdata)..., ScanNumber)
    @threads for (ii, file) in collect(enumerate(matching_files))
        allScans[:, :, ii] .= permutedims(DataFrame(CSV.File(joinpath(file_location, file))))
    end
    return allScans, file_location
end
@time LoopScans()
2.183225 seconds (49.92 M allocations: 1.441 GiB, 3.95% gc time)
The package's performance is tuned with two goals in mind: a) low overhead of allowing missing values everywhere, and b) the following priorities, in order of importance:
Low compilation time
Memory efficiency
High performance
I have used it with good success, but I do not know whether it is better for your use case.