Is there a way to speed up this code?

Luis_Manuel_Espinoza · August 16, 2023, 7:11pm

function open_xyz_file(path::AbstractString)
    file = open(path)
    num_atoms = parse(Int, readline(file))
    comment = readline(file)
    symbols = Vector{String}(undef, num_atoms)
    coords = zeros(num_atoms, 3)
    for i in 1:num_atoms
        line = readline(file)
        data = split(line)
        symbols[i] = data[1]
        coords[i, :] = [parse(Float64, d) for d in data[2:4]]
    end
    close(file)
    return symbols, coords, comment
end

algunion · August 16, 2023, 7:26pm

Hello @Luis_Manuel_Espinoza ,

It would be easier for people to help if you took a look at the forum's posting guidelines and formatted your code more clearly.

One quick improvement is to remove the allocations caused by the line coords[i, :] = [parse(Float64, d) for d in data[2:4]].

A potential replacement is:

for ii in 1:3
    coords[i, ii] = parse(Float64, data[ii+1])
end
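
For context, here is a minimal sketch of the original function with that inner loop dropped in. The name read_xyz_loop is hypothetical, and the do-block form of open is a small extra change that closes the file even if parsing throws; the rest follows the code in the question.

```julia
# Sketch: same logic as the question's function, but each coordinate is
# parsed straight into `coords`, with no data[2:4] copy and no temporary array.
function read_xyz_loop(path::AbstractString)
    open(path) do file
        num_atoms = parse(Int, readline(file))
        comment = readline(file)
        symbols = Vector{String}(undef, num_atoms)
        coords = zeros(num_atoms, 3)
        for i in 1:num_atoms
            data = split(readline(file))
            symbols[i] = data[1]
            for ii in 1:3
                coords[i, ii] = parse(Float64, data[ii + 1])
            end
        end
        return symbols, coords, comment
    end
end
```

Note that split still allocates a small vector of substrings per line, so this removes only part of the per-line allocation.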

Jeff_Emanuel · August 16, 2023, 7:27pm

Consider reading the file all at once instead of line by line.
Static arrays are a better fit for small, fixed-size vectors such as 3D coordinates.
data[2:4] makes a copy.
[parse(Float64, d) for d in data[2:4]] allocates a temporary array.
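
The points above could be combined roughly as follows. This is a sketch under assumptions: the name read_xyz_all is hypothetical, and a plain NTuple{3,Float64} stands in for a static vector so that the example needs no packages. With StaticArrays.jl installed, SVector{3,Float64} could take its place.

```julia
# Sketch: read every line up front, index the split fields directly
# (no data[2:4] slice, no temporary array), and store each position
# as a fixed-size tuple instead of a matrix row.
function read_xyz_all(path::AbstractString)
    lines = readlines(path)
    num_atoms = parse(Int, lines[1])
    comment = lines[2]
    symbols = Vector{String}(undef, num_atoms)
    coords = Vector{NTuple{3,Float64}}(undef, num_atoms)
    for i in 1:num_atoms
        fields = split(lines[i + 2])
        symbols[i] = fields[1]
        coords[i] = (parse(Float64, fields[2]),
                     parse(Float64, fields[3]),
                     parse(Float64, fields[4]))
    end
    return symbols, coords, comment
end
```

Unlike the original function, this returns the coordinates as a vector of 3-tuples rather than an N×3 matrix, so downstream code would index coords[i][k] instead of coords[i, k].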

stevengj · August 16, 2023, 9:26pm

There are ways to do this faster; see the thread "Package to read/process lines without new allocations".

But for this specific task of reading a sequence of lines containing space-delimited numbers that you want to parse, I would look at CSV.Rows in CSV.jl, or simply CSV.File with a limit=num_atoms argument.
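
As a rough illustration of the CSV.File route, the sketch below reads the atom count and comment by hand and lets CSV.jl parse the remaining rows. The function name read_xyz_csv is hypothetical, and the sketch assumes single-frame files whose atom lines have exactly four space-separated fields with no leading whitespace; treat the keyword choices as a starting point rather than a tuned configuration.

```julia
using CSV

# Sketch: parse the atom block of an XYZ file with CSV.File.
function read_xyz_csv(path::AbstractString)
    # The first two lines hold the atom count and a free-form comment.
    num_atoms, comment = open(path) do io
        parse(Int, readline(io)), readline(io)
    end
    table = CSV.File(path;
        header = false,          # no header row; columns become Column1..Column4
        skipto = 3,              # atom data starts on the third line
        limit = num_atoms,       # stop after the declared number of atoms
        delim = ' ',
        ignorerepeated = true,   # treat runs of spaces as one delimiter
        types = [String, Float64, Float64, Float64],
    )
    symbols = collect(String, table.Column1)
    coords = hcat(table.Column2, table.Column3, table.Column4)
    return symbols, coords, comment
end
```

Whether this beats a hand-written loop depends on file size; for small files the fixed setup cost of CSV.jl may dominate, so it is worth benchmarking on representative inputs.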

lmiq · August 16, 2023, 10:09pm

There are also a few libraries that handle common atomic-structure file formats, for instance Chemfiles.jl (Julia interface to chemfiles).

From their examples (Chemfiles: read and write chemistry files):

using Chemfiles

file = Trajectory("filename.xyz")
frame = read(file)

println("There are $(size(frame)) atoms in the frame")
pos = positions(frame);

# Do awesome science here with the positions

if has_velocities(frame)
    vel = velocities(frame)

    # If the file contains information about the
    # velocities, you will find them here.
end
