Is there a way to speed up this code?

function open_xyz_file(path::AbstractString)
    file = open(path)
    num_atoms = parse(Int, readline(file))
    comment = readline(file)
    symbols = Vector{String}(undef, num_atoms)
    coords = zeros(num_atoms, 3)
    for i in 1:num_atoms
        line = readline(file)
        data = split(line)
        symbols[i] = data[1]
        coords[i, :] = [parse(Float64, d) for d in data[2:4]]
    end
    close(file)
    return symbols, coords, comment
end

Hello @Luis_Manuel_Espinoza ,

It would be easier for people to help if you took a look at the posting guidelines and formatted your code as a code block.

One quick improvement is to remove the allocations caused by this line: coords[i, :] = [parse(Float64, d) for d in data[2:4]]

A potential replacement is this:

for ii in 1:3
    coords[i, ii] = parse(Float64, data[ii+1])
end 
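For context, here is a minimal sketch of how the whole function might look with that loop in place. The name open_xyz_file_loop is only illustrative, and the open(path) do ... end form is an addition that closes the file even if parsing fails; the rest follows the original code.

```julia
# Sketch: the original reader with the comprehension replaced by an explicit loop.
# `open_xyz_file_loop` is a hypothetical name chosen for this example.
function open_xyz_file_loop(path::AbstractString)
    open(path) do file
        num_atoms = parse(Int, readline(file))
        comment = readline(file)
        symbols = Vector{String}(undef, num_atoms)
        coords = zeros(num_atoms, 3)
        for i in 1:num_atoms
            data = split(readline(file))
            symbols[i] = data[1]
            # Parse each coordinate straight into the matrix:
            # no data[2:4] copy and no temporary array.
            for ii in 1:3
                coords[i, ii] = parse(Float64, data[ii + 1])
            end
        end
        return symbols, coords, comment
    end
end
```

Note that split still allocates a small vector per line, so this removes some allocations rather than all of them.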
  1. Consider reading the file all at once instead of line-by-line.
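  2. It is better to use static arrays (for example SVector from StaticArrays.jl) for small fixed-size vectors such as 3D coordinates.
  3. data[2:4] makes a copy; a view (@view data[2:4]) or direct indexing avoids it.
  4. [parse(Float64, d) for d in data[2:4]] allocates a temporary array on every iteration.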
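A rough sketch of how these points could combine, using only Base Julia. The function name read_xyz_all_at_once is made up for this example, and an NTuple{3,Float64} stands in for SVector{3,Float64} so that no package is needed; with StaticArrays.jl installed you could store SVectors instead.

```julia
# Sketch: read the whole file in one call, then parse without slicing copies.
# `read_xyz_all_at_once` is a hypothetical name chosen for this example.
function read_xyz_all_at_once(path::AbstractString)
    lines = readlines(path)              # read every line of the file up front
    num_atoms = parse(Int, lines[1])
    comment = lines[2]
    symbols = Vector{String}(undef, num_atoms)
    # Fixed-size tuples as a package-free stand-in for static arrays.
    coords = Vector{NTuple{3,Float64}}(undef, num_atoms)
    for i in 1:num_atoms
        data = split(lines[i + 2])
        symbols[i] = data[1]
        # Index the fields directly instead of slicing with data[2:4].
        coords[i] = (parse(Float64, data[2]),
                     parse(Float64, data[3]),
                     parse(Float64, data[4]))
    end
    return symbols, coords, comment
end
```

Here coords[i] is the position of atom i, so coords[i][1] is its x coordinate. Whether this is actually faster than the line-by-line version depends on the file size, so it is worth timing both on your own data.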

There are ways to do this faster; see the thread "Package to read/process lines without new allocations".

But for this specific task of reading a sequence of lines containing space-delimited numbers that you want to parse, I would look at CSV.Rows in CSV.jl, or simply CSV.File with a limit=num_atoms argument.


There are also a few libraries to deal with common atom file formats, for instance: Julia interface to chemfiles · Chemfiles.jl

From their examples: Chemfiles - read and write chemistry files

using Chemfiles

file = Trajectory("filename.xyz")
frame = read(file)

println("There are $(size(frame)) atoms in the frame")
pos = positions(frame);

# Do awesome science here with the positions

if has_velocities(frame)
    vel = velocities(frame)

    # If the file contains information about the
    # velocities, you will find them here.
end