Performance: read data from ascii file, replace `split`

The solution ended up having three parts:

  1. Using `eachsplit` instead of `split`, as suggested in Performance: read data from ascii file, replace `split` - #2 by artemsolod.

  2. Using the solution provided by Mason to parse and set the fields in a type-stable manner: Unroll setfield! - #3 by Mason.

  3. Using InlineStrings, as indicated in Performance: read data from ascii file, replace `split` - #8 by artemsolod, to reduce the memory footprint of the data structure being created.
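
For readers who want to see how the first two parts fit together, here is a minimal sketch. It is not the actual `readCIF` code nor Mason's exact solution: the `AtomSketch` type, the `convert_field` helper and the `parse_record` function are hypothetical names made up for illustration, and the record has far fewer fields than the real one. The idea is to walk the lazy `eachsplit` iterator (available in Julia 1.8 and later) and to unroll the per-field conversion with `ntuple(..., Val(N))`, so that each field type is a compile-time constant.

```julia
# Hypothetical record type; the real atom type has many more fields.
struct AtomSketch
    index::Int
    name::String
    x::Float64
    y::Float64
    z::Float64
end

# Convert one token to the requested field type (hypothetical helper).
convert_field(::Type{String}, s::AbstractString) = String(s)
convert_field(::Type{T}, s::AbstractString) where {T<:Number} = parse(T, s)

# Parse one whitespace-separated line into a record of type T.
function parse_record(::Type{T}, line::AbstractString) where {T}
    # eachsplit yields SubStrings lazily, so no Vector of tokens is
    # allocated the way `split` would allocate one.
    tokens = Iterators.Stateful(eachsplit(line))
    # ntuple with a Val argument is unrolled at compile time, so
    # fieldtype(T, i) is known for each i and the resulting tuple is
    # meant to have a concrete type.
    fields = ntuple(i -> convert_field(fieldtype(T, i), popfirst!(tokens)),
                    Val(fieldcount(T)))
    return T(fields...)
end

atom = parse_record(AtomSketch, "1 NP 171.946 588.581 135.200")
```

For the third part, the `String` field could be swapped for a fixed-width type from InlineStrings (for example `String7`), together with a matching `convert_field` method. That is left out here to keep the sketch free of package dependencies.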

The result is very good. I can now read my roughly 64 million atoms in under two minutes:

julia> @time ats = readCIF("./all.cif")
106.647978 seconds (129.37 M allocations: 12.242 GiB, 14.24% gc time, 0.03% compilation time)
   Array{Atoms,1} with 64423983 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1   NP     PRO     7        1        1  171.946  588.581  135.200  1.00  0.00     0                 1
       2   HC     PRO     7        1        1  172.571  588.749  134.422  1.00  0.00     0                 2
       3   HC     PRO     7        1        1  171.019  588.890  134.923  1.00  0.00     0                 3
                                                       ⋮ 
64423981  CLA     CLA     I       50 20452615  104.220  615.013 -331.799  1.00  0.00     0          64423981
64423982  CLA     CLA     I       51 20452616  130.543  586.064 -347.000  1.00  0.00     0          64423982
64423983  CLA     CLA     I       52 20452617   87.912  628.908 -347.424  1.00  0.00     0          64423983
