Nerd Sniping: How can I improve the read-in of a human-readable file?

Hi All,

I’m currently trying to read in a huge human-readable file, which can take up to 50% of my total walltime when running a job. I was wondering if there’s a more efficient way to read in such a file?

Naturally, I could always pre-process each file once, reading it in and writing it out in a more efficient format. However, given the large number of files I have, this is not practical.

The layout of the file is as follows,

  1. L lines of up to 10 integers per line. For example, I might have 15 integers over 2 lines of 10 and 5, respectively.
  2. The following line contains a single floating point number
  3. These L+1 lines repeat M times.
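
For example, a single block with 15 integers (so L = 2) might look like this (values made up):

1 4 7 9 12 15 18 21 24 27
30 33 36 38 40
0.7310585786300049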

My current approach is to pack all the integers over the L lines into a single UInt64 and represent the floating-point number as a Float32.

My current code is this,

p = Progress(num_lines, desc=format("Reading {:>12,d} lines", num_lines), showspeed=true)

for i in 1:num_lines
  bitstr = zero(UInt64)
  sp_idx = 1

  for j in 1:nlines
    state_line = readline(io)
    state_parts = parse.(Int8, split(state_line))

    for sp_index in state_parts
      bit_index = sp_index - 1  # Convert to 0-based index
      bitstr |= UInt64(1) << bit_index
      sp_idx += 1
      if sp_idx > num_ints
        break
      end
    end
  end

  bitstrings[i] = bitstr

  coeff_line = readline(io)
  coeffs[i] = parse.(Float32, split(coeff_line))[state_idx]
  next!(p)
end

Is there a more efficient way to do this?

Any help would be appreciated.

Put it in a function! :slight_smile:

3 Likes

Hi @mbauman, thanks for the quick response.

I currently have this within a function, but are you saying to make a separate standalone function to get better performance? So, I make a separate read_loop function and call this within my read_file function?

Ah, ok, that was, as you might expect, just a very quick response based on how the code you posted looked. It’s hard to fully evaluate without the complete example; how you structure your code (and what’s global) can have a dramatic impact on performance.
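
For instance (a toy illustration with made-up names, not your code):

# Untyped globals defeat type inference:
data = rand(1000)
f() = sum(data)   # slow: `data` is an untyped global, so `f` can't specialize on its type
g(v) = sum(v)     # fast: `g` compiles a specialized method for the argument's type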

1 Like

Not a problem, I appreciate the help!

I’ll have a look at what I can do and share any updates here!

Please post a runnable example and an example file (can be dummy data and shorter than your actual files of course). The code you posted is incomplete:

  • you never initialize coeffs and bitstrings
  • state_idx is never defined and seems to mean that the final line with the float actually contains multiple floats?
  • Remove the progress bar so the timings aren’t skewed by unnecessary logging (you can reintroduce it after optimizing, of course)
  • From your code I suppose that you know the structure of each ‘block’ rather well and all ‘blocks’ are similar? I.e. the number of lines with integers (called L in your post and nlines in the code) seems constant. Is the distribution of integers among these lines also the same across blocks? The number of blocks seems known as well (called ‘M’ in the text and num_lines in the code)
  • Advice: Try to choose more descriptive and distinguishable variable names. On first read I almost confused num_lines with nlines and sp_index with sp_idx. For num_lines I would propose num_blocks.
3 Likes

Hi @abraemer thanks for the detailed reply.

You’re correct, I copied across the snippet and should’ve used a minimal example. I’ll write up an example and share it below. I’ll also add a function to generate a fake input file to help run the example too.

1 Like

How confident are you that this is where you should improve efficiency, as opposed to the variable-width reads and String parsing? Have you profiled these parts?

Actually, I am also confused about how that even works, because you can only store the numbers 1-64 like that, but an Int8 goes up to 127. So how are the integers in the file constrained?

As promised, here’s a minimal reproducible example.

The data is sorted, random integers (up to a maximum of 40). I pack all the integers within a block into a single UInt64 because I need to do lots of bitwise operations. Then the float is just stored at the corresponding index in the coefficients array.

The idea is to speed up how I currently read in the data. The current approach packs the integers within a block into a single UInt64 for an efficient representation, and the float is stored in an array. So, within the bitstrings and coefficients arrays, the i-th index represents the i-th block in the original data file.
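
For instance (illustrative), once packed, testing whether integer k appears in block i is a single bitwise operation:

# Illustrative: is integer `k` (1-based) present in block `i`?
is_present = (bitstrings[i] & (UInt64(1) << (k - 1))) != 0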

Here’s the minimal reproducible example,

using StatsBase
using ProgressMeter
using Format

"""
Generates fake data with `num_blocks` blocks.
Each block contains `num_integers` integers and 1 float.

The integers are 1-based (ranging from 1 to 40, inclusive).
"""
function generate_fake_data(
    filename::String,
    num_integers::Int64,
    num_blocks::Int64
)
    open(filename, "w") do io
        for line in 1:num_blocks
            all_integers = sample(1:40, num_integers; replace = false)
            all_integers = sort(all_integers)
            for idx in 1:num_integers
                print(io, all_integers[idx], " ")
                if idx % 10 == 0
                    println(io)
                end
            end 
            if num_integers % 10 != 0
                println(io)
            end
            println(io, rand())
        end 
    end
end 

function read_fake_data(
    filename::String,
    num_integers::Int64,
    num_blocks::Int64
)

    if !isfile(filename)
        error("File not found: $filename")
    end
       
    (bitstrings, coefficients) = open(filename, "r") do io

        p = Progress(num_blocks, desc=format("Reading {:>12,d} blocks...", num_blocks), showspeed=true)

        # We'll pack all integers into a UInt64 (i.e. a bitstring)
        bitstrings = zeros(UInt64, num_blocks)
        coefficients = zeros(Float64, num_blocks)
        
        num_lines_per_block = ceil(Int, num_integers / 10)

        for i in 1:num_blocks
            bitstring = zero(UInt64) # Initialize `bitstring` to store integers in a block
            bit_idx = 1 # starting index for packing bits

            for j in 1:num_lines_per_block
                line = readline(io)
                parsed_line = parse.(Int8, split(line))
                
                for integer_index in parsed_line
                    bit_index = integer_index - 1  # Convert to 0-based index
                    bitstring |= UInt64(1) << bit_index
                    bit_idx += 1
                    if bit_idx > num_integers
                        break
                    end
                end
            end

            bitstrings[i] = bitstring

            coeff_line = readline(io)
            coefficients[i] = parse.(Float64, split(coeff_line))[1]
            next!(p)
        end

        return (bitstrings, coefficients)
    end

    return bitstrings, coefficients
    
end

filename = "fake_data.txt"
num_integers = 22 # number of integers per block
num_blocks = 1_000_000 # total number of blocks

generate_fake_data(filename, num_integers, num_blocks)

bitstrings, coefficients = read_fake_data(filename, num_integers, num_blocks)

println(length(bitstrings), " ", length(coefficients))

I appreciate the help

That’s a good point. If the bottleneck is mainly due to String parsing, then I probably can’t do much more? Unless I read in the data and write it to a more efficient format?
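
For concreteness, something like the following is what I have in mind (made-up helper names, writing the already-parsed arrays to a simple binary sidecar file):

function write_cache(cachefile::String, bitstrings::Vector{UInt64}, coefficients::Vector{Float64})
    open(cachefile, "w") do io
        write(io, Int64(length(bitstrings)))  # header: number of blocks
        write(io, bitstrings)                 # raw UInt64 data
        write(io, coefficients)               # raw Float64 data
    end
end

function read_cache(cachefile::String)
    open(cachefile, "r") do io
        n = read(io, Int64)
        bitstrings = Vector{UInt64}(undef, n)
        read!(io, bitstrings)
        coefficients = Vector{Float64}(undef, n)
        read!(io, coefficients)
        return bitstrings, coefficients
    end
end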

The only way to know is to profile your code, e.g. with @profview inside VS Code’s Julia REPL, to see where it spends the most time.
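
For example, using the MRE above:

# In VS Code's Julia REPL (the Julia extension provides @profview):
@profview read_fake_data(filename, num_integers, num_blocks)

# Or with just the standard library:
using Profile
@profile read_fake_data(filename, num_integers, num_blocks)
Profile.print()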

1 Like

Assuming that profiling backs this up, you can dodge String instantiation and processing by reading bytes into a preallocated buffer with copyline/copyuntil, and by wrapping that buffer in a non-allocating string-like object from StringViews.jl. You said human-readable, integers, and floating point, so I’d infer the file is plain text containing 0-9, ' ', '.', '\n', '\r', all ASCII and simpler to handle than full UTF-8. There’s still a limit to the read efficiency because of parse and the variable-width data on a couple of levels. A binary format could make some of that easier to read, but it may not be worth making unreadable copies under even more assumptions (one very underrated point: text is decimal and arbitrary-precision, so committing to a specific data type forces approximations and limits).
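
A rough sketch of that idea (assumes Julia 1.11+ for copyuntil, reuses the block layout from the MRE above, and drops the progress bar as suggested; untested against your real files):

using StringViews

function read_fake_data_buffered(filename::String, num_integers::Int, num_blocks::Int)
    bitstrings = zeros(UInt64, num_blocks)
    coefficients = zeros(Float64, num_blocks)
    num_lines_per_block = ceil(Int, num_integers / 10)
    buf = IOBuffer()
    open(filename, "r") do io
        for i in 1:num_blocks
            bits = zero(UInt64)
            for _ in 1:num_lines_per_block
                truncate(buf, 0)            # reuse the buffer: no String per line
                copyuntil(buf, io, '\n')
                # `buf.data`/`buf.size` reaches into IOBuffer internals; a common
                # trick with StringViews, but not public API:
                line = StringView(view(buf.data, 1:buf.size))
                for token in eachsplit(line, ' '; keepempty=false)
                    bits |= UInt64(1) << (parse(Int, token) - 1)
                end
            end
            bitstrings[i] = bits
            truncate(buf, 0)
            copyuntil(buf, io, '\n')
            coefficients[i] = parse(Float64, StringView(view(buf.data, 1:buf.size)))
        end
    end
    return bitstrings, coefficients
end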

1 Like

You can take a look at Automa.jl, which can be used to generate very efficient parsers from (a constrained set of) regular expressions.
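
For example, a very rough sketch with Automa v1’s exported API (this only validates one block; a real parser would attach actions to extract the numbers, and I haven’t tested this against your files):

using Automa

block_regex = let
    int = re"[0-9]+"
    intline = int * rep(re" " * int) * opt(re" ") * re"\n"   # up to 10 ints, maybe a trailing space
    floatline = re"[0-9]+\.[0-9]+\n"
    intline * rep(intline) * floatline                       # one block: int lines, then the float
end
eval(generate_buffer_validator(:is_valid_block, block_regex))

is_valid_block("1 4 7\n0.25\n") === nothing  # `nothing` means the input matched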

2 Likes

I have not looked under the hood, but am wondering if @scanf() (from Scanf.jl) might be more efficient? But only if the format is consistent.
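
For reference, its usage looks like this (illustrative values; the first return value is the number of successfully matched items):

using Scanf

# Read one integer and one float with a fixed format:
r, i, x = @scanf("13 0.5", "%d %f", Int, Float64)
# r == 2 (both matched), i == 13, x == 0.5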

1 Like