Fastest way to parse a string of numbers

You should look into using Parsers.xparse directly, since it avoids some of the overhead of Parsers.parse. It also respects a Parsers.Options struct, where you can pass ' ' (space) as the delimiter so that it is consumed automatically, and it handles newlines as well. You could check out a recent example of how to do this in the upcoming PowerFlowData.jl package.

Basically, Parsers.xparse is what Parsers.parse calls under the hood. It's most efficient if you pass it a raw vector of bytes, obtained from either read(filepath) or Mmap.mmap(filepath). Calling Parsers.xparse returns a Parsers.Result{T} object with three fields: code, which signals whether parsing succeeded, whether a newline was encountered, etc.; val, the actual parsed value; and tlen, the total number of bytes consumed while parsing (including any delimiters). So the general usage is like:

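using Parsers

function parsestuff(file)
    buf = read(file)  # raw bytes of the whole file
    len = length(buf)
    pos = 1
    # space is the delimiter; wh1=0x00 keeps it from also being skipped as whitespace
    opts = Parsers.Options(delim=' ', wh1=0x00)
    while pos <= len
        res = Parsers.xparse(Int, buf, pos, len, opts)
        if Parsers.ok(res.code)
            # parsing succeeded, do stuff with res.val
        end
        pos += res.tlen  # advance past the value and its delimiter
    end
end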
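
To make the role of code, val, and tlen a bit more concrete, here is a minimal sketch that groups the parsed integers by input line. The helper name parse_rows is made up for illustration and is not part of Parsers.jl; the sketch assumes the newline flag can be tested by masking res.code with Parsers.NEWLINE, and it should work equally on bytes from read(filepath) or Mmap.mmap(filepath).

```julia
using Parsers

# Hypothetical helper (not part of Parsers.jl): collect space-delimited
# integers into one vector per input line.
function parse_rows(buf::Vector{UInt8})
    opts = Parsers.Options(delim=' ', wh1=0x00)
    rows = Vector{Int}[]
    row = Int[]
    pos, len = 1, length(buf)
    while pos <= len
        res = Parsers.xparse(Int, buf, pos, len, opts)
        # keep only values that parsed cleanly
        Parsers.ok(res.code) && push!(row, res.val)
        # assumption: the NEWLINE bit is set when the field ended at a newline
        if (res.code & Parsers.NEWLINE) == Parsers.NEWLINE
            push!(rows, row)
            row = Int[]
        end
        res.tlen == 0 && break  # defensive: never loop without advancing
        pos += res.tlen
    end
    # flush a final line that had no trailing newline
    isempty(row) || push!(rows, row)
    return rows
end

rows = parse_rows(Vector{UInt8}("1 2 3\n4 5 6\n"))
```

Fields that fail to parse are silently skipped here; depending on your data you may prefer to throw an error or record the position instead.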
using Parsers

opts = Parsers.Options(delim=' ', ignorerepeated=true, wh1=0x00)
io = IOBuffer("1.0    2.34345              7.9")
Parsers.parse(Float64, io, opts) # expected: 1.0
Parsers.parse(Float64, io, opts) # expected: 2.34345
Parsers.parse(Float64, io, opts) # expected: 7.9

This does not work with the v2.3.1 update! I will try to figure out why. Any help is appreciated.