To make some of my earlier suggestions concrete, here is code to print lines x through y of a file:
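using Mmap

function process_file(in_fn, startline, stopline)
    f = open(in_fn, "r")
    mm = Mmap.mmap(f, Vector{UInt8})
    l = 1
    pos = 0
    # Skip the newlines that end lines 1 .. startline-1.
    while l < startline
        pos = last(findnext([UInt8('\n')], mm, pos+1))
        l += 1
    end
    # The first requested line starts just after the last skipped newline
    # (this also keeps startpos valid when startline == 1).
    startpos = pos + 1
    # Advance to the newline that ends line stopline.
    while l <= stopline
        pos = last(findnext([UInt8('\n')], mm, pos+1))
        l += 1
    end
    stoppos = pos
    write(stdout, @view mm[startpos:stoppos])
    close(f)
end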
Using this code, to replicate the lines from the first post:
julia> @time process_file("big_file.txt", 1501245,1501694)
20200320201743 29204.58 5900.45 -18887.11
:
:
20200320201744 29204.59 5900.47 -18887.08
0.036912 seconds (1.50 M allocations: 91.657 MiB, 10.68% gc time)
This takes 37 ms, compared with 59 ms for the equivalent sed command:
$ time sed -n '1501245,1501694p;1501695q' big_file.txt
:
real 0m0.059s
As usual, different methods converge to the same order of magnitude.
UPDATE: A more Julian version of the above function (taking the same amount of time) returns a vector of line strings:
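function process_file(in_fn, startline, stopline)
    open(in_fn, "r") do f
        mm = Mmap.mmap(f, Vector{UInt8})
        # Given a position, return the position of the next newline after it.
        skipline = let mm = mm
            (pos, l) -> last(findnext([UInt8('\n')], mm, pos+1))
        end
        # Skip startline-1 newlines; the requested range starts one byte later.
        startpos = foldl(skipline, 1:startline-1; init=0) + 1
        # Advance to the newline that ends line stopline.
        stoppos = foldl(skipline, startline:stopline; init=startpos-1)
        return readlines(IOBuffer(@view mm[startpos:stoppos]))
    end
end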
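The 1.50 M allocations reported by `@time` above most likely come from building the one-element pattern `[UInt8('\n')]` on every search. As a rough sketch of how that could be avoided, the variant below passes a predicate to `findnext` instead, and also returns early when the file has fewer lines than requested. The name `lines_between` is hypothetical and not part of the code above, and I have not timed this variant against `big_file.txt`.

```julia
using Mmap

# Hypothetical helper: return lines startline..stopline as a Vector{String}.
# Returns fewer lines (possibly none) if the file is shorter than requested.
function lines_between(in_fn, startline, stopline)
    open(in_fn, "r") do f
        mm = Mmap.mmap(f, Vector{UInt8})
        isnewline = ==(UInt8('\n'))   # predicate, so no pattern vector is built per call
        pos = 0
        # Skip the newlines that end lines 1 .. startline-1.
        for _ in 1:startline-1
            next = findnext(isnewline, mm, pos + 1)
            next === nothing && return String[]   # fewer than startline lines
            pos = next
        end
        startpos = pos + 1
        # Advance to the newline that ends line stopline.
        for _ in startline:stopline
            next = findnext(isnewline, mm, pos + 1)
            if next === nothing
                # Ran out of newlines: take everything up to the end of the file.
                pos = length(mm)
                break
            end
            pos = next
        end
        # Copy the (small) slice out of the mapping before splitting it into lines.
        return readlines(IOBuffer(mm[startpos:pos]))
    end
end
```

With a predicate, `findnext` returns a single index or `nothing` rather than a range, so the `last(...)` call is no longer needed and running out of lines can be checked explicitly.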