I have a file containing several multi-line blocks of text separated by blank lines. A blank line may contain just a line break or also some whitespace. Is there something similar to `readline()` which allows me to read in these blocks one by one, each as a string?
I am not aware of a direct way to do this, but you can still read the file line by line with `readline()` (or all at once with `readlines`). To separate your blocks (i.e., to detect blank lines) you could use something like this:
```julia
lines = readlines("foo_data.txt")   # trailing newlines are stripped by default
for l in lines
    if all(isspace, l)              # true for "" and for whitespace-only lines
        println("found blank line")
    else
        # do something with the read data
    end
end
```
The `else` branch (and/or the loop) needs to be adapted depending on your desired final data format.
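As one possible adaptation, here is a minimal sketch (the name `split_blocks` is hypothetical) that gathers the non-blank lines between blank lines into blocks, assuming the whole file fits comfortably in memory via `readlines`. Each block comes back as one `String` with its lines joined by `\n`.

```julia
# Sketch: group lines into blank-line-separated blocks.
# A line counts as blank if it is empty or contains only whitespace.
function split_blocks(lines::AbstractVector{<:AbstractString})
    blocks = String[]
    current = String[]
    for l in lines
        if all(isspace, l)
            # A blank line ends the current block, if there is one.
            isempty(current) || push!(blocks, join(current, '\n'))
            empty!(current)
        else
            push!(current, l)
        end
    end
    # Flush the last block when the input does not end with a blank line.
    isempty(current) || push!(blocks, join(current, '\n'))
    return blocks
end
```

For a file, call `split_blocks(readlines("foo_data.txt"))`. Several blank lines in a row produce no empty blocks, because a block is only pushed once it holds at least one line.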
For example:
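```julia
# Read lines from `io` until a blank (whitespace-only) line or EOF,
# and return them, including their newlines, as a single String.
function readparagraph(io)
    buf = IOBuffer()
    while !eof(io)
        line = readline(io; keep=true)   # keep the trailing newline
        all(isspace, line) && break      # stop at a blank line
        print(buf, line)
    end
    return String(take!(buf))
end
```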
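Note that `readparagraph` returns an empty string when it starts on a blank line (e.g. the second of two consecutive blank lines), so a caller looping until `eof` may want to skip those. A self-contained sketch that folds that loop in; the name `readparagraphs` is hypothetical, and like `readparagraph` it keeps each line's trailing newline:

```julia
# Sketch: read all blank-line-separated paragraphs from `io`.
# Runs of consecutive blank lines do not produce empty paragraphs.
function readparagraphs(io::IO)
    paragraphs = String[]
    buf = IOBuffer()
    while !eof(io)
        line = readline(io; keep=true)   # keep the trailing newline
        if all(isspace, line)
            # Blank line: finish the current paragraph, skipping it if empty.
            p = String(take!(buf))
            isempty(p) || push!(paragraphs, p)
        else
            print(buf, line)
        end
    end
    # The last paragraph may not be followed by a blank line.
    p = String(take!(buf))
    isempty(p) || push!(paragraphs, p)
    return paragraphs
end
```

With a file, `open(readparagraphs, "foo_data.txt")` opens it, passes the stream in, and closes it afterwards.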
Thanks, I’ll use it!
A question about the last line of your function: is it possible to convert `Vector{Char}` to `String` without copying the data and allocating memory, similar to how `take!` constructs `Vector{Char}` without copying? I guess that’s not possible in Julia because the vector is mutable and the string is immutable?
`String(take!(buf))` already constructs the string without making a copy, from a `Vector{UInt8}` in the UTF-8 encoding, not a `Vector{Char}`.
(`Vector{Char}` requires 4 bytes per character, similar to UTF-32, which is different from the encoding used by `String`, and is not generally a recommended way to store strings.)
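A quick sketch to see the size difference between the two representations, and that only the `Vector{Char}` route needs re-encoding (the sample string is arbitrary):

```julia
# Compare String's UTF-8 storage with a Vector{Char}.
s = "h\u00e9llo"       # 'é' (U+00E9) takes 2 bytes in UTF-8: 6 bytes for 5 chars
chars = collect(s)     # Vector{Char}: 4 bytes per Char, whatever the character
println(sizeof(s), " bytes as String, ", sizeof(chars), " bytes as Vector{Char}")
# prints: 6 bytes as String, 20 bytes as Vector{Char}

# take! on a writable IOBuffer hands back the UTF-8 bytes as a Vector{UInt8},
# which String can take over without re-encoding.
buf = IOBuffer()
print(buf, s)
v = take!(buf)
t = String(v)

# Going from Vector{Char} to String means encoding into a new UTF-8 buffer.
u = String(chars)
```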
More generally, it is possible to construct a `String(vec)` from a `vec::Vector{UInt8}` without making a copy of `vec`, but only if the `Vector{UInt8}` is specially allocated. This special allocation is done by `IOBuffer` objects and also by `read(io, numbytes)`, as documented in the `String` docstring, but can also be accomplished using the undocumented `vec = Base.StringVector(numbytes)` constructor. For other vectors `String(vec)` may copy the data, and in either case `vec` is emptied afterwards. See also "Document/export copy-free string allocation?" (JuliaLang/julia issue #19945 on GitHub) and "Conversion of Vector{UInt8} to String without copy".
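A short sketch of the two zero-copy routes the `String` docstring names, plus the truncation behavior it also documents (the input vector is left empty after the call):

```julia
# Route 1: bytes returned by read(io, nb).
io = IOBuffer("hello world")
v = read(io, 5)                    # Vector{UInt8} holding the first 5 bytes
s = String(v)                      # takes over v's memory
println(repr(s), " ", length(v))   # prints: "hello" 0   (v has been emptied)

# Route 2: bytes returned by take! on a writable IOBuffer.
buf = IOBuffer()
write(buf, "abc")
w = take!(buf)
t = String(w)                      # w is emptied as well

# Any other Vector{UInt8} also works, but String may copy its data;
# the vector is emptied either way, so use String(copy(x)) to keep it.
x = UInt8[0x68, 0x69]
u = String(x)
```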
You can also use StringViews.jl to create a `String`-like object (another subtype of `AbstractString`) from an `AbstractVector{UInt8}` (e.g. a subarray) without making a copy.
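A minimal sketch, assuming StringViews.jl is installed: because a `StringView` wraps the bytes instead of copying them, a later change to the underlying vector is visible through the view.

```julia
using StringViews  # third-party package: import Pkg; Pkg.add("StringViews")

data = Vector{UInt8}("first block\n\nsecond block\n")  # mutable bytes
sv = StringView(view(data, 1:11))  # string-like view of a subarray, no copy
println(sv)                        # prints: first block

data[1] = UInt8('F')               # mutate the underlying bytes...
println(sv)                        # ...and the view sees it: First block
```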
(In principle, you could make this even more efficient by using lower-level APIs to read bytes directly into the `IOBuffer` without allocating intermediate string objects via `readline`, but it’s probably not worth the effort. Another alternative would be to use `mmap` to access the file as an array of bytes, which you could then scan for ASCII newline and whitespace characters. You could then use the StringViews.jl package to create string-like views of the `mmap`-ed data without making a copy. There are lots of ways to wring more speed out of Julia if you care enough.)
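A rough sketch of the `mmap` approach using the `Mmap` standard library. The helper names `isasciispace`, `block_ranges`, and `mmap_blocks` are hypothetical. Only ASCII whitespace is treated as blank, which is safe for UTF-8 because multi-byte sequences never contain ASCII bytes. Blocks come back as plain byte views, which could then be wrapped in `StringView` from StringViews.jl.

```julia
using Mmap

# ASCII whitespace: space, \t, \n, \v, \f, \r (bytes 9 through 13).
isasciispace(b::UInt8) = b == UInt8(' ') || UInt8('\t') <= b <= UInt8('\r')

# Scan raw bytes and return the byte range of each blank-line-separated block.
function block_ranges(bytes::AbstractVector{UInt8})
    ranges = UnitRange{Int}[]
    start = 0            # first byte of the current block, 0 if none is open
    last_nonblank = 0    # last byte of the latest non-blank line in the block
    i, n = firstindex(bytes), lastindex(bytes)
    while i <= n
        # End of the current line: the next '\n', or the last byte.
        j = something(findnext(==(UInt8('\n')), bytes, i), n)
        if all(isasciispace, view(bytes, i:j))
            # Blank line: close the open block, if any.
            if start != 0
                push!(ranges, start:last_nonblank)
                start = 0
            end
        else
            start == 0 && (start = i)
            last_nonblank = j
        end
        i = j + 1
    end
    start != 0 && push!(ranges, start:last_nonblank)
    return ranges
end

# Map the file into memory and return a no-copy view of each block.
function mmap_blocks(path::AbstractString)
    bytes = Mmap.mmap(path)   # Vector{UInt8} backed by the file
    return [view(bytes, r) for r in block_ranges(bytes)]
end
```

Each block keeps the trailing newline of its last line (as `readparagraph` does), and leading whitespace on a non-blank line is preserved.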