In the following example, the input text file contains numeric data spread over several rows.
Assuming that the number of columns (=7) is known and that all input data can be read as Float64, is there a better way to read such data into a matrix?
# 1 - WRAPPED DATA INPUT
const input = """
0.000E+00
00000000 4.999E-03 8.771E-03 00001001
5.000E-04 5.087E-01
1.500E-02
00000001 2.112E-03 3.462E-03 00001002
7.000E-04 2.186E-01
3.000E-02
00000002 3.020E-03 2.212E-03 00001003
9.000E-04 3.383E-01
"""
file = "data_wrapped.txt"
write(file, input)
# 2 - READ FILE WITH WRAPPED DATA
using Scanf
function read_n_wrapped_floats(file, n)
    str = repeat("%f ", n) * '\n'
    fmt = Scanf.Format(str)
    m = NTuple{n + 1, Float64}[]       # scanf returns (match_count, v1, ..., vn)
    types = Vector{Float64}(undef, n)  # default values for the n floats
    open(file, "r") do io
        while !eof(io)
            push!(m, scanf(io, fmt, types...))
        end
    end
    return hcat(collect.(m)...)'       # note: first column holds the match count
end
n = 7
read_n_wrapped_floats(file, n)
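Not part of the original post, but for comparison: since the column count `n` is known and every token parses as `Float64`, a compact sketch can split the whole file into tokens and reshape, ignoring where the line breaks fall (file name and sample data here are illustrative).

```julia
# Hypothetical sketch: read all tokens at once, then reshape to n columns.
input = """
0.000E+00
00000000 4.999E-03 8.771E-03 00001001
5.000E-04 5.087E-01
"""
file = "data_wrapped_demo.txt"                       # demo file, one 7-sample batch
write(file, input)
n = 7
tokens = parse.(Float64, split(read(file, String)))  # split on any whitespace
M = permutedims(reshape(tokens, n, :))               # one batch of n samples per row
```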
Thanks.
Hmm, your example is confusing my simple algo to detect time columns
julia> D = gmtread("data_wrapped.txt")
Attributes: Dict("Timecol" => "3,4,6,7")
BoundingBox: [0.0, 0.03, 0.0, 2.0, 0.002112, 0.004999, 0.002212, 0.008771, 1001.0, 1003.0, 0.0005, 0.0009, 0.2186, 0.5087] 3×7 GMTdataset{Float64, 2}
 Row │   col.1    col.2                     Time                    Time2    col.5    Time3    Time4
     │ Float64  Float64                  Float64                  Float64  Float64  Float64  Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────
   1 │   0.0       0.0   1970-01-01T00:00:00.005  1970-01-01T00:00:00.009   1001.0   0.0005   0.5087
   2 │   0.015     1.0   1970-01-01T00:00:00.002  1970-01-01T00:00:00.003   1002.0   0.0007   0.2186
   3 │   0.03      2.0   1970-01-01T00:00:00.003  1970-01-01T00:00:00.002   1003.0   0.0009   0.3383
but it's not fatal
julia> D.data
3×7 Matrix{Float64}:
0.0 0.0 0.004999 0.008771 1001.0 0.0005 0.5087
0.015 1.0 0.002112 0.003462 1002.0 0.0007 0.2186
0.03 2.0 0.00302 0.002212 1003.0 0.0009 0.3383
Good enough?
Thanks Joaquim.
In Julia 1.8.0, using GMT v0.43.1, it throws an error:
gmtread [WARNING]: Mismatch between actual (3) and expected (4) fields near line 2 in file
gmtread [ERROR]: Mismatch between actual (3) and expected (4) fields near line 3 in file data_wrapped.txt
ERROR: Failed to read file "data_wrapped.txt"
Stacktrace:
[1] error(s::String)
@ Base .\error.jl:35
[2] gmtread(fname::String; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ GMT C:\Users\..\.julia\packages\GMT\mzT4h\src\gmtreadwrite.jl:149
[3] gmtread(fname::String)
@ GMT C:\Users\..\.julia\packages\GMT\mzT4h\src\gmtreadwrite.jl:68
[4] top-level scope
@ REPL[19]:1
Ultimately, what Julia functions is GMT calling to read such wrapped files?
This new function performs better than the previous one (which used Scanf):
function read_n_wrapped_floats2(file, n)
    m = Vector{Float64}[]
    open(file, "r") do io
        while !eof(io)
            k = 0
            m1 = Float64[]
            while k < n
                x = parse.(Float64, split(readline(io)))
                push!(m1, x...)
                k += length(x)
            end
            push!(m, m1)
        end
    end
    return reduce(hcat, m)'
end
That error means the file's first row has 3 columns and the second has 4, which is not accepted by gmtread, where all rows must have the same number of columns. gmtread reads the file on the C side and doesn't use a Julia function for that.
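Not from the thread, but one workaround sketch under that constraint: rewrite the wrapped file as a rectangular n-column file first, then point gmtread at the rewritten file. The function and file names here are made up for illustration.

```julia
using DelimitedFiles

# Hypothetical helper: flatten a wrapped file into a rectangular n-column
# file that gmtread (or readdlm) can parse; one record per output line.
function unwrap_to_rectangular(src, dst, n)
    tokens = parse.(Float64, split(read(src, String)))
    length(tokens) % n == 0 || error("token count not divisible by ", n)
    writedlm(dst, permutedims(reshape(tokens, n, :)))
    return dst
end

# unwrap_to_rectangular("data_wrapped.txt", "data_rect.txt", 7)
# D = gmtread("data_rect.txt")   # should now see 3 rows of 7 columns
```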
Okay, thank you. So that's not what's asked. In fact, there are 7 data points from 7 different curves spread over several lines (with carriage returns), and the pattern repeats for the next 7 samples from each of the 7 curves, etc.
NB: I have edited the original post so that each batch of 7 samples spreads over 3 rows in the input text file, but it could be another number.
function read_n_per_row(file, n)
    dat = Float64[]
    for line in eachline(file), word in split(line)
        push!(dat, parse(Float64, word))
    end
    l = length(dat)
    iszero(l % n) || error("Number of file entries ", l, " not divisible by ", n)
    m = l ÷ n
    return transpose(reshape(dat, (n, m)))
end
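To see why the final transpose is needed: Julia's `reshape` fills column-major, so `reshape(dat, (n, m))` places each consecutive batch of n values in a column, and the transpose turns those batches into rows. A tiny check (my own example, not from the thread):

```julia
v = collect(1.0:6.0)    # two consecutive batches of n = 3 values
A = reshape(v, (3, 2))  # column-major fill: each column is one batch
B = transpose(A)        # now each row is one batch, as in the file
# B == [1.0 2.0 3.0; 4.0 5.0 6.0]
```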
using BenchmarkTools
@btime read_n_wrapped_floats2($file, $n); # 8.074 μs (87 allocations: 5.48 KiB)
@btime read_n_per_row($file, $n) # 6.747 μs (43 allocations: 4.07 KiB)
read_n_wrapped_floats2(file, n) == read_n_per_row(file, n) # true
Thank you Peter for such a nice, clear and efficient solution. A pleasure to read.