After browsing through the discussions about reading fixed-width files in Julia (e.g. here and here) I still hadn’t found a solution that was general enough for my case. I wrote a quick one I’d like to share here.
Suppose you have some fixed-width data like this:
A      B                            C
12345  SOME VERY LONG STRING        T
23456  ANOTHER VERY LONG STRING     T
First, initialize a DataFrame to receive the data.
using DataFrames
df = DataFrame(A = String[], B = String[], C = String[])
Then define the ranges of each of the columns.
ranges = ((1,7), (8,36), (37,38))
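To make the meaning of those pairs concrete, here is a minimal sketch of how one line is cut into fields. The sample line is hypothetical: it is padded with `rpad` so that each field fills its column width (7, 29 and 2 characters), and it assumes ASCII data so that string indices match character positions.

```julia
# Hypothetical sample line, padded so each field fills its column width.
line = rpad("12345", 7) * rpad("SOME VERY LONG STRING", 29) * rpad("T", 2)

# Each pair is the (first, last) index of one column, both inclusive.
ranges = ((1, 7), (8, 36), (37, 38))

# Slice each column out of the line and trim the padding.
fields = [strip(line[lo:hi]) for (lo, hi) in ranges]
# fields == ["12345", "SOME VERY LONG STRING", "T"]
```

Note that the ranges are 1-based and inclusive, so consecutive columns start one index past the previous column's end.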
You can then pass them to a function like the one below. It reads the individual lines, extracts the data for each column into a vector, then appends that vector to the DataFrame as a new row.
import Base.Iterators: peel

function readfwf!(source, df, ranges)
    lines = readlines(source)
    (names, lines) = peel(lines) # skip first line (the header)
    for row in lines
        data = String[]
        for r in ranges
            # r is the (first, last) index pair of one column
            push!(data, strip(SubString(row, r[1]:r[2])))
        end
        push!(df, data) # append the fields as a new row
    end
    return df
end
This code doesn't parse the input strings into other data types, but that could be added easily enough. Using this code, I was able to construct a DataFrame with 4 columns and over 5 million rows in ~1.7 seconds.
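As a rough idea of how parsing could be added, here is a sketch of a typed variant. The names `convertfield` and `readfwf_typed!` and the extra `types` argument are my own assumptions for illustration, not part of any package. It assumes ASCII input and that every line is at least as long as the last column's end index.

```julia
using DataFrames

# Hypothetical helper: String columns are only copied, others are parsed.
convertfield(::Type{String}, s) = String(s)
convertfield(::Type{T}, s) where {T} = parse(T, s)

# Hypothetical typed variant: `types` holds one target type per column.
function readfwf_typed!(source, df, ranges, types)
    for (i, row) in enumerate(eachline(source))
        i == 1 && continue  # skip the header line
        fields = [convertfield(T, strip(SubString(row, r[1]:r[2])))
                  for (r, T) in zip(ranges, types)]
        push!(df, fields)  # append the converted fields as a new row
    end
    return df
end

# Usage with an in-memory buffer standing in for a file.
io = IOBuffer(join([
    rpad("A", 7) * rpad("B", 29) * rpad("C", 2),
    rpad("12345", 7) * rpad("SOME VERY LONG STRING", 29) * rpad("T", 2),
], "\n"))
typed_df = DataFrame(A = Int[], B = String[], C = String[])
readfwf_typed!(io, typed_df, ((1, 7), (8, 36), (37, 38)), (Int, String, String))
```

This variant iterates with `eachline` instead of `readlines`, so the whole file isn't held in memory at once. I haven't benchmarked it against the version above.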
Until this functionality gets added to DelimitedFiles or CSV.jl, I hope some members of the community find this useful.