After browsing through the discussions about reading fixed-width files in Julia (e.g. here and here) I still hadn’t found a solution that was general enough for my case. I wrote a quick one I’d like to share here.
Suppose you have some fixed-width data like this:
A      B                            C
12345  SOME VERY LONG STRING        T
23456  ANOTHER VERY LONG STRING     T
First, initialize a DataFrame to receive the data.
using DataFrames
df = DataFrame(A = String[], B = String[], C = String[])
Then define the ranges of each of the columns.
ranges = ((1,7), (8,36), (37,38))
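To make the meaning of those pairs concrete, here is a small sketch of how one (start, stop) pair slices a single line. The sample line is a hypothetical one I build with `rpad` so that its widths match the ranges above, and the names `lo` and `hi` are just for illustration. It assumes ASCII input, since Julia string indices are byte indices.

```julia
# Hypothetical sample line, padded so each field fills its column width
# (7 + 29 + 2 = 38 characters in total).
line = rpad("12345", 7) * rpad("SOME VERY LONG STRING", 29) * rpad("T", 2)

ranges = ((1, 7), (8, 36), (37, 38))

# Slice each column out of the line and strip the padding.
fields = [String(strip(line[lo:hi])) for (lo, hi) in ranges]
# fields == ["12345", "SOME VERY LONG STRING", "T"]
```

Each pair is inclusive on both ends, so adjacent columns should not share an index.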
You can then pass them to a function like the one below, which reads each line, extracts the data from each column into a vector, and appends that vector to the DataFrame.
import Base.Iterators: peel

function readfwf!(source, df, ranges)
    lines = readlines(source)
    (names, lines) = peel(lines)  # skip first line (the header)
    for row in lines
        data = String[]  # fields of the current row
        for r in ranges
            # r is a (start, stop) pair; take that slice and strip the padding
            push!(data, strip(SubString(row, r[1]:r[2])))
        end
        push!(df, data)
    end
end
There’s obviously no parsing of the input strings to convert them to other data types, but that could be added easily enough. Using this code I was able to construct a DataFrame with 4 columns and over 5 million rows in ~1.7 seconds.
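As for the type conversion mentioned above, one way it might be added is to pass a type per column and dispatch on it. This is only a rough sketch, not part of the function above: `parsefield` and `parseline` are hypothetical helper names, the sample line is made up to match the ranges, and it uses only Base so it can be tried without DataFrames.

```julia
# Convert one raw field to the requested type; String fields are only stripped.
parsefield(::Type{String}, raw) = String(strip(raw))
parsefield(::Type{T}, raw) where {T} = parse(T, strip(raw))

# Split a fixed-width line into a tuple of typed values,
# pairing each (start, stop) range with its target type.
function parseline(line, ranges, types)
    return Tuple(parsefield(T, SubString(line, lo:hi))
                 for ((lo, hi), T) in zip(ranges, types))
end

# Hypothetical sample line with the same column widths as before.
line = rpad("12345", 7) * rpad("SOME VERY LONG STRING", 29) * rpad("T", 2)
row = parseline(line, ((1, 7), (8, 36), (37, 38)), (Int, String, String))
# row == (12345, "SOME VERY LONG STRING", "T")
```

A row in this form should be pushable onto a DataFrame whose columns have matching element types (e.g. `A = Int[]`), though I haven't timed this variant.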
Until this functionality gets added to DelimitedFiles or CSV.jl, I hope some members of the community find this useful.