Reading fixed-width files?

Tamas_Papp · March 22, 2020, 12:18pm

I may misunderstand the issue you linked, but I don’t see the example where this happens in practice. It just raises the possibility.

Some formats, e.g. recent versions of Stata’s dta format, store UTF-8 in fixed-width fields. This works by treating the UTF-8 text as a plain byte string, which is padded to the field width and read back as is. E.g. "ηβπ" would take 6 bytes.
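To make the byte count concrete, here is a minimal sketch in Python (chosen only for illustration; nothing here is Julia- or Stata-specific). `pad_field` and `read_field` are hypothetical helpers, and padding with NUL bytes is an assumption for the example, not a claim about what the dta format specifies.

```python
def pad_field(text: str, width: int, pad: bytes = b"\x00") -> bytes:
    """Encode text as UTF-8 and right-pad it to exactly `width` bytes."""
    raw = text.encode("utf-8")
    if len(raw) > width:
        raise ValueError(f"{text!r} needs {len(raw)} bytes, field has {width}")
    return raw.ljust(width, pad)


def read_field(buf: bytes, pad: bytes = b"\x00") -> str:
    """Strip the trailing padding and decode the remaining bytes as UTF-8."""
    return buf.rstrip(pad).decode("utf-8")


greek = "ηβπ"
print(len(greek))                  # number of characters (code points)
print(len(greek.encode("utf-8")))  # number of bytes
print(read_field(pad_field(greek, 8)))
```

The field width is checked against the encoded length, so three Greek letters do not fit in a 5-byte field even though they are only three characters.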

Fixing the width in bytes is pretty much the only approach that makes sense. “Fixed width” counted in characters under a variable-length encoding like UTF-8 (which is pretty much all that should be practically relevant, even though it is easy to support just about anything else in Julia) makes no sense, as it throws out all the actual advantages of fixed width, such as being able to compute where a field starts without scanning the data before it.
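One advantage that byte-based widths preserve is direct seeking to a record. A rough Python sketch, assuming a hypothetical layout of one 8-byte, space-padded field per record (the width and the `nth_record` helper are made up for the example):

```python
import io

RECORD_WIDTH = 8  # bytes per record; an assumed layout for this sketch


def nth_record(stream, n: int, width: int = RECORD_WIDTH) -> str:
    """Jump straight to record n; no earlier record needs to be read."""
    stream.seek(n * width)
    return stream.read(width).decode("utf-8").rstrip(" ")


names = ["ηβπ", "abc", "né"]
# Every record occupies exactly RECORD_WIDTH bytes, whatever its character count.
data = b"".join(name.encode("utf-8").ljust(RECORD_WIDTH, b" ") for name in names)
stream = io.BytesIO(data)

print(nth_record(stream, 2))
```

If the widths were counted in characters instead, the byte offset of record `n` would depend on the contents of all preceding records, so the reader would have to decode from the start.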

Again, I am sure there is someone using that to store data. But it is not something a sane library would even consider supporting because it requires an entirely different approach.

aplavin · March 22, 2020, 12:22pm

While all the widths should (probably?) indeed be fixed in bytes, from a user POV it makes sense to specify these widths in terms of characters. I.e. if I open the table file in a text editor, I can only see and count characters - not bytes. And these character counts are what should be specified as column widths and positions.
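A small Python sketch of why the two ways of counting differ for the same line. Both splitter functions are hypothetical, and "character" here means a Unicode code point, which may still differ from what a text editor displays as one character (e.g. with combining marks).

```python
def split_by_chars(line: str, widths):
    """Cut the line into fields whose widths are counted in characters."""
    fields, pos = [], 0
    for w in widths:
        fields.append(line[pos:pos + w])
        pos += w
    return fields


def split_by_bytes(raw: bytes, widths):
    """Cut the line into fields whose widths are counted in bytes."""
    fields, pos = [], 0
    for w in widths:
        # errors="replace" keeps the sketch running when a cut lands
        # in the middle of a multi-byte character.
        fields.append(raw[pos:pos + w].decode("utf-8", errors="replace"))
        pos += w
    return fields


line = "ηβπ  42"    # as counted in an editor: a 5-character column, then a 2-character column
widths = [5, 2]

print(split_by_chars(line, widths))
print(split_by_bytes(line.encode("utf-8"), widths))
```

With the widths read off in an editor, the character-based split recovers both columns, while the byte-based split cuts through "π" and never reaches the number.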

bkamins · March 22, 2020, 1:35pm

The issue I linked is just a summary of the discussion where, if I recall correctly, such files were occurring in practice (if I am not mistaken, they were generated using COBOL on mainframes).

Anyway, FWF.jl uses byte widths by default, but this can be switched using a kwarg. I guess the simplest thing to do for someone interested in having a common FWF reader/writer is to make a PR to RandomString123/FWF.jl on GitHub ("Fixed width file parsing in Julia") to make it work on modern Julia.

In the long run, having it in CSV.jl would probably be the best option, if @quinnj would consider this, as there is a lot of parsing functionality already in CSV.jl that is vastly superior to FWF.jl.
