Hi @Tamas_Papp, could you elaborate on what you mean by this? I’m in a similar boat to the OP with my use case, except that I’m fairly new to Julia and to programming in general, so I don’t quite understand what you’re saying (or rather, I can’t picture what the implementation would look like). The same goes for your comment in the referenced discussion:
You can just read line by line, extract the columns using indices, then parse. The tricky part is getting the column indices; if you are lucky there is some metadata that describes them (e.g. the US Census Bureau usually supplies such metadata).
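A rough sketch of the line-by-line approach described above might look like the following. It is written in Python purely for illustration. In Julia the same pattern would use `eachline` and string slicing, with 1-based inclusive ranges instead of Python’s 0-based half-open ones. The record layout, field names, and sample data are hypothetical stand-ins for whatever the file’s metadata specifies.

```python
import io

# Hypothetical record layout: (field name, start, stop, parser).
# start/stop are 0-based, half-open character positions, as in Python slicing.
# In practice these would come from the file's metadata or record layout spec.
LAYOUT = [
    ("policy_id", 0, 6, str),
    ("age", 6, 9, int),
    ("premium", 9, 17, float),
]


def read_fixed_width(stream, layout):
    """Read a fixed-width text stream line by line into a list of dicts."""
    rows = []
    for line in stream:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        row = {}
        for name, start, stop, parse in layout:
            # Cut the field out by position, strip the padding, then parse it.
            row[name] = parse(line[start:stop].strip())
        rows.append(row)
    return rows


# Made-up sample: "B00002103" has no separator between id and age,
# which is why simply splitting on whitespace would not work here.
sample = io.StringIO(
    "A00001 45 1250.50\n"
    "B00002103  980.00\n"
)
rows = read_fixed_width(sample, LAYOUT)
print(rows)
# [{'policy_id': 'A00001', 'age': 45, 'premium': 1250.5},
#  {'policy_id': 'B00002', 'age': 103, 'premium': 980.0}]
```

The “tricky part” mentioned above is entirely in `LAYOUT`: once the start and stop positions are known, the parsing loop itself stays short.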
I recently made a post about this on Reddit; here is the relevant part:
With `pd.read_fwf` you can set the `colspecs` kwarg to a list of tuples, where each tuple gives the start and end position of a column. That’s a really nifty feature. It doesn’t seem like any current Julia package has that functionality, but I’d love to be wrong.
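For comparison, here is a minimal sketch of the pandas call in question, using a made-up two-record file and hypothetical column names. `colspecs` takes half-open `(start, stop)` character positions, and `read_fwf` strips the padding around each field and infers the column types:

```python
import io

import pandas as pd

# Made-up fixed-width data: 6-char id, 3-char age, 8-char premium.
data = io.StringIO(
    "A00001 45 1250.50\n"
    "B00002103  980.00\n"
)

# Each tuple is a half-open [start, stop) range of character positions.
colspecs = [(0, 6), (6, 9), (9, 17)]

df = pd.read_fwf(
    data,
    colspecs=colspecs,
    header=None,  # the sample has no header row
    names=["policy_id", "age", "premium"],  # hypothetical column names
)
print(df)
```

When the columns are contiguous, as they are here, `widths=[6, 3, 8]` is an equivalent shorthand for `colspecs`.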
In other words, I think a `colspecs`-like kwarg would be a really nice addition to either `CSV.jl` or `TableReader.jl`. Although what you’re recommending might be really simple for more advanced developers like yourself, it can be a bit intimidating for people like me. Moreover, at my job, where we’re having a little tech revolution and trying to move away from Excel, most people know of Python and far fewer know of Julia. If someone were to compare how easy it is to read fixed-width files with pandas against what comes up when you Google the problem for Julia, they might well opt for Python for this kind of use case.
For some more context, I work in insurance, where (at my company at least) we rely on a decent amount of legacy technology, so fixed-width formats are not uncommon. I think Julia is ideally suited for insurance/actuarial work, and it’s only a matter of time before it becomes an industry standard. However, it currently suffers from a bit of a “recognizability” problem: everyone and their mother has heard of R and Python (my mother has heard of Julia, but that’s because of me). Little niceties like `colspecs` (which handles fixed-width columns much the way Excel or Access does) can go a long way toward driving adoption.
Now, having said that I’m a novice, what I’m about to ask might seem a bit silly: how difficult would it be for someone like me to submit a PR to `TableReader.jl` or `CSV.jl`, or to revive `FWF.jl` for v1.0 compatibility? A contribution of this sort would surely help me get better at Julia, but it also seems quite beyond my current abilities. @braamvandyk also offered to package this up, so if that offer is still on the table, I’d be glad to help in whatever way I can.