Reading '¤' separated value file with readdlm

plapplop · January 31, 2019, 9:48am

I was given a CSV file to study. It contains strings (words enclosed in double quotes) separated by the currency sign '¤' (don’t ask me why…). When I use readdlm to load its content, the opening quote of each field is treated as part of the separator, whereas the closing quote is kept as part of the actual content. Here is an MWE:

f = open("foo.csv", "w")
write(f, "\"a\"¤\"b\"¤\"c\"\n")
write(f, "\"here is a content\"¤\"here is another one\"¤\"this is enough\"\n")
close(f)

using DelimitedFiles

x, h = readdlm("foo.csv", '¤'; header = true)
@show x[1,1]

I can’t find a way to parse it correctly with readdlm, any idea?
Many thanks!

mkborregaard · January 31, 2019, 9:54am

FWIW,

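using CSV, DataFrames
# recent CSV.jl versions require the sink type (DataFrame) as the second argument
x = CSV.read("foo.csv", DataFrame; delim = '¤')
@show x[1, 1]
  #  x[1, 1] = "here is a content"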

parses the file correctly

plapplop · January 31, 2019, 10:15am

Thanks a lot! I have recently been using readdlm a lot because for some unknown reason CSV.read was extremely slow on previous files with which I was working. But it’s actually fast on these new files, so this solves my problem.

quinnj · January 31, 2019, 4:34pm

Yeah, the issue here is that I don’t think readdlm supports non-ASCII delimiters (lots of CSV readers don’t). It’s actually newish functionality in CSV.jl (as of last fall). If you ever have performance issues with CSV.jl, please share! Post here on Discourse or open an issue at the JuliaData/CSV.jl repo and I’m happy to help figure out what’s going on.
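For a Base-Julia workaround, one option is to split each line on the delimiter as a `Char` rather than as a byte, which copes with a multi-byte UTF-8 delimiter like '¤'. This is only a minimal sketch: `read_char_delimited` is a hypothetical helper (not part of DelimitedFiles or CSV.jl), and it assumes no field contains the delimiter, a double quote, or a newline, so real CSV quoting rules are not implemented.

```julia
# Hypothetical helper (not part of DelimitedFiles or CSV.jl): split each line on a
# Char delimiter, which may be non-ASCII, and strip one pair of surrounding quotes.
# Assumes no field contains the delimiter, a double quote, or a newline.
function read_char_delimited(path::AbstractString, dlm::AbstractChar)
    # Remove one leading and one trailing '"' if both are present
    unquote(s) = (length(s) >= 2 && first(s) == '"' && last(s) == '"') ?
                 chop(s; head = 1, tail = 1) : s
    # split works on Chars, so a two-byte UTF-8 delimiter is matched as one unit
    rows = [String.(unquote.(split(line, dlm))) for line in eachline(path) if !isempty(line)]
    header = rows[1]
    # Stack the remaining rows into a matrix, one matrix row per file line
    data = permutedims(reduce(hcat, rows[2:end]))
    return data, header
end
```

Called on the `foo.csv` from the question as `data, header = read_char_delimited("foo.csv", '¤')`, this gives the header `["a", "b", "c"]` and `data[1, 1] == "here is a content"`.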

plapplop · January 31, 2019, 4:43pm

Thanks @quinnj! I’ll try to find these old files and benchmark them with CSV.read and readdlm, and if I manage to reproduce the problems I had, I’ll post the results either here (tagging you) or on the CSV.jl repo.

Tamas_Papp · January 31, 2019, 4:51pm

If it is an old file, I would suspect the separator is simply the Latin-1 byte 0xa4 rather than the UTF-8 sequence 0xc2 0xa4. Does CSV support non-UTF-8 encodings?
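Which case applies can be checked by looking at the file's raw bytes. A minimal sketch using only Base Julia; `separator_encoding` is a hypothetical helper, and it assumes that any file which is not valid UTF-8 is Latin-1, which is a guess rather than a reliable encoding detection.

```julia
# '¤' is U+00A4: a single byte (0xa4) in Latin-1, two bytes (0xc2 0xa4) in UTF-8
@assert codepoint('¤') == 0xa4
@assert codeunits("¤") == [0xc2, 0xa4]

# Hypothetical helper: guess how the '¤' separator is stored in a file.
# Assumes that a file which is not valid UTF-8 is Latin-1 encoded.
function separator_encoding(path::AbstractString)
    bytes = read(path)                 # raw bytes, no decoding
    if isvalid(String, bytes)
        return '¤' in String(bytes) ? :utf8 : :absent
    else
        return 0xa4 in bytes ? :latin1 : :unknown
    end
end
```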

Frankly, I would just fix the file with tr or a similar tool to have commas, instead of extending support for these cases.
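The same fix can also be sketched in Julia itself, after which readdlm only needs an ASCII comma delimiter. This is a minimal sketch: `currency_to_comma` and the output name `foo_comma.csv` are hypothetical, it assumes '¤' only ever appears as the separator (never inside a field), and it treats a file that is not valid UTF-8 as Latin-1. On recent Julia versions DelimitedFiles may need to be installed with Pkg.

```julia
using DelimitedFiles

# Hypothetical helper: rewrite a '¤'-separated file as a comma-separated one.
# Assumes '¤' only appears as the field separator, never inside a field.
function currency_to_comma(src::AbstractString, dst::AbstractString)
    bytes = read(src)
    if isvalid(String, bytes)
        # UTF-8 file: replace the character '¤' (bytes 0xc2 0xa4) with a comma
        write(dst, replace(String(bytes), '¤' => ","))
    else
        # Assumed Latin-1: '¤' is the single byte 0xa4; other non-ASCII bytes
        # are left as Latin-1 and would need separate transcoding
        write(dst, map(b -> b == 0xa4 ? UInt8(',') : b, bytes))
    end
    return dst
end

# Recreate the file from the question, convert it, and read it back
open("foo.csv", "w") do f
    write(f, "\"a\"¤\"b\"¤\"c\"\n")
    write(f, "\"here is a content\"¤\"here is another one\"¤\"this is enough\"\n")
end
currency_to_comma("foo.csv", "foo_comma.csv")
# readdlm's `quotes` keyword defaults to true, so the surrounding quotes are stripped
x, h = readdlm("foo_comma.csv", ','; header = true)
@show x[1, 1]
```

Replacing the separator with a comma is safe for fields that contain spaces, since readdlm only splits on the given delimiter; a field that itself contains a comma stays intact as long as it is quoted.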
