I was given a CSV file to study. It contains strings (words enclosed in double quotes) separated by the currency sign '¤' (don’t ask me why…). When I load it with readdlm, the first quote of each field is treated as part of the separator, whereas the second one is treated as part of the actual content. Here is an MWE:
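using DelimitedFiles

# Write a small sample file: a header row and one data row, delimited by '¤'.
open("foo.csv", "w") do f
    write(f, "\"a\"¤\"b\"¤\"c\"\n")
    write(f, "\"here is a content\"¤\"here is another one\"¤\"this is enough\"\n")
end

x, h = readdlm("foo.csv", '¤'; header = true)
@show x[1,1]  # the surrounding quotes are not handled as expected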
I can’t find a way to parse it correctly with readdlm. Any ideas?
Many thanks!
Thanks a lot! I have recently been using readdlm a lot because for some unknown reason CSV.read was extremely slow on previous files with which I was working. But it’s actually fast on these new files, so this solves my problem.
Yeah, I think the issue here is that readdlm doesn’t support non-ASCII delimiters (lots of CSV readers don’t). It’s actually newish functionality in CSV.jl (as of last fall). If you ever have performance issues w/ CSV.jl, please share! Post here on Discourse or open an issue at the JuliaData/CSV.jl repo and I’m happy to help figure out what’s going on.
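For anyone landing here later, this is a minimal sketch of reading the sample file from the question with CSV.jl. It assumes the CSV.jl package is installed and recent enough to accept a non-ASCII delimiter. The delimiter is passed as a string, and the temporary file path is just for illustration.

```julia
using CSV  # assumption: CSV.jl is installed in the active environment

# Recreate the sample file from the question in a temporary directory (illustrative path).
path = joinpath(mktempdir(), "foo.csv")
open(path, "w") do io
    write(io, "\"a\"¤\"b\"¤\"c\"\n")
    write(io, "\"here is a content\"¤\"here is another one\"¤\"this is enough\"\n")
end

# Pass the non-ASCII delimiter as a string; the default quote character is '"',
# so the surrounding quotes should be stripped from each field.
file = CSV.File(path; delim = "¤")

# Columns are accessible by name, taken from the header row.
@show file.a[1]
@show file.c[1]
```

If you would rather get a table object, the same keyword arguments should work with CSV.read and a sink such as a DataFrame, assuming DataFrames.jl is also installed.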
Thanks @quinnj! I’ll try to find these old files and benchmark them with CSV.read and readdlm. If I manage to reproduce the problems I had, I’ll post the results either here (tagging you) or on the CSV.jl repo.