Error reading in English pound symbol from csv file


#1

Hi all,

I’m attempting to read in a text file (.csv) using readdlm(file_path_here, ',', String), and I’m getting the following error:

at row 275, column 3 : UnicodeError: invalid character index 27265 (0xa3 is a continuation byte))
in readdlm at base\datafmt.jl:54
in #readdlm#4 at base\datafmt.jl:54 <inlined>
in readdlm at base\datafmt.jl:114
in #readdlm#8 at base\datafmt.jl:114 <inlined>
in #readdlm_auto#11 at base\datafmt.jl:134
in readdlm_string at base\datafmt.jl:343
in dlm_parse at base\datafmt.jl:610

Row 275, column 3 of the text file contains a pound symbol (the UK currency symbol) when the file is opened in Notepad++. If I delete the symbol, the readdlm call works with no error, so it is definitely that symbol causing the issue.

Any ideas what I can do so that pound symbols don’t make my code fall over? For reference, I’m on Windows 10, Julia v0.6.2, with US keyboard, and US keyboard layout selected in Windows (I wouldn’t have thought that would make a difference, but included it just in case).

Cheers,

Colin


#2

Are you sure the CSV file is UTF-8? In Notepad++, the status bar at the bottom of the window tells you exactly what the encoding is. I’ve had an issue where I was using normal UTF-8 characters but the file’s encoding wasn’t what I expected. Once I converted it to UTF-8 (also easy to do in Notepad++), I had no problems.

In the transition from Julia 0.5 to 0.6 (I think), there were changes to how text files are loaded, and files that previously loaded for me (the ones with the bizarre encoding) no longer did. In my case, aside from the bizarre encoding, it was a ³ causing the problems.
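
The encoding can also be checked from Julia itself. The following is only a rough sketch: `is_utf8_file` is a made-up helper name, and the sample bytes assume the file was saved in a single-byte ANSI code page such as Windows-1252, where the pound sign is the lone byte 0xa3 (the byte named in the error message above). In UTF-8 the same character takes two bytes, 0xc2 0xa3.

```julia
# "£5" as an ANSI (Windows-1252 / Latin-1) file would store it:
# the pound sign is the single byte 0xa3.
ansi_bytes = UInt8[0xa3, 0x35]

# The same text as UTF-8: the pound sign becomes the two bytes 0xc2 0xa3.
utf8_bytes = Vector{UInt8}("£5")

println(isvalid(String, ansi_bytes))  # false: 0xa3 alone is a stray continuation byte
println(isvalid(String, utf8_bytes))  # true

# Hypothetical helper: does the whole file decode as valid UTF-8?
is_utf8_file(path) = isvalid(String, read(path))
```

If `is_utf8_file` returns false for the CSV, the file needs converting (or decoding) before readdlm can treat it as text.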


#3

Ah, this is it! Thanks. The .csv file in question was being saved automatically by a VBA script using the xlCSV format, which means ANSI encoding rather than UTF-8. Apparently Microsoft now have support for UTF-8 when saving via VBA (although it only came through in 2017!?!), so I guess the best solution would be to get the VBA code changed.

Cheers and thanks,

Colin


#4

You can use the StringEncodings package to decode files in non-UTF-8 encodings.
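
A minimal sketch of how that might look, with several assumptions: the file is in Windows-1252 (a guess based on the "ANSI" description earlier in the thread), `readdlm_encoded` is a made-up helper name, and the sample file is written to a temporary path just so the snippet runs on its own. It is written for Julia 0.7 or later, where readdlm lives in the DelimitedFiles standard library; on 0.6 readdlm is in Base and that `using` line is not needed.

```julia
using StringEncodings   # third-party package, installed separately
using DelimitedFiles    # Julia 0.7+; on 0.6 readdlm is part of Base

# Hypothetical helper: read a delimited file saved in a non-UTF-8 encoding.
function readdlm_encoded(path, delim, encoding)
    text = decode(read(path), encoding)            # raw bytes -> UTF-8 String
    return readdlm(IOBuffer(text), delim, String)  # parse from memory
end

# Write a small Windows-1252 sample file so the example is self-contained.
path = tempname()
write(path, encode("item,price\ntea,£5\n", "WINDOWS-1252"))

data = readdlm_encoded(path, ',', "WINDOWS-1252")
println(data[2, 2])
```

Decoding to a String first and then handing readdlm an IOBuffer keeps the original file untouched, which may be handy if the VBA export cannot be changed.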


#5

That is helpful to know. Thank you. -Colin