Error reading in English pound symbol from csv file

colintbowers · January 29, 2018, 4:25am

Hi all,

I’m attempting to read in a text file (.csv) using readdlm(file_path_here, ',', String), and I’m getting the following error:

at row 275, column 3 : UnicodeError: invalid character index 27265 (0xa3 is a continuation byte))
in readdlm at base\datafmt.jl:54
in #readdlm#4 at base\datafmt.jl:54 <inlined>
in readdlm at base\datafmt.jl:114
in #readdlm#8 at base\datafmt.jl:114 <inlined>
in #readdlm_auto#11 at base\datafmt.jl:134
in readdlm_string at base\datafmt.jl:343
in dlm_parse at base\datafmt.jl:610

Row 275, column 3 of the associated text file contains a pound symbol (the UK currency symbol) when the file is opened in Notepad++. If I delete the symbol, the readdlm call works with no error, so it is definitely that symbol causing the issue.

Any ideas what I can do so that pound symbols don’t make my code fall over? For reference, I’m on Windows 10, Julia v0.6.2, with US keyboard, and US keyboard layout selected in Windows (I wouldn’t have thought that would make a difference, but included it just in case).

Cheers,

Colin

Daneel · January 29, 2018, 7:29am

Are you sure the CSV file is UTF-8? In Notepad++, look at the bottom of the window and it should tell you exactly what the encoding is. I’ve had an issue where I was using normal UTF-8 characters but the encoding wasn’t what I expected. When I converted it to UTF-8 (also easy to do in Notepad++) I had no problems.

In the transition from Julia 0.5 to 0.6 (I think) there were changes made to how text files were loaded in and files that previously loaded for me (the ones with the bizarre encoding) didn’t anymore. In my case, aside from the bizarre encoding, it was a ³ causing the problems.
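To illustrate why a lone 0xa3 byte trips the parser, here is a minimal sketch using only Base. It assumes Julia 1.x syntax (the thread itself is on 0.6.2) and that "ANSI" means a Windows-1252/Latin-1 style single-byte encoding, where the pound sign is stored as the single byte 0xa3. In UTF-8 that byte is only valid as the second byte of a two-byte sequence.

```julia
# "£" encoded as UTF-8 takes two bytes: a lead byte followed by a continuation byte.
utf8_bytes = Vector{UInt8}("£")        # UInt8[0xc2, 0xa3]

# Assumption: an ANSI (Windows-1252 / Latin-1) file stores "£" as one byte.
ansi_bytes = UInt8[0xa3]

# The UTF-8 bytes form a valid string; the lone 0xa3 does not.
println(isvalid(String, utf8_bytes))   # true
println(isvalid(String, ansi_bytes))   # false

# For bytes in the Latin-1 range, each byte value equals its Unicode code point,
# so this particular byte can be decoded by hand. This is not a general
# Windows-1252 decoder (bytes 0x80-0x9f differ from Latin-1).
latin1_text = join(Char.(ansi_bytes))
println(latin1_text)                   # £
```

The byte value in the error message (0xa3) matching the Latin-1 code for the pound sign is a strong hint that the file is not UTF-8.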

colintbowers · January 30, 2018, 12:07am

Ah this is it! Thanks. The .csv file in question was being automatically saved by a VBA script which was using xlCSV format, which means ANSI encoding. Apparently Microsoft now have support for UTF-8 when saving via VBA (although it only came through in 2017!?!) so I guess the best solution would be to get the VBA code changed.

Cheers and thanks,

Colin

nalimilan · January 30, 2018, 12:18pm

You can use the StringEncodings package to decode files in non-UTF8 encodings.
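A rough sketch of what that could look like, assuming Julia 1.x (where readdlm lives in the DelimitedFiles standard library rather than Base), that the StringEncodings package is installed, and that the "ANSI" file is actually Windows-1252. The file contents and the temporary path below are made up for illustration.

```julia
using StringEncodings   # third-party package, install with: ] add StringEncodings
using DelimitedFiles    # provides readdlm on Julia 1.x

# Hypothetical sample file: "item,price\nbook,£5\n" written as Windows-1252,
# so the pound sign is the single byte 0xa3.
path = tempname()
write(path, vcat(Vector{UInt8}("item,price\nbook,"), 0xa3, Vector{UInt8}("5\n")))

# Read the raw bytes and convert them to a regular (UTF-8) Julia String.
# Assumption: the source encoding is Windows-1252; adjust the name if it is not.
text = decode(read(path), "WINDOWS-1252")

# Parse the decoded text instead of the file itself.
data = readdlm(IOBuffer(text), ',', String)
println(data[2, 2])     # £5
```

Decoding to a String first and handing readdlm an IOBuffer keeps the parsing code unchanged; only the reading step needs to know about the encoding.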

colintbowers · January 30, 2018, 9:12pm

That is helpful to know. Thank you. -Colin
