Encoding, CSV.read / write

I have a large number of files encoded in ISO-8859-1. When I read one of them with CSV.read(), I get the following error:

error: CSV.ParsingException("Unexpected start of quote (34), use \"9234\" to type \"34\"")

Ubuntu, 64 bit
Julia 0.6.4
CSV 0.2.5
DataFrames 0.11.7

Is there any way to specify the encoding in CSV.read()?
Thank you!

Currently, CSV.jl doesn’t support encodings other than ASCII/UTF-8, but StringEncodings.jl makes it easy to convert between encodings.
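To illustrate the conversion step, here is a minimal sketch that uses only Base Julia. It relies on the fact that ISO-8859-1 maps every byte to the Unicode code point with the same value, so it works for this encoding only; for other encodings, StringEncodings.jl's decode function is the more general route. The helper name latin1_to_utf8 is made up for this example.

```julia
# ISO-8859-1 maps each byte 0x00-0xFF to the Unicode code point of the
# same value, so this particular encoding can be converted with Base alone.
# `latin1_to_utf8` is a hypothetical helper, not part of CSV.jl.
latin1_to_utf8(bytes::Vector{UInt8}) = join(Char(b) for b in bytes)

# Demo: write "café,1\n" as Latin-1 bytes (é is the single byte 0xE9),
# then read the raw bytes back and convert them to a UTF-8 String.
path = tempname()
write(path, UInt8[0x63, 0x61, 0x66, 0xe9, 0x2c, 0x31, 0x0a])
text = latin1_to_utf8(read(path))

println(text)
println(isvalid(text))  # checks that the result is valid UTF-8
```

The converted text could then be wrapped in an IOBuffer and handed to the CSV reader, assuming the installed CSV.jl version accepts an IO source.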

My friend uses Python and pandas and can open the files without any error. I have millions of small files to process.
It would improve the user experience if CSV.jl supported encodings other than ASCII/UTF-8. Encoding problems are very common when working with a programming language.

Pandas.jl can do this.

Thanks. I will keep that as a last resort if Julia cannot do it.

I have solved the problem. It was not caused by the encoding; it was caused by the default quotechar.

vmim_timestamp,Id,ais_imo,ais_rate_of_turn,ais_nav_status,ais_speed_over_ground,ptms_type,ais_course_over_ground,ais_cargo_ship_type,ais_destination,infolink_voyage_id,mmsi,ptms_destination_name,speed,course,heading,ais_longitude,ais_latitude,ais_true_heading,ais_eta,latitude,longitude,ais_timestamp,latitude_degrees,longitude_degrees,delta_course
2017-09-11 00:00:03,9574166392,7823475,0.0,8.0,0.0,TU,3.93921,33,"""TTP"""F2""",54652,564887000,APTP1,0.0,0.0,48.999,62170202.0,755187.0,0.8552113334769998,1528682400,0.0219674998105,1.80845787897,2017-09-11 00:00:54.000,1.2586450256,103.61700389200001,0.0
2017-09-11 00:00:04,9574166392,7823475,0.0,8.0,0.0,TU,3.93921,33,"""TTP"""F2""",54652,564887000,APTP1,0.0,0.0,48.999,62170202.0,755187.0,0.8552113334769998,1528682400,0.0219674998105,1.80845787897,2017-09-11 00:00:54.000,1.2586450256,103.61700389200001,0.0

One field, ["""TTP"""F2"""], is what causes the problem.

I solved it by changing quotechar to a character that does not appear in any field value.
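For anyone hitting the same error, here is a rough sketch of that workaround. It is written against a recent CSV.jl, where CSV.File accepts an IO source and a quotechar keyword; on CSV 0.2.5 the equivalent would presumably be passing quotechar to CSV.read, which I have not re-checked. The three-column sample is a trimmed-down stand-in for the real rows, and '|' is just one character that happens not to occur in this data.

```julia
using CSV  # assumes a recent CSV.jl is installed

# Trimmed-down sample modeled on the rows above: the ais_destination
# field holds literal double quotes that are not valid CSV quoting.
q3 = "\""^3                           # three literal double quotes
field = q3 * "TTP" * q3 * "F2" * q3   # """TTP"""F2"""
data = "mmsi,ais_destination,speed\n564887000,$(field),0.0\n"

# '|' never occurs in the data, so no field is treated as quoted and
# the double quotes are kept as ordinary characters.
f = CSV.File(IOBuffer(data); quotechar='|')
row = first(f)

println(row.ais_destination)
println(row.mmsi)
```

The trade-off is that genuinely quoted fields (for example, ones containing a comma) would no longer be recognized, so this only suits files where the double quotes are just data.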

Regards!