Encoding, CSV.read / write

zhangliye · August 16, 2018, 8:26am

I have a large number of files encoded in ISO-8859-1. CSV.read() fails when I try to read them, with the following error:

error: CSV.ParsingException("Unexpected start of quote (34), use "9234" to type "34"")

Ubuntu, 64 bit
Julia 0.6.4
CSV 0.2.5
DataFrames 0.11.7

Is there any way to specify the encoding in CSV.read()?
Thank you!

quinnj · August 16, 2018, 1:08pm

Currently, CSV.jl only accepts ASCII/UTF-8 input. But StringEncodings.jl allows for easy conversion between encodings.
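A minimal sketch of that conversion, for readers who want to try it. It assumes a recent CSV.jl (the CSV.File API, not the 0.2.5 release from the original post) together with the decode function from StringEncodings.jl. The helper names latin1_to_utf8 and read_latin1_csv are hypothetical, made up for this illustration.

```julia
using StringEncodings  # provides decode(bytes, encoding)
using CSV

# Hypothetical helper: turn ISO-8859-1 bytes into a (UTF-8) Julia String.
latin1_to_utf8(bytes::Vector{UInt8}) = decode(bytes, "ISO-8859-1")

# Hypothetical helper: load the raw bytes, transcode them, then parse in memory.
# Extra keyword arguments are passed straight through to CSV.File.
function read_latin1_csv(path::AbstractString; kwargs...)
    text = latin1_to_utf8(read(path))
    return CSV.File(IOBuffer(text); kwargs...)
end
```

Because the whole file is decoded into memory first, this is likely a better fit for many small files than for a few very large ones.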

zhangliye · August 16, 2018, 2:01pm

My friend uses Python and Pandas and can open the files without any error. I have millions of small files to process.
It would improve the user experience if CSV.jl supported encodings other than ASCII/UTF-8. Encoding problems are very common in any programming language.

malmaud · August 16, 2018, 4:15pm

Pandas.jl can do this.

zhangliye · August 17, 2018, 12:59am

Thanks. I will keep that as a last resort if Julia itself cannot do it.

zhangliye · August 17, 2018, 1:57am

I have solved the problem. It was not caused by the encoding; it was caused by the default quotechar.

vmim_timestamp,Id,ais_imo,ais_rate_of_turn,ais_nav_status,ais_speed_over_ground,ptms_type,ais_course_over_ground,ais_cargo_ship_type,ais_destination,infolink_voyage_id,mmsi,ptms_destination_name,speed,course,heading,ais_longitude,ais_latitude,ais_true_heading,ais_eta,latitude,longitude,ais_timestamp,latitude_degrees,longitude_degrees,delta_course
2017-09-11 00:00:03,9574166392,7823475,0.0,8.0,0.0,TU,3.93921,33,"""TTP""""F2""",54652,564887000,APTP1,0.0,0.0,48.999,62170202.0,755187.0,0.8552113334769998,1528682400,0.0219674998105,1.80845787897,2017-09-11 00:00:54.000,1.2586450256,103.61700389200001,0.0
2017-09-11 00:00:04,9574166392,7823475,0.0,8.0,0.0,TU,3.93921,33,"""TTP""""F2""",54652,564887000,APTP1,0.0,0.0,48.999,62170202.0,755187.0,0.8552113334769998,1528682400,0.0219674998105,1.80845787897,2017-09-11 00:00:54.000,1.2586450256,103.61700389200001,0.0

One column, ais_destination, contains the value """TTP""""F2""", which causes the problem.

I solved it by changing the quotechar to a character that does not appear in any field value.
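A sketch of what that change could look like, assuming a recent CSV.jl where quotechar is a keyword argument of CSV.File. The two-column sample is a hypothetical cut-down version of the rows above, and the backtick is just one example of a character that is absent from the data.

```julia
using CSV

# Hypothetical sample: the problematic destination value from the rows above.
q = "\""
dest = string(q^3, "TTP", q^4, "F2", q^3)   # """TTP""""F2"""
sample = "mmsi,ais_destination\n564887000," * dest * "\n"

# Use a quote character that never occurs in the data, so the double
# quotes inside the field are treated as ordinary characters.
file = CSV.File(IOBuffer(sample); quotechar='`')
row = first(file)
```

With this setting the field is kept verbatim, double quotes included, so any unwrapping of the quotes has to happen after parsing. The "9234" in the error message is bytes 92 and 34, a backslash followed by a double quote, which suggests that the parser in CSV 0.2.5 expected backslash-escaped quotes rather than the doubled quotes used in this file.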

Regards!
