I also got an error, at the 104000th line.
Simplified, the problem looks like this:
# this is good!
julia> CSV.readsplitline(IOBuffer("104652,\"Thanks \\ \",a"))
3-element Array{CSV.RawField,1}:
CSV.RawField("104652", false)
CSV.RawField("Thanks \\ ", true)
CSV.RawField("a", false)
# this is suspicious (I think it is wrong if we want to interpret Python's output)
julia> CSV.readsplitline(IOBuffer("104652,Thanks \\,a"))
2-element Array{CSV.RawField,1}:
CSV.RawField("104652", false)
CSV.RawField("Thanks \\,a", false)
# this one ends with an error
julia> CSV.readsplitline(IOBuffer("104652,\"Thanks \\\",a"))
ERROR: CSV.CSVError("EOF while trying to read the closing quote")
Stacktrace:
[1] readsplitline!(::Array{CSV.RawField,1}, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::UInt8, ::UInt8, ::UInt8, ::Base.AbstractIOBuffer{Array{UInt8,1}}) at /home/palo/.julia/v0.6/CSV/src/io.jl:114
[2] readsplitline(::Base.AbstractIOBuffer{Array{UInt8,1}}) at /home/palo/.julia/v0.6/CSV/src/io.jl:124
So it seems that the escape_double_quote escaping hack
is not enough. In the last case the backslash escapes the closing quote, so the quoted field never ends. We have to escape a backslash before a quote as well.
The following worked without error (I read and split all rows):
julia> escape_double_quote(s::String) = replace(s, "\"\"", "\\\"");
julia> escape_back_quote(s::String) = replace(s, "\\\"", "\\\\\"");
julia> esca(s::String) = escape_double_quote(escape_back_quote(s));
julia> f = open("output_bitcointalk_unicode.csv?dl=0");
julia> i=0;it="";spl=[];for i in 1:2_000_000 it=readline(f); spl=CSV.readsplitline(IOBuffer(esca(it))); eof(f) && break; end
julia> i
1063934
Warning! I am not sure that you will get good data using this escaping hack!
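
For anyone trying this on Julia 0.7 or later, where the three-argument string form of replace is no longer available, here is a minimal sketch of the same two-step escaping written with the Pair form. It only shows the string transformation on the failing sample line and does not call CSV.jl at all. The second sample line is made up for illustration, and the same caveat applies: I am not sure the resulting data is good.

```julia
# Sketch only: the same two-step escaping as above, using the Pair form of
# `replace` (Julia 0.7 and later). No CSV.jl functions are called here.

# Turn a backslash that sits right before a quote (\") into \\".
escape_back_quote(s::AbstractString) = replace(s, "\\\"" => "\\\\\"")

# Turn a doubled quote ("") into a backslash-escaped quote (\").
escape_double_quote(s::AbstractString) = replace(s, "\"\"" => "\\\"")

# Order matters: handle backslashes first, otherwise the \" produced
# from "" would itself be escaped a second time.
esca(s::AbstractString) = escape_double_quote(escape_back_quote(s))

failing = "104652,\"Thanks \\\",a"      # the line that raised the EOF error
doubled = "1,\"say \"\"hi\"\"\",b"      # hypothetical line with doubled quotes

println(esca(failing))   # prints: 104652,"Thanks \\",a
println(esca(doubled))   # prints: 1,"say \"hi\"",b
```

After escaping, the backslash in the first line is itself escaped, so the quote that follows it can close the field again.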