CSV won't read tab separated file

Zip the csv file and help save the planet.

Line containing:

360G-ACW-20070816

is invalid CSV: the description field starts with a " and never closes the quote:

     1	31127
     2	360G-ACW-20070816
     3	Shedding Skins
     4	"Shedding Skins project will generate a new body of work from the Selkie myths and stories.  Maria Hayes will spend time on Bardsy Island and various other areas in the UK mainly the Celtic countries collating information about Selkies.  There will also be some site-specific work on beaches as well as marketing and promotional material produced.  Maria will also explore more into the digital realm of mixed media and film to see whether this could be a new avenue for her.
     5	GBP
     6	4916
     7	2007-12-11
     8	2022-04-29 11:42:21.612355+00
     9	4
    10	1
    11	1
    12	22244
    13	13
    14	62
    15	5469

These are the fields of that line; notice field 4.
Disabling the quoting option might be a path forward.

After disabling quotes, the whole file parses:

CSV.read("newex.csv", DataFrame; quoted=false)

Thank you, Dan.

I’m not sure how you figured this out. Using limit, even with nthreads=1 doesn’t seem to get me anywhere close to the offending line.
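
One way to hunt for such a line without going through the parser is to scan the raw text for fields that open a quote and never close it. The following is only a sketch: `unbalanced_quote_fields` is a hypothetical helper (not part of CSV.jl), it assumes tab-delimited input, and it would wrongly flag a properly quoted field that contains an embedded tab or newline.

```julia
# Hypothetical helper: return (line number, field number) for every field
# that starts with a double quote but holds an odd number of quotes,
# i.e. the quote is opened and never closed on that line.
function unbalanced_quote_fields(io::IO; delim = '\t')
    hits = Tuple{Int,Int}[]
    for (lineno, line) in enumerate(eachline(io))
        for (fieldno, field) in enumerate(split(line, delim))
            if startswith(field, '"') && isodd(count(==('"'), field))
                push!(hits, (lineno, fieldno))
            end
        end
    end
    return hits
end

# Small made-up sample: only line 2, field 3 has an unclosed quote.
sample = "1\tok\t\"closed\"\n2\tbad\t\"never closed\n3\tfine\tplain\n"
hits = unbalanced_quote_fields(IOBuffer(sample))
```

On a real file the same function could be called as `open(unbalanced_quote_fields, "newex.csv")`.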

My method, which read the file fine, retains the leading " in the resulting field. Doing it properly, your way, doesn’t.

I have a different problem.

PS
obviously, I hadn’t read all the messages :grinning:

This loop finishes without raising any errors:

for i in 1:nrow(df)
    df[i,:]
    println(i)
end

This one raises the error "access to undefined reference":

for i in 1:nrow(df)
    df[i:i,:]
    println(i)
end
1
2
...
25890
25891
25892
25893
25894
25895
25896
25897
25898
25899
25900
25901
25902
25903
25904
25905
25906
25907
ERROR: UndefRefError: access to undefined reference
                                                                        

julia> df[25908,:]
DataFrameRow
   Row │ id     identifier         title              description  currency  amount_awarde ⋯
       │ Int64  String             String             String       String    String15      ⋯
───────┼────────────────────────────────────────────────────────────────────────────────────
 25908 │ 25872  360G-ACW-20021275  mosaic workshops.  #undef       GBP       4520          ⋯
                                                                          11 columns omitted
      

julia> df[25908:25908,:]
ERROR: UndefRefError: access to undefined reference
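
The `#undef` shown above means that slot of the `String` column was never assigned. A minimal sketch of how such rows could be located with `isassigned`, using a plain vector `col` as a made-up stand-in for `df.description`:

```julia
# Stand-in for the description column: a String vector with one slot
# left unassigned (it prints as #undef).
col = Vector{String}(undef, 4)
col[1] = "a"
col[2] = "b"
col[4] = "d"

# Collect the indices whose slot is still undefined.
undef_rows = [i for i in eachindex(col) if !isassigned(col, i)]

# Reading such a slot is what raises UndefRefError.
threw = try
    col[3]
    false
catch err
    err isa UndefRefError
end
```

Assuming the column in the thread behaves like this vector, the same comprehension over `df.description` should list the rows that trigger the error.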

Another way to see where the problems lie:

julia> using DelimitedFiles
julia> readdlm("newex.txt", '\t')
ERROR: unexpected character ' ' after quoted field at row 25909 column 4
Stacktrace: