CSV.jl writing quoted strings

I am wondering when CSV.write chooses to quote a field when writing a file. I know I can use quotestrings=true to force all strings to be quoted, but in my case that seems unnecessary and undesirable.

I can read a CSV file from an external source that contains the string:

"This award from the Export Development Fund will enhance the export opportunities for British films where they have been selected to appear at important international film festivals.\r"

(note the trailing \r)

CSV.File() reads this field into a DataFrame just fine.

If I subsequently write the DataFrame to another csv file, CSV.write() does not automatically quote the field, and this results in an invalid csv file.

Am I doing something wrong, or is this a bug with CSV.jl?
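For reference, RFC 4180 says that fields containing line breaks, double quotes, or commas should be enclosed in double quotes. The sketch below is not CSV.jl's actual logic; `needs_quoting` is a hypothetical helper, written in plain Julia, that shows the kind of check a conservative writer might apply if it treats a bare `\r` as a line break:

```julia
# Hypothetical helper (not part of CSV.jl): decide whether a field must be
# quoted so that a CSV reader cannot mistake its contents for structure.
function needs_quoting(field::AbstractString; delim::Char=',', quotechar::Char='"')
    return any(c -> c == delim || c == quotechar || c == '\n' || c == '\r', field)
end

plain = "Dark Horse"
with_cr = "important international film festivals.\r"

println(needs_quoting(plain))    # false
println(needs_quoting(with_cr))  # true
```

Under this rule, the field with the trailing `\r` would be quoted on output even without quotestrings=true.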

Simple MWE
using CSV, DataFrames, Dates, HTTP

const url = raw"https://nationallottery.dcms.gov.uk/api/v1/grants/csv-export/"
const dataloc = [data store location]
const n = now()
const outfile = (string(Dates.year(n)) * lpad(string(Dates.month(n)), 2, '0') * lpad(string(Dates.day(n)), 2, '0') * " - DCMS grants.csv")
http_response = HTTP.get(url)

dcms = DataFrame(CSV.File(http_response.body, ntasks=1))

rename!(dcms, # rename columns to (slightly) more convenient names
        "Amount Awarded" => :Amount_Awarded,
        "Award Date" => :Award_Date,
        "Recipient Org:Identifier" => :Recipient_Org_Identifier,
        "Recipient Org:Name" => :Recipient_Org_Name,
        "Recipient Org:Ward" => :Recipient_Org_Ward,
        "Recipient Org:UK Constituency" => :Recipient_Org_UK_Constituency,
        "Recipient Org:Local Authority" => :Recipient_Org_Local_Authority,
        "Recipient Org:Region" => :Recipient_Org_Region,
        "Funding Org:Identifier" => :Funding_Org_Identifier,
        "Funding Org:Name" => :Funding_Org_Name,
        "Good Cause Area" => :Good_Cause_Area,
        "Last Modified" => :Last_Modified
 )
CSV.write(joinpath(dataloc, outfile), dcms)

CSV.read cannot read this newly written CSV file back correctly.
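One possible workaround, assuming the stray carriage returns carry no meaning in this data, is to strip them from the string columns before writing. A minimal sketch in plain Julia; `strip_trailing_cr` is a hypothetical helper, and the DataFrame line in the comment is an untested assumption about how it might be applied:

```julia
# Hypothetical helper: remove carriage returns at the end of a string.
strip_trailing_cr(s::AbstractString) = String(rstrip(s, '\r'))

descriptions = ["will support the production of documentary films.",
                "important international film festivals.\r"]

cleaned = strip_trailing_cr.(descriptions)

# With a DataFrame this might look like (untested assumption):
#   dcms.Description = strip_trailing_cr.(dcms.Description)
println(repr(cleaned[2]))
```

This changes the data, so it only makes sense if the trailing `\r` is an artifact of the source rather than intended content.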

Thanks!

The grants.csv file you’re downloading in the MWE seems unnecessarily large (> 100 MB). Wouldn’t you be able to create (preferably within Julia code) a .csv file of at most a couple of rows, which still demonstrates the same issue?


OK, yes, sorry! The file is very big!

However, I don’t know how to create a CSV file of extracted rows without running into this problem. Here are some rows extracted from such a problematic CSV file. The problem grant has Identifier = 360G-BFI201538701, and you can see how it has been split across two rows:

Identifier,Title,Description,Currency,Amount Awarded,Award Date,Recipient Org:Identifier,Recipient Org:Name,Recipient Org:Ward,Recipient Org:UK Constituency,Recipient Org:Local Authority,Recipient Org:Region,Funding Org:Identifier,Funding Org:Name,Good Cause Area,Last Modified360G-BFI201538772,Suffragette (a/k/a The Fury),"This award for BFI Production funding will support filmmakers to create a film that takes risks in form or content, where the more commercial sector cannot, and supports diversity of representation in terms of perspective, talent and recruitment.",GBP,1833,2015-01-28,GB-COH-03682227,Ruby Films Limited,Gospel Oak,Hampstead and Highgate,Camden,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538767,"Intercourse: the life and work of Andrea Dworkin
(aka My Name is Andrea)
(aka Fury and Tenderness)","This award from the BFI Development Fund supports emerging writers, producers and directors to develop feature films concepts in preparation for production.",GBP,13000,2015-02-18,GB-COH-08856046 ,Salon Workshop Limited,Belsize,Hampstead and Highgate,Camden,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538753,Up A Tree in the Park at Night with a Hedgehog,"This award from the BFI Development Fund supports emerging writers, producers and directors to develop feature films concepts in preparation for production.",GBP,19750,2015-02-11,GB-COH-05848118,Feet Films Limited,Grove,Hammersmith and Chiswick,Hammersmith and Fulham,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538751,Crimson China,"This award supports an emerging writer and a director who is established in their field (e.g. television, theatre, books, or a digital medium such as games, VR) and moving into narrative filmmaking.",GBP,25000,2016-03-23,GB-COH-04386242,Baby Cow Films Limited,West End,Cities of London and Westminster,Westminster,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538737,Dark Horse,This award from the BFI Distribution & Exhibition Fund will support exhibitors and distributors to show bold films to diverse audiences.,GBP,72305,2015-03-11,GB-COH-10609979,Picturehouse Entertainment Limited,Brentford East,Brentford and Isleworth,Hounslow,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538736,Scalarama [2015],"This award from BFI Audience Fund (2013-2017) will support the development of meaningful and mutually beneficial partnerships, projects and courses for UK and international organisations, educational institutions, governments and brands.",GBP,40000,2015-02-18,GB-COH-08449290,Cinema Nation C.I.C.,Borough & Bankside,Bermondsey and Old Southwark,Southwark,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538731,Versus: The Life and Films of Ken Loach,This award for BFI Documentary funding will support the production of documentary films.,GBP,300000,2015-09-16,GB-COH-09559204,SIXTEEN DOCUMENTARIES LIMITED,West End,Cities of London and Westminster,Westminster,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538723,THREE MILES NORTH OF MOLKOM,"This award from the BFI Development Fund supports emerging writers, producers and directors to develop feature films concepts in preparation for production.",GBP,12000,2015-02-04,GB-COH-05040196,Blueprint Pictures Limited,West End,Cities of London and Westminster,Westminster,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538719,JOY,"This award from the BFI Development Fund supports emerging writers, producers and directors to develop feature films concepts in preparation for production.",GBP,16000,2015-01-21,GB-COH-07198867,Lesata Productions Ltd,Perranporth,Camborne and Redruth,Cornwall,South West,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538716,Horkmor,"This award from the BFI Development Fund supports emerging writers, producers and directors to develop feature films concepts in preparation for production.",GBP,19500,2015-02-18,GB-COH-04129959,Warp Films,Manor Castle,Sheffield Heeley,Sheffield,Yorkshire and The Humber,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538712,Irene's Ghost,This award for BFI Film Pre-Production supports producers and creative teams in crewing up and completing all pre-production activities post-script development. ,GBP,10000,2015-01-14,GB-COH-06335137,FORWARD SLASH FILMS LTD,Letchworth South East,North East Hertfordshire,North Hertfordshire,East of England,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538701,The Hallow,This award from the Export Development Fund will enhance the export opportunities for British films where they have been selected to appear at important international film festivals.
,GBP,8113,2015-01-07,GB-COH-07941942,Altitude Film Sales Limited,St James's,Cities of London and Westminster,Westminster,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201438688,Onwards and Outwards,"This award from BFI Audience Fund (2013-2017) will support the development of meaningful and mutually beneficial partnerships, projects and courses for UK and international organisations, educational institutions, governments and brands.",GBP,48800,2015-03-18,GB-CHC-236848,Institute of Contemporary Arts,St James's,Cities of London and Westminster,Westminster,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201438672,Wild Tales,This award from the BFI Distribution & Exhibition Fund will support exhibitors and distributors to show bold films to diverse audiences.,GBP,92543,2015-02-18,GB-COH-01243421,Curzon Film World Limited,Holborn & Covent Garden,Holborn and St Pancras,Camden,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06

You can use my answer here to turn a couple of lines of your csv which are problematic into a string to post here:


Thanks @nilshg.

julia> dcms[dcms.Identifier .∈ Ref(["360G-BFI201538772", "360G-BFI201538753", "360G-BFI201538751", "360G-BFI201538737", "360G-BFI201538736", "360G-BFI201538731", "360G-BFI201538723", "360G-BFI201538719", "360G-BFI201538716", "360G-BFI201538712", "360G-BFI201538701", "360G-BFI201438688", "360G-BFI201438672"]), :]
13×16 DataFrame
 Row │ Identifier         Title                              Description                        Currency  Amount Awarded  Award Date  Recipient Org:Identifier  Recipient Org:Name                 Recipient Org:Ward       Recipient Org:U ⋯
     │ String             String                             String                             String3   Int64           Date        String                    String                             String                   String          ⋯
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 360G-BFI201538772  Suffragette (a/k/a The Fury)       This award for BFI Production fu…  GBP                 1833  2015-01-28  GB-COH-03682227           Ruby Films Limited                 Gospel Oak               Hampstead and H ⋯
   2 │ 360G-BFI201538753  Up A Tree in the Park at Night w…  This award from the BFI Developm…  GBP                19750  2015-02-11  GB-COH-05848118           Feet Films Limited                 Grove                    Hammersmith and  
   3 │ 360G-BFI201538751  Crimson China                      This award supports an emerging …  GBP                25000  2016-03-23  GB-COH-04386242           Baby Cow Films Limited             West End                 Cities of Londo  
   4 │ 360G-BFI201538737  Dark Horse                         This award from the BFI Distribu…  GBP                72305  2015-03-11  GB-COH-10609979           Picturehouse Entertainment Limit…  Brentford East           Brentford and I  
   5 │ 360G-BFI201538736  Scalarama [2015]                   This award from BFI Audience Fun…  GBP                40000  2015-02-18  GB-COH-08449290           Cinema Nation C.I.C.               Borough & Bankside       Bermondsey and  ⋯
   6 │ 360G-BFI201538731  Versus: The Life and Films of Ke…  This award for BFI Documentary f…  GBP               300000  2015-09-16  GB-COH-09559204           SIXTEEN DOCUMENTARIES LIMITED      West End                 Cities of Londo  
   7 │ 360G-BFI201538723  THREE MILES NORTH OF MOLKOM        This award from the BFI Developm…  GBP                12000  2015-02-04  GB-COH-05040196           Blueprint Pictures Limited         West End                 Cities of Londo  
   8 │ 360G-BFI201538719  JOY                                This award from the BFI Developm…  GBP                16000  2015-01-21  GB-COH-07198867           Lesata Productions Ltd             Perranporth              Camborne and Re  
   9 │ 360G-BFI201538716  Horkmor                            This award from the BFI Developm…  GBP                19500  2015-02-18  GB-COH-04129959           Warp Films                         Manor Castle             Sheffield Heele ⋯
  10 │ 360G-BFI201538712  Irene's Ghost                      This award for BFI Film Pre-Prod…  GBP                10000  2015-01-14  GB-COH-06335137           FORWARD SLASH FILMS LTD            Letchworth South East    North East Hert  
  11 │ 360G-BFI201538701  The Hallow                         This award from the Export Devel…  GBP                 8113  2015-01-07  GB-COH-07941942           Altitude Film Sales Limited        St James's               Cities of Londo  
  12 │ 360G-BFI201438688  Onwards and Outwards               This award from BFI Audience Fun…  GBP                48800  2015-03-18  GB-CHC-236848             Institute of Contemporary Arts     St James's               Cities of Londo  
  13 │ 360G-BFI201438672  Wild Tales                         This award from the BFI Distribu…  GBP                92543  2015-02-18  GB-COH-01243421           Curzon Film World Limited          Holborn & Covent Garden  Holborn and St  ⋯
                                                                                                                                                                                                                            7 columns omitted

julia> io = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia>     string_representation = String(take!(CSV.write(io, dcms[dcms.Identifier.∈Ref(["360G-BFI201538772", "360G-BFI201538753", "360G-BFI201538751", "360G-BFI201538737", "360G-BFI201538736", "360G-BFI201538731", "360G-BFI201538723", "360G-BFI201538719", "360G-BFI201538716", "360G-BFI201538712", "360G-BFI201538701", "360G-BFI201438688", "360G-BFI201438672"]), :])))
"Identifier,Title,Description,Currency,Amount Awarded,Award Date,Recipient Org:Identifier,Recipient Org:Name,Recipient Org:Ward,Recipient Org:UK Constituency,Recipient Org:Local Authority,Recipient Org:Region,Funding Org:Identifier,Funding Org:Name,Good Cause Area,Last Modified\n360G-BFI201538772,Suffragette (a/k/a The Fury),\"This award for BFI Production funding will support filmmakers to create a film that takes risks in form or content, where the more commercial sector cannot, and supports diversity of representation in terms of perspective, talent and recruitment.\",GBP,1833,2015-01-28,GB-COH-03682227,Ruby Films Limited,Gospel Oak,Hampstead and Highgate,Camden,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06\n360G-BFI201538753,Up A Tree in the Park at Night with a Hedgehog,\"This award from the BFI " ⋯ 3687 bytes ⋯ "rts,2024-12-06\n360G-BFI201438688,Onwards and Outwards,\"This award from BFI Audience Fund (2013-2017) will support the development of meaningful and mutually beneficial partnerships, projects and courses for UK and international organisations, educational institutions, governments and brands.\",GBP,48800,2015-03-18,GB-CHC-236848,Institute of Contemporary Arts,St James's,Cities of London and Westminster,Westminster,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06\n360G-BFI201438672,Wild Tales,This award from the BFI Distribution & Exhibition Fund will support exhibitors and distributors to show bold films to diverse audiences.,GBP,92543,2015-02-18,GB-COH-01243421,Curzon Film World Limited,Holborn & Covent Garden,Holborn and St Pancras,Camden,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06\n"

julia>  CSV.read(IOBuffer(string_representation), DataFrame)
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 11. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 11. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 12. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 11. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 12. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 12. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
14×16 DataFrame
 Row │ Identifier         Title                              Description                        Currency    Amount Awarded   Award Date                   Recipient Org:Identifier  Recipient Org:Name                 Recipient Org:Ward   ⋯
     │ String31?          String                             String                             String15?   String15?        String31?                    String15?                 String?                            String31?            ⋯
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 360G-BFI201538772  Suffragette (a/k/a The Fury)       This award for BFI Production fu…  GBP         1833             2015-01-28                   GB-COH-03682227           Ruby Films Limited                 Gospel Oak           ⋯
   2 │ 360G-BFI201538753  Up A Tree in the Park at Night w…  This award from the BFI Developm…  GBP         19750            2015-02-11                   GB-COH-05848118           Feet Films Limited                 Grove
   3 │ 360G-BFI201538751  Crimson China                      This award supports an emerging …  GBP         25000            2016-03-23                   GB-COH-04386242           Baby Cow Films Limited             West End
   4 │ 360G-BFI201538737  Dark Horse                         This award from the BFI Distribu…  GBP         72305            2015-03-11                   GB-COH-10609979           Picturehouse Entertainment Limit…  Brentford East        
   5 │ 360G-BFI201538736  Scalarama [2015]                   This award from BFI Audience Fun…  GBP         40000            2015-02-18                   GB-COH-08449290           Cinema Nation C.I.C.               Borough & Bankside   ⋯
   6 │ 360G-BFI201538731  Versus: The Life and Films of Ke…  This award for BFI Documentary f…  GBP         300000           2015-09-16                   GB-COH-09559204           SIXTEEN DOCUMENTARIES LIMITED      West End
   7 │ 360G-BFI201538723  THREE MILES NORTH OF MOLKOM        This award from the BFI Developm…  GBP         12000            2015-02-04                   GB-COH-05040196           Blueprint Pictures Limited         West End
   8 │ 360G-BFI201538719  JOY                                This award from the BFI Developm…  GBP         16000            2015-01-21                   GB-COH-07198867           Lesata Productions Ltd             Perranporth
   9 │ 360G-BFI201538716  Horkmor                            This award from the BFI Developm…  GBP         19500            2015-02-18                   GB-COH-04129959           Warp Films                         Manor Castle         ⋯
  10 │ 360G-BFI201538712  Irene's Ghost                      This award for BFI Film Pre-Prod…  GBP         10000            2015-01-14                   GB-COH-06335137           FORWARD SLASH FILMS LTD            Letchworth South Eas  
  11 │ 360G-BFI201538701  The Hallow                         This award from the Export Devel…  missing     missing          missing                      missing                   missing                            missing
  12 │ missing            GBP                                8113                               2015-01-07  GB-COH-07941942  Altitude Film Sales Limited  St James's                Cities of London and Westminster   Westminster
  13 │ 360G-BFI201438688  Onwards and Outwards               This award from BFI Audience Fun…  GBP         48800            2015-03-18                   GB-CHC-236848             Institute of Contemporary Arts     St James's           ⋯
  14 │ 360G-BFI201438672  Wild Tales                         This award from the BFI Distribu…  GBP         92543            2015-02-18                   GB-COH-01243421           Curzon Film World Limited          Holborn & Covent Gar  
                                                                                                                                                                                                                            8 columns omitted

julia>

Note that in printing string_representation here, Julia omitted ⋯ 3687 bytes ⋯ somewhere in the middle.

Here it is in full:

julia> println(string_representation)
Identifier,Title,Description,Currency,Amount Awarded,Award Date,Recipient Org:Identifier,Recipient Org:Name,Recipient Org:Ward,Recipient Org:UK Constituency,Recipient Org:Local Authority,Recipient Org:Region,Funding Org:Identifier,Funding Org:Name,Good Cause Area,Last Modified
360G-BFI201538772,Suffragette (a/k/a The Fury),"This award for BFI Production funding will support filmmakers to create a film that takes risks in form or content, where the more commercial sector cannot, and supports diversity of representation in terms of perspective, talent and recruitment.",GBP,1833,2015-01-28,GB-COH-03682227,Ruby Films Limited,Gospel Oak,Hampstead and Highgate,Camden,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538753,Up A Tree in the Park at Night with a Hedgehog,"This award from the BFI Development Fund supports emerging writers, producers and directors to develop feature films concepts in preparation for production.",GBP,19750,2015-02-11,GB-COH-05848118,Feet Films Limited,Grove,Hammersmith and Chiswick,Hammersmith and Fulham,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538751,Crimson China,"This award supports an emerging writer and a director who is established in their field (e.g. television, theatre, books, or a digital medium such as games, VR) and moving into narrative filmmaking.",GBP,25000,2016-03-23,GB-COH-04386242,Baby Cow Films Limited,West End,Cities of London and Westminster,Westminster,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538737,Dark Horse,This award from the BFI Distribution & Exhibition Fund will support exhibitors and distributors to show bold films to diverse audiences.,GBP,72305,2015-03-11,GB-COH-10609979,Picturehouse Entertainment Limited,Brentford East,Brentford and Isleworth,Hounslow,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538736,Scalarama [2015],"This award from BFI Audience Fund (2013-2017) will support the development of meaningful and mutually beneficial partnerships, projects and courses for UK and international organisations, educational institutions, governments and brands.",GBP,40000,2015-02-18,GB-COH-08449290,Cinema Nation C.I.C.,Borough & Bankside,Bermondsey and Old Southwark,Southwark,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538731,Versus: The Life and Films of Ken Loach,This award for BFI Documentary funding will support the production of documentary films.,GBP,300000,2015-09-16,GB-COH-09559204,SIXTEEN DOCUMENTARIES LIMITED,West End,Cities of London and Westminster,Westminster,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538723,THREE MILES NORTH OF MOLKOM,"This award from the BFI Development Fund supports emerging writers, producers and directors to develop feature films concepts in preparation for production.",GBP,12000,2015-02-04,GB-COH-05040196,Blueprint Pictures Limited,West End,Cities of London and Westminster,Westminster,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538719,JOY,"This award from the BFI Development Fund supports emerging writers, producers and directors to develop feature films concepts in preparation for production.",GBP,16000,2015-01-21,GB-COH-07198867,Lesata Productions Ltd,Perranporth,Camborne and Redruth,Cornwall,South West,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538716,Horkmor,"This award from the BFI Development Fund supports emerging writers, producers and directors to develop feature films concepts in preparation for production.",GBP,19500,2015-02-18,GB-COH-04129959,Warp Films,Manor Castle,Sheffield Heeley,Sheffield,Yorkshire and The Humber,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201538712,Irene's Ghost,This award for BFI Film Pre-Production supports producers and creative teams in crewing up and completing all pre-production activities post-script development. ,GBP,10000,2015-01-14,GB-COH-06335137,FORWARD SLASH FILMS LTD,Letchworth South East,North East Hertfordshire,North Hertfordshire,East of England,GB-CHC-287780,British Film Institute,Arts,2024-12-06
,GBP,8113,2015-01-07,GB-COH-07941942,Altitude Film Sales Limited,St James's,Cities of London and Westminster,Westminster,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06t international film festivals.
360G-BFI201438688,Onwards and Outwards,"This award from BFI Audience Fund (2013-2017) will support the development of meaningful and mutually beneficial partnerships, projects and courses for UK and international organisations, educational institutions, governments and brands.",GBP,48800,2015-03-18,GB-CHC-236848,Institute of Contemporary Arts,St James's,Cities of London and Westminster,Westminster,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06
360G-BFI201438672,Wild Tales,This award from the BFI Distribution & Exhibition Fund will support exhibitors and distributors to show bold films to diverse audiences.,GBP,92543,2015-02-18,GB-COH-01243421,Curzon Film World Limited,Holborn & Covent Garden,Holborn and St Pancras,Camden,London,GB-CHC-287780,British Film Institute,Arts,2024-12-06

Edit: Curiously, this print out of string_representation silently omits the first half of the 360G-BFI201538701 grant record. I don’t know why. It is clearly there in string_representation itself, because it is present in the DataFrame read back from there.
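A likely explanation for the missing text is the carriage return itself: when a terminal prints `\r`, the cursor returns to the start of the line and the characters that follow overwrite what was already there. Printing an escaped form of the string makes the hidden character visible. A small sketch with a made-up two-field string (not the real record):

```julia
# Made-up sample: a field ending in a carriage return, followed by another field.
sample = "first field.\r,second\n"

# print(sample) would let the terminal overwrite "first field." with ",second".
# escape_string shows control characters as backslash escapes instead.
println(escape_string(sample))   # first field.\r,second\n

# The carriage return is still present in the data:
println(occursin('\r', sample))  # true
```

Calling `println(escape_string(string_representation))` or `show(string_representation)` in the same way should display the full record, `\r` included.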

If I save this output (but with an extra newline after Last Modified in the header) as a .csv file, then loading, renaming, saving and loading the new file works without any issues (though the row corresponding to “The Hallow” is incorrectly split across two entries).

Possibly relevant, what OS are you using? (I’m on Windows 10.)

I’m on Win 11.
(I added the header line manually, so the missing newline there is my fault).

Also with the full file (286 MB), I don’t run into any issues. Can you clarify how you read the .csv file back in (CSV.File, CSV.read, …)? Can you also post your full versioninfo and status?
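To compare what is actually on disk on the two machines, it may help to look for carriage returns that are not followed by a line feed, since those are the bytes that would break a row if left unquoted. A rough sketch; `bare_cr_positions` is a hypothetical helper, the file name in the comment is a placeholder, and the check does not distinguish quoted from unquoted fields:

```julia
# Hypothetical helper: return the byte offsets of every '\r' that is not
# immediately followed by '\n'.
function bare_cr_positions(bytes::AbstractVector{UInt8})
    positions = Int[]
    for i in eachindex(bytes)
        if bytes[i] == UInt8('\r') && (i == lastindex(bytes) || bytes[i + 1] != UInt8('\n'))
            push!(positions, i)
        end
    end
    return positions
end

# Made-up sample: one Windows line ending ("\r\n") and one bare "\r".
sample = codeunits("a\r\nb\rc\n")
println(bare_cr_positions(sample))

# On a real file this might be used as (placeholder path):
#   bare_cr_positions(read("file.csv"))
```

An empty result on one machine and a non-empty one on the other would suggest the two files differ in their line-ending bytes.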

I just re-read the file like this:

   dcms = DataFrame(CSV.File(joinpath(pwd(), "DCMS Grants Database", "20241219 - DCMS grants.csv"), ntasks=1))

I’m using ntasks=1 because of this issue.

julia> versioninfo()
Julia Version 1.11.2
Commit 5e9a32e7af (2024-12-01 20:02 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, tigerlake)
Threads: 4 default, 0 interactive, 2 GC (on 8 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4
(DCMS Database) pkg> st
Status `C:\Users\TGebbels\...\Documents\DCMS Database\Project.toml`
  [336ed68f] CSV v0.10.15
  [a93c6f00] DataFrames v1.7.0
⌃ [cd3eb016] HTTP v1.10.10
  [0f8b85d8] JSON3 v1.14.1
  [144e516e] PostcodeToolKit v0.1.0 `C:\Users\TGebbels\...\Documents\Julia\Packages\PostcodeToolKit`
  [08abe8d2] PrettyTables v2.4.0
⌃ [2913bbd2] StatsBase v0.34.3
  [69024149] StringEncodings v0.3.7
  [5c2747f8] URIs v1.5.1
  [49080126] ZipArchives v2.4.0
Info Packages marked with ⌃ have new versions available and may be upgradable.

I’m using Julia 1.11.2

Also DataFrame(CSV.File("file.csv", ntasks=1)) works fine for me, as does CSV.read("file.csv", ntasks=1, DataFrame). This is on both Julia 1.10.4, CSV.jl v0.10.14, and Julia 1.11.1, CSV.jl v0.10.15.

I’m afraid I’m not so lucky:

julia> dcms = CSV.read(joinpath(pwd(), "DCMS Grants Database", "20241219 - DCMS grants.csv"), ntasks=1, DataFrame)
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 468833. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 468833. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 468834. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 468833. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 468834. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 468834. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469017. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469018. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469019. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469020. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469021. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469022. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469090. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469091. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469092. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469093. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469094. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 468833. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 468834. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469017. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469018. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469019. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469020. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469021. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469022. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469090. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469091. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469092. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469093. Filling remaining columns with `missing`
└ @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
┌ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469094. Filling remaining columns with `missing`
β”” @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
β”Œ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469095. Filling remaining columns with `missing`
β”” @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
β”Œ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469095. Filling remaining columns with `missing`
β”” @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
β”Œ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469104. Filling remaining columns with `missing`
β”” @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
β”Œ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469105. Filling remaining columns with `missing`
β”” @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
β”Œ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469137. Filling remaining columns with `missing`
β”” @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
β”Œ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469138. Filling remaining columns with `missing`
β”” @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
β”Œ Warning: thread = 1 warning: only found 3 / 16 columns around data row: 469147. Filling remaining columns with `missing`
β”” @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592
β”Œ Warning: thread = 1 warning: only found 14 / 16 columns around data row: 469148. Filling remaining columns with `missing`
β”” @ CSV C:\Users\TGebbels\.julia\packages\CSV\XLcqT\src\file.jl:592

and looking at one of the problematic records:

julia> dcms[469137:469138, :]
2×16 DataFrame
 Row │ Identifier         Title                   Description                        Currency    Amount Awarded   Award Date            Recipient Org:Identifier  Recipient Org:Name      Recipient Org:Ward  Recipient Org:UK Constituency ⋯
     │ String?            String                  String                             String15?   String15?        String?               String?                   String?                 String?             String?                       ⋯
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 360G-BFI201335732  ONLY LOVERS LEFT ALIVE  This award from the Export Devel…  missing     missing          missing               missing                   missing                 missing             missing                       ⋯
   2 │ missing            GBP                     14595                              2013-05-15  GB-COH-03663618  HanWay Films Limited  Holborn & Covent Garden   Holborn and St Pancras  Camden              London
                                                                                                                                                                                                                            6 columns omitted

Oh, my bad. I was expecting a hard error. In that case, I am able to replicate the problem on both versions of Julia and CSV.jl.

Thanks for your help @eldee.

I've submitted the following against CSV.jl:
Not quoting strings automatically when required · Issue #1153 · JuliaData/CSV.jl

The fallback, which does seem to work, is simply to use quotestrings=true.

As a MWE you can then simply use

using CSV, DataFrames

open("mwe.csv", "w") do file
    write(file, "\"a\r\", 1\n\"b\r\", 2")
end

df = CSV.read("mwe.csv", header=false, DataFrame)
CSV.write("mwe2.csv", df, header=false)
df2 = CSV.read("mwe2.csv", header=false, DataFrame)

Replacing the \r with \n works fine, because the check function that decides on quoting and escaping only looks for the delimiter and the newline byte:

function check(bytes, sz, delim::UInt8, oq, cq, newline::UInt8)
    isempty(bytes) && return false, false
    needtoescape = false
    @inbounds needtoquote = bytes[1] == oq
    @simd for i = 1:sz
        @inbounds b = bytes[i]
        needtoquote |= (b == delim) | (b == newline)
        needtoescape |= b == cq
    end
    return needtoescape, needtoquote
end

Perhaps b == UInt8('\r') (0x0d) can also be added here when newline == UInt8('\n') (0x0a)?
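
To make the suggestion concrete, here is a minimal standalone sketch of such a check. The name check_cr and the cr_is_special flag are hypothetical and only for illustration; this is not the code that ships in CSV.jl, and it assumes a bare \r only needs extra handling when the newline byte is \n.

```julia
# Hypothetical variant of the check function above, for illustration only.
function check_cr(bytes, sz, delim::UInt8, oq, cq, newline::UInt8)
    isempty(bytes) && return false, false
    needtoescape = false
    # Assumption: treat a carriage return as quote-worthy only when the
    # newline byte is '\n' (otherwise b == newline already covers '\r').
    cr_is_special = newline == UInt8('\n')
    @inbounds needtoquote = bytes[1] == oq
    @simd for i = 1:sz
        @inbounds b = bytes[i]
        needtoquote |= (b == delim) | (b == newline) | (cr_is_special & (b == UInt8('\r')))
        needtoescape |= b == cq
    end
    return needtoescape, needtoquote
end

# A field with a trailing carriage return, as in the MWE.
field = codeunits("a\r")
result = check_cr(field, length(field), UInt8(','), UInt8('"'), UInt8('"'), UInt8('\n'))
println(result)  # (false, true): no escaping needed, but the field gets quoted
```

With this variant, a plain field such as "abc" still returns (false, false), so ordinary strings would remain unquoted.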

For a CSV reader/writer that is robust in edge cases like that, you may try DuckDB, which is convenient to use from Julia through QuackIO.jl.
Just do tbl = read_csv(StructArray, "mwe.csv") and write_table("mwe2.csv", tbl):