(Apologies for a long post. There is a long set-up for this issue!)
This website provides data on all the grants to good causes that the National Lottery has made since it began 30 years ago. It is possible to download this data as a CSV file by clicking the link on the web page, but it is also possible to make an HTTP get call to https://nationallottery.dcms.gov.uk/api/v1/grants/csv-export/
.
The files obtained by these two methods differ slightly. So for example:
f = open("grants-direct.csv", "r")
dcontent = read(f, String)
close(f)
f = open("grants-http.csv", "r")
hcontent = read(f, String)
close(f)
println("Direct file has length $(length(dcontent))")
println("HTTPget file has length $(length(hcontent))")
shows
Direct file has length 323619656
HTTPget file has length 322912529
The files are ordered differently, so direct comparisons arenβt straightforward. However, if I read each file line by line, sort and compare, I find they are reported to be identical:
d = [x for x in eachline("grants-direct.csv")]
h = [x for x in eachline("grants-http.csv")]
sort!(d)
sort!(h)
if (d == h) # true!
println("Two arrays ARE identical")
else
println("Two arrays are NOT identical")
end
Two arrays ARE identical
If I read the two files into a dataframe using CSV read, they also report as identical.
direct = CSV.read("grants-direct.csv", DataFrame)
HTTPget = CSV.read("grants-http.csv", DataFrame)
sh = sort(HTTPget, :Identifier)
sd = sort(direct, :Identifier)
if sd == sh # Also true!
println("Two dataframes ARE identical")
else
println("Two dataframes are NOT identical")
end
comp = sh .== sd
counter = 0
for (i, col) = enumerate(eachcol(comp)) # All columns identical!
if !all(col)
global counter += 1
println(names(comp)[i], sum(col))
end
end
if counter == 0
println("All dataframe columns ARE identical")
else
println("All dataframe columns are NOT identical")
end
counter=0
for (i, row) = enumerate(eachrow(comp)) # All rows identical
if !all(row)
global counter += 1
println("Row $i is not identical")
end
end
if counter == 0
println("All dataframe rows ARE identical")
else
println("All dataframe rows are NOT identical")
end
Two dataframes ARE identical
All dataframe columns ARE identical
All dataframe rows ARE identical
So far so good! Now the tricky part.
fixdcms!(HTTPget)
fixdcms!(direct)
CSV.write("fixeddirect.csv", direct)
test = CSV.read("fixeddirect.csv", DataFrame)
println("Reading fixeddirect.csv succeeded!")
CSV.write("sortedfixedHTTPget.csv", sort(HTTPget, :Identifier))
test = CSV.read("sortedfixedHTTPget.csv", DataFrame)
println("Reading sortedfixedHTTPget.csv succeeded!")
CSV.write("fixedHTTPget.csv", HTTPget)
test = CSV.read("fixedHTTPget.csv", DataFrame)
println("Reading fixedHTTPget.csv succeeded!")
I am applying a cumbersome set of updates to the fields that record the administrative geography where the grant was awarded. Boundaries and names change over time but the database doesnβt get updated.
fixdcms
function fixdcms!(dcms)
rename!(dcms, # rename columns to (slightly) more convenient names
"Amount Awarded" => :Amount_Awarded,
"Award Date" => :Award_Date,
"Recipient Org:Identifier" => :Recipient_Org_Identifier,
"Recipient Org:Name" => :Recipient_Org_Name,
"Recipient Org:Ward" => :Recipient_Org_Ward,
"Recipient Org:UK Constituency" => :Recipient_Org_UK_Constituency,
"Recipient Org:Local Authority" => :Recipient_Org_Local_Authority,
"Recipient Org:Region" => :Recipient_Org_Region,
"Funding Org:Identifier" => :Funding_Org_Identifier,
"Funding Org:Name" => :Funding_Org_Name,
"Good Cause Area" => :Good_Cause_Area,
"Last Modified" => :Last_Modified
)
println()
print("Removing duplicate regions... ")
@time begin
for (i, region) in enumerate(dcms.Recipient_Org_Region)
if region == "Yorkshire and Humberside"
dcms.Recipient_Org_Region[i] = "Yorkshire and The Humber"
elseif region == "Eastern"
dcms.Recipient_Org_Region[i] = "East of England"
end
end
end
println()
print("Removing duplicate Local Authorities... ")
@time begin
dcms.Recipient_Org_Local_Authority .= titlecase.(dcms.Recipient_Org_Local_Authority, wordsep=c -> occursin(c, " -/("))
for (i, LA) in enumerate(dcms.Recipient_Org_Local_Authority)
if LA in ["County Of Herefordshire", "Herefordshire County Of", "Herefordshire"]
dcms.Recipient_Org_Local_Authority[i] = "Herefordshire, County Of"
elseif LA in ["Cardiff Councik", "Cardiff Council", "Caerdydd", "Cardiff Bay"]
dcms.Recipient_Org_Local_Authority[i] = "Cardiff"
elseif LA in ["Caerphil", "C'philly", "Caer"]
dcms.Recipient_Org_Local_Authority[i] = "Caerphilly"
elseif LA in ["London Borough Of Hammersmith", "London Borough Of Hammersmith And Fulham"]
dcms.Recipient_Org_Local_Authority[i] = "Hammersmith And Fulham"
elseif LA in ["Shetland Island", "Shetland"]
dcms.Recipient_Org_Local_Authority[i] = "Shetland Islands"
elseif LA in ["Aylesbury Vale", "South Bucks", "Chiltern", "Wycombe"]
dcms.Recipient_Org_Local_Authority[i] = "Buckinghamshire"
elseif LA in ["Corby", "East Northamptonshire", "Kettering", "Wellingborough"]
dcms.Recipient_Org_Local_Authority[i] = "North Northamptonshire"
elseif LA in ["Sedgemoor", "West Somerset", "Taunton Deane", "South Somerset", "Mendip", "Somerset West And Taunton"]
dcms.Recipient_Org_Local_Authority[i] = "Somerset"
elseif LA in ["Shepway"]
dcms.Recipient_Org_Local_Authority[i] = "Folkestone and Hythe"
elseif LA in ["Forest Heath", "St Edmundsbury"]
dcms.Recipient_Org_Local_Authority[i] = "West Suffolk"
elseif LA in ["Daventry", "Northampton", "South Northamptonshire"]
dcms.Recipient_Org_Local_Authority[i] = "West Northamptonshire"
elseif LA in ["Weymouth And Portland", "West Dorset", "North Dorset", "Purbeck", "East Dorset"]
dcms.Recipient_Org_Local_Authority[i] = "Dorset"
elseif LA in ["Barrow-In-Furness", "Eden", "South Lakeland"]
dcms.Recipient_Org_Local_Authority[i] = "Westmorland And Furness"
elseif LA in ["Craven", "Hambleton", "Harrogate", "Richmondshire", "Ryedale", "Scarborough", "Selby"]
dcms.Recipient_Org_Local_Authority[i] = "North Yorkshire"
elseif LA in ["Bristol City Of", "Bristol", "City Of Bristol"]
dcms.Recipient_Org_Local_Authority[i] = "Bristol, City Of"
elseif LA in ["North Down And Ards", "Ards"]
dcms.Recipient_Org_Local_Authority[i] = "Ards And North Down"
elseif LA in ["Abertawe"]
dcms.Recipient_Org_Local_Authority[i] = "Swansea"
elseif LA in ["Antrim", "Newtownabbey"]
dcms.Recipient_Org_Local_Authority[i] = "Antrim And Newtownabbey"
elseif LA in ["Neath & Port Talbot"]
dcms.Recipient_Org_Local_Authority[i] = "Neath Port Talbot"
elseif LA in ["Allerdale", "Copeland", "Carlisle"]
dcms.Recipient_Org_Local_Authority[i] = "Cumberland"
elseif LA in ["Rhondda", "Rhondda Cynon Taff", "Rhondda Cynon Taff County Borough Council", "Rct", "R.c.t.", "Pontypridd"]
dcms.Recipient_Org_Local_Authority[i] = "Rhondda Cynon Taf"
elseif LA in ["Armagh", "Craigavon", "Armagh, Banbridge And Craigavon"]
dcms.Recipient_Org_Local_Authority[i] = "Armagh City, Banbridge And Craigavon"
elseif LA in ["Derry", "Derry And Strabane"]
dcms.Recipient_Org_Local_Authority[i] = "Derry City And Strabane"
elseif LA in ["Fermanagh", "Omagh"]
dcms.Recipient_Org_Local_Authority[i] = "Fermanagh And Omagh"
elseif LA in ["Flint"]
dcms.Recipient_Org_Local_Authority[i] = "Flintshire"
elseif LA in ["Suffolk Coastal", "Waveney"]
dcms.Recipient_Org_Local_Authority[i] = "East Suffolk"
elseif LA in ["Lisburn", "Castlereagh"]
dcms.Recipient_Org_Local_Authority[i] = "Lisburn And Castlereagh"
elseif LA in ["Down", "Newry And Mourne"]
dcms.Recipient_Org_Local_Authority[i] = "Newry, Mourne And Down"
elseif LA in ["Eilean Siar", "Comhairle Nan Eilean Siar (Western Isles Council)"]
dcms.Recipient_Org_Local_Authority[i] = "Na H-Eileanan Siar"
elseif LA in ["Conwy County Borough Council"]
dcms.Recipient_Org_Local_Authority[i] = "Conwy"
elseif LA in ["Birming"]
dcms.Recipient_Org_Local_Authority[i] = "Birmingham"
elseif LA in ["Coleraine"]
dcms.Recipient_Org_Local_Authority[i] = "Causeway Coast And Glens"
elseif LA in ["Cookstown", "Dungannon"]
dcms.Recipient_Org_Local_Authority[i] = "Mid Ulster"
elseif LA in ["Ballymena", "Carrickfergus", "Larne"]
dcms.Recipient_Org_Local_Authority[i] = "Mid And East Antrim"
elseif LA in ["Gwynedd Council", "Bangor Music"]
dcms.Recipient_Org_Local_Authority[i] = "Gwynedd"
elseif LA in ["Blaenau", "Gwent"]
dcms.Recipient_Org_Local_Authority[i] = "Blaenau Gwent"
elseif LA in ["Bournemouth", "Christchurch", "Poole"]
dcms.Recipient_Org_Local_Authority[i] = "Bournemouth, Christchurch And Poole"
elseif LA in ["City Of Westminster"]
dcms.Recipient_Org_Local_Authority[i] = "Westminster"
elseif LA in ["Mid Glam"]
dcms.Recipient_Org_Local_Authority[i] = "Mid Glamorgan"
elseif LA in ["S Glam"]
dcms.Recipient_Org_Local_Authority[i] = "South Glamorgan"
elseif LA in ["Monmouth"]
dcms.Recipient_Org_Local_Authority[i] = "Monmouthshire"
elseif LA in ["Isle Of Anglesey County Council", "Ynys Mon"]
dcms.Recipient_Org_Local_Authority[i] = "Isle Of Anglesey"
elseif LA in ["Pembrokeshire County Council", "Pembrokeshire/Ceredigion", "Sir Benfro"]
dcms.Recipient_Org_Local_Authority[i] = "Pembrokeshire"
elseif LA in ["Cardiganshire", "Cered"]
dcms.Recipient_Org_Local_Authority[i] = "Ceredigion"
elseif LA in ["Carms", "Sir GΓ’r"]
dcms.Recipient_Org_Local_Authority[i] = "Carmarthenshire"
elseif LA in ["Isle Of Anglesey County Council"]
dcms.Recipient_Org_Local_Authority[i] = "Isle Of Anglesey"
elseif LA in ["Stratford On Avon", "Stratford Upon Avon"]
dcms.Recipient_Org_Local_Authority[i] = "Stratford-On-Avon"
elseif LA in ["Kingston Upon Hull"]
dcms.Recipient_Org_Local_Authority[i] = "Kingston Upon Hull, City Of"
end
if dcms.Recipient_Org_Local_Authority[i] == "Kent"
dcms.Recipient_Org_Region[i] = "South East"
elseif dcms.Recipient_Org_Local_Authority[i] == "London"
dcms.Recipient_Org_Region[i] = "London"
elseif dcms.Recipient_Org_Local_Authority[i] == "Mid Glamorgan"
dcms.Recipient_Org_Region[i] = "Wales"
elseif dcms.Recipient_Org_Local_Authority[i] == "South Glamorgan"
dcms.Recipient_Org_Region[i] = "Wales"
elseif dcms.Recipient_Org_Local_Authority[i] == "Surrey"
dcms.Recipient_Org_Region[i] = "South East"
end
if dcms.Recipient_Org_UK_Constituency[i] == "Ruislip, Northwood and Pinner"
dcms.Recipient_Org_Region[i] = "London"
end
end
end
println()
print("Removing duplicate Constituencies... ")
@time begin
for (i, Const) in enumerate(dcms.Recipient_Org_UK_Constituency)
if Const in ["Ealing Southall"]
dcms.Recipient_Org_UK_Constituency[i] = "Ealing, Southall"
elseif Const in ["Enfield Southgate"]
dcms.Recipient_Org_UK_Constituency[i] = "Enfield, Southgate"
elseif Const in ["Berwick upon Tweed"]
dcms.Recipient_Org_UK_Constituency[i] = "Berwick-upon-Tweed"
elseif Const in ["Liverpool Riverside"]
dcms.Recipient_Org_UK_Constituency[i] = "Liverpool, Riverside"
elseif Const in ["Liverpool Walton"]
dcms.Recipient_Org_UK_Constituency[i] = "Liverpool, Walton"
elseif Const in ["Liverpool Wavertree"]
dcms.Recipient_Org_UK_Constituency[i] = "Liverpool, Wavertree"
elseif Const in ["Manchester Gorton"]
dcms.Recipient_Org_UK_Constituency[i] = "Manchester, Gorton"
elseif Const in ["Manchester Withington"]
dcms.Recipient_Org_UK_Constituency[i] = "Manchester, Withington"
elseif Const in ["Caithness Sutherland and Easter Ross", "Caithness, Sutherland and Ross"]
dcms.Recipient_Org_UK_Constituency[i] = "Caithness, Sutherland and Easter Ross"
elseif Const in ["Dumfriesshire"]
dcms.Recipient_Org_UK_Constituency[i] = "Dumfriesshire, Clydesdale and Tweeddale"
elseif Const in ["Dunfermline"]
dcms.Recipient_Org_UK_Constituency[i] = "Dunfermline and West Fife"
elseif Const in ["Inverness and Nairn"]
dcms.Recipient_Org_UK_Constituency[i] = "Inverness, Nairn, Badenoch and Strathspey"
elseif Const in ["Linlithgow"]
dcms.Recipient_Org_UK_Constituency[i] = "Linlithgow and East Falkirk"
elseif Const in ["Ross Skye and Lochaber", "Skye, Lochaber and Badenoch"]
dcms.Recipient_Org_UK_Constituency[i] = "Ross, Skye and Lochaber"
elseif Const in ["Paisley"]
dcms.Recipient_Org_UK_Constituency[i] = "Paisley and Renfrewshire North"
elseif Const in ["Falkirk West"]
dcms.Recipient_Org_UK_Constituency[i] = "Falkirk"
elseif Const in ["Kilmarnock and Irvine Valley"]
dcms.Recipient_Org_UK_Constituency[i] = "Kilmarnock and Loudoun"
elseif Const in ["Brighton Kemptown"]
dcms.Recipient_Org_UK_Constituency[i] = "Brighton, Kemptown"
elseif Const in ["Brighton Pavilion"]
dcms.Recipient_Org_UK_Constituency[i] = "Brighton, Pavilion"
elseif Const in ["Southampton Itchen"]
dcms.Recipient_Org_UK_Constituency[i] = "Southampton, Itchen"
elseif Const in ["Ynys Mon", "Ynys MΓΒ΄n", "Ynys MοΏ½οΏ½n", "Ynys MΓΒ΄n", "Ynys MΓΖΓΒ΄n"]
dcms.Recipient_Org_UK_Constituency[i] = "Ynys MΓ΄n"
elseif Const in ["Birmingham Edgbaston"]
dcms.Recipient_Org_UK_Constituency[i] = "Birmingham, Edgbaston"
elseif Const in ["Birmingham Hall Green"]
dcms.Recipient_Org_UK_Constituency[i] = "Birmingham, Hall Green"
elseif Const in ["Birmingham Ladywood"]
dcms.Recipient_Org_UK_Constituency[i] = "Birmingham, Ladywood"
elseif Const in ["Birmingham Perry Barr"]
dcms.Recipient_Org_UK_Constituency[i] = "Birmingham, Perry Barr"
elseif Const in ["Birmingham Selly Oak"]
dcms.Recipient_Org_UK_Constituency[i] = "Birmingham, Selly Oak"
elseif Const in ["Sheffield Hallam"]
dcms.Recipient_Org_UK_Constituency[i] = "Sheffield, Hallam"
elseif Const in ["Sheffield Heeley"]
dcms.Recipient_Org_UK_Constituency[i] = "Sheffield, Heeley"
elseif Const in ["Lewisham Deptford"]
dcms.Recipient_Org_UK_Constituency[i] = "Lewisham, Deptford"
end
end
end
println()
print("Removing duplicate Wards... ")
@time begin
dcms.Recipient_Org_Ward .= replace.(dcms.Recipient_Org_Ward, "St." => "St", "&" => "and", "-" => " ",)
for (i, Ward) in enumerate(dcms.Recipient_Org_Ward)
if Ward in ["Unknown", "NULL", "FAILED"]
dcms.Recipient_Org_Ward[i] = "Not Derived"
end
end
end
println()
print("Removing \"Not Derived\" where possible ... ")
@time begin
sort!(dcms, :Recipient_Org_Ward)
for i = 2:nrow(dcms)
if dcms.Recipient_Org_Ward[i] == "Not Derived" || dcms.Recipient_Org_Ward[i-1] == "Not Derived"
else
if dcms.Recipient_Org_Local_Authority[i] == "Not Derived" && dcms.Recipient_Org_Ward[i] == dcms.Recipient_Org_Ward[i-1]
dcms.Recipient_Org_Local_Authority[i] = dcms.Recipient_Org_Local_Authority[i-1]
end
end
end
sort!(dcms, :Recipient_Org_Local_Authority)
for i = 2:nrow(dcms)
if dcms.Recipient_Org_Local_Authority[i] == "Not Derived" || dcms.Recipient_Org_Local_Authority[i-1] == "Not Derived"
elseif dcms.Recipient_Org_Region[i] == "Not Derived" && dcms.Recipient_Org_Local_Authority[i] == dcms.Recipient_Org_Local_Authority[i-1]
dcms.Recipient_Org_Region[i] = dcms.Recipient_Org_Region[i-1]
end
end
sort!(dcms, :Recipient_Org_UK_Constituency)
for i in findall(l -> l == "Not Derived", dcms.Recipient_Org_Region)
j = 0
if dcms.Recipient_Org_UK_Constituency[i] !== "Not Derived"
while dcms.Recipient_Org_Region[i] == "Not Derived" && (dcms.Recipient_Org_UK_Constituency[i] == dcms.Recipient_Org_UK_Constituency[i+j] || dcms.Recipient_Org_UK_Constituency[i] == dcms.Recipient_Org_UK_Constituency[i-j])
j += 1
if dcms.Recipient_Org_Region[i+j] !== "Not Derived" && dcms.Recipient_Org_UK_Constituency[i+j] == dcms.Recipient_Org_UK_Constituency[i]
dcms.Recipient_Org_Region[i] = dcms.Recipient_Org_Region[i+j]
elseif dcms.Recipient_Org_Region[i-j] !== "Not Derived" && dcms.Recipient_Org_UK_Constituency[i-j] == dcms.Recipient_Org_UK_Constituency[i]
dcms.Recipient_Org_Region[i] = dcms.Recipient_Org_Region[i-j]
end
end
end
end
end
end
Having made this update, the ability to roundtrip the dataframe now depends on itβs source. The data from the directly downloaded file will roundtrip, but the file from HTTP get wonβt. More strangely, merely sorting the HTTP dataframe somehow allows it to make the round trip.
Reading fixeddirect.csv succeeded!
Reading sortedfixedHTTPget.csv succeeded!
β Warning: thread = 1 warning: only found 15 / 16 columns around data row: 210003. Filling remaining columns with `missing`
β @ CSV C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\file.jl:586
β Warning: thread = 1 warning: only found 15 / 16 columns around data row: 210003. Filling remaining columns with `missing`
β @ CSV C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\file.jl:586
β Warning: thread = 1 warning: only found 15 / 16 columns around data row: 210003. Filling remaining columns with `missing`
β @ CSV C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\file.jl:586
β Warning: thread = 1 warning: only found 15 / 16 columns around data row: 210003. Filling remaining columns with `missing`
β @ CSV C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\file.jl:586
ERROR: LoadError: TaskFailedException
nested task error: CSV.Error("thread = 2 fatal error, encountered an invalidly quoted field while parsing around row = 175539, col = 3: \"\"I will undertake a research trip hosted by Michele Bryd-McPhee curator of βLadies of Hip-Hop Festivalβ in New York City in March and July 2018 with 3 fundamental areas of enquiry; \n\", error=INVALID: OK | QUOTED | EOF | INVALID_QUOTED_FIELD , check your `quotechar` arguments or manually fix the field in the file itself")
Stacktrace:
[1] fatalerror(buf::Vector{UInt8}, pos::Int64, len::Int64, code::Int16, row::Int64, col::Int64)
@ CSV C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\file.jl:590
[2] parsevalue!(::Type{String}, buf::Vector{UInt8}, pos::Int64, len::Int64, row::Int64, rowoffset::Int64, i::Int64, col::CSV.Column, ctx::CSV.Context)
@ CSV C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\file.jl:798
[3] parserow
@ C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\file.jl:640 [inlined]
[4] parsefilechunk!(ctx::CSV.Context, pos::Int64, len::Int64, rowsguess::Int64, rowoffset::Int64, columns::Vector{CSV.Column}, ::Type{Tuple{}})
@ CSV C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\file.jl:550
[5] multithreadparse(ctx::CSV.Context, pertaskcolumns::Vector{Vector{CSV.Column}}, rowchunkguess::Int64, i::Int64, rows::Vector{Int64}, wholecolumnslock::ReentrantLock)
@ CSV C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\file.jl:360
[6] (::CSV.var"#34#39"{CSV.Context, Vector{Vector{CSV.Column}}, Int64, Int64, Vector{Int64}, ReentrantLock})()
@ CSV C:\Users\TGebbels\.julia\packages\WorkerUtilities\ey0fP\src\WorkerUtilities.jl:384
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base .\task.jl:448
[2] macro expansion
@ .\task.jl:480 [inlined]
[3] CSV.File(ctx::CSV.Context, chunking::Bool)
@ CSV C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\file.jl:240
[4] File
@ C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\file.jl:227 [inlined]
[5] #File#32
@ C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\file.jl:223 [inlined]
[6] CSV.File(source::String)
@ CSV C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\file.jl:162
[7] read(source::String, sink::Type; copycols::Bool, kwargs::@Kwargs{})
@ CSV C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\CSV.jl:117
[8] read(source::String, sink::Type)
@ CSV C:\Users\TGebbels\.julia\packages\CSV\cwX2w\src\CSV.jl:113
[9] top-level scope
@ c:\Users\TGebbels...\Documents\DCMS Database\CompareCsv.jl:361
[10] include(fname::String)
@ Base.MainInclude .\client.jl:489
[11] run(debug_session::VSCodeDebugger.DebugAdapter.DebugSession, error_handler::VSCodeDebugger.var"#3#4"{String})
@ VSCodeDebugger.DebugAdapter c:\Users\TGebbels\.vscode\extensions\julialang.language-julia-1.105.2\scripts\packages\DebugAdapter\src\packagedef.jl:126
[12] startdebugger()
@ VSCodeDebugger c:\Users\TGebbels\.vscode\extensions\julialang.language-julia-1.105.2\scripts\packages\VSCodeDebugger\src\VSCodeDebugger.jl:45
[13] top-level scope
@ c:\Users\TGebbels\.vscode\extensions\julialang.language-julia-1.105.2\scripts\debugger\run_debugger.jl:12
[14] include(mod::Module, _path::String)
@ Base .\Base.jl:495
[15] exec_options(opts::Base.JLOptions)
@ Base .\client.jl:318
[16] _start()
@ Base .\client.jl:552
in expression starting at c:\Users\TGebbels\...\Documents\DCMS Database\CompareCsv.jl:361
Furthermore, if I make one of a number of small changes to fixdcms!
the problem goes away. For example, if I comment out the very final conditional element in the function:
# dcms.Recipient_Org_Region[i] = dcms.Recipient_Org_Region[i-j]
Then the resultant dataset will roundtrip just fine.
Or, I can leave that line in, but comment out the elseif
blocks that assign an LA name that includes a comma like this one, for example:
# elseif LA in ["Armagh", "Craigavon", "Armagh, Banbridge And Craigavon"]
# dcms.Recipient_Org_Local_Authority[i] = "Armagh City, Banbridge And Craigavon"
Again, the resultant dataset will round-trip just fine. Either of these works even though the other is left in the code. I donβt need to take them both out. Just removing one is sufficient.
I understand that the underlying issue is almost certainly in the difference between the two files. Iβm not sure how to pin this down, though.
But my question is really about how this difference gets read into two ostensibly identical dataframes, processed and written back out again without a squeak of complaint, but one results in a csv file that CSV.read cannot read back and yet a simple sort fixes the issue.