I have a trivial program working on a simple gz’ed CSV (available here):
using CSV
df = CSV.File("Perf_Health_MSSQL\$PTSQL_Buffer Manager_Page life expectancy_21-8h.csv.gz",
              header=[:domain, :host, :feature, :oid, :largeversion, :clientid,
                      :from, :to, :aggrlevel, :firstocc, :lastocc, :livesuntil,
                      :ct, :sum, :min, :max, :g_lower, :g_upper, :g_ct, :g_sum],
              delim='|')
This fails with
ArgumentError: The length of provided header (20) doesn’t match the number of columns at row 1 (5).
Manually unpacking the file and reading it works perfectly. Can anyone tell me what the problem is, and maybe how to solve it?
In particular, I don't understand why there would be 5 columns in the first row …
// Edit: Suspecting that the delim does not work with a gz’ed file (but why??), I “analyzed” the first line with
using StatsBase
filter(((k, v),) -> v == 4, countmap(collect("PHARMATECHNIK|STA-WS174|Perf/Health:MSSQL\$PTSQL:Buffer Manager\\Page life expectancy:21-8h|15294|2019.11|STA-WS174|2019-07-29T21:00:00|2019-07-29T21:10:00|2|2019-07-29T21:00:06|2019-07-29T21:09:06|2019-08-03T21:10:00|10|7549.00000|484.00000|1024.00000||||")))
A few characters occur exactly 4 times (any of them would split the line into 5 columns), but none makes much sense as a default delimiter:
'P' => 4
'.' => 4
'f' => 4
'A' => 4
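As a cross-check on the unpacked text, here is a minimal sketch using only Base Julia (the variable names `line` and `fields` are just for illustration). It splits the same first line on '|' and counts the fields, assuming this line is representative of the file:

```julia
# First line of the unpacked file, copied from the snippet above.
line = "PHARMATECHNIK|STA-WS174|Perf/Health:MSSQL\$PTSQL:Buffer Manager\\Page life expectancy:21-8h|15294|2019.11|STA-WS174|2019-07-29T21:00:00|2019-07-29T21:10:00|2|2019-07-29T21:00:06|2019-07-29T21:09:06|2019-08-03T21:10:00|10|7549.00000|484.00000|1024.00000||||"

# split keeps empty fields by default, so the trailing "||||" yields four empty columns.
fields = split(line, '|')

println("pipes:  ", count(c -> c == '|', line))  # 19 delimiters
println("fields: ", length(fields))              # 20 fields, matching the 20 header names
```

So the plain-text line has the expected 20 fields with `delim='|'`, which suggests the count of 5 does not come from this line as text.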