While managing some bills with a toy framework I’m developing in Julia, I’m finding it useful to embed metadata in the filename. I wanted to share an example here for comments. Does anyone know of a library, in any language, that implements this idiomatic “fields separated by underscores” file naming strategy? I have used it several times but have always written a one-off parser for it.
I wonder if it’s worth making a package. If one existed, what would it be named? MetadataFilenames.jl? FilenameDataFields.jl? I’d appreciate any collaboration! I’m sure other people do this too…
using Dates
testdata = [
    # natural gas bills
    "semco2019-03-25to04-17_thm62p068_usd57p93.pdf", "semco2019-04-17to05-17_thm52p650_usd104p61.pdf",
    "semco2019-05-17to06-18_thm25p344_usd29p48.pdf", "semco2019-06-18to07-18_thm9p477_usd19p05.pdf",
    "semco2019-07-18to08-16_thm6p318_usd17p21.pdf", "semco2019-08-16to09-17_thm6p288_usd17p18.pdf",
    "semco2019-09-17to10-18_thm23p034_usd26p93.pdf", "semco2019-10-18to11-18_thm89p505_usd63p76.pdf",
    "semco2019-11-18to12-18_thm107p814_usd72p91.pdf", "semco2019-12-18to2020-01-17_thm124p726_usd79p72.pdf",
    "semco2020-01-17to02-18_thm141p504_usd87p84.pdf", "semco2020-02-18to03-18_thm104p445_usd66p87.pdf",
    "semco2020-03-18to04-16_thm83p661_usd60p27.pdf", "semco2020-04-16to05-19_thm69p630_usd53p38.pdf",
    "semco2020-05-19to06-18_thm14p686_usd21p32.pdf", "semco2020-06-18to07-20_thm6p312_usd16p42.pdf",
    "semco2020-07-20to08-18_thm17p901_usd23p19.pdf", "semco2020-08-18to09-17_thm16p848_usd22p81.pdf",
    "semco2020-09-17to10-19_thm32p767_usd32p84.pdf",
]
# parse a number whose decimal point is written as "p", e.g. "62p068" -> 62.068
p2f(p::AbstractString, t=Float64) = parse(t, replace(p, "p" => "."))
for x in testdata
    d = Dict{Symbol,Any}(:filename => x)
    name, d[:extension] = splitext(x)
    # terms (separated by underscores) have the following forms:
    # * foobar - flag, stores :foobar => true
    # * nofoobar - flag, stores :foobar => false
    # * foobar2000p0 - numerical, stores :foobar => 2000.0
    # * foobar-2001p0 - numerical, stores :foobar => -2001.0
    # * foobar2000p1to-2001 - numerical range, stores :foobar => [2000.1, -2001.0]
    # * foobar2000-01-01to02 - date range, stores :foobar => [Date(2000, 1, 1), Date(2000, 1, 2)]
    # * foobar2000-01-01to02-01 - date range, stores :foobar => [Date(2000, 1, 1), Date(2000, 2, 1)]
    # * foobar2000-01-01to2001-01-01 - date range, stores :foobar => [Date(2000, 1, 1), Date(2001, 1, 1)]
    for term in split(name, "_")
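        # flag: a bare word stores true; a leading "no" stores false
        # caveat: a key that itself starts with "no" (e.g. "notes") is read as a negated flag
        m = match(r"^(no)?([A-Za-z]+)$", term)
        if !isnothing(m)
            negated, key = m.captures
            d[Symbol(key)] = isnothing(negated)
            continue
        end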
        # date
        m = match(r"^([A-Za-z]+)(\d{4}-\d{2}-\d{2})$", term)
        if !isnothing(m)
            key, val = m.captures
            d[Symbol(key)] = Date(val, "yyyy-mm-dd")
            continue
        end
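        # date range
        m = match(r"^([A-Za-z]+)(\d{4}-\d{2}-\d{2})to((?:(?:\d{4}-)?\d{2}-)?\d{2})$", term)
        if !isnothing(m)
            key, val_start, val_end = m.captures
            # an abbreviated end date borrows its missing leading part from the start date
            full_end = val_start[1:end-length(val_end)] * val_end
            # build a Vector (commas), matching the forms documented above, not a 1x2 Matrix
            d[Symbol(key)] = Date.([val_start, full_end], "yyyy-mm-dd")
            continue
        end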
        # numerical
        m = match(r"^([A-Za-z]+)(-?\d+p\d+)$", term)
        if !isnothing(m)
            key, val = m.captures
            d[Symbol(key)] = p2f(val)
            continue
        end
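        # numerical range (the "p" fraction is optional so that forms like
        # foobar2000p1to-2001, documented above, are accepted)
        m = match(r"^([A-Za-z]+)(-?\d+(?:p\d+)?)to(-?\d+(?:p\d+)?)$", term)
        if !isnothing(m)
            key, val_start, val_end = m.captures
            d[Symbol(key)] = p2f.([val_start, val_end])
            continue
        end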
        # anything unrecognized is collected, in filename order, under :unknown
        push!(get!(d, :unknown, String[]), String(term))
    end
    @show d
end
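
The least obvious step in the loop is the date-range branch, where an abbreviated end date is completed from the start date. Here is a minimal sketch of just that step in isolation. The helper name `expand_range` is hypothetical and not part of the code above; it assumes both strings are plain ASCII in `yyyy-mm-dd` order.

```julia
using Dates

# Hypothetical helper: complete an abbreviated end date by borrowing the
# missing leading characters (year, or year and month) from the start date.
function expand_range(val_start::AbstractString, val_end::AbstractString)
    # everything in val_start that val_end does not already cover
    prefix = val_start[1:end-length(val_end)]
    return Date.([val_start, prefix * val_end], "yyyy-mm-dd")
end

expand_range("2019-03-25", "04-17")       # end date borrows "2019-"
expand_range("2000-01-01", "02")          # end date borrows "2000-01-"
expand_range("2019-12-18", "2020-01-17")  # end date is already complete
```

Because the prefix is copied verbatim, a range that crosses a year boundary has to spell out the full end date, as the `semco2019-12-18to2020-01-17` file in the test data does.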
# note - perhaps add a helper to allow a Parameters.jl struct to be populated
# using fields from the filename?
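
As a rough illustration of the note above, here is one way the parsed `Dict` could populate a struct. This is only a sketch under assumptions: it uses `Base.@kwdef` as a dependency-free stand-in for Parameters.jl's `@with_kw`, and the `GasBill` type and `from_fields` helper are hypothetical names invented for the example.

```julia
using Dates

# Hypothetical record type for one gas bill; field names mirror the filename keys.
Base.@kwdef struct GasBill
    semco::Vector{Date}       # billing period as [start, end]
    thm::Float64              # therms used
    usd::Float64              # amount billed
    extension::String = ""    # optional, defaults to empty
end

# Hypothetical helper: keep only the Dict entries whose key is a field of T
# and pass them to T's keyword constructor.
function from_fields(::Type{T}, d::AbstractDict{Symbol}) where {T}
    matching = filter(kv -> first(kv) in fieldnames(T), d)
    return T(; matching...)
end

# A Dict shaped like the one the loop builds for the first test file.
d = Dict{Symbol,Any}(
    :filename => "semco2019-03-25to04-17_thm62p068_usd57p93.pdf",
    :extension => ".pdf",
    :semco => [Date(2019, 3, 25), Date(2019, 4, 17)],
    :thm => 62.068,
    :usd => 57.93,
)

bill = from_fields(GasBill, d)
```

Keys with no matching field (here `:filename`) are silently dropped, while a filename that lacks a required field makes the keyword constructor throw, which may or may not be the behaviour a package would want.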