It occurred to me that it would be nice to be able to create a table from a string by providing a regex that uses named groups.
nushell’s `parse` function (docs, example) does this.
A `RegexMatch` has all the right things for a Tables.jl row, except that it doesn’t overload `getproperty`. Historically its fields were part of its public API, and although that has become progressively less true, changing it now would be too breaking.
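A quick check in plain Base Julia makes this concrete. The pattern and input string are made up for illustration. A `RegexMatch` can already be indexed by position or by group name, and `keys` lists the group names. Property-style access such as `m.width` is not available, because `width` is looked up as a field of the `RegexMatch` struct itself.

```julia
# Illustrative pattern and input only.
m = match(r"(?<width>\d+)x(?<height>\d+)", "3760x2372")

m[1]                    # first capture, looked up by position
m[:height]              # capture looked up by group name
keys(m)                 # capture-group names, in pattern order
hasproperty(m, :width)  # false: captures are not exposed as properties
```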
But we can wrap a `RegexMatch` in a suitable type. (Might as well use `Tables.AbstractRow`, though we could go down the `getproperty` route instead if we would rather.)
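```julia
using Tables

# Wrap a RegexMatch so it can act as a Tables.jl row:
# each capture group becomes a column.
struct RegexMatchRow <: Tables.AbstractRow
    match::RegexMatch
end

# AbstractRow overloads getproperty, so the wrapped match has to be
# reached with getfield rather than `m.match`.
Tables.getcolumn(m::RegexMatchRow, i::Int) = getfield(m, :match)[i]
Tables.getcolumn(m::RegexMatchRow, i::Symbol) = getfield(m, :match)[i]
Tables.columnnames(m::RegexMatchRow) = Symbol.(keys(getfield(m, :match)))
```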
Example (with DataFrames.jl also loaded):
```julia
julia> pattern = r"(?P<id>\w+) +(?P<desktop>-?\d+) +(?P<x>-?\d+) +(?P<y>-?\d+) +(?P<width>\d+) +(?P<height>\d+) +(?P<pc>\w+) +(?P<title>.*)";

julia> DataFrame(RegexMatchRow.(match.(pattern, eachline(`wmctrl -l -G`))))
5×8 DataFrame
 Row │ id          desktop    x          y          width      height     pc         title
     │ SubStrin…   SubStrin…  SubStrin…  SubStrin…  SubStrin…  SubStrin…  SubStrin…  SubStrin…
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 0x02800003  -1         0          56         3760       2372       Aji        @!0,28;BDH
   2 │ 0x02000001  0          324        471        2677       1942       Aji        Slack | * data | Julia
   3 │ 0x04200003  0          20         44         2376       2372       Aji        Latest Domains/Data topics - Jul…
   4 │ 0x04000010  0          176        -36        2832       1604       Aji        julia-master /home/oxinabox
   5 │ 0x05400038  0          1936       192        1844       2298       Aji        new 1 - Notepadqq
```
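Every line of the `wmctrl` output happened to match here. In general, though, `match` returns `nothing` for a line the pattern doesn't match, and `RegexMatchRow(nothing)` then throws. With the wrapper above, one way round that is `RegexMatchRow.(filter(!isnothing, match.(pattern, lines)))`.

Alternatively, here is a minimal Base-only sketch that skips non-matching lines and builds `NamedTuple` rows directly. A `Vector` of `NamedTuple`s is already a valid Tables.jl table. `parse_lines` is a hypothetical name, not an existing function.

```julia
# Hypothetical helper (Base only): one NamedTuple per line that matches
# `pattern`, keyed by the capture-group names. Lines where `match` returns
# `nothing` (headers, blank lines, ...) are skipped rather than erroring.
function parse_lines(pattern::Regex, lines)
    matches = (match(pattern, line) for line in lines)
    return [NamedTuple{Tuple(Symbol.(keys(m)))}(Tuple(m.captures))
            for m in matches if m !== nothing]
end

# Usage with made-up input: the header line does not match and is dropped.
rows = parse_lines(r"(?<name>[a-z]+) (?<n>-?\d+)",
                   eachline(IOBuffer("HEADER LINE\nalpha 1\nbeta -2\n")))
```

The values are still `SubString`s, just as in the `RegexMatchRow` version.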
Something that would be nice to do on top of this would be to infer the column types based on the regex. E.g. if the capture is `(?P<width>\d+)`, we can take a pretty solid guess that this column is `Int`.
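Here is one rough way to try that, as a sketch under simplifying assumptions. It reads each named group's sub-pattern out of the regex source text (via the `Regex` object's `pattern` field) and maps a few hand-picked integer patterns to `Int`. Everything else becomes `String`. Groups whose body contains parentheses are not recognised at all. The names `INT_PATTERNS`, `infer_column_types`, `convert_capture` and `typed_row` are all hypothetical, not part of Tables.jl or Base.

```julia
# Hypothetical list of sub-patterns we treat as "definitely an integer".
const INT_PATTERNS = ("\\d+", "-?\\d+", "[0-9]+", "-?[0-9]+")

# Guess a column type for each named group from the text of its sub-pattern.
# Only groups whose body has no parentheses are found; unknown bodies -> String.
function infer_column_types(pattern::Regex)
    # Matches `(?P<name>body)` or `(?<name>body)` in the regex source text.
    group_re = r"\(\?P?<(\w+)>((?:[^()\\]|\\.)*)\)"
    types = Pair{Symbol,Type}[]
    for g in eachmatch(group_re, pattern.pattern)
        T = g[2] in INT_PATTERNS ? Int : String
        push!(types, Symbol(g[1]) => T)
    end
    return types
end

# Convert one capture to the guessed type; `nothing` (an optional group
# that did not participate in the match) is passed through unchanged.
convert_capture(::Type{Int}, s) = s === nothing ? nothing : parse(Int, s)
convert_capture(::Type{String}, s) = s === nothing ? nothing : String(s)

# Build a NamedTuple row with typed values from a single RegexMatch.
function typed_row(m::RegexMatch, types)
    names = Tuple(first.(types))
    vals = Tuple(convert_capture(T, m[name]) for (name, T) in types)
    return NamedTuple{names}(vals)
end
```

Since a `Vector` of `NamedTuple`s is a Tables.jl table, something like `DataFrame(typed_row.(match.(pattern, lines), Ref(types)))` should then give `Int` columns for `desktop`, `x`, `y`, `width` and `height`, assuming every line matches. A more robust version would inspect the parsed regex rather than its source text.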