Hi everyone,
I created a package to help me work with properties in an expression without repeatedly using the dot syntax. The gist is that you can do this:
julia> nt = (x=1, y=2)
(x = 1, y = 2)
julia> z = 3
3
julia> @withprops nt x/y + z
3.5
That is, for the statement @withprops src expr, any symbol x that shows up in expr and names a property of src is replaced by getproperty(src, :x); other symbols, like z above, are left alone. This likely already exists somewhere else, but I couldn't find it.
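For anyone curious how such a replacement could work, here is a minimal sketch. It is not Telperion's actual implementation: the names @withprops_sketch and replace_props are made up for illustration, and it assumes the check for whether a symbol is a property happens at runtime via hasproperty. It only handles plain calls like the one above; keyword arguments, assignments and similar forms would need extra care.

```julia
# Hypothetical sketch, NOT the code in Telperion.jl.
# Walk the expression and wrap each symbol in a runtime check:
# use the property if `src` has one by that name, else the ordinary variable.

# Literals and anything else that is not a Symbol or Expr stay unchanged.
replace_props(src, ex) = ex

function replace_props(src, s::Symbol)
    q = QuoteNode(s)
    :(hasproperty($src, $q) ? getproperty($src, $q) : $s)
end

function replace_props(src, ex::Expr)
    if ex.head === :call
        # Leave the function name (e.g. `+`, `/`) alone, rewrite only the arguments.
        Expr(:call, ex.args[1], map(a -> replace_props(src, a), ex.args[2:end])...)
    else
        Expr(ex.head, map(a -> replace_props(src, a), ex.args)...)
    end
end

macro withprops_sketch(src, ex)
    s = gensym(:src)              # evaluate `src` once, under a fresh name
    body = replace_props(s, ex)
    esc(quote
        let $s = $src
            $body
        end
    end)
end

nt = (x=1, y=2)
z = 3
@withprops_sketch nt x/y + z      # 3.5
```

Doing the check at runtime keeps the macro simple, since a macro only sees the expression and not the value of src, at the cost of a hasproperty call per symbol.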
My main use for this was creating my own StatsModels-like syntax where, instead of creating columns via a DSL, each term is valid Julia code (after the getproperty replacement), e.g.
using DataFrames, StatsBase, Telperion
df = DataFrame(y=rand(100), a=1:100, b=randn(100), c=randn(100), d=rand(1:5, 100))
x, y = @xy df log.(y) ~ 1 + a + zscore(b) + abs.(sin.(c)) + dummy(d)
x
OrderedDict{String,Any} with 8 entries:
"1" => [1, 1, 1, 1, 1, 1, 1, 1, 1, 1 … 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
"a" => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
"zscore(b)" => [1.13036, -0.280105, 2.29973, -0.267989, -0.240071, -0.797709, -0.315514, -0.322103, 0.0217353, -1.67589 … 1.45323, -0.363556, -0.650576, -1.543…
"abs.(sin.(c))" => [0.753822, 0.992965, 0.41306, 0.733578, 0.21487, 0.958583, 0.163681, 0.238074, 0.166078, 0.920199 … 0.407876, 0.277916, 0.0207317, 0.572013, 0.2…
"dummy(d) [2]" => Bool[0, 0, 0, 0, 0, 0, 1, 0, 0, 0 … 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
"dummy(d) [3]" => Bool[0, 0, 0, 0, 0, 1, 0, 1, 0, 1 … 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
"dummy(d) [4]" => Bool[0, 1, 0, 1, 0, 0, 0, 0, 0, 0 … 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
"dummy(d) [5]" => Bool[1, 0, 1, 0, 1, 0, 0, 0, 0, 0 … 1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
The idea is that you can then create the matrix of features via reduce(hcat, values(x)) or similar.
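As a small illustration of that last step, the sketch below uses a stand-in for x: a Vector of name => column pairs instead of the OrderedDict that @xy returns, so that it runs without any packages. The column values are made up.

```julia
# Stand-in for the `x` returned by `@xy` (assumption: a Vector of Pairs
# replaces the OrderedDict so the example has no dependencies).
cols = [
    "1"             => ones(Int, 4),
    "a"             => [1, 2, 3, 4],
    "abs.(sin.(c))" => abs.(sin.([0.1, 0.2, 0.3, 0.4])),
]

# Each column becomes one column of the feature matrix.
# With an OrderedDict this would be `reduce(hcat, values(x))`.
X = reduce(hcat, last.(cols))

size(X)   # (4, 3)
```

Since hcat promotes numeric columns of different element types to a common type, the Int and Bool columns end up stored alongside the floating-point ones in a single matrix.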
Here’s the source: joshday/Telperion.jl (Simple Statistical Formulas) on GitHub
p.s. There’s nothing to the name. It’s just a random Tolkien reference.