Is it possible to use Complex numbers and custom types in StatsModels as continuous variables?
So, why I made this thread?
With little abuse of StatsModels.jl I made this categorical term and got row columns in model matrix:
import StatsModels: ContrastsMatrix, AbstractContrasts, modelcols
mutable struct RawCoding <: AbstractContrasts
end
function StatsModels.ContrastsMatrix(contrasts::RawCoding, levels::AbstractVector{T}) where T
ContrastsMatrix(ones(1,1),
[""],
levels,
contrasts)
end
function StatsModels.modelcols(t::CategoricalTerm{RawCoding, T, N}, d::NamedTuple) where T where N
d[t.sym]
end
Why It was helpful for me?
I’m working on Metida.jl, and I thought, that I can try to make spatial structured covariance matrix with data where observation have complicated properties (for simple example: coordinates; but it can be any user data structures). So for that I only need to make model matrix with StatsModels and method to get distance (or any metric) between two observations.
DataFrame:
using DataFrames
vy = [1.0,1,2,3,2,3,2,3,4,5,6,6,3,2]
vf = ["a","a","a","a","a","a","a","b","b","b","b","b","b","b"]
vc = [CartesianIndex(1, 1),
CartesianIndex(4, 5),
CartesianIndex(5, 10),
CartesianIndex(2, 7),
CartesianIndex(7, 7),
CartesianIndex(10, 7),
CartesianIndex(6, 15),
CartesianIndex(8, 3),
CartesianIndex(4, 8),
CartesianIndex(2, 8),
CartesianIndex(3, 6),
CartesianIndex(9, 12),
CartesianIndex(8, 11),
CartesianIndex(1, 10),]
vcx = map(x-> x[2], vc)
vcy = map(x-> x[1], vc)
df = DataFrame(vy = vy, vf = vf, vc = vc, vcx = vcx, vcy = vcy)
where vy - dependent variable, vf - categorical regressor, vc - vector of coordinates (vcx , vcy - same but as simple vectors).
At first simple example of spatial exponential model:
using Metida
lmm = Metida.LMM(@formula(vy ~ vf), df;
repeated = Metida.VarEffect(Metida.@covstr(vcx+vcy|1), Metida.SPEXP),
)
Metida.fit!(lmm)
result:
Linear Mixed Model: vy ~ vf
Random 1:
No
Repeated:
Model: :(vcx + vcy)|1
Type: SPEXP (2)
Blocks: 1, Maximum block size: 14
Status: converged (No Errors)
-2 logREML: 43.3528 BIC: 48.3226
Fixed-effects parameters:
────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
────────────────────────────────────────────────
(Intercept) 1.98684 0.473562 4.20 <1e-04
vf: b 2.14717 0.666787 3.22 0.0013
────────────────────────────────────────────────
Variance components:
θ vector: [1.25307, 0.331533]
Residual σ² var 1.57018
Residual θ theta 0.331533
and if we make method for Metida to calculate distance we can use column with coordinates:
function Metida.edistance(mx::AbstractMatrix{<:CartesianIndex}, i::Int, j::Int)
return sqrt((mx[i, 1][1] - mx[j, 1][1])^2 + (mx[i, 1][2] - mx[j, 1][2])^2)
end
lmm = Metida.LMM(@formula(vy ~ vf), df;
repeated = Metida.VarEffect(Metida.@covstr(vc|1), Metida.SPEXP; coding = Dict(:vc => Metida.RawCoding())),
)
Metida.fit!(lmm)
result:
Linear Mixed Model: vy ~ vf
Random 1:
No
Repeated:
Model: vc|1
Type: SPEXP (2)
Blocks: 1, Maximum block size: 14
Status: converged (No Errors)
-2 logREML: 43.3528 BIC: 48.3226
Fixed-effects parameters:
────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
────────────────────────────────────────────────
(Intercept) 1.98684 0.473562 4.20 <1e-04
vf: b 2.14717 0.666787 3.22 0.0013
────────────────────────────────────────────────
Variance components:
θ vector: [1.25307, 0.331533]
Residual σ² var 1.57018
Residual θ theta 0.331533
as expected results is the same…
But if we want some complicated distance? For example shortest way? I used using AStarSearch.jl to use maze:
using AStarSearch, Plots
const UP = CartesianIndex(-1, 0)
const DOWN = CartesianIndex(1, 0)
const LEFT = CartesianIndex(0, -1)
const RIGHT = CartesianIndex(0, 1)
const DIRECTIONS = [UP, DOWN, LEFT, RIGHT]
manhattan(a::CartesianIndex, b::CartesianIndex) = sum(abs.((b - a).I))
function mazeneighbours(maze, p)
res = CartesianIndex[]
for d in DIRECTIONS
n = p + d
if 1 ≤ n[1] ≤ size(maze)[1] && 1 ≤ n[2] ≤ size(maze)[2] && !maze[n]
push!(res, n)
end
end
return res
end
function solvemaze(maze, start, goal)
currentmazeneighbours(state) = mazeneighbours(maze, state)
return astar(currentmazeneighbours, start, goal, heuristic=manhattan)
end
maze = [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0;
0 1 0 0 0 1 0 0 0 0 1 0 0 0 0;
0 1 0 0 1 0 1 0 0 0 1 0 0 1 0;
0 0 0 1 1 0 1 0 1 0 1 0 1 0 0;
1 0 1 0 0 0 0 0 1 0 0 1 0 0 0;
0 0 0 0 0 1 1 1 1 0 0 0 1 0 0;
1 1 1 0 0 1 0 0 0 0 0 0 0 1 0;
0 0 0 0 1 0 0 0 1 0 0 1 1 1 0;
0 0 1 1 0 0 1 1 0 1 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0] .== 1
And covariance based on A* path now:
Metida.edistance(mx::AbstractMatrix{<:CartesianIndex}, i::Int, j::Int) = begin
res = solvemaze(maze, mx[i, 1], mx[j, 1])
res.cost
end
lmm = Metida.LMM(@formula(vy ~ vf), df;
repeated = Metida.VarEffect(Metida.@covstr(vc|1), Metida.SPEXP; coding = Dict(:vc => Metida.RawCoding())),
)
Metida.fit!(lmm)
result:
Linear Mixed Model: vy ~ vf
Random 1:
No
Repeated:
Model: vc|1
Type: SPEXP (2)
Blocks: 1, Maximum block size: 14
Status: converged (No Errors)
-2 logREML: 43.3107 BIC: 48.2805
Fixed-effects parameters:
────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
────────────────────────────────────────────────
(Intercept) 1.98521 0.472616 4.20 <1e-04
vf: b 2.14289 0.663757 3.23 0.0012
────────────────────────────────────────────────
Variance components:
θ vector: [1.25128, 0.447061]
Residual σ² var 1.5657
Residual θ theta 0.447061
So we have slightly different results as expected. (This minimal working example can be executed with “dev” brunch of Metida).
Inspirational materials:
Davis BJK, Curriero FC. Development and Evaluation of Geostatistical Methods for Non-Euclidean-Based Spatial Covariance Matrices. Math Geosci. 2019 Aug;51(6):767-791. doi: 10.1007/s11004-019-09791-y. Epub 2019 Mar 14. PMID: 31827631; PMCID: PMC6905632.
https://www.researchgate.net/publication/346030087_Estimation_of_Shortest_Path_Covariance_Matrices