Make model matrix with raw data (Non-Real, Complex numbers and other structures) in StatsModels and possible application with Metida (spatial covariance with shortest path)

PharmCat · January 22, 2024, 11:23pm

Is it possible to use Complex numbers and custom types in StatsModels as continuous variables?

PharmCat · January 23, 2024, 6:43pm

So, why I made this thread?

With little abuse of StatsModels.jl I made this categorical term and got row columns in model matrix:

import StatsModels: ContrastsMatrix, AbstractContrasts, modelcols

mutable struct RawCoding <: AbstractContrasts
end
function StatsModels.ContrastsMatrix(contrasts::RawCoding, levels::AbstractVector{T}) where T
    ContrastsMatrix(ones(1,1),
                             [""],
                             levels,
                             contrasts)
end
function StatsModels.modelcols(t::CategoricalTerm{RawCoding, T, N}, d::NamedTuple) where T where N
    d[t.sym]
end

Why It was helpful for me?

I’m working on Metida.jl, and I thought, that I can try to make spatial structured covariance matrix with data where observation have complicated properties (for simple example: coordinates; but it can be any user data structures). So for that I only need to make model matrix with StatsModels and method to get distance (or any metric) between two observations.

DataFrame:

using DataFrames

vy = [1.0,1,2,3,2,3,2,3,4,5,6,6,3,2]
vf = ["a","a","a","a","a","a","a","b","b","b","b","b","b","b"]
vc = [CartesianIndex(1, 1),
CartesianIndex(4, 5),
CartesianIndex(5, 10),
CartesianIndex(2, 7),
CartesianIndex(7, 7),
CartesianIndex(10, 7),
CartesianIndex(6, 15),
CartesianIndex(8, 3),
CartesianIndex(4, 8),
CartesianIndex(2, 8),
CartesianIndex(3, 6),
CartesianIndex(9, 12),
CartesianIndex(8, 11),
CartesianIndex(1, 10),]
vcx = map(x-> x[2], vc) 
vcy = map(x-> x[1], vc)
df = DataFrame(vy = vy, vf = vf, vc = vc, vcx = vcx, vcy = vcy)

where vy - dependent variable, vf - categorical regressor, vc - vector of coordinates (vcx , vcy - same but as simple vectors).

At first simple example of spatial exponential model:

using Metida
lmm = Metida.LMM(@formula(vy ~ vf), df;
    repeated = Metida.VarEffect(Metida.@covstr(vcx+vcy|1), Metida.SPEXP),
    )
Metida.fit!(lmm)

result:

Linear Mixed Model: vy ~ vf
Random 1:
   No
Repeated:
    Model: :(vcx + vcy)|1
    Type: SPEXP (2)
Blocks: 1, Maximum block size: 14
Status: converged (No Errors)
    -2 logREML: 43.3528    BIC: 48.3226

    Fixed-effects parameters:
────────────────────────────────────────────────
               Coef.  Std. Error     z  Pr(>|z|)
────────────────────────────────────────────────
(Intercept)  1.98684    0.473562  4.20    <1e-04
vf: b        2.14717    0.666787  3.22    0.0013
────────────────────────────────────────────────
    Variance components:
    θ vector: [1.25307, 0.331533]
  Residual   σ²    var     1.57018
  Residual   θ     theta   0.331533

and if we make method for Metida to calculate distance we can use column with coordinates:

function Metida.edistance(mx::AbstractMatrix{<:CartesianIndex}, i::Int, j::Int)
    return sqrt((mx[i, 1][1] - mx[j, 1][1])^2 + (mx[i, 1][2] - mx[j, 1][2])^2)
end
lmm = Metida.LMM(@formula(vy ~ vf), df;
    repeated = Metida.VarEffect(Metida.@covstr(vc|1), Metida.SPEXP; coding = Dict(:vc => Metida.RawCoding())),
    )
Metida.fit!(lmm)

result:

Linear Mixed Model: vy ~ vf
Random 1: 
   No
Repeated:
    Model: vc|1
    Type: SPEXP (2)
Blocks: 1, Maximum block size: 14
Status: converged (No Errors)
    -2 logREML: 43.3528    BIC: 48.3226

    Fixed-effects parameters:
────────────────────────────────────────────────
               Coef.  Std. Error     z  Pr(>|z|)
────────────────────────────────────────────────
(Intercept)  1.98684    0.473562  4.20    <1e-04
vf: b        2.14717    0.666787  3.22    0.0013
────────────────────────────────────────────────
    Variance components:
    θ vector: [1.25307, 0.331533]
  Residual   σ²    var     1.57018
  Residual   θ     theta   0.331533

as expected results is the same…

But if we want some complicated distance? For example shortest way? I used using AStarSearch.jl to use maze:

using AStarSearch, Plots

const UP = CartesianIndex(-1, 0)
const DOWN = CartesianIndex(1, 0)
const LEFT = CartesianIndex(0, -1)
const RIGHT = CartesianIndex(0, 1)
const DIRECTIONS = [UP, DOWN, LEFT, RIGHT]

manhattan(a::CartesianIndex, b::CartesianIndex) = sum(abs.((b - a).I))

function mazeneighbours(maze, p)
  res = CartesianIndex[]
  for d in DIRECTIONS
      n = p + d
      if 1 ≤ n[1] ≤ size(maze)[1] && 1 ≤ n[2] ≤ size(maze)[2] && !maze[n]
          push!(res, n)
      end
  end
  return res
end

function solvemaze(maze, start, goal)
  currentmazeneighbours(state) = mazeneighbours(maze, state)
  return astar(currentmazeneighbours, start, goal, heuristic=manhattan)
end

maze = [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0;
        0 1 0 0 0 1 0 0 0 0 1 0 0 0 0;
        0 1 0 0 1 0 1 0 0 0 1 0 0 1 0;
        0 0 0 1 1 0 1 0 1 0 1 0 1 0 0;
        1 0 1 0 0 0 0 0 1 0 0 1 0 0 0;
        0 0 0 0 0 1 1 1 1 0 0 0 1 0 0;
        1 1 1 0 0 1 0 0 0 0 0 0 0 1 0;
        0 0 0 0 1 0 0 0 1 0 0 1 1 1 0;
        0 0 1 1 0 0 1 1 0 1 0 0 0 0 0;
        0 0 0 0 0 0 0 0 0 0 1 0 0 0 0] .== 1

And covariance based on A* path now:

Metida.edistance(mx::AbstractMatrix{<:CartesianIndex}, i::Int, j::Int) = begin
    res = solvemaze(maze, mx[i, 1], mx[j, 1])
    res.cost
end
lmm = Metida.LMM(@formula(vy ~ vf), df;
    repeated = Metida.VarEffect(Metida.@covstr(vc|1), Metida.SPEXP; coding = Dict(:vc => Metida.RawCoding())),
    )
Metida.fit!(lmm)

result:

Linear Mixed Model: vy ~ vf
Random 1:
   No
Repeated:
    Model: vc|1
    Type: SPEXP (2)
Blocks: 1, Maximum block size: 14
Status: converged (No Errors)
    -2 logREML: 43.3107    BIC: 48.2805

    Fixed-effects parameters:
────────────────────────────────────────────────
               Coef.  Std. Error     z  Pr(>|z|)
────────────────────────────────────────────────
(Intercept)  1.98521    0.472616  4.20    <1e-04
vf: b        2.14289    0.663757  3.23    0.0012
────────────────────────────────────────────────
    Variance components:
    θ vector: [1.25128, 0.447061]
  Residual   σ²    var     1.5657
  Residual   θ     theta   0.447061

So we have slightly different results as expected. (This minimal working example can be executed with “dev” brunch of Metida).

Inspirational materials:

Davis BJK, Curriero FC. Development and Evaluation of Geostatistical Methods for Non-Euclidean-Based Spatial Covariance Matrices. Math Geosci. 2019 Aug;51(6):767-791. doi: 10.1007/s11004-019-09791-y. Epub 2019 Mar 14. PMID: 31827631; PMCID: PMC6905632.

https://www.researchgate.net/publication/346030087_Estimation_of_Shortest_Path_Covariance_Matrices

Topic		Replies	Views
[ANN] Metida.jl: mixed-effects models fitting package Package Announcements package , announcement , statistics	23	2765	May 15, 2025
[ANN] Metida.jl: experimental package for mixed-effects models with flexible covariance structure Package Announcements package , announcement	4	1305	October 21, 2021
Should model matrix for nested factors be full-rank? Statistics	15	587	August 1, 2022
Having trouble with StatsModels.jl `modelmatrix` to replicate R results Statistics	1	63	April 11, 2025
PSA: breaking changes in StatsModels v0.6.0 (Terms 2.0: Son of Terms) Statistics	6	1592	July 2, 2019

Make model matrix with raw data (Non-Real, Complex numbers and other structures) in StatsModels and possible application with Metida (spatial covariance with shortest path)

Related topics