StatsModels values(Schema) no longer works

About a year ago, using StatsModels 0.6.5, the following code worked:

function exploder(formula::FormulaTerm, df)
    s = schema(formula, df)
    function iscat(x)
        return isa(x, CategoricalTerm)
    end
    cats = filter(iscat, collect(values(s)))
...

After updating to the latest, 0.6.14, it fails:

ERROR: MethodError: no method matching length(::StatsModels.Schema)

My stack trace doesn’t show anything useful (because I’m in the debugger?) but by stepping through I found the problem is in values(s), which leads to

collect(itr) = _collect(1:1 #= Array =#, itr, IteratorEltype(itr), IteratorSize(itr))

IteratorSize seems to lead to the call to length.

The docs say Schema is a subtype of Dictionary, and so it sounds as if this should work.

I don’t know if this is a bug or if I’m using the type incorrectly. Any ideas?

Thanks.

Here’s a self-contained example:

using DataFrames
using StatsModels
using CategoricalArrays
df = DataFrame(a=categorical([1, 3, 4, 3]), b=["z", "a", "a", "b"], c=[0., 0.5, 1.0, 1.5])
# complex model
f = @formula(a+b ~ a*b+b*c)
s = schema(f, df)
vs = values(s)
println(vs === s)
function iscat(x)
    return isa(x, CategoricalTerm)
end
cats = filter(iscat, collect(values(s)))
foreach(println, values(s))

values(s) simply returns s.

The filter leads to

MethodError: no method matching length(::StatsModels.Schema)
Closest candidates are:
  length(!Matched::JSON.Parser.MemoryParserState) at C:\Users\rdboylan\.julia\packages\JSON\3rsiS\src\Parser.jl:28
  length(!Matched::CompositeException) at task.jl:41
  length(!Matched::LibGit2.GitBlob) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\LibGit2\src\blob.jl:3
  ...
_similar_for(::UnitRange{Int64}, ::Type{Any}, ::StatsModels.Schema, ::Base.HasLength) at array.jl:597
_collect(::UnitRange{Int64}, ::StatsModels.Schema, ::Base.HasEltype, ::Base.HasLength) at array.jl:630
collect(::StatsModels.Schema) at array.jl:624
top-level scope at eg01.jl:12
include_string(::Function, ::Module, ::String, ::String) at loading.jl:1088

while the foreach yields

MethodError: no method matching iterate(::StatsModels.Schema)
Closest candidates are:
  iterate(!Matched::LibGit2.GitRebase) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\LibGit2\src\rebase.jl:48
  iterate(!Matched::LibGit2.GitRebase, !Matched::Any) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\LibGit2\src\rebase.jl:48
  iterate(!Matched::CommonMark.Node) at C:\Users\rdboylan\.julia\packages\CommonMark\tT85V\src\ast.jl:52
  ...
foreach(::typeof(println), ::StatsModels.Schema) at abstractarray.jl:2009
top-level scope at eg01.jl:9
include_string(::Function, ::Module, ::String, ::String) at loading.jl:1088

Julia 1.5.1.

With the following addition, the previous test code works:

Base.values(schema::StatsModels.Schema) = values(schema.schema)

This mirrors the other iterator- and accessor-style methods that StatsModels already defines for Schema.
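For anyone who would rather not add a Base method to a type owned by another package, here is a minimal sketch of an alternative. It assumes that `s.schema` is the wrapped Dict, which is the same field the definition above forwards to. That field is an internal detail and may change between StatsModels versions.

```julia
using DataFrames
using StatsModels
using CategoricalArrays

df = DataFrame(a=categorical([1, 3, 4, 3]), b=["z", "a", "a", "b"], c=[0., 0.5, 1.0, 1.5])
f = @formula(a+b ~ a*b+b*c)
s = schema(f, df)

# Assumption: `s.schema` is the Dict wrapped by the Schema (an internal field).
terms = collect(values(s.schema))

# Keep only the categorical terms, as the original exploder function did.
cats = filter(t -> t isa CategoricalTerm, terms)
foreach(println, cats)
```

This leaves Base.values untouched, at the cost of depending on a field name that is not part of the documented interface.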

It remains unclear why the code used to work and no longer does. The 0.6.5 version of Schema.jl contains no definition of values that was later removed. Perhaps Julia’s handling of collections changed? The current documentation for Base.values does say that the default implementation returns the original object, which is what was happening before the tweak above.
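The fallback behaviour can be reproduced without StatsModels. The sketch below uses a hypothetical stand-in type, `WrappedDict`, which is not the StatsModels definition. It only illustrates how a Dict wrapper with no `values`, `length`, or `iterate` methods behaves, and how forwarding `values` changes that.

```julia
# Hypothetical stand-in for a type that wraps a Dict; not the StatsModels definition.
struct WrappedDict
    inner::Dict{Symbol,Int}
end

w = WrappedDict(Dict(:a => 1, :b => 2))

# With no specialised method, the generic fallback in Base returns its argument.
println(values(w) === w)

# collect needs length and iterate, and the wrapper defines neither.
err = try
    collect(values(w))
catch e
    e
end
println(err isa MethodError)

# Forwarding values to the wrapped Dict, as in the one-line fix above, makes collect work.
Base.values(w::WrappedDict) = values(w.inner)
println(sort(collect(values(w))))
```

Both of the first two lines print true, which matches the symptoms in the self-contained example: values(s) === s, followed by a MethodError from collect.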

I opened https://github.com/JuliaStats/StatsModels.jl/issues/193#issue-703212877 about the general question of how Dict-like Schema should be.
