About a year ago, using StatsModels 0.6.5, the following code worked:
function exploder(formula::FormulaTerm, df)
    s = schema(formula, df)
    function iscat(x)
        return isa(x, CategoricalTerm)
    end
    cats = filter(iscat, collect(values(s)))
    ...
After updating to the latest, 0.6.14, it fails:
ERROR: MethodError: no method matching length(::StatsModels.Schema)
My stack trace doesn’t show anything useful (because I’m in the debugger?), but by stepping through I found the problem is in values(s), which leads to
collect(itr) = _collect(1:1 #= Array =#, itr, IteratorEltype(itr), IteratorSize(itr))
IteratorSize seems to lead to the call to length.
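The path from collect to length can be sketched without StatsModels. The Wrapper type below is a hypothetical stand-in (not the real Schema): it holds a Dict but defines no iteration methods, so Base falls back to the HasLength() trait and collect asks for a length that does not exist.

```julia
# Hypothetical stand-in for a type that wraps a Dict without being iterable.
# It is NOT StatsModels.Schema; the field name `inner` is an assumption.
struct Wrapper
    inner::Dict{Symbol,Int}
end

w = Wrapper(Dict(:a => 1, :b => 2))

# With no specialization, Base reports HasLength() for any type,
# so collect requests length(w) before it starts iterating.
size_trait = Base.IteratorSize(w)

# Capture the exception instead of letting it abort the script.
err = try
    collect(w)
catch e
    e
end

println(size_trait)
println(typeof(err))
```

If the type under discussion behaves like this stand-in, the failure is the missing length method rather than anything specific to filter.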
The docs say Schema is a subtype of Dictionary, and so it sounds as if this should work.
I don’t know if this is a bug or if I’m using the type incorrectly. Any ideas?
Thanks.
Here’s a self-contained example:
using DataFrames
using StatsModels
using CategoricalArrays
df = DataFrame(a=categorical([1, 3, 4, 3]), b=["z", "a", "a", "b"], c=[0., 0.5, 1.0, 1.5])
# complex model
f = @formula(a+b ~ a*b+b*c)
s = schema(f, df)
vs = values(s)
println(vs === s)
function iscat(x)
    return isa(x, CategoricalTerm)
end
cats = filter(iscat, collect(values(s)))
foreach(println, values(s))
values(s) simply returns s.
The filter leads to
MethodError: no method matching length(::StatsModels.Schema)
Closest candidates are:
length(!Matched::JSON.Parser.MemoryParserState) at C:\Users\rdboylan\.julia\packages\JSON\3rsiS\src\Parser.jl:28
length(!Matched::CompositeException) at task.jl:41
length(!Matched::LibGit2.GitBlob) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\LibGit2\src\blob.jl:3
...
_similar_for(::UnitRange{Int64}, ::Type{Any}, ::StatsModels.Schema, ::Base.HasLength) at array.jl:597
_collect(::UnitRange{Int64}, ::StatsModels.Schema, ::Base.HasEltype, ::Base.HasLength) at array.jl:630
collect(::StatsModels.Schema) at array.jl:624
top-level scope at eg01.jl:12
include_string(::Function, ::Module, ::String, ::String) at loading.jl:1088
while the foreach yields
MethodError: no method matching iterate(::StatsModels.Schema)
Closest candidates are:
iterate(!Matched::LibGit2.GitRebase) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\LibGit2\src\rebase.jl:48
iterate(!Matched::LibGit2.GitRebase, !Matched::Any) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\LibGit2\src\rebase.jl:48
iterate(!Matched::CommonMark.Node) at C:\Users\rdboylan\.julia\packages\CommonMark\tT85V\src\ast.jl:52
...
foreach(::typeof(println), ::StatsModels.Schema) at abstractarray.jl:2009
top-level scope at eg01.jl:9
include_string(::Function, ::Module, ::String, ::String) at loading.jl:1088
Julia 1.5.1.
And with this addition the previous test code works:
Base.values(schema::StatsModels.Schema) = values(schema.schema)
This mirrors the other iterator/accessor-like definitions that already exist for StatsModels.Schema.
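The same forwarding pattern can be tried on a small stand-in type. This is only a sketch: Wrapper and its field inner are hypothetical names chosen for illustration, and isnumber plays the role that iscat plays in the test code.

```julia
# Hypothetical stand-in for a Dict-wrapping type; not StatsModels.Schema.
struct Wrapper
    inner::Dict{Symbol,Any}
end

# Forward the value accessor to the wrapped Dict, as in the one-line fix.
Base.values(w::Wrapper) = values(w.inner)

w = Wrapper(Dict{Symbol,Any}(:a => 1, :b => "text", :c => 2.5))

# Same shape as the original code: collect the values, then filter by type.
isnumber(x) = isa(x, Number)
nums = filter(isnumber, collect(values(w)))

println(length(nums))
```

Because values now returns the Dict's own value iterator, collect and foreach work without Wrapper having to define iterate or length itself.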
Why the code used to work and no longer does remains unclear. Looking back at the 0.6.5 version of Schema.jl does not turn up a definition of values that has since been removed. Perhaps there was some change in Julia’s handling of collections? The current documentation for Base.values does say that the default implementation returns the original object, which is what was happening before the tweak above.
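The documented fallback is easy to check in isolation. The snippet below is a small sketch using only Base types: a Vector goes through the generic method and comes back unchanged, while a Dict has its own specialization.

```julia
v = [10, 20, 30]
d = Dict(:a => 1, :b => 2)

# Generic fallback: values returns its argument unchanged.
same_object = values(v) === v

# Dict specializes values and returns a separate iterator over the stored values.
dict_values = values(d)
is_same_as_dict = dict_values === d

println(same_object)
println(is_same_as_dict)
```

Any type without its own values method gets the first behavior, which matches values(s) === s in the test code.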