No, it is definitely not a problem with Julia’s array implementation.
Can you post an MWE? It’s not clear to me when it works and when it doesn’t. Is the issue when you add groups?
That’s been my problem: I can’t replicate this on a random df created within Julia.
The one exception is the silly example below, and I kind of get why that one fails: I am asking it to pass a window size of 2 to a vector created by grouping on “i”, but since I made i = 1:100, each group only has one row.
bigDf = DataFrame(i = 1:100, Growth = rand(100), Categories = rand(["a", "b", "c", "d", "f", "g"], 100))
TimedDfTest = @linq bigDf |>
    groupby(:i) |>
    transform(Trailing12 = running(prod, :Growth .+ 1, 2) .- 1)
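A quick way to confirm why this toy example fails is to count the rows per group. The sketch below assumes only DataFrames is loaded and reuses the column names from the example; because i runs over 1:100, every group holds a single row, which is shorter than the window of 2.

```julia
using DataFrames

bigDf = DataFrame(i = 1:100, Growth = rand(100))

# Count rows per group; passing `nrow` to `combine` adds a column named :nrow.
counts = combine(groupby(bigDf, :i), nrow)

# Largest group size: a window of 2 cannot fit if this is below 2.
largest_group = maximum(counts.nrow)
```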
In my actual data, the df runs through code very similar to the above and works fine. Then I add more data to the df, simply by including more IDs by which the code groups it. NO CODE CHANGE. And then it starts throwing bad window span errors, even though it’s certainly not being passed vectors smaller than the window size. That has left me very puzzled.
(I know it’s not the new ID added to the df that causes the issue, as I’ve made a df of just that ID and a bunch of others, basically a very small df row-wise, and the identical code cranks through it without bad window span errors.)
So I’m starting to wonder if it’s something odd with Julia arrays themselves.
I end up with the error message below, which I haven’t been able to make sense of:
Stacktrace:
[1] _combine(gd::GroupedDataFrame{DataFrame}, cs_norm::Vector{Any}, optional_transform::Vector{Bool}, copycols::Bool, keeprows::Bool, renamecols::Bool)
@ DataFrames ~\.julia\packages\DataFrames\vQokV\src\groupeddataframe\splitapplycombine.jl:601
Stacktrace:
[1] wait
@ .\task.jl:317 [inlined]
[2] _combine(gd::GroupedDataFrame{DataFrame}, cs_norm::Vector{Any}, optional_transform::Vector{Bool}, copycols::Bool, keeprows::Bool, renamecols::Bool)
@ DataFrames ~\.julia\packages\DataFrames\vQokV\src\groupeddataframe\splitapplycombine.jl:597
nested task error:
Bad window span (12) for length 10.
Stacktrace:
[1] nrolled
@ ~\.julia\packages\RollingFunctions\4Jh9c\src\support.jl:20 [inlined]
[2] running(fun::Function, data::Vector{Float64}, windowspan::Int64)
@ RollingFunctions ~\.julia\packages\RollingFunctions\4Jh9c\src\run\running.jl:8
[3] (::var"#13#15")(261::SubArray{Union{Missing, Float64}, 1, Vector{Union{Missing, Float64}}, Tuple{SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}}, false})
@ Main ~\.julia\packages\DataFramesMeta\mHJrB\src\parsing.jl:200
[4] do_call(f::var"#13#15", idx::Vector{Int64}, starts::Vector{Int64}, ends::Vector{Int64}, gd::GroupedDataFrame{DataFrame}, incols::Tuple{Vector{Union{Missing, Float64}}}, i::Int64)
@ DataFrames ~\.julia\packages\DataFrames\vQokV\src\groupeddataframe\callprocessing.jl:94
[5] _combine_tables_with_first!(first::NamedTuple{(:x1,), Tuple{Vector{Float64}}}, outcols::Tuple{Vector{Float64}}, idx::Vector{Int64}, rowstart::Int64, colstart::Int64, f::Function, gd::GroupedDataFrame{DataFrame}, incols::Tuple{Vector{Union{Missing, Float64}}}, colnames::Tuple{Symbol}, firstmulticol::DataFrames.FirstSingleCol)
@ DataFrames ~\.julia\packages\DataFrames\vQokV\src\groupeddataframe\complextransforms.jl:376
[6] _combine_with_first(::Base.RefValue{Any}, ::Base.RefValue{Any}, gd::GroupedDataFrame{DataFrame}, ::Base.RefValue{Any}, firstmulticol::Bool, idx_agg::Vector{Int64})
@ DataFrames ~\.julia\packages\DataFrames\vQokV\src\groupeddataframe\complextransforms.jl:69
[7] _combine_process_pair_symbol(optional_i::Bool, gd::GroupedDataFrame{DataFrame}, seen_cols::Dict{Symbol, Tuple{Bool, Int64}}, trans_res::Vector{DataFrames.TransformationResult}, idx_agg::Base.RefValue{Vector{Int64}}, out_col_name::Symbol, firstmulticol::Bool, ::Base.RefValue{Any}, wfun::Base.RefValue{Any}, wincols::Base.RefValue{Any})
@ DataFrames ~\.julia\packages\DataFrames\vQokV\src\groupeddataframe\splitapplycombine.jl:357
[8] _combine_process_pair(::Base.RefValue{Any}, optional_i::Bool, parentdf::DataFrame, gd::GroupedDataFrame{DataFrame}, seen_cols::Dict{Symbol, Tuple{Bool, Int64}}, trans_res::Vector{DataFrames.TransformationResult}, idx_agg::Base.RefValue{Vector{Int64}})
@ DataFrames ~\.julia\packages\DataFrames\vQokV\src\groupeddataframe\splitapplycombine.jl:498
[9] macro expansion
@ ~\.julia\packages\DataFrames\vQokV\src\groupeddataframe\splitapplycombine.jl:589 [inlined]
[10] (::DataFrames.var"#614#620"{GroupedDataFrame{DataFrame}, Bool, Bool, DataFrame, Dict{Symbol, Tuple{Bool, Int64}}, Vector{DataFrames.TransformationResult}, Base.RefValue{Vector{Int64}}, Bool, Pair{Int64, Pair{var"#13#15", Symbol}}})()
@ DataFrames .\threadingconstructs.jl:169
The error is a lot scarier than it needs to be because DataFrames does some multi-threading during the combine call. This makes for weird errors but is very unlikely to be the source of your issue.
Can you do
@chain df begin
    groupby(group_var)
    combine(nrow)
    describe
end
and post the results?
Hmm, interesting… it does look like something sneaks in that has a row count of 10… well, at least the arrays aren’t crazy… I am.
df that works:
2×7 DataFrame
 Row │ variable  mean       min     median     max     nmissing  eltype
     │ Symbol    Float64    Signed  Float64    Signed  Int64     Type
─────┼──────────────────────────────────────────────────────────────────────────────────────
   1 │ xxxxxID   6.29382e5  600339  6.33712e5  634550         0  Union{Missing, Int32}
   2 │ nrow      58.9706        50  60.0           60         0  Int64
df that fails:
2×7 DataFrame
 Row │ variable  mean       min     median     max     nmissing  eltype
     │ Symbol    Float64    Signed  Float64    Signed  Int64     Type
─────┼────────────────────────────────────────────────────────────────────────────────────
   1 │ xxxxxID   6.29566e5  600339  633750.0   634628         0  Union{Missing, Int32}
   2 │ nrow      58.539         10  60.0           60         0  Int64
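The min of 10 in the failing summary lines up with the error message: at least one group has fewer rows than the window span of 12. A minimal sketch for listing such groups, using made-up data and a hypothetical `ID` column standing in for the real one:

```julia
using DataFrames

window = 12  # window span reported in the error message

# Made-up data: ID 1 has 60 rows, ID 2 has only 10 (hypothetical values).
df = DataFrame(ID = vcat(fill(1, 60), fill(2, 10)), Growth = rand(70))

# Rows per group, then keep only the groups that are too short for the window.
counts = combine(groupby(df, :ID), nrow)
too_short = filter(:nrow => <(window), counts)
```

Here `too_short` holds one row per offending group, so it shows directly which IDs would trigger the error.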
I guess that leads to the next question: can you use @chain with a count function to drop every group whose row count is below the window span used in the transform, all in the same @linq call?
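Yeah!
@chain df begin
    groupby(:g)
    # :t is the number of rows in each group
    @transform :t = length(:g)
    # keep only groups at least as long as the window span (12 in the error above)
    @subset :t .>= 12
end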