I’m trying to do one quadratic fit per sub-dataframe on a grouped dataframe (using two of its columns, obviously), but I can’t use map() because it’s reserved, and a list comprehension throws an error because I’m working with views. How should I do this?
It would be easier if you posted a semi-runnable snippet of what you’re trying to do.
I thought that would be too difficult for some reason… heh
Working on a MWE right now
using DataFrames
using EasyFit
data = DataFrame(throw = repeat(1:5, inner=10), t = repeat(1:10, 5), x = repeat((1:10).^2, 5))
data_gdf = groupby(data, :throw)
fits = map(data_gdf) do sdf
    time, distance = sdf[!, :t], sdf[!, :x]
    fitquad(time, distance)
end
Output:
ArgumentError: using map over `GroupedDataFrame`s is reserved
DataFrames no longer allows map over GroupedDataFrames. Also, the data in the MWE wasn’t usable (ERROR: Could not obtain any successful fit, probably the data is not well posed).
So, use a comprehension.
using DataFrames, EasyFit, Random
Random.seed!(1) # for reproducibility
a = repeat(1:5, inner=20) # 5 groups, 20 points each
t = repeat(range(0, 10; length=20), 5) # same t grid per group
# true quadratic y = 2t^2 - 3t + 1 plus small noise
x = 2 .* t.^2 .- 3 .* t .+ 1 .+ 0.1 .* randn(length(t))
df = DataFrame(a = a, t = t, x = x)
data_gdf = groupby(df, :a)
fits = [fitquad(sdf.t, sdf.x) for sdf in data_gdf]
summary_fits = [(a = fit.a, b = fit.b, c = fit.c) for fit in fits]
5-element Vector{@NamedTuple{a::Float64, b::Float64, c::Float64}}:
(a = 2.000993698655384, b = -3.016720982558867, c = 1.0089476616480098)
(a = 1.9991770713827897, b = -2.994078426802373, c = 0.9818210102989037)
(a = 1.9994283879614307, b = -2.998494794197106, c = 1.0155535104300808)
(a = 2.0023161744639304, b = -3.029790779480457, c = 1.095258850137451)
(a = 2.002728192866932, b = -3.0281152790428805, c = 1.0897536120125266)
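If EasyFit is not available, the same per-group comprehension can be sanity-checked with a plain least-squares solve. This is only a sketch under assumptions: the data are noise-free so the true coefficients are known, and `quadfit_ls` is a hypothetical helper written for this example, not something EasyFit or DataFrames provides.

```julia
using DataFrames

# Noise-free data so the true coefficients are known: x = 2t^2 - 3t + 1
t = repeat(collect(range(0, 10; length=20)), 5)
df_ls = DataFrame(a = repeat(1:5, inner=20), t = t, x = 2 .* t.^2 .- 3 .* t .+ 1)

# Hypothetical helper (not part of EasyFit): ordinary least squares on [t^2 t 1]
function quadfit_ls(t, x)
    A = [t.^2 t ones(length(t))]   # design matrix, one row per observation
    a, b, c = A \ collect(x)       # least-squares solve; collect copies the view
    return (a = a, b = b, c = c)
end

# Iterating a GroupedDataFrame yields one SubDataFrame per group,
# so the column views can be passed straight to the helper
fits_ls = [quadfit_ls(sdf.t, sdf.x) for sdf in groupby(df_ls, :a)]
```

Each element of `fits_ls` should recover coefficients close to 2, -3 and 1, which shows that working with views is not a problem in itself.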
I think map supports pairs, so things like this should work:
julia> fits = map(pairs(data_gdf)) do (k, sdf)
           fitquad(sdf.t, sdf.x)
       end;
julia> summary_fits = [(a = fit.a, b = fit.b, c = fit.c) for fit in fits]
I’d probs just go for it in one shot with either base DataFrames.jl:
julia> summary_fits = combine(groupby(df, :a)) do sdf
           fits = fitquad(sdf.t, sdf.x)
           (fits_a = fits.a, fits_b = fits.b, fits_c = fits.c)
       end
5×4 DataFrame
 Row │ a      fits_a   fits_b    fits_c
     │ Int64  Float64  Float64   Float64
─────┼────────────────────────────────────
   1 │     1  2.00099  -3.01672  1.00895
   2 │     2  1.99918  -2.99408  0.981821
   3 │     3  1.99943  -2.99849  1.01555
   4 │     4  2.00232  -3.02979  1.09526
   5 │     5  2.00273  -3.02812  1.08975
or DataFramesMeta.jl though:
julia> summary_fits = @by df :a @astable begin
           fits = fitquad(:t, :x)
           :fits_a = fits.a
           :fits_b = fits.b
           :fits_c = fits.c
       end
5×4 DataFrame
 Row │ a      fits_a   fits_b    fits_c
     │ Int64  Float64  Float64   Float64
─────┼────────────────────────────────────
   1 │     1  2.00099  -3.01672  1.00895
   2 │     2  1.99918  -2.99408  0.981821
   3 │     3  1.99943  -2.99849  1.01555
   4 │     4  2.00232  -3.02979  1.09526
   5 │     5  2.00273  -3.02812  1.08975
I would help, but I get an error with EasyFit:
julia> fits = [fitquad(sdf.t, sdf.x) for sdf in data_gdf]
5-element Vector{EasyFit.Quadratic{Float64, Float64, Float64, Float64, Float64}}:
Error showing value of type Vector{EasyFit.Quadratic{Float64, Float64, Float64, Float64, Float64}}:
SYSTEM (REPL): showing an error caused an error
ERROR: 1-element ExceptionStack:
UndefVarError: `f` not defined in `EasyFit`
Suggestion: check for spelling errors or missing imports.
MethodError: no method matching fitquadratic(::SubArray{Int64, 1, Vector{Int64}, Tuple{Vector{Int64}}, false}, ::SubArray{Union{Missing, Float64}, 1, Vector{Union{Missing, Float64}}, Tuple{Vector{Int64}}, false})
The function `fitquadratic` exists, but no method is defined for this combination of argument types.
I think this error has to do with the DataFrame display, because if you unnest the column or use ; to suppress the output, the error should go away.
UndefVarError: `f` not defined in `EasyFit`
In addition to the many solutions above, you could use TidierData.jl (the main branch at this time, still unreleased) and @unnest_wider:
julia> @chain df begin
           @group_by(a)
           @summarize(model = fitquad(t, x))
           @unnest_wider(model)
       end
5×9 DataFrame
 Row │ a      model_a  model_b   model_c   model_R2  model_x                           ⋯
     │ Int64  Float64  Float64   Float64   Float64   Array…                            ⋯
─────┼─────────────────────────────────────────────────────────────────────────────────
   1 │     1  2.00099  -3.01672  1.00895   0.999996  [0.0, 0.10101, 0.20202, 0.30303,  ⋯
   2 │     2  1.99918  -2.99408  0.981821  0.999996  [0.0, 0.10101, 0.20202, 0.30303,
   3 │     3  1.99943  -2.99849  1.01555   0.999998  [0.0, 0.10101, 0.20202, 0.30303,
   4 │     4  2.00232  -3.02979  1.09526   0.999997  [0.0, 0.10101, 0.20202, 0.30303,
   5 │     5  2.00273  -3.02812  1.08975   0.999998  [0.0, 0.10101, 0.20202, 0.30303,  ⋯
4 columns omitted
This is because EasyFit apparently does not support arrays that allow missing values, even if they do not contain any, due to type parameter constraints. The problem is not that it’s a view, as you’d said in the top post. You can do dropmissing! to remove any missing values before passing the data to fitquad, or just disallowmissing! if you don’t already have missing values.
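The type-parameter point can be reproduced in Base Julia without any packages. In this sketch, `strict_sum` is a hypothetical stand-in for a function with a strict element-type constraint; it is an assumption for illustration and not EasyFit's actual signature.

```julia
# Hypothetical stand-in for a fitting function that only accepts real-valued vectors
# (an assumption for illustration; not EasyFit's actual method signature)
strict_sum(v::AbstractVector{<:Real}) = sum(v)

plain = [1.0, 2.0, 3.0]
maybe_missing = Union{Missing, Float64}[1.0, 2.0, 3.0]   # holds no missing values, but allows them

# Dispatch depends on the element type, not on the contents
accepts_plain   = applicable(strict_sum, plain)                # Vector{Float64}
accepts_view    = applicable(strict_sum, view(plain, 1:2))     # a view of Float64 data
accepts_missing = applicable(strict_sum, maybe_missing)        # Union{Missing, Float64} elements

# Narrowing the element type makes the same data acceptable again
narrowed = collect(skipmissing(maybe_missing))
total = strict_sum(narrowed)
```

Here the view is accepted and the missing-allowing vector is rejected, which matches the diagnosis: the element type is what blocks dispatch. On a DataFrame, disallowmissing! or dropmissing! does the narrowing for you.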
For the sake of a reproducible example, adding an allowmissing!(df, [:t, :x]) to a setup similar to what @technocrat provided above:
julia> disallowmissing!(df);
julia> combine(groupby(df, :throw), [:t, :x] => function (t, x)
           fit = fitquad(t, x)
           return (; fit.a, fit.b, fit.c)
       end => AsTable)
5×4 DataFrame
 Row │ throw  a        b         c
     │ Int64  Float64  Float64   Float64
─────┼────────────────────────────────────
   1 │     1  2.00099  -3.01672  1.00895
   2 │     2  1.99918  -2.99408  0.981821
   3 │     3  1.99943  -2.99849  1.01555
   4 │     4  2.00232  -3.02979  1.09526
   5 │     5  2.00273  -3.02812  1.08975