The original code for this is a Pluto notebook, and the code in this post is supposed to be three notebook cells. The code works, and I’m pretty sure there’s a nicer syntax for the cell three, but I just can’t seem to figure out what it is.
using DataFrames, CSV, Chain, Dates, RollingFunctions
longdf = DataFrame(CSV.File(download("https://raw.githubusercontent.com/StatisticalMice/finland-covid-19/main/fact_epirapo_covid19case.csv")))
begin
df = @chain longdf begin
unstack(:Mittari, :val)
select(:Aika => :Date, "Tapausten lukumäärä" => :Cases, "Testausmäärä" => :Tests, "Kuolemantapausten lukumäärä" => :Deaths)
subset(:Date => d -> d.>= Date(2021, 8, 1))
disallowmissing()
end
df.CasesRunMean = runmean(df.Cases, 5)
df
end
2 Likes
With vanilla DataFrames.jl it would be in your chain:
transform!(:Cases => (x -> runmean(x, 5)) => :CasesRunMean)
or
@aside _.CasesRunMean = runmean(_.Cases, 5)
With e.g. DataFramesMeta.jl it would be:
@transform!(:CasesRunMean = runmean(:Cases, 5))
(not tested as your code is not reproducible, so please comment if it does not work )
and
subset(:Date => d -> d.>= Date(2021, 8, 1))
can be written as
subset(:Date => ByRow(>=(Date(2021, 8, 1)))
or in DataFramesMeta.jl
@rsubset(:Date >= (Date(2021, 8, 1))
1 Like
@StatisticalMouse, the code already looks nice (except possibly for the Finish words we do not understand…).
As a side note, there doesn’t seem to be a big payoff here in using @chain? The following plain code produces the same result:
dg = unstack(longdf,:Mittari, :val)
select!(dg, :Aika => :Date, "Tapausten lukumäärä" => :Cases, "Testausmäärä" => :Tests, "Kuolemantapausten lukumäärä" => :Deaths)
subset!(dg, :Date => d -> d.>= Date(2021, 8, 1))
dg.CasesRunMean = runmean(dg.Cases, 5)
dg == df # true
As a side note: it is more efficient as it uses select!
and subset!
which do less copying . We can safely do this as unstack
allocates a new data frame so there is no risk of mutating the source.
1 Like
Hmm, thanks, where to start replying… From the least important maybe.
The Finnish words are from the gov agency CSV. ‘Määrä’ and ‘lukumäärä’ both mean ‘number of’, and otherwise it should be a direct translation. You’re welcome.
When I started writing this yesterday evening it was like any other time I’ve wanted to do something with DataFrames, I was unable to decide which combination of packages to use.
I like using @chain
because it uses begin
and end
, and then I don’t need to add them because Pluto requires them. It just feels cleaner. Except of course in this case I needed the extra pair.
I did make an attempt to make it reproducible, but of course I’ve forgotten to post the link to the repo itself. https://github.com/StatisticalMice/finland-covid-19
I did some versions with DataFramesMeta.jl and DataFrameMacros.jl. Both seem to have benefits. In the end I wanted just the base package as I can’t choose. I’ll fix the code tomorrow, it’s late.
I settled for this version as it seems cleanest to me.
df = @chain longdf begin
unstack(:Mittari, :val)
select(:Aika => :Date, "Tapausten lukumäärä" => :Cases, "Testausmäärä" => :Tests, "Kuolemantapausten lukumäärä" => :Deaths)
@aside _.CasesRunMean = runmean(_.Cases, 5)
@aside _.TestsRunMean = runmean(_.Tests, 5)
subset(:Date => ByRow(>=(Date(2021, 8, 1))))
disallowmissing()
end
1 Like