The original code for this is a Pluto notebook, and the code in this post is supposed to be three notebook cells. The code works, and I’m pretty sure there’s a nicer syntax for the cell three, but I just can’t seem to figure out what it is.
using DataFrames, CSV, Chain, Dates, RollingFunctions
longdf = DataFrame(CSV.File(download("https://raw.githubusercontent.com/StatisticalMice/finland-covid-19/main/fact_epirapo_covid19case.csv")))
begin
df = @chain longdf begin
unstack(:Mittari, :val)
select(:Aika => :Date, "Tapausten lukumäärä" => :Cases, "Testausmäärä" => :Tests, "Kuolemantapausten lukumäärä" => :Deaths)
subset(:Date => d -> d.>= Date(2021, 8, 1))
disallowmissing()
end
df.CasesRunMean = runmean(df.Cases, 5)
df
end
2 Likes
With vanilla DataFrames.jl it would be in your chain:
transform!(:Cases => (x -> runmean(x, 5)) => :CasesRunMean)
or
@aside _.CasesRunMean = runmean(_.Cases, 5)
With e.g. DataFramesMeta.jl it would be:
@transform!(:CasesRunMean = runmean(:Cases, 5))
(not tested as your code is not reproducible, so please comment if it does not work
)
and
subset(:Date => d -> d.>= Date(2021, 8, 1))
can be written as
subset(:Date => ByRow(>=(Date(2021, 8, 1)))
or in DataFramesMeta.jl
@rsubset(:Date >= (Date(2021, 8, 1))
1 Like
@StatisticalMouse, the code already looks nice (except possibly for the Finish words we do not understand…).
As a side note, there doesn’t seem to be a big payoff here in using @chain? The following plain code produces the same result:
dg = unstack(longdf,:Mittari, :val)
select!(dg, :Aika => :Date, "Tapausten lukumäärä" => :Cases, "Testausmäärä" => :Tests, "Kuolemantapausten lukumäärä" => :Deaths)
subset!(dg, :Date => d -> d.>= Date(2021, 8, 1))
dg.CasesRunMean = runmean(dg.Cases, 5)
dg == df # true
As a side note: it is more efficient as it uses select!
and subset!
which do less copying
. We can safely do this as unstack
allocates a new data frame so there is no risk of mutating the source.
1 Like
Hmm, thanks, where to start replying… From the least important maybe. data:image/s3,"s3://crabby-images/1294d/1294dfc2a7519552610d1355d94d735a88e23cfb" alt=":smile: :smile:"
The Finnish words are from the gov agency CSV. ‘Määrä’ and ‘lukumäärä’ both mean ‘number of’, and otherwise it should be a direct translation. You’re welcome. data:image/s3,"s3://crabby-images/fc048/fc048089868f411b8e8d8c2eb5d230e57948b215" alt=":smiley: :smiley:"
When I started writing this yesterday evening it was like any other time I’ve wanted to do something with DataFrames, I was unable to decide which combination of packages to use.
I like using @chain
because it uses begin
and end
, and then I don’t need to add them because Pluto requires them. It just feels cleaner. Except of course in this case I needed the extra pair.
I did make an attempt to make it reproducible, but of course I’ve forgotten to post the link to the repo itself. https://github.com/StatisticalMice/finland-covid-19
I did some versions with DataFramesMeta.jl and DataFrameMacros.jl. Both seem to have benefits. In the end I wanted just the base package as I can’t choose. I’ll fix the code tomorrow, it’s late.
I settled for this version as it seems cleanest to me.
df = @chain longdf begin
unstack(:Mittari, :val)
select(:Aika => :Date, "Tapausten lukumäärä" => :Cases, "Testausmäärä" => :Tests, "Kuolemantapausten lukumäärä" => :Deaths)
@aside _.CasesRunMean = runmean(_.Cases, 5)
@aside _.TestsRunMean = runmean(_.Tests, 5)
subset(:Date => ByRow(>=(Date(2021, 8, 1))))
disallowmissing()
end
1 Like