I’m running into a number of obstacles trying to merge a DataFrame and a CSV file (I just started working with DataFrames today). I’m benchmarking runs of an inference task. The task has a number of parameters, and I’d like to aggregate the runs in a CSV file that I update after each set.
If there were two parameters `A` and `B`, and for an assignment of these parameters, `A=1, B=2`, I did 10 runs with 3 succeeding, I would put these results in a DataFrame and write them to a CSV:
```julia
using CSV, DataFrame
df = DataFrame(A=1, B=2, Nrun=10, Nsuccess=3)
CSV.write("test.csv", df)
```
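For reference, assuming the write succeeds (with `CSV.write`'s default comma delimiter and header row), `test.csv` should contain:

```
A,B,Nrun,Nsuccess
1,2,10,3
```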
Now if I do another set of runs with `A=1, B=2` and get 5 successes in 15 runs, I’d like to update the entry in `test.csv` to show `Nsuccess = 8`. I’d like to do this by scanning through the file rather than loading it all into memory, but couldn’t find a nice way of doing this. The best way I could find was to load the previous data, push a new row to it, group by the parameters, and then sum over the runs:
```julia
df = CSV.read("test.csv", DataFrame)
newruns = Dict(:A=>1, :B=>2, :Nrun=>15, :Nsuccess=>5)
push!(df, newruns)
gdf = groupby(df, [:A, :B])
fdf = combine(gdf, valuecols(gdf) .=> sum .=> valuecols(gdf))
CSV.write("test.csv", fdf)
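To be clear about the intended result: since 10 + 15 = 25 runs and 3 + 5 = 8 successes, a successful group-and-sum over these rows should leave `test.csv` as:

```
A,B,Nrun,Nsuccess
1,2,25,8
```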
If I run this in a Pluto notebook I get `UndefVarError: groupby not defined`, which seems a bit odd because I thought Julia should check the DataFrames namespace. If I change it to `DataFrames.groupby` I get `UndefVarError: valuecols not defined`, and writing `DataFrames.valuecols` does not change this.
Am I missing something? Is there a cleaner way of doing this? I’d have thought this should be very simple, but I’ve been fighting with it for a few hours. (I also had similar problems when trying to use Underscores.jl to pipe these group/combine operations.)