I’m running into a number of obstacles trying to merge a DataFrame and a CSV file (I just started working with DataFrames today). I’m benchmarking runs of an inference task. The task has a number of parameters, and I’d like to aggregate the runs in a CSV file that I update after each set.
If there were two parameters `A` and `B`, and for an assignment of these parameters, `A=1, B=2`, I did 10 runs with 3 succeeding, I would put these results in a DataFrame and write them to a CSV:
```julia
using CSV, DataFrame
df = DataFrame(A=1, B=2, Nrun=10, Nsuccess=3)
CSV.write("test.csv", df)
```
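For reference, assuming the write succeeds (with `CSV.write`'s default comma delimiter and header row), `test.csv` should contain:

```
A,B,Nrun,Nsuccess
1,2,10,3
```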
Now if I do another set of runs with `A=1, B=2` and get 5 successes in 15 runs, I’d like to update the entry in `test.csv` to show `Nsuccess = 8`. I’d like to do this by scanning through the file rather than loading it all into memory, but couldn’t find a nice way of doing this. The best way I could find was to load the previous data, push a new row to it, group by the parameters, and then sum over the runs:
```julia
df = CSV.read("test.csv", DataFrame)
newruns = Dict(:A=>1, :B=>2, :Nrun=>15, :Nsuccess=>5)
push!(df, newruns)
gdf = groupby(df, [:A, :B])
fdf = combine(gdf, valuecols(gdf) .=> sum .=> valuecols(gdf))
CSV.write("test.csv", fdf)
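To be clear about the intended result: since 10 + 15 = 25 runs and 3 + 5 = 8 successes, a successful group-and-sum over these rows should leave `test.csv` as:

```
A,B,Nrun,Nsuccess
1,2,25,8
```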
If I run this in a Pluto notebook I get `UndefVarError: groupby not defined`, which seems a bit odd because I thought Julia should check the DataFrames namespace. If I change it to `DataFrames.groupby` I get `UndefVarError: valuecols not defined`, and writing `DataFrames.valuecols` does not change this.
Am I missing something? Is there a cleaner way of doing this? I’d have thought this should be very simple, but I’ve been fighting with it for a few hours. (I also had similar problems when trying to use Underscores.jl to pipe these group/combine operations.)