Combining dataframe and csv: undefined functions in DataFrames

fshell · May 17, 2021, 4:45pm

I’m running into a number of obstacles in trying to merge a dataframe and CSV (just started working with DataFrames today). I’m benchmarking runs of an inference task. There are a number of parameters for the task and I’d like to aggregate runs in a CSV that I update after each set.

If there were two parameters A and B and for an assignment of these parameters, A=1, B=2 I did 10 runs with 3 succeeding, I would put these results in a DataFrame and write them to a CSV:

using CSV, DataFrame

df = DataFrame(A=1, B=2, Nrun=10, Nsuccess=3)
CSV.write("test.csv", df)

Now if I do another set of runs with A=1, B=2 and get 5 successes in 15 runs, I’d like to update the entry in test.csv to show Nrun=25 and Nsuccess=8. I’d like to do this by scanning through the file rather than loading it all into memory, but I couldn’t find a nice way of doing that. The best approach I could find was to load the previous data, push a new row to it, group by the parameters, and then sum over the runs:

df = CSV.read("test.csv", DataFrame)

newruns = Dict(:A=>1, :B=>2, :Nrun=>15, :Nsuccess=>5)
push!(df, newruns)

gdf = groupby(df, [:A, :B])
fdf = combine(gdf, valuecols(gdf) .=> sum .=>valuecols(gdf))

CSV.write("test.csv", fdf)

If I run this in a Pluto notebook I get UndefVarError: groupby not defined which seems a bit odd because I thought Julia should check the DataFrames namespace. If I change to DataFrames.groupby I get UndefVarError: valuecols not defined and adding DataFrames.valuecols does not change this.

Am I missing something? Is there a cleaner way of doing this? I’d think this should be very simple but have been fighting with it for a few hours. (I also had similar problems when trying to use Underscores to pipe these group-combine operations.)
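
For reference, the load, append, group, and sum steps described above can be wrapped in one small function. This is only a sketch under some assumptions: it needs a DataFrames version that supports the `source .=> function .=> target` form in `combine` (1.0 or later certainly does), the helper name `record_runs` and its `params` keyword are made up for illustration, and it still reads the whole CSV into memory rather than scanning it.

```julia
using CSV, DataFrames

# Hypothetical helper (not part of CSV.jl or DataFrames.jl): merge one batch of
# runs into the CSV at `path`, summing the count columns per parameter setting.
function record_runs(path, newruns; params = [:A, :B])
    new = DataFrame(; newruns...)                 # one-row table from the NamedTuple
    df = isfile(path) ? vcat(CSV.read(path, DataFrame), new) : new
    counts = setdiff(propertynames(df), params)   # every non-parameter column
    totals = combine(groupby(df, params), counts .=> sum .=> counts)
    CSV.write(path, totals)
    return totals
end

# Usage with the numbers from the post, written to a temporary file.
path = joinpath(mktempdir(), "test.csv")
record_runs(path, (A = 1, B = 2, Nrun = 10, Nsuccess = 3))
totals = record_runs(path, (A = 1, B = 2, Nrun = 15, Nsuccess = 5))
```

After the second call, `totals` holds a single row for A=1, B=2 with Nrun=25 and Nsuccess=8, and the file on disk matches it.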

pdeffebach · May 17, 2021, 4:55pm

What version of DataFrames are you using? It seems like valuecols is exported in 1.0 at least.

groupby has a conflict with Lazy.jl, is that package loaded? Maybe it’s causing an issue.

These are strange errors, but at the top of your code you have using DataFrame rather than using DataFrames… maybe it’s something simple like this? It all works for me.

Please try in the REPL to help us debug.
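
A minimal sketch of the checks suggested above, to be pasted into a fresh REPL session. It assumes DataFrames is installed in the active environment. What it prints depends on the installed version, so no output is shown.

```julia
using DataFrames
import Pkg

# Show which DataFrames version the active environment actually resolved to.
Pkg.status("DataFrames")

# Check whether this version exports the two names from the error messages.
println(:groupby in names(DataFrames))
println(:valuecols in names(DataFrames))

# Print the module that owns the `groupby` currently in scope. A name coming
# from another loaded package (for example Lazy) would show up here instead.
println(parentmodule(groupby))
```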

fshell · May 18, 2021, 9:18am

Ah, it’s something with the version. I installed everything on a fresh laptop 2 weeks ago, so I thought it would be current, but something is stopping DataFrames from updating:

(@v1.6) pkg> up
    Updating registry at `~/.julia/registries/General`
  No Changes to `~/.julia/environments/v1.6/Project.toml`
  No Changes to `~/.julia/environments/v1.6/Manifest.toml`

(@v1.6) pkg> status
      Status `~/.julia/environments/v1.6/Project.toml`
  [7f9c7709] BIGUQ v0.8.0
  [336ed68f] CSV v0.8.4
  [8f4d0f93] Conda v1.5.2
  [a93c6f00] DataFrames v0.21.8
  [31c24e10] Distributions v0.23.12
  [c91e804a] Gadfly v1.3.3
  [73787735] GraphicalModelLearning v0.2.1 `~/.julia/dev/GraphicalModelLearning`
  [7073ff75] IJulia v1.23.2
  [c8e1da08] IterTools v1.3.0
  [6f286f6a] MultivariateStats v0.8.0
  [91a5bcdd] Plots v1.15.0
  [c3e4b0f8] Pluto v0.14.5
  [438e738f] PyCall v1.92.3
  [d330b81b] PyPlot v2.9.0
  [2913bbd2] StatsBase v0.32.2
  [f3b207a7] StatsPlots v0.14.21
  [d9a01c3f] Underscores v2.0.0
  [9a3f8284] Random

Any idea of what could be stopping DataFrames from updating?

sijo · May 18, 2021, 9:28am

Try ]add DataFrames@1.1.1, it should tell you what’s holding it back.

fshell · May 18, 2021, 9:53am

For whatever reason I had to rm DataFrames first before adding it back. The package holding it back was BIGUQ, a Bayesian information gap and uncertainty quantification package that must have come in as a dependency of a nonnegative matrix factorization package I had installed. I removed it and DataFrames is now at the current version. Thanks so much!

pdeffebach · May 18, 2021, 1:45pm

Make sure to have all your projects in their own directories with their own Project.toml files.
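
As a rough sketch of what that looks like with the Pkg API: activate a directory as its own environment and add only the packages that project needs, so an unrelated package cannot hold their versions back. The directory here is a temporary one chosen for illustration; in practice it would be the project folder. `Pkg.add` needs network access.

```julia
import Pkg

# Stand-in for a real project folder (hypothetical path, created fresh here).
project_dir = mktempdir()

# Make that directory the active environment instead of the shared @v1.x one.
Pkg.activate(project_dir)

# Add only what this project uses. This writes Project.toml and Manifest.toml
# into project_dir.
Pkg.add(["CSV", "DataFrames"])

# Path of the Project.toml that is now in effect.
println(Base.active_project())
```

The same thing can be done from the Pkg REPL with `activate .` inside the project folder, followed by `add CSV DataFrames`.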
