Data Cleaning: Split, Combine, Apply?

I have a DF:

Grade = ["A", "B","C","D"]
Price = rand(1:.001:500)
Name = ["John", "Jim", "Paula", "Ruby"]
Date = ["13/06/2008","17/05/2009", "21/04/2008"]

data = DataFrame(Name,Date, Grade,Price)

After excluding rows with missing values, I am trying to calculate the mean price of each grade with dropmissing() and by().

grPr = by(dropmissing(data), :Grade, :Price => x -> 
	AvgPrice = round(mean(x), digits=-3))

This returns the following error:

ArgumentError: by function was removed from DataFrames.jl. Use the `combine(groupby(...), ...)` or `combine(f, groupby(...))` instead.

How would you adapt the block above to use combine(f, groupby(…)), now that by() has been removed from DataFrames.jl?

I applied:

grGr = groupby(dropmissing(data), [:Grade])


aggPr = combine(grGr, [:Price] .=> mean)

How would you combine these blocks? Or is it good practice to keep them separate?


Your example doesn’t work, possibly because rand(1:.001:500) doesn’t do what you think it does.

However, something like this works:

using DataFrames, Statistics

df = DataFrame(A=rand(100), B=rand(["a", "b", "c"], 100))
combine(groupby(df, :B), :A => mean)
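To carry over the rounding and the output column name from the original by() call, something along these lines should work. This is only a sketch: the toy data below is made up for illustration, and the column names Grade, Price and AvgPrice are taken from the question.

```julia
using DataFrames, Statistics

# Hypothetical toy data; the values are invented for this example.
data = DataFrame(
    Grade = ["A", "B", "A", "B"],
    Price = [10.1234, 20.0, 30.0, missing],
)

# dropmissing removes the row with the missing Price, groupby splits by Grade,
# and the pair source => function => target applies the function per group
# and names the resulting column. The anonymous function needs parentheses
# because of how => binds.
avg = combine(
    groupby(dropmissing(data), :Grade),
    :Price => (x -> round(mean(x), digits = 3)) => :AvgPrice,
)
```

Nesting groupby inside combine like this is common; keeping the grouped object in its own variable is just as valid and is handy if you want to reuse it for several summaries. Note that digits = -3, as in the original call, rounds to the nearest thousand, so prices below 500 would all come out as 0.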

How would you apply the step of 0.001?

I’m not exactly sure what you wanted to do originally, but what that line was doing is defining the sequence from 1 to 500 with step size .001 (check out collect(1:.001:500)) and then drawing one random sample from that.
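If the goal was one random price per row, still on the 0.001 grid, passing the number of draws as a second argument to rand is probably what you want. A small sketch (the seed and the count of 4 are arbitrary choices for illustration):

```julia
using Random

Random.seed!(1)  # arbitrary seed so the run is repeatable

grid = 1:0.001:500        # range from 1 to 500 in steps of 0.001
one_price = rand(grid)    # a single Float64 drawn from the grid
prices = rand(grid, 4)    # a Vector{Float64} with four draws from the grid
```

Only the second form gives a vector that can serve as a four-row column.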

That is also not a valid way to create a DataFrame. You can either specify the names or use the keyword syntax, like so:

DataFrame(;Grade, Price)
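Here is a sketch of both forms using the vectors from the question. All columns need the same length, so as an assumption the fourth date, which the original does not give, is recorded as missing, and Price is drawn as four values.

```julia
using DataFrames

Grade = ["A", "B", "C", "D"]
Price = rand(1:0.001:500, 4)
Name  = ["John", "Jim", "Paula", "Ruby"]
# Assumption: the fourth date is unknown, so it is stored as missing.
Date  = ["13/06/2008", "17/05/2009", "21/04/2008", missing]

# Keyword shorthand (Julia 1.5 or later): column names come from the variable names.
data = DataFrame(; Name, Date, Grade, Price)

# Equivalent form with the column names written out.
data2 = DataFrame(Name = Name, Date = Date, Grade = Grade, Price = Price)
```

With this data, dropmissing(data) drops the row whose Date is missing and keeps the other three.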

While by was deprecated and then removed from DataFrames.jl, it lives on as the @by macro in DataFramesMeta.jl.

DataFramesMeta.jl provides a more convenient syntax for Split-Apply-Combine operations, as well as for more basic transformations, on DataFrames, which can help new users. I recommend you check it out.

using DataFramesMeta

grPr = @by(dropmissing(data), :Grade, :Price=> x -> 
	AvgPrice = round(mean(x), digits=-3))


LoadError: ArgumentError: Malformed expression in DataFramesMeta.jl macro

Would you make any adjustments to the code block above to accommodate the @by expression?

Yes. Please read the documentation here. You want

@by(dropmissing(data), :Grade, :AvgPrice = round(mean(:Price), digits = 3))
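For a self-contained check of that call, here is a minimal sketch. The toy data is hypothetical, and it assumes a reasonably recent DataFramesMeta.jl that supports the :new_column = expression form inside @by.

```julia
using DataFrames, DataFramesMeta, Statistics

# Hypothetical toy data; the values are invented for this example.
data = DataFrame(
    Grade = ["A", "B", "A", "B"],
    Price = [10.0, 20.0, 31.0, missing],
)

# Inside @by, :Price refers to the Price values of the current group,
# and the left-hand side names the output column.
grPr = @by(dropmissing(data), :Grade, :AvgPrice = round(mean(:Price), digits = 3))
```

The result has one row per grade, with columns Grade and AvgPrice.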
