Data Cleaning: Split, Combine, Apply?

Greetings Everyone:

I have a DF:

Grade = ["A", "B","C","D"]
Price = rand(1:.001:500)
Name = ["John", "Jim", "Paula", "Ruby"]
Date = ["13/06/2008","17/05/2009", "21/04/2008"]

data = DataFrame(Name,Date, Grade,Price)

Excluding missing values, I am attempting to calculate the mean price of each group with dropmissing() and by().

grPr = by(dropmissing(data), :Grade, :Price => x -> 
	AvgPrice = round(mean(x), digits=-3))

This returns the following error:

ArgumentError: by function was removed from DataFrames.jl. Use the `combine(groupby(...), ...)` or `combine(f, groupby(...))` instead.

What adaptation would you use to convert the block above into one that uses combine(f, groupby(…)), since by() has been removed from DataFrames.jl?

I applied:

grGr = groupby(dropmissing(data), [:Grade])

Then:

aggPr = combine(grGr, [:Price] .=> mean)

How would you combine these blocks? Or is it good practice to keep them separate?

Thanks,

Your example doesn’t work, possibly because rand(1:.001:500) doesn’t do what you think it does.

However, something like this works:

using DataFrames, Statistics

df = DataFrame(A=rand(100), B=rand(["a", "b", "c"], 100))
combine(groupby(df, :B), :A => mean)
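To tie this back to the original question, here is a rough sketch of how the old by() call might translate. The data below is made up for illustration (four rows, one missing price), and the column name AvgPrice is taken from the original snippet. Both forms should give the same result; which one to use is mostly a matter of taste.

```julia
using DataFrames, Statistics

# Hypothetical data: four rows, one of them with a missing price
data = DataFrame(
    Name  = ["John", "Jim", "Paula", "Ruby"],
    Grade = ["A", "B", "A", "B"],
    Price = [100.1234, 200.0, missing, 300.5],
)

# Pair form: source column => function => name of the output column
grPr = combine(
    groupby(dropmissing(data), :Grade),
    :Price => (x -> round(mean(x), digits = 3)) => :AvgPrice,
)

# Function-first form, combine(f, groupby(...)), written with a do block.
# The function receives each group as a SubDataFrame and returns a NamedTuple.
grPr2 = combine(groupby(dropmissing(data), :Grade)) do sdf
    (AvgPrice = round(mean(sdf.Price), digits = 3),)
end
```

Nesting groupby inside combine, as here, or keeping the grouped data frame in its own variable is equivalent; a separate variable is handy if you want to reuse the grouping for several summaries.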

@Jakob - Thank you!

How would you apply the step of 0.001?

I’m not exactly sure what you wanted to do originally, but what that line was doing is defining the sequence from 1 to 500 with step size .001 (check out collect(1:.001:500)) and then drawing one random sample from that.
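If the goal was one random price per row, a small sketch of the difference might look like this. The count of 4 is an assumption, chosen to match the four names in the original post.

```julia
# The range from 1 to 500 in steps of 0.001
r = 1:0.001:500

# rand(r) draws a single element of the range, so this is one Float64
one_price = rand(r)

# Passing a count as the second argument draws that many elements,
# giving a Vector{Float64} with one price per row
Price = rand(r, 4)
```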

That is also not a valid way to create a DataFrame. You can either specify the names or use the keyword syntax, like so:

DataFrame(;Grade, Price)
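For completeness, here is a sketch of both options with all four columns. Note that the Date vector in the original post has only three entries while the others have four; padding it with missing is purely my assumption to make the lengths match, and drawing four prices is an assumption as well.

```julia
using DataFrames

Grade = ["A", "B", "C", "D"]
Name  = ["John", "Jim", "Paula", "Ruby"]
# Assumption: padded with `missing` so every column has four entries
Date  = ["13/06/2008", "17/05/2009", "21/04/2008", missing]
Price = rand(1:0.001:500, 4)

# Keyword shorthand (needs Julia 1.5 or later): column names come from the variable names
data = DataFrame(; Name, Date, Grade, Price)

# The same thing with the names spelled out explicitly
data2 = DataFrame(Name = Name, Date = Date, Grade = Grade, Price = Price)
```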

While by was removed from DataFrames.jl, it lives on in the macro @by in DataFramesMeta.jl.

DataFramesMeta.jl provides a more convenient syntax for helping new users use Split-Apply-Combine concepts, as well as more basic transformations, in DataFrames. I recommend you check it out.

thank you,

using DataFramesMeta

grPr = @by(dropmissing(Data), :Grade, :Price=> x -> 
	AvgPrice = round(mean(x), digits=-3))

Returns

LoadError: ArgumentError: Malformed expression in DataFramesMeta.jl macro

Would you make any adjustments to the code block above to accommodate the @by expression?

Yes. Please read the documentation here. You want

@by(dropmissing(Data), :Grade, :AvgPrice = round(mean(:Price), digits = 3))
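A self-contained version of that call, as a sketch on made-up data (the data frame df and its values are hypothetical). As far as I understand, @by expects each transformation in the form :newcol = expression, with columns referenced as symbols, rather than a pair that wraps an anonymous function. Note that mean comes from Statistics, which has to be loaded separately.

```julia
using DataFramesMeta, Statistics   # DataFramesMeta re-exports DataFrames

# Hypothetical data: four rows, one of them with a missing price
df = DataFrame(
    Grade = ["A", "B", "A", "B"],
    Price = [100.1234, 200.0, missing, 300.5],
)

# Drop rows with missing values, group by Grade, and compute the rounded mean per group
grPr = @by(dropmissing(df), :Grade, :AvgPrice = round(mean(:Price), digits = 3))
```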

Thanks for the resource. I will refer to it in the future!

I simply had to restart the kernel, and everything worked well. Thank you!