# Data Cleaning: Split, Combine, Apply?

Greetings Everyone:

I have a DF:

``````Grade = ["A", "B","C","D"]
Price = rand(1:.001:500)
Name = ["John", "Jim", "Paula", "Ruby"]
Date = ["13/06/2008","17/05/2009", "21/04/2008"]

``````

After dropping rows with missing values, I am trying to calculate the mean price of each grade group with `dropmissing()` and `by()`.

``````grPr = by(dropmissing(data), :Grade, :Price => x ->
AvgPrice = round(mean(x), digits=-3))
``````

``````ArgumentError: by function was removed from DataFrames.jl. Use the `combine(groupby(...), ...)` or `combine(f, groupby(...))` instead.
``````

How would you adapt the block above to use `combine(f, groupby(…))`, now that `by()` has been removed from DataFrames.jl?

I applied:

``````grGr = groupby(dropmissing(data), [:Grade])
``````

Then:

``````aggPr = combine(grGr, [:Price] .=> mean)
``````

How would you combine these blocks? Or is it good practice to keep them separate?

Thanks,

Your example doesn’t work, possibly because `rand(1:.001:500)` doesn’t do what you think it does.

However, something like this works:

``````using DataFrames, Statistics

df = DataFrame(A=rand(100), B=rand(["a", "b", "c"], 100))
combine(groupby(df, :B), :A => mean)
``````
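Applied to the original question, the same pattern can fold `dropmissing`, `groupby`, and `combine` into one expression and name the result column. This is only a sketch: the column names come from the question, but the values (and the helper name `avg_price`) are made up, and `digits = 2` is an arbitrary choice.

```julia
using DataFrames, Statistics

# Hypothetical data: column names follow the question, values are invented.
data = DataFrame(
    Grade = ["A", "B", "A", "B", "C"],
    Price = [100.0, 200.0, 300.0, missing, 50.0],
)

# Helper (name is an assumption) that averages a group's prices and rounds the result.
avg_price(x) = round(mean(x), digits = 2)

# Pattern: source column => function => new column name.
grPr = combine(groupby(dropmissing(data), :Grade), :Price => avg_price => :AvgPrice)
```

Keeping `groupby` on its own line is just as valid; it mainly pays off when the same grouped frame is reused for several `combine` calls.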

@Jakob - Thank you!

How would you apply the step of 0.001?

I’m not exactly sure what you wanted to do originally, but what that line was doing is defining the sequence from 1 to 500 with step size 0.001 (check out `collect(1:.001:500)`) and then drawing a single random sample from it.
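If the goal was one random price per row on a 0.001 grid, one option is to pass the number of samples as a second argument to `rand`. A small sketch, where `n = 4` is an assumption chosen to match the four names in the question:

```julia
using Random

Random.seed!(1)             # only to make the sketch repeatable

prices = 1:0.001:500        # a lazy range: 1.0, 1.001, ..., 500.0
one_price = rand(prices)    # a single Float64, which is what the original line produced
n = 4                       # assumed number of rows
Price = rand(prices, n)     # a Vector{Float64} holding n independent draws
```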

That is also not a valid way to create a `DataFrame`. You can either specify the names or use the keyword syntax, like so:

``````DataFrame(;Grade, Price)
``````
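Either way, every column needs the same length, which the vectors in the question do not have (`Grade` has four entries, `Date` has three, and `Price` is a single number). A sketch of both constructors, with the prices and the fourth date invented purely for illustration:

```julia
using DataFrames

Grade = ["A", "B", "C", "D"]
Name  = ["John", "Jim", "Paula", "Ruby"]
Price = [120.5, 80.25, 310.0, 45.75]                               # made-up values
Date  = ["13/06/2008", "17/05/2009", "21/04/2008", "01/01/2010"]   # last entry is invented

# Keyword shorthand (Julia 1.5+): column names are taken from the variable names.
df1 = DataFrame(; Grade, Price, Name, Date)

# Explicit names via pairs.
df2 = DataFrame("Grade" => Grade, "Price" => Price, "Name" => Name, "Date" => Date)
```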

While `by` was deprecated and then removed from DataFrames.jl, it lives on in the macro `@by` in DataFramesMeta.jl.

DataFramesMeta.jl provides a more convenient syntax for helping new users use Split-Apply-Combine concepts, as well as more basic transformations, in DataFrames. I recommend you check it out.

thank you,

``````using DataFramesMeta

grPr = @by(dropmissing(Data), :Grade, :Price=> x ->
AvgPrice = round(mean(x), digits=-3))
``````

Returns

``````LoadError: ArgumentError: Malformed expression in DataFramesMeta.jl macro

``````

Would you make any adjustments to the code block above to accommodate the `@by` expression?

``````@by(dropmissing(Data), :Grade, :AvgPrice = round(mean(:Price), digits = 3))