How to calculate the mean of values grouped by matching values in another column:

Hi all, I have a problem that I can’t figure out how to solve:

I have a df that stores two columns that look like this:

df.scores1 = [[1,2,2,3,5,6,1,2,9,2,1,6,4,2]]

df.normalized_len = [[0,0,0.1,0.1,0.2,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1]]

And this goes on for many rows, all of them having the same length.

What I am trying to do is get the mean values of df.scores1 that have the same value of df.normalized_len, so the result should look like this:

df.mean_val_norm = [[1.5,2.5,5,6,1,5.5,2,1,6,4,2]]

Any help is welcome!

Thanks a lot,
Juan

I don’t understand the question - could you explain how the values in your desired output mean_val_norm are derived?

You could also build a DataFrame and group it by these values, as described in the DataFrames documentation:

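using DataFrames, Statistics  # Statistics provides mean

x = [1,2,2,3,5,6,1,2,9,2,1,6,4,2]
y = [0,0,0.1,0.1,0.2,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1]
df = DataFrame(scores1=x, normalized_len=y)
gb = groupby(df, :normalized_len)       # one group per distinct normalized_len
println(combine(gb, :scores1 => mean))  # mean of scores1 within each group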

Output:

11×2 DataFrame
 Row │ normalized_len  scores1_mean
     │ Float64         Float64
─────┼──────────────────────────────
   1 │            0.0           1.5
   2 │            0.1           2.5
   3 │            0.2           5.0
   4 │            0.3           6.0
   5 │            0.4           1.0
   6 │            0.5           5.5
   7 │            0.6           2.0
   8 │            0.7           1.0
   9 │            0.8           6.0
  10 │            0.9           4.0
  11 │            1.0           2.0

Sorry, I just realized my post was kinda vague.

I want to get the mean values of the items in df.scores1 that share the same value in df.normalized_len. So, all the values that have a normalized_len of 0 (the first two) get summed and divided by two (because only two values have that normalized_len), and so on. Is it clearer now?
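
To make that arithmetic concrete, here is a minimal sketch for a single group, using the example vectors from the first post (with the `0,8` entry read as `0.8`). The variable names `in_group`, `group` and `group_mean` are just illustrative.

```julia
using Statistics  # provides mean

scores1        = [1, 2, 2, 3, 5, 6, 1, 2, 9, 2, 1, 6, 4, 2]
normalized_len = [0, 0, 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 1]

# Mask selecting the positions whose normalized_len equals 0
in_group = normalized_len .== 0

# The scores at those positions: the first two entries, [1, 2]
group = scores1[in_group]

# Sum divided by count: (1 + 2) / 2
group_mean = mean(group)
```

Repeating this for every distinct value of `normalized_len` gives one mean per group.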

Hi, someone posted the correct answer but then deleted it! Just in case anyone has the same issue, this was it:

m = [mean(x[findall(==(u), y)]) for u in unique(y)]
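
The comprehension above scans `y` once for every unique value, which is fine for short vectors. If the vectors get long, a single-pass variant might be worth trying. The following is only a sketch: `grouped_means_onepass` is a made-up name, not something from the thread or from any package, and it assumes both inputs have the same length.

```julia
using Statistics  # only needed for the comparison at the end

# Hypothetical helper: one pass over the data, keeping a running
# sum and count for each distinct group value.
function grouped_means_onepass(scores, groups)
    length(scores) == length(groups) ||
        throw(DimensionMismatch("scores and groups must have equal length"))
    order = eltype(groups)[]                          # group values in order of first appearance
    acc = Dict{eltype(groups), Tuple{Float64, Int}}() # group value => (sum, count)
    for (s, g) in zip(scores, groups)
        if haskey(acc, g)
            total, n = acc[g]
            acc[g] = (total + s, n + 1)
        else
            push!(order, g)
            acc[g] = (float(s), 1)
        end
    end
    return [acc[g][1] / acc[g][2] for g in order]
end

x = [1, 2, 2, 3, 5, 6, 1, 2, 9, 2, 1, 6, 4, 2]
y = [0, 0, 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 1]

m_onepass = grouped_means_onepass(x, y)
m_comprehension = [mean(x[findall(==(u), y)]) for u in unique(y)]
```

Both versions return the means in order of first appearance of each group value, so for this data the two results should agree.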

In the end I used @rtransform like this and it worked great:

df_t = @rtransform df_t1 :mean_pos_rep = begin
           [mean(:sum_total[findall(==(u), :norm_length)]) for u in unique(:norm_length)]
end;
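
For anyone without DataFramesMeta at hand: as far as I understand, `@rtransform` evaluates the expression once per row, so each row's pair of vectors is reduced on its own. A rough plain-Julia sketch of the same idea, using `map` over two vectors of vectors, could look like this. The helper name `group_means` and the two sample rows are made up for illustration.

```julia
using Statistics  # provides mean

# Hypothetical helper: the same comprehension as above, wrapped in a function
group_means(scores, groups) =
    [mean(scores[findall(==(u), groups)]) for u in unique(groups)]

# Two made-up rows; each "cell" holds a whole vector, as in the question
sum_total   = [[1, 2, 2, 3], [4, 6, 8, 10]]
norm_length = [[0.0, 0.0, 0.5, 1.0], [0.0, 0.5, 0.5, 1.0]]

# Apply the helper row by row, pairing each scores vector with its lengths vector
mean_pos_rep = map(group_means, sum_total, norm_length)
```

Each element of `mean_pos_rep` is the vector of group means for the corresponding row.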

Cheers,
Juan

But isn’t the correct answer the one proposed by @JorizovdZ?

This isn’t bad either, but maybe it’s not the first that comes to mind.

It works too, but the result I intended was the one from the deleted answer. Still, since that answer is still posted, I’ll mark it as the correct solution.