How to calculate the mean of values that share the same value in another column

Juan_Mac_Donagh · November 9, 2022, 11:40am

Hi all, I have an issue that I can’t figure out how to solve:

I have a df that stores two columns that look like this:

df.scores1 = [[1,2,2,3,5,6,1,2,9,2,1,6,4,2]]

df.normalized_len = [[0,0,0.1,0.1,0.2,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1]]

And this goes on for many rows, all of them having the same length.

What I am trying to do is get the mean values of df.scores1 that have the same value of df.normalized_len, so the result should look like this:

df.mean_val_norm = [[1.5,2.5,5,6,1,5.5,2,1,6,4,2]]

Any help is welcome!

Thanks a lot,
Juan

nilshg · November 9, 2022, 11:44am

I don’t understand the question - could you explain how the values in your desired output mean_val_norm are derived?

JorizovdZ · November 9, 2022, 12:07pm

You can also use DataFrames.jl to build a DataFrame and group it by these values:

using DataFrames, Statistics  # Statistics provides `mean`

x = [1,2,2,3,5,6,1,2,9,2,1,6,4,2]
y = [0,0,0.1,0.1,0.2,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1]
df = DataFrame(scores1=x, normalized_len=y)
gb = groupby(df, :normalized_len)       # one group per distinct normalized_len
println(combine(gb, :scores1 => mean))  # mean of scores1 within each group

Output:

11×2 DataFrame
 Row │ normalized_len  scores1_mean 
     │ Float64         Float64      
─────┼──────────────────────────────
   1 │            0.0           1.5
   2 │            0.1           2.5
   3 │            0.2           5.0
   4 │            0.3           6.0
   5 │            0.4           1.0
   6 │            0.5           5.5
   7 │            0.6           2.0
   8 │            0.7           1.0
   9 │            0.8           6.0
  10 │            0.9           4.0
  11 │            1.0           2.0

Juan_Mac_Donagh · November 9, 2022, 12:54pm

Sorry, I just realized my post was kinda vague.

I want to get the mean values of the items in df.scores1 that share the same value in df.normalized_len. So, all the values that have a normalized_len of 0 (the first two) would get summed and divided by two (because there are only two values with that normalized_len), and so on. Is it clearer now?
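
The sum-and-divide step described above can be spelled out as an explicit loop. This is only an illustrative sketch using the example vectors from the first post; the function name `mean_by_group` is made up here, and it assumes both vectors have the same length.

```julia
# Hypothetical helper (not from the thread): mean of `scores` for each
# distinct value in `groups`, in order of first appearance.
function mean_by_group(scores, groups)
    length(scores) == length(groups) || throw(ArgumentError("lengths differ"))
    order = unique(groups)                  # distinct group values, first-seen order
    sums = Dict(g => 0.0 for g in order)    # running sum per group
    counts = Dict(g => 0 for g in order)    # number of items per group
    for (s, g) in zip(scores, groups)
        sums[g] += s
        counts[g] += 1
    end
    return [sums[g] / counts[g] for g in order]
end

scores1 = [1,2,2,3,5,6,1,2,9,2,1,6,4,2]
normalized_len = [0,0,0.1,0.1,0.2,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1]

result = mean_by_group(scores1, normalized_len)
```

For the group with normalized_len 0, this computes (1 + 2) / 2 = 1.5, which is the first entry of the desired mean_val_norm.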

Juan_Mac_Donagh · November 14, 2022, 11:16am

Hi, someone posted the correct answer but then deleted it! Just in case anyone has the same issue, this was it:

m = [mean(x[findall(==(u), y)]) for u in unique(y)]
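
Since the real columns hold one vector per row, the one-liner above can be wrapped in a small function and applied row by row. A minimal sketch, assuming `x` and `y` are the score and normalized-length vectors of a single row; `group_means`, `scores_rows` and `len_rows` are illustrative names with made-up data, not from the thread:

```julia
using Statistics  # provides `mean`

# For each distinct value u in y (first-seen order), average the x entries
# at the positions where y equals u.
group_means(x, y) = [mean(x[findall(==(u), y)]) for u in unique(y)]

# Two made-up rows to show the row-wise application.
scores_rows = [[1, 2, 2, 3], [4, 6, 8, 10]]
len_rows    = [[0.0, 0.0, 0.5, 1.0], [0.0, 0.5, 0.5, 1.0]]

means_rows = [group_means(x, y) for (x, y) in zip(scores_rows, len_rows)]
```

Note that `==` compares the floating-point values exactly, so lengths that differ only by rounding error would end up in separate groups. Rounding them first, for example with `round.(y, digits=2)`, is one way around that.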

In the end I used @rtransform like this and it worked great:

df_t = @rtransform df_t1 :mean_pos_rep = begin
    [mean(:sum_total[findall(==(u), :norm_length)]) for u in unique(:norm_length)]
end;

Cheers,
Juan

rocco_sprmnt21 · November 14, 2022, 11:59am

But isn’t the correct answer the one proposed by @JorizovdZ?

This isn’t bad either, but maybe it’s not the first that comes to mind.

Juan_Mac_Donagh · November 14, 2022, 12:02pm

It works too, but the result I intended was the one from the post that got deleted. Still, since that answer is still posted, I’ll mark it as the correct solution.
