I am trying to plot over 200 emergence curves from a dataset structured as shown below.
 Row │ Farm    Variety  Container  ED       Wasp
     │ String  String   String     Float64  Int16
─────┼────────────────────────────────────────────
   1 │ Talsma  Aurora   A1-1       15.375       1
   2 │ Talsma  Aurora   A1-1       15.5417      1
   3 │ Talsma  Aurora   A1-1       15.5417      1
   4 │ Talsma  Aurora   A1-1       16.5417      2
  ⋮  │   ⋮       ⋮         ⋮          ⋮        ⋮
 275 │ Talsma  Aurora   A9-5       16.5417      1
 276 │ Talsma  Aurora   A9-5       17.5417      1
 277 │ Talsma  Aurora   A9-5       17.5417      2
 278 │ Talsma  Aurora   A9-5       18.5417      1
ED is the number of elapsed days since they were set out.
What I need to do is calculate, for each Container, the percentage of its total wasps (small, non-stinging) that have emerged by each time point.
I have imported the data into a DataFrame and have used
varDaySum = combine(varmean, :Wasp => sum)
Which results in a DataFrame that looks like this
48×2 DataFrame
 Row │ Container  Wasp_sum
     │ String     Int64
─────┼─────────────────────
   1 │ A1-1             55
   2 │ A1-2             21
   3 │ A1-3              8
   4 │ A1-4              4
  ⋮  │     ⋮         ⋮
  45 │ A9-2             11
  46 │ A9-3              1
  47 │ A9-4             11
  48 │ A9-5              5
Where varmean is a grouped DataFrame created from
varmean = groupby(dataset, :Container)
What I cannot figure out is how to calculate the cumulative percentages at each time point.
For example, for Container A1-1:
 Row │ ED       Wasp   Container
     │ Float64  Int16  String
─────┼───────────────────────────
   1 │ 15.375       1  A1-1
   2 │ 15.5417      1  A1-1
   3 │ 15.5417      1  A1-1
   4 │ 16.5417      2  A1-1
   5 │ 16.5417     38  A1-1
   6 │ 16.875       3  A1-1
   7 │ 16.875       3  A1-1
   8 │ 17.5417      2  A1-1
   9 │ 17.7083      1  A1-1
  10 │ 18.375       1  A1-1
  11 │ 19.5417      1  A1-1
  12 │ 21.875       1  A1-1
The series would be as follows:
Time point   Cumulative count   Percent
15.375       1                  1/55 = 0.0181818
15.5417      3                  3/55 = 0.0545454
16.5417      43                 43/55 = 0.781818
etc.
There must be an elegant coding solution, rather than a long series of for loops building a DataFrame one column at a time.
Mike Sergeant
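
One possible way to get the cumulative series without explicit loops is sketched below. It assumes DataFrames.jl and the column names from the question (Container, ED, Wasp). The output column names (Count, CumCount, CumFrac) and the second toy container "X-1" are made up for illustration. The idea is to sum the counts per Container and time point, sort by elapsed days, and then take a running total within each Container. Treat it as a sketch rather than a definitive solution.

```julia
using DataFrames

# Toy data: the Container A1-1 rows from the question,
# plus a hypothetical second container ("X-1") to show the grouping.
dataset = DataFrame(
    Container = vcat(fill("A1-1", 12), fill("X-1", 3)),
    ED = [15.375, 15.5417, 15.5417, 16.5417, 16.5417, 16.875, 16.875,
          17.5417, 17.7083, 18.375, 19.5417, 21.875, 10.0, 10.0, 11.0],
    Wasp = [1, 1, 1, 2, 38, 3, 3, 2, 1, 1, 1, 1, 2, 1, 1],
)

# Step 1: one row per (Container, ED) holding the wasps counted at that time point.
perday = combine(groupby(dataset, [:Container, :ED]), :Wasp => sum => :Count)

# Step 2: sort so the running total follows elapsed days within each container.
sort!(perday, [:Container, :ED])

# Step 3: running total and fraction of the container total, computed per container.
curves = transform(groupby(perday, :Container),
    :Count => cumsum => :CumCount,
    :Count => (c -> cumsum(c) ./ sum(c)) => :CumFrac,
)
```

For Container A1-1 the CumCount column starts 1, 3, 43 and ends at the container total of 55, matching the hand-worked series above. Multiplying CumFrac by 100 gives percentages, and each Container's (ED, CumFrac) pairs form one emergence curve.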