I have a question about the correct way to store an n-dimensional array in Julia, and how to take the mean along an axis conditional on the values in a different column.
Suppose V is a vector of DataFrames consisting of entries like the following.
V[1] =
 Row │ Name    Time    Data
     │ String  Day     Float64
─────┼─────────────────────────
   1 │ A       1 day       1.0
   2 │ A       2 days      2.0
   3 │ A       3 days      3.0
   4 │ A       4 days      4.0

V[2] =
 Row │ Name    Time    Data
     │ String  Day     Float64
─────┼─────────────────────────
   1 │ B       1 day       5.0
   2 │ B       2 days      6.0
   3 │ B       3 days      7.0
   4 │ B       5 days      8.0

V[3] =
 Row │ Name    Time    Data
     │ String  Day     Float64
─────┼─────────────────────────
   1 │ C       1 day       9.0
   2 │ C       3 days     10.0
   3 │ C       6 days     11.0
   4 │ C       7 days     12.0

Is this a good/efficient way to store this kind of data? It seems a poor choice because the “Name” column seems redundant, but I was unsure about what would be the appropriate alternative.
Is there an efficient way to average the entries in the Data column across the DataFrames based on the value of a separate column? Suppose I wanted the 2-day average of the Data column across the DataFrames in V, i.e. 4 (the mean of 2 and 6, since V[3] has no 2-day entry). How could this be achieved efficiently?
I saw in a separate post that if I just wanted to average the Data columns I could do something like (V[1][:, 3] + V[2][:, 3] + V[3][:, 3]) / 3.
However, that would not work here because the days differ between the DataFrames. I want to average only the values whose Time entries match.
I think I could go through each DataFrame with a nested loop, but my understanding is that this is not great practice and not particularly efficient. It would probably look something like this:
using DataFrames, Dates, Statistics

# 1. collect a list of all the unique days
uniquedays = Day[]
for i in 1:length(V)
    append!(uniquedays, unique(V[i].Time))
end
uniquedays = unique(uniquedays)

# 2. loop through each DataFrame, checking for data on each of the unique days,
#    and store the results in a DataFrame
DF = DataFrame(Time = Day[], Average = Float64[])
for day in uniquedays
    data = Float64[]
    for i in 1:length(V)
        # add the data points for the matching day (if any) to data;
        # a DataFrame with no row for this day contributes nothing
        append!(data, V[i][V[i].Time .== day, :Data])
    end
    push!(DF, (day, mean(data)))
end
I’m not sure this is the best or even a good approach. Is there a more efficient or straightforward way to do this type of conditional averaging across DataFrames?
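
For comparison with the nested loop, here is a minimal sketch of one possible alternative. It assumes the DataFrames in V can be stacked into a single long table, in which case the Name column stops being redundant and becomes the key that identifies each series. The sample values are copied from the tables above, and the names `long`, `averages` and `two_day` are just illustrative. It uses `groupby` and `combine` from DataFrames.jl.

```julia
using DataFrames, Dates, Statistics

# Sample data mirroring V[1], V[2], V[3] from the tables above
V = [
    DataFrame(Name = fill("A", 4), Time = Day.([1, 2, 3, 4]), Data = [1.0, 2.0, 3.0, 4.0]),
    DataFrame(Name = fill("B", 4), Time = Day.([1, 2, 3, 5]), Data = [5.0, 6.0, 7.0, 8.0]),
    DataFrame(Name = fill("C", 4), Time = Day.([1, 3, 6, 7]), Data = [9.0, 10.0, 11.0, 12.0]),
]

# Stack everything into one long table; Name now tells the series apart
long = reduce(vcat, V)

# Group rows by Time and average Data within each group
averages = combine(groupby(long, :Time), :Data => mean => :Average)
sort!(averages, :Time)

# Look up the 2-day average (only A and B have a 2-day entry)
two_day = only(averages[averages.Time .== Day(2), :Average])
```

With the sample data, only A and B contribute to the 2-day group, so this should reproduce the average of 4 mentioned above. Whether it is actually faster than the loop for realistic sizes is something I would still want to benchmark.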