I have a question about the right way to store an n-dimensional matrix in Julia, and about how to take the mean along an axis conditional on the values in a different column.
Suppose V is a vector of DataFrames consisting of entries like the following.
V[1] =
 Row │ Name    Time    Data
     │ String  Day     Float64
─────┼─────────────────────────
   1 │ A       1 day         1
   2 │ A       2 days        2
   3 │ A       3 days        3
   4 │ A       4 days        4
V[2] =
 Row │ Name    Time    Data
     │ String  Day     Float64
─────┼─────────────────────────
   1 │ B       1 day         5
   2 │ B       2 days        6
   3 │ B       3 days        7
   4 │ B       5 days        8
V[3] =
 Row │ Name    Time    Data
     │ String  Day     Float64
─────┼─────────────────────────
   1 │ C       1 day         9
   2 │ C       3 days       10
   3 │ C       6 days       11
   4 │ C       7 days       12
- Is this a good/efficient way to store this kind of data? It seems like a poor choice because the “Name” column is redundant within each DataFrame, but I am unsure what the appropriate alternative would be.
- Is there an efficient way to average the entries in the Data column across the DataFrames based on the value of a separate column? For example, suppose I wanted the 2-day average of the Data column across the DataFrames in V, i.e. (2 + 6) / 2 = 4, since only V[1] and V[2] have an entry at 2 days. How could this be efficiently achieved?
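For the storage question, one commonly used alternative is a single "long" table in which the Name column distinguishes the rows, so it is no longer redundant. The following is only a sketch under the assumption that the data is exactly as shown in the tables above; the variable names `long` and `wide` are made up for illustration.

```julia
using DataFrames, Dates

# Rebuild the example data (values copied from the tables above).
V = [
    DataFrame(Name = fill("A", 4), Time = Day.([1, 2, 3, 4]), Data = [1.0, 2.0, 3.0, 4.0]),
    DataFrame(Name = fill("B", 4), Time = Day.([1, 2, 3, 5]), Data = [5.0, 6.0, 7.0, 8.0]),
    DataFrame(Name = fill("C", 4), Time = Day.([1, 3, 6, 7]), Data = [9.0, 10.0, 11.0, 12.0]),
]

# Stack all frames into one long table with 12 rows and 3 columns.
long = reduce(vcat, V)

# Optional wide view: one row per Time, one column per Name,
# with `missing` wherever a Name has no entry for that Time.
wide = unstack(long, :Time, :Name, :Data)
```

The long form keeps every observation as one row, which tends to work well with grouping operations; the wide form is closer to a matrix, but has to carry `missing` values for days that a given Name does not cover.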
I saw in a separate post that if I just wanted to average the Data columns I could do something like (V[1][:, 3] + V[2][:, 3] + V[3][:, 3]) / 3. However, that would not work here because the DataFrames cover different days. I want to average only the values whose Time entries match.
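One detail about the element-wise average is worth spelling out: division binds more tightly than addition, so the sum has to be parenthesized for the division to apply to all three columns. A small sketch with plain vectors standing in for the three Data columns (the names `a`, `b`, `c` are only for illustration):

```julia
# Stand-ins for V[1][:, 3], V[2][:, 3] and V[3][:, 3].
a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
c = [9.0, 10.0, 11.0, 12.0]

# Without parentheses only `c` is divided by 3.
unparenthesized = a + b + c / 3

# With parentheses the whole sum is divided: [5.0, 6.0, 7.0, 8.0]
elementwise_mean = (a + b + c) / 3
```

Even with the parentheses, this averages row 1 with row 1, row 2 with row 2, and so on, regardless of which day each row belongs to.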
I think I could go through each DataFrame with a nested loop, but my understanding is that this is not great practice and not particularly efficient. It would probably look something like this:
using DataFrames, Dates, Statistics

# 1. Collect a list of all the unique days
uniquedays = Day[]
for df in V
    append!(uniquedays, unique(df.Time))
end
uniquedays = sort(unique(uniquedays))

# 2. Loop through each DataFrame, checking for data on each of the unique days
#    and storing the results in a DataFrame
DF = DataFrame(Time = Day[], Average = Float64[])
for day in uniquedays
    data = Float64[]
    for df in V
        # add the data points recorded on this day (nothing is added if the day is absent)
        append!(data, df.Data[df.Time .== day])
    end
    push!(DF, (day, mean(data)))
end
I’m not sure this is the best or even a good approach. Is there a more efficient/straightforward way to do this type of conditional averaging across DataFrames?
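For comparison, here is a sketch of how the same conditional average might be written with the split-apply-combine functions in DataFrames.jl (`groupby` and `combine`). It assumes the data is exactly as in the tables above and that stacking the frames into one table is acceptable; the names `long`, `averages`, and `two_day` are made up for illustration.

```julia
using DataFrames, Dates, Statistics

# Rebuild the example data (values copied from the tables above).
V = [
    DataFrame(Name = fill("A", 4), Time = Day.([1, 2, 3, 4]), Data = [1.0, 2.0, 3.0, 4.0]),
    DataFrame(Name = fill("B", 4), Time = Day.([1, 2, 3, 5]), Data = [5.0, 6.0, 7.0, 8.0]),
    DataFrame(Name = fill("C", 4), Time = Day.([1, 3, 6, 7]), Data = [9.0, 10.0, 11.0, 12.0]),
]

# Stack the frames, group the rows by Time, and average Data within each group.
long = reduce(vcat, V)
averages = combine(groupby(long, :Time), :Data => mean => :Average)
sort!(averages, :Time)

# Look up the 2-day average: (2 + 6) / 2 = 4.0
two_day = only(averages[averages.Time .== Day(2), :Average])
```

Days that appear in only some of the DataFrames are averaged over the entries that exist, which matches the 2-day example in the question.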