I am new to Julia (and the forum - apologies for my formatting!) and struggling to implement complex split-apply-combine functions, in particular when these involve iterative looping across columns. I could really use some help developing an approach that, applied to different slices of the data, iteratively constructs pairwise operations across multiple columns. I'll give an example below that might seem solvable with brute force / manual approaches, but in reality this will be applied to a large array, so an automated/iterative solution is very much needed.
As an example, consider these data:
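using DataFrames
test = DataFrame(x1=rand(1000), x2=rand(1000), x3=rand(1000), x4=rand(1000))
test.subject = fill("a", nrow(test))   # rows 1:500 stay subject "a"
test[501:1000, :subject] .= "b"        # rows 501:1000 become subject "b"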
This yields 4 columns of data for two subjects (a and b), each with 500 rows. I found that the by, combine, and map functions allow a nice solution for applying any given function to any given column by each slice (subject) - but I'm trying to create a function that iterates across several columns at once.
Say, for example, that the goal is to compute some measure, such as the difference between variable x1 and variable x2, and likewise x1 vs. x3 and x1 vs. x4, and then the same for x2 vs. x1, x2 vs. x3, and x2 vs. x4: every possible combination of pairwise differences, while avoiding duplication.
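To make the set of comparisons concrete, here is a minimal sketch of the pairs I have in mind. It assumes that "avoiding duplication" means each unordered pair is visited once (so x1 vs. x2 is listed but x2 vs. x1 is not); the names cols and pairs_to_test are just placeholders for illustration.

```julia
# Column names from the example data (the grouping column is left out).
cols = [:x1, :x2, :x3, :x4]

# Every unordered pair with i < j, so (:x1, :x2) appears but (:x2, :x1) does not.
pairs_to_test = [(cols[i], cols[j]) for i in 1:length(cols) for j in (i + 1):length(cols)]

for (a, b) in pairs_to_test
    println(a, " vs ", b)
end
```

For 4 columns this gives 6 pairs; if the order of the two columns mattered, the inner range would need to cover every j != i instead.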
So the pseudocode would be something like:
function dostuff(x, y)
    return x - y   # stand-in for the real, more complicated calculation
end
result = combine(df -> dostuff(df.x1, df.x2), groupby(test, :subject))
…but this obviously only calculates one of the possible permutations, and I'm confused as to how to even approach automating this in Julia - nest a function call in a for loop? How do I iteratively cycle through each permutation of columns I need to test? How do I provide the appropriate (and changing) inputs to the function each time it is iterated, i.e. the first "dostuff" would compare x1 and x2, but the next must compare x1 and x3, and so on until all combinations have been done, while avoiding duplication?
Any advice would be appreciated. The final result would be a dataframe that provides the (named) difference calculated in the "dostuff" function for each particular combination. (Also, I picked a simple subtraction just as an example; the real application involves a more complicated calculation. So if there's a hardwired column-differences function or something, that won't help.)
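To show the shape of the result I am after, here is a rough sketch rather than a settled solution. It assumes the `[cols] => function => new_name` form that `combine` accepts in DataFrames.jl, with one such entry built per pair; the output names like x1_minus_x2 and the variable names pairs_to_test and transforms are my own placeholders.

```julia
using DataFrames

# Same example data as above.
test = DataFrame(x1=rand(1000), x2=rand(1000), x3=rand(1000), x4=rand(1000))
test.subject = fill("a", nrow(test))
test[501:1000, :subject] .= "b"

# Stand-in for the real calculation; receives two whole columns per group.
dostuff(x, y) = x - y

# Each unordered pair of data columns, visited once.
cols = [:x1, :x2, :x3, :x4]
pairs_to_test = [(cols[i], cols[j]) for i in 1:length(cols) for j in (i + 1):length(cols)]

# One `[a, b] => dostuff => name` entry per pair, e.g. [:x1, :x2] => dostuff => :x1_minus_x2.
transforms = [[a, b] => dostuff => Symbol(a, "_minus_", b) for (a, b) in pairs_to_test]

# Apply every entry within each subject; the result has one named column per pair.
result = combine(groupby(test, :subject), transforms...)
```

Because dostuff here returns a vector as long as each group, result keeps all 1000 rows; a calculation that returns a single number per group would instead give one row per subject.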
Thanks for considering!