I need to figure out what I am doing wrong with the query you suggested, because at the moment it does not give me the results I want.
Maybe it is because the query returns the following result:
A, B, C, O
1, 5, 9, 20
1, 5, 9, 20
1, 5, 9, 20
1, 5, 9, 21
On a big dataset there might still be many such lines. In fact, only the last two rows are needed. The first two are duplicates of the third and therefore redundant.
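Dropping exact duplicate rows reduces the result above to the two rows that matter. Here is a minimal sketch of that step. It is written in plain Python purely to show the logic and is not Query.jl syntax:

```python
# Rows as returned by the query: (A, B, C, O). Plain Python, not Query.jl.
rows = [
    (1, 5, 9, 20),
    (1, 5, 9, 20),
    (1, 5, 9, 20),
    (1, 5, 9, 21),
]

def distinct_rows(rows):
    """Return the rows with exact duplicates removed, keeping first-seen order."""
    # dict preserves insertion order (Python 3.7+), so fromkeys acts as an ordered set
    return list(dict.fromkeys(rows))

print(distinct_rows(rows))
# [(1, 5, 9, 20), (1, 5, 9, 21)]
```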
Perhaps a bit of background helps.
I have a model in which A, B and C are the inputs and O is the output. The model specification says that whenever the same input pattern is presented, the model should always produce the same output.
However, because of design “bugs”, the model may have hidden states, so depending on the sequence in which input patterns are presented, the output for the same pattern can actually differ.
The dataset is generated by simulation. I obviously do not design for the hidden states, but they are there. Such a query should help me find out which input patterns lead to different outputs.
The result should then automatically be a table whose rows form a unique set (I assume). I believe it is an interesting application case.
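The check described above comes down to collecting the set of distinct outputs seen for each input pattern and reporting every pattern that has more than one. This is a minimal sketch in plain Python, not Query.jl. The `log` data is a hypothetical stand-in for a simulation run:

```python
from collections import defaultdict

def conflicting_patterns(rows):
    """Map each input pattern (A, B, C) to its sorted distinct outputs O,
    keeping only patterns that produced more than one distinct output."""
    outputs = defaultdict(set)
    for a, b, c, o in rows:
        outputs[(a, b, c)].add(o)
    return {pattern: sorted(outs) for pattern, outs in outputs.items() if len(outs) > 1}

# Hypothetical simulation log: (A, B, C, O)
log = [
    (1, 5, 9, 20),
    (2, 6, 8, 30),
    (1, 5, 9, 20),
    (2, 6, 8, 30),
    (1, 5, 9, 21),
]

for pattern, outs in conflicting_patterns(log).items():
    print(pattern, outs)
# (1, 5, 9) [20, 21]
```

A pattern such as (2, 6, 8) that repeats but always gives the same output is not reported, because only the number of distinct outputs is checked.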
Hm, tricky… I guess the right solution here would be this: for each group that satisfies the length>1 condition, we want to get the unique rows, where uniqueness is determined by looking only at the O column. There are multiple difficulties with that at the moment. First, ideally we would have a unique function with a by keyword, but unfortunately, right now, we don’t… Second, I need to think hard about how one could combine that with the way I’m handling groups in Query.jl… It might just work, or it might not. I’m not sure right now.
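The approach described above is: group by the inputs, then within each group keep the rows that are unique on O. It can be sketched in plain Python. This is not Query.jl code. `unique_by` is a hypothetical helper that stands in for a unique with a by keyword. The sketch filters on the deduplicated group rather than on the raw group length. A pattern that merely repeats with the same output would pass a raw length>1 check but would collapse to a single row after deduplication.

```python
from itertools import groupby

def unique_by(rows, key):
    """Hypothetical helper: keep the first row for each distinct key(row), in order."""
    seen = set()
    result = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            result.append(row)
    return result

def inputs(row):
    # Group key: the input pattern (A, B, C)
    return row[:3]

def output(row):
    # Uniqueness key within a group: the output O
    return row[3]

def conflicting_rows(rows):
    result = []
    # itertools.groupby only merges adjacent items, so sort by the group key first
    for _, group in groupby(sorted(rows, key=inputs), key=inputs):
        distinct = unique_by(group, key=output)
        if len(distinct) > 1:  # more than one distinct output for this input pattern
            result.extend(distinct)
    return result

# Hypothetical simulation log: (A, B, C, O)
log = [
    (1, 5, 9, 20),
    (2, 6, 8, 30),
    (1, 5, 9, 20),
    (2, 6, 8, 30),
    (1, 5, 9, 21),
]
print(conflicting_rows(log))
# [(1, 5, 9, 20), (1, 5, 9, 21)]
```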