I proceed to gather more data of the stated kind, but at a different moment. This new data will be different at least in the timeStamp array, but may also be different elsewhere (most obviously price and totalVolume). Again, I assemble the data into matrix form:
The problem that I am trying to solve in the most efficient way possible is the following:
“How do I combine dataold and datanew to get dataall, such that dataall does not contain redundant data?”
By redundant data I mean rows (in the matrix form [dataold datanew]) that are duplicates of each other in everything except timeStamp. Such rows should be removed so that only the first of them (i.e., the row with the smallest timeStamp) remains.
I am looking for a general strategy (what is most efficient? Do I need to assemble the matrices, or is there a much better way, for example?) in Julia.
Perhaps someone may even venture a snippet of code that does the job. That would be greatly appreciated!
This sounds like a job for leftjoin using DataFrames. Is there a reason you want to use a matrix? A data frame seems like a much more intuitive structure for this kind of thing.
+1 for the suggestion of DataFrames! Doing something like this without it seems quite painful. With DataFrames, at the very minimum you could just create one long DataFrame (like your dataall) and then do groupby(df,[:mktId,:stockId]) to isolate the data for each stock and then do your filter.
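To make the suggestion concrete, here is a minimal sketch of the "one long DataFrame, then groupby" idea. The sample values and the column types (integer ids, Float64 price, DateTime timeStamp) are assumptions made up for illustration; only the column names come from the thread.

```julia
using DataFrames, Dates

# Hypothetical sample data; values and column types are assumptions.
dataold = DataFrame(
    mktId = [1, 1],
    stockId = [10, 11],
    totalVolume = [100, 200],
    price = [1.5, 2.5],
    timeStamp = [DateTime(2021, 1, 1, 9), DateTime(2021, 1, 1, 9)],
)
datanew = DataFrame(
    mktId = [1, 1],
    stockId = [10, 11],
    totalVolume = [100, 250],
    price = [1.5, 2.6],
    timeStamp = [DateTime(2021, 1, 1, 10), DateTime(2021, 1, 1, 10)],
)

# Stack the two snapshots into one long table (columns must match).
dataall = vcat(dataold, datanew)

# One group per stock; each group holds that stock's rows from both snapshots.
gdf = groupby(dataall, [:mktId, :stockId])
```

Each element of `gdf` is a view into `dataall`, so the grouping itself does not copy the data; the filter still has to be applied per group afterwards.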
Can I ask you to apply the filter? In my case, the first step appears to be:
groupby(df,[:mktId,:stockId,:totalVolume,:price])
That procedure creates many groups. Within any one of these groups, the elements mktId, stockId, totalVolume and price are the same, but the last element (timeStamp) can differ from row to row (if multiple timeStamp values exist).
After the first step has been applied, I want each group to only have one timeStamp (the smallest datetime). So, each group should have only one line.
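One way this reduction could be written is with `combine`, taking the minimum timeStamp in each group. This is a sketch under assumed sample data (the values and column types are made up), not necessarily the approach the other posters had in mind.

```julia
using DataFrames, Dates

# Hypothetical combined data: the first and third rows differ only in timeStamp.
df = DataFrame(
    mktId = [1, 1, 1, 1],
    stockId = [10, 11, 10, 11],
    totalVolume = [100, 200, 100, 250],
    price = [1.5, 2.5, 1.5, 2.6],
    timeStamp = [
        DateTime(2021, 1, 1, 9), DateTime(2021, 1, 1, 9),
        DateTime(2021, 1, 1, 10), DateTime(2021, 1, 1, 10),
    ],
)

# Group on every column except timeStamp ...
gdf = groupby(df, [:mktId, :stockId, :totalVolume, :price])

# ... and collapse each group to a single row carrying its smallest timeStamp.
dataall = combine(gdf, :timeStamp => minimum => :timeStamp)
```

Because the minimum is computed explicitly, this version does not depend on the row order of `df`.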
Be sure the data is sorted properly, because those methods will blindly pick the first row. But yes, the last solution using unique is definitely the way to go in this case, if I understand correctly what you want.
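For reference, a sort-then-`unique` approach might look like the sketch below. The sample data is hypothetical and deliberately not in time order, to show why the sort matters; the column types are assumptions.

```julia
using DataFrames, Dates

# Hypothetical data where the later snapshot happens to come first.
df = DataFrame(
    mktId = [1, 1, 1, 1],
    stockId = [10, 11, 10, 11],
    totalVolume = [100, 250, 100, 200],
    price = [1.5, 2.6, 1.5, 2.5],
    timeStamp = [
        DateTime(2021, 1, 1, 10), DateTime(2021, 1, 1, 10),
        DateTime(2021, 1, 1, 9), DateTime(2021, 1, 1, 9),
    ],
)

# Sort by timeStamp so the earliest row of each duplicate set comes first.
sorted = sort(df, :timeStamp)

# unique keeps the first row for each combination of the listed columns.
dataall = unique(sorted, [:mktId, :stockId, :totalVolume, :price])
```

Without the `sort` call, `unique` would keep the 10:00 row for stock 10 here, since that row appears first in `df`.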