I’ve been using JuliaDB and absolutely loving it. It helped me restructure my data, and I’m now able to process it in about 40 LOC. It all makes sense.
In my application, I’m joining and grouping sources together to distill a table that contains all the data needed for the final step. This final step is costly (each iteration, i.e. each row, takes a few seconds). What I’m missing, though, is some sort of “piping”:
I’m executing this costly step on each row with a groupby (so it groups the data and then applies the step). The result is the final product (the groupby returns a table only when it’s done, not per row). But since I have hundreds of rows and each row is slow, stopping the process midway loses all the results computed so far (why stop the process, one might ask; good question). What would be good is something like DataFrames.groupby that iterates over the groups and has some side effect (like saving each result or piping it to a sink). As I’m writing this, I figure I could just create a grouped table (that’s of course very fast) and then iterate over its rows in a for-loop, saving the results as I go. Yeah, that’s basically the same.
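
To make the for-loop idea concrete, here is a rough sketch of what I have in mind. It uses only Base Julia, so plain named tuples stand in for the rows of the grouped table, and costly_step, finished_keys, process_rows and the tab-separated output file are all made-up names for illustration, not anything from JuliaDB. Each result is appended to the file as soon as it is computed, and a rerun skips the keys that are already there:

```julia
# Hypothetical stand-in for the expensive per-row computation.
costly_step(row) = row.x^2

# Collect the keys already written by an earlier (possibly interrupted) run.
function finished_keys(path)
    done = Set{String}()
    isfile(path) || return done
    for line in eachline(path)
        isempty(line) && continue
        push!(done, String(first(split(line, '\t'))))
    end
    return done
end

# Process rows one at a time, appending each result to `path` immediately.
function process_rows(rows, path)
    done = finished_keys(path)
    open(path, "a") do io
        for row in rows
            key = string(row.id)
            key in done && continue   # skip work saved by a previous run
            result = costly_step(row)
            println(io, key, '\t', result)
            flush(io)                 # hand the line to the OS right away
        end
    end
    return path
end

# Plain named tuples stand in for table rows (an assumption for this sketch).
rows = [(id = 1, x = 2.0), (id = 2, x = 3.0), (id = 3, x = 4.0)]
out = joinpath(mktempdir(), "results.tsv")
process_rows(rows[1:2], out)   # pretend the run was interrupted after two rows
process_rows(rows, out)        # the rerun only computes the missing row
```

The point of appending and flushing per row is that stopping midway costs at most the row in progress. The sketch does not handle a half-written last line, and the real thing would loop over the rows of the grouped table instead of a vector of named tuples.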
OK, I’ll post this just in case someone has a better suggestion. Sorry for the somewhat vague post.