Something I use a lot when I am in the R/Tidyverse world is spread and gather from tidyr, most of which I can replicate using stack and unstack in DataFrames.jl. Within Query.jl, is there a way to accomplish this? If not, maybe I can help add the functionality, but I wanted to check first since I know David has a lot of vision and workflows in mind.
I can certainly collect into a DataFrame and then call the stack function, but that breaks up my pipes. It would be awesome to use it alongside @rename, @filter, etc.
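For reference, here is a minimal sketch of the workaround I mean. The table and its column names (id, y2018, y2019) are made up purely for illustration, and it assumes a reasonably recent DataFrames.jl alongside Query.jl:

```julia
using DataFrames
using Query

# Hypothetical wide table: one row per id, one column per year.
wide = DataFrame(id = [1, 2, 3], y2018 = [10, 20, 30], y2019 = [11, 21, 31])

# The Query.jl pipeline has to be collected into a DataFrame first...
filtered = wide |> @filter(_.id > 1) |> DataFrame

# ...before DataFrames.jl can reshape it to long form (similar to tidyr's gather).
# The remaining column (id) is kept as the identifier column.
long = stack(filtered, [:y2018, :y2019])

# unstack goes back to wide form (similar to tidyr's spread).
wide_again = unstack(long, :variable, :value)
```

The reshape step sits outside the Query.jl pipe, which is exactly the break in the chain I would like to avoid.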
We added a gather implementation here and were about to release it publicly in the API etc. Then I saw this and some video of Hadley where he made a convincing point that both the stack and gather attempts in tidyr had some serious problems that were really only solved in tidyr 1.0. So now I’m thinking we should copy the design of pivot_wider and pivot_longer right away, and skip the entire gather and stack story. But I haven’t made any progress on that.
Help with this would certainly be most welcome! I do think a fair bit of design discussion is going to be necessary first, and just a heads up that I might be a bit of a bottleneck there, in particular over the next few weeks (the semester is starting). But if you want to help and tackle this, I’d certainly try to support you and would very much welcome that!
Thank you for the response and that link. I know Hadley has thought a lot about these types of problems and API designs over the years, so I would also think that understanding his logic, and what the previous problems were in tidyr/reshape2, would lead to a better implementation here.
I have been using Julia more and more lately and have always wanted to contribute, so this seems like a good place to start. I will carve out some time this weekend to at least kick off the design discussion. Is a GitHub issue on Query.jl the best way to kick it off?
Yes, a GitHub issue is probably best. There is a video on YouTube with Hadley somewhere where he discusses the problems with the old design a fair bit, but I can’t find it right now… There also might have been a blog post. I think it boiled down to “even I can never remember how to use these commands, even though I wrote them”.
Yes, I know which video you are referring to. I will reference it, take a look at the existing gather code you linked, and start to spec out a Query version. Got to start somewhere! This is a good way for me to get familiar with the inner workings of the Queryverse, so I’m excited.
I kicked things off with what became a long issue post, but that is mainly because there are so many previous implementations of this to learn from. I agree there should be a healthy amount of design discussion first.