Who is using Dagger?

I haven’t used JuliaDB in a while (I used to use Dagger through JuliaDB), but it’s nice to see that Dagger is actively maintained! Even though I’m not an active user or developer, here are my two cents.

I imagine that map, filter, reduce and groupby already cover a fair number of use cases for distributed data processing, especially since reduce also works on grouped tables.

OTOH, and this may be typical of Julia, a lot of features work out of the box, so it may just be a matter of documenting that they do. I suspect that writing docs for things that already work through composability could be a simple way to “add more features for free”.

For example, I tried the following, and it worked:

```julia
julia> using Dagger, OnlineStats

julia> d = DTable((a = rand(100), b = rand(100)), 50);

julia> m = reduce(fit!, d, init=Variance());

julia> fetch(m)
(a = Variance: n=100 | value=0.0898072, b = Variance: n=100 | value=0.0796099)
```

So you can already compute summary statistics in a distributed way, in a single pass over the data, which is really nice but hard to guess from the docs. I suspect this would also work with grouped data to compute grouped summary statistics.
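To illustrate why a single pass can be enough: each chunk can be summarized on its own, and the per-chunk summaries can then be merged. Here's a minimal sketch of that idea in plain Julia (no Dagger or OnlineStats), using Welford's update within a chunk and Chan et al.'s pairwise merge across chunks. `VarState`, `push_obs`, `merge_states` and `sample_var` are hypothetical names made up for illustration; this is not taken from Dagger's or OnlineStats' internals.

```julia
using Statistics  # stdlib, only used to check the result

# Hypothetical mergeable one-pass variance state:
# count, running mean, and sum of squared deviations from the mean.
struct VarState
    n::Int
    mean::Float64
    m2::Float64
end

VarState() = VarState(0, 0.0, 0.0)

# Welford update: fold one observation into the state.
function push_obs(s::VarState, x::Real)
    n = s.n + 1
    delta = x - s.mean
    mean = s.mean + delta / n
    m2 = s.m2 + delta * (x - mean)
    return VarState(n, mean, m2)
end

# Chan et al. merge: combine states computed on two disjoint chunks.
function merge_states(a::VarState, b::VarState)
    a.n == 0 && return b
    b.n == 0 && return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta^2 * a.n * b.n / n
    return VarState(n, mean, m2)
end

# Sample variance (n - 1 denominator), same convention as Statistics.var.
sample_var(s::VarState) = s.m2 / (s.n - 1)

x = rand(100)
chunks = Iterators.partition(x, 50)  # two chunks of 50, like the DTable above

# Each chunk is reduced independently in one pass
# (this is the part a scheduler could run on different workers)...
partials = [foldl(push_obs, c; init = VarState()) for c in chunks]

# ...and the per-chunk states are then merged into the final result.
total = reduce(merge_states, partials)

println(total.n, " ", sample_var(total), " ", var(x))
```

The merged state should agree with `Statistics.var(x)` up to floating-point rounding, regardless of how the data is split into chunks.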

I think Dagger could benefit from more “docs for end users” (as in tutorials and how-to guides in the divio system), and from clearer signaling of which docs are more “beginner-friendly” (in the current version, I’d say it’s mostly this section).

As practical suggestions, besides documenting the features you get from composability, here are some things that IMO could be added to the docs:

  • a typical data-wrangling tutorial done with Dagger (that’s also a great way to see whether features are missing). I’m familiar with this one, but there are certainly many other options out there
  • a nice, simple section for DArray that parallels the one for DTable, ensuring it has the same ease of use. For example, it was surprising to see that DTable((a = rand(100), b = rand(100)), 50) works but DArray(rand(100), 50) does not.

Hope this helps, and kudos for all the hard work on Dagger; it’s really coming along very nicely!
