There’s also a ZevenbergenThorne implementation in Geomorphometry (I’m one of the authors), but I’m not sure what you’re looking for. Is it performance, more or better algorithms, or something else?
In terms of speed, the current implementations should be fast enough for most purposes (about 0.05 s for a raster with 1M cells), and there are plans to use Stencils to make them even faster.
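For anyone unfamiliar with the method, here is a minimal pure-Python sketch of the Zevenbergen & Thorne (1987) slope. It is not Geomorphometry's code. It only illustrates why the computation maps naturally onto a 3×3 stencil: each output cell depends on its four direct neighbours. The function name `zevenbergen_thorne_slope` is hypothetical. The sketch assumes a DEM given as a list of rows with the first row to the north, a square cell size in the same units as the elevations, and edge cells left empty rather than padded.

```python
import math


def zevenbergen_thorne_slope(dem, cellsize):
    """Slope in degrees for the interior cells of a DEM (list of rows).

    Hypothetical helper for illustration. Uses Zevenbergen & Thorne's
    central differences over the 4 direct neighbours of each cell:
        dz/dx = (east - west) / (2 * cellsize)
        dz/dy = (north - south) / (2 * cellsize)
    Row 0 is assumed to be the northern edge. Edge cells are left as
    None because this sketch does no padding.
    """
    nrows, ncols = len(dem), len(dem[0])
    out = [[None] * ncols for _ in range(nrows)]
    for i in range(1, nrows - 1):
        for j in range(1, ncols - 1):
            dzdx = (dem[i][j + 1] - dem[i][j - 1]) / (2 * cellsize)
            dzdy = (dem[i - 1][j] - dem[i + 1][j]) / (2 * cellsize)
            # Gradient magnitude -> slope angle in degrees.
            out[i][j] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out
```

On a plane that rises 1 m per 1 m cell towards the east, such as `[[0, 1, 2], [0, 1, 2], [0, 1, 2]]` with `cellsize=1.0`, the centre cell gets a slope of about 45°. A flat DEM gives 0°. A real implementation would vectorise or fuse this loop and handle the edges, which is where a stencil library helps.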
There’s a gdaldem executable in GDAL_jll, but it doesn’t offer different algorithms and it certainly isn’t faster. I’m not aware of any other native Julia libraries.