That solution seems broken (`d1` doesn’t do the same thing as `d0`). Below I’m sharing a post I recently wrote about something that I think is really important and, at the same time, very simple to do.
As for how to speed this up: I often find that the easiest and fastest way is to just stick to for loops instead of views and reshapes. Just remember to use `@inbounds`, and to access arrays in memory order, along columns. (Note: due to caching/data locality, you might sometimes benefit from reorganizing your data.)
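To illustrate the loop advice, here is a minimal sketch. The function name `sumsq_colmajor` and the sum-of-squares task are made up for this example and are not taken from the code discussed in this thread. The sketch only shows the pattern: plain nested loops, `@inbounds`, and the row index in the innermost loop so that memory is traversed down each column.

```julia
# Hypothetical example (not the code from this thread): sum of squares of a matrix.
function sumsq_colmajor(A::AbstractMatrix{T}) where {T<:Real}
    s = zero(T)
    # Julia stores arrays column-major, so consecutive elements of a column
    # are adjacent in memory. The row index i therefore goes in the inner loop.
    # Iterating over axes(A, ...) keeps the indices valid, which is what makes
    # skipping the bounds checks with @inbounds safe here.
    @inbounds for j in axes(A, 2)
        for i in axes(A, 1)
            s += A[i, j]^2
        end
    end
    return s
end

A = rand(4, 3)
# Should agree with the built-in reduction up to floating-point rounding.
sumsq_colmajor(A) ≈ sum(abs2, A)
```

Swapping the two loops gives the same result but walks across rows, which is strided access in memory. Whether the difference is noticeable depends on the array size, so it is worth benchmarking on your own data.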