In this thread, some people found that removing that dependency reduced TTFX, so I was experimenting with that. I don’t think it made any difference.
But it did let me simplify the code, because I could add methods for op(::Vec3, ::Float64) that broadcast the scalar op to every member of the vec. That’s for looks, though, not speed.
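
As a minimal sketch of that kind of method: the Vec3 struct below, its field names x, y, z, and the list of operators are assumptions for illustration, not the actual code from the project.

```julia
# Hypothetical stand-in for the Vec3 type; the field names are assumed.
struct Vec3
    x::Float64
    y::Float64
    z::Float64
end

# Generate op(::Vec3, ::Float64) for each arithmetic operator,
# applying the scalar to every component of the vector.
for op in (:+, :-, :*, :/)
    @eval Base.$op(v::Vec3, s::Float64) =
        Vec3($op(v.x, s), $op(v.y, s), $op(v.z, s))
end

v = Vec3(1.0, 2.0, 3.0)
scaled = v * 2.0    # Vec3(2.0, 4.0, 6.0)
shifted = v + 1.0   # Vec3(2.0, 3.0, 4.0)
```

With methods like these, call sites can write `v * 2.0` directly instead of a dotted broadcast or a per-component constructor call, which is purely a readability change.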
I’ve also been playing with GPU code for it. I haven’t got it working yet, but it does look promising: testing the hits for the whole scanline in a single GPU kernel.