Orthogonalize.jl

Hi, using orthogonalize.jl from IterativeSolvers.jl, I made some performance plots (dodgeTPSbench.png attached), where the “ON” tag refers to the library function orthogonalize_and_normalize!() and “MGS” refers to the Modified Gram-Schmidt process. As the plot shows, the MGS time is about twice that of the Classical Gram-Schmidt (CGS) process and a bit less than that of the reorthogonalized variant (CGS2). My question is: how can we explain such good timings for ONMGS? Looking at orthogonalize.jl, I see BLAS-1 operations, but I guess the good results come from the rank-1 update of w, since that one looks vectorized?
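To make the comparison concrete, here is a minimal sketch of the two variants being compared. It is not the code from orthogonalize.jl: the names `cgs!` and `mgs!` are hypothetical, and it assumes the columns of `V` are already orthonormal. It only illustrates where the BLAS-2 and BLAS-1 calls would sit.

```julia
using LinearAlgebra

# Classical Gram-Schmidt (illustrative): two matrix-vector products (BLAS-2).
function cgs!(w::AbstractVector, V::AbstractMatrix)
    h = V' * w              # all projection coefficients at once
    mul!(w, V, h, -1, 1)    # w ← w - V*h
    nrm = norm(w)
    w ./= nrm               # normalize in place
    return h, nrm
end

# Modified Gram-Schmidt (illustrative): one dot and one axpy per column (BLAS-1).
function mgs!(w::AbstractVector, V::AbstractMatrix)
    k = size(V, 2)
    h = zeros(eltype(w), k)
    for j in 1:k
        v = view(V, :, j)
        h[j] = dot(v, w)    # coefficient against the current, already updated w
        axpy!(-h[j], v, w)  # w ← w - h[j]*v
    end
    nrm = norm(w)
    w ./= nrm               # normalize in place
    return h, nrm
end
```

In this sketch, the update of `w` inside `mgs!` is a single `axpy!` per column, which is the step I would expect to be vectorized. Timing both functions on the same `V` and `w` (for example with BenchmarkTools) might help separate the cost of the dot products from the cost of the updates.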