Just for the record,
f6(A, B) = map(LinearAlgebra.dot, eachcol(A), Iterators.repeated(B), eachcol(A))
also exists. It uses the three-argument dot(v, B, v) method, which computes exactly the quadratic form needed here (v' * B * v) without materializing the intermediate product B * v. It is slower than the suggested f4 but uses the same amount of memory.
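For anyone who wants to try it, here is a minimal sketch that checks f6 against the straightforward "build the whole product, keep the diagonal" version. The sizes n and k, the random matrices, and the helper name `reference` are my own assumptions for illustration and are not taken from the benchmark above.

```julia
using LinearAlgebra

# Illustrative sizes only (assumed, not the benchmark's values).
n, k = 5, 3
A = rand(n, k)
B = rand(n, n)

# f6 as given above: one three-argument dot per column of A.
f6(A, B) = map(LinearAlgebra.dot, eachcol(A), Iterators.repeated(B), eachcol(A))

# Hypothetical reference: form the full k×k product and keep its diagonal.
reference(A, B) = diag(A' * B * A)

# Both should agree up to floating-point rounding.
@assert f6(A, B) ≈ reference(A, B)
```

The reference version allocates a k×k matrix only to discard its off-diagonal entries, which is the work the column-wise forms avoid.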
The benchmark here is for one specific n and k, and the results may depend on those. So @e3c6, if there are specific values you need, it would be helpful to know them. It would also be interesting to know whether any of the matrices are sparse.