We can write an optimized BLAS library in pure Julia (please skip OP and jump to post 4)

Fantastic job!
I think this effort is likely to drive the HPC community to consider Julia more seriously (it should be presented at SC!).

Could this be mixed with your accurate arithmetic efforts (Julia equivalent of Python's "fsum" for floating point summation - #54 by ffevotte) to obtain fast and accurate GEMMs?
