After looking deeper into the StaticArrays code, I think I found the cause. solve.jl defines specialized functions for calculations with vectors up to length 3, but these are not applied when A (in B\A) is a Matrix.
If the dimension of B is 4 or higher, the performance advantage flips and funDiv1 is faster than funDiv2. However, this is only true for StaticArrays. I tested up to a dimension of 50, and for normal Arrays funDiv2 is still faster than funDiv1.
Another issue I found with StaticArrays: while it supports LU factorization, the factorized matrix B can only be used with B\A and not with A/B.
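A possible workaround for the missing A/B is to rewrite the right division as a left division, since A/B equals transpose(transpose(B) \ transpose(A)). The sketch below only illustrates that identity with normal Arrays and the LinearAlgebra standard library. The helper name rdiv_via_ldiv is made up for this example, and I have not checked how it behaves or performs with StaticArrays.

```julia
using LinearAlgebra

# Hypothetical helper (not part of StaticArrays or LinearAlgebra):
# computes A / B with a left division, using the identity
#   A / B == transpose(transpose(B) \ transpose(A))
function rdiv_via_ldiv(A::AbstractMatrix, B::AbstractMatrix)
    F = lu(Matrix(transpose(B)))      # LU-factorize the transpose of B once
    X = F \ Matrix(transpose(A))      # solve transpose(B) * X = transpose(A)
    return Matrix(transpose(X))       # transpose back to get A / B
end

B = [4.0 1.0 0.0;
     1.0 3.0 1.0;
     0.0 1.0 2.0]                     # square, nonsingular
A = [1.0 2.0 3.0;
     4.0 5.0 6.0]                     # number of columns matches size(B, 1)

X = rdiv_via_ldiv(A, B)
```

The result can be checked with X * B ≈ A. The extra transposes allocate copies here, so whether this is competitive for small static matrices would need to be benchmarked.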
I will try to make a PR for StaticArrays. For the built-in Matrix, I am a bit lost in the code. Should I open an issue for Julia, or even for OpenBLAS?