To get the full benefit, your inputs need to be arrays of `SMatrix`, so that the compiler knows the sizes. Don't convert to static arrays only for the multiplication step; use them from the beginning (and elsewhere throughout your program), e.g. generate them via:
```julia
using StaticArrays

T = SMatrix{2,2,ComplexF64,4}  # 2×2 complex matrix whose size is part of its type
A = rand(T,A_dim)
B = rand(T,B_dim,A_dim)
C = rand(T,C_dim,A_dim)
```
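A quick way to see what "the compiler knows the sizes" buys you: because the size is part of the element type, each element is a plain bits value stored inline in the array, and products of elements are again fixed-size `SMatrix` values. This is a minimal sketch; `A_dim = 3` is a hypothetical value chosen only for the check.

```julia
using StaticArrays

# Same element type as above: a 2×2 complex matrix with its size in the type.
T = SMatrix{2,2,ComplexF64,4}

A_dim = 3            # hypothetical size, only for this check
A = rand(T, A_dim)

# Each element is an isbits value stored inline in the array...
@show isbitstype(T)                    # true
@show sizeof(T)                        # 64 bytes: 4 ComplexF64 values of 16 bytes each
# ...whereas a Vector{Matrix{ComplexF64}} would hold pointers to heap-allocated matrices.
@show isbitstype(Matrix{ComplexF64})   # false

# Multiplying two elements gives another SMatrix of statically known size.
@show A[1] * A[2] isa T                # true
```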
This will also fix the dimension-ordering issue noted by @Mason above (as well as the issues with constant propagation of array sizes, which is no longer needed). To get even better locality, I would probably also swap the dimensions so that `A_dim` comes first in `B` (and likewise in `C`), so that it becomes:
```julia
@tullio X[k,l,m] := A[k]*B[k,m]*A[k]*C[k,l]
```
(but you can experiment a bit with different dimension orderings of X and B to see which one is fastest with Tullio).
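For reference, the `@tullio` contraction with `A_dim` first should compute the same thing as the explicit loops below. This is a sketch of the semantics, not the code Tullio actually generates; the sizes are hypothetical, and `B` and `C` are allocated with the `A_dim` index first to match `B[k,m]` and `C[k,l]`. Since Julia arrays are column-major, keeping `k` in the innermost loop walks contiguous memory in `X`, `A`, `B[:,m]` and `C[:,l]`, which is the locality argument above.

```julia
using StaticArrays

T = SMatrix{2,2,ComplexF64,4}
A_dim, B_dim, C_dim = 50, 20, 30   # hypothetical sizes, for illustration only

# Layout with the A_dim index k first, matching A[k], B[k,m], C[k,l].
A = rand(T, A_dim)
B = rand(T, A_dim, B_dim)
C = rand(T, A_dim, C_dim)

# Plain-loop equivalent of  @tullio X[k,l,m] := A[k]*B[k,m]*A[k]*C[k,l]
function contract(A, B, C)
    nk = length(A)
    nm = size(B, 2)
    nl = size(C, 2)
    X = similar(A, nk, nl, nm)      # X[k,l,m], same SMatrix element type
    for m in 1:nm, l in 1:nl        # outer loops: slower-varying indices
        for k in 1:nk               # innermost k: contiguous in X, A, B[:,m], C[:,l]
            X[k, l, m] = A[k] * B[k, m] * A[k] * C[k, l]
        end
    end
    return X
end

X = contract(A, B, C)
```

Comparing this against the `@tullio` result (e.g. `X ≈ Xt`) is a cheap sanity check when you try other dimension orderings of `X` and `B`.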