I’d like to be able to broadcast matrix multiplication across multidimensional arrays, similar to the following:

```
a = rand(4,3,2)
b = rand(3,4,2)
a .* b # expected a (4,4,2) array, but this throws a DimensionMismatch instead
```

I understand that in the case of two 4x4x2 arrays it would be ambiguous which operation I wanted. Is there currently a way to disambiguate the broadcast by specifying a dimension?

Now, I know I can do this in a for loop, with iteration, etc. It looks like batched matrix multiplication has already been discussed by this community. As far as I can tell this was never implemented, but I might be missing something.
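For reference, here is a minimal sketch of the loop-based version, assuming 3-D arrays with the batch index in the last dimension. `batched_mul_loop` is a hypothetical helper name, not something from Base or LinearAlgebra; it only uses `mul!` and views:

```julia
using LinearAlgebra

# Hypothetical helper (not part of Base or LinearAlgebra):
# multiplies matching 2-D slices along the third dimension.
function batched_mul_loop(a::AbstractArray{T,3}, b::AbstractArray{T,3}) where {T}
    size(a, 2) == size(b, 1) || throw(DimensionMismatch("inner dimensions must match"))
    size(a, 3) == size(b, 3) || throw(DimensionMismatch("batch dimensions must match"))
    c = similar(a, size(a, 1), size(b, 2), size(a, 3))
    for k in axes(a, 3)
        # mul! writes a[:, :, k] * b[:, :, k] into the k-th slice of c in place
        @views mul!(c[:, :, k], a[:, :, k], b[:, :, k])
    end
    return c
end

a = rand(4, 3, 2)
b = rand(3, 4, 2)
c = batched_mul_loop(a, b)
size(c)  # (4, 4, 2)
```

This gives the shape I'm after, but it processes one slice at a time, so it doesn't expose the batch-level parallelism I'd want on a GPU.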

The real payoff here is being able to use this syntax with some of the Array interface GPU programming provided by the CuArrays.jl/CUDA.jl packages, where the parallelism can really be exploited. It looks like there is already a `gemm_batched` function wrapping the equivalent cuBLAS functionality, but I can’t access it with simple Julia Linear Algebra calls yet.