Hello! I have recently started to migrate to Julia, and decided to port some of my Matlab code. I am struggling to make a performant row-wise Kronecker product (Khatri-Rao product) implementation for two or three matrices. At the moment I am doing the following:

```
function dotkron!(A::Matrix{Float64},B::Matrix{Float64},C::Matrix{Float64})
N = size(A,1)
@inbounds @simd for n = 1:N
kron!(A[n,:],B[n,:],C[n,:])
end
end
function dotkron(A::Matrix{Float64},B::Matrix{Float64})
(N,DA) = size(A)
(N,DB) = size(B)
C = Matrix{Float64}(undef,N,DA*DB)
dotkron!(C,A,B)
return C
end
function dotkron!(A::Matrix{Float64},B::Matrix{Float64},C::Matrix{Float64},D::Matrix{Float64})
dotkron!(A,dotkron(B,C),D)
end
function dotkron(A::Matrix{Float64},B::Matrix{Float64},C::Matrix{Float64})
(N,DA) = size(A)
(N,DB) = size(B)
(N,DC) = size(C)
D = Matrix{Float64}(undef,N,DA*DB*DC)
dotkron!(D,A,B,C)
return D
end
```

can it be done more neatly and faster? For reference on my system:

```
using BenchmarkTools
A=rand(1000000,10);
B=rand(1000000,10);
@btime dotkron(A,B);
1.229 s (9000002 allocations: 2.15 GiB)
```

Matlab is approx. 3-4 times faster.

(Apologies for posting here, it is my first post, but this seems the more fitting category)