Bottleneck of JuMP

I am solving a SDP problem using JuMP and the associated MosekTools.
The basic pattern is this:`

NQ=9
Nk=9
nred=4
Dmatrix = Vector{Any}(undef, 3)
        Dmatrix[1] = Vector{Any}(undef, NQ) # uuuu channel
        Dmatrix[2] = Vector{Any}(undef, NQ) # dddd channel
        Dmatrix[3] = Vector{Any}(undef, NQ) # uddu channel
    model = Model(Mosek.Optimizer)
        #This is D constraint
        for jq in 1:NQ
            nred = length(k_pairs_dic[jq])

            Dmatrix[1][jq] = @variable(model, [1:nred, 1:nred], Hermitian)
            Dmatrix[2][jq] = @variable(model, [1:nred, 1:nred], Hermitian)
            Dmatrix[3][jq] = @variable(model, [1:Nk,   1:Nk  ], Hermitian)

                @constraint(model, Hermitian((Dmatrix[1][jq]+Dmatrix[1][jq]')/2) in HermitianPSDCone())
                @constraint(model, Hermitian((Dmatrix[2][jq]+Dmatrix[2][jq]')/2) in HermitianPSDCone())
                @constraint(model, Hermitian((Dmatrix[3][jq]+Dmatrix[3][jq]')/2) in HermitianPSDCone())
       end`

I am initializing a bunch of matrix and declaring them to be hermitian variables. I then put all the variables as a vector:
Dvec = vectorize_D(Dmatrix, ...)

Then I will construct a bunch of mapping matrix from Dvec to other variables. Unvectorize them, organize those into a matrix, and then impose constraints that those matrix to be PSD. The problem looks like this.

Amatrix=construct_map(...)

other_vec=Amatrix*Dvec

other_matrix=unvectorize(...)


@constraint(model, Hermitian(othermatrix) in HermitianPSDCone())

At the end of the day, I am imposing PSD constraints on 1 120 by 120 matrix, one 81 by 81 matrix, around 50 15 by 15 matrix. Now this has already become extremely expensive and slow to solve, and many times I get OOM error when I call

optimize!(model)

I want to point out that everything before optimize!(model) seems fine, when I construct the linear mapping matrix Amatrix or the other_vec, there were no error. Also, for reference, a problem of similar size in matlab using YAMIP and Mosek only takes less than a minute. I am not sure what is the bottle neck here. Is this workflow even correct? Or what can be limiting the speed and memory usage?