Faster matrix product with a specifically structured matrix

In this way you do not count the time necessary for the construction of the matrix M and of the vector b.
I suppose this only has to be done once while multiplication by x has to be done a very large number of times.
In this case I would like to submit the following proposal to you:

function my_mul(Qt,y)
rv=rowvals(Qt)
nzv= nonzeros(Qt)
c=zeros(Float64, size(Qt,2))
# c=Vector{Float64}(undef, size(Qt,2))
# fill!(c,0.)
@inbounds for col in 1:size(Qt, 2)
    rj=nzrange(Qt,col)
     for j in rj
      c[col] += y[rv[j]]
     end
     c[col] *= nzv[rj.stop]
end
c
end


Pt=sparse(P')

julia> n,m = 1000,1000
(1000, 1000)

julia> x = rand(m);

julia> P = sparse((rand(n,m).>0.97) .* rand(m));

julia> Pt=sparse(P');

julia> P*x ≈ my_mul(Pt,x)
true

julia> @btime P*x;
  25.900 μs (1 allocation: 7.94 KiB)

julia> 

julia> @btime my_mul(Pt,x);
  21.400 μs (1 allocation: 7.94 KiB)

The construction of the Pt matrix is expensive and the improvement obtained is not very big, but if the multiplication has to be done many times, it may be worth it.