It’s not a performance problem in your code: matrix-vector products this small are simply faster on the CPU than on the GPU. Compare them directly:
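```julia
using BenchmarkTools, Flux, CUDA   # `gpu` comes from Flux

A = rand(50, 50)
b = rand(50)
@btime $A * $b                     # interpolate globals so @btime measures only the product

gA = gpu(A)                        # copy the data to the GPU
gb = gpu(b)
@btime CUDA.@sync $gA * $gb        # GPU calls are asynchronous; sync so the kernel itself is timed
```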
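Keep increasing the problem size and the GPU will eventually win. When the CPU code is already well optimized (here `A * b` goes straight to BLAS), the problem has to be fairly large before the GPU pays off, because every GPU call carries a fixed launch and synchronization overhead. Batching also helps: instead of multiplying one data point at a time, stack many data points as the columns of a matrix and do a single matrix-matrix product.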
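To illustrate the batching suggestion, here is a minimal CPU-only sketch, assuming BenchmarkTools is installed. The sizes `n` and `k` and the helper names `unbatched` and `batched` are hypothetical, chosen only for illustration. It checks that one matrix-matrix product gives the same results as `k` separate matrix-vector products, then times both.

```julia
using BenchmarkTools

n, k = 50, 1_000                    # hypothetical sizes: matrix dimension and number of data points
A  = rand(n, n)
xs = [rand(n) for _ in 1:k]         # k separate data points

# Hypothetical helper: one small matrix-vector product per data point (k calls).
unbatched(A, xs) = [A * x for x in xs]

# Hypothetical helper: stack the data points as columns and do a single matrix-matrix product.
batched(A, xs) = A * reduce(hcat, xs)

# Column i of the batched result equals A * xs[i].
@assert batched(A, xs) ≈ reduce(hcat, unbatched(A, xs))

@btime unbatched($A, $xs)
@btime batched($A, $xs)
```

Because `batched` issues one large call instead of `k` small ones, it is the form that amortizes per-call overhead, which is exactly what the GPU needs. Moving `A` and the stacked matrix over with `gpu` and timing with `CUDA.@sync` would follow the same pattern; whether it beats the CPU still depends on the sizes and hardware.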