FWIW, on my laptop replacing du .= p[1] with fill!(du,0.0) or du .= 0.0 gives a noticeable speedup, from ~8ns to ~5ns. The latter two seem to run at the same speed.
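For reference, here is a rough sketch of the three in-place variants being compared. The function names f_param!, f_fill! and f_bcast! are made up for illustration, and I'm assuming p[1] is zero so that all three write the same values into du:

```julia
# Three in-place right-hand sides that differ only in how du is zeroed.
# The names are hypothetical; only the bodies come from the comparison above.
f_param!(du, u, p, t) = (du .= p[1]; nothing)     # reads the value from p
f_fill!(du, u, p, t)  = (fill!(du, 0.0); nothing) # fills with a literal zero
f_bcast!(du, u, p, t) = (du .= 0.0; nothing)      # broadcasts a literal zero

u = [1.0, 2.0]
p = [0.0]   # assumption: p[1] == 0.0, so the three variants agree

du_param = similar(u)
du_fill  = similar(u)
du_bcast = similar(u)

f_param!(du_param, u, p, 0.0)
f_fill!(du_fill, u, p, 0.0)
f_bcast!(du_bcast, u, p, 0.0)

# To time them (assuming BenchmarkTools is installed), interpolate the
# arguments with $ so global-variable overhead is not measured:
# using BenchmarkTools
# @btime f_param!($du_param, $u, $p, 0.0)
# @btime f_fill!($du_fill, $u, $p, 0.0)
# @btime f_bcast!($du_bcast, $u, $p, 0.0)
```

The exact timings will depend on your machine and Julia version.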
If your problem really is only two-dimensional, though, you might do better with the out-of-place form and static arrays. That gets the call to the analogous f(u,p,t) = @SVector [0.0,0.0] down to just over 1ns on my laptop.
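A minimal sketch of that out-of-place version, assuming the StaticArrays package is installed. The initial state u0 is just an illustrative value, not something from your problem:

```julia
using StaticArrays  # third-party package, assumed to be installed

# Out-of-place form: return a fresh SVector instead of mutating du.
# SVector is an immutable fixed-size type, so this normally avoids heap allocation.
f(u, p, t) = @SVector [0.0, 0.0]

u0 = SVector(1.0, 2.0)    # hypothetical initial state
du = f(u0, nothing, 0.0)  # p is unused here, so nothing is passed
```

For the solver to use this path, the initial condition would also need to be an SVector rather than a plain Vector.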
edit: And I would guess that if your real problem has a large number of components, the share of time spent in f! compared to g! is likely lower.