Is MATLAB automatically threading this? You can probably make this faster by using LoopVectorization. I see the following running about 5x faster than your original f! function:
using LoopVectorization

function f2!(D, A, B, C)
    # @turbo SIMD-vectorizes the fused broadcast
    @turbo D .= max.(A .* B .+ C, 0.2)
end
Using @tturbo (the threaded version of @turbo) gives another 4x speedup:
function f3!(D, A, B, C)
    # Same kernel as the @turbo version, but also split across threads
    @tturbo D .= max.(A .* B .+ C, 0.2)
end
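
If you want to check these numbers on your own machine, something like the following sketch should work. The names f_base!, f_turbo! and f_tturbo!, the array size n, and the use of BenchmarkTools are my own assumptions for illustration. In particular, f_base! is only a plain-broadcast stand-in for your original f!, which may be written differently.

```julia
using LoopVectorization
using BenchmarkTools  # assumed to be installed; only used for timing

# Hypothetical baseline: plain fused broadcast standing in for the original f!
function f_base!(D, A, B, C)
    D .= max.(A .* B .+ C, 0.2)
    return D
end

# Single-threaded SIMD version
function f_turbo!(D, A, B, C)
    @turbo D .= max.(A .* B .+ C, 0.2)
    return D
end

# Multithreaded SIMD version
function f_tturbo!(D, A, B, C)
    @tturbo D .= max.(A .* B .+ C, 0.2)
    return D
end

n = 1_000  # hypothetical problem size; use your real dimensions
A, B, C = rand(n, n), rand(n, n), rand(n, n)
D_base, D_turbo, D_tturbo = similar(A), similar(A), similar(A)

# Check that all three versions agree before comparing timings
f_base!(D_base, A, B, C)
f_turbo!(D_turbo, A, B, C)
f_tturbo!(D_tturbo, A, B, C)
@assert D_base ≈ D_turbo
@assert D_base ≈ D_tturbo

# @tturbo can only help if Julia was started with more than one thread
@show Threads.nthreads()

@btime f_base!($D_base, $A, $B, $C)
@btime f_turbo!($D_turbo, $A, $B, $C)
@btime f_tturbo!($D_tturbo, $A, $B, $C)
```

Note that @tturbo only gives a speedup if Julia was started with multiple threads, e.g. julia -t auto or with JULIA_NUM_THREADS set. The ratios you get will depend on array size, CPU, and thread count, so they may not match the 5x and 4x I saw.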