If you make these changes, you reduce the allocations even further:
rtmp = zeros(2) # preallocate a temporary vector here, outside the loop
for i=1:N
    k1 = pend.dt .* getDerivs(pend,r,t)
    rtmp .= r .+ 0.5 .* k1 # compute the intermediate state in place here
    k2 = pend.dt .* getDerivs(pend,rtmp,t + 1/2 * pend.dt)
    r .+= k2 # note the added dot: this updates r in place
julia> @time saveTheta,saveOmega = RK4(myPendulum,poincare=true)
0.977168 seconds (19 allocations: 488.308 MiB, 4.39% gc time)
(Now the loop itself no longer allocates anything.)
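To make the pattern concrete, here is a minimal, self-contained sketch of the same idea. It is not the original `RK4` function: `derivs` and `midpoint_steps!` are hypothetical names, and `derivs` is assumed to return a tuple (so it does not heap-allocate), which may differ from what `getDerivs` actually returns.

```julia
# Hypothetical stand-in for getDerivs: a simple pendulum, returned as a tuple.
derivs(r, t) = (r[2], -sin(r[1]))

# Midpoint-style stepping that reuses one preallocated buffer.
function midpoint_steps!(r, dt, N)
    rtmp = zeros(2)                          # allocated once, outside the loop
    t = 0.0
    for i in 1:N
        k1 = dt .* derivs(r, t)              # scalar .* tuple gives a tuple
        rtmp .= r .+ 0.5 .* k1               # result is written into rtmp
        k2 = dt .* derivs(rtmp, t + dt / 2)
        r .+= k2                             # r is updated in place
        t += dt
    end
    return r
end

r = [0.1, 0.0]                               # small initial angle, zero velocity
midpoint_steps!(r, 0.01, 100)
```

The key point is the difference between `r = r .+ k2`, which builds a new array and rebinds `r`, and `r .+= k2`, which writes into the existing array.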
Some tips on how to find the allocations are here: Disabling allocations
My impression is this: a loop that performs a critical task should allocate only when you know exactly why the allocation is there. Otherwise it is a good idea to find where the allocation happens and try to remove it. Of course, that is only worth the effort if execution time is still a problem.
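One way to check whether a loop allocates is to move it into a function and measure it with `@allocated` after a warm-up call, so that compilation is not counted. The sketch below uses two hypothetical toy loops (`loop_alloc` and `loop_inplace!`), not the pendulum code; the exact byte counts depend on the Julia version.

```julia
# Allocating version: every iteration creates a new array and rebinds r.
function loop_alloc(r, k, N)
    for i in 1:N
        r = r .+ 0.5 .* k
    end
    return r
end

# In-place version: the fused broadcast writes into the existing array.
function loop_inplace!(r, k, N)
    for i in 1:N
        r .+= 0.5 .* k
    end
    return r
end

r = [1.0, 2.0]
k = [0.1, 0.2]

# Warm-up calls, so compilation is not included in the measurement.
loop_alloc(r, k, 1)
loop_inplace!(r, k, 1)

bytes_alloc = @allocated loop_alloc(r, k, 1000)
bytes_inplace = @allocated loop_inplace!(r, k, 1000)

println("allocating loop: ", bytes_alloc, " bytes")
println("in-place loop:   ", bytes_inplace, " bytes")
```

The in-place loop should report zero, or very close to zero, bytes. For larger programs, starting Julia with `--track-allocation=user` writes per-line allocation counts to `.mem` files, which helps to locate the offending line.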