Reduced performance for parallel loops in larger code?

Are those expected? Ideally, performance-critical code should not allocate at all, or should allocate only where you do so deliberately.
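To make that concrete, here is a minimal sketch of the pattern, assuming a threaded loop where each iteration needs a temporary buffer. The thread does not say which language is involved, so C++ is used purely for illustration. The function `row_sums` and its workload are hypothetical and not taken from the original code. Each thread allocates its scratch buffer once, before its loop, so the hot loop itself does no heap allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical workload: for each row, fill a scratch buffer and reduce it.
// Rows are split across threads in a strided fashion.
std::vector<double> row_sums(std::size_t rows, std::size_t cols, unsigned n_threads) {
    assert(n_threads > 0);
    std::vector<double> out(rows, 0.0);  // one deliberate allocation for the result

    std::vector<std::thread> workers;
    workers.reserve(n_threads);
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&out, rows, cols, t, n_threads] {
            // One deliberate allocation per thread, outside the hot loop.
            std::vector<double> scratch(cols);

            // Hot loop: reuses `scratch`, so no heap allocation happens here.
            for (std::size_t r = t; r < rows; r += n_threads) {
                for (std::size_t c = 0; c < cols; ++c) {
                    scratch[c] = static_cast<double>(r * cols + c);
                }
                // Each thread writes only to its own rows of `out`.
                out[r] = std::accumulate(scratch.begin(), scratch.end(), 0.0);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return out;
}
```

Calling `row_sums(4, 3, 2)` runs two threads, each with its own buffer. If the buffer were instead created inside the loop over `r`, every iteration would hit the allocator, and that is the kind of unintended allocation worth checking for when a parallel loop slows down inside a larger program.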