Multi-threading on a 2 CPU system

The way you have your loops structured might be problematic.

@threads for ..
    @threads for ..

With nested @threads loops it is hard to reason about what actually happens at run time. If you restructure your code to have a single top-level parallel loop, you will likely get better performance and a more intuitive picture of the parallel execution.
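
As a rough sketch of what I mean (the names `work` and `fill_table` are made up for illustration, since I don't know what your loop body actually does): thread only the outer loop and leave the inner one serial.

```julia
using Base.Threads

# Hypothetical stand-in for whatever your real loop body computes.
work(i, j) = i * j

function fill_table(n, m)
    out = zeros(Int, n, m)
    # Only the outer loop is threaded; each thread handles whole columns.
    @threads for j in 1:m
        # The inner loop runs serially inside each threaded iteration.
        for i in 1:n
            out[i, j] = work(i, j)  # each (i, j) is written exactly once, so no data race
        end
    end
    return out
end

table = fill_table(3, 4)
```

This assumes the iterations are independent of each other. Start Julia with more than one thread (for example `julia -t 2`), otherwise the loop still runs but on a single thread.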