Automatically fusing together several for-loops

It should be possible to get speedups from parallelism. How did you try doing this before?