Speed up for-loop with multithreading

You can profile the serial version first, using

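using Profile
Profile.clear()       # discard samples from earlier runs
a1 = init()
round_serial(a1)      # run once first so compilation is not profiled
@profile for i in 1:1000; round_serial(a1); end
Juno.profiler()       # open the profile pane in Juno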

For my example, I see something like this in Juno/Atom,

where you can navigate from the profile pane to the source code. In the source code pane, longer bars mean a larger share of the runtime. Red indicates parts of the program that allocate; yellow indicates dynamic dispatch.
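
If you are not in Juno, roughly the same information can be obtained from the standard tools. The following is only a sketch: `init_demo` and `round_serial_demo` are hypothetical stand-ins I made up for `init` and `round_serial`, so substitute your own functions. `Profile.print` gives a text report in place of the profile pane, `@allocated` hints at allocations, and `@code_warntype` helps spot the type instabilities that usually cause dynamic dispatch.

```julia
using Profile
using InteractiveUtils  # provides @code_warntype outside the REPL

# Hypothetical stand-ins for init() and round_serial() from the question.
init_demo() = rand(10^5)

function round_serial_demo(a)
    s = 0.0
    for x in a
        s += round(x)
    end
    return s
end

a1 = init_demo()
round_serial_demo(a1)   # run once first so compilation is not profiled

Profile.clear()
@profile for i in 1:1000; round_serial_demo(a1); end

# Flat text report, sorted by sample count (instead of the Juno pane).
Profile.print(format = :flat, sortedby = :count)

# Bytes allocated by one call; large values point at allocations in the hot path.
bytes = @allocated round_serial_demo(a1)
println("allocated bytes: ", bytes)

# Non-concrete types (e.g. Any) in this output hint at dynamic dispatch.
@code_warntype round_serial_demo(a1)
```

Lines with many samples in the flat report correspond to the long bars in Juno; the other two checks cover what the red and yellow highlighting shows.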