DistributedArrays: unexpected behavior before modifying localpart

I have been trying to learn enough about distributed arrays to solve a simple PDE in parallel; however, something goes wrong when I access localpart data right before modifying it. I'm trying to understand why, and to see whether I'm on the right path with DArrays.

Here is a working example:

using Base.Test
addprocs(4-nprocs())                  # make sure there are 4 processes in total
@everywhere using DistributedArrays   # load the package on every worker
dArray = distribute(collect(linspace(1, 6, 6)))   # 6-element vector split across the 3 workers

@parallel for p in procs(dArray)
    firstLocalId = 2*p - 3                 # first global index owned by worker p
    localpart(dArray)[1:2, :] = 1          # overwrite this worker's two local elements
    println("indexing at ", firstLocalId, " gives ", dArray[firstLocalId, 1])
end

@test sum(dArray) == 6

It prints the expected output:
From worker 3: indexing at 3 gives 1.0
From worker 4: indexing at 5 gives 1.0
From worker 2: indexing at 1 gives 1.0
and the test passes.

However, if I move the println to just before the line that modifies the data:

@parallel for p in procs(dArray)
    firstLocalId = 2*p - 3
    println("indexing at ", firstLocalId, " gives ", dArray[firstLocalId, 1])
    localpart(dArray)[1:2, :] = 1
end

I still get a sensible output:
From worker 3: indexing at 3 gives 3.0
From worker 4: indexing at 5 gives 5.0
From worker 2: indexing at 1 gives 1.0
but the test fails (I think because something crashed somewhere during the parallel execution?).

I also have two more related questions:

  1. Why does Julia not crash when there is a syntax error or runtime error inside an @parallel loop? The loop runs to completion but leaves the distributed array unchanged (which is why I added the @test to my example). Is it expected behavior for Julia not to throw errors from parallel executions?
  2. Is @parallel for p in procs() the same as
for p in procs()
    @spawnat p begin
       ...
    end
end

For my examples, the results look the same. Does one perform better than the other? Thanks.

The documentation for @parallel states:

  Note that without a reducer function, @parallel executes asynchronously, i.e. it spawns independent tasks on all available workers and returns immediately without waiting for
  completion. To wait for completion, prefix the call with @sync, like:

  @sync @parallel for var = range
      body
  end

So in both cases you should put @sync in front of your @parallel loop, which will also catch and show any errors.
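
For example, here is a minimal sketch of your second loop with @sync added (same setup and the same dArray as in your first snippet):

@sync @parallel for p in procs(dArray)
    firstLocalId = 2*p - 3
    println("indexing at ", firstLocalId, " gives ", dArray[firstLocalId, 1])
    localpart(dArray)[1:2, :] = 1
end

@test sum(dArray) == 6   # only runs after every worker has finished, so the race is gone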

@parallel will be faster than your manual for loop since it starts the distributed computation asynchronously.
It is equivalent to:

for p in procs()
    @async @spawnat p begin
       ...
    end
end
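
If you prefer the explicit @spawnat form, the same @sync trick applies to it as well. A sketch, again reusing the dArray from your example:

@sync for p in procs(dArray)
    @spawnat p begin
        localpart(dArray)[1:2, :] = 1   # modify this worker's local part
    end
end

@test sum(dArray) == 6   # the remote work has completed by the time we get here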