Writing parallel for loops

I’ve got a program where I want to parallelize some for loops. If I add the @parallel macro, will it automatically detect my number of cores and use them? Or do I have to declare that somewhere beforehand? The examples in the docs I saw are geared towards running stuff from the interpreter.

If I define a SharedArray for use in parallel processing (as mentioned in the docs), is there any downside downstream? I assume its type is still SharedArray after the parallel processing. Basically, can I use a SharedArray just like a normal Array in linear processing?

Finally, is there a nice way to switch between linear and parallel processing? Ideally it’d be something like
@parallel(len_data > 1000). Is there something like that, so one can avoid having to write separate code for linear and parallel use?

No, it won’t detect them automatically. Check out the documentation sections on addprocs or on starting Julia with julia -p n.
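For example, the usual setup looks something like this (a sketch in the 0.6-era syntax used in this thread; the count of 4 is an assumption, substitute your core count):

# start Julia with workers already attached:  julia -p 4
# or add them from within the session/script:
addprocs(4)               # spin up 4 worker processes
# addprocs(Sys.CPU_CORES) # or one per detected core

nprocs()   # total number of processes (master + workers)
workers()  # the worker ids, e.g. [2, 3, 4, 5]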

Yes. SharedArrays share their data, so in many cases they are not as fast as a strictly parallel operation that avoids them. It can often be faster to write an algorithm that splits the data across processes exactly the way you need it and does the parallel reduction on independent pieces. YMMV.
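To the other question: yes, it’s still a SharedArray afterwards, and you can index it like a normal Array in serial code. A rough sketch (names and sizes made up; note that @parallel without a reducer returns immediately, hence the @sync):

addprocs(3)

data = SharedArray{Float64}(10_000)  # shared memory, visible to all workers

@sync @parallel for i in 1:length(data)
    data[i] = sqrt(i)                # each worker fills its own chunk
end

sum(data)    # behaves like a normal array in serial code
data[1:10]   # indexing and slicing work as usual
sdata(data)  # the underlying plain Array, if you ever need one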

A conditional? Or write a macro that spits out this kind of code. But you have to be careful, because @parallel isn’t exactly the same as a plain loop.
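The conditional version could look something like this (a sketch; the threshold, f, and data are placeholders, and f would have to be defined @everywhere for the parallel branch):

function total(data)
    if length(data) > 1000
        # parallel reduction; data is shipped to the workers via the closure
        return @parallel (+) for i in 1:length(data)
            f(data[i])
        end
    else
        # plain serial loop
        s = 0.0
        for i in 1:length(data)
            s += f(data[i])
        end
        return s
    end
end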

I’m still confused about how to declare which parts of a program are accessible from the different workers. At a relatively low level in a program of mine, I have a for loop I want to parallelize.

module module_name
[...]
addprocs(3)
totalsum = @parallel (+) for i in 1:large_number
    tmp_sum = 0
    for j in 1:num
        ...  # calls f1 f2
    end
    tmp_sum  # not sure how to 'return' the result; the examples have a conveniently placed calculation at the end
end
rmprocs([2, 3, 4])
[...]
end

As I understand it, I’d have to put the @everywhere macro in front of f1 and f2. But the program fails well before that, with the additional workers complaining UndefVarError: module_name not defined, and I have no clue how to fix that.

I feel like I’ve missed something needed for setting up parallel processing. As I understood it, other than writing the actual @parallel part, one needs to addprocs and then add the @everywhere macro to the functions used inside the loop. Is that really it?

I know pmap is better suited for what I’m doing here, but I wanted to get the simpler option to work first (I’d need to pass several arguments to the pmap function).
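For what it’s worth, pmap does take several collections, zipped element-wise like map, and fixed parameters can be closed over; g, as, bs, and xs here are made-up names:

results = pmap((a, b) -> g(a, b), as, bs)   # multiple collections, like map
results = pmap(x -> g(x, fixed_param), xs)  # or close over a fixed argument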

You defined the module in the global scope, so like everything else in the global namespace, you need to make sure it’s on the other processes… so @everywhere it. Packages are exempt from this because using knows to check the load path directly, so each process can import the module itself.
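Concretely, something along these lines should work (a sketch with placeholder function bodies):

addprocs(3)                      # add the workers *before* the definition below

@everywhere module module_name
    f1(x) = x + 1                # placeholder bodies
    f2(x) = 2x
end

# every process now has the module, so the parallel loop can call into it
totalsum = @parallel (+) for i in 1:1000
    module_name.f1(i) + module_name.f2(i)
end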

Yup. If you’ve added the processes, and each process has the required functions, then it will run in parallel just fine.