Is the @threads macro available for multithreading over multiple CPUs, or does it only work for setups with one CPU?
It’s unclear what you’re asking. Do you want to distribute your work across multiple CPUs or processes? Multithreading (ability to spawn multiple tasks within a single process on a single machine) is distinct from distributed computing (ability to have multiple processes communicating with each other, running on one or more machines) and they use distinct tools.
I’m looking at a single machine with 2 CPUs, each of which has multiple cores, so I don’t know whether it is possible to use multithreading in this case.
You can start a single Julia process with as many threads as you want, though it generally makes sense to use no more than the number of logical threads your system supports, and often something closer to the number of physical cores you have. @threads will distribute your workload across those threads.
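To make that concrete, here is a minimal sketch. The function name `square_all` and the workload are made up for illustration; `nthreads()`, `Sys.CPU_THREADS` and `@threads` are the standard Julia names. The thread count is fixed when Julia starts, e.g. with `julia --threads=8` or the `JULIA_NUM_THREADS` environment variable.

```julia
using Base.Threads

# Number of threads this Julia session was started with.
println("Julia threads: ", nthreads())
# Logical CPU threads reported by the system.
println("Logical CPUs:  ", Sys.CPU_THREADS)

# Hypothetical workload: square every element, splitting the loop
# iterations across the available threads.
function square_all(xs)
    out = similar(xs)
    @threads for i in eachindex(xs)
        out[i] = xs[i]^2   # each iteration writes to its own slot, so no data race
    end
    return out
end

result = square_all(collect(1:10))
```

On a dual-socket machine the same code should run unchanged; you would just start Julia with more threads.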
It’s an implementation detail, but when a machine has two or more CPUs, they share the same memory address space (with non-uniform access, “NUMA”), and I believe the total number of cores across all CPUs is reported. That’s what Julia looks at and exploits, as it should.
When you have two or more CPUs “in the same machine” but the memory space isn’t shared, it’s called a cluster, and it’s then impossible to share threads across them, so you use Distributed instead. Each “computer” in the cluster has its own process and address space. The computers together form the cluster, but each computer has its own core count.
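For contrast, a rough sketch of the Distributed approach. It assumes two local worker processes purely for illustration (on a real cluster you would pass a machine list or use a cluster manager), and `slow_square` is a made-up function.

```julia
using Distributed

# Start two extra worker processes on this machine.
# Each worker is a separate process with its own address space.
addprocs(2)

# Hypothetical function; @everywhere defines it on every process.
@everywhere slow_square(x) = x^2

# pmap sends each element to a worker and collects the results in order.
results = pmap(slow_square, 1:4)

# Shut the workers down again.
rmprocs(workers())
```

Note that data is copied between processes here rather than shared, which is the main practical difference from `@threads`.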
I am interested in learning more on this topic as well. If I buy a dual-CPU high-end computer, does it really appear to Julia (or other programs) as a single machine? I always steered clear because I thought that it would not, and I would run into this distributed fiasco, when really I just want a single powerful computer.
I suggest reading up on the first touch policy and NUMA domains.
It will appear and work like one machine, but you’ll need to be much more careful to get good performance.
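A small sketch of what first touch means in practice, under some assumptions: the OS places a memory page on the NUMA node of the thread that first writes to it (the usual default on Linux), and threads stay on the same cores between the two loops, which Julia does not guarantee unless threads are pinned. The names `alloc_first_touch` and `scale!` are made up.

```julia
using Base.Threads

# Allocate without initializing, then let each thread write ("touch")
# the chunk it will later work on.
function alloc_first_touch(n)
    xs = Vector{Float64}(undef, n)
    # :static gives each thread one fixed, contiguous block of iterations.
    @threads :static for i in 1:n
        xs[i] = 0.0
    end
    return xs
end

# Compute with the same static schedule, so each thread mostly accesses
# the pages it initialized itself.
function scale!(xs, a)
    @threads :static for i in eachindex(xs)
        xs[i] = a * xs[i] + 1.0
    end
    return xs
end

xs = scale!(alloc_first_touch(1_000), 2.0)
```

Initializing the array serially (e.g. with `zeros(n)`) would instead tend to put all pages on one node, so threads running on the other CPU would pay for remote memory access.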
I first recall dual-CPU setups from the Pentium Pro days, i.e. two separate CPUs on the motherboard. Is there any reason to do that now? You can now get one or two dies in the same chip. I’m not sure, but those might look like two CPUs to the computer. If that’s what you mean, i.e. a multi-chip module, then in the past it meant that the two chips inside had separate caches (but shared memory, with cache traffic between them slowing things down); now I think you would have one shared L3 in the chip to bridge those two CPUs. I very much doubt it should worry you that it looks like two CPUs to the OS, but I think having more than one CPU would very likely be overkill for most people.
Having a motherboard that comes with two CPU sockets gives you a slight edge when it comes to more cores and performance.
Are you looking for throughput performance (as in a web server), or performance for a single app? As Elrod mentioned, it’s hard to get good performance out of one app, but with tuning the gain might be more than “a slight edge”. I think he means HPC-style, not throughput-style, performance.
I’m not sure; the reason for these dual-CPU setups might be that two cheaper CPUs have a lower combined cost, i.e. for throughput. You might want to max out both CPUs, or not. I’m not sure why people do this, and I doubt it helps in, e.g., games.
This still exists for servers. Dual-CPU servers are getting less common now that single CPUs go up to 64 cores, but they are still pretty common (either for going to 128 cores or for the extra PCIe lanes).