BLAS/LAPACK are much faster with AVX-512, as is native code if you’re willing to vectorize all the bottlenecks. Besides double-width vectors, AVX-512 also offers twice as many vector registers (32 instead of 16, reducing register pressure) and efficient masking, which can make vectorizing with the likes of SIMD.jl easier.
Although masked instructions are about as efficient as their unmasked counterparts, unfortunately no compiler and very few libraries take advantage of them. Some of mine do, which is why PaddedMatrices.jl – which uses masking to handle unpadded matrices – was about 3x or more faster than Eigen for most small, statically sized (unpadded) matrices.
Last I tested, BLAS/LAPACK only benefit from AVX-512 if you’re using MKL, not OpenBLAS.
Unfortunately, the cheapest AVX-512 CPU I see from a quick search is a pre-owned 6-core 7800X for $300 on eBay. That’s 50% more than the Ryzen 3600, and the Ryzen has higher clock speeds and less than half the TDP.
For the CPU, unless you’re super excited about vectorization, the new Ryzens look like much better deals.
Old Ryzens did have half-rate 256-bit FMA throughput, which is bad for numerics and BLAS/LAPACK in particular. The 3600 & Co. are full-rate.
I compared my 9940X’s GeekBench results against a prototype of the upcoming 16-core Ryzen 3950X that recently made the news as “record setting”.
While my CPU came out behind in the multithreaded score (unless I overclocked), it did much better on the single-threaded SGEMM and SFFT subtests: 200.3 and 18.3 GFLOPS vs 98.8 and 13.5 GFLOPS.
So in the particular tasks I spend most of my time on, it does perform better.
Then again, the 3950X will debut for not much over half the cost of the 9940X, and at higher clock speeds than the GeekBenched part…