Hi. I am very new to Julia and am not a computer geek so this question may not be very clear. I am happy to add any information as necessary.
I am running Julia in VSCode on Windows. I recently added some memory sticks to the PC (going from 128 GB to 256 GB) and found that Julia became significantly slower. I tried several things, in this order: moving the memory sticks to different slots, reinstalling Julia and VSCode, and disabling hyperthreading. Nothing worked.
I then decided to install Windows Server on my computer, hoping that Julia would behave normally under the new system. It was still slow.
Could anyone give me some suggestions on what to do? Thanks!
Did you run benchmarks, or is this subjective?
What is the speed of the memory, new and old?
Maybe an XMP profile was enabled before, but is disabled now? That is, if you buy RAM marked at 3200 MHz and install it, it will run at 2133 MHz until you enable the appropriate memory profile (or manually tweak settings).
Thanks for the reply.
I did not run any benchmarks. I ran the same Julia code, and it takes twice the time to finish (recorded by ProgressMeter). I also noticed that the CPU usage is significantly lower. It dropped from 30-40% to 10-15%.
My memory sticks are 2133 MHz. I have dual Xeon E5-2690 v4 CPUs @ 2.60 GHz.
It’s hard to help since your post lacks specifics, especially code. That said…
The drop in CPU usage makes me wonder if your code is now running on a single CPU core after previously running on several. Is your code multi-threaded? If so, try running it again in a Julia session started with julia --threads X, where X is the number of cores on your machine. Or set the JULIA_NUM_THREADS environment variable as described in the Julia manual.
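A quick way to confirm what the session actually got is to ask Julia itself. This is only a sketch using Base functions; the variable names are made up for illustration.

```julia
# Compare the threads this Julia session was started with to what the machine offers.
julia_threads = Threads.nthreads()   # set by --threads or JULIA_NUM_THREADS
logical_cpus  = Sys.CPU_THREADS      # logical CPUs reported by the OS
println("Julia threads: ", julia_threads, " of ", logical_cpus, " logical CPUs")

# Small check that a threaded loop is really spread over several threads.
ids = zeros(Int, 4 * julia_threads)
Threads.@threads for i in eachindex(ids)
    ids[i] = Threads.threadid()      # record which thread handled this index
end
println("Distinct thread ids seen: ", length(unique(ids)))
```

If the first line reports 1 thread, any Threads.@threads loops in your code run serially, which would fit the low CPU usage.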
I used CINEBENCH to run a benchmark. The multi-core score was 18500 points and the single-core score was 769 points. I’m not sure how to interpret these scores. Given that I have a 5-year-old computer (running Windows Server 2019 Standard) with dual Intel Xeon E5-2690 v4 processors (28 cores in total @ 2.6 GHz), I am guessing these are okay scores.
It is possible that you are accidentally using multithreading via BLAS if you work with Float32s. Otherwise it would be impossible to hit 40% total CPU utilization on a 28 core machine.
Either way, I am unsure how the RAM could be meaningful here.
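To see whether BLAS threading is in play, something like the following could be run. It is a rough sketch: the 1000×1000 matrix size is an arbitrary choice, and the timings only mean something relative to each other on the same machine.

```julia
using LinearAlgebra

original = BLAS.get_num_threads()
println("BLAS threads currently in use: ", original)

# Time the same dense matrix multiply with one BLAS thread and with the original setting.
A = rand(1000, 1000)
B = rand(1000, 1000)
A * B                                # warm-up so compilation is not timed

BLAS.set_num_threads(1)
t_single = @elapsed A * B

BLAS.set_num_threads(original)       # restore the previous setting
t_multi = @elapsed A * B

println("1 BLAS thread: ", round(t_single, digits = 4), " s")
println(original, " BLAS thread(s): ", round(t_multi, digits = 4), " s")
```

A commonly suggested starting point is one BLAS thread per physical core, or a single BLAS thread when the Julia code is already multi-threaded, but that is a rule of thumb to be checked by timing, not a guarantee.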
I also doubt that RAM is the problem here. I just can’t wrap my head around what has happened. Sorry if my question sounds naïve, but what is the “right” number of BLAS threads I should be using?
I have experimented with multithreading. Although my computer has 28 cores, low thread counts (4) seem to work better than high thread counts (28 or more). In either case, CPU usage was low: it ranges from 10-15% when the program first starts and goes up to 50-60% after the program has been running for 8-9 hours. Ideally, I would like to create a log documenting the relevant information but, as you mentioned in your response, I am not sure what information you would need in order to make effective suggestions.
Sorry for the long reply, and thanks for your advice!
A bit of a speculative thing to suggest here. Are there power saving / capping settings here? Your old OS may have had some power saving settings and the new OS more restrictive ones?
My advice would be to find the latest BIOS for your motherboard then set BIOS options to maximum performance not power saving. Set fan profile to maximum performance.
Then find a Windows tool which displays CPU states as you are running a Julia program.
Also you refer to moving the memory sticks to different slots. Hold on there. Consult your motherboard manual and place the DIMMs as recommended to keep memory channels balanced.
Many servers have a diagram on the chassis lid which shows you how to populate DIMM slots.
If we are talking about memory performance I have been running the stream benchmark on an HPC cluster this weekend. Oh the joys of being an HPC engineer…
You would have to look for benchmark numbers from a similar system to see if your system is underperforming.
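For a rough memory bandwidth number without leaving Julia, a STREAM-style "triad" kernel can be sketched as below. This is not the official STREAM benchmark: it is single-threaded, the array size is an assumption chosen to be larger than the CPU cache, and the result is only useful for comparing the same machine before and after a change.

```julia
# STREAM-style triad: a[i] = b[i] + s * c[i]
function triad!(a, b, c, s)
    @inbounds @simd for i in eachindex(a, b, c)
        a[i] = b[i] + s * c[i]
    end
    return a
end

n = 10_000_000                       # 80 MB per Float64 array (assumed to exceed the cache)
a = zeros(n)
b = rand(n)
c = rand(n)

triad!(a, b, c, 2.0)                 # warm-up so compilation is not timed
t = minimum(@elapsed(triad!(a, b, c, 2.0)) for _ in 1:5)

gb_moved = 3 * n * sizeof(Float64) / 1e9   # two arrays read, one written
println("Approximate single-thread bandwidth: ", round(gb_moved / t, digits = 1), " GB/s")
```

If the number changes noticeably when DIMMs are moved between slots, that points at unbalanced memory channels.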
Agreeing with @Tamas_Papp: you need a baseline.
I do benchmarking of HPC systems - the Linpack benchmark still rules, for two reasons: (a) it exercises systems and heats them up very well, and (b) you can compare like with like back several generations.
I believe the Phoronix benchmark suite is pretty well respected in the Windows world.
Thanks for your suggestions. I changed my BIOS settings as you recommended, and my computer did speed up quite a bit! This time I tracked its performance using CINEBENCH, and the scores went up after I changed the BIOS settings.
The same odd thing happened when I ran my code, though: the estimated time (calculated by ProgressMeter) initially went down and then gradually increased. Before I added the memory sticks, the estimated time never went up. I cannot figure out whether this is because of my code (which I did not change during this process) or because of my computer.
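One way to separate the two would be to log how long each chunk of iterations takes, together with garbage collection time and allocations, instead of relying on the ProgressMeter estimate alone. The sketch below uses only Base; work is a hypothetical stand-in for one iteration of the real program, and run_logged is an invented helper name.

```julia
# Hypothetical stand-in for one iteration of the real workload.
work(n) = sum(abs2, rand(n))

# Time the loop in chunks and record wall time, GC time and allocations per chunk.
function run_logged(iterations; chunk = 10, n = 100_000)
    records = NamedTuple[]
    for start in 1:chunk:iterations
        stop = min(start + chunk - 1, iterations)
        stats = @timed for _ in start:stop
            work(n)
        end
        push!(records, (first_iter = start,
                        seconds = stats.time,
                        gc_seconds = stats.gctime,
                        mb_allocated = stats.bytes / 1e6))
    end
    return records
end

for row in run_logged(50)
    println(row)
end
```

If seconds grows from chunk to chunk while gc_seconds grows with it, the slowdown is more likely related to memory use in the code than to the hardware. If every chunk does the same work and slows down with no change in allocations, the machine (clock speed, power limits, memory configuration) becomes the more likely suspect.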