
Threads vs. Watts

So I have been testing, again.

My hapless test subject this time is a Dell Box, an R630.

It has a comfortable 384GB of memory, one of two 25 GBit/s ports active, and it comes with two E5-2690v4 CPUs. That gives it 14 cores per die, 28 cores in total, or with hyperthreading, 56 threads.

$ cat /proc/cpuinfo | grep 'model name' | uniq -c
56 model name : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
$ ./mprime -v
Mersenne Prime Test Program: Linux64,Prime95,v28.10,build 1

I have not been nice: I have been abusing the box with mprime (the Linux build of Prime95) in Torture Test mode, trying to make it consume as much power as possible.

$ cat prime.txt
V24OptionsConverted=1
WGUID_version=2
StressTester=1
UsePrimenet=0
MinTortureFFT=8
MaxTortureFFT=64
TortureMem=0
TortureTime=3
TortureThreads=56

[PrimeNet]
Debug=0

Running this with varying values for TortureThreads shows the impact of CPU usage on power consumption. When idle, the box consumes 170W. When busy, it’s some 406 to 420W.
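Sweeping TortureThreads means rewriting prime.txt before each run. A minimal sketch of that step in Python, assuming the config format shown above. The helper name `set_torture_threads` and the shortened sample config are made up for illustration, and the code only produces the new file content: starting mprime and reading the power draw are left out.

```python
import re


def set_torture_threads(config_text: str, threads: int) -> str:
    """Return config_text with its TortureThreads line set to `threads`.

    Hypothetical helper: it edits text in the prime.txt format shown in
    the post and does nothing else.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    pattern = re.compile(r"^TortureThreads=\d+$", flags=re.MULTILINE)
    if not pattern.search(config_text):
        raise ValueError("no TortureThreads line found")
    return pattern.sub(f"TortureThreads={threads}", config_text, count=1)


# Shortened copy of the config from the post, for demonstration only.
SAMPLE = (
    "StressTester=1\n"
    "TortureTime=3\n"
    "TortureThreads=56\n"
    "\n"
    "[PrimeNet]\n"
    "Debug=0\n"
)

# Thread counts chosen for illustration, not the ones used in the post.
for n in (1, 2, 4, 8, 14, 28, 56):
    new_text = set_torture_threads(SAMPLE, n)
    print(new_text.splitlines()[2])
```

Each generated text would be written to prime.txt before the next torture run.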

Running at almost full power.

The power consumption is not linear over the whole range. It is roughly linear up to 28 threads, and then more or less plateaus.

Watts used, by number of threads busy.

From 170W idle to 406W at 28 Threads busy, then basically static for the rest.
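That shape can be written down as a small piecewise-linear estimate. This is a sketch built only from the two endpoints quoted above (170W idle, 406W at 28 threads), not from the measured curve. It ignores the small rise toward 420W, and the function name is made up for illustration.

```python
IDLE_W = 170.0         # idle draw, from the post
FULL_CORES_W = 406.0   # draw with 28 threads busy, from the post
PHYSICAL_CORES = 28    # 2 sockets x 14 cores


def estimated_watts(threads: int) -> float:
    """Linear from idle up to 28 busy threads, flat beyond that.

    Assumption: a straight line between the two quoted endpoints.
    """
    if threads < 0:
        raise ValueError("threads must not be negative")
    slope = (FULL_CORES_W - IDLE_W) / PHYSICAL_CORES  # about 8.4 W per thread
    return IDLE_W + slope * min(threads, PHYSICAL_CORES)


for n in (0, 7, 14, 28, 56):
    print(f"{n:2d} threads: about {estimated_watts(n):.0f} W")
```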

If you think about this, it makes a lot of sense.

At 28 threads busy, all the physical cores are being kept busy by mprime, which is very much aware of architecture and topology: it analyzes which threads are located on which cores and dies, and then sets itself up with CPU affinity for maximum utilization. So at 28 threads busy (50% of the hardware threads) we are already pretty much at full power consumption.

Unfortunately, looked at the other way around, this is not as nice: if you define full power usage as 420W and want to spend only half of that, 210W, you will reach that point at around 4-6 threads busy. About 10% of the potential compute already consumes 50% of the power.
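The half-power point can be checked with a bit of arithmetic. A rough sketch, assuming power rises in a straight line from 170W idle to 406W at 28 threads and stays flat beyond that. The function name and its defaults are made up for illustration.

```python
def threads_for_budget(budget_w: float,
                       idle_w: float = 170.0,
                       full_w: float = 406.0,
                       cores: int = 28) -> float:
    """Busy threads that fit into budget_w under a linear-then-flat model.

    Assumption: straight line from idle_w at 0 threads to full_w at
    `cores` threads, so the result is capped at `cores`.
    """
    if budget_w <= idle_w:
        return 0.0
    watts_per_busy_thread = (full_w - idle_w) / cores
    return min((budget_w - idle_w) / watts_per_busy_thread, float(cores))


half_power = 420.0 / 2  # half of the 420W taken as full power in the post
n = threads_for_budget(half_power)
print(f"{half_power:.0f} W is reached at about {n:.1f} busy threads, "
      f"{n / 56:.0%} of the 56 hardware threads")
```

Under these assumptions the result lands inside the 4-6 thread range quoted above.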

Another way to think about it is Watts per thread:

Watts per Thread.

A nice hockey stick. The inflection point is at around 6 busy threads. From a bang/buck perspective, things become interesting if the box is at load 6 or higher, at all times.
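The hockey stick follows directly from dividing total draw by busy threads: the 170W idle cost is spread over more and more work. A sketch under the same simplification as the figures quoted earlier (linear from 170W idle to 406W at 28 threads, flat afterwards). These are estimated values, not the measured ones, and the function name is made up for illustration.

```python
IDLE_W = 170.0
FULL_CORES_W = 406.0
PHYSICAL_CORES = 28


def watts_per_thread(threads: int) -> float:
    """Total estimated draw divided by the number of busy threads."""
    if threads < 1:
        raise ValueError("need at least one busy thread")
    slope = (FULL_CORES_W - IDLE_W) / PHYSICAL_CORES
    total = IDLE_W + slope * min(threads, PHYSICAL_CORES)
    return total / threads


for n in (1, 2, 4, 6, 14, 28, 56):
    print(f"{n:2d} threads: {watts_per_thread(n):6.1f} W per thread")
```

The steep drop happens within the first handful of threads. After that the curve flattens, which is what the chart shows.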

From a data center planning perspective, it becomes clear that there is no way to make thermally oversubscribing racks financially attractive. It is better to build for full thermal utilisation without oversubscription, and then make sure the boxes are always as busy as possible. Computation past a base load of 6 is basically available for free from a power/cooling point of view.

Published in: Containers and Kubernetes, Data Centers

5 Comments

  1. Martin

    Brutal numbers, never thought about it that way.

    • kris

      Thanks, I fixed that.

  2. Thomas J.

    You really should look into what that idle power is comprised of. Our Bull DLC blades have much lower idle power levels. (Around 75W and already accounting for the built-in IB switch). Looks to me like the Dell is not a good match for compute-heavy loads or at least not configured to be.

    • kris

      I am aware of the high standby power. All machines are in high-performance mode, all energy savings in the BIOS turned off. That’s intentional, and the other modes (energy saving and mixed mode) perform worse for the intended workload.

      This was not about controlling total power consumption. The experiment was about the relationship between offered load and power consumption.
