Ancient, Overweight, Yet Alive And Kicking

Contrary to what you might expect, the title of this post isn't about me. It's about parting ways with my old Dell T5500. More accurately, she's going to be parked in the storage room until the kids are old enough to game on a PC, but she isn't going to be simulating any structures anymore…

As in any starting business, you want to manage your expenses before income starts flowing. So I went to the used PC market (not physically) and bought myself an insanely heavy Dell T5500. Seriously, these things are built to last!

So I removed all the unwanted hardware from it.

Upgraded it with 7 RAM sticks of 8 GB each.

Applied new thermal paste, added some extra chassis fans for the GPU and an SSD, and voilà! A bona fide workstation (oh, and a black paint job. The silver face was annoying).

Here are some specs for us geeks:

This machine served me well, ran quite a few simulations and held its own, for sure. But the day has come, and I've upgraded to a lovely Lenovo P920 that serves both as a simulation machine and a very good space heater while solving FEM matrices. Some more specs!

Although I would rather get my hands on something with PCIe 4.0 (even though this workstation was released in 2019, two years after the release of PCIe 4.0), it is still a very impressive machine. So let's see what 8 years of Intel Xeon core improvement have done.

Setting The Marks

So before delving into the fine details, let's try a standard benchmark. I've got a standard CST benchmark here, called "compute layers". I've modified it a bit so the mesh refinement iterations pass a bit faster, but it's the same idea. Both machines run CST 2023 SP5.

Important to note: I am not using the iterative solver here at all. I am testing only the direct solver, which is what I use most of the time anyway. I find it a lot faster, most of the time.
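For readers less familiar with the distinction, here's a toy sketch of the two solver families on a small test system. This is not CST's actual solver, of course, just a minimal numpy illustration: the direct path computes the answer in one (expensive) shot, while the iterative path (a hand-rolled conjugate gradient here) refines a guess until the residual is small enough.

```python
# Toy direct vs. iterative solve on a 1-D Laplacian test matrix.
# Illustration only; CST's FEM solvers are far more sophisticated.
import numpy as np

def cg(A, b, tol=1e-10, maxiter=10000):
    """Minimal conjugate-gradient iterative solver (for illustration)."""
    x = np.zeros_like(b)
    r = b - A @ x                 # initial residual
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p            # refine the current guess
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            break                 # residual small enough: converged
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n = 200
# Symmetric positive-definite tridiagonal test matrix (1-D Laplacian)
A = (np.diag(2.0 * np.ones(n))
     + np.diag(-1.0 * np.ones(n - 1), 1)
     + np.diag(-1.0 * np.ones(n - 1), -1))
b = np.ones(n)

x_direct = np.linalg.solve(A, b)  # direct: factorize, exact in one shot
x_iter = cg(A, b)                 # iterative: converge to the same answer
assert np.allclose(x_direct, x_iter)
```

Whether the direct solver wins in practice depends on problem size and conditioning; for the problems I run, it usually does.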

So let's run it on the new machine!

OK, it's a lot of RAM, but the 64GB Dell solid metal brick should be able to handle it! Right?

Oh boy… It used up the system memory and started caching on the hard drive. Even with an SSD, this is too slow.

OK, let's reduce the workload a bit.

Is this a fair test though? Let’s overview the two workstations a bit.

The P920 has a pretty standard 2-CPU configuration. Three dual channels are attached to each CPU. There is a 4th one, but the manufacturer does not recommend populating it. This is the reason, by the way, that I chose to buy 384GB of RAM. The other choice would have been 192GB, using 16GB memory sticks.

The T5500's 2nd CPU is not attached directly to the board. It sits on an externally connected riser board that has 3 single DDR3 channels, meaning the available bandwidth for this CPU will be lower.
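If you want to see the bandwidth gap yourself, a crude probe is to time a large array copy, a poor man's stand-in for a proper STREAM benchmark. The sketch below is mine, not part of the original comparison, and its numbers are only indicative:

```python
# Rough memory-bandwidth probe: time a large array copy and report
# GB/s. A crude stand-in for STREAM; numbers are only indicative.
import time
import numpy as np

n = 20_000_000                    # ~160 MB of float64, far larger than cache
src = np.ones(n)
dst = np.empty_like(src)

best = float("inf")
for _ in range(5):                # keep the best of a few runs
    t0 = time.perf_counter()
    np.copyto(dst, src)
    best = min(best, time.perf_counter() - t0)

gb_moved = 2 * src.nbytes / 1e9   # one read + one write per element
print(f"~{gb_moved / best:.1f} GB/s effective copy bandwidth")
```

Run it pinned to each socket in turn and the riser-board CPU should come out noticeably behind.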

Let's limit ourselves, then, to the number of cores the older X5650 CPU has, and to the amount of readily available RAM it has, 40GB. I'll actually start with an even smaller simulation and see if I can spot the difference there. Let's limit both machines to 6 threads on a single CPU, and see:

T5500                          P920
Matrix setup time : 5 s        Matrix setup time : 3 s
Solver setup      : 105 s      Solver setup      : 20 s
Solver time       : 2 s        Solver time       : 1 s

Compare two panels, with 6 cores

Let’s look at the specific lines in the log file and define the figures of merit here.

  1. The first line, "Matrix setup", is how long it takes to set up the FEM matrix in memory.
  2. "Solver setup" is when the matrix factorization happens. This is relevant for the direct solver, which I mostly use. Here, if I understand correctly, the FEM matrix is factorized into an LU matrix pair (see below).
  3. "Solver time" is when the linear system is actually solved, by back substitution using the factored matrices.
LU Decomposition
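To make the split between those last two log lines concrete, here's a toy version of the direct-solver pattern in plain numpy. This is my own sketch (Doolittle LU without pivoting, so illustration only, not what CST actually runs): the factorization is the O(n³) step that dominates "Solver setup", while the forward/back substitution behind "Solver time" is only O(n²).

```python
# Toy direct solver: factorize A = LU once (expensive, like "Solver
# setup"), then solve by substitution (cheap, like "Solver time").
import numpy as np

def lu_decompose(A):
    """Doolittle LU factorization without pivoting (illustration only)."""
    n = len(A)
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        factors = U[k + 1:, k] / U[k, k]
        L[k + 1:, k] = factors                 # store elimination factors
        U[k + 1:] -= np.outer(factors, U[k])   # eliminate below the pivot
    return L, U

def lu_substitute(L, U, b):
    """Forward substitution (L y = b), then back substitution (U x = y)."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in reversed(range(n)):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50)) + 50 * np.eye(50)  # diagonally dominant
b = rng.standard_normal(50)

L, U = lu_decompose(A)       # O(n^3): the dominant cost
x = lu_substitute(L, U, b)   # O(n^2): cheap once L and U exist
assert np.allclose(A @ x, b)
```

That asymmetry is also why a direct solver can reuse one factorization for many right-hand sides almost for free.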

So, since the resolution is in seconds, the answer here is that I need to look at all three of these combined.

To try and see the effects of the memory speeds, I'll also run the larger two-panel simulation, but again with the reduced core count. Let's also verify that it only runs on the first 6 cores, and not on the 2nd CPU:
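For completeness, here's one generic OS-level way to pin a process to the first six cores so it cannot spill onto the second socket. This is a Linux mechanism (CST on Windows has its own thread-count setting, which is what I actually used); the snippet is just to show the idea:

```python
# Pin the current process to the first six cores (Linux only).
# CST exposes its own thread-count setting; this is the generic
# OS-level equivalent, shown for illustration.
import os

cores = set(range(min(6, os.cpu_count())))  # first six cores (or fewer)
os.sched_setaffinity(0, cores)              # pid 0 = this process
assert os.sched_getaffinity(0) == cores
```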

Phew! Now to just wait for it to finish and crunch the numbers for the single-layer simulation.

Per-core, a 467% speedup is rather impressive! However, let's add to that the fact that the Gold 6154 core can boost all the way to 3.7 GHz, while the X5650 is limited to 3.07 GHz. So let's normalize the results by the clock speeds as well.

It's still almost a 4x, core-for-core, speedup. I can't really normalize out the memory speeds, but that's part of the deal. The memory here is almost 2x faster, so there's got to be something else going on as well…

Let’s see how these numbers add up while running the longer, 2 panel simulation:

Even crazier! I guess this is where the faster memory really kicks in.

Now let's just show off by running all 36 cores vs. all 12 cores. Just for funzees, I'll add in a reasonable RAM consumer, in the form of this brute-force horn antenna + reflector.

Epilogue

There you have it. Up to the year this CPU is relevant to, circa 2019, there were significant advancements even at the core level. Good on Intel for this fine device. I, for one, am very happy with this nifty piece of hardware and am already putting it to good use.

How would you benchmark this? Not limiting this question for CST, of course.

Some Thanks

Thank you, ADCOM team, for giving me the opportunity to post stuff like this and helping out in these troubled times.

Cheers!
