Rendering and HPC Benchmark Session Using Our Best Servers
by Johan De Gelas on September 30, 2011 12:00 AM EST
Quad Xeon: the Quanta QSSC-S4R Benchmark Configuration
CPU | Quad Intel Xeon "Westmere-EX" E7-4870 (10 core/20 threads at 2.4GHz, 2.8GHz Turbo, 30MB L3, 32nm) |
RAM | 32 x 4GB (128GB) Samsung Registered DDR3-1333 at 1066MHz |
Motherboard | QCI QSSC-S4R 31S4RMB00B0 |
Chipset | Intel 7500 |
BIOS version | QSSC-S4R.QCI.01.00.S012,031420111618 |
PSU | 4 x Delta DPS-850FB A S3F E62433-004 850W |
The quad Xeon configuration is equipped with 128GB RAM to make sure that all memory channels are filled.
Dual Xeon: ASUS RS700-E6/RS4 Configuration
CPU | Dual Intel Xeon "Westmere" X5670 (6 core/12 threads at 2.93GHz, 3.33GHz Turbo, 12MB L3, 32nm) |
RAM | 12 x 4GB (48GB) ECC Registered DDR3-1333 |
Motherboard | ASUS Z8PS-D12-1U |
Chipset | Intel 5520 |
BIOS version | Version 1.003 |
PSU | Delta Electronics DPS-770 AB 770W |
The dual Xeon server, in contrast, has "only" 48GB. This has no influence on the benchmark results, as the benchmarks use considerably less RAM.
Quad Opteron: Dell PowerEdge R815 Benchmarked Configuration
CPU | Quad AMD Opteron "Magny-Cours" 6174 (12 cores at 2.2GHz, 12MB L3, 45nm) |
RAM | 16x4GB (64GB) Samsung Registered DDR3-1333 at 1333MHz |
Motherboard | Dell Inc 06JC9T |
Chipset | AMD SR5650 |
BIOS version | v1.1.9 |
PSU | 2 x Dell L1100A-S0 1100W |
We have previously reviewed the powerful but compact Dell R815. This time we're running 64GB, though again the amount of RAM was selected to optimize memory performance rather than to meet usage requirements.
52 Comments
proteus7 - Tuesday, October 11, 2011
STREAM triad on a 4S Xeon E7 should hit about 65GB/s, unless your memory or UEFI/BIOS options are misconfigured. Firmware settings can make a HUGE difference on these systems.
Did you:
Enable Hemisphere mode?
Disable HT?
If running Windows, I assume it was Server 2008 R2 SP1?
If running Windows, realize that only certain applications, compiled with specific flags, will work on core counts over 64 (kgroup0)? Not an issue if HT was off.
Enable prefetch modes in firmware?
Ensure system firmware was set to max perf, and not power-saving modes?
If running Windows, set power options to the max performance profile? (The default power profile on Server drops perf substantially for short burst benchmarks.)
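To make the 64-processor point concrete, here is a rough Python sketch that counts logical processors for the three test systems, using the socket, core, and thread figures from the configuration tables. The 64-processor group size is the Windows limit referred to above; the helper names and the simple ceiling division are assumptions for illustration, since the actual group assignment is decided by the OS.

```python
# Logical processor counts derived from the configuration tables.
# WINDOWS_GROUP_SIZE is the 64-logical-processor group limit; the
# ceiling division below is only an estimate of the group count.
WINDOWS_GROUP_SIZE = 64


def logical_processors(sockets, cores_per_socket, threads_per_core):
    """Total logical processors the OS would see."""
    return sockets * cores_per_socket * threads_per_core


def groups_needed(logical):
    """Minimum number of 64-wide processor groups (ceiling division)."""
    return -(-logical // WINDOWS_GROUP_SIZE)


# (sockets, cores per socket, threads per core)
configs = {
    "Quad Xeon E7-4870, HT on": (4, 10, 2),
    "Quad Xeon E7-4870, HT off": (4, 10, 1),
    "Dual Xeon X5670, HT on": (2, 6, 2),
    "Quad Opteron 6174": (4, 12, 1),
}

for name, spec in configs.items():
    n = logical_processors(*spec)
    print(f"{name}: {n} logical processors, {groups_needed(n)} group(s)")
```

Only the quad Xeon with HT enabled (80 logical processors) exceeds 64, which matches the remark that the limit is not an issue with HT off.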
TPC-E is also a great benchmark to run (it needs some SSD storage/Fusion-io). HPCC/Linpack are good for HPC testing.
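For readers unfamiliar with how a STREAM triad figure such as 65GB/s is derived, the following is a minimal Python sketch of the triad kernel and the bandwidth arithmetic. It assumes double-precision arrays and the usual STREAM accounting of three arrays touched per element (two reads, one write). Pure Python is far too slow to measure real memory bandwidth, so the timing printed here only illustrates the calculation; the function names are hypothetical, not part of the STREAM source.

```python
import time
from array import array

BYTES_PER_ELEMENT = 8  # double precision
ARRAYS_TOUCHED = 3     # triad reads b and c, writes a


def triad(a, b, c, q):
    """STREAM-style triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bytes(n):
    """Bytes counted for one triad pass over n elements."""
    return ARRAYS_TOUCHED * BYTES_PER_ELEMENT * n


def bandwidth_gb_per_s(n, seconds):
    """Bandwidth in GB/s, with 1 GB taken as 10**9 bytes."""
    return triad_bytes(n) / seconds / 1e9


if __name__ == "__main__":
    n = 200_000
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start
    # Interpreter overhead dominates; this is not a memory benchmark.
    print(f"{bandwidth_gb_per_s(n, elapsed):.4f} GB/s (pure Python)")
```

The real benchmark is compiled C or Fortran with arrays sized well beyond the caches, which is why firmware and memory settings show up so strongly in its results.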
pventi - Monday, October 31, 2011
As you can read in the icc manual, when running on non-Intel processors the non-temporal prefetches are not implemented in the final machine code. This alone means it could be up to 27% faster.
Another reason why it's slower is that the "standard" HW configuration of the Opteron throttles the DRAM prefetchers when under load.
Under Linux this behaviour can be changed from the shell and should add another 5~10% increase in performance.
So this benchmark should show a ~30% higher number for the Opteron.
www.metarstation.com
Best Regards
Pierdamiano
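As a quick check of the arithmetic in the comment above, here is a small Python sketch that combines the two quoted gains. It assumes the gains are independent and compound multiplicatively, and it takes the 27% figure at its stated upper bound; neither assumption is confirmed by the comment, and `combined_gain` is a hypothetical helper.

```python
def combined_gain(*gains):
    """Combine fractional speedups multiplicatively, e.g. 0.27 for 27%."""
    factor = 1.0
    for g in gains:
        factor *= 1.0 + g
    return factor - 1.0


# Up to 27% from non-temporal prefetches, plus 5% to 10% from
# un-throttling the DRAM prefetchers (figures from the comment).
low = combined_gain(0.27, 0.05)
high = combined_gain(0.27, 0.10)
print(f"combined gain: {low:.1%} to {high:.1%}")
```

Under these assumptions the combined gain lands at roughly 33% to 40%, so the ~30% estimate in the comment sits slightly below the compounded upper bounds.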