The Impact of Bulldozer's Pipeline

With a new branch prediction architecture and an unknown, but presumably significantly deeper pipeline, I was eager to find out just how much of a burden AMD's quest for frequency had placed on Bulldozer. To do so I turned to the trusty N-Queens solver, now baked into the AIDA64 benchmark suite.

The N-Queens problem is simple. On an N x N chessboard, how do you place N queens so they cannot attack one another? Solving the problem is incredibly branch intensive, and as a result it serves as a great measure of the impact of a deeper pipeline.
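
To make the branch-heavy nature of the problem concrete, here is a minimal backtracking sketch that counts valid placements. It is only an illustration under my own assumptions: the function name `count_n_queens` and the set-based bookkeeping are hypothetical, and this is not the AIDA64 implementation, whose optimizations are not described here.

```python
def count_n_queens(n: int) -> int:
    """Count the ways to place n non-attacking queens on an n x n board."""
    used_cols = set()        # columns that already hold a queen
    used_diag_down = set()   # "\" diagonals, identified by row - col
    used_diag_up = set()     # "/" diagonals, identified by row + col

    def place(row: int) -> int:
        # Every row has a queen: one complete solution found.
        if row == n:
            return 1
        total = 0
        for col in range(n):
            # Data-dependent branch: whether the square is attacked depends on
            # all earlier placements, which makes it hard to predict.
            if (col in used_cols
                    or (row - col) in used_diag_down
                    or (row + col) in used_diag_up):
                continue
            used_cols.add(col)
            used_diag_down.add(row - col)
            used_diag_up.add(row + col)
            total += place(row + 1)
            # Backtrack: undo the placement before trying the next column.
            used_cols.remove(col)
            used_diag_down.remove(row - col)
            used_diag_up.remove(row + col)
        return total

    return place(0)


if __name__ == "__main__":
    print(count_n_queens(8))  # prints 92
```

Nearly all of the work sits in the attacked-square test inside the loop, so a mispredicted branch there stalls the pipeline over and over, and a deeper pipeline pays more for each miss.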

The AIDA64 implementation of the N-Queens algorithm is heavily threaded, but I wanted to get a look at single-core performance first, so I disabled all but a single integer/FP core on Bulldozer, as well as on the competing processors. I looked at both constant-frequency and turbo-enabled speeds:

Single Core Branch Predictor Performance—AIDA64 Queens Benchmark

Unfortunately things don't look good. Even with turbo enabled, the 3.6GHz Bulldozer part needs another 25% higher frequency to equal a 3.6GHz Phenom II X4. Even a 3.3GHz Phenom II X6 does better here. Without being fully aware of the optimizations at work in AIDA64 I wouldn't put too much focus on Sandy Bridge's performance here, but Intel is widely known for focusing on branch prediction performance.

If we let the N-Queens benchmark scale to all available threads, the performance issues are easily masked by throwing more threads at the problem:

SMP Branch Predictor Performance—AIDA64 Queens Benchmark

However, it is quite clear that for single or lightly threaded operations that are branch heavy, Bulldozer will be in for a fight.

430 Comments

  • Kristian Vättö - Wednesday, October 12, 2011 - link

    I'm happy that I went with i5-2500K. Performance, especially in gaming, seems to be pretty horrible.
  • ckryan - Wednesday, October 12, 2011 - link

    I was just going to say the same thing. I was all about AMD last year, but early this year I picked up an i5 2500K and was blown away by efficiency and performance even in a hobbled H67. Once I bought a proper P67, it was on. It's not that Bulldozer is terrible (because it isn't); Sandy Bridge is just a "phenom". If SB had just been a little faster than Lynnfield, it would still be fast. But it's a big leap to SB, and it's certainly the best value. AMD has Bulldozer, an inconsistent performer that is better in some areas and worse in others, but has a hard time competing with its own forebear. It's still an unusual product that some people will really benefit from and some won't. The demise of the Phenom II can't come soon enough for AMD as some people will look at the benchmarks and conclude that a super cheap X4 955BE is a much better value than BD. I hope it isn't seen that way, but it's not a difficult conclusion to reach. Perhaps BD is more forward looking, and the other octocore will be cheaper than the 8150 so it's a better value. I'd really like to see the performance of the 4- and 6- before making judgement.

    It's still technically a win, but it's a Pyrrhic victory.
  • ogreslayer - Wednesday, October 12, 2011 - link

    I tell friends that exact thing all the time. Phenoms are great CPUs but switch to Nehalem or Sandy Bridge and the speed is noticeably different. At equal clocks Core 2 Quads are as fast or faster.

    Bulldozer ends up with a lot of issues fanboys refused to see even though Anandtech and other sites did bring it up in previews. I guess it was just hope and an understandable disbelief that AMD would be behind for a decade till the next architecture. We can start at clockspeed but only being dual-channel is not helping memory bandwidth. I don't think there is enough L3 and they most definitely should have a short pipeline to crush through processes. They need a 1.4 to 1.6 in CBmarks or what is the point of the modules.

    The module philosophy is probably close to the future of x86 but I imagine seeing Intel keeping HT enabled on the high-end SKUs. Also I think both of them want to switch FP calculation over to GPUs.
  • slickr - Wednesday, October 12, 2011 - link

    Yeah I agree. To me Bulldozer comes like 1 year late.

    It's just not competitive enough and the fact that you have to make a sacrifice to single threaded performance for multithreaded when even the multithreaded isn't that good and loses to 2600K is just sad.

    They needed to win big with Bulldozer and they failed hard!
  • retrospooty - Wednesday, October 12, 2011 - link

    Ya, it seems to be a pattern lately with the last few AMD architectures.

    1. Hype up the CPU as the next big thing
    2. Release is delayed
    3. Once released, benchmarks are severely underwhelming
  • JasperJanssen - Wednesday, October 12, 2011 - link

    4. Immediately start hyping up the next release as the salvation of all.
  • GatorLord - Thursday, October 20, 2011 - link

    It looks to me like BD is the CPU beta bug sponge for Trinity and beyond. Everybody these days releases a beta before the money launch.

    Hence the B3 stepping...and probably a few more now that a capable fab is onboard with TSMC. BD is not a CPU like we're used to...it's an APU/HPC engine designed to drive code and a Cayman class GPU at 28nm and lots of GHz...I get it now.

    Also, the whole massive cache and 2B transistors, 800M dedicated to I/O, thing (SB uses 995M total) finally makes sense when you realize that this chip was designed to pump many smaller GPGPU caches full of raw data to process and combine all the outputs quickly.

    Apparently GPUs compute very fast, but have slow fetch latencies and the best way to overcome that is by having their caches continuously and rapidly filled...like from the CPU with the big cache and I/O machine on the same chip...how smart..and convenient...and fast.

    Can you say 'OpenCL'?
  • jleach1 - Friday, October 21, 2011 - link

    I don't see how this can be considered an APU. This product isn't being marketed as an HPC proc., and i don't see the benefit of this architecture design in GPGPU environments at all.

    It's sad...i've always given major kudos to AMD. Back in the days of the Athlon's prime, it was awesome to see david stomping goliath.

    But AMD has dropped the ball continuously since then. Thuban was nice, but it might as well be considered a fluke, seeing as AMD took a worthy architecture (Thuban) and ditched it for what's widely considered as a joke.

    And the phrase "AMD dropped the ball" is an understatement.

    They've ultimately failed. They haven't competed with Intel in years. They...have...failed. After thuban came out i was starting to think that the fact that they competed for years on price and clock speed alone was a fluke, and just a blip on the radar. Now i see it the opposite way...it seems that AMD merely puts out good processors every once in a while...and only on accident.
  • medi01 - Wednesday, October 12, 2011 - link

    Well, if anand didn't badmouth AMD's GPU's on top of CPU's, we would see less "fanboys" complaining about anand's bias.
  • vol7ron - Wednesday, October 12, 2011 - link

    By badmouth do you mean objectively tell the truth? Do you blame PCMark or FutureMark for any of that? Perhaps if all the tests just said that AMD was clearly better, it wouldn't be badmouthing anymore.
