  • jeremyshaw - Tuesday, May 25, 2021 - link

    Something I'm not quite catching with the DSU: does it allow for different configurations than we've already seen? Something like the 8xA78C we saw announced a while back?
  • SarahKerrigan - Tuesday, May 25, 2021 - link

    They show 8x X2 configs, so I'd be shocked if 8xA710 was not also on the menu.
  • igor velky - Tuesday, May 25, 2021 - link

    The first two slides on page 5 will give you the answer:

    both slides show CPU cores inside one CPU cluster;
    the first slide shows different types of cores,
    the second shows only one type of core in the cluster.

    On page 6, because of bad formatting, two slides look like one picture,

    so the second slide (the bottom half of the first picture)
    shows that you can put a max of 8 CPU clusters on one chip.

    So you can have
    a max of 8 CPU cores per cluster
    times
    8 clusters per chip.

    You choose the cores: how many cores, which type of cores, how many memory channels, and how many and what type of additional accelerators you put inside the chip...

    because you're Apple, Samsung, Qualcomm...
    You choose these things, have someone "etch it" into silicon,
    and then you sell it.
  • melgross - Tuesday, May 25, 2021 - link

    Well, Apple doesn’t “choose” cores, they design them from scratch.
  • Linustechtips12#6900xt - Wednesday, May 26, 2021 - link

    ehhh, they get the IP for cores like the X1 or A76, then they tweak them either a lot or a little to create their current "Firestorm/Icestorm" cores
  • michael2k - Wednesday, May 26, 2021 - link

    Sure, they tweak them a lot, just like I tweaked your post a lot to make my own. The A13, released in 2019, was an 8-wide CPU; in comparison, the state-of-the-art A76 at the time was only a 4-wide CPU. That's a pretty big deal.

    The X1 has an 8-wide dispatch, meaning it can issue 8 Mops per cycle but only decode 5 instructions per cycle. This is 2 years after Apple released the A13, which had 8-wide dispatch and decode. If you look at AnandTech's A14 article, you see that Apple has made the Icestorm cores roughly equivalent to an A76, since they are a 3-wide out-of-order design.

    You can read more here:
    https://www.anandtech.com/show/16226/apple-silicon...
  • mattbe - Wednesday, May 26, 2021 - link

    This is complete BS. They license the ISA from ARM. They DO NOT USE OR TWEAK cores like the X1 and A76 to create their Firestorm/Icestorm cores. This is information that can easily be verified, so it's pretty ignorant of you to make those claims.
  • FunBunny2 - Wednesday, May 26, 2021 - link

    " They DO NOT USE OR TWEAK cores"

    near as I can tell, most 'innovation' in cpu design/engineering has been, for years, throwing ever-expanding transistor budgets (can we expect that to continue?) at register width, path width, buffer/cache width and count, pulling off-chip functions on-chip, and the like. if Apple should ever publish the full spec of one of these chips, will we see that they've done anything more 'innovative' than Bigger, Wider, More?

    all of the 'innovation' cited by michael2k fits that bill.
  • mode_13h - Thursday, May 27, 2021 - link

    > if Apple should ever publish the full spec of one of these chips,
    > will we see that they've done anything more 'innovative' than Bigger, Wider, More?

    You don't get perf/W numbers like Apple's by simply doing "bigger, wider, more".

    There's information out there about some of their tricks, if you're willing to look for it. But I understand that it takes work and why do that, when you're perfectly content in your belief that there's nothing new under the sun?
  • ChrisGX - Thursday, May 27, 2021 - link

    Yes, @melgross, @mattbe and @mode_13h are absolutely right. Apple has an architectural license from ARM, viz. a license for the ARM ISA rather than any physical IP. Not deterred by that, some individuals commenting here seem to want to suggest that Apple has infringed on ARM's IP, or has somehow, by nefarious means, acquired crucial information about proprietary tech found in ARM chips without stumping up the cash for it. These suggestions are pathetic.

    If a patent infringement is being alleged, please tell us the patent number so that we can determine for ourselves whether there really has been an infringement. Or is a criminal conspiracy with other parties to steal trade secrets from ARM being asserted? There is an obvious problem with that idea: does anyone seriously suppose that ARM would fail to have Apple before a court, demanding a huge settlement for theft of trade secrets, if it had any reason to think that Apple had been engaged in such an exercise?

    Uninformed individuals are just making up things that chime with their sense of how things must be. Hmm... here's a thought. If you know so little about a topic that you wouldn't be willing to stake your reputation on it, or swear to it in court, then perhaps saying nothing on the topic would be a better choice than pretending to possess knowledge that you so obviously don't possess.
  • mode_13h - Saturday, May 29, 2021 - link

    > Uninformed individuals are just making up things that chime with
    > their sense of how things must be.

    Welcome to the world of internet comment forums.

    > If you know so little about a topic that you wouldn't be willing to stake your reputation on it

    We don't do "reputation". Everybody is on equal footing, here. Just challenge them with facts, references, and sound logic.
  • jeremyshaw - Tuesday, May 25, 2021 - link

    Thanks SarahKerrigan, igor velky. I was mostly thinking of configurations we don't commonly see. We have seen 4xLITTLE, 2xbig.4xLITTLE, etc., and even the 8xA78C. The slides on page 5 cover setups we have seen before. I'm mostly curious whether the fabric is tied to specific configs, as was implied at the 8xA78C launch, or if it's flexible enough to have, say, two X2, two A710, and four A510, or something like one X2 with four A510 (like Intel's Lakefield), etc. IMO, there are a lot of embedded controllers that don't need a lot of CPU throughput, but can benefit from one faster core for UI.
  • Kangal - Saturday, May 29, 2021 - link

    I'm more interested in seeing a 3+5 design.

    The "Large Cores" just aren't good on a phone, a tablet maybe, not on a phone. We're already getting throttling on the "Medium Cores" (eg Cortex A78/A710). And most tasks on Android are handled great in Dualcore mode, and very few in Quad-core mode, when looking at the schedulers. So Three Medium Cores will offer 95% of the performance of your regular flagship processor. Extending the Small Cores to a group of five, also can help efficiency by having more performance in the lower zone, reducing the amount of times the large cores need to be stressed.

    However, with what was announced today, we can actually expect a REDUCTION in 2022 ARM processors compared to 2021 ARM processors. I mean we're talking about 10% gains in X2, 10% gains in A710, and 1% gains in A510, when compared to a design that should be on a better node with better cache. That's not guaranteed with the continuing Chip Shortage. IN FACT most chipmakers are willing to "cheap out" and simply use the marketing of "running on ARMv9" to justify the higher cost and lower performance.

    They stuffed up with the naming scheme btw. And they really stuffed up by not removing 32-bit support completely. And they stuffed up with not doing a blank-sheet approach, for a revolutionary ARMv9 design. We're going to see the smallest gains in Android Phones, just like it happened when people were comparing the QSD 800/801/805 to the QSD 808/810 (Cortex A57) back in 2015. Which hopefully means ARMs other divisions in UK/France can pick the slack and come with a proper successor. This would be the Cortex A72 to their Cortex A57, a la, 2022 A710 versus the 2023 A730. Though I doubt the little cores will get any improvement besides a 10% bump due to the node lithography improvements.
  • psychobriggsy - Monday, June 21, 2021 - link

    Theoretically this should support 16 A510s (8 clusters), as each cluster shares a port on the interconnect.

    We may see 2X 4B 4L configurations (10 cores) one day, but in the main I guess we're stuck with 1X 3B 4L (8L?) options. I can see budget chips using 4L+4L (with wider FP on some).

    I wonder if there's room for an A310 core (4 int cores per cluster, 1 shared FP unit, 2-wide).
  • docola - Tuesday, May 25, 2021 - link

    does the shift to 64-bit CPUs and apps mean that today's phones will start becoming obsolete next year?
  • iphonebestgamephone - Tuesday, May 25, 2021 - link

    If you are on a 32-bit phone, yeah.
  • docola - Tuesday, May 25, 2021 - link

    fun... so this means i shouldn't buy an expensive phone for another 1 or 2 years, because this is gonna be one of those rare REAL shifts in tech... sigh....
  • supdawgwtfd - Tuesday, May 25, 2021 - link

    Current phones support 64-bit instructions...

    No need to delay.
  • docola - Tuesday, May 25, 2021 - link

    great thanks! i know i sound ignorant in here oh well
  • dotjaz - Wednesday, May 26, 2021 - link

    Where do you even get expensive 32-bit phones? There is no REAL shift other than the Play Store policy, which doesn't even affect end users.
  • mode_13h - Wednesday, May 26, 2021 - link

    Look up your phone specs on a site like gsmarena and see what cores it has. If any are ARM Cortex-A35, A5x, or A7x, then you already have a 64-bit phone.

    Most phones sold for the past 5 years have been 64-bit.
  • RSAUser - Wednesday, May 26, 2021 - link

    Anything launched with Lollipop or higher is most probably 64-bit, so it shouldn't be an issue.
  • SarahKerrigan - Tuesday, May 25, 2021 - link

    "A55, But Wider And More Dozery" was not what I expected.

    Still, it looks quite decent. Excited to see A710 and A510 in silicon. Not sure how to feel about X2.

    The fun begins immediately! Or in about seven months, as the case may be!
  • eastcoast_pete - Tuesday, May 25, 2021 - link

    I had a somewhat different reaction: the X2 makes some sense, as it's a continuation of the X1's performance-over-efficiency approach; the 710 is the next big "A" core; and the 510 is, as Andrei wrote, a bit underwhelming. To me, it looks like ARM didn't even consider using their A65 design (OOO) to come up with a true contender for the perf/W crown in efficiency cores. Apple remains light years ahead here, and anyone in the non-iOS space is stuck with this attempt to inject some Bulldozer design features into the tired in-order A55 lineage. With no custom ARM-derived cores on the horizon (I doubt Google will surprise us with their custom SoC), what's next? RISC-V?
  • SarahKerrigan - Tuesday, May 25, 2021 - link

    No custom cores on the horizon? What about Nuvia and Ampere's cores?
  • mode_13h - Tuesday, May 25, 2021 - link

    There remains the outside possibility that AMD or Intel decides to enter the ARM race.
  • ikjadoon - Tuesday, May 25, 2021 - link

    I will not yet forgive AMD for binning Jim Keller's K12 design. Qualcomm, Arm, Apple all needed more competition in the perf-watt battle.
  • mode_13h - Tuesday, May 25, 2021 - link

    > I will not yet forgive AMD for binning Jim Keller's K12 design.

    It costs money to bring a chip to market, and AMD was deep in debt. Lisa Su barely managed to keep the lights on, with that Chinese licensing deal. And the market for ARM servers just wasn't ripe.

    Assuming they really couldn't afford to do both (at least, without significant compromises), they definitely made the right call by going with x86.
  • mode_13h - Tuesday, May 25, 2021 - link

    BTW, I agree that I'd love to see how well it compared to other ARM cores of its day, but we can't ignore the practical and business realities.

    I hope AMD will one day reveal more about the K12. That definitely won't happen as long as a potential successor is in the works!
  • ikjadoon - Tuesday, May 25, 2021 - link

    Fair; I'll take a K12 successor as recompense.

    The business side is good context I'd forgotten, but now, in 2021, AMD is in much better straits, and surely a K12 successor is worth a shot.

    https://www.anandtech.com/show/7990/amd-announces-...

    Surely there were great ideas in Keller's work - their team's work - in their post-Styx designs.

    AMD might find a lot of benefit in preparing an Arm roadmap. What's to stop consoles, laptops, and desktops from switching to Arm, from AMD's financial perspective? Hopefully, they have clear eyes on x86's relevance to both consumers & businesses. AMD has a knack for fighting back, so I hope they build on their financial momentum.
  • TheinsanegamerN - Wednesday, May 26, 2021 - link

    Compatibility, performance, and existence.

    ARM brings compatibility issues with previously existing software. Emulation won't work 100%, and compatibility with existing hardware is a minefield.

    With that emulation/compatibility layer come performance degradations. Sometimes it may not be so bad; other times it will be horrendous. The overall software market is not as tightly controlled as Apple's walled-garden approach.

    And finally, existence. There is currently no high-performance ARM processor in existence. Show me a desktop ARM processor that could replace a 5900X or a 10900K. How about one that could replace the CPU in the PS5? Currently one does not exist. You could say one exists for laptops, but that is only available from Apple.
  • mode_13h - Thursday, May 27, 2021 - link

    > There is currently no high-performance ARM processor in existence.

    There are probably a dozen ARM server processors on the market or still in service that would fit a reasonable definition of high-performance.

    > Show me a desktop ARM processor that could replace a 5900X or a 10900K.

    I see you stuck that word "desktop" in there. Desktop is probably the last market ARM would penetrate. So, if your point is that you won't take ARM seriously until there's a competitive ARM-based desktop offering, that's like reaching for the fire extinguisher once you're surrounded by flames instead of when you first smell smoke.

    I'm eager to see what V1-based CPUs look like. Those cores could make for a viable workstation CPU.
  • mode_13h - Tuesday, May 25, 2021 - link

    And don't forget about Chinese designs (although this one is mentioned as being A72-derived):

    https://en.wikichip.org/wiki/hisilicon/microarchit...
  • SarahKerrigan - Tuesday, May 25, 2021 - link

    The KP920 core isn't A72-derived. It says "from A72", but all it's saying there is that its predecessor used A72s - it's not saying this core is derived from the A72.

    That being said, with Phytium and Hisilicon cut off from TSMC, mainland core development may not result in compelling silicon any time soon.
  • eastcoast_pete - Tuesday, May 25, 2021 - link

    Fair point on "no custom cores". However, I don't expect any custom cores from Ampere coming to a smartphone near me anytime soon, and QC seems to want Nuvia's IP mostly for larger systems. Neither strikes me as a source for efficiency cores in the mobile space. QC may incorporate Nuvia's tech into big cores for its SoCs, but I doubt they'd even do that.
  • eastcoast_pete - Tuesday, May 25, 2021 - link

    Addendum: add ".. anytime soon" to the end of the last sentence. They probably will try big cores for their SoCs, but I'm afraid they'll pair those with A510 LITTLE cores.
  • mode_13h - Wednesday, May 26, 2021 - link

    > I'm afraid they'll pair those with A510 LITTLE cores.

    As opposed to what? We saw nothing to suggest the A510 is *worse* than A55. And if you're doing ARMv9, then there are no other options (except proprietary).

    Also, why are you freaking out over A510? It's a little underwhelming, but it's not *bad*.
  • mode_13h - Wednesday, May 26, 2021 - link

    > QC seems to want Nuvia's IP mostly for larger systems

    No. Nuvia said they were building server cores, but Qualcomm's messaging around the acquisition was that Nuvia will build cores showing up in mobile SoCs, first.

    They didn't rule out the possibility of larger systems, but that's clearly not their priority.
  • roboman21 - Tuesday, May 25, 2021 - link

    Apple is light years ahead, and it is due in no small part to this acquisition:
    https://www.anandtech.com/show/3665/apples-intrins...
    This is tough to pull off, but it can yield advantages over a competitor with the same ARM core and 7nm semiconductor process.
  • name99 - Tuesday, May 25, 2021 - link

    Intrinsity was about circuit design.
    PA Semi was about microarchitecture.

    There was a *lot* of good stuff in PA Semi! I have looked quickly at quite a few of the Intrinsity patents, but I don't know enough about that level of the stack to have any opinion as to how impressive they were. (This is not a criticism -- even if all that was picked up from Intrinsity was a number of competent engineers capable of implementing the micro-architecture ideas of the PA Semi folks, that's an essential part of shipping a chip!)
    I'd honestly love someone who is familiar with the circuit level to look at the Intrinsity (low-level) and PA Semi patents (like the one for a new register file design) and let us know an informed opinion.

    But as important as both of these has been Apple's willingness to keep pushing the envelope, to keep pouring money into design, to keep taking risks (every design change is a risk...) and not to accept "good enough". That might seem obvious except that, of course,
    - Intel has been cruising on "good enough" for 10 years,
    - QC (notoriously) made "good enough" its official response to the A7, and followed that up by cancelling Centriq, and
    - ARM, for whatever reason, seems to alternate between designs that look like they're trying to at least approach Apple, and designs that feel like "good enough".
  • melgross - Tuesday, May 25, 2021 - link

    Intrinsity was about efficiency. That was what they were known for.
  • mode_13h - Wednesday, May 26, 2021 - link

    > anyone in the non-iOS space is stuck with this attempt to inject some
    > Bulldozer design features into the tired in-order A55 lineage.

    Well, they can have just one core per complex, instead of 2.

    I'm not really sure why the hate, unless you think you're going to be running a lot of FP/vector threads.
  • melgross - Thursday, May 27, 2021 - link

    That was the problem with Bulldozer. They made the same mistake.
  • mode_13h - Saturday, May 29, 2021 - link

    > That was the problem with Bulldozer. They made the same mistake.

    You mean the 2 cores per complex? But ARM is giving customers the option to order up an A510 with just 1 per complex, if you think you need enough FP/vector throughput to warrant it.

    I think a lot of the hate being directed at the A510 is mere guilt by association. It's massively different than Bulldozer, but the sharing of that one feature really seems to have tainted it with all the negative feelings people have towards Bulldozer.
  • lemurbutton - Tuesday, May 25, 2021 - link

    x86 is dead.

    AMD is doing 5% to 15% improvements every year.
    Intel is doing -5% to 10% every year.

    Meanwhile, Apple & ARM are doing 10-20%+ every year, and including accelerators for things like machine learning.

    The M1 runs circles around anything AMD and Intel have. The M1X and M2 will allow Apple to claim performance wins across all consumer computing devices. Can't wait for the 32/64-core Mac Pros too. It's going to be ugly for AMD/Intel.
  • SarahKerrigan - Tuesday, May 25, 2021 - link

    I would be hesitant to lump Apple and ARM together, given how far apart the highest-performing shipping licensable cores and the highest-performing shipping Apple cores are.

    ARM is still a long way from matching peak AMD or Intel ST performance (not merely iso-clock, where they do okay, but absolute) in any shipping product, and honestly, neither A710 nor X2 looks especially groundbreaking. A510 looks really good, but mixed with a certain amount of "well, about frigging time."
  • ikjadoon - Tuesday, May 25, 2021 - link

    I agree on point 1, sadly. The X1 earns 40 points on SPEC2006 1T Geomean, while the A14 broke 70 points and A13 is 59 points.

    The X2 vs A15 battle will be interesting in terms of power, but the X2 will likely be slower than the A13.

    On the second point: isn't the A510 four years late, with an almost identical power-vs-performance curve to the A55? Personally, I thought it was the smallest and saddest announcement today.

    The only genuine A510 improvement is at the A55’s worst position / peak power: 10% faster for 20% less power. That’s four years later.

    The rest of the A510's power-vs-performance gains come from ramping up the power budget. That +10% perf for -20% power = a 37.5% increase in perf-per-watt over four years = ~8% perf-per-watt improvement per year. ;(
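    That napkin math checks out; here's a quick sketch in C (the only inputs are the +10% perf and -20% power figures above):

        #include <math.h>
        #include <stdio.h>

        int main(void) {
            double perf_per_watt = 1.10 / 0.80;            // +10% perf at -20% power -> 1.375x
            double per_year = pow(perf_per_watt, 1.0 / 4); // annualized over four years
            printf("total: +%.1f%%, per year: +%.1f%%\n",
                   (perf_per_watt - 1) * 100, (per_year - 1) * 100);
            return 0;
        }

    This prints "total: +37.5%, per year: +8.3%", which is where the ~8%/year figure comes from.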

    If they are sticking with in-order, I hoped the A510 could’ve done something more over four years.
  • Raqia - Tuesday, May 25, 2021 - link

    Apple will rule the roost for the next year, at least until Nuvia's Phoenix cores make their debut some time in the second half of 2022 (that announced timeline likely means the design has taped out...). The cache hierarchy of Apple's CPU complexes is simpler, with fewer levels, than what ARM's is capable of, which reflects the scope of their respective ambitions. ARM's hierarchy hobbles performance at mobile-device scale but has much more headroom for supercomputing- or server-scale compute.
  • Wilco1 - Tuesday, May 25, 2021 - link

    Your numbers are off. AnandTech's SPECINT2006 results are 63.34 for A14 and 41.3 for SD888: https://images.anandtech.com/doci/16463/SPEC-power...

    TSMC 5nm offers ~15% speedup over 7nm, so 3.3-3.5GHz may be feasible (compared to 3.1GHz for SD865+ on 7nm), and that should get Cortex-X2 scores in the high 50's, close to the A14.

    As for efficiency, it's unrealistic to expect major gains when starting from an already very efficient design. It's the same with performance, you can't expect a doubling of ST performance every few years like in the past.
  • Ppietra - Tuesday, May 25, 2021 - link

    I believe he was talking about the overall SPEC2006 score and not just SPECint. Still, he would be wrong about the X1 score, which would be 50 and not 40 (probably a typo).
    Anyway, a 16% improvement for the X2 over the X1 would mean a score of 58 which, like he said, would still be behind the A13 performance core and well behind the 72 score of the A14.
    The X1 is already being manufactured at 5nm, so it makes no sense to factor in a transition from 7nm.
  • Wilco1 - Tuesday, May 25, 2021 - link

    Cortex-X1 can reach 3.2GHz in Samsung's 5nm process but the power is too high: https://images.anandtech.com/doci/16463/2100-volta...

    TSMC 5nm is faster and lower power, which allows for higher frequencies. At a conservative 3.3GHz X2 would have a combined score of ~66.7 (only 7% slower than A14).
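    For anyone checking that arithmetic, the implied derivation looks like this (a sketch in C; it uses the SD888 X1 geomean of ~49.5 quoted further down the thread, and assumes linear frequency scaling, which is optimistic since real scores scale sub-linearly once memory effects dominate):

        #include <stdio.h>

        int main(void) {
            double x1_sd888  = 49.48;                 // SD888 X1, SPEC2006 geomean @ 2.84GHz
            double x2_uplift = 1.16;                  // ARM's claimed X2 gain at iso-frequency
            double x2_iso    = x1_sd888 * x2_uplift;  // ~57.4
            double x2_fast   = x2_iso * (3.3 / 2.84); // scaled linearly to 3.3GHz -> ~66.7
            double a14       = 71.72;                 // A14 Firestorm geomean
            printf("X2 iso-freq: %.1f, at 3.3GHz: %.1f (%.0f%% behind A14)\n",
                   x2_iso, x2_fast, (1.0 - x2_fast / a14) * 100);
            return 0;
        }

    That gives ~57.4 at iso-frequency and ~66.7 at 3.3GHz, about 7% behind the A14.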
  • Ppietra - Tuesday, May 25, 2021 - link

    That is not how it works!
    First of all, you have no idea what the advantage of using TSMC instead of Samsung would be, so you are just throwing out numbers with no substance. Secondly, X1 energy consumption is already very high (it is less efficient than the A14 Firestorm core), so no, there doesn't seem to be a lot of room to push the X2's clock speed to 3.3GHz. Thirdly, even with your assumption, you would still have the X2 performing worse than a one-year-old core.
  • Wilco1 - Tuesday, May 25, 2021 - link

    We absolutely do know. TSMC 5nm is ~15% faster than 7nm at the same power (or 30% lower power at the same frequency). We know that SD865+ achieves 3.1GHz on 7nm and that the frequency gain from A13 on 7nm to A14 on 5nm was around 13%. So 3.3GHz should be feasible on 5nm without increasing power.

    The point is that TSMC 5nm will give a significant perf/power boost (that A14 already benefits from). And that means the gap has narrowed to only one generation rather than 2.
  • melgross - Tuesday, May 25, 2021 - link

    It’s not that simple. The cores would require a bit of a redesign for the different process, and each design would fare differently. Some might get a good boost, and others may not.
  • michael2k - Tuesday, May 25, 2021 - link

    You're comparing the X2 to the A14? I mean, if we're lucky we will see the X2 in 2022 alongside the A16. The A15 will be released this year, in 2021. We already have some X1 baselines:
    https://www.anandtech.com/show/16463/snapdragon-88...

    So in terms of generation:
    2021 X1 not competitive with the 2019 A13 now
    2021 X1 competitive with the 2019 A13 on TSMC 5nm
    2021 X1 not competitive with the 2021 A15 (est 10% boost to hit 70 SPECint)
    2022 X2 competitive with the 2020 A14 on TSMC 5nm
    2022 X2 not competitive with the 2021 A15

    That still sounds like a 2-generation gap to me. The real problem isn't fundamentally the core, but OEMs choosing not to use a 2x2 design (2 X1 and 2 A77, or 2 X2 and 2 A710); so even if the cores get faster each generation, overall performance is hobbled by using 3 medium cores instead of a pair of higher-performance X1 or X2 cores.
  • Fulljack - Wednesday, May 26, 2021 - link

    It's cat and mouse, really. Apple releases their phones in late Q3, while the Samsung S-series is released in late Q1; there's a 5 to 6 month difference.
  • Ppietra - Wednesday, May 26, 2021 - link

    Nothing you said gives you any data to infer anything about a transition from Samsung to TSMC.
    The SD865+ does not use an X1 core, so you have no commonality on which to make that kind of jump in analysis. Secondly, the X1 core already consumes significantly more than the SD865+'s cores, so clearly there is not much room to increase the clock speed from that perspective. If you want to increase clock speed, you need to keep power consumption under control.
  • Wilco1 - Wednesday, May 26, 2021 - link

    These are different generations of the same microarchitecture, from the same design team, with the same frequency capability (as reported by AnandTech). So yes, there is obvious commonality.

    We also know this microarchitecture is capable of higher frequencies; for example, AnandTech reports the Cortex-X1 can reach 3.2GHz. The main problem is power, however, which is what limited the Cortex-X1 on Samsung's process. TSMC 5nm reduces power by 30%, which enables higher clock speeds.
  • Ppietra - Wednesday, May 26, 2021 - link

    Actually, they aren't different generations of the same microarchitecture. The next generation of the A77 is the A78. The X1 goes for a bigger core design, and as such consumes more.
    Being capable of higher frequencies doesn't mean that Qualcomm (etc.) finds it viable to use those higher frequencies in a smartphone SoC...
    A node's power reduction is stated for the same performance and microarchitecture (which the X1 is not) and only as an internal TSMC comparison... The data you give tells you nothing about the X1 (already at 5nm) transitioning to TSMC. You are making an analysis based on wrong assumptions.
  • ChrisGX - Thursday, May 27, 2021 - link

    Using Andrei's initial performance estimate for the X1 (clocked at 3GHz) as a guide (47.2 SPECint2006), and ARM's own projection for the peak performance boost offered by the X2 over the X1 (+30% when process optimisations and frequency increases are factored in), we get a rating of 61.4 SPECint2006 (47.2 x 1.3) for next year's X-series core. That is really a best-case estimate, and it is where the good news ends.

    The information we have on the 2022 Cortex cores seems to lack the transparency of material of this sort issued by ARM in earlier years. It is disappointing not to have any core frequency data. A lot turns on realistic estimates of attainable core frequencies, and on having a good sense of power dissipation at maximum burst frequencies. We now know that Andrei's performance estimate for the X1 core wasn't borne out in practice: the actual performance exhibited by the highest-performing X1 core - the one in the Snapdragon 888 - was 14% lower, at 41.3 SPECint2006. That was owing to the pitiful power dissipation characteristics of Samsung's 5nm LPE process: to prevent thermal build-up and keep power consumption within budget on SoCs incorporating an X1 core, SoC designers found it necessary to either a) reduce burst clock frequencies to below 3GHz, or b) use restrictive power management controls to damp down chip operations threatening the thermal stability of the SoC, which naturally has the effect of throttling performance.

    Using more realistic assumptions to project the likely performance of the X2 core, we get this number: 53.7 SPECint2006, i.e. the actual performance of the highest-performing X1 core of 2021 x 1.3 (41.3 x 1.3). I suspect even that might be too optimistic: the forthcoming SoCs that incorporate an X2 core are, like the flagship Android smartphone SoCs of 2021, said to be scheduled for production at Samsung fabs, which must cast a degree of doubt on ARM's performance projections, because recently Samsung's silicon process technologies have demonstrably failed to come up to ARM's expectations. Given the lack of transparency in the ARM data, it is possible to entertain a rather broad range of imaginable SPEC peak integer performance ratings for the X2 core. A SPECspeed of 53.7 is my best guess.
  • ikjadoon - Tuesday, May 25, 2021 - link

    These are geometric means of the integer & FP scores. Typo for the X1 above: ~50, not ~40. Let's be precise, now that I'm not on a phone ;)

    SPEC2006 1T Geomean:
    A14 Firestorm = 71.72 points
    A13 Lightning = 59.09 points
    SD888 X1 = 49.48 points

    Awarding the X2 the full 16% jump on both integer & FP, we can napkin math this to a Reference X2 = 57.40 at iso-frequency.

    You're right that they can (and maybe should) boost clocks so it could surpass the A13, but it's unrealistic to imagine Qualcomm shipping a 3.3+ GHz smartphone in 2022 - especially if it's Samsung foundries again, but even on TSMC. Qualcomm's SD888 X1 at 2.84 GHz already ate 9621 joules (SPECint2006) and 4972 joules (SPECfp2006).

    From https://www.anandtech.com/show/16463/snapdragon-88...

    No one expects a literal doubling, but Arm's competition (NUVIA, Apple, Ampere) are not slowing down, either.
  • Wilco1 - Tuesday, May 25, 2021 - link

    OK, I get it now - the 40 made me think you meant SPECINT, as that is what the Cortex-X1 scored.

    3.3GHz on TSMC 5nm is conservative given its significant performance/power improvements (which the A14 already benefits from), but yes, it means switching back to TSMC.
  • Thala - Tuesday, May 25, 2021 - link

    Indeed, the performance gains of the A14 vs. the A13 are mostly frequency-driven and much less IPC-driven. It is not unreasonable to assume some frequency gains for the X2 as well when moving to TSMC 5nm.
  • Ppietra - Wednesday, May 26, 2021 - link

    Thala, looking at the numbers, IPC improved 8%, and was responsible for 42% of the performance growth... so not exactly "much less"!
    The A13-to-A14 transition gives no clue about what would happen between X1 and X2, or between Samsung and TSMC... The X1 is already at 5nm, and its clock speed is just 5% less than the A14's... Their design priorities are also different, so you cannot infer much!
  • name99 - Tuesday, May 25, 2021 - link

    All true. But of course the A14 is an especially easy opponent!
    - It's the last round of what appears to be Apple's 4-year "seriously new micro-architecture" cadence -- A7..A10 as gen 1, A11..A14 as gen 2, A15 et seq. as gen 3.

    - It was clearly designed with the single highest priority being to get the x86 stuff working. Meaning that anything that did not match that priority (including, e.g., optimized physical layout and risky micro-architectural innovations) was punted to the A15.

    - It doesn't have SVE/2, which is good for anything up to an 80% speed boost depending on the exact code (even with just compiler vectorization). Averaging over "representative code bases" is a game that's never going to get everyone to agree, but the one attempt I saw to do this came up with an average performance boost of ~30%.
    Given how *low* ARM's numbers are, I assume they're talking about performance in the absence of SVE/2? Honestly, the whole thing is kinda weird - how little they're pushing the SVE/2 angle, given how much one might expect it to improve things.

    I do *not* expect Apple to be as timid... And while Apple tends to shy away from hardware announcements at WWDC, we all know about the M1 by now, AND we know that SVE/2 will be in next year's ARM cores (so presumably in this year's Apple cores). Meaning maybe there will be some talks about SVE/2 (and other ARMv9 stuff) at WWDC?
  • Wilco1 - Tuesday, May 25, 2021 - link

    You are exaggerating the gains of SIMD in general. Yes, automatic vectorization helps, but even if it improves some image transformations by 80%, it's never going to speed up browsing by 30%. Or any other general-purpose code. Or SPEC.

    Remember, this is 128-bit SVE, so 4 NEON units are about as fast as 4 SVE units. For great SVE performance on HPC code, look at the A64FX or the upcoming Neoverse V1.
  • name99 - Tuesday, May 25, 2021 - link

    - One hopes that the combination of length agnosticism and predicates will allow a lot of code for which vectorization was previously uneconomical (too expensive to mask out fiddly bits, too much overhead in loop prologs and epilogs) to now be handled. We shall see (there's a sketch of what this looks like below).

    - You are right about the 128b of course; yet another instance of ARM never trying for a stretch goal! I assume Apple will be implementing this as 2x256, meaning, among other things, the path from L1D to LSU grows from 128b to 256b wide, and that's an example of where SVE/2 (indirectly, sure) helps boost performance for everyone.
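    On the first point, here's a minimal sketch of a length-agnostic, predicated SVE loop (ACLE intrinsics from arm_sve.h; the function and array names are just illustrative). The whilelt predicate absorbs the loop tail, so the scalar epilogue that makes short or odd-length loops uneconomical to vectorize with NEON simply disappears:

        #include <arm_sve.h>
        #include <stdint.h>

        // dst[i] = a[i] + b[i] for any n, at any hardware vector length
        void vec_add(float *dst, const float *a, const float *b, int64_t n) {
            for (int64_t i = 0; i < n; i += svcntw()) {  // svcntw() = floats per vector
                svbool_t pg = svwhilelt_b32(i, n);       // lanes past n are masked off
                svfloat32_t va = svld1(pg, a + i);       // predicated loads
                svfloat32_t vb = svld1(pg, b + i);
                svst1(pg, dst + i, svadd_x(pg, va, vb)); // predicated add + store
            }
        }

    The same binary runs unmodified on 128b and 256b hardware; a hypothetical 2x256 implementation would simply retire half as many iterations.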
  • smalM - Friday, May 28, 2021 - link

    Could you please explain why you assume Apple will reduce the number of FP/SIMD units in exchange for widening them? Is that really better, from an FP or from a SIMD standpoint?
    Thanks in advance.

    Making the path from L1D to the LSU 256b wide is something Apple could do anyway, and it is not directly related to SVE2; I was astonished they didn't do it when they added the fourth FP/NEON unit.
  • mode_13h - Saturday, May 29, 2021 - link

    > Could you please explain why you assume Apple will reduce the number of FP/SIMD units
    > in exchange for widening them? Is that really better?

    For SVE it's a little better, since it means software having to run loops for about half as many iterations, correspondingly reducing loop overhead.

    > Making the path from L1D to LSU 256b wide is something Apple could
    > do anyways and is not directly related to SVE2

    I don't really see why. It seems like it'd only help for back-to-back reads or writes from/to consecutive addresses. And if 128-bits is already enough to do that, then extra width would be a waste.
  • Thala - Tuesday, May 25, 2021 - link

    You are comparing only peak performance. ARM has demonstrated that SVE2 can have big advantages over NEON, in particular for computational kernels that do not parallelize well on NEON.
  • WorBlux - Thursday, May 27, 2021 - link

    >If they are sticking with in-order, I hoped the A510 could’ve done something more over four years.

    In-order is hard. The A55 was pretty cool in allowing certain instruction dependencies to be issued together. The traditional way to get more IPC out of in-order is VLIW, but that would require an ABI break, or at least a special sort of compiler optimization and quasi-long-words, and in the end it wouldn't do any better than the A55/A510 on legacy and non-optimized code.
  • mode_13h - Tuesday, May 25, 2021 - link

    x86 is indeed on the way out, but your analysis is too facile.
  • SarahKerrigan - Tuesday, May 25, 2021 - link

    Essentially agreed.
  • yeeeeman - Tuesday, May 25, 2021 - link

    x86 may look dead if you don't understand how and why things stand the way they do.

    First of all, Apple is in a very, very special situation where they control everything: hardware, software, product. Plus, they use the best process there is at the moment. All of this contributes to their results, which are very good, but they stem from what I told you.

    Now, a better picture of what ARM is actually capable of in... real life is the Snapdragon 8cx, which for all intents and purposes is still alive only because Qualcomm has a ton of money and can throw it away on projects that don't really sell.

    Apple is using just the ARM ISA. If Apple has great performance and great efficiency, it doesn't automatically mean that ARM and the companies that work with them will also reach that point. The truth is, Apple has put a LOT of money and R&D into this and got the best talent there is to get where they are today. Their cores are not exactly suited to the plethora of Android devices that range from 50 bucks to 2000+.

    Now, regarding x86: if you compare AMD's Zen 3 with the M1, you'll see that they are not that far off, in perf and in efficiency. And AMD is using 7nm, not 5nm! Also, nowadays, all CPUs are RISC inside, so x86 CPUs are very similar inside to ARM CPUs, with the addition of the extra decoding and micro-ops.

    x86's main weakness is also its greatest advantage. Backwards compatibility is very important and needs to stay. ARM CPUs lose compatibility totally once in a while, which is not something that will work in the long run.

    Also, don't forget that Intel hasn't introduced anything major since 2015! Ice Lake/Tiger Lake are just a bump in execution units over Skylake, which on its own brings 20% better IPC. But Intel has stood still for so many years; that is why ARM has got the chance to close the gap.
  • SarahKerrigan - Tuesday, May 25, 2021 - link

    What? SNC is not merely a bump in execution units from SKL at all. It's a new, wider, more aggressive uarch across the board. SNC is a larger change than SKL itself was, and not by a small margin.
  • boredsysadmin - Tuesday, May 25, 2021 - link

    @yeeeeman - "Also, nowadays, all CPUs are RISC inside, so x86 CPUs are very similar inside to ARM CPUs, with the addition of the extra decoding and micro-ops."
    Excuse me, where did you get this BS? Only ARM, RISC-V, MIPS, and PowerPC are RISC. x86 from both Intel and AMD is very much still CISC. So no, they aren't similar in any shape or form.
  • Drumsticks - Tuesday, May 25, 2021 - link

    All x86 CPUs crack CISC macro instructions into smaller RISC like operations. The actual execution of the CPU operates on these smaller micro ops. Beyond the initial decode/cracking stage, it's pretty much a RISC operation.

    They are CISC from an architectural perspective, but they've been RISC in execution for some time.
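    To make the cracking concrete, here's a small sketch (GCC/Clang inline asm, x86-64 only, used so the compiler can't split the instruction itself; the uop split described in the comments is a simplification of what real decoders do):

        #include <stdio.h>

        /* Forces a single x86-64 read-modify instruction, "addq mem, reg",
         * which the decoder cracks into a load micro-op plus an add micro-op. */
        static long add_mem(long a, const long *p) {
            __asm__("addq %1, %0" : "+r"(a) : "m"(*p));
            return a;
        }

        int main(void) {
            long x = 40, y = 2;
            printf("%ld\n", add_mem(x, &y)); /* prints 42 */
            return 0;
        }

    On a load-store ISA like AArch64 there is no equivalent single instruction: the load (ldr) and the add (add) are separate architectural instructions from the start, which is the distinction being argued over here.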
  • vvid - Tuesday, May 25, 2021 - link

    >> All x86 CPUs crack CISC macro instructions into smaller RISC like operations.
    RISC-like is not RISC. It is like saying that a woman with a pear-like figure is actually a pear.
    x86 uops pretty much correspond to the CISC ISA now.

    >> but they've been RISC in execution for some time
    RISC-like.
  • mode_13h - Wednesday, May 26, 2021 - link

    > they've been RISC in execution for some time.

    And sadly, Internet Oversimplification Syndrome claims another victim.
  • WorBlux - Thursday, May 27, 2021 - link

    These micro-ops are greatly exaggerated. For instance, Gracemont CPUs don't have any. And 4 of the 5 decoders on Intel are simple, meaning they only emit one micro-op per instruction.

    Having to deal with variable-length instructions is still a bitch on the front end.
  • mode_13h - Saturday, May 29, 2021 - link

    > Gracemont CPUs don't have any.

    I think you meant to say they don't have micro-op *caches*.
  • Tomatotech - Tuesday, May 25, 2021 - link

    They’re correct. x86 cores have been RISC internally since the Pentium era. They’re black boxes that take CISC instructions, then internally these instructions are converted to RISC for the microprocessors.

    See the Development section of this wiki article for the Pentium. Later chips expanded and further developed the internal RISC parts after the success of the Pentium. Sorry to shatter your illusions.

    https://en.wikipedia.org/wiki/P5_(microarchitectur...
  • Wilco1 - Tuesday, May 25, 2021 - link

    RISC/CISC is only ever about the ISA, never about the implementation. Even the very first 8086 used simpler micro-ops internally in its microcode, but that doesn't make it any more RISC than modern implementations.

    Another common misconception is that changing the decoder is all that is required to change ISAs. This is also incorrect, since the internals are very different between ISAs.
  • Thala - Tuesday, May 25, 2021 - link

    Precisely. x86 will never escape these problems:
    - variable-length instructions
    - fewer architectural registers
    - a TSO memory model

    And no internal RISC-like microarchitecture will help with the above issues.
  • GeoffreyA - Wednesday, May 26, 2021 - link

    "having variable length instructions"

    The main bottleneck of x86 and the part where ARM has the upper hand. Still, it's not impossible that some genius at AMD or Intel could crack the variable-length handicap once and for all. The micro-op cache did much. Something else is still missing.
  • mode_13h - Wednesday, May 26, 2021 - link

    > Still, it's not impossible that some genius at AMD or
    > Intel could crack the variable-length handicap once and for all.

    The only solution I see to that is basically letting the uop cache spill to RAM, so the decoder works more like a JIT translation engine.

    And that only solves *one* of x86's key detriments.
  • GeoffreyA - Thursday, May 27, 2021 - link

    That's a possibility, but it means more work on the OS side. In that case, it might be better to switch to a new, fixed-length ISA altogether.

    If there were some way to index instruction start/end before reaching the decoder. Perhaps the compiler could help but that might break compatibility.
  • mode_13h - Saturday, May 29, 2021 - link

    > That's a possibility but more work on the OS side.

    Yes. The era of "free" CPU performance improvements is coming to an end.

    > In that case, it might be better to switch to a new, fixed-length ISA altogether.

    Well, it's one thing Intel or AMD could do to eke a little more life out of x86-64. I think it's actually not a lot to ask from operating systems.

    > If there were some way to index instruction start/end before reaching the decoder.

    Perhaps the L1 instruction cache could do some preliminary analysis, during fills. They could add a couple extra bits per byte, to hold information subsequently useful to the decoder.

    Or, maybe the decoder could just write back some info to help itself, if it needs to re-decode those same instructions after the corresponding micro-ops have been evicted from the micro-op cache.
  • GeoffreyA - Monday, May 31, 2021 - link

    "Perhaps the L1 instruction cache could do some preliminary analysis"

    Interestingly, some CPUs did mark the instruction boundaries in the cache - possibly the same principle. If I remember right, the Pentium MMX and some of the Atoms did; and on AMD's side, K7 all the way to Bulldozer.
  • mode_13h - Tuesday, June 1, 2021 - link

    > some CPUs did mark the instruction boundaries in the cache.

    Not surprising, other than how far back you say it went.
  • mode_13h - Wednesday, May 26, 2021 - link

    All good points. People who think "ISA doesn't matter" don't really understand everything an ISA encompasses.
  • GeoffreyA - Wednesday, May 26, 2021 - link

    The difference that divides them is that CISC can include a memory operation as part of an arithmetic one, whereas in RISC the two are separate (at least, in a load-store architecture).
  • mode_13h - Wednesday, May 26, 2021 - link

    > The difference that divides them is that CISC can include a memory operation

    I'm not an expert on the subject, but there are other elements in RISC orthodoxy, concerning things like:

    * number of operands (also number of src & dst operands)
    * encoding of immediates
    * side-effects

    I view the whole subject of CISC vs. RISC as something like MMA (Mixed Martial Arts). It turns out that there's no single best classical martial arts style. The most effective fighters use a blend of techniques adopted from various, disparate fighting styles.
  • GeoffreyA - Thursday, May 27, 2021 - link

    "The most effective fighters use a blend of techniques"

    Absolutely. It's almost a universal principle that the winning design puts together the best elements from competing designs and throws away the junk.
  • Thala - Tuesday, May 25, 2021 - link

    It does not matter that they are RISC-like inside, the issue with x86 is that they still carry the typical CISC baggage - and no internal RISC-like structure will help them here.
  • km1810vm4 - Wednesday, May 26, 2021 - link

    I would say x86 baggage. There were much nicer CISC architectures around at the time, like the Motorola 68000.
  • TheinsanegamerN - Wednesday, May 26, 2021 - link

    Intel's P5 processors implemented RISC-style micro-ops in their x86 decoding; that's part of why they were so dramatically faster than the 486. It's not like this is hard to find out....
  • mode_13h - Thursday, May 27, 2021 - link

    > Intel's P5 processors implemented RISC-style micro-ops in their x86 decoding

    In fact, early CPUs used a lot of microcode, and one of the ways they got faster was by replacing microcoded operations with hardwired logic. This was enabled by ever-increasing transistor budgets.

    > that's part of why they were so dramatically faster than the 486.

    This feels like a bit of revisionist history. Here are some of the reasons why Pentium was faster than 80486:

    https://en.wikipedia.org/wiki/P5_(microarchitectur...
  • Kamen Rider Blade - Tuesday, May 25, 2021 - link

    Android needs to get their developers to stop using Java and use C/C++/Rust for their apps to eke out the max performance possible.

    Apple's App code base is generally C/C++, that's why they have the performance that they have.

    And it's a long time nagging issue that I wish the Android community would solve.

    It would give Android a HUGE performance boost to move their Apps over to C/C++/Rust.

    Programming languages have already been benchmarked against each other to see which one's faster, and sorry to say it, but the C/C++/Rust family wins the day in terms of code speed on the same hardware platform.

    https://benchmarksgame-team.pages.debian.net/bench...
  • mode_13h - Wednesday, May 26, 2021 - link

    > Android needs to get their developers to stop using Java and use C/C++/Rust for their apps to eke out the max performance possible.

    No, I'm sure Google would rather they use Go.

    Also, unless you compile your C++ to WebAssembly, it has the disadvantage of leaving out users on newer devices not supported by the NDK version you built your app with - RISC-V, for instance. Interpreted languages, and those that compile to a portable intermediate representation, don't have this problem.

    > it's a long time nagging issue that I wish the Android community would solve.

    Your best hope is that Web Assembly takes over, then.
  • hlovatt - Thursday, May 27, 2021 - link

    > Android needs to get their developers to stop using Java and use C/C++/Rust for their apps to eke out the max performance possible.

    > Apple's App code base is generally C/C++, that's why they have the performance

    Apple code is mainly Objective-C and Swift (neither of which is particularly fast).

    > https://benchmarksgame-team.pages.debian.net/bench...

    These benchmarks are largely discredited because they include the start-up time in the measurements, which unrealistically hampers virtual machines like Java's. It's like opening your mailer, typing a couple of characters, and then shutting down your mailer; opening your mailer again, typing another couple of characters; repeat - then saying your mailer is slow. Most apps are long-running, and counting the opening and closing of the virtual machine for a small task doesn't give useful results.
  • mode_13h - Wednesday, May 26, 2021 - link

    > Apple is in a very very special situation where they control everything. Hardware,
    > software, product. Plus they use the best process there is at the moment.
    > All of this, contributes to their results. Which are very good, but they stem from
    > what I told you.

    They get a benefit from using the latest process, but that doesn't help them relative to anyone else on that same process node. ARM probably does as much work or more to port their IP to a process node & libraries as Apple does.

    They *do* get a benefit from controlling the OS. I'll grant you that. The main thing that can probably help is dialing in clockspeed & thermal management, as well as how load-balancing with the low-power cores is managed.

    However, the rest of it is irrelevant for SPEC scores, because the Anandtech team uses the same compilers and the SPEC source is also the same.

    > Their cores are not exactly suited for the plethora of android devices that range from 50 bucks to 2000+.

    Well, the upper end of that range, yes. That's the biggest thing Apple has in their favor: bigger budgets for bigger cores on newer nodes.

    > ARM cpus lose compatibility totally once in a while, which is not something that will work in the long run.

    Seems like little-to-no burden for ARMv9 CPUs to retain ARMv8 compatibility, though. When they go to ARMv10, that might be a different story.

    > Intel hasn't introduced anything major since 2015!

    If Sunny Cove doesn't count as something new, then I think your standards are unrealistic.

    BTW, if you want bigger micro-architectural changes, try Gracemont.
  • Silma - Tuesday, May 25, 2021 - link

    Apple & ARM benefit from the best foundries in the world, which has not been the case for Intel for at least 3 years.
    If Intel catches up in production tech, or gets access to the same process as Apple and co., we'll see who has the better designs for which workloads.
  • melgross - Tuesday, May 25, 2021 - link

    I do think that Intel's designs are better than AMD's. They're not that much slower, when they are, and AMD is on a smaller, faster node. But as far as Apple's designs go, I doubt it. The designs are too different to make that claim. Additionally, an SoC is far more than just CPU cores; those are just a fifth of Apple's SoC.
  • igor velky - Tuesday, May 25, 2021 - link

    AMD didn't invent multichip modules - those are lies!

    IBM had servers with multi-chip CPUs around 1985,
    and Intel Core 2 had some CPUs which were MCMs too,
    ten or so years ago.
  • mode_13h - Wednesday, May 26, 2021 - link

    > Intel Core2 had some cpus which were MCM

    The Pentium Pro had its L2 cache on a separate die.
  • kgardas - Tuesday, May 25, 2021 - link

    x86 is dead? Well, it is; welcome, amd64.
    Anyway, I would not consider the latest Zen or Sunny Cove/Willow Cove cores non-competitive even with the latest Apple Mx designs. IMHO they are doing fine. Now, do you know that Alder Lake/Sapphire Rapids will have Golden Cove? That should arrive this year and early next year, probably. The core should again provide quite a nice bump in IPC. So both ARM and even Apple will again have more than adequate competition. No, neither Intel nor AMD is dead. Pretty exciting times ahead...
  • GeoffreyA - Tuesday, May 25, 2021 - link

    Oh boy, here we go again. x86, dead. Apple M1, enchanted stuff. Intel/AMD, rubbish for the dump. All hail, Apple!
  • Silver5urfer - Wednesday, May 26, 2021 - link

    Logically, their tunnel vision has only 2 possible reasons.

    One - Apple hardcore fans, whose daily tasks and lives somehow rely only on macOS or iOS, ignorant of reality and dumb enough to believe SPEC and Apple marketing PR.

    Two - They hate Intel a lot, and also the PC platform a lot; they probably have a console and a MacBook BGA junk.

    I do not know what else, or why anyone would hate x86 processors from Intel and AMD. I do not see any point, since they are the PCs we can own today and they will last literally for decades. People are using old-school Xeons for home servers, and old-school pre-SSE4.2 parts - basically Phenom II and Intel Core 2 Quad Q6600 processors - to play the latest games with community patches for the .exes. Then we have the latest PC hardware in HEDT and mainstream for multiple use cases.

    Why would anyone hate the only processing standard which has excellent backwards compatibility, a full-blown parts system for DIY and repair etc., and literally your choice of OS - Linux, Windows, and some Intel HW for Hackintosh? Yep, they are dumb and ignorant for sure.
  • mode_13h - Wednesday, May 26, 2021 - link

    > I do not know what else, or why anyone would hate x86 processors from Intel and AMD

    Love & hate don't enter into it, for many of us. Based on our understanding of the tech, we recognize that x86 is fighting a losing battle. Apple is merely interesting as the foremost of x86's competitors.

    > believe SPEC and Apple marketing PR.

    There are plenty of Mac app benchmarks now, between x86 and ARM-based Macs. It's not just the SPEC scores and PR.

    > People are using old-school Xeons for home servers

    Sure. More power to them! The picture gets more complicated for laptops, though.

    > to play the latest games with community patches

    You can only do that with some games, and eventually you have to start dialing back the quality settings, when you go back far enough.

    For sure, x86 will be with us for at least another decade, in some fashion and degree. And the PC gamer will probably be one of the last holdouts.

    > Why would anyone hate the only processing standard ...

    Since you view this in terms of love/hate, why do you seem to hate ARM?

    > Yep, they are dumb and ignorant for sure.

    Who said that? I think you're projecting.
  • mode_13h - Wednesday, May 26, 2021 - link

    Do you understand that GeoffreyA was being sarcastic? I think he was poking fun at the very pitched battle that you seem to be walking right into!
  • Silver5urfer - Wednesday, May 26, 2021 - link

    Another BS post, as usual. Show me any AT bench from Andrei here which shows how that garbage M1 scales vs. SMT; he never includes that CPU in SMT/HT benchmarks, and it only shows up in ST, showing some perf. And it's not even breaking any AMD or Intel CPUs; with TGL, Intel clearly demonstrated they are ready for AMD, forget Apple.

    64-core Mac Pros? Haha, lmao. You think logic and transistors can simply expand as long as Apple can buy shills out? You have to look at the density of the chip and the uArch scaling, PLUS the power planes for such a huge number of cores, AND the power envelope.

    The M1 loses out to AMD BGA processors, and the M1X and M2 do not exist today. We can also talk about how AMD Genoa is going to increase cores to 80C; if you add SMT on that, with on-die chipset, HBM, and TSMC 5N, it's a bloodbath for HEDT. Period. So Ryzen 5000 will smash the M1 to smithereens and blow its ashes into the air; wait for the Threadripper based on Milan to see even more catastrophic destruction of the M1. How are you even generalizing these high-core-count AMD and Intel CPUs across all computing devices? It smells like a massive pile of dumbness.

    I wonder what x86 did to you to make you hate it so much. It brought the PC to the masses, it took the power of a computer from big rooms to your own room, and now we have DIY with fully socketed HW to use. ARM garbage has no consumer devices which are popular enough. One can buy the Fujitsu A64FX, but it's super expensive; Graviton 2 is AWS-only; Ampere announced full custom cores, so don't expect to be able to buy those anytime soon, and their old 80C platform is now outdated. What is it that ARM is giving and empowering you with? The iPad or iPhone you used to type this, which have zero filesystem access? Or the M1, which has no user-replaceable components?
  • mode_13h - Wednesday, May 26, 2021 - link

    > I wonder what x86 did to you to make you hate it so much

    Tech is just a tool. I started out using MS DOS and then I moved on to Windows and Linux. I didn't hate DOS, but it had outlived its usefulness for me.

    > ARM garbage has no consumer devices which are popular enough.

    Laptops, so far. Mini PCs are probably just around the corner. Mediatek has licensed Nvidia's GPU IP and has talked about building ARM-based gaming machines.

    Intel and AMD could also get into the ARM race, and they could eventually make socketed processors. Certainly, any server processors will be socketed, but I mean for DIYers.
  • melgross - Thursday, May 27, 2021 - link

    You have serious problems.
  • mdriftmeyer - Friday, May 28, 2021 - link

    Genoa is already known to be 96 cores, and patents have them up to 128 cores.
  • The_Assimilator - Wednesday, May 26, 2021 - link

    According to you ARM-bros, x86 has been dead every year for the past two decades. So excuse me if I don't put much stock in your particular brand of wankery - especially since Arm IPC improvements have hit a wall at the 3GHz mark.
  • mode_13h - Wednesday, May 26, 2021 - link

    > every year for the past two decades

    two decades ???

    > Arm IPC improvements have hit a wall at the 3GHz mark.

    Yeah, it's a fair point but also kind of irrelevant. Wider, shallower cores tend to clock lower. For mobile and servers, that works out better, since perf/W is a key metric for them. It's mostly just desktops and workstations where you have the luxury of clocking as high as you want. Even HPC is really starting to focus on energy-efficiency.

    As for the relevance of IPC and clocks, what really counts is single- and multi- thread performance. The user just cares how fast it goes and potentially how much power it burns or heat it churns out.
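
    Concretely, as a toy sketch in Python (every number below is made up purely to illustrate the relationship, not a measured value):

    # Single-thread performance is roughly IPC x clock.
    def perf(ipc, ghz):
        return ipc * ghz  # notional "work per nanosecond"

    wide_low_clock = perf(ipc=6.0, ghz=3.0)   # wide, shallow core
    narrow_hi_clock = perf(ipc=3.5, ghz=5.0)  # narrow, high-clocked core

    print(wide_low_clock, narrow_hi_clock)    # 18.0 17.5 -- comparable,
    # but the wide core typically gets there at lower power, since
    # dynamic power scales roughly with f * V^2 and high clocks need
    # higher voltage.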
  • GeoffreyA - Thursday, May 27, 2021 - link

    It's interesting to see whether Intel and AMD will ever dial back the clocks in their quest for wider, weightier cores.
  • mode_13h - Saturday, May 29, 2021 - link

    Well, AMD's Zen cores have never clocked as high as Intel's, but Zen2 and Zen3 have been enjoying more perf/W and are also wider than Intel's. For Intel's part, they added more width in Sunny Cove.

    Increasing clock speed is a fairly reliable, straightforward way to raise performance over a wide variety of workloads. It's just not great for perf/W.

    And to the extent that narrower cores use less silicon, that makes them cheaper to produce.
  • Spunjji - Thursday, May 27, 2021 - link

    Comments saying "x86 is dead" are just as daft as the comments declaring that ARM will never be a threat to x86.
  • mode_13h - Tuesday, May 25, 2021 - link

    What a terrible naming scheme!

    If they didn't want to just start from a blank slate, they should've gone on to letters. So, A7A and A5A.

    Also, given that the X-cores are typically going to be paired with their cousin A-series core, the naming scheme should reflect that relationship. So, maybe the X1 should've been the X78 and the X2 could be the X710 or X7A.
  • mode_13h - Tuesday, May 25, 2021 - link

    Also, why skip 9? A59 and A79 would be a great mnemonic for the first mobile cores to be ARMv9!
  • nandnandnand - Tuesday, May 25, 2021 - link

    I'm fine with the naming scheme.

    For the Cortex-X line, they can just do X1, X2, X3, X4... X-cetera.

    For these new ones, A710 and A510 are the baseline, and they can put out A720, A525, or whatever until they run it up to A799. That could take over a decade if they don't increment the numbers too much. The '7' and '5' let you know these are related to the A78/A55, and the 3 digits let you know it's part of the brave new world of ARMv9.
  • mode_13h - Tuesday, May 25, 2021 - link

    > they can put out A720

    That could potentially create some confusion about the relationship between A72 and A720.

    > 3 digits lets you know it's part of the brave new world of ARMv9.

    Okay, so create a new numbering scheme! No need to piggyback off the old one, if it's "a brave new world", right?
  • phoenix_rizzen - Tuesday, May 25, 2021 - link

    Would have been a good time to pick new letters. Leave Cortex-A, Cortex-X, Cortex-M etc for Armv8.x.

    Even better, drop the Cortex name, and pick something new for Armv9-based cores.

    X, Y, Z would have been nice for big, middle, little cores.

    Ah well, marketing-droids will do what marketing-droids do. :D
  • mode_13h - Tuesday, May 25, 2021 - link

    Also, A79 would line up nicely with being the last generation of this microarchitecture family.

    Then, maybe the "Sophia" cores could start a new numbering series.
  • GeoffreyA - Thursday, May 27, 2021 - link

    "What a terrible naming scheme!"

    They should battle it out with Intel's Marketing arm to see who's the best in the field of naming.
  • eastcoast_pete - Tuesday, May 25, 2021 - link

    Disappointed in the design choice of the new LITTLE cores. I have the strong suspicion that the IPC comparison of the 510 LITTLE core to the A73 (the 510 getting close to the A73) is with one 510 core per complex, maximal cache and cache bandwidth, etc., which, of course, is highly theoretical. After all, the 510s are designed to come in pairs sharing resources for a reason. I am underwhelmed by this design; ARM's own power/perf curves show very little if any difference to the A55 until one gets to the high end of the power curve, at which point the 710 big cores would have taken over. Unfortunately, Apple's power/perf crown for efficiency cores remains quite comfortably safe. As an Android user, however, I remain stuck with ARM's designs, as none of the design houses (QC, Samsung) is even attempting custom core designs for smartphone SoCs. We are seeing the downside of a monopoly here.
  • mode_13h - Tuesday, May 25, 2021 - link

    > I remain stuck with ARM's designs, as none of the design houses (QC, Samsung)
    > is even attempting custom core designs for smartphone SoCs.

    Qualcomm is saying they're using their Nuvia acquisition to make new mobile cores.
  • eastcoast_pete - Tuesday, May 25, 2021 - link

    Would be great to see QC roll out new, ARM-based but home-made cores again; however, even if they do, the custom designs will most likely be big cores, which get paired with the 510s. But, maybe QC proves me wrong. That would be nice.
  • nandnandnand - Tuesday, May 25, 2021 - link

    "Because the new complex also only takes up a single interface on the DSU, it also opens up the possibility of designs larger than 8 “cores”, something I hope won’t happen, or hopefully only happens through more middle or big cores."

    Nah, I want a 24-core smartphone posthaste.
  • Kamen Rider Blade - Tuesday, May 25, 2021 - link

    There's no point in putting 24 cores in a SmartPhone, other than to drain your battery faster.

    At the highest end, I think 12 cores in a top-of-the-line ARM CPU is enough for SmartPhone purposes, with this configuration:

    2x BIG; 8x Balanced; 2x little cores.

    That would be enough for most power users to get everything they need out of their CPUs.
  • spaceship9876 - Tuesday, May 25, 2021 - link

    It would be nice if they released a cortex-A35 successor as that is very old.
  • nandnandnand - Tuesday, May 25, 2021 - link

    Plus it could fulfill the efficiency role that A510 apparently fails at.
  • mode_13h - Wednesday, May 26, 2021 - link

    If they're going to continue with that product segment, then ARMv9 will virtually force them to.
  • mode_13h - Tuesday, May 25, 2021 - link

    Can anyone explain the color splotches in the floorplan plots? What are we supposed to glean from those?
  • vvid - Wednesday, May 26, 2021 - link

    >> Can anyone explain the color splotches in the floorplan plots?
    Each color marks a specific unit: ALU, FPU, instruction decode, branch predictor, load/store, etc.
  • mode_13h - Thursday, May 27, 2021 - link

    Thank you!
  • Kamen Rider Blade - Tuesday, May 25, 2021 - link

    So we went from ARM's big.LITTLE (I prefer the BIG.little stylization) to BIG.Balanced.little as the new paradigm between ARM core types.
  • Fulljack - Wednesday, May 26, 2021 - link

    that's why it's called DynamIQ, you know... as in "dynamic".
  • phoenix_rizzen - Tuesday, May 25, 2021 - link

    Interesting. Wonder if Samsung and/or Qualcomm will be using the A510 as the basis for a smartwatch SoC. A pair of these should provide a huge performance increase over the A7-based SoCs, but use much less power than anything using a "big" core (wasn't there a Samsung watch SoC that used a big core?).
  • EthiaW - Tuesday, May 25, 2021 - link

    An ideal mobile SoC configuration should be 2xX2+4xA710+2xA510. There is only so much background work to do and as many as four little cores do not make sense.
  • Fulljack - Wednesday, May 26, 2021 - link

    I'm thinking of 1×X2 + 3×A710 + 2×A510, which gives more room for the GPU.
  • docola - Tuesday, May 25, 2021 - link

    question: does this mean if i buy a mobile phone today,
    that starting within a year from now it will eventually be useless because all
    apps will be moving to 64-bit, which my phone won't support?

    Or will my phone have access to plenty of the many 32-bit apps for 3-4 years to come?
    (if that's the case then i think i'll buy a stupidly cheap phone till next year)

    thanks~
  • phoenix_rizzen - Tuesday, May 25, 2021 - link

    Android phones have supported 64-bit OSes and apps since the Snapdragon 810, many many years ago.

    Android stopped accepting new 32-bit apps into the Play Store in 2019.

    Android has essentially been 64-bit only for over 2 years now.
  • mode_13h - Wednesday, May 26, 2021 - link

    > Android phones have supported 64-bit OSes and apps since the Snapdragon 810,
    > many many years ago.

    You mean 8xx. I got a Nexus 5X in like 2015 that had a Snapdragon 808 with 2x A57 and 4x A53.
  • docola - Tuesday, May 25, 2021 - link

    does the shift to 64-bit apps mean that today's phones
    will start being unable to run apps next year?
  • Wilco1 - Tuesday, May 25, 2021 - link

    No. Pretty much all phones are 64-bit today and thus support 64-bit apps already.
  • Silver5urfer - Tuesday, May 25, 2021 - link

    What's the use when all of these end up in planned-obsolescence devices which have a max life of 2-3 years? They should make this "days of charging" whatever into reality by making phones with removable batteries.

    As for laptops, same thing but different skin. Most of the BGA laptops will die fast because of their heatsinks, non-replaceable components, and the high heat of thin-and-light designs (mostly for x86), and then there are the batteries; for all those machines there's no way anyone can make use of their HW for more years aftermarket, esp if the HW is all soldered. For example, an MXM laptop can take many generations of GPUs; that used to be the case for most machines until now, since nowadays even Turing-based Quadro cards are non-standard.

    So, all in all, get excited for the same performance benefits that users will see. My SD835 phone is quick, fast, and reliable; yeah, the SD888 would definitely be faster, but how much would it impact the normal tasks of Maps / Browser / Videos / Music? Games, maybe, but I don't play on smartphones. I presume it is the same for all those SD855, 865 phones. Even the iPhones from A11 and up.

    Bonus: we don't get to control even 1 bit, with hardcore locks on phones, from OS-level filesystem nerfs from Goolag to the HW side of having no 3.5mm jacks and SD slots. But yea, people love to get excited for new shiny stuff.
  • Silver5urfer - Tuesday, May 25, 2021 - link

    Forgot that 32-bit-only thing. Apple does it, everyone follows, lol. On phones the 32-bit apps may be very limited, I do not know how many, but some useful software might still not be updated and yet still work. There is an app called DiskUsage that gives the WinDirStat-type look, and it's blazing fast, plus works with root too; unfortunately that might not work in the latest garbage Android 12, since it won't have storage access with the SAF and Scoped Storage disaster. There could be others...

    Thankfully it's not Windows, which has a ton of applications on 32-bit. So pretty much some of the phones are getting outdated with this 64-bit-only bs change, but since it's planned obsolescence anyway it won't matter; by the time 64-bit-only CPUs like this come, the phones which support 32-bit would be dead, barring those amazing phones which have custom ROMs and removable batteries; those should be kept as nice souvenirs, or, if there's a DAC, used like a DAP.
  • GeoffreyA - Thursday, May 27, 2021 - link

    "Apple does it everyone follows"

    Popular opinion (more so when led by Apple) is like a wildfire, sweeping everything in its path.
  • mode_13h - Wednesday, May 26, 2021 - link

    Blame consumers. Serviceable devices are more bulky and add some cost. In particular, the modular phone concept has been pursued a couple times, but never got traction.

    I don't like disposable hardware, but that's a sad reality of phones, tablets, and ultra-portables. I wanted to do the next best thing and use a phone with open software, but MeeGo and Firefox Phone both failed and none of the current Linux-based phone projects seem to be gaining much traction.

    FWIW, my current phone does have a replaceable battery, which I had to replace after the original one started swelling. It's not *easily* replaceable, but you can still do it or pay someone a little money to do it for you.
  • melgross - Thursday, May 27, 2021 - link

    Serviceable mobile devices are also less reliable, and have smaller batteries.
  • del42sa - Wednesday, May 26, 2021 - link

    Bulldozer strikes back. LoL

    PS: it's an interesting design, though
  • GeoffreyA - Thursday, May 27, 2021 - link

    Shows that AMD wasn't completely bonkers.
  • del42sa - Thursday, May 27, 2021 - link

    it wasn't a bad idea or a bad arch, it was just very badly implemented
  • usiname - Wednesday, May 26, 2021 - link

    A510 vs A73 - 25% slower with 35% better power efficiency. 4 years of innovations for you.
  • ET - Wednesday, May 26, 2021 - link

    I think it's clear from the DVFS slide on the A510 page of the article that the main difference between the A55 and A510 is that the A510 can reach much higher performance with much higher power use.

    So yes, anyone expecting much better efficiency will be disappointed.

    A510 does lead the way to pure "small core" designs that reach reasonable performance on one hand and low power on the other hand.

    Still, I haven't seen a figure comparing core size, and that would be a very important measure.
  • mode_13h - Thursday, May 27, 2021 - link

    > anyone expecting much better efficiency will be disappointed.

    Efficiency can come from a smaller process node. The A55 and A510 were compared at ISO-process.
  • RSAUser - Wednesday, May 26, 2021 - link

    Basically interesting for cases when you don't want to add an A73; e.g., it's pretty big news in the watch space, where it's been the same 4/5-year-old architecture for a very long time.
  • mode_13h - Thursday, May 27, 2021 - link

    > It's pretty big news in the watch space

    I'm actually surprised people are even using A55s in smartwatches, or that ARM is targeting the A510 at them. I'd figured the most they could get away with would be the A35.

    I guess pairing a couple A55s with some A35s might be a way to get responsiveness *and* battery life. Is that something people do?
  • mode_13h - Wednesday, May 26, 2021 - link

    It'd be interesting to see how efficient the A73 would be, if you dropped its clock to match the A510's performance.
  • AntonErtl - Thursday, May 27, 2021 - link

    Yes. ARM gives some flowery wordings for the lower performance of the A510 compared to the A73 (and Andrei reworded it the way ARM wants us to think: "very similar IPC and frequency capabilities whilst consuming a lot less power"); looking at the numbers given by ARM, the A510 has >20% less performance than the A73, at 35% less power. The DVFS curves I have seen make me expect that the A73 has the same or lower power at the same performance, if you lower its clock by 20% (or whatever the slowness factor of the A510 is).

    Andrei already showed us in his Exynos 9820 review that the A75 has better Perf and Perf/W for nearly all of the performance range of the A55. So I find it surprising that ARM went for another in-order design for the little core of ARMv9, instead of something like an ARMv9-enabled A75. For me it will certainly be an interesting microarchitecture to study, but I guess it will take some time until it appears in some Odroid or Raspi board.
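
    To put rough numbers on that hunch, a back-of-the-envelope sketch in Python. The P ~ f^3 model below is a crude textbook assumption (dynamic power ~ f * V^2, with V roughly tracking f in the DVFS range); real curves are flatter at the low end:

    a73_perf, a73_power = 1.00, 1.00    # normalized A73 baseline
    a510_perf, a510_power = 0.80, 0.65  # ARM's claimed figures vs the A73

    f_scale = a510_perf / a73_perf      # downclock the A73 by ~20%
    a73_down_power = a73_power * f_scale ** 3

    print(round(a73_down_power, 2))     # ~0.51x power at A510-level perf
    print(a510_power)                   # 0.65x -- so under this model the
    # downclocked A73 comes out ahead, which is the suspicion voiced above.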
  • mode_13h - Saturday, May 29, 2021 - link

    > the A75 has better Perf and Perf/W for nearly all of the performance range of the A55.
    > So I find it surprising that ARM went for another in-order design for the little core of ARMv9

    You're forgetting about PPA, though. The A510 is probably a lot smaller (ISO-process) than the A75.

    > I guess it will take some time until it appears in some Odroid or Raspi board.

    Look for A76-enabled SBCs late this year or early next. Rockchip's RK3588 will have 4x A76.

    Raspberry Pi will probably be stuck on A72 or A73 for a couple more generations, since they plan to stay on 28 nm for a while. Meanwhile, the Amlogic SoC in ODROID's N2 is made on 12 nm.
  • AntonErtl - Sunday, May 30, 2021 - link

    Looking at the Exynos 9820 die shot, the A55 is ~3.4 times smaller than the A75, but it also has ~3.4 times lower top performance and a similar factor at the lowest common perf/W point, and from the looks of the line, in between. I doubt that the A510 is better in perf/area. But maybe it's the difference that ARM is claiming between the workloads Andrei used for evaluating performance (SPEC CPU2006) and what the A55 and A510 are doing in practice; if they mainly wait for peripherals, I can believe that their performance does not matter much.
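
    Spelling out the perf/area arithmetic (Python; the figures are the approximate die-shot ratios above):

    a75_over_a55_area = 3.4  # A75 is ~3.4x larger on the Exynos 9820
    a75_over_a55_perf = 3.4  # ...and ~3.4x faster at peak

    print(a75_over_a55_perf / a75_over_a55_area)  # ~1.0: perf/area is
    # roughly a wash, so four A55-class cores cost about the same area
    # as one A75 for the same aggregate throughput -- hard to see the
    # in-order little core winning on PPA from these numbers alone.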

    Thanks for the info on SBCs to be expected.
  • Wereweeb - Wednesday, May 26, 2021 - link

    I'll ignore all the warfare in the comments, and just say this: imagine a 16-'core' A510 SoC. Sorry.
  • mode_13h - Wednesday, May 26, 2021 - link

    So, if you built a HPC CPU with A510 @ one core per complex, 2x 128-bit SVE2, and max L2 cache, how would area-efficiency (PPA) and power-efficiency (PPW) compare with a V1-based chip on the same node?

    Let's assume the workload has enough concurrency to scale up to all the A510 cores, and that there's enough ILP that the A510's lack of OoO isn't a significant impediment.
  • Shakal - Thursday, May 27, 2021 - link

    Pardon my ignorance but what exactly is an "Alternate path predictor"? They mention that for the X2 core but I've not found any reference to what it is. I've heard of path based predictors but how does the alternate come into play?
  • ballsystemlord - Friday, May 28, 2021 - link

    Spelling and grammar errors (there are lots!):

    I read through everything but the conclusion.

    "From a microarchitectural standpoint this is interesting as it means Arm will have been able to kick out some cruft in the design."
    "has", not "have" and subtract "will":
    "From a microarchitectural standpoint this is interesting as it means Arm has been able to kick out some cruft in the design."

    "Even though it's a in-order core,..."
    "an" not "a":
    "Even though it's an in-order core,..."

    "...and since then we haven't had seen any updates to Arm's little cores, to the point of it being seen as large weakness of last few generations of mobile SoCs."
    You need an "a" and subtract "had":
    "...and since then we haven't seen any updates to Arm's little cores, to the point of it being seen as a large weakness of last few generations of mobile SoCs."

    "The new design if a clean-sheet microarchitecture from Arm's Cambridge team which the engineers had been working on the past 4 years, ..."
    "is" not "if":
    "The new design is a clean-sheet microarchitecture from Arm's Cambridge team which the engineers had been working on the past 4 years, ..."

    "... the performance impact and deficit is said to only a few percent versus having a pipeline dedicated for each core."
    Add a "be":
    "... the performance impact and deficit is said to be only a few percent versus having a pipeline dedicated for each core."

    "The dual-ring structure is used to reduce the latencies and hops between ring-stops and in shorten the paths between the cache slices and cores."
    "to", not "in":
    "The dual-ring structure is used to reduce the latencies and hops between ring-stops and to shorten the paths between the cache slices and cores."

    "Architecturally, one important change to the capabilities of the DSU-110 is support for MTE tags, a upcoming security and debugging feature promising to greatly help with memory safety issues."
    "an" not "a":
    "Architecturally, one important change to the capabilities of the DSU-110 is support for MTE tags, an upcoming security and debugging feature promising to greatly help with memory safety issues."

    "The SLC can server as both a bandwidth amplifier as well as reducing external memory/DRAM transactions, reducing system power reduction."
    "serve", not "server" and "consumption", not "reduction":
    "The SLC can serve as both a bandwidth amplifier as well as reducing external memory/DRAM transactions, reducing system power consumption."

    "Overall, the new system IP announced today is very interesting, but the one question that's one has to ask oneself is exactly who these net interconnects are meant for."
    Excess "'s". Refactoring makes more sense.
    "Overall, the new system IP announced today is very interesting, but we have to ask who exactly these net interconnects are meant for."
  • ChrisGX - Monday, May 31, 2021 - link

    There is one part of Andrei's analysis of the X2 core that I don't get. I do get the scepticism about ARM's optimistic estimate of a 30% lift in peak performance being on offer given the dismal underperformance of Samsung's 5nm silicon, but my reading of what ARM has said is that the 16% performance gain for the X2 is ISO process, i.e. on the same silicon process at the same power and frequency. Am I wrong to read this as (effectively) an IPC gain without an energy cost associated with it? (Let us ignore for the moment that such good news will likely be dashed due to Samsung's iffy silicon.) I know that sounds like a very rosy picture, but isn't that the picture that ARM painted? In this context I don't get Andrei's suggestion of a linear increase in power for that peak performance gain.

    Personally, I find the claim of a 16% performance gain hard to believe (and the 30% number after unspecified silicon process improvements and processor clock boost, presumably, even harder to believe). Still, I want to be clear on what ARM is claiming and what I have missed (if anything). Any comments would be welcome.
  • ChrisGX - Monday, May 31, 2021 - link

    I have just reviewed Andrei's analysis again, and I note he referred to a power increase (not a linear power increase in proportion to the 16% performance increase), drawing particular attention to the increased cache size.
  • ChrisGX - Wednesday, June 2, 2021 - link

    Regarding the projections of a 30% peak performance increase for a premium mobile SoC in 2022 I can't see how to get to that performance number (after a 16% IPC increase) without a) the prime X2 core being clocked at around 3.3GHz - 3.35GHz and b) corresponding silicon process improvements that permit lower voltages (at the increased core frequency). That implies a process that is better than TSMC's N5.
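
    The arithmetic behind those clock numbers, spelled out (assuming ARM's 16% is ISO-frequency, that the remainder scales linearly with clock, and a ~3.0GHz X1-class baseline, which is my assumption):

    ipc_gain = 1.16      # ARM's claimed ISO-process X2 uplift
    target_gain = 1.30   # ARM's projected 2022 flagship uplift

    clock_scale = target_gain / ipc_gain   # ~1.12
    print(round(3.0 * clock_scale, 2))     # ~3.36 GHz from the prime core,
    # i.e. the remaining ~12% has to come from frequency, and the process
    # has to deliver that frequency at sane voltages.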

    For an 8 core X2 based SoC for consumer computers that performs at a peak rate of 1.4x the performance of a Core i5-1135G7 (which would represent a truly stunning level of performance) I think the SoC would have to be clocked at around 3.7GHz - 3.8GHz (again on a process that is markedly better than TSMC's N5). Performance like that, of course, won't come without elevating core and SoC power consumption to a significant degree.

    Getting performance outcomes as good as that doesn't seem especially likely to me.
  • mode_13h - Wednesday, June 2, 2021 - link

    Thanks for the analysis. If correct, this could mark the opening of a significant credibility gap, in ARM's projections.
  • ChrisGX - Sunday, June 6, 2021 - link

    I just had a look at the PPA Improvements that TSMC has advertised for its N4 process (there are unconfirmed claims that Qualcomm will be using N4 for the next premium SoC for flagship Android mobile phones) and I don't see ARM's projected performance numbers being reached on that process. N3 would do it but we won't see that before 2H2022. Without inviting thermal problems a performance improvement of 24% at 3.2GHz might fall within the bounds of possibility. (Note: Information on the N4 process is thin but I have assumed 7% more performance will be available at the same power compared to N5. With additional performance improvements of 16% from IPC gains - without pushing the power budget - a performance lift of 24% seems feasible.)
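
    Or, as bare arithmetic (both inputs are the assumptions stated above):

    process_gain = 1.07  # assumed N4-over-N5 performance at iso-power
    ipc_gain = 1.16      # ARM's claimed X2 uplift

    print(round(process_gain * ipc_gain, 2))  # ~1.24 -> the "24%" figure;
    # the remaining gap to ARM's 30% projection would have to come from
    # clocks, and therefore from power.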

    https://www.anandtech.com/show/16639/tsmc-update-2...
  • rohn287 - Thursday, December 2, 2021 - link

    Just asking: why not use 2 X2 cores with 2 higher-clocked A710s and 2 normal A710s? This would help reduce heat and increase performance in Android phones. Similar to Apple's approach.
  • The Futuristic - Saturday, April 2, 2022 - link

    I know it's too late for a comment, but the processors with this core have just entered the market. Looking at them and comparing them with Apple's A15 E cores especially, I think they should start using the Cortex-A710 as E cores instead of the Cortex-A55. Apple's E cores consume around 0.44 W; the Cortex-A78 in the Dimensity 1200 at 6nm uses 1.16 W for the same performance. So the A710 is ~30% more efficient at the same performance, and 5nm would take it even further. It would close the gap between Apple's E cores and Cortex cores. So a 2x Cortex-X2 + 4x Cortex-A710 configured CPU would catch the Apple A15 in multi-core, at least.
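
    A rough sketch of that comparison (Python; the wattage figures are the rough estimates quoted above, not measurements):

    apple_e_w = 0.44   # claimed A15 E-core power at the reference perf
    a78_w = 1.16       # claimed Dimensity 1200 (6nm) A78 power, same perf
    a710_gain = 0.30   # ARM's claimed A710 efficiency uplift over the A78

    a710_w = a78_w * (1 - a710_gain)
    print(round(a710_w, 2))  # ~0.81 W -- closer, but still ~1.8x Apple's
    # E-core figure even before counting any 5nm process benefit.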
  • yeeeeman - Wednesday, May 11, 2022 - link

    we're getting very close to the cortex x3 announcement.
  • yeeeeman - Saturday, June 4, 2022 - link

    seems like arm is missing the end-of-May announcements this year. anyone know why?
