47 Comments
Desierz - Wednesday, December 5, 2018 - link
"one thing I did note in my Huawei Mate 20 review is that the Pixel 3 and OnePlus 3 still felt faster in terms of application launch time"
I think it would be better to actually have numbers, rather than what you 'felt'..
III-V - Monday, December 10, 2018 - link
It's a pre-dive. Who cares?
Wardrive86 - Wednesday, December 5, 2018 - link
So you estimate 128 ALUs per core / 256 ALUs total for the A630? I must have read the article wrong, because I thought you had estimated 256 ALUs per core / 512 total. I was also under the impression that the Adreno 540 was a quad-core GPU with 64 ALUs per core / 256 total.
Wardrive86 - Wednesday, December 5, 2018 - link
The 727 GFLOPS (FMADD) you quoted in the earlier article would assume 512 ALUs at 710 MHz.
Andrei Frumusanu - Thursday, December 6, 2018 - link
I corrected this, it was a brain fart on my part.
icalic - Thursday, December 6, 2018 - link
Hi Andrei, could you test the Adreno 630 on the clpeak GFLOPs benchmark and share the result?
https://play.google.com/store/apps/details?id=kr.c...
I think 727 GFLOPS is too high for the Adreno 630, and I believe it is only slightly above the Tegra X1 (>512 GFLOPS) after comparing scores in 3DMark Sling Shot Extreme Unlimited graphics. The Adreno 630 may only have 384 ALUs in total.
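The ALU counts being debated above can be checked with the usual peak-throughput arithmetic. This is a minimal sketch, assuming each ALU retires one fused multiply-add (counted as 2 FLOPs) per cycle; the function name `peak_gflops` is made up for illustration, and the 710 MHz clock and the 256/384/512 ALU counts are simply the figures quoted in the comments, not confirmed specifications.

```python
def peak_gflops(alus: int, clock_mhz: float, flops_per_alu_per_cycle: int = 2) -> float:
    """Theoretical peak = ALUs x FLOPs per ALU per cycle x clock.

    Assumption: one FMA per ALU per cycle, counted as 2 FLOPs.
    """
    return alus * flops_per_alu_per_cycle * clock_mhz / 1000.0


# ALU counts and clock are the estimates quoted in the thread, not vendor data.
for alus in (256, 384, 512):
    print(f"{alus} ALUs @ 710 MHz -> {peak_gflops(alus, 710):.1f} GFLOPS")
```

Under that assumption, only the 512-ALU reading reproduces the 727 GFLOPS figure at 710 MHz, while 384 ALUs would land just above the 512 GFLOPS quoted for the Tegra X1.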
Andrei Frumusanu - Thursday, December 6, 2018 - link
clPeak isn't accurate.
Wardrive86 - Thursday, December 6, 2018 - link
Thank you for the clarification.
ZolaIII - Thursday, December 6, 2018 - link
The A540 is a quad-cluster GPU, the A630 is dual-cluster with 2x more ALUs, and it seems the A640 is a three-cluster design at a lower clock speed. That also means it will have a slightly larger die area (mm2).
Rudde - Monday, December 10, 2018 - link
According to Wikipedia (unverified source), the Adreno 630 has 256 ALUs managing 4x 16-bit FMA calculations per cycle, yielding 727 GFLOPS at 710 MHz. Assuming the Adreno 640 is similar, it has 384 ALUs managing 4x 16-bit FMA. 1.2x performance puts it at 872 GFLOPS, which suggests a 568 MHz frequency.
tyger11 - Wednesday, December 5, 2018 - link
With this generation of mobile SoCs, I'm finally getting pretty curious as to how these compare with typical workloads to regular desktop CPUs. Being able to handle 4K60p encode and decode, etc, means that these have gotta be powerful enough for 99% of the everyday uses most people have. Putting them in cases with decent cooling (as opposed to a severely thermally-constrained situation like a cellphone or even a tablet) has really gotta be a very nice solution for a lot of people. Or just dock your phone with something like the DeX.
HStewart - Wednesday, December 5, 2018 - link
I often thought about that - especially I wonder if it take hardware to movie pixels more on say a 28in screen than a 7in screen even at same resolution.
Just for information, I have a Note 8 with DeX and it is not that impressive. Of course Samsung wants you to get a Note 9.
Death666Angel - Thursday, December 6, 2018 - link
"if it take hardware to movie pixels more" What?
And the physical size of the pixels is not important to the device supplying the picture.
4k60 encode / decode does not have anything to do with general performance, those are fixed function blocks in the chips.
Although I'm also curious how they would perform in general windows tasks against a variety of laptop and desktop CPU solutions. It's a shame hardly any ARM based solutions outside of smartphones are price competitive against Intel Atom, Core or AMD stuff. Snapdragon 835 / 845 laptops with Windows mode S (I think that's what it's called) going for $800 and more is ridiculous.
jasonslg - Wednesday, December 5, 2018 - link
For the performance part of the problem, I just found a small interview with Travis Lanier, senior director of product management at Qualcomm, and asked a few questions that everyone should be interested in.
1. L2 of 855 is 512KB 256KB 128KB, L3 2MB, system cache 3MB
2. Adreno 640 adds 50% ALU, using AI
3. Trepn profiler no longer maintains
4. The peak power consumption of 855 and 845, no specific value, only indicates almost
halcyon - Thursday, December 6, 2018 - link
Gy!
halcyon - Thursday, December 6, 2018 - link
I mean Thank You (really need that 5 minute edit time window in 2018, @AnandTech).
jasonslg - Thursday, December 6, 2018 - link
Not me. I just quote media reports.
Ian Cutress - Thursday, December 6, 2018 - link
I've been speaking with him today and I got a few more details. Update when I can get a few minutes in front of my laptop.
tipoo - Thursday, December 6, 2018 - link
re: The Prime core being on the same power domain as the other three big cores, though that's not as ideal as it could have been, the other cores waking up doesn't mean they're at max power doing so, right?
Zoolook13 - Friday, December 7, 2018 - link
No, not max power, but they are probably at a higher voltage level than they would need to be on, so consuming more power than they would if they had a separate power plane.
peevee - Tuesday, December 11, 2018 - link
Probably if cores 2,3,4 wake up, core 1 will be slowed down to their speeds and voltage adjusted down accordingly because of total power/TDP, so it does not matter that they are on the same voltage plane.
iwod - Thursday, December 6, 2018 - link
SD855 app launch time is 2-4x faster than A12 on iOS?
Something isn't right.
And no one cares about 45% faster multi-core performance. In single core, given more cache, I am expecting some jump. How far away is it compared to the A12?
iwod - Thursday, December 6, 2018 - link
Someday Apple will have to either make their own baseband which consolidates 802.11ax and ay, or they will need a vendor willing to do this. I am somewhat betting 2019 won't have any wireless improvement from Apple.
peevee - Tuesday, December 11, 2018 - link
ay will continue to be useless in phones (60+-9GHz does not penetrate anything, including your hand and head, and think about all that RF power being actually ABSORBED by your body).
Apple and everybody else integrating ax is practically certain, it is a straightforward upgrade of ac.
lionking80 - Thursday, December 6, 2018 - link
45% faster is exactly in single-core, not multi-core, according to previous Geekbench leaks (single core in S855 is now 3500-3700, up from 2400-2500 in S845).
haukionkannel - Thursday, December 6, 2018 - link
So the one fast core is 45% faster than in the 845, and those 3 not-so-fast cores are about the same as the 845 or a little bit faster?
Trixanity - Thursday, December 6, 2018 - link
Still much faster than 845. It's 'only' cache and clock speed separating the prime core from the performance cores (that we know of).
lionking80 - Thursday, December 6, 2018 - link
I would expect the 3 medium cores to also be around 25% faster than the A75 cores in the S845.
ZolaIII - Thursday, December 6, 2018 - link
Well, web benchmarks will be the most interesting, as they will show the performance gain per two performance cores and correspond most closely to the rest of real-world software and actual usage.
tipoo - Thursday, December 6, 2018 - link
Yeah those app launch times seem all sorts of sketchy. The iPhone usually dominates app load times, with the occasional challenger like the OnePlus 6 having temporary wins (i.e. before iOS 12, and then before the XS).
B3an - Thursday, December 6, 2018 - link
The stuff about PBR makes no sense. Many mobile games already use PBR and run perfectly fine on the Adreno 630 for example.
You say "more details to come", any estimates on that?
Ryan Smith - Thursday, December 6, 2018 - link
We're still trying to find out more. Qualcomm is specifically calling it out as a feature, but we're not currently aware of any reason that PBR wouldn't work on any Adreno 600 GPU. It's basically just a combination of shader programs and design principles.
ZolaIII - Thursday, December 6, 2018 - link
Maybe the highest-clocked big core is actually designed to be core one of four in order to balance true utilisation to capacity, as you know how the first core is always the most (highest) utilised one and how the scheduler is never able to completely balance that.
Maybe we all have the wrong picture of this from traditional cluster design. Anyway, at least like that it would make some sense, delivering a bit more actual performance relative to the theoretical one simply through better utilisation. We will see how it works.
tuxRoller - Thursday, December 6, 2018 - link
Adding a third (2.5?) tier that consists of a single fast core should make the scheduling problem a bit easier.
matten - Thursday, December 6, 2018 - link
Is it clear to anyone whether the X50 modem will support the EU mmWave band (26 GHz), or will it still be only the 28 GHz band?
mayankleoboy1 - Thursday, December 6, 2018 - link
Qualcomm not talking about CPU performance AND efficiency in loud capital letters is worrisome.
I feel a rerun of SD810..
@Andrei Frumusanu : WDYT?
Wilco1 - Friday, December 7, 2018 - link
https://www.anandtech.com/show/13686/snapdragon-85...
mayankleoboy1 - Thursday, December 6, 2018 - link
slightly OT:
What stops desktop GPU vendors (all 3) from including 4K/360 HEVC/MP4/VP9 encode/decode in complete hardware when the mobile vendors can do it?
GreenReaper - Thursday, December 6, 2018 - link
Well, for a start, most desktops don't have a need for 360-degree capture...
As discrete video cards, perhaps the argument is that as long as they can do it at sufficient performance in shaders, pure hardware is unnecessary? After all it is extra space. Power and temperature constraints, which are the number-one driver of this feature, are not as great on desktop.
Intel's Kaby Lake can do all three at least for 4K (except not VP9 10-bit encode):
https://en.wikipedia.org/wiki/Intel_Quick_Sync_Vid...
AMD's Vega core can't do VP9 (but if you just want to record arguably HEVC is better):
https://en.wikichip.org/wiki/amd/athlon/200ge#Grap...
AntonErtl - Thursday, December 6, 2018 - link
If there is only one (CPU-demanding) thread active, the common power domain will not hurt much: The three other A76 will be idle and consume little, despite high voltage.
If there are many CPU-demanding threads active, the common power domain will probably not hurt, either: Then you cannot afford high voltage for any of the fast cores, so the fastest core will run below its maximum voltage and clock.
Where the design may hurt is if there are two CPU-demanding threads active, but one would need to see exact data to know how much. It may not be a big deal.
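The three cases above can be made concrete with a toy model. This is only a sketch under loud assumptions: dynamic power is taken as proportional to V² × f, leakage is ignored (so idle cores cost nothing, which understates the "consume little" case), and the voltage/frequency curve in `rail_voltage` and every clock value below are invented for illustration, not Qualcomm data.

```python
def rail_voltage(freq_ghz: float) -> float:
    """Hypothetical linear voltage/frequency curve (illustrative only)."""
    return 0.5 + 0.15 * freq_ghz


def dynamic_power(freqs_ghz, shared_rail: bool) -> float:
    """Relative dynamic power (arbitrary units) for a list of per-core clocks.

    Idle cores are modelled as 0 GHz, i.e. clock-gated with no dynamic power.
    On a shared rail every core runs at the highest voltage any core requests.
    """
    wanted = [rail_voltage(f) for f in freqs_ghz]
    volts = [max(wanted)] * len(freqs_ghz) if shared_rail else wanted
    return sum(v * v * f for v, f in zip(volts, freqs_ghz))


# Per-core clocks in GHz for the four big cores; all values are assumptions.
scenarios = {
    "one thread": [2.84, 0.0, 0.0, 0.0],
    "two threads": [2.84, 2.42, 0.0, 0.0],
    "four threads": [2.0, 2.0, 2.0, 2.0],
}

for name, freqs in scenarios.items():
    ratio = dynamic_power(freqs, True) / dynamic_power(freqs, False)
    print(f"{name}: shared rail costs {ratio:.3f}x the separate-rail power")
```

In this model the shared rail only costs extra in the two-thread case, where the second core is dragged up to the faster core's voltage; how large that penalty is on real silicon depends on the actual voltage curve and leakage, which the sketch does not know.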
Samus - Thursday, December 6, 2018 - link
This seems really hard to optimize for with the caches all over the place like that
peevee - Tuesday, December 11, 2018 - link
True. Even UBLAS will have a fit. But then if you do compute on mobile CPUs these days you do it wrong to begin with.
eastcoast_pete - Thursday, December 6, 2018 - link
@Andrei: thanks for this preview; also @Ian: thanks to you and Andrei for coverage of the QC event.
@Andrei @Ian: any comments, mention by anybody from QC on why the three lower perf big cores are on the same power domain as the high frequency core, despite the potential to save battery by not doing so? I am probably not the only one whose initial thought was "not again", and remembering the 808 and 810 debacle; I would hope that QC had some really good reasons to leave the 1+3 on the same power domain, and it's not just a case of "we'll fix that next time".
@Andrei: Question: How aware does a camera/video app have to be to take advantage of the new kit in the 855, especially the 4K HDR video? I am asking due to some (negative) experiences with recent phones by big name phone makers (here: LG) that simply didn't use many of the features in the QC flagship SoC that the phones had.
Regarding the GPU figures: I hope (think) it might be that QC tries the under promise, then over deliver marketing strategy. They were already king of the Android hill with the 845, so they can afford to lowball their numbers.
peevee - Tuesday, December 11, 2018 - link
"Regarding the GPU figures: I hope (think) it might be that QC tries the under promise, then over deliver marketing strategy. They were already king of the Android hill with the 845, so they can afford to lowball their numbers."
Or 7nm gave them room to optimize, because in thermally and/or power-constrained environments more ALUs at optimal frequency/voltage is always better than fewer ALUs at higher-than-optimal frequency and voltage (because performance scales AT BEST linearly with frequency while power increases much faster, especially if a voltage increase is necessary to maintain the higher frequency) in those "embarrassingly parallel" workloads.
Eventually 100MHz at 0.01V is going to be the best. :)
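The "wider and slower" argument can be put into numbers with the standard first-order relation that dynamic power scales with V² × f while throughput scales at best with ALUs × f. The sketch below is illustrative only: the ALU counts and clocks are the unverified estimates quoted earlier in the thread, and both voltages are pure assumptions chosen to show the shape of the trade-off.

```python
def relative_throughput(alus: int, clock_mhz: float) -> float:
    """Best-case throughput for an embarrassingly parallel load: ALUs x clock."""
    return alus * clock_mhz


def relative_dynamic_power(alus: int, clock_mhz: float, volts: float) -> float:
    """First-order dynamic power: switched units x V^2 x f (leakage ignored)."""
    return alus * volts ** 2 * clock_mhz


# ALU counts and clocks: thread estimates. Voltages: hypothetical.
narrow = {"alus": 256, "clock_mhz": 710.0, "volts": 0.90}
wide = {"alus": 384, "clock_mhz": 568.0, "volts": 0.80}

perf_ratio = (relative_throughput(wide["alus"], wide["clock_mhz"])
              / relative_throughput(narrow["alus"], narrow["clock_mhz"]))
power_ratio = relative_dynamic_power(**wide) / relative_dynamic_power(**narrow)

print(f"throughput: {perf_ratio:.2f}x, dynamic power: {power_ratio:.2f}x")
```

With these made-up voltages the wider configuration delivers about 1.2x the throughput for slightly less dynamic power. The model deliberately ignores leakage and the minimum voltage at which the transistors still switch reliably, which is why the 100 MHz at 0.01 V limit does not actually work out in practice.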
ballsystemlord - Friday, December 7, 2018 - link
What platform(s) will the AI support? Android? Vanilla Linux? Apple? And which AI tooling will be supported? How about the CV?
Thanks!
peevee - Tuesday, December 11, 2018 - link
"because the cores aren’t running on separate voltage planes it means the actual benefits here in real-world applications are just going to be quite minor. The net result is that the setup is leaving a lot of power efficiency on the table: the voltage supplied to both core groups is always going to be the greater of whatever is being asked for, even if one of the two groups could operate on (much) less voltage."
But the whole package is power limited. When cores 2,3,4 are idle, Prime core can operate at its highest frequency and cores 2,3,4 don't use much power despite being on the same voltage plane.
When cores 2,3,4 are busy, I am pretty sure Core 1 will be downclocked and voltage reduced accordingly, so cores 2,3,4 do not operate on voltage higher than necessary. I will be surprised if at 8-core MT+GPU loads the 4 big cores can even sustain 2.42GHz in real power-and-heat-limited phones.
Ritesh Benjwal - Wednesday, February 6, 2019 - link
Such an in-depth article. Thanx for sharing.....
This is also similar to Snapdragon 855.
You can check... https://techforyouths.com/snapdragon-855/