Comments Locked

93 Comments


  • erinadreno - Monday, June 1, 2020 - link

    The state-funded groups had to put all their money into chip development. However, to make something useful, paying developers and marketing are also important. None of ARM, AMD, or Intel is state-funded, but they almost run everything. Kinda sad that true innovation will likely never happen.
  • PixyMisa - Monday, June 1, 2020 - link

    What is this "true innovation"?

    VLIW doesn't work very well for general-purpose workloads, which is why Intel abandoned it.
  • azfacea - Monday, June 1, 2020 - link

    Exactly. Intel/HP and others squandered tens of billions on VLIW and eventually came to accept their fate.

    1. The compilers are impossible to write, as static analysis does not permit the same level of out-of-order parallelism that a CPU-based reorder/scheduler can deliver (a toy sketch follows at the end of this comment).
    2. Even the mediocre compiler performance you do get is tied to the uArch and has to be thrown away for the next generation.

    If you need massive SIMD performance or something, that's a different story, but it doesn't work for a general-purpose CPU.
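
    As a toy sketch of point 1 (my own made-up example, not taken from any real compiler or code base): when the compiler cannot prove the pointers don't alias and the branch direction depends on runtime data, a static scheduler has to stay conservative, while an out-of-order core can reorder around the dependencies it actually observes at runtime.

    /* static_sched.c -- hypothetical illustration only */
    void update(float *dst, const float *src, const int *sel, int n)
    {
        for (int i = 0; i < n; i++) {
            /* Without aliasing guarantees (e.g. 'restrict'), the compiler must
             * assume the store to dst[i] may overlap later reads of src or sel,
             * so it cannot freely hoist or reorder these accesses at compile time. */
            if (sel[i] > 0)   /* data-dependent branch: direction unknown statically */
                dst[i] = src[i] * 2.0f;
            else
                dst[i] = src[i] + (i > 0 ? dst[i - 1] : 0.0f);   /* loop-carried dependence */
        }
    }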
  • FunBunny2 - Monday, June 1, 2020 - link

    "if you need massive SIMD performance or something thats a diff story, but it doesnt work for general purpose CPU"

    yeah, but... since the Big Story these days is Big Data (whether in flat files or an RDBMS), which is (nearly?) by definition SIMD, there may/ought to be a significant market. Mayhaps Intel/HP bailed too early?
  • Freeb!rd - Monday, June 1, 2020 - link

    So you think AMD gave up too soon on VLIW4 & VLIW5?
    https://www.tomshardware.com/reviews/a10-4600m-tri...

    Maybe the cost and efficiency just wasn't there.
  • AlB80 - Tuesday, June 2, 2020 - link

    There is no need to increase single-thread performance for a GPU. GPUs have massive thread parallelism, and the pipeline is enough even without speculative execution.
  • azfacea - Monday, June 1, 2020 - link

    Big data and AI are two different things. VLIW is a liability for general-purpose computing, not an asset. VLIW can work if you are building a custom chip aimed at some AI or video-transcoding application or something like that, but good luck competing with AI start-ups from 28 nm, never mind NV, AMD, ...

    What Russia needs more than anything else is semiconductor manufacturing capacity, where they have a big deficit right now and possibly a much bigger one tomorrow. x86 is out of patent, ARM licenses core IP on the cheap, RISC-V is out there, and high-performance CPU core designs are proliferating everywhere; there is no need to redefine the general-purpose ISA/architecture.
  • mode_13h - Monday, June 1, 2020 - link

    > What Russia needs more than anything else is semiconductor manufacturing capacity, where they have a big deficit right now

    Look at how much $$$ China has poured into this, and they're still nowhere close to TSMC or Samsung. How on Earth do you expect Russia ever to develop cutting-edge fabrication in any kind of reasonable timescale?
  • FunBunny2 - Monday, June 1, 2020 - link

    If the USofA can strongarm (heh heh) ASML into not selling to certain places, fabrication is moot.
  • azfacea - Monday, June 1, 2020 - link

    That's precisely why it's so important, isn't it? If they don't make their own semiconductors, their fate is in Uncle Sam's hands. I am not a fan of Putin, and far be it from me to worry about Russia's semiconductor future, but if I am being honest, it should be quite scary for Russia to be marching into the age of AI with a non-existent or highly vulnerable semiconductor industry.

    Today they can trade oil barrels for wafers; tomorrow the oil may not be there, but tech will not be less important, it will be more important, especially if AI is going to do the thing it's probably going to do.
  • brucethemoose - Monday, June 1, 2020 - link

    SMIC is making some huge strides. I don't think they'll ever overtake western fabs, but the gap is closing pretty quickly.
  • mode_13h - Monday, June 1, 2020 - link

    No doubt, but how much did it cost to get there? Do you think Russia has that kind of $$$ to throw at the problem, especially when sanctions continue to take their toll?
  • bagamut - Tuesday, June 2, 2020 - link

    Russia has that kind of $$$, much more than the Saudis, and roughly the 5th-largest economy in the world to support it, but there is no urgent need to do that now. A semiconductor industry for military and aerospace applications exists and works. Competing in mass-production semiconductors is not possible, because this is all regulated by politics. China is an example of that.
  • FunBunny2 - Tuesday, June 2, 2020 - link

    "Russia has that kind of $$$"

    whaaaaaaa?? Russia's GDP is minuscule: no better than 11th globally. New York, Texas, and California are each bigger.
  • mshigorin - Monday, June 1, 2020 - link

    We are used to delivering asymmetric solutions to the problems at hand. For one, there's an interesting development in... well, let it stay interesting for a while ;-)
  • nolem - Sunday, September 4, 2022 - link

    Besides the fact that all semiconductor companies in Russia are now again working like in the Soviet era: they were never able to build anything less than 10 years of tech delay behind the West. The only way was to exploit the tech of the Warsaw Pact states, like the GDR's resources, until 1989. Therefore I totally agree. Under this government they will never be able to develop anything within a 5-year tech delay of the current state of the art. This is ridiculous.
  • mode_13h - Monday, June 1, 2020 - link

    IA64 didn't have SIMD instructions. However, if it had remained a going concern for long enough, they could've added them.
  • mode_13h - Monday, June 1, 2020 - link

    It's funny how many IA64 "experts" there are, on here.

    The only person I've seen that seems to *actually* know anything about IA64 is @SarahKerigan.
  • azfacea - Monday, June 1, 2020 - link

    And here we go: the predictable smug *expert* who shows up to educate us, who actually happens to be as clueless as they are smug. But if I were to expose you and use some of your own smugness, then you'd start ramping up accusations of rudeness and toxicity. That's the ultimate guaranteed outcome of such threads. I won't play your game, and I think I am done here with this thread.

    I never said IA64, Mr. *Expert*, which is an ISA; I said VLIW. I was referring to the trend of shifting logic from the CPU to the compiler, which failed to achieve its goals, precisely for the reasons I mentioned. ... Now here we go, I am not doing it ... go talk to the other experts in this thread.
  • mode_13h - Monday, June 1, 2020 - link

    Notice that I didn't call myself an expert, but at least I know more than the 1 or 2 bullet points you seem to have latched onto.

    Instead of trying to tear me down, why don't you read up, yourself. Then you can actually be somewhat educated on the matter, next time.

    How is it fair to pre-accuse me of playing the "toxicity and rudeness" card? That's not my M.O. I try to focus on the merits of a post. It's only those without merit that take cheap shots.

    Speaking of merits:

    >>> Intel/HP and others squandered tens of billions on VLIW and eventually came to accept their fate.

    > I never said IA64, Mr. *Expert*, which is an ISA

    You clearly implied that IA64 was VLIW, which it's not. EPIC was about working around the limitations of VLIW, including those you cited. To that end, it *seems* that this chip from Elbrus is even further behind.
  • regsEx - Tuesday, June 2, 2020 - link

    Argumentum ad verecundiam. Not really a civil way of conducting a discussion.
    https://en.wikipedia.org/wiki/Argument_from_author...
  • mode_13h - Tuesday, June 2, 2020 - link

    My point was quite simply that a number of people on here like to talk about IA64, but they clearly don't understand it well enough to be making the points they're reaching for.

    I'll admit that it's not exactly a polite thing to say, but sometimes the situation calls for a summation or a meta-level comment, rather than getting bogged down in a point-by-point refutation.

    It's not a thing I say with any animus. If people want to educate themselves about it, I'd be happy to see that, and it would benefit us all by enabling a more enlightened and enlightening discussion.

    As for my "appeal to authority", I just wanted to make it clear that I didn't mean for my words to apply to that poster.
  • mode_13h - Monday, June 1, 2020 - link

    Funny thing is that VLIW and SIMD are orthogonal. In fact, if you have a wide enough SIMD, you don't even need VLIW. Kind of like modern GPUs.
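
    A generic sketch of that point (my own illustration, not specific to any of the chips discussed here): the loop below has fully independent iterations, so the parallelism lives in the data; how the instructions get issued is a separate choice.

    /* saxpy.c -- every iteration is independent, so the work can be spread
     * across SIMD lanes whether the front end is a VLIW bundle, an
     * out-of-order superscalar, or a GPU-style SIMT scheduler. */
    void saxpy(int n, float a, const float * restrict x, float * restrict y)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];   /* maps straight onto SIMD/FMA lanes */
    }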
  • mshigorin - Monday, June 1, 2020 - link

    Those GPUs aren't exactly general-purpose CPUs, though, are they? Try running e.g. bzflag purely on your NVIDIA GPU.
  • Alexvrb - Monday, June 1, 2020 - link

    Except you don't have to run GPUs by themselves. You use general-purpose CPUs plus accelerators geared towards your purpose, and balance them according to your needs. Using multiple designs (as needed) that excel at different tasks is better than a single design that is mediocre at everything.
  • mshigorin - Tuesday, June 2, 2020 - link

    > that is mediocre at everything
    This is not true (as generalizations that wide tend to be); my point was that one *can* actually use e2k *without* extra accelerators "as is" for quite a few task classes (which is what happened with x86 before, too -- recall hardware modems vs. softmodems, for example), *and* one can add accelerators as well. So far I'd only like a "js accelerator" for my Elbrus 801-PC, as I've mentioned in this discussion already, and it's not that bad as it is (with quite some improvement potential on the existing architecture, as the developers say).
    Come to Moscow when covid19 starts getting old, visit the Yandex Museum, and witness folks playing games and watching videos in 4K on an 801-PC with an RX580 (yes, a GPU); maybe that will help to improve the generalization ;-)
  • mode_13h - Monday, June 1, 2020 - link

    Right. I was just replying to @azfacea's apparent conflation of SIMD and VLIW, and citing modern GPUs as an example of "massive SIMD performance" that doesn't involve VLIW.
  • mshigorin - Monday, June 1, 2020 - link

    Both statements are actually wrong.
    1. The compilers have a better chance to extract that parallelism at compile time than a half-a-chip CISC decoder has at runtime; and sometimes the pipeline would need to be over 1000 instructions long for the hardware scheduler to be able to predict branches correctly (there's a known case where a 500 MHz e2k v2 would outperform a Core 2 Duo at 1.5 GHz or so, for example). You meant something way less dogmatic, I am sure.
    2. I've started the e2k v4 work using my existing e2k v3 system -- they differ even in the maximum number of elementary operations per clock (23 vs. 25), but the code ran just fine; I've measured about a 1% difference between p7zip built for v3 and for v4, both running on v4 (no optimization by hand, though).

    If Intel/HP failed at an impossible task, they just lacked those Russian hackers that one is required to have ready for a job. And I know some lcc guys ;-)
  • azfacea - Monday, June 1, 2020 - link

    It's funny how I am hearing the same things in 2020 that I was reading on kernel mailing lists in 2003; I guess the whole industry is not as smart as you are and hasn't been able to figure this out in 20 years. Of course, "You meant something way less dogmatic, I am sure."
  • mode_13h - Monday, June 1, 2020 - link

    He posted data and you countered with rhetoric? If you're not feeling bad for yourself, I'll feel bad for you.

    The classic dilemma of VLIW:

    "Runtime is when you know the most about how to schedule code, but it's also too late to do much about it."

    I read that somewhere in the run up to Itanium's launch. Anyway, profile-driven recompilation isn't a bad way to split the difference.

    Also, not sure where you got the idea about compilers being "impossible to write".

    And Itanium failed because "the whole industry is not as smart as you"? Are you sure about that?
  • mshigorin - Tuesday, June 2, 2020 - link

    He also seems to have confused Intel and friends with "the whole industry", amusingly.
    We've heard that before -- with "the whole world(tm)" under French command, then German, and now the next arrogant outsider is due for their history lesson. It's not just about IT.
    Just to be clear, I am not to be considered smart at all -- I taught myself to read before age 3, won some olympiads, earned my M.Sc. in chemistry, and am by no margin a compiler developer at all, let alone a VLIW/EPIC one.
    But a humble user and distro maker can have their say on the facts observed, eh? :-)
  • mode_13h - Monday, June 1, 2020 - link

    IA64 wasn't VLIW, it was EPIC - Explicitly Parallel Instruction Computing. The idea is to make the data dependencies explicit, which simplifies dynamic scheduling but doesn't prevent it. Theoretically, EPIC CPUs could do OoO and even speculative execution.

    In true VLIW, the scheduling is 100% compile-time.
  • mshigorin - Monday, June 1, 2020 - link

    Well Elbrus does speculative execution indeed, it's mentioned in Chapter 6 of the book in question: http://ftp.altlinux.org/pub/people/mike/elbrus/doc...
  • mode_13h - Monday, June 1, 2020 - link

    Sweet! Thank you!
    :)
  • mshigorin - Monday, June 1, 2020 - link

    > None of ARM, AMD, or Intel is state-funded
    Pfft, do you even know the origin of the Pentium Pro or, well, the whole ARM affair? In that sense, MCST isn't state-funded either.
  • Elstar - Monday, June 1, 2020 - link

    State-funded technology can work well when the fundamental problem is constrained by research. That being said, once the problem area is well understood, state-funded technology tends to fall behind what the market can produce because state-funded technology is built to please bureaucrats, not customers.

    So with this perspective, it shouldn't surprise us that this design uses VLIW, which gets the superficial "MIPS/dollar" that politicians want but struggles with general-purpose code.
  • Xex360 - Monday, June 1, 2020 - link

    Agreed. State-run programmes should focus more on cutting-edge technologies, where the immediate financial benefits aren't clear or are low; it took more than half a century for the private sector to send people to space, even with much more advanced technologies and knowledge.
  • FunBunny2 - Monday, June 1, 2020 - link

    "much more advanced technologies and knowledge"

    you really think so? The Falcon 9 uses the same motor as the 1957 Vanguard, which blew up 8 of its 11 instances. Yes, rather than dials and switches there's a GUI, but that's just another example of lipstick on a pig. Yes, today's microprocessors allow for returnable boosters, at the cost of lower payload. But without the GPS satellites (that's a guess as I type) and other infrastructure, much of it paid for (and some built) by the Damn Gummint, Falcon 9 would be very little different from the Vanguard. Could landing be done simply with inertial guidance? I doubt it, but then I'm not a rocket scientist.
  • MamiyaOtaru - Monday, June 1, 2020 - link

    What, you think we don't have more advanced technologies and knowledge now than we did 50 years ago? Sure, the limited instance of a rocket motor has been pretty well solved, and was a long time ago, but everything surrounding it has advanced by leaps and bounds, and it still took this long for private industry to catch up to what the government was doing back then with respect to space flight, which was his point. What's yours? I'm really not sure.
  • melgross - Monday, June 1, 2020 - link

    Well no, it’s not the “same” motor. There are general classes of rocket motors, and a number of new ones are based on one of those classes, or types. But that doesn’t make it the same motor.

    If it were the same motor, it would also be blowing up all the time.
  • FunBunny2 - Monday, June 1, 2020 - link

    "But that’s doesn’t make it the same motor."

    If you read the wiki pages for both, you'll see that the only meaningful difference between them (modulo control mechanisms) is size, and thus thrust. The actual motor mechanism and fuel are identical. The point is that Musk/Tesla didn't do anything 'disruptive' or 'innovative' here that couldn't have been done in 1957, had there been the level of CPU power that we have today. And Musk/Tesla had not a thing to do with building that technology; he/they are simply consumers of same.
  • Wilco1 - Monday, June 1, 2020 - link

    You think SLS has more innovation? It literally reuses the Space Shuttle engines and boosters. And yet they are still many years away from a launch. All that reuse certainly makes things easy and quick!
  • Spunjji - Wednesday, June 3, 2020 - link

    Confusing SpaceX with Tesla seems an elementary error for someone making this sort of comment, and last I knew, Musk didn't really run SpaceX; Gwynne Shotwell did.

    Confusing the basic design of a rocket motor for the specific implementation - including materials - is equally derpy.

    Even if the only difference were the application of "CPU power", figuring out how to apply that computing power to get the intended result would still be an achievement.

    In summary: I don't know enough about rockets to summarily declare that you're talking bollocks, but every indicator implies that you're probably talking complete bollocks.
  • FunBunny2 - Wednesday, June 3, 2020 - link

    "Even if the only difference were the application of "CPU power", figuring out how to apply that computing power to get the intended result would still be an achievement."

    It's called Newtonian physics, which Musk didn't invent, either.
  • FunBunny2 - Wednesday, June 3, 2020 - link

    "last I knew Musk didn't really run SpaceX,"

    may be, but he showed up his shiny, smiling face at the launch, didn't he??

    As to the 'innovation' of reuse, here's how accurate the simple (well, a little bit of control) ballistic 'return' of an object (well, a nucular warhead) is:
    "This is simply the radius of the circle that the warhead has a 50 percent chance of falling into when aimed at the center. CEP is about 90–100 m for the Trident II and Peacekeeper missiles."
    the wiki: https://en.wikipedia.org/wiki/Multiple_independent...

    And that was just Newton and 1970s compute. So yeah, a Falcon 9 booster landing on a tail of rocket flame is neato, but not, by any means, earth-shattering.
  • FunBunny2 - Monday, June 1, 2020 - link

    The 'big deal' is the return of the first-stage booster. But that's not the first returned, reused booster; the SRBs of the Space Shuttle did that decades ago:
    "The Space Shuttle Solid Rocket Booster (Space Shuttle SRB) was the first solid-propellant rocket to be used for primary propulsion on a vehicle used for human spaceflight[1] and provided the majority of the Space Shuttle's thrust during the first two minutes of flight. After burnout, they were jettisoned and parachuted into the Atlantic Ocean where they were recovered, examined, refurbished, and reused. " the wiki

    The notion that Musk is breaking new ground in space flight is just PR. It's annoying.
  • bigvlada - Tuesday, June 2, 2020 - link

    First let me say that I agree on the PR part. Elon Musk twitter brigade and their reality distortion field reminds me of Steve Jobs and his followers.

    There was a gallant attempt at SSTO (single stage to orbit) vehicles even in the Apollo days. The Chrysler SERV was one of the contenders for the shuttle role. Astronauts disliked it because they weren't needed for every mission.

    Fast forward to the nineties and behold the McDonnell Douglas DC-X, an SSTO prototype that did launch and land vertically and needed just a skeleton staff to launch it. That was a suborbital vehicle. Its successor, the DC-Y, should have been the first true SSTO. But then, it wasn't the Shuttle, so it was killed.

    https://en.wikipedia.org/wiki/McDonnell_Douglas_DC...
  • Spunjji - Wednesday, June 3, 2020 - link

    Once again you're talking *around* what the big achievements were - namely that it was a privately funded, liquid-fuelled, propulsive-landing reusable booster that was financially viable and massively undercut other offerings in the industry. The Russian, French and US rocket industries were all saying it wouldn't work properly at the costs quoted until it did, repeatedly.

    The funny bit here is that I don't even care that much about SpaceX - it's just that you've clearly decided to be the exact antithesis of all the annoying MuskRats out there on social media, and it's equally silly.
  • FunBunny2 - Wednesday, June 3, 2020 - link

    "it was a privately funded"

    Hardly. Musk didn't fund the development and then go sell rockets, a la Ford making the Mustang. Musk got the money up front, in the form of a Damn Gummint contract, just like all the contractors that built the rest of the American space program.
  • mshigorin - Monday, June 1, 2020 - link

    Well, I'm only a rocket amateur (nitrofilm and gunpowder, y'know), but Buran was landed automatically under Elbrus guidance, from what I've heard...
  • nasdaq13 - Sunday, June 7, 2020 - link

    We also had a PS-2000. https://computer-museum.ru/histussr/11-1.htm
  • mshigorin - Monday, June 1, 2020 - link

    The only "real world" problem -- or should I say woe? -- is lots of javascrapt and JS JIT being underoptimized on e2k so far (the guys tell that there's still a lot to do there). Said that, I use Elbrus 801-PC for my daily work at Basealt Ltd, and no one forced me to switch from my then-recent i5 based notebook two years ago.
  • Wilco1 - Monday, June 1, 2020 - link

    For comparison, peak double-precision GFLOPS per core (a quick arithmetic check follows the list):

    Elbrus @ 1.5GHz: 36
    Cortex-X1 @ 3GHz: 48
    A64FX @ 2.2GHz: 70.4
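
    As that check (assuming each figure is simply double-precision flops per clock times frequency, with an FMA counted as two flops; the per-clock numbers below are just the quoted peaks divided by the clocks):

    /* peak_check.c -- sanity-check the peak DP GFLOPS figures above */
    #include <stdio.h>

    int main(void)
    {
        struct { const char *name; double flops_per_clk, ghz; } cpu[] = {
            { "Elbrus 8CB", 24.0, 1.5 },   /* 24 x 1.5 = 36.0 */
            { "Cortex-X1",  16.0, 3.0 },   /* 16 x 3.0 = 48.0 */
            { "A64FX",      32.0, 2.2 },   /* 32 x 2.2 = 70.4 */
        };
        for (int i = 0; i < 3; i++)
            printf("%-10s %5.1f DP GFLOPS\n", cpu[i].name, cpu[i].flops_per_clk * cpu[i].ghz);
        return 0;
    }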
  • DanNeely - Monday, June 1, 2020 - link

    With Elbrus built on an ancient 28nm process, that comparison doesn't look bad. OTOH, as a desktop part, its core is probably a lot bigger than Arm's mobile cores are; and it ignores that VLIW for general-purpose compute has never really worked in the real world. It can be useful in more constrained scenarios, e.g. AMD's VLIW5 was a good fit for DX9's 5-step setup; but fitting enough ops to fill up the VLIW was hard enough with DX10's more flexible design that later models dropped down to VLIW4, before they dropped it entirely with GCN.
  • mshigorin - Monday, June 1, 2020 - link

    I know a Russian hacker who took on AMD's developers at optimizing that code and did it in under three months -- a job they couldn't complete in a year... the commits are out there in the radeonsi driver.
    And as I've said above, the only real-world application where I really wish things would run approximately 1.5x faster on the 801-PC is javascrapt in the fx52 they've ported for e2k so far (with fx68 and some chromium reportedly underway as well) -- still, e.g. maps.yandex.ru already works well for me, and it's one of the most JS-intensive webapps I use often.
  • AlB80 - Monday, June 1, 2020 - link

    Elbrus 5th gen can issue 6 FMA vector instructions per clock (vector width = 2 doubles / 4 floats).
    ARM Cortex-X1 can issue 4 FMA vector instructions per clock (2 / 4).
    Fujitsu A64FX can issue 2 FMA vector instructions per clock (8 / 16).
  • Jorgp2 - Monday, June 1, 2020 - link

    How does it compare to Itanium?
  • mshigorin - Monday, June 1, 2020 - link

    It's alive.
  • EntityFX - Monday, June 1, 2020 - link

    Hi there, I've tested Elbrus 4C, 1C+, 8C, 8C2 (CB)

    Results here: https://github.com/EntityFX/anybench/tree/master/r...
    And Excel results here: https://github.com/EntityFX/anybench/raw/master/do...

    # E2K (Elbrus) comparison table, plus some x86-64 and ARM chips

    |Test |E8C2 |E8C |E2S |E1C+ |Xeon 6128 |Atom Z8350|Orange Pi PC2|
    |---------------------|---------|---------|---------|---------|----------|----------|-------------|
    |Dhrystone |8 974,78 |7779,40 |3 548,80 |4 302,53 |25 195,31 |4 677,30 |2 949,12 |
    |Whetstone |2 037,62 |1 748,37 |970,80 |1 277,55 |5 850,41 |2 085,24 |980,26 |
    |Whetstone MP |16 194,00|13 818,00|2 455,00 |1 312,00 |123 854,00|6 636,00 |3 798,00 |
    |Coremark |5 510,19 |4 907,57 |2 364,24 |2 901,49 |28 210,73 |6 893,09 |3 869,72 |
    |Coremark MP |39 941,90|35 395,62|9 078,68 |2 848,32 |335 312,61|23 814,68 |14 901,28 |
    |Linpack |1 269,79 |1 075,27 |674,68 |814,76 |6 105,95 |1 021,44 |163,44 |
    |Scimark 2 (Composite)|472,24 |511,43 |- |379,23 |2 427,42 |509,44 |191,59 |
    |MP MFLOPS (32 ops/w) |378976 |139265 |35 782,00|15 676,00|343 556,00|10 665,00 |6 033,00 |
  • mode_13h - Monday, June 1, 2020 - link

    Thanks for this.

    The last row really confuses me, though. The Xeon 6128 is way faster in everything else, how does the E8C2 manage that win?
  • EntityFX - Monday, June 1, 2020 - link

    void triadplus2(int n, float a, float b, float c, float d, float e, float f, float g, float h, float j, float k, float l, float m, float o, float p, float q, float r, float s, float t, float u, float v, float w, float y, float *x)
    {
        int i;

        for (i = 0; i < n; i++)
            x[i] = (x[i]+a)*b - (x[i]+c)*d + (x[i]+e)*f - (x[i]+g)*h + (x[i]+j)*k - (x[i]+l)*m + (x[i]+o)*p - (x[i]+q)*r + (x[i]+s)*t - (x[i]+u)*v + (x[i]+w)*y;
    }

    It compiles well for the e2k (VLIW).
  • mode_13h - Monday, June 1, 2020 - link

    Thanks.

    Which compiler? ...or is it proprietary?

    Do you have to do any profile-driven recompilation to get good performance on that?

    Do you use any loop-unrolling or software-pipelining options or #pragmas?
  • mshigorin - Tuesday, June 2, 2020 - link

    lcc is proprietary, unfortunately, and will likely stay that way -- just like icc :-(
    That's a major hurdle, but I hear that a modern gcc backend effort is also underway (though it's roughly at -O0 level at the moment).
    There's PGO (and LTO) already, but I'd have to ask a colleague to provide meaningful feedback on those (I'm a chemist after all, and he's a nuclear physics guy ;-).
    See also these two chapters:
    http://ftp.altlinux.org/pub/people/mike/elbrus/doc... (generic)
    http://ftp.altlinux.org/pub/people/mike/elbrus/doc... (e2k)
  • EntityFX - Monday, June 1, 2020 - link

    The e2k ASM example is here: https://github.com/EntityFX/anybench/blob/master/a...
  • Wilco1 - Monday, June 1, 2020 - link

    Woah, that's some serious code bloat there!

    The issue is that the compiler options don't enable fma, avx, avx2 or avx512 if available.
    My 3700X gets 346372 with the prebuilt binary but with -mcpu=native -mavx2 -mfma I get 865520.

    So assume all of these results are inaccurate.
  • Wilco1 - Monday, June 1, 2020 - link

    The 32-bit Arm results are similar: while -mfpu=neon is used, they don't use -Ofast, which you need to get FP vectorization. So the Arm results are way off too. Also, since everything is AArch64 today, why even use ancient 32-bit Arm?
  • mode_13h - Monday, June 1, 2020 - link

    The triadplus2() assembly seems to be only about 2k lines, including brackets, labels, and blank lines. For highly-optimized code, I'd say that's not bad.

    The whole program code is only about 12k lines, if you include main(). And it clearly includes some timing code, results checking and file I/O.
  • Wilco1 - Tuesday, June 2, 2020 - link

    On AArch64 the fully optimized version is just over 200 lines with the vectorized inner loop taking less than 30 lines: https://godbolt.org/z/Jjo2nj
  • mode_13h - Tuesday, June 2, 2020 - link

    I wasn't familiar with that site. Thanks for sharing!
  • mode_13h - Monday, June 1, 2020 - link

    Cool. Thanks for that!
  • mshigorin - Monday, June 1, 2020 - link

    Привет (hi!) :-)
  • mshigorin - Monday, June 1, 2020 - link

    Thank you for an interesting article; here are a couple of comments from the guy responsible for e2k-alt-linux (and someone who routinely works on a dual-seat 801-PC).

    > Elbrus VLIW design [...] roots in SPARC
    Actually not; MCST JSC started out together with Sun Microsystems, and they still design and make SPARC chips as well (search for MCST R150/R500/R1000/R2000), *but* Elbrus' roots are in the Elbrus supercomputer systems from the 1970s; VLIW is orthogonal enough to superscalar.

    > If you are not in Russia, you are unlikely to ever
    > get your hands on one at any rate.
    This is true, at least today.

    > new online programming guide published this week
    Well, this is just an unpacked copy of the original one (CC BY 4.0, so I felt like doing that): http://mcst.ru/elbrus_prog -- while I was at it, I've added a small README.txt with several related links to the docs/ subdirectory.

    > 4-way server multiprocessor combinations, although
    > it does not say on what protocol or what bandwidth
    Those 3 inter-CPU links per chip are specified at 16 GB/s, see http://mcst.ru/elbrus-8cb (the "Характеристики" (characteristics) tab, "Масштабируемость" (scalability) line); just like the previous 8C -- but the 8CB (v5) is a gradual design improvement, unlike the 4C->8C jump (v3 had 4 cores @ 800 MHz, v4 has 8 better cores @ 1300 MHz).

    > a compiler focused design, much like Intel’s Itanium
    Rather vice versa -- but Intel couldn't even implement properly what they stole (yes, stole) along with Prof. Babayan and other MCST folks back in the hard times.

    > State-funded groups should, theoretically, be the best funded
    In practice, theory and practice differ more than they should in theory, as they say... so far MCST has managed to get impressive engineering results out with a tiny fraction of the funds Intel/AMD spend on marketing.

    > US government tightening its screws
    Reads like just sending another boomerang to return... they've made an intervention into former Ukraine where I used to grow up (and yes, US directly brewed, supported and organized neonazis there, just in case anyone missed that).

    God help all the people of good will in America that day when these boomerangs start to flock on their way back.

    But today, just thank you :-)
  • Ian Cutress - Monday, June 1, 2020 - link

    Hey mshigorin, thanks for unpacking it for the web. It gives a small insight into something we don't normally see. The language/script barrier can be somewhat of an issue. If you have anything similar, or info that might help, feel free to dump it into my inbox: ian@anandtech.com.

    Perhaps at some point we might get access to a chip and give it a test.
  • mshigorin - Tuesday, June 2, 2020 - link

    Thanks, noted. I hope -- and work -- to get that point closer :-)
  • wishgranter - Monday, June 1, 2020 - link

    > a compiler focused design, much like Intel’s Itanium
    > Rather vice versa -- but Intel couldn't even implement properly what they stole (yes, stole) along with Prof. Babayan and other MCST folks back in the hard times.

    Hi Mr. mshigorin,

    Quite interesting info there, and this one is exceptional; could you explain a bit more?
  • bagamut - Tuesday, June 2, 2020 - link

    That's a long old story...

    After the USSR's collapse, one of the leading architects of the Soviet Elbrus supercomputers, Dr. Vladimir Pentkovski, worked for Intel in the '90s. As did many others.

    From 1992 to 2004, Prof. Boris Babayan held senior positions at MCST and Elbrus International, leading development of the Elbrus architecture. He also worked for Transmeta and Sun, and since August 2004 Babayan has been the Director of Architecture for the Software and Solutions Group at Intel Corporation.
  • bagamut - Tuesday, June 2, 2020 - link

    And 500 MCST engineers also moved to Intel with Babayan. And IP, and patents...
  • boeush - Tuesday, June 2, 2020 - link

    > US government tightening its screws
    Reads like just sending another boomerang to return... they've made an intervention into former Ukraine where I used to grow up (and yes, US directly brewed, supported and organized neonazis there, just in case anyone missed that).

    Ah - yet another proud victim of Putin's state disinformation machine... More's the pity - for you, and for Russia. As long as you all keep falling for the same old Soviet-style tricks, you'll just keep on digging that hole of yours ever deeper. Eventually, the walls will collapse on you.
  • mode_13h - Tuesday, June 2, 2020 - link

    You're not going to change any minds. Please, just let it be.
  • bagamut - Tuesday, June 2, 2020 - link

    You all keep falling for the same old Cold War propaganda tricks. Previously that was "Evil Communism". There is no communism now, but like an old dog you know only old tricks.
    Eventually, the walls collapsing now in US...
  • mode_13h - Tuesday, June 2, 2020 - link

    Leaving geopolitics aside, thanks for your posts. Congrats on a cool chip!

    We're all geeks, here; not politicians. If we just keep that in mind, I think we'll be fine.
    : )
  • bagamut - Tuesday, June 2, 2020 - link

    It would be better if you wrote an article instead of this one. :)
    Actually, this one is crap.
  • EntityFX - Monday, June 1, 2020 - link

    Maybe you will be interested in this: https://translate.google.ru/translate?sl=ru&tl...

    Warning: autotranslated text: RUS -> ENG.
  • AlB80 - Monday, June 1, 2020 - link

    > Peak throughput according to the documents states 576 GFLOPs of double precision.
    288 GFLOPS for double precision; 576 is the single-precision figure.

    > Elbrus 8CB Core
    That's not a core diagram; it's the instruction format. Very VLIW.
  • bagamut - Tuesday, June 2, 2020 - link

    The article is quite misleading.

    This Elbrus has nothing to do with SPARC. The core has 20+ execution ports, not 6. It is not a "new" CPU; it's a CPU with a long history. It is state-funded because it was previously a military project. It is mostly a number-crunching datacenter CPU, but it is more or less good enough to run a desktop PC. TSMC produced a big enough batch to cover all the needs of secure state computing. There are no plans to put this CPU in every smartphone.

    Looks like the article was written in 5 minutes without any clue.
  • abufrejoval - Tuesday, June 2, 2020 - link

    Honestly, I am quite impressed by the cleverness of their approach.

    What people here may fail to appreciate is the fact that Elbrus was designed to solve a specific problem, not to kick Intel (or AMD) out of Amazon’s data centers. That's a goal (and one of many intermediate ones) that China is also pursuing, but Elbrus was never designed for commercial success. It's a military asset, not designed for fuel economy or to win at Le Mans.

    Russia needs the ability to *functionally* run any software, be it Western or their own, preferably even if it was ‘government class’ malicious. It’s about foreign closed source binaries they really need to run, but where they need to protect against that being infected right from the source (Snowden showed what Russia knew). In that sense it resembles China’s Jintide use case (HotChips 2019).

    Russia also needs the ability to run their own mission-critical software, with very little fear of any foreign malware infestation. It's similar to Google's trusted-root investments, but with a very limited number of users and a slightly smaller budget.

    The underlying assumption is that Russia will usually be able to source sufficient Western IT for their own civil consumption and since they aren’t producing billions of mobile phones and other IT for export, they don’t have Huawei problems. They just need these to protect the very same domestic assets they have (tried or succeeded) to cleverly weaponize with potential enemies abroad.

    And they solve these two distinct use cases with the same hardware (economy!), which is costing them several orders of magnitude more than any high-volume ARM or x86 product (even z/Arch is probably cheap by comparison) per given unit of compute. But they are more than willing to pay for a fleet of IT tanks rather than be caught with their pants, Internet, government, power and water supplies, and military down.

    It’s just very naïve to judge an architecture without appreciating the constraints and conditions it was designed for. And then you should ask if your country or region actually has those same abilities and facilities.
  • abufrejoval - Tuesday, June 2, 2020 - link

    So actually it's three use cases:
    1. forensics
    2. executing critical foreign and potentially infected code in a hardware sandbox that eliminates or avoids the infection vector (think SCADA/Stuxnet)
    3. Trusted compute for their own mission critical code

    And if you then see that they achieve 100% software compatibility with x86 code with hardware armoring and speeds much better than any FPGA or emulator, that sure impresses me (do they microprogram AVXn code 360 style, I wonder).

    I grew up in Berlin during the Cold War and I remember looking into those tank guns at Allied parades: Made me appreciate certain things.
  • mode_13h - Tuesday, June 2, 2020 - link

    Interesting perspectives. Thanks for sharing.
  • JKflipflop98 - Tuesday, June 2, 2020 - link

    There's no way in hell I would ever purchase a CPU from a Chinese or Russian state-sponsored facility. Even my deaf and stupid dog knows there are more backdoors in there than in a gay dance club.
  • mode_13h - Tuesday, June 2, 2020 - link

    This site covers a lot of tech I'd personally be unable to use or uninterested in using. But I still like reading about it.

    And the reporting potentially gives us a window into the tech industries and capabilities of different countries.

    So, lots of good reasons to cover it. If you disagree, you don't need to read the articles. Or leave comments.
  • Oxford Guy - Thursday, June 11, 2020 - link

    Don't forget the Chinese MIPS CPU line.
