Gonna say this one time.

I don’t post this stuff because I am looking for someone to tell me there’s not a scholarly article, or a deeper dive. I post it because of trends I have seen in the reporting of day-to-day events and emerging threats.

I didn’t get here on scholarly articles on emerging threats.

As this one gets closer to being truly weaponized… you need to know that SPECTRE and Meltdown cannot be patched.

threatpost.com/attacks-slaught

I mean, I hate it too… all my life has, on some level, been focused on one version of x86 or another.

But SPECTRE is not a ghost. It is real. It can do damage.

@thegibson we have the tools to solve these problems, but I’ve had little luck convincing anyone with the resources to get it done that it’s real.

These two are just the beginning, and as long as we rely on static logic we’ll have computers that can’t be fixed.

I wrote a (weirdly patriotic?) post about using FPGA to solve this and many other systemic vulnerabilities our computers have, but I’m not sure how to push it forward.

jasongullickson.com/computatio

@requiem @thegibson If I may be allowed to be pedantic here, I ask that my words be considered with some gravity.

The issue isn't static logic. The issue is divorcing instruction decoding from instruction set design to attain performance goals not originally built into the ISA.

It takes, for example, several clock cycles just to decode x86 instructions into a form that can then be readily executed. Several clocks to load the code cache. Several clocks to translate what's in the code cache into a pre-decoded form in the pre-decode cache. Several clocks to load a pre-decode line into the instruction registers (yes, plural) of the instruction fetch unit. A clock to pass that on to the first of (I think?) three instruction decode stages in the core. Three more clocks after that, and you finally have a fully decoded instruction that the remainder of the pipelines (yes, plural) can potentially execute.

Of course, I say potentially because there's register renaming happening, there are delays waiting for instruction execution units to become available in the first place, there's waiting for result buses to become uncontested, ...

The only reason all this abhorrent latency is obscured is that the CPU literally has hundreds of instructions in flight at any given time. Gone are the days when it was a technical achievement that the Pentium had 2 concurrently running instructions. Today, our CPUs have literally hundreds.

(Consider: a 7-pipe superscalar processor with 23 pipeline stages, assuming no other micro-architectural features to enhance performance, still offers 23*7=161 in-flight instructions, assuming you have some other means of keeping those pipes filled.)
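
To make that concrete, here's a hypothetical little C micro-benchmark (my own sketch, nothing official; it assumes a POSIX clock_gettime and an out-of-order core). The first loop is one long dependency chain, so every add has to wait for the previous one; the second splits the same work into four independent chains the core can keep in flight simultaneously. Build with something like -O1 so the compiler doesn't vectorize the point away.

    #include <stdio.h>
    #include <time.h>

    #define N (1 << 22)
    static double a[N];

    /* wall-clock seconds from the monotonic POSIX clock */
    static double now(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        for (int i = 0; i < N; i++) a[i] = 1.0;

        /* one long dependency chain: each add waits on the last */
        double t0 = now(), s = 0.0;
        for (int i = 0; i < N; i++) s += a[i];
        double t1 = now();

        /* four independent chains: the core overlaps them */
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        for (int i = 0; i < N; i += 4) {
            s0 += a[i]; s1 += a[i + 1]; s2 += a[i + 2]; s3 += a[i + 3];
        }
        double t2 = now();

        printf("serial chain:       %.4f s (sum=%.0f)\n", t1 - t0, s);
        printf("independent chains: %.4f s (sum=%.0f)\n", t2 - t1, s0 + s1 + s2 + s3);
        return 0;
    }

(Floating-point adds can't legally be reassociated without -ffast-math, which is what keeps the first loop honestly serial.)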

This is why CPU vendors no longer put cycle counts next to their instructions. Instructions are pre-decoded into short programs, and it's those programs (strings of "micro-ops", hence micro-op caches, et al.) which are executed by the core at a more primitive level.

Make no mistake: the x86 instruction set architecture we all love to hate today has been a shambling undead zombie for decades now. RISC definitely won, which is why every x86-compatible processor has been built on top of RISC cores since the early 00s, if not earlier. Intel just doesn't want everyone to know it because the ISA is such a cash cow these days. Kind of like how the USA is really a nation whose official measurement system is SI, but we continue to use customary units because there are official definitions that map one to the other.

Oh, but don't think that RISC is immune from this either. It makes my blood boil when people say, "RISC-V|ARM|MIPS|POWER is immune."

No, it's not. Neither is MIPS, neither is ARM, neither is POWER. If your processor has any form of speculative execution and depends on caches to maintain instruction throughput (which is to say, literally every architecture on the planet since the Pentium Pro demonstrated its performance advantage over the PowerPC 601), you will be susceptible to SPECTRE. Full stop. That's the laws of physics talking, not Intel or IBM.

Whether it's implemented as a sea of gates in some off-brand ASIC, in an FPGA, or on the latest nanometer-scale process node from the most expensive fab house on the planet, it won't matter -- SPECTRE is an artifact of the micro-architecture used by the processor. It has nothing whatsoever to do with the ISA. It has everything to do with the performance-at-all-costs, gotta-keep-them-pipes-full mentality that drives all of today's design requirements.
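
To show just how little the ISA has to do with it: this is roughly the bounds-check-bypass gadget from the original Spectre paper (variant 1), paraphrased from memory -- the array names and the 4096-byte stride are just the paper's conventions, not anything sacred. The same C compiles for x86, ARM, RISC-V, whatever.

    #include <stddef.h>
    #include <stdint.h>

    uint8_t array1[16];
    uint8_t array2[256 * 4096];   /* one page-sized slot per possible byte value */
    unsigned array1_size = 16;

    void victim(size_t x) {
        /* Train the branch predictor with in-bounds x, then call with an
         * out-of-bounds x: the core speculates past the check, reads a byte
         * it was never supposed to see, and touches array2 at an index
         * derived from that byte.  The architectural result gets squashed;
         * the cache line it pulled in does not. */
        if (x < array1_size) {
            volatile uint8_t tmp = array2[array1[x] * 4096];
            (void)tmp;
        }
    }

Recovering the secret is then just a matter of timing which of array2's 256 slots is warm.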

I will put the soapbox back in the closet now. Sorry.


@vertigo @requiem @TheGibson
What if it's the compiler that does the speculation, VLIW-style?

@wolf480pl @requiem @thegibson Everything is much more deterministic in that scenario, and it is immune to SPECTRE.

SPECTRE depends on changing between user and kernel modes of operation. The idea is to exploit failed speculation into kernel space. Under these conditions, you're still running in user-space, but the caches now have privileged information in them. How much depends on which paths were speculated in the kernel, and flushing those cache lines in favor of new user-mode content takes time. Hence, the timing side-channel.
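
And that timing difference is trivially measurable. Here's a hypothetical x86-only sketch (gcc/clang intrinsics; the probe buffer and the structure are mine) of the cached-versus-flushed measurement that flush+reload style attacks are built on:

    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>

    static uint8_t probe[4096];

    /* cycles taken by a single load of *p */
    static uint64_t time_read(volatile uint8_t *p) {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)*p;
        uint64_t t1 = __rdtscp(&aux);
        return t1 - t0;
    }

    int main(void) {
        probe[0] = 1;                       /* touch it: the line is now cached */
        uint64_t hot = time_read(&probe[0]);

        _mm_clflush(&probe[0]);             /* evict the line */
        _mm_mfence();
        uint64_t cold = time_read(&probe[0]);

        printf("cached: %llu cycles, flushed: %llu cycles\n",
               (unsigned long long)hot, (unsigned long long)cold);
        return 0;
    }

In my experience a cached read comes back in tens of cycles and a flushed one costs hundreds; that gap is the entire covert channel.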

With a compiler for a VLIW architecture, this can't occur, because speculation never happens across a privilege boundary. The cache is always hot with the working set of the process currently running.
