Erik McClure

RISC Is Fundamentally Unscalable


Today, there was an announcement about a new RISC-V chip, which has got a lot of people excited. I wish I could also be excited, but to me, this is just a reminder that RISC architectures are fundamentally unscalable, and inevitably stop being RISC as soon as they need to be fast. People still call ARM a “RISC” architecture despite ARMv8.3-A adding a FJCVTZS instruction, which is “Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero”. Reduced instruction set, my ass.

The reason this keeps happening is because the laws of physics ensure that no RISC architecture can scale under load. The problem is that a modern CPU is so fast that just accessing the L1 cache takes anywhere from 3-5 cycles. This is part of the reason modern CPUs rely so much on register renaming, allowing them to have hundreds of internal registers that are used to make things go fast, as opposed to the paltry 90 registers actually exposed, 40 of which are just floating point registers for vector operations. The fundamental issue that CPU architects run into is that the speed of light isn’t getting any faster. Even getting an electrical signal from one end of a CPU to the other now takes more than one cycle, which means the physical layout of your CPU now has a significant impact on how fast operations take. Worse, the faster the CPU gets, the more this lag becomes a problem, so unless you shrink the entire CPU or redesign it so your L1 and L2 caches are physically closer to the transistors that need them, the latency from accessing those caches can only go up, not down. The CPU might be getting faster, but the speed of light isn’t.

Now, obviously RISC CPUs are very complicated architectures that do all sorts of insane pipelining to try and execute as many instructions at the same time as possible. This is necessary because, unless your data is already loaded into registers, you might spend more cycles loading data from the L1 cache than doing the actual operation! If you hit the L2 cache, that will cost you 13-20 cycles by itself, and L3 cache hits are 60-100 cycles. This is made worse by the fact that complex floating-point operations can almost always be performed faster by encoding the operation in hardware, often in just one or two cycles, when manually implementing the same operation would’ve taken 8 or more cycles. The FJCVTZS instruction mentioned above even sets a specific flag based on certain edge-cases to allow an immediate jump instruction to be done afterwards, again to minimize hitting the cache.

All of this leads us to single instruction multiple data (SIMD) vector instructions common to almost all modern CPUs. Instead of doing a complex operation on a single float, they do a simple operation to many floats at once. The CPU can perform operations on 4, 8, or even 16 floating point numbers at the same time, in just 3 or 4 cycles, even though doing this for an individual float would have cost 2 or 3 cycles each. Even loading an array of floats into a large register will be faster than loading each float individually. There is no escaping the fact that attempting to run instructions one by one, even with fancy pipelining, will usually result in a CPU that’s simply not doing anything most of the time. In order to make things go fast, you have to do things in bulk. This means having instructions that do as many things as possible, which is the exact opposite of how RISC works.

Now, this does not mean CISC is the future. We already invented a solution to this problem, which is VLIW - Very Large Instruction Word. This is what Itanium was, because researchers at HP anticipated this problem 30 years ago and teamed up with Intel to create what eventually became Itanium. In Itanium, or any VLIW architecture, you can tell the CPU to do many things at once. This means that, instead of having to build massive vector processing instructions or other complex specialized instructions, you can build your own mega-instructions out of a much simpler instruction set. This is great, because it simplifies the CPU design enormously while sidestepping the pipelining issues of RISC. The problem is that this is really fucking hard to compile, and that’s what Intel screwed up. Intel assumed that compilers in 2001 could extract the instruction-level parallelism necessary to make VLIW work, but in reality we’ve only very recently figured out how to reliably do that. 20 years ago, we weren’t even close, so nobody could compile fast code for Itanium, and now Itanium is dead, even though it was specifically designed to solve our current predicament.

With that said, the MILL instruction set uses VLIW along with several other innovations designed to compensate for a lot of the problems discussed here, like having deferred load instructions to account for the lag time between requesting a piece of data and actually being able to use it (which, incidentally, also makes MILL immune to Spectre because it doesn’t need to speculate). Sadly, MILL is currently still vaporware, having not materialized any actual hardware despite it’s promising performance gains. One reason for this might be that any VLIW architecture has a highly unique instruction set. We’re used to x86, which is so high-level it has almost nothing to do with the underlying CPU implementation. This is nice, because everyone implements the same instruction set and your programs all work on it, but it means the way instructions interact is hard to predict, much to the frustration of compiler optimizers. With VLIW, you would very likely have to recompile your program for every single unique CPU, which is a problem MILL has spent quite a bit of time on.

MILL, and perhaps VLIW in general, may have a saving grace with WebAssembly, precisely because it is a low-level assembly language that can be efficiently compiled to any architecture. It wouldn’t be a problem to have unique instruction sets for every single type of CPU, because if you ship WebAssembly, you can simply compile the program for whatever CPU it happens to be running on. A lot of people miss this benefit of WebAssembly, even though I think it will be critical in allowing VLIW instruction sets to eventually proliferate. Perhaps MILL will see the light of day after all, or maybe someone else can come up with a VLIW version of RISC-V that’s open-source. Either way, we need to stop pretending that pipelining RISC is going to work. It hasn’t ever worked and it’s not going to work, it’ll just turn into another CISC with a javascript floating point conversion instruction.

Every. Single. Time.


Migrating To A Static Blog


I’ve finished constructing a new personal website for myself using hugo, and I’m moving my blog over there so I have more control over what gets loaded, and more importantly, so the page doesn’t attempt to load Blogger’s 5 MB worth of bloated javascript nonsense just to read some text. It also fixes math and code highlighting while reading on mobile. If you reached this post using Blogger, you’ll be redirected or will soon be redirected to the corresponding post on my new website.

All comments have been preserved from the original posts, but making new comments is currently disabled - I haven’t decided if I want to use Disqus or attempt something else. An RSS feed is available on the bottom of the page for tracking new posts that should mimic the Blogger RSS feed, if you were using that. If something doesn’t work, poke me on twitter and I’ll try to fix it.

I implemented share buttons with simple links, without embedding any crazy javascript bullshit. In fact, the only external resource loaded is a Google tracking ID for pageviews. Cloudflare is used to enforce an HTTPS connection over the custom domain even though the website is hosted on Github Pages.

Hopefully, the new font and layout is easier to read than Blogger’s tiny text and bullshit theme nonsense.


How To Avoid Memorizing Times Tables


I was recently told that my niece was trying to memorize her times tables. As an applied mathematician whose coding involves plenty of multiplication, I was not happy to hear this. Nobody who does math actually memorizes times tables, and furthermore, forcing a child to memorize anything is probably the worst possible thing you can do in modern society. No one should memorize their times tables, they should learn how to calculate them. Forcing children to memorize useless equations for no reason is a great way to either ensure they hate math, teach them they should blindly memorize and believe anything adults tell them, or both. So for any parents who wish to teach their children how to be critical thinkers and give them an advantage on their next math test, I am going to describe how to derive the entire times tables with only 12 rules.

  1. Anything multiplied by 1 is itself. Note that I said anything, that includes fractions, pies, cars, the moon, or anything else you can think of. Multiplying it by 1 just gives you back the same result.

  2. Any number multiplied by 10 has a zero added on the end. 1 becomes 10, 2 becomes 20, 72 becomes 720, 9999 becomes 99990, etc.

  3. Any single digit multiplied by 11 simply adds itself on the end instead of 0. 1 becomes 11, 2 becomes 22, 5 becomes 55, etc. This is because you never need to multiply something by eleven. Instead, multiply it by 10 (add a zero to it) then add itself.

    \[ \begin{aligned} 11*11 = 11*(10 + 1) = 11*10 + 11 = 110 + 11 = 121\\ 12*11 = 12*(10 + 1) = 12*10 + 12 = 120 + 12 = 132 \end{aligned} \]

  4. You can always reverse the numbers being multiplied and the same result comes out. $$ 12*2 = 2*12 $$, $$ 8*7 = 7*8 $$, etc. This is a simple rule, but it’s very easy to forget, so keep it in mind.

  5. Anything multiplied by 2 is doubled, or added to itself, but you only need to do this up to 9. For example, $$ 4*2 = 4 + 4 = 8 $$. Alternatively, you can count up by 2 that many times:

    \[ 4*2 = 2 + 2 + 2 + 2 = 4 + 2 + 2 = 6 + 2 = 8 \]
    To multiply any large number by two, double each individual digit and carry the result. Because you multiply each digit by 2 separately, the highest result you can get from this is 18, so you will only ever carry a 1, just like in addition.
    \[ \begin{aligned} \begin{matrix} 3 & 6\\ & 2\\ \hline & \\ & \\ \hline & \end{matrix}\quad \begin{matrix} 3 & 6\\ & 2\\ \hline 1 & 2\\ & \\ \hline & \end{matrix}\quad \begin{matrix} 3 & 6\\ & 2\\ \hline 1 & 2\\ 6 & \\ \hline & \end{matrix}\quad \begin{matrix} 3 & 6\\ & 2\\ \hline 1 & 2\\ 6 & \\ \hline 7 & 2 \end{matrix} \end{aligned} \]
    This method is why multiplying anything by 2 is one of the easiest operations in math, and as a result the rest of our times table rules are going to rely heavily on it. Don’t worry about memorizing these results - you’ll memorize them whether you want to or not simply because of how often you use them.

  6. Any number multiplied by 3 is multiplied by 2 and then added to itself. For example:

    \[ 6*3 = 6*(2 + 1) = 6*2 + 6 = 12 + 6 = 18 \]
    Alternatively, you can add the number to itself 3 times: $$ 3*3 = 3 + 3 + 3 = 6 + 3 = 9 $$

  7. Any number multiplied by 4 is simply multiplied by 2 twice. For example: $$ 7*4 = 7*2*2 = 14*2 = 28 $$

  8. Any number multiplied by 5 is the same number multiplied by 4 and then added to itself.

    \[ 6*5 = 6*(4 + 1) = 6*4 + 6 = 6*2*2 + 6 = 12*2 + 6 = 24 + 6 = 30 \]
    Note that I used our rule for 4 here to break it up and calculate it using only 2. Once kids learn division, they will notice that it is often easier to calculate 5 by multiplying by 10 and halving the result, but we assume no knowledge of division.

  9. Any number multiplied by 8 is multiplied by 4 and then by 2, which means it’s actually just multiplied by 2 three times. For example: $$ 7*8 = 7*4*2 = 7*2*2*2 = 14*2*2 = 28*2 = 56 $$

  10. Never multiply anything by 12. Instead, multiply it by 10, then add itself multiplied by 2. For example: $$ 12*12 = 12*(10 + 2) = 12*10 + 12*2 = 120 + 24 = 144 $$

  11. Multiplying any single digit number by 9 results in a number whose digits always add up to nine, and whose digits decrease in the right column while increasing in the left column.

    \[ \begin{aligned} 9 * 1 = 09\\ 9 * 2 = 18\\ 9 * 3 = 27\\ 9 * 4 = 36\\ 9 * 5 = 45\\ 9 * 6 = 54\\ 9 * 7 = 63\\ 9 * 8 = 72\\ 9 * 9 = 81 \end{aligned} \]
    10, 11, and 12 can be calculated using rules for those numbers.

  12. For both 6 and 7, we already have rules for all the other numbers, so you just need to memorize 3 results:

    \[ \begin{aligned} 6*6 = 36\\ 6*7 = 42\\ 7*7 = 49 \end{aligned} \]
    Note that $$ 7*6 = 6*7 = 42 $$. This is where people often forget about being able to reverse the numbers. Every single other multiplication involving 7 or 6 can be calculated using a rule for another number.

And there you have it. Instead of trying to memorize a bunch of numbers, kids can learn rules that build on top of each other, each taking advantage of the rules established before it. It’s much more engaging then trying to memorize a giant table of meaningless numbers, a task that’s so mind-numbingly boring I can’t imagine forcing an adult to do it, let alone a small child. More importantly, this task teaches you what math is really about. It’s not about numbers, or adding things together, or memorizing a bunch of formulas. It’s establishing simple rules, and then combining those rules together into more complex rules you can use to solve more complex problems.

This also establishes a fundamental connection to computer science that is often glossed over. Both math and programming are repeated abstraction and generalization. It’s about combining simple rules into a more generalized rule, which can then be abstracted into a simpler form and combined to create even more complex rules. Programs start with machine instructions, while math starts with propositions. Programs have functions, and math has theorems. Both build on top of previous results to create more powerful and expressive tools. Both require a spark of creativity to recognize similarities between seemingly unrelated concepts and unite them in a more generalized framework.

We can demonstrate all of this simply by refusing to memorize our times tables.


Ignoring Outliers Creates Racist Algorithms


Have you built an algorithm that mostly works? Does it account for almost everyone’s needs, save for a few weird outliers that you ignore because they make up 0.0001% of the population? Congratulations, your algorithm is racist! To illustrate how this happens, let’s take a recent example from Facebook. My friend’s message was removed for “violating community standards”. Now, my friend has had all sorts of ridiculous problems with Facebook, so to test my theory, I posted the exact same message on my page, and then had him report it.

Golly gee, look at that, Facebook confirmed the message I sent does not violate community guidelines, but he’s still banned for 12 hours for posting the exact same thing. What I suspect happened is this: Facebook has gotten mad at my friend for having a weird name multiple times, but he can’t prove what his name is because he doesn’t have access to his birth certificate because of family problems, and he thinks someone’s been falsely reporting a bunch of his messages. The algorithm for determining whether or not something is “bad” probably took these misleading inputs, combined it with a short list of so-called “dangerous” topics like “terrorism”, and then decided that if anyone reported one of his messages, it was probably bad. On the other hand, I have a very western name and nobody reports anything I post, so either the report actually made it to a human being, or the algorithm simply decided it was probably fine.

Of course, the algorithm was wrong about my friend’s message. But Facebook doesn’t care. I’m sure a bunch of self-important programmers are just itching to tell me we can’t deal with all the edge-cases in a commercial algorithm because it’s infeasible to account for all of them. What I want to know is, have any of these engineers ever thought about who the edge-cases are? Have they ever thought about the kind of people who can’t produce birth certificates, or don’t have a driver’s license, or have strange names that don’t map to unicode properly because they aren’t western enough?

Poor people. Minorities. Immigrants. Disabled people. All these people they claim to care about, all this talk of diversity and equal opportunity and inclusive policies, and they’re building algorithms that by their very nature will exclude those less fortunate than them. Facebook’s algorithm probably doesn’t even know that my friend is asian, yet it’s still discriminating against him. Do you know who can follow all those rules and assumptions they make about normal people? Rich people. White people. Privileged people. These algorithms benefit those who don’t need help, and disproportionately punish those who don’t need any more problems.

What’s truly terrifying is that Silicon Valley wants to run the world, and it wants to automate everything using a bunch of inherently flawed algorithms. Algorithms that might be impossible to perfect, given the almost unlimited number of edge-cases that reality can come up with. In fact, as I am writing this article, Chrome doesn’t recognize “outlier” as a word, even though Google itself does.

Of course, despite this, Facebook already built an algorithm that tries to detect “toxicity” and silences “unacceptable” opinions. Even if they could build a perfect algorithm for detecting “bad speech”, do these companies really think forcibly restricting free speech will accomplish anything other than improving their own self-image? A deeply cynical part of me thinks the only thing these companies actually care about is looking good. A slightly more optimistic part of me thinks a bunch of well-meaning engineers are simply being stupid.

You can’t change someone’s mind by punching them in the face. Punching people in the face may shut them up, but it does not change their opinion. It doesn’t fix anything. Talking to them does. I’m tired of this industry hiding problems behind shiny exteriors instead of fixing them. That’s what used car salesmen do, not engineers. Programming has devolved into an art of deceit, where coders hide behind pretty animations and huge frameworks that sweep all their problems under the rug, while simultaneously screwing over the people who were supposed to benefit from an “egalitarian” industry that seems less and less egalitarian by the day.

Either silicon valley needs to start dealing with people that don’t fit in neat little boxes, or it will no longer be able to push humanity forward. If we’re going to move forward as a species, we have to do it together. Launching a bunch of rich people into space doesn’t accomplish anything. Curing cancer for rich people doesn’t accomplish anything. Inventing immortality for rich people doesn’t accomplish anything. If we’re going to push humanity forward, we have to push everyone forward, and that means dealing with all 7 billion outliers.

I hope silicon valley doesn’t drag us back to the feudal age, but I’m beginning to think it already has.


I Used To Want To Work For Google


A long time ago I thought Google was this magical company that truly cared about engineering and solving problems instead of maximizing shareholder value. Then Larry Page became CEO and I realized they were not a magical unicorn and lamented the fact that they had been transformed into “just another large company”. Several important things happened between that post and now: Microsoft got a new CEO, so I decided to give them a shot and got hired there. I quit right before Windows 10 came out because I knew it was going to be a disaster. More recently, it’s become apparent that Google had gone far past simply being a behemoth unconcerned with the cries of the helpless and transformed into something outright malevolent. It’s silenced multiple reporters, blocked windows phone from accessing youtube out of spite, and successfully gotten an entire group of researchers fired by threatening to pull funding (but that didn’t stop them).

This is evil. This is horrifying. This is the kind of stuff Microsoft did in the 90s that made everyone hate it so much they still have to fight against the repercussions of decisions made two decades ago because of the sheer amount of damage they did and lives they ruined. I’m at the point where I’d rather go back to Microsoft, whose primary sin at this point is mostly just being incompetent instead of outright evil, rather than Google, who is actually doing things that are fundamentally morally wrong. These are the kinds of decisions that are bald-faced abuses of power, without any possible “good intention” driving them. It’s vile. There is no excuse.

As an ex-Microsoft employee, I can assure you that at no point did I think Microsoft was doing something evil while I was there. I haven’t seen Microsoft do anything outright evil since I left, either. The few times they came close they backed off and apologized later. Microsoft didn’t piss people off by being evil, it pissed people off by being dumb. I was approached by a Google recruiter shortly after I left and I briefly considered going to Google because I considered them vastly more competent, and I still do. However, no amount of engineering competency can make me want to work for a company that actively does things I consider morally reprehensible. This is the same reason I will never work for Facebook. I’ve drawn a line in the sand, and I find myself in the surprising situation of being on the opposite side of Google, and discovering that Microsoft, of all companies, isn’t with them.

I always thought I’d be able to mostly disregard the questionable things that Google and Microsoft were doing and compare them purely on the competency of their engineers. However, it seems that Google has every intention of driving me away by doing things so utterly disgusting I could never work there and still be able to sleep at night. This worries me deeply, because as these companies get larger and larger, they eat up all the other sources of employment. Working at a startup that isn’t one of the big 5 won’t help if it gets bought out next month. One friend of mine with whom I shared many horror stories with worked at LinkedIn. He was not happy when he woke up one day to discover he now worked for the very company he had heard me complaining about. Even now, he’s thinking of quitting, and not because Microsoft is evil - they’re just so goddamn dumb.

The problem is that there aren’t many other options, short of starting your own company. Google is evil, Facebook is evil, Apple is evil if you care about open hardware, Microsoft is too stupid to be evil but might at some point become evil again, and Amazon is probably evil and may or may not treat it’s employees like shit depending on who you ask. Even if you don’t work directly for them, you’re probably using their products or services. At some point, you have to put food on the table. This is why I generally refuse to blame someone for working for an evil company because the economy punishes you for trying to stand up for your morals. It’s not the workers fault, here, it’s Wall Street incentivizing rotten behavior by rewarding short-term profits instead of long-term growth. A free market optimizes to a monopoly. Monopolies are bad. I don’t know what people don’t get about this. We’re fighting over stupid shit like transgender troops or gay rights instead of just treating other human beings with decency, all the while letting rich people rob us blind as they decimate the economy. This is stupid. I would daresay it’s almost more stupid than the guy at Microsoft who decided to fire all the testers.

But I guess I’ll take unrelenting stupidity over retaliating against researchers for criticizing you. At least until Microsoft remembers how to be evil. Then I don’t know what I’ll do.

I don’t know what anyone will do.


Avatar

Archive

  1. 2026
  2. 2025
  3. 2024
  4. 2023
  5. 2022
  6. 2021
  7. 2020
  8. 2019
  9. 2018
  10. 2017
  11. 2016
  12. 2015
  13. 2014
  14. 2013
  15. 2012
  16. 2011
  17. 2010
  18. 2009