Blogs - Erik McClure

Money Is Fake. It's Not Real. It's Made Up.

Death: No. Humans need fantasy to be human. To be the place where the falling angel meets the rising ape.
Susan: With tooth fairies? Hogfathers?
Death: Yes. As practice, you have to start out learning to believe the little lies.
Susan: So we can believe the big ones?
Death: Yes. Justice, mercy, duty. That sort of thing.

I want to start this by saying that I am in favor of a wealth tax. We should be increasing taxes on the wealthy and raising minimum wage, because we know that steadily increasing the relative buying power of the poor is the best way to improve an economy. However, none of this happens in a vacuum. When we talk about income equality, I have become distressed at the amount of ignorance on display about the economy, systemic societal problems, and even what money actually is.

One Pixel Wealth is a webpage from 2021 that helps visualize how truly insane the amount of wealth that the richest people have actually is. While the visualization is great at putting in perspective just how much Jeff Bezos’ wealth is on paper, it links to a refutation of the Paper Billionaire Argument to dispute the idea that Jeff Bezos doesn’t really have that much money in liquid assets. The paper billionaire argument is that, because most wealth is in stocks or bonds, selling it all at once would flood the market and crater the total value of those assets.

The proposed counter-argument is incredibly bad. It demonstrates a total lack of understanding about macro-economic forces. Ironically, this is because it cannot appreciate the scale of its own arguments, the exact issue that One Pixel Wealth is trying to address. Let me paraphrase the key points in this counter-argument:

The Paper Billionaire argument doesn’t work, because you can liquidate the wealth over time in a controlled sell-off, which executives do regularly.
Given that $122 trillion worth of stock changes hands in the US every year, you could liquidate a trillion dollars over five years and only constitute 0.16% of all the trading.
Because 50% of all US households own stock, you will always be able find people to buy the stock the billionaires are selling, it’s not just other billionaires that will buy it.
Even if the paper bilionaire argument was true, if selling all the stock would lose 80% of it’s value, that would leave behind $700 billion.

To start, #1 and #2 don’t work for a very simple reason: A stock’s value represents the market’s confidence in the stock producing future value. Owning stock, in some circumstances, is interpreted as having confidence in that future value. If the market loses confidence in your company, it doesn’t matter what assets you have, your stock price will crater if the market thinks you’ll start losing money. If the CEO starts liquidating their position (which they must state their intention to sell stock ahead of time, years before it completes), the market will panic and the stock price will implode at the mere announcement of the liquidation, let alone actually selling any stock. Elon Musk right now should make it painfully obvious that he was only ever the richest man in the world on paper, because he just lost $107 billion dollars this year! He only bought Twitter for $44 billion! You simply cannot make the combined GDPs of Bulgaria, Croatia, Iceland and Uruguay evaporate if that money was actually real in any sense.

Money does not represent physical assets. Money is supposed to represent human labor, and there is a fixed amount of human labor available on the planet. When someone dies or is incapacitated, it goes down. When someone graduates into the labor force, or becomes more skilled, it goes up. In ancient times, “human labor” was heavily correlated to how much physical activity someone could do, like lifting things or harvesting food. However, our modern economy is dominated by specialist jobs done by highly skilled laborers. So for the sake of analysis, we can say that the GDP of the entire planet should ideally represent the maximum amount of labor the entire human race could do, if we assigned everyone to the job they are most qualified for. We could then increase the total amount of labor we can do by either building machines or improving our skills.

This leads into why point #3 is complete nonsense. It reminds me of when Ben Shapiro, when talking about climate change, asked “you think that people aren’t going to just sell their homes and move?”

The entire point of wealth inequality is that the top 1% holds more money than the entire middle-class. That’s literally the problem! How can everyone else possibly buy all the stock the billionaires are selling if it would require all of their savings? Who are you selling the stocks to?! This isn’t how money works! One Pixel Wealth even tries to claim that if we just gave all the poor people in america a bunch of money it would fix poverty, while linking to a study that only applies to local economies. The world’s largest economy is NOT a local economy! These measures only work when the global economy can absorb the difference, which means making changes gradually or in small, localized areas.

Of course, even if you somehow magically liquidated all your assets and acquired $700 billion dollars in real, liquid cash, it’s not actually $700 billion dollars. It’s like saying that there are gold asteroids worth $10000 quadrillion dollars - the value would plummet if you actually had that much gold. Since money represents human labor, which is a limited resource, simply having more money does not let you do more things. $700 billion dollars is enough to hire 12 billion people for 1 day working at minimum wage ($7.25), but you can’t actually do that, because there’s only 7.8 billion people in the entire world. Having $700 billion in liquid assets would decrease the value of money itself. That’s what inflation is. People claim that some billions of dollars will be enough to eradicate malaria or provide drinking water to everyone, but it’s never that simple because these are always geopolitical issues. Bill Gates has donated billions of dollars since 2005 towards fighting malaria and we only got a vaccine 16 years later. We’re surrounded by so many dumb problems we can solve with more money that we’ve forgetten that some problems are really, truly, fundamentally difficult problems that cannot be solved by throwing money at them. At some point there are just too many cooks in the kitchen.

Note that this labor distribution problem applies to liquid assets, which is one theory on why inflation had (until 2021), remained fairly low despite the amount of wealth increasing to ridiculous amounts. Wealthy people are acting as gigantic money sinks - by absorbing all of the “money”, the actual amount of real, liquid cash in the economy increased at a modest rate, so inflation remained stable. Now, inflation has started to skyrocket in 2022, and some people blame the stimulus payments, but the reality is that the low interest rates during the pandemic, combined with other complex macroeconomic forces, likely caused it, although nobody knows for sure. If wealthy people started actually spending all their money at once, as people seem to want them to do, the amount of liquid assets would skyrocket and so would inflation.

I keep saying that money is supposed to represent human labor, because it’s really an approximation. Someone can be more productive at one job than another, so the amount of human labor is not a knowable value in the first place. Instead, it helps to think of money as representing percieved power imbalance (conservatives often make the mistake of thinking it represents actual power imbalance, which it does not). This power imbalance can come from economic, diplomatic, or military factors. Basically, money is just the current state of global geopolitics. You cannot fight wealth inequality by just redistributing money. Simply taking money from rich people does not fix the systemic issues that created the power imbalance in the first place, because it’s not actually wealth inequality, it’s power inequality, and that is a political issue, not economic. Money is simply our way of quantifying that imbalance. The government’s unwillingness to tax rich people is because of the power imbalance, not the cause of it. If politicians are unwilling to go after rich people, it’s because those rich people hold an alarming amount of sway over politicians, which makes them keys to power.

It means that we have allowed power to accumulate in dangerously high concentrations, and we need to deal with this at a political level before we get an economic solution. We must elect leaders that help tackle power inequality (like break up huge corporations) before we can make progress on wealth inequality. Basically, go vote.

Published on December 17, 2022 at 5:16pm

We Need New Motherboards Before GPUs Collapse Under Their Own Gravity

You can’t have a 4-slot GPU. You just can’t.

We have finally left sanity behind, with nvidia’s 4000 series cards yielding a “clown car” of absurd GPU designs, as GamersNexus put it. These cards are so huge they need “GPU Support Sticks”, which are an actual real thing now. The fact that we insist on relegating the GPU to interfacing with the system while hanging off of a single, increasingly absurd PCIe 6.0 x16 slot that can push 128 GBps is completely insane. There is no real ability to just pick the GPU you want and then pair it with a cooler that is actually attached to the motherboard. The entire cooling solution has to be in the card itself and we are fast reaching the practical limitations here due to gravity and the laws of physics. Top-heavy GPUs are now essentially giant levers pulling on the PCIe slot, with the only possible anchor point that is above the center of mass being the bracket on one side.

A 4090 series card will demand a whopping 450 W, which dwarfs the Ryzen 9 5900X peak power consumption of only 140 W. That’s over 3 times as much power! The graphics card is now drawing more power than the entire rest of the computer! We’ll have to wait for benchmarks to be sure, but the laws of thermodynamics suggest that the GPU will now also be producing more heat than every other component of the PC, combined. And this is the thing we have hanging off of a PCIe slot that doesn’t have any other way of mounting a cooling solution to the motherboard?!

What the FUCK are we doing?!

Look, I’m not a hardware guy. I just write all the shader code that makes GPUs cry. I don’t actually know how we should fix this problem, because I don’t know what designs are thermally efficient or not. I do know, however, that something has to change. Maybe we can make motherboards with a GPU slot next to the CPU slot and have a unified massive radiator sitting on top of them - or maybe it’s a better idea to put the two processor units on opposite ends of the board. I don’t know, just do something so I can use a cooling solution that is actually screwed into the fucking motherboard instead of requiring a “GPU Support Stick” so gravity doesn’t rip it out of the PCIe slot.

As an example of alternative solutions, here is an MXM form-factor for laptops that allow them to provide custom cooling solutions appropriate for the laptop.

In fact, the PCIe spec itself actually contains a rear-bracket mount that, if anyone was paying attention, would help address this problem:

See that funky looking metal thing labeled “2” on the diagram? That sure looks like a good alternative to a “support stick” if anyone ever actually paid attention to the spec. Or maybe this second bracket doesn’t work very well and we need to rethink how motherboards work entirely. Should we have GPU VRAM slots alongside CPU RAM slots? Is that even possible? (Nope.) Or maybe we can come up with an alternative form factor for GPU cards that you can actually attach to the motherboard with screws?

I have no idea what is or isn’t practical, but please, just do something before the GPUs collapse under their own gravity and create strange new forms of matter inside my PC case.

Published on September 22, 2022 at 4:30pm

C++ Constructors, Memory, and Lifetimes

What exactly happens when you write Foo* foo = new Foo();? A lot is packed into this one statement, so lets try to break it down. First, this example is allocating new memory on the heap, but in order to understand everything that’s going on, we’re going to have to explain what it means to declare a variable on the stack. If you already have a good understanding of how the stack works, and how functions do cleanup before returning, feel free to skip to the new statement.

Stack Lifetimes

Describing the stack is very often glossed over in many other imperative languages, despite the fact that those languages still have one (functional languages are an entirely different level of weird). Let’s start with something very simple:

int foobar(int b)
{
  int a;
  a = b;
  return a;
}

Here, we are declaring a function foobar that takes an int and returns an int. The first line of the function declares a variable a of type int. This is all well and good, but where is the integer?. On most modern platforms, int resolves to a 32-bit integer that takes up 4 bytes of space. We haven’t allocated any memory yet, because no new statement happened and no malloc() was called. Where is the integer?

The answer is that the integer was allocated on the stack. If you aren’t familiar with the computer science data structure of the same name, your program is given a chunk of memory by the operating system that is organized into a stack structure, hence the name. It’s like a stack of plates - you can push items on top of the stack, or you can remove items from the top of the stack, but you can’t remove things from the middle of the stack or all the plates will come crashing down. So if we push something on top of the stack, we’re stuck with it until we get rid of everything on top of it.

When we called our function, the parameter int b was pushed on to the stack. Parameters take up memory, so on to the stack they go. Hence, before we ever reach the statement int a, 4 bytes of memory were already pushed onto our stack. Here’s what our stack looks like at the beginning of the function if we call it with the number 90 (assuming little-endian):

int a tells the compiler to push another 4 bytes of memory on to the stack, but it has no initial value, so the contents are undefined:

a = b assigns b to a, so now our stack looks like this:

Finally, return a tells the compiler to evaluate the return expression (which in our case is just a so there’s nothing to evaluate), then copy the result into a chunk of memory we reserved ahead of time for the return value. Some programmers may assume the function returns immediately once the return statement is executed - after all, that’s what return means, right? However, the reality is that the function still has to clean things up before it can actually return. Specifically, we need to return our stack to the state it was before the function was called by removing everything we pushed on top of it in reverse order. So, after copying our return value a, our function pops the top of the stack off, which is the last thing we pushed. In our case, that’s int a, so we pop it off the stack. Our stack now looks like this:

The moment from which int a was pushed onto the stack to the moment it was popped off the stack is called the lifetime of int a. In this case, int a has a lifetime of the entire function. After the function returns, our caller has to pop off int b, the parameter we called the function with. Now our stack is empty, and the lifetime of int b is longer than the lifetime of int a, because it was pushed first (before the function was called) and popped afterwards (after the function returned). C++ builds it’s entire concept of constructors and destructors on this concept of lifetimes, and they can get very complicated, but for now, we’ll focus only on stack lifetimes.

Let’s take a look at a more complex example:

int foobar(int b)
{
  int a;
  
  {
    int x;
    x = 3;
    
    {
      int z;
      int max;
      
      max = 999;
      z = x + b;
      
      if(z > max)
      {
        return z - max;
      }
      
      x = x + z;
    }
    
    // a = z; // COMPILER ERROR!
    
    {
      int ten = 10;
      a = x + ten;
    }
  } 
  
  return a;
}

Let’s look at the lifetimes of all our parameters and variables in this function. First, before calling the function, we push int b on to the stack with the value of whatever we’re calling the function with - say, 900. Then, we call the function, which immediately pushes int a on to the stack. Then, we enter a new block using the character {, which does not consume any memory, but instead acts as a marker for the compiler - we’ll see what it’s used for later. Then, we push int x on to the stack. We now have 3 integers on the stack. We set int x to 3, but int a is still undefined. Then, we enter another new block. Nothing interesting has happened yet. We then push both int z and int max on to the stack. Then we assign 999 to int max and assign int z the value x + b - if we passed in 900, this means z is now equal to 903, which is less than the value of int max (999), so we skip the if statement for now. Then we assign x to x + z, which will be 906.

Now things get interesting. Our topmost block ends with a } character. This tells the compiler to pop all variables declared inside that block. We pushed int z on to the stack inside this block, so it’s gone now. We cannot refer to int z anymore, and doing so will be a compiler error. int z is said to have gone out of scope. However, we also pushed int max on to the stack, and we pushed it after int z. This means that the compiler will first pop int max off the stack, and only afterwards will it then pop int z off the stack. The order in which this happens will be critical for understanding how lifetimes work with constructors and destructors, so keep it in mind.

Then, we enter another new scope. This new scope is still inside the first scope we created that contains int x, so we can still access x. We define int ten and initialize it with 10. Then we set int a equal to x + ten, which will be 916. Then, our scope ends, and int ten goes out of scope, being popped off the stack. Immediately afterwards, we reach the end of our first scope, and int x is popped off the stack.

Finally, we reach return a, which copies a to our return value memory segment, pops int a, and returns to our caller, who then pops int b. That’s what happens when we pass in 900, but what happens if we pass in 9000?

Everything is the same until we reach the if statement, whose condition is now satisfied, which results in the function terminating early and returning z - max. What happens to the stack?

When we reach return z - max, the compiler evaluates the statement and copies the result (8004) out. Then it starts popping everything off the stack (once again, in the reverse order that things were pushed). The last thing we pushed on to the stack was int max, so it gets popped first. Then int z is popped. Then int x is popped. Then int a is popped, the function returns, and finally int b is popped by the caller. This behavior is critical to how C++ uses lifetimes to implement things like smart pointers and automatic memory management. Rust actually uses a similar concept, but it uses it for a lot more than C++ does.

`new` Statements

Okay, now we know how lifetimes work and where variables live when they aren’t allocated, but what happens when you do allocate memory? What’s going on with the new statement? To look at this, let’s use a simplified example:

int* foo = new int();

Here we have allocated a pointer to an integer on the stack (which will be 8 bytes if you’re on a 64-bit system), and assigned the result of new int() to it. What happens when we call new int()? In C++, the new operator is an extension of malloc() from C. This means it allocates memory from the heap. When you allocate memory on the heap, it never goes out of scope. This is what most programmers are familiar with in other languages, except that most other languages handle figuring out when to deallocate it and C++ forces you to delete it yourself. Memory allocated on the heap is just there, floating around, forever, or until you deallocate it. So this function has a memory leak:

int bar(int b)
{
  int* a = new int();
  *a = b;
  return *a;
}

This is the same as our first example, except now we allocate a on the heap instead of the stack. So, it never goes out of scope. It’s just there, sitting in memory, forever, until the process is terminated. The new operator looks at the type we passed it (which is int in this case) and calls malloc for us with the appropriate number of bytes. Because int has no constructors or destructors, it’s actually equivelent to this:

int bar(int b)
{
  int* a = (int*)malloc(sizeof(int));
  *a = b;
  return *a;
}

Now, people who are familiar with C will recognize that any call to malloc should come with a call to free, so how do we do that in C++? We use delete:

int bar(int b)
{
  int* a = new int();
  *a = b;
  int r = *a;
  delete a;
  return r;
}

IMPORTANT: Never mix new and free or malloc and delete. The new/delete operators can use a different allocator than malloc/free, so things will violently explode if you treat them as interchangeable. Always free something from malloc and always delete something created with new.

Now we aren’t leaking memory, but we also can’t do return *a anymore, because it’s impossible for us to do the necessary cleanup. If we were allocating on the stack, C++ would clean up our variable for us after the return statement, but we can’t put anything after the return statement, so there’s no way to tell C++ to copy the value of *a and then manually delete a without introducing a new variable r. Of course, if we could run arbitrary code when our variable went out of scope, we could solve this problem! This sounds like a job for constructors and destructors!

Constructors and `delete`

Okay, let’s put everything together and return to our original statement in a more complete example:

struct Foo
{
  // Constructor for Foo
  Foo(int b)
  {
    a = b;
  }
  // Empty Destructor for Foo
  ~Foo() {}
  
  int a;
};

int bar(int b)
{
  // Create
  Foo* foo = new Foo(b);
  int a = foo->a;
  // Destroy
  delete foo;
  return a; // Still can't return foo->a
}

In this code, we still haven’t solved the return problem, but we are now using constructors and destructors, so let’s walk through what happens. First, new allocates memory on the heap for your type. Foo contains a 32-bit integer, so that’s 4 bytes. Then, after the memory is allocated, new automatically calls the constructor that matches whatever parameters you pass to the type. Your constructor doesn’t need to allocate any memory to contain your type, since new already did this for you. Then, this pointer is assigned to foo. Then we delete foo, which calls the destructor first (which does nothing), and then deallocates the memory. If you don’t pass any parameters when calling new Type(), or you are creating an array, C++ will simply call the default constructor (a constructor that takes no parameters). This is all equivelent to:

int bar(int b)
{
  // Create
  Foo* foo = (Foo*)malloc(sizeof(Foo));
  new (foo) Foo(b); // Special new syntax that ONLY calls the constructor function (this is how you manually call constructors in C++)
  int a = foo->a; 
  // Destroy
  foo->~Foo(); // We can, however, call the destructor function directly
  free(foo);
  
  return a; // Still can't return foo->a
}

This uses a special new syntax that doesn’t allocate anything and simply lets us call the constructor function directly on our already allocated memory. This is what the new operator is doing for you under the hood. We then call the destructor manually (which you can do) and free our memory. Of course, this is all still useless, because we can’t return the integer we allocated on the heap!

Destructors and lifetimes

Now, the magical part of C++ is that constructors and destructors are run when things are pushed or popped from the stack ^[1]. The fact that constructors and destructors respect variable lifetimes allows us to solve our problem of cleaning up a heap allocation upon returning from a function. Let’s see how that works:

struct Foo
{
  // Default constructor for Foo
  Foo()
  {
    a = new int();
  }
  // Destructor frees memory we allocated using delete
  ~Foo()
  {
    delete a;
  }
  
  int* a;
};

int bar(int b)
{
  Foo foo;
  *foo.a = b;
  return *foo.a; // Doesn't leak memory!
}

How does this avoid leaking memory? Let’s walk through what happens: First, we declare Foo foo on the stack, which pushes 4 bytes on to the stack, and then C++ calls our default constructor. Inside our default constructor, we use new to allocate a new integer and store it in int* a. Returning to our function, we then set our integer pointer foo.a to b. Then, we return the value stored in foo.a from the function^[2]. This copies the value out of foo.a first by dereferencing the pointer, and then C++ calls our destructor ~Foo before Foo foo is popped off the stack. This destructor deletes int* a, ensuring we don’t leak any memory. Then we pop off int b from the stack and the function returns. If we could somehow do this without constructors or destructors, it would look like this:

int bar(int b)
{
  Foo foo;
  foo.a = new int();
  *foo.a = b;
  int retval = *foo.b;
  delete a;
  return retval;
}

The ability to run a destructor when something goes out of scope is an incredibly important part of writing good C++ code, becuase when a function returns, all your variables go out of scope when the stack is popped. Thus, all cleanup that is done during destructors is gauranteed to run no matter when you return from a function. Destructors are gauranteed to run even when you throw an exception! This means that if you throw an exception that gets caught farther up in the program, you won’t leak memory, because C++ ensures that everything on the stack is correctly destroyed when processing exception handling, so all destructors are run in the same order they normally are.

This is the core idea behind smart pointers - if a pointer is stored inside an object, and that object deletes the pointer in the destructor, then you will never leak the pointer because C++ ensures that the destructor will eventually get called when the object goes out of scope. Now, if implemented naively there is no way to pass the pointer into different functions, so the utility is limited, but C++11 introduced move semantics to help solve this issue. We’ll talk about those later. For now, let’s talk about different kinds of lifetimes and what they mean for when constructors and destructors are called.

Static Lifetimes

Because any struct or class in C++ can have constructors or destructors, and you can put structs or classes anywhere in a C++ program, this means that there are rules for how to safely invoke constructors and destructors in all possible cases. These different possible lifetimes have different names. Global variables, or static variables inside classes, have what’s called “static lifetime”, which means their lifetime begins when the program starts and ends once the program exits. The exact order these constructors are called, however, is a bit tricky. Let’s look at an example:

struct Foo
{
  // Default constructor for Foo
  Foo()
  {
    a = new int();
  }
  // Destructor frees memory we allocated using delete
  ~Foo()
  {
    delete a;
  }
  
  int* a;
  static Foo instance;
};

static Foo GlobalFoo;

int main()
{
  *GlobalFoo.a = 3;
  *Foo::instance.a = *GlobalFoo.a;
  return *Foo::instance.a;
}

When is instance constructed? When is GlobalFoo constructed? Can we safely assign to GlobalFoo.a immediately? The answer is that all static lifetimes are constructed before your program even starts, or more specifically, before main() is called. Thus, by the time your program has reached your entry point (main()), C++ gaurantees that all static lifetime objects have already been constructed. But what order are they constructed in? This gets complicated. Basically, static variables are constructed in the order they are declared in a single .cpp file. However, the order these .cpp files are constructed in is undefined. So, you can have static variables that rely on each other inside a single .cpp file, but never between different files.

Likewise, all static lifetime objects get deconstructed after your main() function returns, and once again, this order is random, although it should be in the reverse order they were constructed in. Technically this should be respected even if an exception occurs, but because the compiler can assume the process will terminate immediately after an unhandled exception occurs, this is unreliable.

Static lifetimes still apply for shared libraries, and are constructed the moment the library is loaded into memory - that’s LoadLibrary on Windows and dlopen on Linux. Most kernels provide a custom function that fires when the shared library is loaded or unloaded, and these functions fall outside of the C++ standard, so there’s no gaurantee about whether the static constructors have actually been called when you’re inside the DllLoad, but almost nobody actually needs to worry about those edge cases, so for any normal code, by the time any function in your DLL can be called by another program, you can rest assured all static and global variables have had their constructors called. Likewise, they are destructed when the shared library is unloaded from memory.

While we’re here, there are a few gotchas in the previous example that junior programmers should know about. You’ll notice that I did not write static Foo* = new GlobalFoo(); - this will leak memory!. In this case, C++ doesn’t actually call the destructor because Foo doesn’t have a static lifetime, the pointer it’s stored in does!. So the pointer will get it’s constructor called before the program starts (which does nothing, because it’s a primitive), and then the pointer will have it’s destructor called after main() returns, which also does nothing, which means Foo never actually gets deconstructed or deallocated. Always remember that C++ is extremely picky about what you do. C++ won’t magically extend Foo’s lifetime to the lifetime of the pointer, it will instead do exactly what you told it to do, which is to declare a global pointer primitive.

Another thing to avoid is to not accidentally write Foo::instance.a = GlobalFoo.a;, because this doesn’t copy the integer, it copies the pointer from GlobalFoo to Foo::instance. This is extremely bad, because now Foo::instance will leak it’s pointer and instead try to free GlobalFoo’s pointer, which was already deleted by GlobalFoo, so the program will crash, but only AFTER successfully returning 3. In fact, it will crash outside of the main() function completely, which is going to look very weird if you don’t know what’s going on.

Implicit Constructors and Temporary Lifetimes

Lifetimes in C++ can get complicated, because they don’t just apply to function blocks, but also function parameters, return values, and expressions. This means that, for example, if we are calling a function, and we construct a new object inside the function call, there is an implicit lifetime that exists for the duration of the function call, which is well-defined but very weird unless you’re aware of exactly what’s going on. Let’s look at a simple example of a function call that constructs an object:

class Foo
{
  // Implicit constructor for Foo
  Foo(int b)
  {
    a = b;
  }
  // Empty Destructor for Foo
  ~Foo() {}
  
  int a;
}

int get(Foo foo)
{
  return foo.a;
}

int main()
{
  return get(3);
}

To understand what’s going on here, we need to understand implicit constructors, which are a “feature” of C++ you never wanted but got anyway. In C++, all constructors that take exactly 1 argument are implicit, which means the compiler will attempt to use call them to satisfy a type transformation. In this case, we are trying to pass 3 into the get() function. 3 has the type int, but get() takes an argument of type Foo. Normally, this would just cause an error, because the types don’t match. But because we have a constructor for Foo that takes an int, the compiler actually calls it for us, constructing an object of type Foo and passing it into the function! Here’s what it looks like if we do this ourselves:

int main()
{
  return get(Foo(3));
}

C++ has “helpfully” inserted this constructor for us inside the function call. So, now that we know our Foo object is being constructed inside the function call, we can ask a different question: When does the constructor get called, exactly? When is it destructed? The answer is that all the expressions in your function call are evaluated first, from left-to-right. Our expression allocated a new temporary Foo object by pushing it onto the stack and then calling the constructor. However, do be aware that compilers aren’t always so great about respecting initialization order in function calls or other initialization lists. But, ostensibly, they’re supposed to be evaluated from left-to-right.

So, once all expressions inside the parameteres have been evaluated, we then push the parameters on to the stack and copy the results of the expressions into them, allocate space on the stack for the return value, and then we enter the function. Our function executes, copies a return value into the space we reserved, finishes cleaning up, and returns. Then we do something with the return value and pop our parameters off the stack. Finally, after all the function parameter boilerplate has been finished, our expressions go out of scope in reverse order. This means that destructors are called from right-to-left after the function returns. This is all roughly equivilent to doing this:

int main()
{
  int b;
  {
    Foo a = Foo(3); // Construct Foo
    b = get(a); // Call function and copy result
  } // Deconstruct Foo
  return b;
}

This same logic works for all expressions - if you construct a temporary object inside an expression, it exists for the duration of the expression. However, the exact order that C++ evaluates expressions is extremely complicated and not always defined, so this is a bit harder to nail down. Generally speaking, an object gets constructed right before it’s needed to evaluate the expression, and gets deconstructed afterwards. These are “temporary lifetimes”, because the object only briefly exists inside the expression, and is deconstructed once the expression is evaluated. Because C++ expressions are not always ordered, you should not attempt to rely on any sort of constructor order for arbitrary expressions. As an example, we can inline our previous get() function:

int main()
{
  return Foo(3).a;
}

This will allocate a temporary object of type Foo, construct it with 3, copy out the value from a, and then deconstruct the temporary object before the return statement is evaluated. For the most part, you can just assume your objects get constructed before the expression happens and get destructed after it happens - try not to rely on ordering more specific than that. The specific ordering rules are also changing in C++20 to make it more strict, which means how strict the ordering is will depend on what compiler you’re using until everyone implements the standard properly.

For the record, if you don’t want C++ “helpfully” turning your constructors into implicit ones, you can use the explicit keyword to disable that behavior:

struct Foo
{
  explicit Foo(int b)
  {
    a = b;
  }
  ~Foo() {}
  
  int a;
};

Static Variables and Thread Local Lifetimes

Static variables inside a function (not a struct!) operate by completely different rules, because this is C++ and consistency is for the weak.

struct Foo
{
  explicit Foo(int b)
  {
    a = b;
  }
  ~Foo() {}
  
  int a;
};

int get()
{
  static Foo foo(3);
  
  return foo.a;
}

int main()
{
  return get() + get();
}

When is foo constructed? It’s not when the program starts - it’s actually only constructed the first time the function gets called. C++ injects some magic code that stores a global flag saying whether or not the static variable has been initialized yet. The first time we call get(), it will be false, so the constructor is called and the flag is set to true. The second time, the flag is true, so the constructor isn’t called. So when does it get destructed? After main() returns and the program is exiting, just like global variables!

Now, this static initialization is gauranteed to be thread-safe, but that’s only useful if you intend to share the value through multiple threads, which usually doesn’t work very well, because only the initialization is thread-safe, not accessing the variable. C++ has introduced a new lifetime called thread_local which is even weirder. Thread-local static variables only exist for the duration of the thread they belong to. So, if you have a thread-local static variable in a function, it’s constructed the first time you call the function on a per-thread basis, and destroyed when each thread exits, not the program. This means you are gauranteed to have a unique instance of that static variable for each thread, which can be useful in certain concurrency situations.

I’m not going to spend any more time on thread_local because to understand it you really need to know how C++ concurrency works, which is out of scope for this blog post. Instead, let’s take a brief look at Move Semantics.

Move Semantics

Let’s look at C++’s smart pointer implementation, unique_ptr<>.

int get(int* b)
{
  return *b;
}

int main()
{
  std::unique_ptr<int> p(new int());
  *p = 3;
  int a = get(p.get());
  return a;
}

Here, we allocate a new integer on the heap by calling new, then store it in unique_ptr. This ensures that when our function returns, our integer gets freed and we don’t leak memory. However, the lifetime of our pointer is actually excessively long - we don’t need our integer pointer after we’ve extracted the value inside get(). What if we could change the lifetime of our pointer? The actual lifetime that we want is this:

int get(int* b)
{
  return *b;
  // We want the lifetime to end here
}

int main()
{
  // Lifetime starts here
  std::unique_ptr<int> p(new int());
  *p = 3;
  int a = get(p.get());
  return a;
  // Lifetime ends here
}

We can accomplish this by using move semantics:

int get(std::unique_ptr<int>&& b)
{
  return *b;
  // Lifetime of our pointer ends here
}

int main()
{
  // Lifetime of our pointer starts here
  std::unique_ptr<int> p(new int());
  *p                = 3;
  int a             = get(std::move(p));
  return a;
  // Lifetime of p ends here, but p is now empty
}

By using std::move, we transfer ownership of our unique_ptr to the function parameter. Now the get() function owns our integer pointer, so as long as we don’t move it around again, it will go out of scope once get() returns, which will delete it. Our previous unique_ptr variable p is now empty, and when it goes out of scope, nothing happens, because it gave up ownership of the pointer it contained. This is how you can implement automatic memory management in C++ without needing to use a garbage collector, and Rust actually uses a more sophisticated version of this built into the compiler.

Move semantics can get very complex and have a lot of rules surrounding how temporary values work, but we’re not going to get into all that right now. I also haven’t gone into the many different ways that constructors can be invoked, and how those constructors interact with the different ways you can initialize objects. Hopefully, however, you now have a grasp of what lifetimes are in C++, which is a good jumping off point for learning about more advanced concepts.

^[1] Pedantic assembly-code analysts will remind us that the stack allocations usually happen exactly once, at the beginning of the function, and then are popped off at the very end of the function, but the standard technically doesn’t even require a stack to exist in the first place, so we’re really talking about pushing and popping off the abstract stack concept that the language uses, not what the actual compiled assembly code really does.

^[2] We’re dereferencing the pointer here because we want to return the value of the pointer, not the pointer itself! If you tried to return the pointer itself from the function, it would point to freed memory and crash after the function returned. Trying to return pointers from functions is a common mistake, so be careful if you find yourself returning a pointer to something. It’s better to use unique_ptr to manage lifetimes of pointers for you.

Published on May 1, 2021 at 11:04pm

Factorio Is The Best Technical Interview We Have

There’s been a lot of hand-wringing over The Technical Interview lately. Many people realize that inverting a binary tree on a whiteboard has basically zero correlation to whether or not someone is actually a good software developer. The most effective programming test anyone’s come up with is still Fizzbuzz. One consequence of this has been an increased emphasis on Open Source Contributions, but it turns out these aren’t a very good metric either, because most people don’t have that kind of time.

The most effective programming interview we have now is usually some kind of take-home project, where a candidate is asked to fix a bug or implement a small feature within a few days. This isn’t great because it takes up a lot of time, and they could recieve outside help (or, if the feature is sufficiently common, google it). On the other hand, some large companies have instead doubled-down on whiteboard style interviews by subjecting prospective engineers to multiple hour-long online coding assessments, with varying levels of invasive surveillience.

All these interviewing methods pale in comparison to a very simple metric: playing Factorio with someone. Going through an entire run of Factorio is almost the best possible indication of how well someone deals with common technical problems. You can even tweak the playthrough based on the seniority of the position you’re hiring for to get a better sense of how they’ll function in that role.

Factorio?

Factorio is a game about automation. The best introduction is probably this trailer, but in essence, your job is to build an automated factory capable of launching a rocket into space.

You begin with nothing. You mine stone manually to craft a smelter that can smelt iron ore you mined into iron plates, which you then use to build a coal-driven automatic miner. You could grab the iron ore from the miner and put it in the smelter yourself, but it’s more efficient to use an inserter to do the inserting for you. Then you can use the iron this gives you to make another miner, which automates coal mining. Then you can use belts to take the coal and use an inserter to put it in the iron miner. Then you use the iron plates this tiny factory produces to make a third miner to start gathering copper, which then lets you craft copper wire, which lets you craft a circuit, which lets you build a water pump. Combined with a boiler and a steam engine, you can then build produce power, and use this power to run a research facility to unlock new technology, like assembly machines. Once you’ve unlocked assembly machines, you can use your circuits to craft an assembly machine that can craft copper wire for you, and insert this into an assembly machine that crafts circuits for you.

Eventually you unlock trains and robots and logistic systems which help you deal with the increasing logistic complexity the game demands, until you finally manage to launch a rocket into space.

Self-Direction

The beginning of the game starts with no goals and barely any direction. A senior developer should be able to explore the UI and figure out a goal, then establish a plan for accomplishing that goal. A junior developer should be able to perform a task that a senior developer has provided for them. An intern is expected to require quite a bit of mentoring, but a junior developer should be able to troubleshoot basic problems with their own code before requiring assistance from the senior developer. An intermediate developer should be able to operate independently once given a task, but is not expected to do any architecture design.

In more concrete terms, you might expect the following:

An Intern is generally expected to be able to fill in a pre-placed blueprint, and use belts to hook up their blueprint with something else, like an ore patch.
A Junior Developer should be able to build a production line by themselves, although it probably won’t be very optimal. They may need assistance from the senior developer on how to route the belts properly to all of the intermediate assembly machines.
An Intermediate Developer should be capable of designing a near-optimal production line (without beacons) once given direction, with minimal oversight.
The Senior Developer needs no direction, and is capable of determining what goals need to happen and designing a plan of action, then delegating these tasks to other coders.

Teamwork

A critical aspect of software development is the ability to work on a team. This means coordinating your efforts with other people, accomadating the needs of other people’s designs and cooperating with the team, instead of simply running off on your own and refusing to adjust your design to help integrate it with someone else’s work. This, naturally, arises all the time in Factorio, because base layout designs are limited by physical space. As a result, you need to carefully consider what other people are doing, and sometimes adjust your design to fit in size constraints or deal with someone else’s design that took more room than anticipated.

Anyone who simply runs off and starts doing things themselves or fixing problems without telling people is going to quickly earn the ire of their fellow players, for the exact same reasons cowboy programmers do. Luckily, Factorio includes a built-in equivelent to git blame, by showing you the last player that modified any entity. Thus, when people duct tape temporary solutions and don’t inform the team about the problem they were fixing, when their temporary solution finally blows up, people will find out. If people want to win they game, they’ll have to learn to cooperate well with their teammates.

Debugging

One of the most important skills for any programmer is their ability to debug problems. This is perhaps the most obvious parallel between Factorio and real software engineering. Something can go wrong very far away from the actual source of the problem. Being able to rapidly hone in on the real problem is a critical skill, and the thinking process is almost identical to tracing the cause of a crash in an actual program. If an assembly machine has stopped working, first you have to see if there are multiple outputs that got backed up. Then you have to check what ingredient it’s missing. Then you have to trace the ingredient back through your factory to find out where you’re making it, and repeat ad nauseum.

Factorio’s debugging gets fairly complicated quite quickly. As soon as you start working on oil processing you’ll be dealing with cracking, where you’re dealing with 3 different outputs and if any of them get backed up for any reason, the entire thing stops. There are cases where your entire factory can grind to a halt because you started researching something that doesn’t require yellow science, which stopped using up robot frames, which stopped using up electric engines, which stopped using lubricant, which stopped consuming heavy oil, which backed up and stopped oil production, which made you run out of petroleum, which broke plastic, which broke red circuits, which broke the rest of the factory. Seasoned players will anticipate scenarios like this and use circuits to construct self-balancing oil cracking to ensure the system is balanced and will only back up if petroleum backs up. A new player who is a good programmer, when presented with a factory that has collapsed, will usually be able to trace the issue back to the source, realize what’s happened, and promptly attempt to figure out a solution. On the other hand, if someone simply plops down a few storage tanks, unless they can provide a good reason (they are very confident we will never stop consuming lubricant in the future), then this is a red flag for how they approach problem solving in their programs.

Situations like these allow Factorio to closely mimic the complex interdependencies that programmers routinely deal with, and the complexity simply increases the more gameplay concepts are added. This closely follows the increased complexity that additional layers of abstraction introduce when attempting to debug a crash that could have potentially occured deep inside one of the frameworks you use.

Code Reviews

Often, initial designs need to be tweaked for performance or throughput. Good programmers will not only accept critique of their designs, but incorporate that feedback into their future work. If they disagree with a proposed change, they will provide a concrete reason for why they disagree so that the team can more accurately weigh the pros and cons of the proposed change.

Resisting feedback without providing good reasons is a well-known red flag, but what can also be problematic is a programmer who begrudgingly accepts proposed changes, but refuses to adjust future designs accordingly. They end up requiring constant reminders to adhere to some standard way of solving a problem while giving no reason for why they don’t seem to like the way the team is doing things. These can be ticking time-bombs in organizations, because when left unsupervised they can rapidly accumulate technical debt for other team members. This kind of problem is almost impossible to catch in a traditional interview, unless it’s an internship.

Code Style and Frameworks

Refusing to incorporate feedback is often just a slice of a much larger problem, where someone is unable to integrate properly into an existing framework being used. There are many ways to build a factory in Factorio, and each one requires standard methods of building pieces. Failing to adhere to standards can very quickly jam up an entire factory, often in subtle ways that aren’t necessarily obvious to a careless developer.

In the Main Belt design, a set of 4-8 chunk of belts, divided by 2 spaces to allow for underground belts, are placed in the center of the factory, and all production happens perpendicular to the belt. This design relies on several rules that can wreck havoc if not followed correctly. One, players must always use a splitter to pull items off of a belt, never redirecting the entire belt, otherwise using the empty space for a different belt of items means you’ll have permanently lost one entire belt of resources, even after upgrading belts. Two, all factories must be scalable in a direction perpendicular to the main belt. Failing to do this will rapidly result in either a massive waste of space, or a production line that cannot be scaled up because it’s surrounded by other production lines.

There are also different ways of building logistic networks. The simplest method is with passive provider chests, but another method uses a storage chest with a filter, which is used to solve the trashed item problem. Both of these methods require properly setting limiters in the right location. Passive provider chests generally are limited by chest space. Storage chests require hooking the inserter for the chest up to the logistics network and ensuring that less than N of an item exists before inserting it. Forgetting to perform these limiting steps is a massive waste of resources. Consistently forgetting to put limiters on outputs is a red flag for someone who is careless about performance in real-world applications.

In other cases, the team may be using some pre-designed blueprints, like a nuclear reactor design, or a bot factory. These can be extremely complex, but as long as people are willing to learn how to use them, they can be huge time-savers. Beware of candidates who don’t want to learn how to set up a new item in the bot factory simply because they can’t debug the complex logic that drives it, or ones that get frustrated learning how to use a bot factory despite the clear and obvious benefits.

Multithreading

Trains in Factorio are a direct analogue to multithreading: one train is one thread of execution, and each train intersection or train stop is a place in memory where two threads could potentially write at the same time. Train signals are locks, or mutexes. All bugs in train networks manifest in exactly the same way software race conditions do, because they’re literally physical race conditions. All of the tradeoffs apply here as well - if you make a lock too large, it slows down your throughput, because now the intersection is blocked for a longer period of time. Incorrectly signaled tracks routinely cause train deadlocks that are exactly the same as a software deadlock, because you end up with a circular lock dependency. The most common deadlock is when a train is too long and unexpectedly blocks a second intersection while waiting to enter one. This second intersection then prevents another train from leaving, preventing the first intersection from ever being unblocked.

The number of lanes of track in your network is equivilent to the number of cores available in your CPU. A single rail line is difficult to scale beyond a few threads because the entire system gets throughput limited very quickly, even with wait areas. The most common design is a two-lane design where each lane is a single direction, but this will eventually suffer from throughput issues when you need trains constantly being unloaded. Thus, large bases tend to have at least 4 lanes, with two outer lanes acting as bypasses to avoid the intersection whenever possible.

Missing signal problems in these systems can take a ridiculous amount of time to actually show up. A single missing signal in one rail network once caused a deadlock after functioning correctly for two weeks. This is remniscient of difficult to pin down race conditions in software that only occur once a month or so when under high contention.

Scaling

Just like in software, scaling up production in Factorio introduces new problems with initial designs, and often require complete redesigns that can pipe resources into factories as fast as possible, while taking advange of production modules and speed module beacons. Belt limits become problematic even at the fastest belt speed, forcing players to find ways to split designs up so that more belts can be put in later down the line, or split up their factories into modules.

Handling your logistics network itself becomes a logistics problem in the late game because of how problematic expansive bot networks are. You generally need to start segmenting the logistics network and either using trains to transport items between them, or build a requester chest/provider chest that propagates items across bounderies.

Managing trains in the late game necessitates switching to a pull architecture from a push architecture, because the push architecture can’t function in high throughput. This inevitably requires taking advantage of the Train Limits feature and learning how circuit networks can be used to encode basic logic, such that a station only requests a train when it is actually ready to completely fill the train with resources, instead of the common early game tactic of simply telling a bunch of trains to go to stations named “Iron Pickup”. This minimizes the number of trains you need while making sure all stops are served on the network.

Often times, limitations in the number of possible inputs to an assembly machine and inserter speed require redesigning factories around them, just like how high-speed computing requires being aware of subtle bottlenecks in how your CPU works. These bottlenecks are almost never a problem until you reach a certain scale, at which point they begin to dominate your efficiency.

Microservices and Plugin Architectures

Eventually, factories get so enormous they must abandon a simple main belt or spaghetti design and use a more scalable framework instead. To reach Megabase-scale, factories generally either use a train system or a module system, which corresponds roughly to microservices or a plugin-architecture.

A train-based megabase is sometimes referred to a “city-block” design, where trains surrounded factory blocks and control all input and output. Thus, each individual city-block is isolated from all other city-blocks, since all their input is “pure” in that it comes from the train network. This is almost identical to a micro-services architecture (over HTTP) or a multi-process design (using IPC), and has similar potential issues with input and output latency, because results cannot be continually provided, they must be emitted in “packets”, or trains along the network.

The plugin architecture seeks to maintain some semblence of a main-belt, but instead splits belts off through the factory and uses modular blocks that take standard inputs and standard outputs. Sometimes this can be achieved entirely through bots, but materials usually need to be belted over long distances. This closely resembles a plugin system for a monolithic application, and has similar tradeoffs.

These megabases mark the extreme upper end of a vanilla Factorio server. However, there are plenty of mods to make things much more complex.

Distributed Systems

Space Exploration is an overhaul of Factorio that adds an entire space segment of the game, and makes planets have limited resources, requiring players to colonize other planets and use rockets to transfer resources between planets. Because of the enormous latency involved with shipping materials between planets, coordinating these different bases winds up having similar problems to a globally distributed database system. Even the circuit network has to contend with latency, because an automatic request system loses track of items that have been launched but haven’t yet reached the target planet. Not accounting for this will result in a double-request for all the items you wanted, which is the exact same problem that distributed systems have when trying to ensure consistency.

Conclusion

Collectively, the software industry simply has no idea how to hire software developers. Factorio is probably the best technical interview we have right now, and that’s embarassing. It is also wildly impractical, taking over 20 hours in an initial multiplayer playthrough, or 8 hours if you have a lot of people and know what you’re doing. What’s the takeaway from this? I don’t know. We certainly can’t switch to using Factorio as an interviewing method - you might as well just give a candidate a take-home assignment.

At the very least, we can do better than whiteboard interviews.

Published on March 12, 2021 at 3:26pm

Why You Can't Use Prebuilt LLVM 10.0 with C++17

C++17 introduced an alignment argument to ::operator new(). It’s important to note that if you allocate something using aligned new, you absolutely must deallocate it using aligned delete, or the behavior is undefined. LLVM 10.x takes advantage of this alignment parameter, if the compiler supports it. That means if you are compiling on Windows with MSVC set to C++14, __cpp_aligned_new is not defined and the extra argument isn’t passed. Otherwise, if it’s compiled with MSVC set to C++17, __cpp_aligned_new is defined and the extra argument is passed.

There’s just one teeny tiny little problem - the check was in a header file.

Now, if this header file were completely internal and LLVM itself only ever called ::operator new() from inside it’s libraries, this wouldn’t be a problem. As you might have guessed, this was not the case. LLVM was calling allocate_buffer() from inside header files. The corresponding deallocate_buffer call was inside the same header file, but sadly, it was called from a destructor, and that destructor had been called from inside one of LLVM’s libraries, which meant it didn’t know about aligned allocations… Oops!

This means, if your program uses C++17, but LLVM was compiled with C++14, your program will happily pass LLVM aligned memory, which LLVM will then pass into the wrong delete operator, because it’s calling the unaligned delete function from C++14 instead of the aligned delete function from C++17. This results in strange and bizarre heap corruption errors because of the mismatched ::operator new and ::operator delete. Arguably, the real bug here is LLVM calling any allocation function from a header file in the first place, as this is just begging for ABI problems.

Of course, one might ask why a C++17 application would be linking against LLVM compiled with C++14. Well, you see, the prebuilt LLVM binaries were compiled against C++14… OOPS! Suddenly if you wanted to avoid compiling LLVM yourself, you couldn’t use C++17 anymore!

Luckily this has now been fixed after a bunch of people complained about the extremely unintuitive errors resulting from this mismatch. Unfortunately, as of this writing, LLVM still hasn’t provided a new release, which means you will still encounter this problem using the latest precompiled binaries of LLVM. Personally, I recompile LLVM using C++17 for my own projects, just to be safe, since LLVM does not gaurantee ABI compatibility in these situations.

Still, this is a particularly nasty case of an unintentional ABI break between C++ versions, which is easy to miss because most people assume C++ versions are backwards compatible. Be careful, and stop allocating memory in header files that might be deallocated inside your library!

Published on July 9, 2020 at 2:50am