Erik McClure

Pressure Based Anti-Spam for Discord Bots


Back when Discord was a wee little chat platform with no rate limiting whatsoever, its API had already been reverse-engineered by a bunch of bot developers who then went around spamming random servers with so many messages it would crash the client. My friends and I were determined to keep our Discord server public, so I went about creating the first anti-spam bot for Discord.

I am no longer maintaining this bot beyond simple bugfixes because I have better things to do with my time, and I’m tired of people trying to use it because “it’s the best anti-spam bot” even after I deprecated it. This post is an effort to educate all the other anti-spam bots on how to ascend beyond a simple “mute someone if they send more than N messages in M seconds” filter and hopefully make better anti-spam bots so I don’t have to.

Architecture

There are disagreements over exactly how an anti-spam bot should work, with a wide range of preferred behaviors that differ between servers, depending on the community, the size of the server, its rate of growth, and the administrators’ personal preferences. Many anti-spam solutions lock down the server by forcing new members to go through an airlock channel, but I consider this overkill and the result of poor anti-spam bots that are bad at actually stopping trolls or responding to raids.

My moderation architecture goes like this:

-- Channels --  
#Moderation Channel  
#Raid Containment  
#Silence Containment  
#Log Channel

-- Roles --  
@Member  
@Silence  
@New

When first-time setup is run on a server, the bot queries and stores all the permissions given to the @everyone role. It adds all these permissions to the @Member role, then goes through all existing users and adds the @Member role to them. It then removes all permissions from @everyone, but adds an override for #Raid Containment so that anyone without the @Member role can only speak in #Raid Containment.

Then, it creates the @Silence role by adding permission overrides to every single channel on the server that prevent sending messages in that channel, except for the #Silence Containment channel. This ensures that, no matter what new roles might be added in the future, @Silence will always prevent you from sending messages in any channel other than #Silence Containment. The admins of the server can configure the visibility of the containment channels. Most servers make #Raid Containment visible only to users without the @Member role, make #Silence Containment visible only to users with the @Silence role, and hide #Log Channel from everyone except admins.
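
To make that setup sequence concrete, here is a rough sketch in Go. Everything here, the DiscordAPI interface and all of its methods, is a hypothetical placeholder I made up for illustration; it is not the bot's real code or any actual Discord client library:

// Hypothetical API surface, invented purely to make the setup flow readable.
type DiscordAPI interface {
	EveryonePermissions(guildID string) (uint64, error)
	CreateRole(guildID, name string, perms uint64) (roleID string, err error)
	AddRoleToAllMembers(guildID, roleID string) error
	SetRolePermissions(guildID, roleID string, perms uint64) error
	AllowSendOverride(guildID, channelID, roleID string) error
	DenySendOverride(guildID, channelID, roleID string) error
	Channels(guildID string) ([]string, error)
}

// SetupServer mirrors the first-time setup described above: copy @everyone's
// permissions onto @Member, grant @Member to everyone, lock @everyone down to
// #Raid Containment, and deny @Silence everywhere except #Silence Containment.
func SetupServer(api DiscordAPI, guildID, raidChannel, silenceChannel string) error {
	perms, err := api.EveryonePermissions(guildID)
	if err != nil {
		return err
	}
	member, err := api.CreateRole(guildID, "Member", perms)
	if err != nil {
		return err
	}
	if err := api.AddRoleToAllMembers(guildID, member); err != nil {
		return err
	}
	// Strip @everyone so anyone without @Member can only speak in #Raid Containment.
	if err := api.SetRolePermissions(guildID, "@everyone", 0); err != nil {
		return err
	}
	if err := api.AllowSendOverride(guildID, raidChannel, "@everyone"); err != nil {
		return err
	}
	// @Silence: deny sending in every channel except #Silence Containment.
	silence, err := api.CreateRole(guildID, "Silence", 0)
	if err != nil {
		return err
	}
	channels, err := api.Channels(guildID)
	if err != nil {
		return err
	}
	for _, ch := range channels {
		if ch == silenceChannel {
			continue
		}
		if err := api.DenySendOverride(guildID, ch, silence); err != nil {
			return err
		}
	}
	return api.AllowSendOverride(guildID, silenceChannel, silence)
}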

This architecture was chosen to allow the bot to survive a mega-raid, so that even if 500+ bot accounts storm the server all at once, if the bot goes down or is rate-limited, they can’t do anything because they haven’t been given the @Member role, and so the server is protected by default. This is essentially an automated airlock that only engages if the bot detects a raid, instead of forcing all new members to go through the airlock. Some admins do not like this approach because they prefer to personally audit every new member, but this is not viable for larger servers.

The @Member role being added can also short-circuit Discord’s own verification rules, which is an unfortunate consequence, but in practice it usually isn’t a problem. However, to help compensate for this, the bot can also be configured to add a temporary @New role to newer users that expires after a configurable amount of time. What this role actually does is up to the administrators - usually it simply disables sharing images or embeds.

Operation

The bot has two distinct modes of operation. Under normal operation, when a new member joins the server, they are automatically given @Member (and @New if it’s been enabled) and are immediately allowed to speak. If they trigger the anti-spam, or a moderator uses the !silence command, they will have the @Silence role added, any messages sent in the last 5 seconds (by default) will be deleted, and they’ll be restricted to speaking in #Silence Containment. By default, silences that happen automatically are never rescinded until a moderator manually unsilences someone. However, some server admins are lazy, so an automatic expiration can be added. A manual silence can also be configured to expire after a set period of time. If a user triggers the anti-spam filter again while they are in the containment channel, then the bot will automatically ban them.

It’s important to note that silenced users have both the @Member and the @Silence role. The reason is that many servers have opt-in channels that are only accessible if you add a role to yourself. Simply removing @Member would not suffice to keep them from talking in these channels, but @Silence can override these permissions and ensure they can’t speak in any channel, without having to mess with anyone’s assigned roles.
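
The silence operation itself is just a role add followed by a sweep of the user's recent messages. A minimal sketch, again with made-up helper names rather than the bot's real API:

import "time"

// Hypothetical moderation helpers, for illustration only.
type Moderation interface {
	AddRole(guildID, userID, role string) error
	RecentMessageIDs(channelID, userID string, since time.Time) ([]string, error)
	DeleteMessage(channelID, msgID string) error
}

// Silence adds the @Silence role and deletes the user's messages from the
// last few seconds (5 by default), so the tail end of the spam disappears.
func Silence(m Moderation, guildID, channelID, userID string, window time.Duration) error {
	if err := m.AddRole(guildID, userID, "Silence"); err != nil {
		return err
	}
	ids, err := m.RecentMessageIDs(channelID, userID, time.Now().Add(-window))
	if err != nil {
		return err
	}
	for _, id := range ids {
		if err := m.DeleteMessage(channelID, id); err != nil {
			return err
		}
	}
	return nil
}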

The bot detects a raid if N joins happen within M seconds; an active server will usually configure this to something like 3 joins in a 90 second interval. When this happens, the server enters raid mode, which automatically removes @Member from all the users that just triggered the raid alert. It sends a message pinging the moderators in #Moderation Channel and stops adding @Member to anyone who joins while raid mode is active. It also raises the server verification level, which actually does have an effect on the newly joined members, because they haven’t had any roles assigned to them yet. Raid mode automatically ends after M*2 seconds, which would be 180 in this example. In the meantime, moderators can either cancel the raid alert, which automatically adds @Member to everyone who was detected as part of it, use a command to ban everyone who joined during the alert, or individually add @Member to users they have vetted.

This method of dealing with raids ensures that the bot cannot be overwhelmed by the sheer number of joins. If it crashes or is taken down, the server stays in raid mode until the bot can recover, and the potential raiders will not be allowed in.
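
The join-rate check itself is simple enough to sketch as a sliding window. This is my own illustration of the "N joins within M seconds" rule, not the bot's actual code:

import "time"

// RaidDetector implements "N joins within M seconds" as a sliding window.
type RaidDetector struct {
	Window time.Duration // M, e.g. 90 * time.Second
	Limit  int           // N, e.g. 3 joins
	joins  []time.Time
}

// OnJoin records a join and reports whether it tripped the raid threshold.
func (r *RaidDetector) OnJoin(now time.Time) bool {
	cutoff := now.Add(-r.Window)
	kept := r.joins[:0]
	for _, t := range r.joins {
		if t.After(cutoff) { // drop joins that fell out of the window
			kept = append(kept, t)
		}
	}
	r.joins = append(kept, now)
	return len(r.joins) >= r.Limit
}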

Pressure System

All the anti-spam logic in the bot is done by triggering an automatic silence, which gives someone the @Silence role. The bot determines when to silence someone by analyzing all the messages being sent using a pressure system. In essence, the pressure system is analogous to the gravity function used for calculating the “hotness” of a given post on a site like Hacker News.

Each message a user sends is assigned a “pressure score”. This is calculated from the message length, contents, whether it matches a filter, how many newlines it has, etc. This “pressure” is designed to simulate how disruptive the message is to the current chat. A wall of text is extremely disruptive, whereas saying “hi” probably isn’t. Attaching 3 images to your message is very disruptive, but embedding a single link isn’t that bad. Sending a blank message with 2000 newlines is incredibly disruptive, whereas sending a message with two paragraphs is probably fine. In addition, all messages are assigned a base pressure, the minimum amount of pressure that any message will have, even if it’s only a single letter.

My bot looks at the following values:

  • Max Pressure: The default maximum pressure is 60.
  • Base Pressure: All messages generate 10 pressure by default, so at most 6 messages can be sent at once.
  • Embed Pressure: Each image, link, or attachment in a message generates 8.3 additional pressure, which limits users to 5 images or links in a single message.
  • Length Pressure: Each individual character in a message generates 0.00625 additional pressure, which limits you to sending 2 maximum length messages at once.
  • Line Pressure: Each newline in a message generates 0.714 additional pressure, which limits you to 70 newlines in a single message.
  • Ping Pressure: Every single unique ping in a message generates 2.5 additional pressure, which limits users to 20 pings in a single message.
  • Repeat Pressure: If the message being sent is the exact same as the previous message sent by this user (copy+pasted), it generates 10 additional pressure, effectively doubling the base pressure cost for the message. This means a user copy and pasting the same message more than twice in rapid succession will be silenced.
  • Filter Pressure: Any filter set by the admins can add an arbitrary amount of additional pressure if a message triggers that filter regex. The amount is user-configurable.

And here is a simplified version of my implementation. Note that I add the pressure piecewise so that I can alert the moderators and tell them exactly which pressure trigger broke the limit, which helps them tell whether someone just posted too many pictures at once or was spamming the same message over and over:

// Each component is added separately; AddPressure returns true once the
// user's total pressure breaks the limit (at which point they get silenced).
if w.AddPressure(info, m, track, info.Config.Spam.BasePressure) {
	return true
}
// Attachments and embedded links are charged at the same rate.
if w.AddPressure(info, m, track, info.Config.Spam.EmbedPressure*float32(len(m.Attachments))) {
	return true
}
if w.AddPressure(info, m, track, info.Config.Spam.EmbedPressure*float32(len(m.Embeds))) {
	return true
}
// Message length, newlines, and pings each contribute their share.
if w.AddPressure(info, m, track, info.Config.Spam.LengthPressure*float32(len(m.Content))) {
	return true
}
if w.AddPressure(info, m, track, info.Config.Spam.LinePressure*float32(strings.Count(m.Content, "\n"))) {
	return true
}
if w.AddPressure(info, m, track, info.Config.Spam.PingPressure*float32(len(m.Mentions))) {
	return true
}
// An exact copy+paste of the user's previous message adds repeat pressure.
if len(m.Content) > 0 && m.Content == track.lastcache {
	if w.AddPressure(info, m, track, info.Config.Spam.RepeatPressure) {
		return true
	}
}
Once we’ve calculated how disruptive a given message is, we can add it to the user’s total pressure score, which is a measure of how disruptive a user is currently being. However, we need to recognize that sending a wall of text every 10 minutes probably isn’t an issue, but sending 10 short messages saying “ROFLCOPTER” in 10 seconds is definitely spamming. So, before we add the message pressure to the user’s pressure, we check how long it’s been since that user last sent a message, and decrease their pressure accordingly. If it’s only been a second, the pressure shouldn’t decrease very much, but if it’s been 3 minutes or so, any pressure they used to have should probably be gone by now. Most heat algorithms do this non-linearly, but my experiments showed that a linear drop-off tends to produce better results (and is simpler to implement).

My bot implements this by simply decreasing the amount of pressure a user has by a set amount for every N seconds. So if it’s been 2 seconds, they will lose 4 pressure, until it reaches zero. Here is the implementation for my bot, which is done before any pressure is added:

timestamp := bot.GetTimestamp(m)
track := w.TrackUser(author, timestamp)
last := track.lastmessage
track.lastmessage = timestamp.Unix()*1000 + int64(timestamp.Nanosecond()/1000000)
if track.lastmessage < last { // This can happen because discord has a bad habit of re-sending timestamps if anything so much as touches a message
	track.lastmessage = last
	return false // An invalid timestamp is never spam
}
interval := track.lastmessage - last

track.pressure -= info.Config.Spam.BasePressure * (float32(interval) / (info.Config.Spam.PressureDecay * 1000.0))
if track.pressure < 0 {
	track.pressure = 0
}
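
The AddPressure function isn't shown above, but given the description, a self-contained model of the whole check presumably looks something like this. This is a sketch of the behavior, not the bot's actual code:

import "time"

type SpamConfig struct {
	MaxPressure   float32 // default 60
	BasePressure  float32 // default 10
	PressureDecay float32 // seconds for one BasePressure worth of pressure to drain
}

type UserPressure struct {
	Pressure float32
	Last     time.Time
}

// Add applies linear decay for the time since the user's last message,
// accumulates the new pressure, and reports whether they should be silenced.
func (u *UserPressure) Add(cfg SpamConfig, now time.Time, pressure float32) bool {
	elapsed := float32(now.Sub(u.Last).Seconds())
	u.Pressure -= cfg.BasePressure * (elapsed / cfg.PressureDecay)
	if u.Pressure < 0 {
		u.Pressure = 0
	}
	u.Last = now
	u.Pressure += pressure
	return u.Pressure > cfg.MaxPressure
}

One consequence falls straight out of this model: with the defaults above, a user can burst 60/10 = 6 plain messages instantly, and because exactly one BasePressure drains every PressureDecay seconds, they can sustain roughly one plain message per PressureDecay seconds after that.
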
In essence, this entire system is a superset of the simpler “N messages in M seconds” filter. If you only use base pressure and maximum pressure, then these determine the absolute upper limit of how many messages can be sent in a given time period, regardless of their content. You then tweak the rest of the pressure values to more quickly catch obvious instances of spamming. The maximum pressure can also be altered on a per-channel basis, which allows meme channels to spam messages without triggering the anti-spam. Here’s what a user’s pressure value looks like over time:

Pressure Graph

Because this whole pressure system is integrated into the Regex filtering module, servers can essentially create their own bad-word filters by assigning a huge pressure value to a certain match, which instantly creates a silence. These regexes can be used to look for other things that the bot doesn’t currently track, like all-caps messages, or for links to specific websites. The pressure system allows integrating all of these checks in a unified way that represents the total “disruptiveness” of an individual user’s behavior.

Worst Case Scenario

A special mention should go to the uncommon but possible worst-case scenario for spamming. This happens when a dedicated attacker really, really wants to fuck up your server for some reason. They can do this by creating 1000+ fake accounts with randomized names and waiting until they’ve all been on Discord long enough that the “new to discord” message doesn’t show up. Then, they have the bots join the server extremely slowly, at the pace of maybe 1 per hour. Then they wait another week or so before unleashing a spam attack in which each account sends exactly 1 message, followed by a different account sending another message.

If they do this fast enough, you can detect it simply by the sheer volume of messages being sent in the channel, and automatically put the channel in slow mode. However, in principle, it is completely impossible to automatically deal with this attack. Banning everyone who happens to be sending messages during the spam event will ban innocent bystanders, and if the individual accounts aren’t sending messages that fast (maybe one per second), this is indistinguishable from a normal rapid-fire conversation.

My bot has an emergency method for dealing with this where it tracks when a given user sent their first message in the server. You can then use a command to ban everyone who sent their first message in the past N seconds. This is very effective when it works, but it is trivial for spammers to get past once they realize what’s going on - all they need to do is have their fake accounts say “hi” once while they’re trickling in. Of course, a bunch of accounts all saying “hi” and then nothing else may raise enough suspicion to catch the attack before it happens.
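
A sketch of that emergency mechanism, with illustrative names rather than the bot's actual command implementation:

import "time"

// FirstSeen remembers when each user sent their first message in the server.
type FirstSeen struct {
	first map[string]time.Time // user ID -> time of first message
}

func (f *FirstSeen) OnMessage(userID string, now time.Time) {
	if f.first == nil {
		f.first = make(map[string]time.Time)
	}
	if _, ok := f.first[userID]; !ok {
		f.first[userID] = now
	}
}

// NewSpeakers returns every user whose first message landed within the last
// `window`, i.e. the candidates for the mass-ban command.
func (f *FirstSeen) NewSpeakers(now time.Time, window time.Duration) []string {
	var ids []string
	for id, t := range f.first {
		if now.Sub(t) <= window {
			ids = append(ids, id)
		}
	}
	return ids
}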

Conclusion

Hopefully this article demonstrates how to make a reliable, extensible anti-spam bot that can deal with pretty much any imaginable spam attack. The code for my deprecated spam bot is available here for reference, but the code is terrible and most of it was a bad idea. You cannot add the bot itself to a server anymore, and the support channel is no longer accessible.

My hope is that someone can use these guidelines to build a much more effective anti-spam bot, and release me from my torment. I hate web development and I’d rather spend my time building a native webassembly compiler instead of worrying about weird intermittent cloudflare errors that temporarily break everything, or badly designed lock systems that deadlock under rare edge cases. Godspeed, bot developers.


Name Shadowing Should Be An Operator


I recently discovered that in Rust, this is a relatively common operation:

let foo = String::from("foo");
// stuff that needs ownership
let foo = &foo;  
Or this:
let mut vec = Vec::new();
vec.push("a");
vec.push("b");
let vec = vec; /* vec is immutable now */  
This is a particularly permissive form of name-shadowing, which allows you to re-declare a variable in an inner scope that shadows the name of a variable in an outer scope, making the outer variable inaccessible. Almost every single programming language in common use allows you to do this in some form or another. Rust goes a step further and lets you re-declare a variable inside the same scope as another variable.

This, to me, is pretty terrifying, because name-shadowing itself is often a dangerous operation. 90% of the time, name-shadowing causes problems with either temporary or index variables, such as i. These are all cases where you almost never want to name-shadow anything, and doing so is probably a mistake.

for(int i = 0; i < width; ++i)
{
  for(int j = 0; j < height; ++j)
  {
    //
    // lots of unrelated code
    //
    
    float things[4] = {1,2,3,4};
    for(int i = 0; i < 4; ++i)
        box[i][j] += things[i]; // oops!
    //      ^ that is the wrong variable!
  }
}
These errors almost always crop up in complex for loop scenarios, which can obviously be avoided with iterators, but this isn’t always an option, especially in a lower-level systems programming language like Rust. Even newer languages like Go make it alarmingly easy to name-shadow, although they make it a lot harder to accidentally shoot yourself because unused variables are a compiler error:
foo, err := doSomething()
if err != nil {
	err := bar(foo) // error: err is not used
}
println(err)
Unfortunately, Rust doesn’t have this error, only an unused-variables lint warning. If you want, you can add #![deny(clippy::shadow_unrelated)] to make unrelated name-shadowing a hard error. However, a lot of Rust idioms depend on the ability to name-shadow, because Rust does a lot of type-reassignment, where the contents of a variable don’t change, but in a particular scope, the known type of the variable has changed to something more specific.
let foo = Some(5);
match foo {
    Some(foo) => println!("{}", foo),
    None => {},
}
Normally, in C++, I avoid name-shadowing at all costs, but this is partially because C++ shadowing rules are unpredictable or counter-intuitive. Rust, however, seems to have legitimate use cases for name-shadowing, which would be reasonable if name-shadowing could be made more explicit.

I doubt Rust would want to change its syntax, but if I were to work on a new language, I would define an explicit name-shadowing operator, either as a keyword or as an operator. This operator would redefine a variable as a new type and throw a compiler error if there is no existing variable to redefine. Attempting to redefine a variable without this operator would also throw a compiler error, so you’d have something like:

let foo = String::from("foo");
// stuff that needs ownership
foo := &foo;
Or alternatively:
let mut vec = Vec::new();
vec.push("a");
vec.push("b");
shadow vec = vec; /* vec is immutable now */
While the := operator is cleaner, the shadow keyword would be easier to embed in other constructions:
let foo = Some(5);
match foo {
    Some(shadow foo) => println!("{}", foo),
    None => {},
}
Which method is best would depend on the idiomatic constructions in the language’s syntax, but by making name-shadowing an explicit action rather than an implicit one, you get the benefits of name-shadowing while eliminating most of the dangerous situations it can create.

Unfortunately, most modern language design seems hostile to any feature that even slightly inconveniences a developer for the sake of code safety and reliability. Perhaps a new language in the future will take these lessons to heart, but in the meantime, people will continue complaining about unstable software, at least until we put the “engineer” back in “software engineering”.


A Rant On Terra


Metaprogramming, or the ability to inspect, modify and generate code at compile-time (as opposed to reflection, which is runtime introspection of code), has slowly been gaining momentum. Programmers are finally admitting that, after accidentally inventing Turing-complete template systems, maybe we should just have proper first-class support for generating code. Rust has macros, Zig has built-in compile time expressions, Nim lets you rewrite the AST however you please, and dependent types have been cropping up all over the place. However, with great power comes great responsibility... or rather, undecidable type systems, whose undefined behavior may involve summoning eldritch abominations from the Black Abyss of Rěgne Ūt.

One place where metaprogramming is particularly useful is low-level, high-performance code, which is what Terra was created for. The idea behind Terra is that, instead of crafting ancient runes inscribed with infinitely nested variadic templates, you just replace the whole thing with an actual Turing-complete language, like, say, Lua (technically including LuaJIT extensions for FFI). This all sounds nice, and no longer requires a circle of salt to ward off demonic syntax, which Terra is quick to point out. They espouse the magical wonders of replacing your metaprogramming system with an actual scripting language:

In Terra, we just gave in to the trend of making the meta-language of C/C++ more powerful and replaced it with a real programming language, Lua.

The combination of a low-level language meta-programmed by a high-level scripting language allows many behaviors that are not possible in other systems. Unlike C/C++, Terra code can be JIT-compiled and run interleaved with Lua evaluation, making it easy to write software libraries that depend on runtime code generation.

Features of other languages such as conditional compilation and templating simply fall out of the combination of using Lua to meta-program Terra

Terra even claims you can implement Java-like OOP inheritance models as libraries and drop them into your program. It may also cure cancer (the instructions were unclear).

As shown in the templating example, Terra allows you to define methods on struct types but does not provide any built-in mechanism for inheritance or polymorphism. Instead, normal class systems can be written as libraries. More information is available in our PLDI Paper.

The file lib/javalike.t has one possible implementation of a Java-like class system, while the file lib/golike.t is more similar to Google’s Go language.

I am here to warn you, traveler, that Terra sits on a throne of lies. I was foolish. I was taken in by their audacious claims and fake jewels. It is only when I finally sat down to dine with them that I realized I was surrounded by nothing but cheap plastic and slightly burnt toast.

The Bracket Syntax Problem

Terra exists as a syntax extension to Lua. This means it adds additional keywords on top of Lua’s existing grammar. Most languages, when extending a syntax, would go to great lengths to ensure the new grammar does not create any ambiguities or otherwise interfere with the original syntax, treating it like a delicate flower that mustn’t be disturbed, lest it lose a single petal.

Terra takes the flower, gently places it on the ground, and then stomps on it, repeatedly, until the flower is nothing but a pile of rubbish, as dead as the dirt it grew from. Then it sets the remains of the flower on fire, collects the ashes that once knew beauty, drives to a nearby cliffside, and throws them into the uncaring ocean. It probably took a piss too, but I can’t prove that.

To understand why, one must understand what the escape operator is. It allows you to splice an abstract AST generated from a Lua expression directly into Terra code. Here is an example from Terra’s website:

function get5()
  return 5
end
terra foobar()
  return [ get5() + 1 ]
end
foobar:printpretty()
> output:
> foobar0 = terra() : {int32}
> 	return 6
> end
But, wait, that means it’s… the same as the array indexing operator? You don’t mean you just put it inside like–

local rest = {symbol(int),symbol(int)}

terra doit(first : int, [rest])
  return first + [rest[1]] + [rest[2]]
end

What.

WHAT?!

You were supposed to banish the syntax demons, not join them! This abomination is an insult to the Nine Kingdoms of Asgard! It is the very foundation that Satan himself would use to unleash Evil upon the world. Behold, mortals, for I come as the harbinger of despair:

function idx(x) return `x end
function gen(a, b) return `array(a, b) end

terra test()
  -- Intended to evaluate to array(1, 2)[0]
  return [gen(1, 2)][idx(0)]
end

For those of you joining us (probably because you heard a blood-curdling scream from down the hall), this syntax is exactly as ambiguous as you might think. Is it two splice statements put next to each other, or is a splice statement with an array index? You no longer know if a splice operator is supposed to index the array or act as a splice operator, as mentioned in this issue. Terra “resolves this” by just assuming that any two bracketed expressions put next to each other are always an array indexing operation, which is a lot like fixing your server overheating issue by running the fire suppression system all day. However, because this is Lua, whose syntax is very much like a delicate flower that cannot be disturbed, a much worse ambiguity comes up when we try to fix this.

function idx(x) return `x end
function gen(a, b) return `array(a, b) end

terra test()
  -- This is required to make it evaluate to array(1,2)[0]
  -- return [gen(1, 2)][ [idx(0)] ]
  -- This doesn't work:
  return [gen(1, 2)][[idx(0)]]
  -- This is equivalent to:
  -- return [gen(1, 2)] "idx(0)"
end

We want to use a spliced Lua expression as the array index, but if we don’t use any spaces, it turns into a string, because [[string]] is the Lua syntax for an unescaped string! Now, those of you who still possess functioning brains may believe that this would always result in a syntax error, as we have now placed a string next to a variable. Not so! Lua, in its infinite wisdom, converts anything of the form symbol"string" or symbol[[string]] into a function call with the string as the only parameter. That means that, in certain circumstances, we literally attempt to call our variable as a function with our expression as a string:

local lookups = {x = 0, y = 1, z = 2, w = 3 };
  vec.metamethods.__entrymissing = macro(function(entryname, expr)
    if lookups[entryname] then
      -- This doesn't work
      return `expr.v[[lookups[entryname]]]
      -- This is equivalent to
      -- return `expr.v "lookups[entryname]"
      -- But it doesn't result in a syntax error, because it's equivalent to:
      -- return `expr.v("lookups[entryname]")
    else
      error "That is not a valid field."
    end
  end)

As a result, you get a type error, not a syntax error, and a very bizarre one too, because it’s going to complain that v isn’t a function. This is like trying to bake pancakes for breakfast and accidentally going scuba diving instead. It’s not a sequence of events that should ever be related in any universe that obeys causality.

It should be noted that, after a friend of mine heard my screams of agony, an issue was raised to change the syntax to a summoning ritual that involves less self-mutilation. Unfortunately, this is a breaking change, and will probably require an exorcism.

The Documentation Is Wrong

Terra’s documentation is so wrong that it somehow manages to be wrong in both directions. That is, some of the documentation is out-of-date, while some of it refers to concepts that never made it into master. I can only assume that a time-traveling gremlin was hired to write the documentation, who promptly got lost amidst the diverging timelines. It is a quantum document, both right and wrong at the same time, yet somehow always useless, a puzzle beyond the grasp of modern physics.

  • The first thing talked about in the API Reference is a List object. It does not actually exist. A primitive incarnation of it does exist, but it only implements map() and insertall(). Almost the entire section is completely wrong for the 1.0.0-beta1 release. The actual List object being described sits alone and forgotten in the develop branch, dust already beginning to collect on its API calls, despite those API calls being the ones in the documentation… somehow.
  • :printpretty() is a function that prints out a pretty string representation of a given piece of Terra code, by parsing the AST representation. On its face, it does do exactly what is advertised: it prints a string. However, one might assume that it returns the string, or otherwise allows you to do something with it. This doesn’t happen. It literally calls the print() function, throwing the string out the window and straight into the stdout buffer without a care in the world. If you want the actual string, you must call either layoutstring() (for types) or prettystring() (for quotes). Neither function is documented, anywhere.
  • Macros can only be called from inside Terra code. Unless you give the constructor two parameters, where the second parameter is a function called from inside a Lua context. This behavior is not mentioned in any documentation, anywhere, which makes it even more confusing when someone defines a macro as macro(myfunction, myfunction) and then calls it from a Lua context, which, according to the documentation, should be impossible.
  • Struct fields are not specified by their name, but rather just held in a numbered list of {name, type} pairs. This is documented, but a consequence of this system is not: Struct field names do not have to be unique. They can all be the same thing. Terra doesn’t actually care. You can’t actually be sure that any given field name lookup will result in, y’know, one field. Nothing mentions this.
  • The documentation for saveobj is a special kind of infuriating, because everything is technically correct, yet it does not give you any examples and instead simply lists a function with 2 arguments and 4 interwoven optional arguments. In reality it’s absolutely trivial to use because you can ignore almost all the parameters. Just write terralib.saveobj("blah", {main = main}) and you’re done. But there isn’t a single example of this anywhere on the entire website. Only a paragraph and two sentences explaining in the briefest way possible how to use the function, followed by a highly technical example of how to initialize a custom target parameter, which doesn’t actually compile because it has errant semicolons. This is literally the most important function in the entire language, because it’s what actually compiles an executable!
  • The defer keyword is critical to being able to do proper error cleanup, because it functions similar to Go’s defer by performing a function call at the end of a lexical scope. It is not documented, anywhere, or even mentioned at all on the website. How Terra manages to implement new functionality it forgets to document while, at the same time, documenting functionality that doesn’t exist yet is a 4-dimensional puzzle fit for an extra-dimensional hyperintelligent race of aliens particularly fond of BDSM.
  • You’d think that compiling Terra on Linux would be a lot simpler, but you’d be wrong. Not only are the makefiles unreliable, but cmake itself doesn’t seem to work with LLVM 7 unless you pass in a very specific set of flags, none of which are documented, because compiling via cmake isn’t documented at all, and this is the only way to compile with LLVM 7 or above on the latest Ubuntu release!

Perhaps there are more tragedies hidden inside this baleful document, but I cannot know, as I have yet to unearth the true depths of the madness lurking within. I am, at most, on the third or fourth circle of hell.

Terra Doesn’t Actually Work On Windows

Saying that Terra supports Windows is a statement fraught with danger. It is a statement so full of holes that an entire screen door could try to sell you car insurance and it’d still be a safer bet than running Terra on Windows. Attempting to use Terra on Windows will work if you have Visual Studio 2015 installed. It might work if you have Visual Studio 2013 installed. No other scenarios are supported, especially not ones that involve being productive. Actually compiling Terra on Windows is a hellish endeavor comparable to climbing Mount Everest in a bathing suit, which requires either having Visual Studio 2015 installed to the default location, or manually modifying a Makefile with the exact absolute paths of all the relevant dependencies. At least up until last week, when I submitted a pull request to minimize the amount of mountain climbing required.

The problem Terra runs into is that it tries to use a registry value to find the location of Visual Studio and then work out where link.exe is from there, then finds the include directories for the C runtime. This hasn’t worked since Visual Studio 2017 and also requires custom handling for each version because compiling an iteration of Visual Studio apparently involves throwing the directory structure into the air, watching it land on the floor in a disorganized mess, and drawing lines between vaguely related concepts. Good for divining the true nature of the C library, bad for building directory structures. Unfortunately, should you somehow manage to compile Terra, it will abruptly stop working the moment you try to call printf, claiming that printf does not actually exist, even after importing stdio.h.

Many Terra tests assume that printf actually resolves to a concrete symbol. This is not true and hasn’t been true since Visual Studio 2015, which turned several stdio.h functions into inline-only implementations. In general, the C standard library is under no obligation to produce an actual concrete symbol for any function - or to make sense to a mere mortal, for that matter. In fact, it might be more productive to assume that the C standard was wrought from the unholy, broiling chaos of the void by Cthulhu himself, who saw fit to punish any being foolish enough to make reasonable assumptions about how C works.

Unfortunately, importing stdio.h does not fix this problem, for two reasons. One, Terra did not understand inline functions on Windows. They were ephemeral wisps, vanishing like a mote of dust on the wind the moment a C module was optimized. A pull request fixed this, but it can’t fix the fact that the Windows SDK was wrought from the innocent blood of a thousand vivisected COMDAT objects. Microsoft’s version of stdio.h can only be described as an extra-dimensional object, a meta-stable fragment of a past universe that can only be seen in brief slivers, never all at once.

Luckily for the Terra project, I am the demonic presence they need, for I was once a Microsoftie. Long ago, I walked the halls of the Operating Systems Group and helped craft black magic to sate the monster’s unending hunger. I saw True Evil blossom in those dark rooms, like having only three flavors of sparkling water and a pasta station only open on Tuesdays.

I know the words of Black Speech that must be spoken to reveal the true nature of Windows. I know how to bend the rules of our prison, to craft a mighty workspace from the bowels within. After fixing the cmake implementation to function correctly on Windows, I intend to perform the unholy incantations required to invoke the almighty powers of COM, so that it may find on which fifth-dimensional hyperplane Visual Studio exists. Only then can I disassociate myself from the mortal plane for long enough to tackle the stdio.h problem. You see, children, programming for Windows is easy! All you have to do is s͏̷E͏l͏̢҉l̷ ̸̕͡Y͏o҉u͝R̨͘ ̶͝sơ̷͟Ul̴

For those of you who actually wish to try Terra, but don’t want to wait for me to fix everything in a new release, you can embed the following code at the top of your root Terra script:

if os.getenv("VCINSTALLDIR") ~= nil then
  terralib.vshome = os.getenv("VCToolsInstallDir")
  if not terralib.vshome then
    terralib.vshome = os.getenv("VCINSTALLDIR")
    terralib.vclinker = terralib.vshome..[[BIN\x86_amd64\link.exe]]
  else
    terralib.vclinker = ([[%sbin\Host%s\%s\link.exe]]):format(terralib.vshome, os.getenv("VSCMD_ARG_HOST_ARCH"), os.getenv("VSCMD_ARG_TGT_ARCH"))
  end
  terralib.includepath = os.getenv("INCLUDE")

  function terralib.getvclinker()
    local vclib = os.getenv("LIB")
    local vcpath = terralib.vcpath or os.getenv("Path")
    vclib,vcpath = "LIB="..vclib,"Path="..vcpath
    return terralib.vclinker,vclib,vcpath
  end
end

Yes, we are literally overwriting parts of the compiler itself, at runtime, from our script. Welcome to Lua! Enjoy your stay, and don’t let the fact that any script you run could completely rewrite the compiler keep you up at night!

The Existential Horror of Terra Symbols

Symbols are one of the most slippery concepts introduced in Terra, despite their relative simplicity. When encountering a Terra Symbol, one usually finds it in a function that looks like this:

TkImpl.generate = function(skip, finish) return quote
    if [TkImpl.selfsym].count == 0 then goto [finish] end 
    [TkImpl.selfsym].count = [TkImpl.selfsym].count - 1
    [stype.generate(skip, finish)]
end end

Where selfsym is a symbol that was set elsewhere.

“Aha!” says our observant student, “a reference to a variable from an outside context!” This construct does let you access a variable from another area of the same function, and using it to accomplish that will generally work as you expect, but what it’s actually doing is much worse... or rather, much more subtle. You see, grasshopper, a symbol is not a reference to a variable node in the AST, it is a reference to an identifier.

local sym = symbol(int)
local inc = quote [sym] = [sym] + 1 end

terra foo()
  var [sym] = 0
  inc
  inc
  return [sym]
end

terra bar()
  var [sym] = 0
  inc
  inc
  inc
  return [sym]
end

Yes, that is valid Terra, and yes, the people who built this language did this on purpose. Why any human being still capable of love would ever design such a catastrophe is simply beyond me. Each symbol literally represents not a reference to a variable, but a unique variable name that will refer to any variable that has been initialized in the current Terra scope with that particular identifier. You aren’t passing around variable references, you’re passing around variable names.

These aren’t just symbols, they’re typed preprocessor macros. They are literally C preprocessor macros, capable of causing just as much woe and suffering as one, except that they are typed and they can’t redefine existing terms. This is, admittedly, slightly better than a normal C macro. However, seeing as there have been entire books written about humanity’s collective hatred of C macros, this is equivalent to being a slightly more usable programming language than Brainfuck. This is such a low bar it’s probably buried somewhere in the Mariana Trench.

Terra is C but the Preprocessor is Lua

You realize now, the monstrosity we have unleashed upon the world? The sin Terra has committed now lies naked before us.

Terra is C if you replaced the preprocessor with Lua.

Remember how Terra says you can implement Java-like and Go-like class systems? You can’t. Or rather, you will end up with a pathetic imitation, a facsimile of a real class system, stripped down to the bone and bereft of any useful mechanisms. It is nothing more than an implementation of vtables, just like you would make in C. Because Terra is C. It’s metaprogrammable C.

There can be no constructors, or destructors, or automatic initialization, or any sort of borrow checking analysis, because Terra has no scoping mechanisms. The only thing it provides is defer, which only operates inside Lua lexical blocks (do and end)… sometimes, if you get lucky. The exact behavior is a bit confusing, and of course can only be divined by random experimentation because it isn’t documented anywhere! Terra’s only saving grace, the singular keyword that allows you to attempt to build some sort of pretend object system, isn’t actually mentioned anywhere.

Of course, Terra’s metaprogramming is Turing-complete, and it is technically possible to implement some of these mechanisms, but only if you either wrap absolutely every single variable declaration in a function, or you introspect the AST and annotate every single variable with initialization statuses and then run a metaprogram over it to figure out when constructors or destructors or assignment operators need to be called. Except, this might not work, because the (undocumented, of course) __update metamethod that is supposed to trigger when you assign something to a variable has a bug where it’s not always called in all situations. This turns catching assignments and finding the l-value or r-value status from a mind-bogglingly difficult, herculean task into a near-impossible trial of cosmic proportions that probably requires the help of at least two Avengers.

There Is No Type System

If Terra was actually trying to build a metaprogramming equivalent to templates, it would have an actual type system. These languages already exist - Idris, Omega, F*, Ada, Sage, etc. but none of them are interested in using their dependent type systems to actually metaprogram low-level code (although F* can produce it). The problem is that building a recursively metaprogrammable type system requires building a proof assistant, and everyone is so proud of the fact they built a proof assistant they forget that dependent type systems can do other things too, like build really fast memcpy implementations.

Terra, on the other hand, provides only the briefest glimpse of a type system. Terra functions enjoy what is essentially a slightly more complex C type system. However, the higher-level Lua context is, well, Lua, which has five basic types: Tables, Functions, Strings, Booleans and Numbers (it also has Thread, Nil, Userdata and CData for certain edge cases). That’s it. Also, it’s dynamic, not static, so everything is a syntax or a runtime error, because it’s a scripting language. This means all your metaprogramming is sprinkled with type-verification calls like :istype() or :isstruct(), except the top came off the shaker and now the entire program is just sprinkles, everywhere. This is fine for when your metaprograms are, themselves, relatively simple. It is not fine when you are returning meta-programs out of meta-meta-functions.

This is the impasse I find myself at, and it is the answer to the question I know everyone wants to know the answer to. For the love of heaven and earth and all that lies between, why am I still using Terra?

The truth is that the project I’m working on requires highly complex metaprogramming techniques in order to properly generate type-safe mappings for arbitrary data structures. Explaining why would be an entire blog post on its own, but suffice to say, it’s a complex user interface library that’s intended to run on tiny embedded devices, which means I can’t simply give up and use Idris, or indeed anything that involves garbage collection.

What I really want is a low-level, recursively metaprogrammable language that is also recursively type-safe, in that any type strata can safely manipulate the code of any layer beneath it, preferably via algebraic subtyping that ensures all types are recursively a subset of types that contain them, ad nauseam. This would then allow you to move from a “low-level” language to a “high-level” language by simply walking up the tower of abstraction, building meta-meta-programs that manipulate meta-programs that generate low-level programs.

Alas, such beauty can only exist in the minds of mathematicians and small kittens. While I may one day attempt to build such a language, it will be nothing more than a poor imitation, forever striving for an ideal it cannot reach, cursed with a vision from the gods of a pristine language no mortal can ever possess.

I wish to forge galaxies, to wield the power of computation and sail the cosmos upon an infinite wave of creativity. Instead, I spend untold hours toiling inside LLVM, wondering why it won’t print “Hello World”.

In conclusion, everything is terrible and the universe is on fire.


RISC Is Fundamentally Unscalable


Today, there was an announcement about a new RISC-V chip, which has a lot of people excited. I wish I could also be excited, but to me, this is just a reminder that RISC architectures are fundamentally unscalable, and inevitably stop being RISC as soon as they need to be fast. People still call ARM a “RISC” architecture despite ARMv8.3-A adding an FJCVTZS instruction, which is “Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero”. Reduced instruction set, my ass.

The reason this keeps happening is because the laws of physics ensure that no RISC architecture can scale under load. The problem is that a modern CPU is so fast that just accessing the L1 cache takes anywhere from 3-5 cycles. This is part of the reason modern CPUs rely so much on register renaming, allowing them to have hundreds of internal registers that are used to make things go fast, as opposed to the paltry 90 registers actually exposed, 40 of which are just floating point registers for vector operations. The fundamental issue that CPU architects run into is that the speed of light isn’t getting any faster. Even getting an electrical signal from one end of a CPU to the other now takes more than one cycle, which means the physical layout of your CPU now has a significant impact on how fast operations take. Worse, the faster the CPU gets, the more this lag becomes a problem, so unless you shrink the entire CPU or redesign it so your L1 and L2 caches are physically closer to the transistors that need them, the latency from accessing those caches can only go up, not down. The CPU might be getting faster, but the speed of light isn’t.

Now, obviously RISC CPUs are very complicated architectures that do all sorts of insane pipelining to try and execute as many instructions at the same time as possible. This is necessary because, unless your data is already loaded into registers, you might spend more cycles loading data from the L1 cache than doing the actual operation! If you hit the L2 cache, that will cost you 13-20 cycles by itself, and L3 cache hits are 60-100 cycles. This is made worse by the fact that complex floating-point operations can almost always be performed faster by encoding the operation in hardware, often in just one or two cycles, when manually implementing the same operation would’ve taken 8 or more cycles. The FJCVTZS instruction mentioned above even sets a specific flag based on certain edge-cases to allow an immediate jump instruction to be done afterwards, again to minimize hitting the cache.

All of this leads us to single instruction multiple data (SIMD) vector instructions common to almost all modern CPUs. Instead of doing a complex operation on a single float, they do a simple operation to many floats at once. The CPU can perform operations on 4, 8, or even 16 floating point numbers at the same time, in just 3 or 4 cycles, even though doing this for an individual float would have cost 2 or 3 cycles each. Even loading an array of floats into a large register will be faster than loading each float individually. There is no escaping the fact that attempting to run instructions one by one, even with fancy pipelining, will usually result in a CPU that’s simply not doing anything most of the time. In order to make things go fast, you have to do things in bulk. This means having instructions that do as many things as possible, which is the exact opposite of how RISC works.

Now, this does not mean CISC is the future. We already invented a solution to this problem, which is VLIW - Very Large Instruction Word. This is what Itanium was, because researchers at HP anticipated this problem 30 years ago and teamed up with Intel to create what eventually became Itanium. In Itanium, or any VLIW architecture, you can tell the CPU to do many things at once. This means that, instead of having to build massive vector processing instructions or other complex specialized instructions, you can build your own mega-instructions out of a much simpler instruction set. This is great, because it simplifies the CPU design enormously while sidestepping the pipelining issues of RISC. The problem is that this is really fucking hard to compile, and that’s what Intel screwed up. Intel assumed that compilers in 2001 could extract the instruction-level parallelism necessary to make VLIW work, but in reality we’ve only very recently figured out how to reliably do that. 20 years ago, we weren’t even close, so nobody could compile fast code for Itanium, and now Itanium is dead, even though it was specifically designed to solve our current predicament.

With that said, the MILL instruction set uses VLIW along with several other innovations designed to compensate for a lot of the problems discussed here, like having deferred load instructions to account for the lag time between requesting a piece of data and actually being able to use it (which, incidentally, also makes MILL immune to Spectre because it doesn’t need to speculate). Sadly, MILL is currently still vaporware, having not materialized any actual hardware despite its promising performance gains. One reason for this might be that any VLIW architecture has a highly unique instruction set. We’re used to x86, which is so high-level it has almost nothing to do with the underlying CPU implementation. This is nice, because everyone implements the same instruction set and your programs all work on it, but it means the way instructions interact is hard to predict, much to the frustration of compiler optimizers. With VLIW, you would very likely have to recompile your program for every single unique CPU, which is a problem MILL has spent quite a bit of time on.

MILL, and perhaps VLIW in general, may have a saving grace with WebAssembly, precisely because it is a low-level assembly language that can be efficiently compiled to any architecture. It wouldn’t be a problem to have unique instruction sets for every single type of CPU, because if you ship WebAssembly, you can simply compile the program for whatever CPU it happens to be running on. A lot of people miss this benefit of WebAssembly, even though I think it will be critical in allowing VLIW instruction sets to eventually proliferate. Perhaps MILL will see the light of day after all, or maybe someone else can come up with a VLIW version of RISC-V that’s open-source. Either way, we need to stop pretending that pipelining RISC is going to work. It hasn’t ever worked and it’s not going to work, it’ll just turn into another CISC with a javascript floating point conversion instruction.

Every. Single. Time.


Software Engineering Is Bad, But That's Not Why


I’ve been writing code for over 12 years, and for a while I’ve been disgusted by the sorry state of programming. That’s why I felt a deep kinship with Nikita Prokopov’s article, Software Disenchantment. It captures the intense feeling of frustration I have with the software industry as a whole and the inability of modern programmers to write anything even remotely resembling efficient systems.

Unfortunately, much of it is wrong.

While I wholeheartedly agree with what Nikita is saying, I am afraid that he doesn’t seem to understand the details behind modern computers. This is unfortunate, because a misinformed article like this weakens both our positions and makes it more difficult to convince people that software bloat is a problem. Most of the time, his frustrations are valid, but his reasoning is misdirected. So, in this article, I’m going to write counterpoints to some of the more problematic claims.

1. Smooth Scroll

One of my hobbies is game development, and I can assure you that doing anything at 4K resolution at 60 FPS on a laptop is insanely hard. Most games struggle to render at 4K 60 FPS with powerful GPUs, and 2D games are usually graphically simplistic, in that they can have lots of fancy drawings, but drawing 200 fancy images on the screen with hardly any blending is not very difficult. A video is just a single 4K image rendered 60 times a second (plus interpolation), which is trivial. Web renderers can’t do that, because HTML has extremely specific composition rules that will break a naïve graphics pipeline. There is also another crucial difference between a 4K video, a video game, and a webpage: text.

High quality anti-aliased and sometimes sub-pixel hinted text at 10 different sizes on different colored backgrounds blended with different transparencies is just really goddamn hard to render. Games don’t do any of that. They have simple UIs with one or two fonts that are often either pre-rendered or use signed distance fields to approximate them. A web browser is rendering arbitrary unicode text that could include emojis and god knows what else. Sometimes it’s even doing transformation operations in realtime on an SVG vector image. This is hard, and getting it to run on a GPU is even harder. One of the most impressive pieces of modern web technology is Firefox’s WebRender, which actually pushes the entire composition pipeline to the GPU, allowing it to serve most webpages at a smooth 60 FPS. This is basically as fast as you can possibly get, which is why this is a particularly strange complaint.

I think the real issue here, and perhaps what Nikita was getting at, is that the design of modern webpages is so bloated that the web browsers can’t keep up. They’re getting inundated with <div> trees the size of Mount Everest and 10 megs worth of useless javascript bootstrapping ad campaigns that load entire miniature videos. However, none of this has anything to do with the resolution or refresh rate of your screen. Useless crap is going to be slow no matter what your GPU is. Inbox taking 13 seconds to load anything is completely unacceptable, but animating anything other than a white box in HTML is far more expensive than you think, and totally unrelated.

2. Latency

Latency is one of the least understood values in computer science. It is true that many text editors have abysmal response times caused by terrible code, but it’s a lot easier to screw this up than people realize. While CPUs have gotten faster, the latency between hardware components hasn’t improved at all, and in many cases cannot possibly improve. This is because latency is dominated by physical separation and connective material. The speed of light hasn’t changed in the past 48 years, so why would the minimum latency?

The problem is that you can’t put anything on top of a system without increasing its latency, and you can’t decrease the latency unless you bypass a system. That 42-year-old emacs system was basically operating at its theoretical maximum because there was barely anything between the terminal and the keyboard. It is simply physically impossible to make that system more responsive, no matter how fast the CPU gets. Saying it’s surprising that a modern text editor is somehow slower than a system operating at the minimum possible latency makes absolutely no sense, because the more things you put between the keyboard and the screen, the higher your latency will be. This has literally nothing to do with how fast your GPU is. Nothing.

It’s actually much worse, because old computers didn’t have to worry about silly things like composition. They’d do v-sync themselves, manually, drawing the cursor or text in between vertical blanks of the monitor. Modern graphics draw to a separate buffer, which is then flipped to the screen on its next refresh. The consequence, however, is that drawing a new frame right after a vertical blank ignores all the input you got during that frame! You can only start drawing after you’ve processed all the user input, so once you start, it’s game over. This means that if a vertical blank happens every 16.6 ms, and you start drawing at the beginning of that frame, you have to wait 16.6 ms to process the user input, then draw the next frame and wait another 16.6 ms for the new buffer to get flipped to the screen!

That’s 33ms of latency right there, and that’s if you don’t screw anything up. A single badly handled async call could easily introduce another frame of lag. As modern hardware connections get more complex, they introduce more latency. Wireless systems introduce even more latency. Hardware abstraction layers, badly written drivers, and even the motherboard BIOS can all negatively impact the latency and we haven’t even gotten to the application yet. Again, the only way to lower latency is to bypass layers that add latency. At best, perfectly written software would add negligible latency and approach the latency of your emacs terminal, but could never surpass it (unless we start using graphene).

We should all be pushing for low-latency systems, but Electron apps are simply the worst offender. This is something everyone, from the hardware to the OS to the libraries, has to cooperate on if we want responsive computers.

3. Features

It seems silly to argue that computers today have no new features. Of course they have new features. A lot of the features are ones I don’t use, but they do get new features and occasionally they are actually nice. I think the real problem here is that each new feature, for some inexplicable reason, requires exponentially more resources than the feature before it, often for no apparent reason. Other times, basic features that are trivial to implement are left out, also for no apparent reason.

For example, Discord still doesn’t know how to de-duplicate resent client messages over a spotty connection despite this being a solved problem for decades, and if a message is deleted too quickly, the deletion packet is received before the message itself, and the client just… never deletes it. This could be trivially solved with a tombstone or even just a temporary queue of unmatched deletion messages, yet the client instead creates a ghost message that you can’t get rid of until you restart the client. There is absolutely no reason for this feature to not exist. It’s not even bloat, it’s just ridiculous.
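
For what it's worth, the tombstone approach fits in a couple dozen lines. This is my own sketch of the idea, not Discord's client code:

// MessageStore tolerates a delete arriving before its message.
type MessageStore struct {
	messages   map[string]string   // message ID -> content
	tombstones map[string]struct{} // deletes seen before their message arrived
}

func NewMessageStore() *MessageStore {
	return &MessageStore{
		messages:   make(map[string]string),
		tombstones: make(map[string]struct{}),
	}
}

func (s *MessageStore) OnDelete(id string) {
	if _, ok := s.messages[id]; ok {
		delete(s.messages, id)
		return
	}
	s.tombstones[id] = struct{}{} // message hasn't arrived yet; remember the delete
}

func (s *MessageStore) OnMessage(id, content string) {
	if _, ok := s.tombstones[id]; ok {
		delete(s.tombstones, id) // deleted in transit; never display it
		return
	}
	s.messages[id] = content
}

In practice you would expire tombstones after a few seconds, which is the "temporary queue of unmatched deletion messages" variant.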

However, a few other comparisons in here really don’t make any sense. For example, an installation of Windows 10 is only 4 GB because of extreme amounts of compression, yet the article compares this with a 6 GB uncompressed basic install of Android. Windows 10 actually takes up 16 GB once installed (20 GB for 64-bit). While software bloat is a very real problem, these kinds of comparisons are just nonsense.

4. Compilers

Now, this one I really don’t understand. Any language other than C++ or Rust basically compiles instantly until you hit 100k lines of code. At work, we have a C# monstrosity that’s half a million lines of code and compiles in 20 seconds. That’s pretty fast. Most other languages are JIT-compiled, so you can just run them instantly. Even then, you don’t really want to optimize for compile time on Release mode unless you’re just removing unnecessary bloat, and many modern compilers take a long time to compile things because they’re doing ridiculously complex optimizations that may require solving NP-hard optimization problems, which some actually do.

The original C language and Jonathan Blow’s language compile really fast because they don’t do anything. They don’t help you, they have an incredibly basic type system, and they don’t do advanced memory analysis or a bunch of absurd optimizations to take advantage of the labyrinthine x86-64 instruction set. Languages in the 1990s compiled instantly because they had no features. Heck, sometimes compilation is actually disk-bound, which is why getting an SSD can dramatically improve compile times for large projects. This has nothing to do with the compiler!

My question here is what on earth are you compiling that takes hours to compile?! The only projects I’m aware of that take this long are all C++ projects, and it’s always because of header files, because header files are terrible and thankfully no other language ever made that mistake ever again. I am admittedly disappointed in Rust’s compilation times, but most other languages try to ensure that at least debug compilation is ridiculously fast.

I think the complaints here are mostly due to bloated javascript ecosystems that pile NPM modules on top of each other until even a simple linter takes forever to run, or when coders write their entire program in a completely different language that transpiles to javascript and then minify the javascript and that’s if you don’t put it through Babel to polyfill back to earlier versions of javascript and… this sure seems like a javascript problem to me, not a general issue with modern compilers.

Hopefully, webassembly will eliminate this problem, at least on the web. As for everywhere else, unless you’re using a complex systems programming language, compilation times usually aren’t that bad, and even when they are, they exist for a reason.

Why would you complain about memory usage and then in the next breath complain about compilation times? Language features are not free. Rust’s lifetime analysis is incredibly difficult to do, but frees the programmer from worrying about memory errors without falling back to a stop-the-world garbage collector, which is important when garbage collection uses 2-3 times more memory than it actually needs (depending on how much performance you want).

Efficient code, fast compilation and memory safety. Pick two. If you feel that compilers can simply magically improve everything, you’ve probably been living in javascript land for too long, where everything is horrible all the time for no reason. The rest of computing is not actually that horrible. The real problem is that most things are now being written in javascript, so everyone inherits all of javascript’s problems. If you don’t want javascript’s problems, stop using it.

Conclusion

I care very deeply about the quality of code that our industry is putting out, and I love the Better World Manifesto that Nikita has proposed. However, it is painful for me to read an article criticizing bad engineering that gets the technical details wrong. A good engineer makes sure he understands what he’s doing, that’s the whole point of the article! If we truly want to be better engineers, we need to understand the problems we face and what we can do to fix them. We need to understand how our computers work and fundamental algorithmic trade-offs we make when we compare problem-solving approaches.

Years ago, I wrote my own article on this, and I feel it is still relevant today. I asked if anyone actually wants good software. At least now, I know some people do. Maybe together, we can do something about it.

