Erik McClure

How To Make Your Profiler 10x Faster


Frustrated with C profilers that are either so minimal as to be useless, or giant behemoths that require you to install device drivers, I started writing a lightweight profiler for my utility library. I already had a high precision timer class, so it was just a matter of using a radix trie that didn’t blow up the cache. I was very careful about minimizing the impact the profiler had on the code, even going so far as to check if extended precision floating point calculations were slowing it down.

Of course, since I was writing a profiler, I could use the profiler to profile itself. By pretending to profile a random number added to a cache-murdering int stuck in the middle of an array, I could do a fairly good simulation of profiling a function, while also profiling the act of profiling the function. The difference between the two measurements is how much overhead the profiler has. Unfortunately, my initial results were… unfavorable, to say the least.

BSS Profiler Heat Output: 
[main.cpp:3851] test_PROFILE: 1370173 µs   [##########
  [code]: 545902.7 µs   [##########
  [main.cpp:3866] outer: 5530.022 ns   [....      
    [code]: 3872.883 ns   [...       
    [main.cpp:3868] inner: 1653.139 ns   [.         
  [main.cpp:3856] control: 1661.779 ns   [.         
  [main.cpp:3876] beginend: 1645.466 ns   [.         
The profiler had an overhead of almost 4 microseconds. When you’re dealing with functions that are called thousands of times a second, you need to be aware of code speed on the scale of nanoseconds, and this profiler would completely ruin the code. At first, I thought it was my fault, but none of my tweaks seemed to have any measureable effect on the speed whatsoever. On a whim, I decided to comment out the actual _querytime function that was calling QueryPerformanceCounter, then run an external profiler on it.
Average control: 35 ns
What?! Well no wonder my tweaks weren’t doing anything, all my code was taking a scant 35 nanoseconds to run. The other 99.9% of the time was spent on that single, stupid call, which also happened to be the one call I couldn’t get rid of. However, that isn’t the end of the story; _querytime() looks like this:
void cHighPrecisionTimer::_querytime(unsigned __int64* _pval)
{
  DWORD procmask=_getaffinity(); 
  HANDLE curthread = GetCurrentThread();
  SetThreadAffinityMask(curthread, 1);
  
  QueryPerformanceCounter((LARGE_INTEGER*)_pval);
  
  SetThreadAffinityMask(curthread, procmask);
}

Years ago, it was standard practice to wrap all calls to QueryPerformanceCounter in a CPU core mask to force it to operate on a single core due to potential glitches in the BIOS messing up your calculations. Microsoft itself had recommended it, and you could find this same code in almost any open-source library that was taking measurements. It turns out that this is no longer necessary:

**Do I need to set the thread affinity to a single core to use QPC?**

No. For more info, see Guidance for acquiring time stamps. This scenario is neither necessary nor desirable.

I couldn’t get rid of the QueryPerformanceCounter call itself, but I could get rid of all that other crap it was doing. I commented it out, and voilà! The overhead had been reduced to a scant 340 nanoseconds, only a tenth of what it had been before. I’m still spending 90% of my calculation time calling that stupid function, but there isn’t much I can do about that. Either way, it was a good reminder about the entire reason for using a profiler - bottlenecks tend to crop up in the most unexpected places.

BSS Profiler Heat Output: 
[main.cpp:3851] test_PROFILE: 142416 µs   [##########
  [code]: 56575.4 µs   [##########
  [main.cpp:3866] outer: 515.43 ns   [....      
    [code]: 343.465 ns   [...       
    [main.cpp:3868] inner: 171.965 ns   [.         
  [main.cpp:3876] beginend: 173.025 ns   [.         
  [main.cpp:3856] control: 169.954 ns   [.         

I also tried adding standard deviation measurements, but that ended up giving me ludicrous values of 342±27348 ns, which isn’t very helpful. Apparently there’s quite a lot of variance in function call times, so much so that while the averages always tend to be the same over time, the statistical variance goes through the roof. This is probably why most profilers don’t include the standard deviation. I was able to add in accurate unprofiled code measurements, though, and the profiler uses a dynamic triple magnitude method of displaying how much time a function takes.


The Problem With Photorealism


Many people assume that modern graphics technology is now capable of rendering photorealistic video games. If you define photorealistic as any still frame is indistinguishable from a real photo, then we can get pretty close. Unfortunately, the problem with video games is that they are not still frames - they move.

What people don’t realize is that modern games rely on faking a lot of stuff, and that means they only look photorealistic in a very tight set of circumstances. They rely on you not paying close attention to environmental details so you don’t notice that the grass is actually just painted on to the terrain. They precompute environmental convolution maps and bake ambient occlusion and radiance information into level architecture. You can’t knock down a building in a game unless it is specifically programmed to be breakable and all the necessary preparations are made. Changes in levels are often scripted, with complex physical changes and graphical consequences being largely precomputed and simply triggered at the appropriate time.

Modern photorealism, like the 3D graphics of ages past, is smoke and mirrors, the result of very talented programmers and artists using tricks of the eye to convince you that a level is much more detailed and interactive than it really is. There’s nothing wrong with this, but we’re so good at doing it that people think we’re a heck of a lot closer to photorealistic games then we really are.

If you want to go beyond simple photorealism and build a game that feels real, you have to deal with a lot of extremely difficult problems. Our best antialiasing methods are perceptual, because doing real antialiasing is prohibitively expensive. Global illumination is achieved by deconstructing a level’s polygons into an octree and using the GPU to cubify moving objects in realtime. Many advanced graphical techniques in use today depend on precomputed values and static geometry. The assumption that most of the world is probably going to stay the same is a powerful one, and enables huge amounts of optimization. Unfortunately, as long as we make that assumption, none of it will ever feel truly real.

Trying to build a world that does not take anything for granted rapidly spirals out of control. Where do you draw the line? Does gravity always point down? Does the atmosphere always behave the same way? Is the sun always yellow? What counts as solid ground? What happens when you blow it up? Is the object you’re standing on even a planet? Imagine trying to code an engine that can take into account all of these possibilities in realtime. This is clearly horrendously inefficient, and yet there is no other way to achieve a true dynamic environment. At some point, we will have to make assumptions about what will and will not change, and these sometimes have surprising consequences. A volcanic eruption, for example, drastically changes the atmospheric composition and completely messes up the ambient lighting and radiosity.

Ok, well, at least we have dynamic animations, right? Wrong. Almost all modern games still use precomputed animations. Some fancy technology can occasionally try to interpolate between them, but that’s about it. We have no reliable method of generating animations on the fly that don’t look horrendously awkward and stiff. It turns out that trying to calculate a limb’s shortest path from point A to point B while avoiding awkward positions and obstacles amounts to solving the Euler-Lagrange equation over an n-dimensional manifold! As a result, it’s incredibly difficult to create smooth animations, because our ability to fluidly shift from one animation to another is extremely limited. This is why we still have weird looking walk animations and occasional animation jumping.

The worst problem, however, is that of content creation. The simple fact is that at photorealistic detail levels, it takes way too long for a team of artists to build a believable world. Even if we had super amazing 3D modelers that would allow an artist to craft any small object in a matter of minutes (which we don’t), artists aren’t machines. Things look real because they have a history behind them, a reason for their current state of being. We can make photorealistic CGI for movies because each scene is scripted and has a well-defined scope. If you’re building GTA V, you can’t somehow manage to come up with three hundred unique histories for every single suburban house you’re building.

Even if we did invent a way to render photorealistic graphics, it would all be for naught until we figured out a way to generate obscene amounts of content at incredibly high levels of detail. Older games weren’t just easier to render, they were easier to make. There comes a point where no matter how many artists you hire, you simply can’t build an expansive game world at a photorealistic level of detail in just 3 years.

People always talk about realtime raytracing as the holy grail of graphics programming without realizing just what is required to take advantage of it. Photorealism isn’t just about processing power, it’s about content.


Google's Decline Really Bugs Me


Google is going down the drain.

That isn’t to say they aren’t fantastically successful. They are. I still use their products, mostly because I don’t put things on the internet I don’t want other people to find, and I’m not female, so I don’t have to worry about misogynists stalking me. They still make stupendous amounts of money and pump out some genuinely good software. They still have the best search engine. Like Microsoft, they’ll be a force to be reckoned with for many decades to come.

Google, however, represented an ideal. They founded the company with the motto “Don’t Be Evil”, and the unspoken question was, how long would this last? The answer, oddly enough, was “until Larry Page took over”.

In its early years, Google unleashed the creativity of the brilliant people it hired to the world and came up with a slew of fantastic products that were a joy to use. Google made huge contributions to the open-source world and solved scalability problems with an elegance that has yet to be surpassed. They famously let engineers use 20% of their time to pursue their own interests, and the result was an unstoppable tidal wave of innovation. Google was, for a brief moment, a shining beacon of hope, a force of good in a bleak world of corporations only concerned with maximizing profit.

Then Larry Page became CEO. Larry Page worshiped Steve Jobs, who gave him a bunch of bad advice centered around maximizing profit. The result was predictable and catastrophic, as the entire basis of what had made Google so innovative was destroyed for the sake of maximizing profit. Now it’s just another large company - only concerned about maximizing profit.

Google was a company that, for a time, I loved. To me, they represented the antithesis of Microsoft, a rebellion against a poisonous corporate culture dominated by profiteering that had no regard for its users. Google was just a bunch of really smart people trying to make the world a better place, and for a precious few years, they succeeded - until it all came tumbling down. Like an artist whose idol has become embroiled in a drug abuse scandal, I have lost my guiding light.

Google was largely the reason I wanted to start my own company, even if college kept me from doing so. As startup culture continued to suck the life out of silicon valley, I held on to Google as an ideal, an example of the kind of company I wanted to build instead of a site designed to sort cat photos. A company that made money because it solved real problems better than everyone else. A company that respected good programming practices, using the right tool for the job, and the value of actually solving a problem instead of just throwing more code at it.

Google was a company that solved problems first, and made money second.

Now, it has succumbed to maximizing stock price for a bunch of rich wall street investors who don’t care about anything other than filling their own pockets with as much cash as they possibly can. Once again, the rest of the world is forced to sit around, waiting until an investor accidentally makes the world a better place in the process of trying to make as much money as possible.

Most people think this is the only way to get things done. For a precious few years, I could point to Google and say otherwise. Now, it has collapsed, and its collapse has made me doubt my own resolve. If Google, of all companies, couldn’t maintain that idealistic vision, was it even possible?

Google gave me a reason to believe that humanity could do better. That we could move past a Wall Street that has become nothing more than a rotting cesspool of greed and corruption.

Now, Google has fallen, along with the ideal it encompassed. Is there a light at the end of the tunnel? Or is it a train, a force of reality come to remind us that no matter how much we reach for utopia, we will be sentenced to drown in our own greed?


The Educational Imbroglio


**im·bro·glio** *noun* 1. an extremely confused, complicated, or embarrassing situation.
Across the country, there is a heated debate over our educational system. Unfortunately, it's a lot like watching members of the flat earth society argue about whose theory is right - there is no right answer, because *everyone is wrong*.

The most recent and perhaps egregious example of this is a breathtakingly misguided article by Kevin G. Welner, who is the director of the National Education Policy center, which is absolutely terrifying. He is attempting to discredit the idea of “tracking”, wherein low-performing students are separated from higher achieving students, because obviously you can’t teach kids who “get it” the same way as kids who are struggling. Unfortunately, the entire conclusion rests on a logical fallacy. He says, and I quote:

"When children fall behind academically, we have a choice. We can choose to sort them into less demanding classes where they will fall further behind, or we can choose to include them in classes that maintain high expectations."
This is a [false dichotomy](http://en.wikipedia.org/wiki/False_dilemma), since there are many other choices. We can sort them into a class with the same expectations, but an alternative teaching method. Sort them into a class that actually pays attention to the concepts that are giving them trouble. The idea is to help children who have fallen behind *catch up with their peers*, not throw them in a room and forget about them. Schools that simply lower their expectations of poorly performing students are doing it wrong. Furthermore, trying to argue that something can't work because no one's doing it properly is [another logical fallacy](http://en.wikipedia.org/wiki/Argument_from_ignorance).

There’s also a persistent argument against charter schools, which claims that the money spent on charter schools should instead be used to improve public schools instead. This is laughable, because public schools receive funding based on test scores. So, all the money would be spent improving test scores instead of actually teaching children anything. Charter schools are important because they aren’t bounded by these nonsensical restrictions and thus are free to experiment with alternative teaching styles. Throwing money at our public schools will only shore up a method of teaching that’s fundamentally broken. It’s like trying to fix the plumbing after the roof caved in - it’s completely missing the point.

To make matters worse, there’s also a war on free time. Recess is being cut in favor of increased instruction time, while educators cite minuscule increases in test scores as proof that this “works”. If by “works”, you mean it succeeds in cramming more useless junk into kids heads, then sure, it’s “working”. However, if you want kids to actually learn instead of memorize pointless facts that they will immediately forget, you have to give them time to process concepts. Their brains need rest, not more work. Bodybuilders don’t lift weights as long as they can every single day; they lift weights every other day and only for a few hours or so, because the muscle needs time to recover.

This, however, is an issue with a society that thinks hard work means working yourself to exhaustion. This is incredibly short-sighted and in direct opposition to plenty of evidence that points to rest being a necessary part of a healthy lifestyle. It can be your job, or school, or a hobby, it doesn’t matter. Humans do not function effectively when forced to focus on one thing for hours at a time. The only reason we attempt to do this is because we used to work in factories, but nowadays we have robots. Modern jobs are all about thinking creatively, which cannot be forced. You can’t force yourself to understand a concept. It’s like trying to force a broken leg to heal faster by going for a jog. You must give kids time absorb concepts instead of trying to cram it down their throats. They need to understand what they are learning, not memorize formulas.

Mainstream education doesn’t take this seriously. There are plenty of experiments that have effectively taught children advanced concepts with radically different teaching methods. One guy taught 3rd graders binary. These kids learned english and how to operate a computer without a teacher at all. There are plenty of cases that show just how woefully inadequate our teaching system is, but it seems that we care more about a one-size-fits-all method that can be mass-produced than a method that’s actually effective.

Our educational system is failing our students, and we refuse to even entertain notions that could make a difference. Instead, we just legislate more tests and take away their recess.

Because really, memorizing the date of the Battle of Gettysburg is more important than playing outside and having a childhood.


Write Less Code


*"Everything should be as simple as possible, but not simpler."* - Albert Einstein ([paraphrased](http://quoteinvestigator.com/2011/05/13/einstein-simple/#more-2363))

The burgeoning complexity of software is perhaps one of the most persistent problems plaguing computer science. We have tried many, many ways of managing this complexity: inventing new languages, creating management systems, and enforcing coding styles. They have all proven to be nothing more than stopgaps. With the revelation that the NSA is utilizing the inherent complexity of software systems to sabotage our efforts at securing our systems, this problem has become even more urgent.

It’s easy to solve a problem with a lot of code - it’s hard to solve a problem with a little code. Since most businesses do not understand the advantage of having high quality software, the most popular method of managing complexity is not to reduce how much code is used, but to wrap it up in a self-contained library so we can ignore it, instead. The problem with this approach is that you haven’t gotten rid of the code. It’s still there, it’s just easier to ignore.

Until it breaks, that is. After we have wrapped complex systems into easy-to-use APIs, we then build even more complex systems on top of them, creating networks of interconnected components, each one encapsulating it’s own complex system. Instead of addressing this burgeoning complexity, we simply wrap it into another API and pave over it, much like one would pave over a landfill. When it finally breaks, we have to extract a core sample just to get an idea of what might be going wrong.

One of the greatest contributions functional programming has made is its concept of a pure function with no side-effects. When a function has side effects, it’s another edge on the graph of components where something could go wrong. Unfortunately, this isn’t enough. If you simply replace a large complex system with a bunch of large complex functions that technically have no side-effects, it’s still a large complex system.

Object-oriented programming tries to manage complexity by compartmentalizing things and (in theory) limiting how they interact with each other. This is designed to address spaghetti code that has a bunch of inter-dependent functions that are impossible to effectively debug. The OO paradigm has the right idea: “an object should do exactly one thing, and do it well”, but suffers from over-application. OO programming was designed to manage very large complex objects, not concepts that don’t need Get/Set functions for every class member. Writing these functions just in case you need them in the future is a case of future-proofing, which should almost always be avoided. If there’s an elegant, simple way to express a solution that only works given the constraints you currently have, USE IT. We don’t need any more leaning towers of inheritance.

The number of bugs in a software program is proportional to how much code you write, and that doesn’t include cheap tricks like the ternary operator or importing a bunch of huge libraries. The logic is still there, the complexity is still there, and it will still break in mysterious ways. In order to reduce the complexity in our software, it has to do less stuff. The less code you write, the harder it is for the NSA to sabotage your program without anyone noticing. The less code you write, the fewer points of failure your program will have. The less code you write, the less stuff will break. We cannot hide from complexity any longer; we must actively reduce it. We have to begin excavating the landfill instead of paving over it.


Avatar

Archive

  1. 2025
  2. 2024
  3. 2023
  4. 2022
  5. 2021
  6. 2020
  7. 2019
  8. 2018
  9. 2017
  10. 2016
  11. 2015
  12. 2014
  13. 2013
  14. 2012
  15. 2011
  16. 2010
  17. 2009