The Problem of Vsync
If you were to write directly to the screen when drawing a bouncing circle, you would run into some problems. Because you don't do any buffering, your user might end up with a quarter circle drawn for a frame. This can be solved through Double Buffering, which means you draw the circle on to a backbuffer, then “flip” (or copy) the completed image on to the screen. This means you will only ever send a completely drawn scene to the monitor, but you will still have tearing issues. These are caused by trying to update the monitor outside of its refresh rate, meaning you will have only finished drawing half of your new scene over the old scene in the monitor's video buffer when it updates itself, resulting in half the scanlines on the screen having the new scene and half still having the old scene, which gives the impression of tearing.
This can be solved with Vsync, which only flips the backbuffer right before the screen refreshes, effectively locking your frames per second to the refresh rate (usually 60 Hz or 60 FPS). Unfortunately, Vsync with double buffering is implemented by simply locking up the entire program until the next refresh cycle. In DirectX, this problem is made even worse because the API locks up the program with a 100% CPU polling thread, sucking up an entire CPU core just waiting for the screen to enter a refresh cycle, often for almost 13 milliseconds. So your program sucks up an entire CPU core when 90% of the CPU isn't actually doing anything but waiting around for the monitor.
This waiting introduces another issue - Input lag. By definition any input given during the current frame can only come up when the next frame is displayed. However, if you are using vsync and double buffering, the current frame on the screen was the LAST frame, and the CPU is now twiddling its thumbs until the monitor is ready to display the frame that you have already finished rendering. Because you already rendered the frame, the input now has to wait until the end of the frame being displayed on the screen, at which point the frame that was already rendered is flipped on to the screen and your program finally realizes that the mouse moved. It now renders yet another frame taking into account this movement, but because of Vsync that frame is blocked until the next refresh cycle. This means, if you were to press a key just as a frame was put up on the monitor, you would have two full frames of input lag, which at 60 FPS is 33 ms. I can ping a server 20 miles away with a ping of 21 ms. You might as well be in the next city with that much latency.
There is a solution to this - Triple Buffering. The idea is a standard flip mechanism commonly used in dual-thread lockless synchronization scenarios. With two backbuffers, the application can write to one and once its finished, tell the API and it will mark it for flipping to the front-buffer. Then the application starts drawing on the second, after waiting for any flipping operation to finish, and once its done, marks that for flipping to the front-buffer and starts drawing on the first again. This way, the application can draw 2000 frames a second, but only 60 of those frames actually get flipped on to the monitor using what is essentially a lockless flipping mechanism. Because the application is now effectively rendering 2000 frames per second, there is no more input lag. Problem Solved.
Except not, because DirectX implements Triple Buffering in the most useless manner possible. DirectX just treats the extra buffer as a chain, and rotates through the buffers as necessary. The only advantage this has is that it avoids waiting for the backbuffer copy operation to finish before writing again, which is completely useless in an era where said copy operation would have to be measured in microseconds. Instead, it simply ensures that vsync blocks the program, which doesn't solve the input issue at all.
However, there is a flag,
D3DPRESENT_DONOTWAIT, that forces vsync to simply return an error if the refresh cycle isn't available. This would allow us to implement a hack resembling what triple buffering should be like by simply rolling our own polling loop and re-rendering things in the background on the second backbuffer. Problem solved!
Except not. It turns out the Nvidia and Intel don't bother implementing this flag, forcing Vsync to block no matter what you do, and to make matters worse, this feature doesn't have an entry in D3DCAPS9, meaning the DirectX9 API just assumes that it exists, and there is no way to check if it is supported. Of course, don't complain about this to anyone, because of the 50% of people who asked about this who weren't simply ignored, almost all of them were immediately accused of bad profiling, and that the
Present() function couldn't possibly be blocking with the flag on. I question the wisdom of people who ignore the fact that the code executed its main loop 2000 times with vsync off and 60 times with it on and somehow come to the conclusion that
Present() isn't blocking the code.
Either way, we're kind of screwed now. Absolutely no feature in DirectX actually does what its supposed to do, so there doesn't seem to be a way past this input lag.
There is, however, another option. Clever developers would note that to get around vsync's tendency to eat up CPU cycles like a pig, one could introduce a
Sleep() call. So long as you left enough time to render the frame, you could recover a large portion of the wasted CPU. A reliable way of doing this is figuring out how long the last frame took to render, then subtracting that from the FPS you want to enforce and sleep in the remaining time. By enforcing an FPS of something like 80, you give yourself a bit of breathing room, but end up finishing rendering the frame around the same time it would have been presented anyway.
By timing your updates very carefully, you can execute a
Sleep() call, then update all the inputs, then render the scene. This allows you to cut down the additional lag time by nearly 50% in ideal conditions, almost completely eliminating excess input lag. Unfortunately, if your game is already rendering at or below 100 FPS, it takes you 10 milliseconds to render a frame, allowing you only 2.5 milliseconds of extra time to look for input, which is of limited usefulness. This illustrates why Intel and Nvidia are unlikely to care about
D3DPRESENT_DONOTWAIT - modern games will never render fast enough for substantial input lag reduction.
Remember when implementing the Yield that the amount of time it takes to render the frame should be the time difference between the two render calls, minus the amount of time spent sleeping, minus the amount of time
Present() was blocking.