The Current State of nVidia SLI – SLI Part 3



To finish up our review of the Asustek nForce4-SLI hardware, we are going to explore what exactly nVidia SLI technology offers you right now, look at the factors impacting SLI performance, and then ask some questions to help those still undecided about whether or not to buy into nVidia SLI.

Make sure you check out Parts 1 and 2 of the SLI article suite, in which we reviewed the motherboard and graphics cards used in today’s article.

Birth of Multi-GPU Processing
If you don’t know already, nVidia SLI technology provides the ability to use multiple graphics cards to boost performance in 3D applications by spreading the workload across several graphics processors (GPUs). But no matter how many flashy ads and stellar performance numbers the PR machine throws at you, this technology is not a new idea.

The first time the computer enthusiast community heard of “SLI” was from a now-defunct graphics hardware company named 3Dfx. When they launched their Voodoo2 chip in 1998, they also introduced a method of linking two identical Voodoo2 graphics cards together on a shared PCI bus in order to boost overall performance. They called this technology “SLI”, or “Scan Line Interleaving”. As the name suggests, Scan Line Interleaving let one card render the even screen lines and the other card render the odd lines. This allowed for a much lighter load on each GPU and better overall performance. The technology rocked the industry.
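To make the even/odd division concrete, here is a minimal sketch (purely illustrative, not 3Dfx's actual implementation) of how scan lines of a frame end up divided between two cards:

```python
# Illustrative sketch of scan-line interleaving:
# card 0 takes the even-numbered scan lines, card 1 takes the odd ones.

def assign_scan_lines(height):
    """Map each scan line of a frame to one of two cards."""
    assignment = {}
    for line in range(height):
        assignment[line] = 0 if line % 2 == 0 else 1
    return assignment

lines = assign_scan_lines(6)
# Each card ends up with half the lines of the frame.
card0 = [l for l, c in lines.items() if c == 0]  # even lines
card1 = [l for l, c in lines.items() if c == 1]  # odd lines
```

Note how neatly the work divides in half; the catch, as discussed below, is that each card still needs knowledge of the whole scene to draw its half of the lines correctly.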

Noticing the impact of these multi-GPU performance gains, several other companies tried their hands at developing their own SLI-esque technologies. One such company was Metabyte (Wicked 3D) with their “PGC”, or “Parallel Graphics Configuration”. PGC again used 3Dfx’s Voodoo2 cards, or in some later cases an AGP card and a PCI card, but with a different method of splitting up the work. With PGC, one card renders the top half of the screen and the other card renders the bottom half. Again we see a large increase in performance as the load on each GPU decreases.

Both of these technologies, 3Dfx-SLI and Metabyte-PGC, had pitfalls, however. Their biggest issue was not in the hardware but in the way they split up the work:

With 3Dfx-SLI, even though each card renders only half the lines of each frame, each card needs to know what is on the whole frame before it can render its lines. If a triangle edge has endpoints on an odd and an even line, the card rendering the even lines needs to know where the other card is putting the other endpoint before it can draw the edge. This means both cards need the same scene data, and each must do some calculation for lines it does not render, or image quality suffers.

Metabyte-PGC, meanwhile, doesn’t take into account whether the top or the bottom of the screen contains more complex rendering. One card could therefore be heavily loaded while the other sits nearly idle, and the idle card must wait for the busy one before either can move on to the next frame. In addition, when PGC pairs an AGP card with a PCI card, there can be significant image quality issues: different cards render differently (compare nVidia vs. ATI image quality), so the top half of the screen can look different from the bottom, and screen tearing can appear where improperly aligned lines meet.

Taking a different route, ATI introduced the Rage Fury MAXX graphics card in late 1999. This was a single card with two GPUs, each with its own 32MB of memory (basically two 32MB cards on one PCB). ATI also implemented a brand new way to distribute the workload between GPUs; they called it “AFR”, or “Alternate Frame Rendering”. With AFR, one GPU renders the first frame and the other the next, alternating continually. This all-in-one hardware coupled with the AFR workload method provided the best alternative to date for multi-GPU 3D graphics. There were no image quality issues, as each frame is uniform (rendered completely on one GPU, and any discrepancies between GPUs go unnoticed by the eye as long as frame rates are high enough), and the workload from one frame to the next is reasonably balanced.
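The AFR scheduling idea is simple enough to show in a couple of lines. This is a hedged sketch of the concept only, not ATI's driver code:

```python
# Illustrative sketch of Alternate Frame Rendering (AFR):
# whole frames alternate between two GPUs, so each frame is
# rendered entirely on one chip and stays visually uniform.

def afr_assign(frame_number):
    """Return which GPU (0 or 1) renders a given frame."""
    return frame_number % 2

# Over six frames, the GPUs simply take turns.
schedule = [afr_assign(f) for f in range(6)]  # -> [0, 1, 0, 1, 0, 1]
```

Because consecutive frames in a game tend to look alike, this round-robin split keeps the load reasonably balanced without any per-frame analysis.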

Over time, however, competition, much faster single GPUs, and tedious driver development led to the demise of all of these products. Until now…

Part 3 – nVidia SLI
With the arrival of PCI Express on modern chipsets, multi-GPU hardware has been brought to the forefront again. PCI Express offers immense, scalable bandwidth. It operates in full duplex (data can flow in both directions at once over dedicated lanes) instead of AGP’s half duplex (one direction at a time over a shared bus). These inherent abilities make PCI Express ideal for high-bandwidth add-in cards, such as graphics adapters.

nVidia has taken advantage of the PCI Express interface and introduced its own multi-GPU system. Using the term “SLI”, which it acquired in its buyout of 3Dfx (late 2000), nVidia has re-coined the acronym to mean “Scalable Link Interface”. They claim massive performance increases, “…up to 1.9x” the performance of a single graphics card. To effectively use nVidia SLI you need three things: two “SLI-ready” nVidia graphics cards, an nVidia nForce4-SLI chipset based motherboard, and one of the latest nVidia Forceware drivers supporting SLI. Now first let’s set the record straight:

  • “SLI” is not an industry term for multiple graphics cards working together. It is a name for nVidia’s proprietary multi-GPU technology. Other companies will likely use other names for their methods.
  • nVidia’s multi-GPU system is not a rehash of any single multi-GPU system (for example, re-using 3Dfx’s Scan Line Interleaving). It is more of a hybrid of several of the methods used in years past…

… and this is what we mean.

Most likely nVidia looked backwards to see what worked and what didn’t with previous multi-GPU systems. What they came up with (nVidia SLI) is a system that uses several different methods to split the work between the separate graphics cards depending on the situation: the situation being the game you are running, and the brain behind the selection being the nVidia graphics drivers. There are three methods of screen rendering in nVidia’s SLI solution:

  • Alternate Frame Rendering (AFR) – Yes, a blast from the past; essentially exactly what ATI developed for the Rage Fury MAXX. Every other frame is rendered by one card, and the remaining frames by the other. AFR is patent-pending technology, so it is reasonable to assume nVidia and ATI reached a business agreement to avoid any intellectual property issues. This method provides the highest performance boost.
  • Split Frame Rendering (SFR) – This method is very similar to Metabyte’s PGC technique, with a twist. The screen is initially split in the middle; one card renders the top 50%, the other the bottom 50%. As the content on the screen changes, algorithms estimate the complexity of different parts of the screen and adjust the split percentage so that both cards take roughly the same time to render their sections. In addition, since the cards are identical, you avoid differing image quality and screen tearing when they operate properly.
  • Compatibility Mode – In this mode only one card is used and the other is completely idle. This is the same thing as running without SLI at all.
Now since the drivers select which mode to operate in, they need to know about the game you are running before you run it. Built into the drivers, then, is a list of per-game settings known as “Game Profiles”. There are many game profiles built into the current drivers, but only some of the games effectively use the AFR or SFR rendering modes; more on that later.
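The adaptive split described for SFR above can be sketched in a few lines. nVidia's actual balancing heuristics are proprietary and unpublished, so everything here (the step size, the clamping bounds, the timing inputs) is an assumption made purely to illustrate the feedback idea:

```python
# Hedged sketch of SFR dynamic load balancing (not nVidia's algorithm):
# if the card drawing the top of the screen took longer last frame,
# shrink its share of the screen so both cards finish at about the same time.

def rebalance_split(split, top_time_ms, bottom_time_ms, step=0.02):
    """Adjust the fraction of the screen assigned to the top card."""
    if top_time_ms > bottom_time_ms:
        split -= step   # top card was slower: give it fewer lines
    elif bottom_time_ms > top_time_ms:
        split += step   # bottom card was slower: give the top card more
    return min(max(split, 0.1), 0.9)  # keep the split within sane bounds

split = 0.5  # start at a 50/50 split
# Suppose the top half (say, a complex skybox) took 20ms vs. 12ms below:
split = rebalance_split(split, top_time_ms=20.0, bottom_time_ms=12.0)
# the split shrinks, shifting lines to the less-loaded card
```

Repeating this small correction every frame converges toward a split where neither card waits long on the other, which is exactly the waiting problem that sank the static 50/50 split in Metabyte's PGC.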