A Wasted Opportunity

What do I mean by a wasted opportunity? Let's take the transform-only case as an example. All my primitives of size 32 could have been of size 64 with very little performance degradation. Why? Because the throughput of the engine is about 54% faster with primitives of size 64 than with size 32. In other words, doubling the vertex count while transforming it 54% faster works out to 2/1.54, or about a 30% increase in transformation time, for effectively doubling the content. Consider that transformation often accounts for only 20-30% of the entire application, and you can see that an almost negligible decrease in performance (maybe 6-10% in frame rate) awaits you for doubling the content by increasing the size of the primitives.
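To make the arithmetic concrete, here is a back-of-the-envelope sketch in C++. The 1.54x throughput figure, the 2x content, and the 20-30% transform share come from the discussion above; the code itself is purely illustrative:

    #include <cstdio>
    #include <initializer_list>

    int main() {
        // Throughput with size-64 primitives is ~54% higher than with
        // size-32 ones, so transforming 2x the vertices costs about
        // 2 / 1.54 ~= 1.30x the transformation time.
        const double transformRatio = 2.0 / 1.54;

        // Transformation is typically only 20-30% of the whole frame,
        // so only that share of the frame time gets more expensive.
        for (double share : {0.20, 0.30}) {
            double frameRatio = (1.0 - share) + share * transformRatio;
            std::printf("transform share %.0f%% -> frame time x%.3f (~%.0f%% slower)\n",
                        share * 100.0, frameRatio, (frameRatio - 1.0) * 100.0);
        }
        return 0;
    }

Running this confirms the claim in the text: a 20% transform share gives roughly a 6% hit, a 30% share roughly 9%.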

In one case study of a real game coming out this winter holiday season (sorry, I'm not at liberty to say which one), we performed an experiment that effectively doubled the content (that's 2X the number of vertices per primitive for the entire scene... we could have been more selective and only doubled the interesting content). We noticed a drop in frame rate of about 1.2 frames per second (only 5% in this case). That's not much of a price to pay for much greater content.

Their artists could have had twice the triangle budget... and we haven't even addressed other types of optimization yet!

Why is an Engine Sensitive to Primitive Size?

Any engine requires a certain amount of overhead to process a primitive. Many of the auxiliary data structures (e.g., transform matrices and light structures) have to be fetched from memory, accounting for some of this overhead. Also, new-generation games are using the SIMD capabilities of newer CPUs (e.g., Streaming SIMD Extensions). These engines restructure the game's data into a SIMD-friendly format that promotes efficient fetching and processing of the data.
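To make this overhead concrete, here is a minimal sketch of a per-primitive transform loop. The types and names (Primitive, TransformPrimitive, and so on) are hypothetical, invented for illustration, and not taken from any particular engine:

    // Hypothetical types and names, invented for illustration.
    struct Matrix4x4 { float m[4][4]; };
    struct Vertex    { float x, y, z; };

    struct Primitive {
        const Matrix4x4* worldViewProj;  // auxiliary data, fetched per primitive
        const Vertex*    in;
        Vertex*          out;
        int              count;          // number of vertices in this primitive
    };

    void TransformPrimitive(const Primitive& p) {
        // Per-primitive overhead: fetching matrices, lighting state, etc.
        // This fixed cost is paid once whether count is 8 or 64, which is
        // why larger primitives amortize it better.
        const Matrix4x4& m = *p.worldViewProj;

        // Per-vertex work: the part that actually scales with primitive size.
        for (int i = 0; i < p.count; ++i) {
            const Vertex& v = p.in[i];
            p.out[i].x = m.m[0][0]*v.x + m.m[0][1]*v.y + m.m[0][2]*v.z + m.m[0][3];
            p.out[i].y = m.m[1][0]*v.x + m.m[1][1]*v.y + m.m[1][2]*v.z + m.m[1][3];
            p.out[i].z = m.m[2][0]*v.x + m.m[2][1]*v.y + m.m[2][2]*v.z + m.m[2][3];
        }
    }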

Specifically regarding floating-point SIMD extensions, a certain amount of overhead is associated with SIMD-style processing (such as matrix expansion for 4-wide matrix-vector calculations when done in Structure-of-Arrays style). As you can see, this overhead can overwhelm the actual vertex processing if the primitive size is too small. In fact, the Direct3D Processor Specific Graphics Pipeline for the Pentium III processor (starting with DX6.1) uses a path with scalar Streaming SIMD Extensions instructions whenever the primitive is too small. The relative overhead in primitive processing becomes smaller as the size of the primitive increases, so performance increases to a point of diminishing returns.
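Here is a hedged sketch of what that Structure-of-Arrays overhead looks like with Streaming SIMD Extensions intrinsics. The function and array layout are illustrative assumptions; only the intrinsics themselves (_mm_set1_ps, _mm_loadu_ps, and friends) are real:

    #include <xmmintrin.h>  // Streaming SIMD Extensions intrinsics

    // Transform `count` points (a multiple of 4) stored Structure-of-Arrays
    // style as separate x[], y[], z[] streams; m is a row-major 3x4 matrix.
    void TransformSoA(const float* x, const float* y, const float* z,
                      float* ox, float* oy, float* oz,
                      const float m[3][4], int count) {
        // "Matrix expansion": broadcast every matrix element across a 4-wide
        // register. This setup is a fixed per-primitive cost; for a small
        // primitive it can rival the per-vertex work itself.
        __m128 m00 = _mm_set1_ps(m[0][0]), m01 = _mm_set1_ps(m[0][1]),
               m02 = _mm_set1_ps(m[0][2]), m03 = _mm_set1_ps(m[0][3]);
        __m128 m10 = _mm_set1_ps(m[1][0]), m11 = _mm_set1_ps(m[1][1]),
               m12 = _mm_set1_ps(m[1][2]), m13 = _mm_set1_ps(m[1][3]);
        __m128 m20 = _mm_set1_ps(m[2][0]), m21 = _mm_set1_ps(m[2][1]),
               m22 = _mm_set1_ps(m[2][2]), m23 = _mm_set1_ps(m[2][3]);

        for (int i = 0; i < count; i += 4) {  // four vertices per iteration
            __m128 vx = _mm_loadu_ps(x + i);  // aligned loads would be faster
            __m128 vy = _mm_loadu_ps(y + i);
            __m128 vz = _mm_loadu_ps(z + i);
            _mm_storeu_ps(ox + i,
                _mm_add_ps(_mm_add_ps(_mm_mul_ps(m00, vx), _mm_mul_ps(m01, vy)),
                           _mm_add_ps(_mm_mul_ps(m02, vz), m03)));
            _mm_storeu_ps(oy + i,
                _mm_add_ps(_mm_add_ps(_mm_mul_ps(m10, vx), _mm_mul_ps(m11, vy)),
                           _mm_add_ps(_mm_mul_ps(m12, vz), m13)));
            _mm_storeu_ps(oz + i,
                _mm_add_ps(_mm_add_ps(_mm_mul_ps(m20, vx), _mm_mul_ps(m21, vy)),
                           _mm_add_ps(_mm_mul_ps(m22, vz), m23)));
        }
    }

The twelve broadcasts before the loop are the matrix expansion in question: with only a handful of vertices, the loop does too little work to pay for them.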

You can also see from the figure that vertex lighting makes the engine less sensitive to primitive size. The decrease in sensitivity is because the cost of the vertex lighting itself tends to reduce the relative cost of the overhead discussed previously. Pixel-based lighting (via texturing) doesn't affect this sensitivity.

I spoke earlier about engines restructuring their data in a SIMD-friendly format. Well, let's now take a closer look at data organization, including the effects of SIMD capabilities.

So... What About Data Organization?

There are other issues regarding your data that do affect performance. Consider the fact that most multimedia processing (of which 3D graphics is a part) can be characterized by streaming lots of data into the CPU, operating on it in the CPU, and then streaming some results out of the CPU.

When you consider the above, you'll find that a lot of potential performance is wasted on inefficient data movement. The data must be organized so that it can be moved in and out of the CPU efficiently. Caches go a long way toward avoiding data reloading. However, you should also prefetch data into the proper cache level to hide cache latencies (i.e., by loading the data before it is really needed, it appears to the CPU that the data is already there, ready and waiting to be operated on).
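As a small, hedged example of the idea (the function, the data layout, and the lookahead distance of 64 floats are all illustrative assumptions; _mm_prefetch itself is the real Streaming SIMD Extensions intrinsic):

    #include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0

    // Scale a vertex stream, prefetching data a fixed distance ahead so it
    // is already in cache by the time the loop body needs it.
    void ScaleStream(float* data, int count, float s) {
        const int lookahead = 64;  // illustrative prefetch distance, in floats
        for (int i = 0; i < count; ++i) {
            if (i + lookahead < count) {
                // Ask the CPU to pull a future cache line into L1 now,
                // hiding the memory latency behind useful work.
                _mm_prefetch(reinterpret_cast<const char*>(data + i + lookahead),
                             _MM_HINT_T0);
            }
            data[i] *= s;
        }
    }

In practice the right prefetch distance and cache-level hint depend on the processor and the work done per element, so they should be tuned by measurement.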

Some Guidelines for Data Optimization