Some guidelines for data organization are:
1. Proper data alignment. A cache miss causes a cache line of memory (32 bytes) to be fetched into an assigned cache line. If your data is spread over more cache lines than it needs to be, you'll be causing unnecessary cache misses. Proper alignment is also necessary when loading data into registers, both to avoid misalignment penalties and to make use of fast aligned loads.
2. Vertical parallelism data organization. The vertical versus horizontal parallelism model best describes the way in which SIMD extensions (e.g. MMX Technology, Streaming SIMD Extensions, 3DNow!) exploit parallelism within an algorithm. Either the computation itself can be done in parallel steps ("horizontal") or the algorithm can be done serially, but on multiple data elements at once ("vertical"). Because data dependencies in an algorithm limit the degree to which a computation can be done in parallel, it's usually more efficient to process the algorithm serially while operating on many pieces of data in parallel. Below is an animated figure of the vertical process. Note that the vector X is composed of 4 separate data points, each undergoing the same serial operation, but all four are operated on in parallel, producing 4 results in parallel. The data dependency chain of the calculation doesn't limit the parallelism as it would in horizontal parallelism.
[Figure: The Vertical Process]
Another way of referring to this type of data representation is SOA (Structure Of Arrays) for vertical parallelism and AOS (Array Of Structures) for horizontal parallelism. While this has often been written about with respect to geometry, it also applies to just about any type of operation (e.g. physical deformation, procedural texturing, and so on).
One other practical advantage of vertical parallelism is how easy it makes turning serial code into parallel code. If you look at simple transform code, you'll see that the serial version looks very similar to the vertical SIMD version that operates on 4 vertices simultaneously. For more information about this, I'll refer you to an earlier Gamasutra paper on this topic from one of the members of my team.
3. Proper use of prefetches and streaming stores. This goes back to efficient data movement. All the parallel processing of data inside the CPU won't do any good if you can't move the data in and out fast enough. I'll refer you here to a previously published Gamasutra paper by a member of my team on the subject of efficient data movement through prefetches and streaming stores.
4. Avoid dirty writebacks. When you write data to cacheable memory, the cache line you write to becomes "dirty" and must eventually be written back to main memory. If you write all over the place in memory, you dirty many lines and force many such writebacks. Try to keep your updates to memory as compact as possible to avoid unnecessary writebacks.
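To make guideline 2 concrete, here is a minimal sketch in C of a serial vertex transform next to its vertical (SOA) counterpart. It is not the code from the earlier paper mentioned above. The `VertsSoA` type, the function names, and the 3x4 row-major matrix layout are all assumptions made for illustration, and the sketch assumes an x86 compiler that provides the SSE intrinsics in `<xmmintrin.h>`.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>   /* SSE intrinsics; assumes an x86 target with SSE */

/* Hypothetical SOA vertex stream: one array per coordinate.
   Each array must be 16-byte aligned and n must be a multiple of 4. */
typedef struct {
    float *x, *y, *z;
    size_t n;
} VertsSoA;

/* Serial reference: out = M * (in, 1) for one vertex; M is 3x4, row-major. */
void transform_scalar(const float m[12], const float in[3], float out[3])
{
    for (int r = 0; r < 3; ++r)
        out[r] = m[4*r]*in[0] + m[4*r+1]*in[1] + m[4*r+2]*in[2] + m[4*r+3];
}

/* Vertical version: the same serial arithmetic, applied to 4 vertices at once. */
void transform_soa(const float m[12], const VertsSoA *in, VertsSoA *out)
{
    assert(in->n % 4 == 0 && out->n == in->n);
    for (size_t i = 0; i < in->n; i += 4) {
        __m128 x = _mm_load_ps(in->x + i);   /* x of vertices i..i+3 */
        __m128 y = _mm_load_ps(in->y + i);   /* y of vertices i..i+3 */
        __m128 z = _mm_load_ps(in->z + i);   /* z of vertices i..i+3 */
        float *dst[3] = { out->x + i, out->y + i, out->z + i };
        for (int r = 0; r < 3; ++r) {
            /* Each matrix element is broadcast to all four lanes. */
            __m128 acc = _mm_mul_ps(_mm_set1_ps(m[4*r]), x);
            acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(m[4*r+1]), y));
            acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(m[4*r+2]), z));
            acc = _mm_add_ps(acc, _mm_set1_ps(m[4*r+3]));
            _mm_store_ps(dst[r], acc);       /* 4 results in one store */
        }
    }
}
```

Each line of the inner loop corresponds to one term of the scalar expression, which is the sense in which the serial and vertical versions look alike. No shuffling between lanes is needed because every lane belongs to a different vertex.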
The above is not an exhaustive list. However, I do hope that it will provoke performance programmers into examination of these and other issues. In future articles, we will drill down even further into these and other topics.
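Guidelines 1 and 3 can be illustrated together with a small sketch in C. It allocates buffers on a cache-line boundary, prefetches source data a fixed distance ahead, and writes results with streaming stores. The helper names, the `PREFETCH_AHEAD` distance, and the choice of the non-temporal prefetch hint are assumptions for illustration rather than recommendations from the papers referenced above. `CACHE_LINE` uses the 32-byte figure quoted in guideline 1, and the right values for any given processor need to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>   /* _mm_malloc, _mm_prefetch, _mm_stream_ps, _mm_sfence */

#define CACHE_LINE     32   /* bytes; the line size quoted in guideline 1 */
#define PREFETCH_AHEAD 64   /* floats (8 lines) ahead; a tuning guess */

/* Allocate n floats starting on a cache-line boundary.
   The buffer must be released with _mm_free. */
float *alloc_line_aligned(size_t n)
{
    return (float *)_mm_malloc(n * sizeof(float), CACHE_LINE);
}

/* dst[i] = src[i] * k, one 32-byte line (8 floats) per iteration.
   Both pointers must be 16-byte aligned and n a multiple of 8. */
void scale_stream(float *dst, const float *src, size_t n, float k)
{
    assert(n % 8 == 0);
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);

    const __m128 kv = _mm_set1_ps(k);
    for (size_t i = 0; i < n; i += 8) {
        /* Hint that a later source line will be needed soon. */
        if (i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)(src + i + PREFETCH_AHEAD),
                         _MM_HINT_NTA);

        __m128 a = _mm_mul_ps(_mm_load_ps(src + i),     kv);
        __m128 b = _mm_mul_ps(_mm_load_ps(src + i + 4), kv);

        /* Non-temporal stores: the output is written without being
           read into the cache first. */
        _mm_stream_ps(dst + i,     a);
        _mm_stream_ps(dst + i + 4, b);
    }
    /* Make the streaming stores visible before the caller reads dst. */
    _mm_sfence();
}
```

A caller would allocate both buffers with `alloc_line_aligned`, fill the source, call `scale_stream(dst, src, n, k)`, and release both with `_mm_free`. Streaming stores suit output that will not be read again soon. For data that is reused right away, ordinary stores are usually the better choice.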
Conclusion
Obviously I wouldn't tell you or your artists to just add content where it doesn't make sense. That certainly isn't my desire or intention. But I hope you got some information from this article about the effects of primitive size and data organization on 3D performance. This should provide you with some guidelines on how to better estimate the platform's performance, given that your content budget process must take this information into account.
It's also my intention to set a tone for optimizations in this article series and provide some insight into data organization and its effect on performance. I hope I've been successful at this and I look forward to your comments.
Haim has a Ph.D. in Electrical Engineering (1987) from the University of Southern California. His areas of concentration are in 3D graphics, video and image processing. Haim was on the Electrical Engineering faculty at Tulane University before joining Intel in 1995. Haim is a staff engineer and currently leads the Media Team at Intel's Israel Design Center (IDC) in Haifa, Israel.