Contents

The Pentium III Processor and SIMD Extensions

Motion Compensation Using Streaming SIMD Extensions

Special Memory Instructions

Special Memory Instructions

Since multimedia applications have memory constraints and the calculation units are fast, cache utilization is important. For that reason the Pentium III processor utilizes two major advanced memory instructions, which help improve the cache locality.

prefetch. The first instruction is the 'prefetch' instruction, which moves a certain block in memory closer to the processor. The 'prefetch' instruction provides hints to the processor as to where to move the specified data. There are four different locality hints for different purposes.

One hint is the 'prefetchnta' hint, which uses non-temporal data (data that is not anticipated to be accessed again in the near future). This operation loads the data into L1 and does not pollute the L2 cache level.

When data is needed for more than one usage, or when there is a need to read, modify and then write back to the same place in memory, you can use 'prefetcht0', 'prefetcht1', or 'prefetcht2' hints, where each hint tells the processor which level of cache should be updated in response to the 'prefetch' instruction.

For example, While doing the last operation on a current frame, you can use 'prefetch' for moving the next frame closer to the processor. Yet since you are in the middle of using the current frame, you would like to move the next frame as close as you can, while not overwriting the memory needed for the current operations. Therefore, we could use 'prefetcht1' to get the next frame, as this fetches data only to L2. Consequently, the current data in the low cache levels is not polluted and the next frame is closer for future calculation.

movntq. The second instruction is the streaming store operation - 'movntq'.

This is actually a store operation just like the 'movq' instruction, which does not pollute the cache hierarchy. That is, if the store hits L1 then it is a regular 'movq', otherwise it goes directly to memory through the line fill-buffers. This is useful when you want to write data that will probably not be in L1 the next time it is accessed, so there is no need to bring this data to the cache. In contrast, using a regular store brings data to the cache and consequently pollutes the caches.

These two operations have been proven to be very efficient in multimedia applications, where memory is one of the major bottlenecks. Understanding the data flow in the application and using suitable memory layout of the data structure, while maintaining the basic principals of 'movntq' & 'prefetch' operations, removes the dependency of the application on the memory sub-system.

An Important Constraint

There is an important constraint that should be taken into account when trying to use 'prefetch' and 'movntq' operations. Misunderstanding this issue will result in losing performance in the application level.

The interface between memory (main memory or L2) and the cache hierarchy is via the line fill-buffers (32-bytes size). These line buffers are allocated whenever:

If there is a new request (one of the 3 above) and all line fill-buffers are occupied, one of the following two options are possible.

To avoid overloading this resource you should avoid the following:

Conclusion

The Pentium III processor with Streaming SIMD Extensions boosts the performance of floating-point as well as integer based applications, thanks in part to new advanced memory operations. This advanced technology enables the high resolution MPEG encoder, based on software only, to work in real-time with high quality results. Other multimedia and communication applications can easily improve their performance using this technology.

Asi Elbaz has a B.Sc. (1998) in Electrical Engineering from The Technion - Israel Institute of Technology. His areas of concentration are in image and signal processing. Formerly he co-developed the first real time MPEG-2 Encoder for the Pentium III processor. Asi currently works in the Networking group at Intel's Israel Design Center (IDC) in Haifa, Israel. He can be contacted at mailto:%20asi.elbaz@intel.com.