Video Applications For the Pentium III Processor

Contents

The Pentium III Processor and SIMD Extensions

Motion Compensation Using Streaming SIMD Extensions

Special Memory Instructions

After the great success of Intel's MMX technology, the increasing demand for more complex algorithms based on floating-point calculations drove Intel to define yet another new technology. This time, it defined a new set of instructions and data types for floating-point based algorithms, such as 3D and advanced signal and image processing algorithms, and extended MMX technology support for integer-based algorithms, all while maintaining compatibility with existing software designed for the Intel architecture. It also included new memory operations that can accelerate any memory-bound algorithm, especially multimedia applications, which typically work on large blocks of memory.

Subsequent 3D and video projects have demonstrated that the Pentium III processor is well suited to multimedia applications. One of the most impressive of these projects is the high-resolution, real-time MPEG-2 encoder. This paper describes how the Pentium III processor and Streaming SIMD Extensions can improve the performance of integer-based applications, using examples from the MPEG-2 encoder application.

Motion Estimation & Motion Compensation

To make the later examples easier to follow, this section introduces two of the most basic operations in video compression: Motion Estimation (ME) and Motion Compensation (MC).

ME is performed during encoding. It exploits the fact that consecutive frames in a sequence are usually almost identical. For each block of the current frame, the technique searches the previous frame for the best-matching location by comparing the block against a set of candidate blocks. The output of this operation for each block is a motion vector.
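The search described above can be sketched in scalar C. This is only an illustration under stated assumptions, not the encoder's actual code: it assumes 8x8 blocks, an exhaustive search over a small window, and the sum of absolute differences (SAD) as the matching cost. The names BLOCK, MotionVector, block_sad, and estimate_motion are hypothetical.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names: BLOCK, MotionVector, block_sad and estimate_motion
   are illustrative only and are not taken from the encoder in the text. */
#define BLOCK 8

typedef struct { int dx, dy; } MotionVector;

/* Sum of absolute differences between two BLOCK x BLOCK pixel blocks
   that live in frames with the same line stride. */
static unsigned block_sad(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < BLOCK; y++)
        for (int x = 0; x < BLOCK; x++)
            sum += (unsigned)abs(a[y * stride + x] - b[y * stride + x]);
    return sum;
}

/* Exhaustive search: try every displacement within +/-range of the block
   at (bx, by) and keep the one with the smallest SAD. Candidates that
   fall outside the previous frame are skipped. */
static MotionVector estimate_motion(const uint8_t *cur, const uint8_t *prev,
                                    int width, int height,
                                    int bx, int by, int range)
{
    MotionVector best = { 0, 0 };
    unsigned best_sad = UINT_MAX;

    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int px = bx + dx, py = by + dy;
            if (px < 0 || py < 0 || px + BLOCK > width || py + BLOCK > height)
                continue;
            unsigned sad = block_sad(cur + by * width + bx,
                                     prev + py * width + px, width);
            if (sad < best_sad) {
                best_sad = sad;
                best.dx = dx;
                best.dy = dy;
            }
        }
    }
    return best;
}
```

The inner SAD loop is where almost all of the time goes in a search like this, which is why a block-matching instruction matters so much for ME.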

MC is the opposite operation. Given a motion vector and a difference block, MC builds a new block by fetching the block that the motion vector points to in the previous frame and adding it to the difference block.
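As a rough scalar sketch of that reconstruction step, assuming 8x8 blocks, a signed 16-bit difference block, and saturation of the result to the 0..255 pixel range (the names BLOCK, clamp_u8, and compensate_block are hypothetical and chosen only for illustration):

```c
#include <stdint.h>

#define BLOCK 8   /* hypothetical block size, for illustration only */

/* Clamp a reconstructed value to the valid 8-bit pixel range. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Rebuild the block at (bx, by): fetch the reference block that the
   motion vector (dx, dy) points to in the previous frame and add the
   signed difference block to it. `out` and `diff` are contiguous
   BLOCK x BLOCK arrays; `prev` is a full frame with the given stride.
   The caller must make sure the displaced block lies inside the frame. */
static void compensate_block(uint8_t *out, const uint8_t *prev,
                             const int16_t *diff, int stride,
                             int bx, int by, int dx, int dy)
{
    const uint8_t *ref = prev + (by + dy) * stride + (bx + dx);

    for (int y = 0; y < BLOCK; y++)
        for (int x = 0; x < BLOCK; x++)
            out[y * BLOCK + x] =
                clamp_u8(ref[y * stride + x] + diff[y * BLOCK + x]);
}
```

When the prediction is formed from two reference blocks rather than one, the two blocks are averaged first, which is the operation the MC loops in this paper concentrate on.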

Streaming SIMD Extensions

The Streaming SIMD Extensions meet the demand for operations that are basic building blocks of advanced video and communication algorithms.

The Streaming SIMD Extensions include the following instructions:

pavgb - SIMD average, with rounding, of unsigned byte-sized operands. A crucial operation in MC & ME algorithms.

psadbw - Sum of the absolute differences of unsigned byte-sized operands. Crucial for block-matching algorithms.

pminub, pmaxub, pminsw & pmaxsw - SIMD minimum or maximum of unsigned byte-sized or signed word-sized operands.

As the following examples show, these new instructions simplify and speed up many of the basic kernels in video applications and other integer-based algorithms.
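To make the per-element behavior of these instructions concrete, here is a scalar C model of what each one computes on the eight bytes of a 64-bit MMX register. It is a sketch for reasoning about results, not a replacement for the instructions; the model_* function names are hypothetical.

```c
#include <stdint.h>

/* pavgb model: per-byte unsigned average, rounded up: (a + b + 1) >> 1.
   The result overwrites dst, as with the two-operand instruction. */
static void model_pavgb(uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)((dst[i] + src[i] + 1) >> 1);
}

/* psadbw model: sum of |a - b| over eight unsigned bytes.
   The largest possible sum is 8 * 255 = 2040, so it fits in 16 bits. */
static uint16_t model_psadbw(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (unsigned)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return (uint16_t)sum;
}

/* pminub model: per-byte unsigned minimum. */
static void model_pminub(uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++)
        if (src[i] < dst[i])
            dst[i] = src[i];
}

/* pmaxub model: per-byte unsigned maximum. */
static void model_pmaxub(uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++)
        if (src[i] > dst[i])
            dst[i] = src[i];
}
```

Note that pavgb rounds up, so averaging 1 and 2 gives 2, whereas an add followed by a plain right shift gives 1.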

The following example shows the basic loop for MC using MMX technology:

Motion_Comp_Loop:

Movq mm0,// read eight pixels from one block.
Movq mm4,// read eight pixels from second block.
Movq mm5,// next eight pixels.
Movq mm2,mm0
Movq mm3,mm1
Movq mm6,mm4 // No MMX registers left.
// mm7 was initialized to be zero.
Punpcklbw mm0,mm7 // convert the first four pixels
Punpcklbw mm1,mm7 // from byte format to short format.
Punpcklbw mm4,mm7
Punpckhbw mm2,mm7 // convert the second four pixels
Punpckhbw mm3,mm7 // from byte format to short format.
Punpckhbw mm6,mm7

// Calculate the average values.
Paddw mm0,mm1 // after add values are 9 bits.
Paddw mm2,mm3

Movq mm1,mm5 // Now mm1 is free.
Punpcklbw mm5,mm7
Punpckhbw mm1,mm7

Paddw mm4,mm5
Paddw mm6,mm1

Psrlw mm0,1 // divide by two.
Psrlw mm2,1 // after division values are 8 bits.
Psrlw mm4,1 // divide by two.
Psrlw mm6,1 // after division values are 8 bits.
Packuswb mm0,mm2 // convert back to byte format.
Packuswb mm4,mm6 // convert back to byte format.

Movq ,mm4 // store results.

// Increment pointer to the next line.
Jmp back while not end of macro block

 

Example 1. Motion Compensation Using MMX Technology

Since the sum of two 8-bit pixels can need nine bits, you have to convert the values to short (16-bit) format before calculating the average. Shifting each operand right by one bit (dividing by 2) before the add would avoid the conversion, but it would cost one bit of accuracy.
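The accuracy loss is easy to see in a scalar C sketch that compares the two orderings. The function names avg_widened and avg_preshift are hypothetical and used only for this comparison.

```c
#include <stdint.h>

/* Widen to 16 bits, add, then shift: the full 9-bit sum is preserved,
   which is what the unpack / paddw / psrlw sequence achieves. */
static uint8_t avg_widened(uint8_t a, uint8_t b)
{
    return (uint8_t)(((uint16_t)a + (uint16_t)b) >> 1);
}

/* Shift each operand first so that the sum fits in 8 bits: the low bit
   of each operand is discarded before the add. */
static uint8_t avg_preshift(uint8_t a, uint8_t b)
{
    return (uint8_t)((a >> 1) + (b >> 1));
}
```

For example, averaging 1 and 1 gives 1 with avg_widened but 0 with avg_preshift. The two results differ by one exactly when both inputs are odd, because each operand's low bit is dropped before the bits can carry into the sum.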

Motion Compensation Using Streaming SIMD Extensions