Motion Compensation Using Streaming SIMD Extensions
The following example shows the same implementation using Streaming
SIMD Extensions. Because the 'pavgb' instruction averages packed unsigned
bytes directly, there is no need to convert the pixels to 16-bit format first.
Motion_Comp_Loop:
    movq    mm0, [...]      // read eight pixels from one block
    movq    mm1, [...]      // read eight pixels from second block
    pavgb   mm0, mm1        // calculate the average values
    movq    [...], mm0      // store results
    // Increment pointers to the next line
Example 2. Motion Compensation Using Streaming SIMD Extensions
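To make the arithmetic of Example 2 concrete, here is a scalar C model of what the loop computes. It is a sketch, not the original code: the function names, the 8-pixel-wide block, and the `stride` parameter are illustrative assumptions. PAVGB rounds the average up, (a + b + 1) >> 1, and the result always fits back into an unsigned byte, which is why no unpacking is required.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar model of PAVGB for one byte lane: the average rounded up.
 * The sum is formed in unsigned int, so it cannot overflow, and the
 * result always fits back into 8 bits. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Hypothetical helper: averages two blocks that are 8 pixels wide and
 * 'rows' lines tall; one PAVGB per line. 'stride' is the distance in
 * bytes between consecutive lines (an assumed memory layout). */
static void motion_comp_avg8(uint8_t *dst, const uint8_t *blk1,
                             const uint8_t *blk2, size_t rows, size_t stride)
{
    for (size_t r = 0; r < rows; r++) {
        for (size_t i = 0; i < 8; i++)      /* the eight byte lanes of PAVGB */
            dst[i] = avg_round_u8(blk1[i], blk2[i]);
        dst += stride;                      /* increment pointers */
        blk1 += stride;                     /* to the next line */
        blk2 += stride;
    }
}
```

For example, averaging 0 and 255 gives 128, and averaging 1 and 2 gives 2, because of the upward rounding.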
Another basic operation in ME algorithms is block matching: taking two blocks and calculating the energy of their difference block, measured here as the sum of absolute differences.
The following example shows the basic code for block matching using MMX technology:
Motion_Est_Loop:
    movq    mm1, [...]      // read 8 pixels of ref block
    // ...
    movd    esi, mm0
IF not there_is_threshold
    // ...
Fast_Out:
Example 3. Block Matching Using MMX Technology
Since MMX technology has no horizontal operation (such as summing the four 16-bit elements of one MMX technology register), and since a sum of absolute differences needs more than eight bits, the MMX implementation of the block matching algorithm must convert the data to short (16-bit) format and perform three extra adds in each iteration. At the end of the loop, the four partial sums, one per 16-bit element, must be added together to produce the final result.
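The structure described above can be modeled in scalar C as follows. This is a sketch under stated assumptions, not a reconstruction of Example 3: the blocks are assumed to be 16 pixels wide, and the absolute difference is formed as the OR of two saturating subtractions, a common MMX idiom (PSUBUSB/POR) that the surviving listing does not show. The four `lane` counters stand in for the four 16-bit elements of one MMX register.

```c
#include <stdint.h>
#include <stddef.h>

/* Unsigned saturating subtraction of one byte (PSUBUSB per lane). */
static uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Hypothetical helper: SAD of two blocks 16 pixels wide and 'rows' lines
 * tall, accumulated MMX-style in four 16-bit partial sums. For a 16x16
 * block each lane holds at most 64 * 255 = 16320, so it cannot overflow. */
static uint32_t sad16_mmx_style(const uint8_t *cur, const uint8_t *ref,
                                size_t rows, size_t stride)
{
    uint16_t lane[4] = {0, 0, 0, 0};    /* four words of one MMX register */
    for (size_t r = 0; r < rows; r++) {
        for (size_t i = 0; i < 16; i++) {
            /* |a - b| = sat(a - b) | sat(b - a); one term is always 0 */
            uint8_t d = (uint8_t)(sub_sat_u8(cur[i], ref[i]) |
                                  sub_sat_u8(ref[i], cur[i]));
            /* after unpacking to words, byte i lands in word i % 4 */
            lane[i % 4] = (uint16_t)(lane[i % 4] + d);
        }
        cur += stride;
        ref += stride;
    }
    /* No horizontal add in MMX: combine the four partial sums at the end. */
    return (uint32_t)lane[0] + lane[1] + lane[2] + lane[3];
}
```

The final horizontal sum is the step that becomes expensive when a threshold check forces it to run in every iteration.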
Moreover, when a threshold energy is used to avoid unnecessary calculations (which is typical in ME algorithms), the overhead is large: with that method (there_is_threshold = TRUE), the final sum must be calculated in every iteration for comparison with the threshold energy. The 'psadbw' instruction enables a quick and efficient comparison at each iteration, because it produces the sum of eight absolute differences directly as a single value.
The following example shows the same implementation using the 'psadbw' instruction, which is designed specifically to solve these problems.
Motion_Est_Loop:
    movq    mm1, [...]      // read next 8 pixels of ref block
    // ...
    paddd   mm7, mm1
    // ...
Fast_Out:
Example 4. Block Matching Using Streaming SIMD Extensions
Table 1 shows the performance gains that can be achieved by using Streaming SIMD Extensions. The measurements assume a block size of 16x16 pixels and a hot cache.
Table 1. MMX Technology vs. Streaming SIMD Extensions Implementation for ME & MC
Streaming SIMD Extensions include additional instructions that can
improve the performance of integer-based algorithms. Developers who
already use MMX technology can easily integrate these extensions into
their existing implementations.