Data Structures and Classes
The CBezierTessellation Class:
The CBezierTessellation class serves two purposes. The first is to hold the triangle index data for the current tessellation level. This data is sent to the rasterization hardware and indicates how to connect the vertices generated by the tessellation process into triangles. The second purpose is to hold precalculated values of the basis functions and their derivatives for the (s, t) parameter pair at each sample point. Since the tessellation does not depend on the positions of the control points of the Bezier patch, we can use one CBezierTessellation object for all the Bezier patches in our application.
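As an illustration of the precalculated values described above, the following is a minimal sketch, not the actual CBezierTessellation code. It assumes cubic Bernstein basis functions and a uniform sampling of the parameter range; the names BasisSample, evalCubicBasis and buildBasisTable are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Values of the four cubic Bernstein basis functions and of their first
// derivatives at one parameter value u.
struct BasisSample {
    float b[4];   // B0,3(u) .. B3,3(u)
    float db[4];  // dB0,3/du .. dB3,3/du
};

// Evaluate the basis functions and their derivatives at u in [0, 1].
inline BasisSample evalCubicBasis(float u) {
    const float v = 1.0f - u;
    BasisSample s;
    s.b[0] = v * v * v;
    s.b[1] = 3.0f * u * v * v;
    s.b[2] = 3.0f * u * u * v;
    s.b[3] = u * u * u;
    s.db[0] = -3.0f * v * v;
    s.db[1] = 3.0f * v * v - 6.0f * u * v;
    s.db[2] = 6.0f * u * v - 3.0f * u * u;
    s.db[3] = 3.0f * u * u;
    return s;
}

// Hypothetical helper: precompute the table for `level` uniform intervals,
// that is, level + 1 sample points along one parameter direction.
// Assumption: uniform sampling; the original class may sample differently.
inline std::vector<BasisSample> buildBasisTable(int level) {
    assert(level >= 1);
    std::vector<BasisSample> table;
    table.reserve(level + 1);
    for (int i = 0; i <= level; ++i)
        table.push_back(evalCubicBasis(static_cast<float>(i) / level));
    return table;
}
```

Because the table depends only on the tessellation level, one such table could be built once and reused for every patch, which is the sharing argument made in the paragraph above.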
The CBezierSurface and CBezierPatch Classes:
A single 4x4 patch can represent only a limited set of shapes. This is why objects are often built from sets of 4x4 patches connected to each other, where each patch defines only a small portion of the object’s surface. Neighboring patches are connected by sharing the control points along their common edge. To use the data efficiently, we define two classes: CBezierSurface, which holds all the control points used by the object, and CBezierPatch, which holds a 4x4 matrix of indices to control points stored in the patch's parent CBezierSurface object.
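The relationship between the two classes can be sketched as follows. This is an illustrative layout under stated assumptions, not the original class definitions: the type names BezierSurfaceSketch and BezierPatchSketch, the member names, and the helper makeTwoPatchStrip are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ControlPoint { float x, y, z; };

// Owns every control point of the object; shared points are stored once.
struct BezierSurfaceSketch {
    std::vector<ControlPoint> points;
};

// One 4x4 patch: indices into the parent surface's control-point array.
struct BezierPatchSketch {
    const BezierSurfaceSketch* parent;
    std::uint32_t index[4][4];

    const ControlPoint& control(int row, int col) const {
        return parent->points[index[row][col]];
    }
};

// Hypothetical helper: two patches side by side on a 4-row x 7-column grid
// of control points. Column 3 is the common edge, so both patches index it.
inline std::vector<BezierPatchSketch> makeTwoPatchStrip(BezierSurfaceSketch& surf) {
    const int cols = 7;
    surf.points.clear();
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < cols; ++c)
            surf.points.push_back({float(c), float(r), 0.0f});

    std::vector<BezierPatchSketch> patches(2);
    for (int p = 0; p < 2; ++p) {
        patches[p].parent = &surf;
        for (int r = 0; r < 4; ++r)
            for (int c = 0; c < 4; ++c)
                patches[p].index[r][c] = std::uint32_t(r * cols + p * 3 + c);
    }
    return patches;
}
```

With this layout the strip stores 28 control points instead of 32, and moving a point on the common edge moves it for both patches at once.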
Streaming SIMD Extensions Implementation
Setup
We use Streaming SIMD Extensions to evaluate the surface position and surface normal for four sample points in parallel. Within the data structure, the basis function values for the sample points (s, t) are organized into groups of four values each. To improve cache locality, we store these values in contiguous memory blocks. The memory layout of the basis function values looks like:
B0,3(s0), B0,3(s1), B0,3(s2), B0,3(s3), B1,3(s0), B1,3(s1), ..., B3,3(t0), B3,3(t1), B3,3(t2), B3,3(t3)
For every iteration of four sample points, we need 32 floating-point values (4 s values x 4 basis functions + 4 t values x 4 basis functions). At 4 bytes per value this is 128 bytes, or 4 cache lines of 32 bytes each. We use the prefetch instruction to tell the processor that the next 4 cache lines will be used in the next iteration, so that the data is already in the cache when the tessellation algorithm needs it.
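The layout and the prefetch pattern described above might look like the sketch below. It assumes 32-byte cache lines (as implied by 128 bytes filling 4 lines); BasisGroup, prefetchGroup and sumFirstLane are hypothetical names, and sumFirstLane only stands in for the real per-iteration work.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
#define SKETCH_PREFETCH(p) _mm_prefetch((p), _MM_HINT_T0)
#else
#define SKETCH_PREFETCH(p) ((void)(p))  // no-op where SSE is unavailable
#endif

// Basis values for one iteration (four sample points): 32 floats = 128 bytes.
struct alignas(16) BasisGroup {
    float bs[4][4];  // bs[i][k] = Bi,3(s_k), k = 0..3
    float bt[4][4];  // bt[j][k] = Bj,3(t_k), k = 0..3
};
static_assert(sizeof(BasisGroup) == 32 * sizeof(float), "32 floats per group");

// Assumption: 32-byte cache lines, so one group spans 4 lines.
constexpr std::size_t kCacheLine = 32;

// Request every cache line of the group that the next iteration will read.
inline void prefetchGroup(const BasisGroup* next) {
    const char* p = reinterpret_cast<const char*>(next);
    for (std::size_t off = 0; off < sizeof(BasisGroup); off += kCacheLine)
        SKETCH_PREFETCH(p + off);
}

// Stand-in for the tessellation loop: work on group g while group g + 1 is
// being fetched. Here the "work" just sums B0..B3 for the first s sample.
inline float sumFirstLane(const BasisGroup* groups, std::size_t count) {
    float acc = 0.0f;
    for (std::size_t g = 0; g < count; ++g) {
        if (g + 1 < count) prefetchGroup(&groups[g + 1]);
        for (int i = 0; i < 4; ++i) acc += groups[g].bs[i][0];
    }
    return acc;
}
```

A prefetch is only a hint, so whether it hides the memory latency depends on how much work each iteration performs before the data is needed.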
We also set up the tessellation process by expanding the control points four times, that is, by replicating each coordinate of each control point across four SIMD lanes. The expansion is needed because the algorithm uses the control-point values in parallel to generate four vertices. (When tessellating directly to screen space, we need to expand the control points after the transformation to screen space.)
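One way to picture the expansion is the sketch below, which replicates each coordinate across the four lanes of an SSE register with _mm_set1_ps. The types ControlPoint4 and ExpandedPoint and the helpers expand, expandPatch and lane are hypothetical, and the original code may store the expanded data differently.

```cpp
#include <cassert>
#include <xmmintrin.h>

struct ControlPoint4 { float x, y, z, w; };

// One control point with each coordinate replicated across four SIMD lanes.
struct ExpandedPoint {
    __m128 x, y, z, w;
};

inline ExpandedPoint expand(const ControlPoint4& p) {
    ExpandedPoint e;
    e.x = _mm_set1_ps(p.x);
    e.y = _mm_set1_ps(p.y);
    e.z = _mm_set1_ps(p.z);
    e.w = _mm_set1_ps(p.w);
    return e;
}

// Expand all 16 control points of a 4x4 patch once, before tessellating it.
inline void expandPatch(const ControlPoint4 in[16], ExpandedPoint out[16]) {
    for (int i = 0; i < 16; ++i) out[i] = expand(in[i]);
}

// Inspection helper: read lane k (0..3) of a register.
inline float lane(__m128 v, int k) {
    alignas(16) float tmp[4];
    _mm_store_ps(tmp, v);
    return tmp[k];
}
```

Expanding once per patch trades memory (four times the control-point data) for not having to broadcast the same values again for every group of four sample points.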
Position calculation
There is a big difference between the tessellation of screen space surfaces and the tessellation of object space surfaces. For object space surfaces, we only need to calculate the surface position “by the book”. For screen space surfaces, we usually need to transform using a perspective projection. The perspective projection changes the position of the x, y coordinates based on their z-value. To make the surface invariant under projective transformations, we must convert our patch to a “Rational Bezier surface”. A Rational Bezier surface is based on a set of homogeneous control points coming from the control-point transformation algorithm. The fourth coordinate (W) acts as a weight: the other coordinates of each surface point are divided by it.
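For reference, the standard form of a rational bicubic Bezier surface can be written as follows. The notation is an assumption chosen to match the basis functions B_{i,3} used earlier: (X_{ij}, Y_{ij}, Z_{ij}, W_{ij}) denotes the homogeneous coordinates of the transformed control point (i, j).

```latex
\[
\bigl(x(s,t),\; y(s,t),\; z(s,t)\bigr) =
\frac{\displaystyle\sum_{i=0}^{3}\sum_{j=0}^{3} B_{i,3}(s)\,B_{j,3}(t)\,\bigl(X_{ij},\, Y_{ij},\, Z_{ij}\bigr)}
     {\displaystyle\sum_{i=0}^{3}\sum_{j=0}^{3} B_{i,3}(s)\,B_{j,3}(t)\,W_{ij}}
\]
```

The numerator and the denominator are both ordinary Bezier sums, so all four homogeneous coordinates can be accumulated in the same loop and the division performed once per vertex at the end.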
The last stage of the surface position generation is the reformatting of the output data for the rasterizer. The data comes out as groups of 4 X’s, 4 Y’s, 4 Z’s and 4 W’s, while the rasterizer expects one (x, y, z, w) vector per vertex. We use the _MM_TRANSPOSE4_PS macro to transpose the values to the correct format.
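The transposition step can be shown in isolation with the small sketch below. The ScreenVertex layout and the function name storeTransposed are assumptions made for illustration; only _MM_TRANSPOSE4_PS and _mm_storeu_ps are actual SSE intrinsics.

```cpp
#include <cassert>
#include <xmmintrin.h>

// Assumed rasterizer vertex layout: four consecutive floats per vertex.
struct ScreenVertex { float sx, sy, sz, rhw; };

// Convert four vertices from X4 / Y4 / Z4 / W4 registers into four
// (x, y, z, w) records.
inline void storeTransposed(__m128 x4, __m128 y4, __m128 z4, __m128 w4,
                            ScreenVertex out[4]) {
    // In-place 4x4 transpose: afterwards x4 = (x0, y0, z0, w0),
    // y4 = (x1, y1, z1, w1), and so on, one vertex per register.
    _MM_TRANSPOSE4_PS(x4, y4, z4, w4);
    _mm_storeu_ps(&out[0].sx, x4);
    _mm_storeu_ps(&out[1].sx, y4);
    _mm_storeu_ps(&out[2].sx, z4);
    _mm_storeu_ps(&out[3].sx, w4);
}
```

Note that after the transpose each register is written to a different vertex; the unaligned store is used because the vertex buffer is not assumed to be 16-byte aligned.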
Finally, the code for the tessellation of the surface position is:
// Zero the vertex coordinates
// pre-multiply basis function values for s & t
DWORD idx = _indices;
vertex.x += coeff*pt->x;
// to screen space
}
}
// Perspective division - required only when tessellating
// directly to screen space
// 1 over w using rcp
vertex.x *= rhw;
vertex.y *= rhw;
vertex.z *= rhw;
// Transpose position values from
// to X4 format.
_MM_TRANSPOSE4_PS(vertex.x, vertex.y, vertex.z, rhw);
// Store the four vertices position, using unaligned move
storeu(&vtx.sx, vertex.x);
storeu(&vtx.sx, vertex.y);
storeu(&vtx.sx, vertex.z);
storeu(&vtx.sx, rhw);
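Since the listing above is incomplete, the following self-contained sketch shows how the same steps could fit together using raw SSE intrinsics. It is an illustration under stated assumptions rather than the original routine: HPoint, ScreenVertex, bernstein3, basis4 and tessellateFour are hypothetical names, the control points are broadcast on the fly instead of being pre-expanded, and each transposed register is stored to its own vertex.

```cpp
#include <cassert>
#include <cmath>
#include <xmmintrin.h>

struct HPoint { float x, y, z, w; };             // homogeneous control point
struct ScreenVertex { float sx, sy, sz, rhw; };  // assumed output layout

// Cubic Bernstein basis function Bi,3(u), i = 0..3.
inline float bernstein3(int i, float u) {
    const float v = 1.0f - u;
    switch (i) {
        case 0:  return v * v * v;
        case 1:  return 3.0f * u * v * v;
        case 2:  return 3.0f * u * u * v;
        default: return u * u * u;
    }
}

// Pack Bi,3 for four parameter values into one register (lane k = u[k]).
inline __m128 basis4(int i, const float u[4]) {
    return _mm_setr_ps(bernstein3(i, u[0]), bernstein3(i, u[1]),
                       bernstein3(i, u[2]), bernstein3(i, u[3]));
}

// Evaluate four sample points of one patch. cp is the 4x4 control net in
// row-major order; bs[i] and bt[j] hold the basis values of the four samples.
inline void tessellateFour(const HPoint cp[16], const __m128 bs[4],
                           const __m128 bt[4], ScreenVertex out[4]) {
    // Zero the vertex coordinates.
    __m128 x = _mm_setzero_ps(), y = x, z = x, w = x;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            // Pre-multiply the basis function values for s and t.
            const __m128 coeff = _mm_mul_ps(bs[i], bt[j]);
            const HPoint& p = cp[i * 4 + j];
            x = _mm_add_ps(x, _mm_mul_ps(coeff, _mm_set1_ps(p.x)));
            y = _mm_add_ps(y, _mm_mul_ps(coeff, _mm_set1_ps(p.y)));
            z = _mm_add_ps(z, _mm_mul_ps(coeff, _mm_set1_ps(p.z)));
            w = _mm_add_ps(w, _mm_mul_ps(coeff, _mm_set1_ps(p.w)));
        }
    }
    // Perspective division with the approximate reciprocal (about 12 bits).
    __m128 rhw = _mm_rcp_ps(w);
    x = _mm_mul_ps(x, rhw);
    y = _mm_mul_ps(y, rhw);
    z = _mm_mul_ps(z, rhw);
    // Transpose to one (x, y, z, rhw) register per vertex and store.
    _MM_TRANSPOSE4_PS(x, y, z, rhw);
    _mm_storeu_ps(&out[0].sx, x);
    _mm_storeu_ps(&out[1].sx, y);
    _mm_storeu_ps(&out[2].sx, z);
    _mm_storeu_ps(&out[3].sx, rhw);
}
```

Because _mm_rcp_ps returns only an approximation of 1/w, results differ slightly from an exact division; a Newton-Raphson refinement step could be added if more precision is required.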
___________________________________________________________________