Flex Your Facial Animation Muscles
In my preceding article, "Read My Lips: Facial Animation Techniques," I left off with a nice short list of the visemes I would need to represent speech realistically. However, now I am left with the not insignificant problem of determining exactly how to display these visemes in a real-time application.
It may seem as if this is purely an art problem, better left to your art staff or, if you are a one-person development team, at least left to the creative side of your brain. However, your analytical side needs to inject itself in here a bit. This is one of those early production decisions, the kind you read about so often in the Postmortem column, that can make or break your schedule and budget. Choose wisely and everything will work out great. Choose poorly and your art staff, or even your own brain, will throttle you.
Decisions, Decisions
From this information I can expect that if I can reasonably represent these 13 visemes with my character mesh, then continuous lip-synch should be possible. So the problem really comes down to how I construct and manipulate those meshes.
Certainly, the obvious method for creating these 13 visemes is to generate 13 versions of my character head mesh, one to represent each viseme. I can then use the morphing techniques I discussed in my column “Mighty Morphing Mesh Machine,” in the December 1998 issue of Game Developer, to interpolate smoothly between different sounds.
Figure 1. The “l” viseme as seen at the start of the word “life.”
Modeling the face to match the visemes is pretty easy. Once the artist has the base mesh created, each viseme can be generated by deforming the mesh any way necessary to get the right target frame. As long as no vertices are added or deleted and the triangle topology remains the same, everything should work out great. Figure 1 shows an image of a character displaying the “L” viseme, as in the word “life.” The tongue is behind the top teeth, slightly cupped, leaving gaps at the side of the mouth, and the teeth are slightly parted.
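The morph between two viseme targets can be sketched in a few lines. This is a minimal illustration, assuming plain linear interpolation per vertex and meshes stored as lists of (x, y, z) tuples; the function name `morph` and the data layout are hypothetical, not taken from the earlier column. The length check reflects the requirement that no vertices be added or deleted between targets.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def morph(source: List[Vec3], target: List[Vec3], t: float) -> List[Vec3]:
    """Linearly interpolate every vertex from source toward target.

    t = 0.0 returns the source shape, t = 1.0 returns the target shape.
    Both meshes must list the same vertices in the same order.
    """
    if len(source) != len(target):
        raise ValueError("morph targets must share vertex count and order")
    t = max(0.0, min(1.0, t))  # clamp so we never overshoot a target
    return [
        tuple(s + (d - s) * t for s, d in zip(src_v, dst_v))
        for src_v, dst_v in zip(source, target)
    ]


# Hypothetical one-vertex "meshes" standing in for two viseme targets.
viseme_a = [(0.0, 0.0, 0.0)]
viseme_b = [(2.0, 4.0, 6.0)]
halfway = morph(viseme_a, viseme_b, 0.5)
```

Because only vertex positions change, the triangle list can be shared by every target, which is why the topology has to stay fixed.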
Suppose that, in addition to simply lip-synching dialog, your characters must express some emotion. You want them to be able to say things sadly or speak cheerfully. That means adding an emotional component to the system.
Figure 2. A very surprised “l” viseme.
For example, let me start with the “L” sound from before and blend in a surprised emotion at 100 percent. The “L” sound moves the tongue up to the top set of teeth and parts the mouth slightly. However, the surprise target drops the jaw even farther but leaves the tongue alone. This combination blends into the odd-looking character you see in Figure 2.
This problem really becomes apparent when the two meshes are actually fighting each other. For example, the “oo” viseme drives the lips into a tight, pursed shape while the surprise emotion drives the lips apart. Nothing pretty or realistic will come out of that combination.
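To see why the targets fight, it helps to write the blend out. The sketch below assumes one common scheme, where each target contributes its weighted offset from the base mesh and the offsets are summed; the text does not specify the exact blend used, so treat the formula, the function name `blend_targets`, and all of the vertex values as hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def blend_targets(
    base: List[Vec3], weighted_targets: List[Tuple[List[Vec3], float]]
) -> List[Vec3]:
    """Add each target's weighted offset from the base mesh.

    result = base + sum(weight * (target - base)) for every vertex.
    """
    result = [list(v) for v in base]
    for target, weight in weighted_targets:
        if len(target) != len(base):
            raise ValueError("every target must match the base vertex count")
        for i, (base_v, target_v) in enumerate(zip(base, target)):
            for axis in range(3):
                result[i][axis] += weight * (target_v[axis] - base_v[axis])
    return [tuple(v) for v in result]


# A single hypothetical jaw vertex: both targets lower it, so the
# offsets stack and the jaw drops farther than either target alone.
jaw_base = [(0.0, 0.0, 0.0)]
jaw_l = [(0.0, -0.25, 0.0)]
jaw_surprise = [(0.0, -1.0, 0.0)]
jaw = blend_targets(jaw_base, [(jaw_l, 1.0), (jaw_surprise, 1.0)])

# A single hypothetical lip-corner vertex: "oo" pulls it inward while
# surprise pushes it outward, so the two offsets cancel each other.
lip_base = [(1.0, 0.0, 0.0)]
lip_oo = [(0.5, 0.0, 0.0)]
lip_surprise = [(1.5, 0.0, 0.0)]
lip = blend_targets(lip_base, [(lip_oo, 1.0), (lip_surprise, 1.0)])
```

In this toy setup the jaw vertex ends up lower than either target asked for, and the lip corner does not move at all, so neither the pursed “oo” shape nor the surprised shape survives the blend.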