Process Peaking
Process peaking occurs when AI processes are scheduled to execute on the same frame at regular intervals. For example, consider three behavioursets initialised at the same time, with execution periods of 4, 6 and 8 frames. In consequence, every 8 and 12 frames two behavioursets require execution on the same frame, while every 24 frames all three require execution together. When the PM is working with large numbers of behavioursets with a diverse range of execution periods, process peaking will undermine the effort to maintain a constant frame rate.
There are two main methods for minimising the number of processes executing on individual frames:
(a) Specifying execution periods as prime numbers can reduce process peaking. Compare the previous example with the following: now the three behavioursets have execution periods of 3, 5 and 7 (all prime). This time two behavioursets fall due together at intervals of 15, 21 and 35 frames, while all three fall due together only every 105 frames. Table 2 summarises the number of occurrences of process peaking under the different execution periods during a 105-frame period.
Table 2. Process peaking under different execution periods

| Execution periods | Number of 2-behaviourset executions in 105 frames | Number of 3-behaviourset executions in 105 frames |
| --- | --- | --- |
| 4, 6, 8 | 13 | 4 |
| 3, 5, 7 (prime) | 12 | 1 |
Table 2 shows that prime execution periods reduce process peaking, despite the fact that each behaviourset now executes more frequently. The instances of process peaking with 2 behavioursets reduce marginally, while the occurrences of 3-behaviourset process peaks are reduced by a factor of 4. The current PM does not incorporate prime periods in its design; instead, users are simply informed of this useful fact.
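The counts in Table 2 can be checked with a short script (a Python sketch; the `count_peaks` function and the choice of counting frames 1 to 105 are ours, for illustration only):

```python
def count_peaks(periods, frames=105):
    """Count frames (1..frames) on which exactly two or exactly three
    behavioursets with the given execution periods fall due together."""
    two = three = 0
    for t in range(1, frames + 1):
        due = sum(1 for p in periods if t % p == 0)
        if due == 2:
            two += 1
        elif due == 3:
            three += 1
    return two, three

print(count_peaks((4, 6, 8)))  # → (13, 4)
print(count_peaks((3, 5, 7)))  # → (12, 1)
```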
(b) Process peaking can be reduced by delayed activation. Agents may have the same execution period without ever executing in the same frame. This is possible through the introduction of a mechanism for delaying activation of agents. Take the example of two newly created agents, each requiring execution every two frames. By delaying the activation of one of the agents by one frame after agent creation it is possible to interleave the execution of the agents so that only one executes each frame, thus minimising process peaking. The same principle can be applied to behavioursets within agents. The PM user can specify activation delays for behavioursets at compile-time, while the PM itself can handle activation delays for agents at run-time.
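The two-agent example above can be sketched as follows (Python; the 20-frame window and function name are arbitrary choices for illustration):

```python
def execution_frames(activation, period, horizon=20):
    """Frames on which an agent executes: activation, activation + period, ..."""
    return set(range(activation, horizon, period))

agent_a = execution_frames(activation=0, period=2)  # frames 0, 2, 4, ...
agent_b = execution_frames(activation=1, period=2)  # delayed one frame: 1, 3, 5, ...
print(agent_a & agent_b)  # → set(): the agents never execute on the same frame
```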
Two possible cases exist for the interaction of execution period and activation delay:
· Some or all of the agents/behavioursets with different activation delays are put permanently out of phase, and will never all execute simultaneously.
· The processes all execute simultaneously earlier than they would have with no activation delays specified, but subsequently execute simultaneously at the normal interval predicted by the lowest common multiple of their execution periods.
The important point is that specifying activation delays cannot increase the period of process peaking (compared to simultaneous activation), it can only decrease it.
The PM reasons about the density of processes executing within the game, and decides on the best way to fit new agents and their behavioursets into this environment. The PM uses the following formula to calculate the number of processes P executing in a future frame at time step t:

P(t) = sum for i = 1..n of peak_i(t), where peak_i(t) = 1 if t >= a_i and (t - a_i) mod p_i = 0, and 0 otherwise

where:
P = number of processes
n = number of agents
t = time step under examination
a_i = activation time of agent i
p_i = execution period of agent i
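This count can be transcribed directly (a Python sketch; representing agents simply as (activation time, period) pairs is our convention, not the PM's):

```python
def processes_at(agents, t):
    """Number of processes falling due at time step t.

    agents: list of (a_i, p_i) pairs, where a_i is the activation time
    and p_i the execution period of agent i."""
    return sum(1 for a, p in agents if t >= a and (t - a) % p == 0)

# Three agents activated together with periods 4, 6 and 8: all fall due at frame 24.
agents = [(0, 4), (0, 6), (0, 8)]
print(processes_at(agents, 24))  # → 3
print(processes_at(agents, 12))  # → 2
```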
The formula can be modified to consider behavioursets within agents with different activation delays and execution periods. This formula is used in the pseudo-code in figure 3 to determine the optimal activation delay for a newly created agent. d is the maximum activation delay possible for a newly created agent, while l is the lookahead distance over which the number of process peaks is examined. A reasonable value for d could be 3, while l could be set to the largest behaviourset execution period within the agent, plus its associated activation delay. These settings allow newly created agents to have their activation delayed by up to 3 frames, based on an assessment of future process peak levels including at least one execution of all the agent's behavioursets. For a more comprehensive search of future process peaks, the lookahead distance l can be specified as the lowest common multiple of all the agent's behaviourset execution periods. The lowest common multiple is the maximum length of repeated process peaking patterns; therefore, specifying this as the lookahead value ensures that the algorithm considers all potential future process peaks. However, this approach is computationally expensive, as the lookahead distance may be large, particularly if execution periods are specified as primes.
    for each candidate activation delay from 0 to d time steps from the current time step
        calculate the magnitude of process peaks over the next l time steps
    end for
    choose the activation delay resulting in the smallest maximum process peak
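The pseudo-code above might be realised along these lines (a Python sketch; the function names and the flat (activation, period) representation of agents are our assumptions):

```python
def processes_at(agents, t):
    """Number of processes falling due at time step t (agents as (a_i, p_i) pairs)."""
    return sum(1 for a, p in agents if t >= a and (t - a) % p == 0)

def choose_activation_delay(agents, new_period, now, d, l):
    """Try each activation delay 0..d for a newly created agent and return
    the delay giving the smallest maximum process peak over the next l steps."""
    best_delay, best_peak = 0, None
    for delay in range(d + 1):
        candidate = agents + [(now + delay, new_period)]
        peak = max(processes_at(candidate, t) for t in range(now, now + l))
        if best_peak is None or peak < best_peak:
            best_delay, best_peak = delay, peak
    return best_delay

# Two agents already execute on even frames; delaying the new agent by one
# frame interleaves it with them instead of stacking a third process.
print(choose_activation_delay([(0, 2), (0, 2)], new_period=2, now=0, d=1, l=4))  # → 1
```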
The complexity of the algorithm is O(n·pn), where n is the number of agents under consideration and pn is the maximum period among all the new agent's behavioursets plus that behaviourset's associated activation delay. This is a pseudo-polynomial time algorithm.
The parameters for the lookahead algorithm trade off minimising process peaking against the expense of the search. The algorithm can therefore be used either with a small lookahead at run-time for dynamic activation delay, or with full lookahead as an offline optimisation tool for sets of agents and behavioursets. With dynamic activation delay the PM will avoid unnecessary process peaking when new agents are added to the AgentList at run-time.
"Level of Detail" AI
Sometimes it is desirable to modify the priorities of agents while a game is running in order to control the allocation of AI time to different parts of the game AI. For example, in a crowded city environment, the priorities of pedestrians further away from the camera could be reduced. Such functionality is necessary to implement the egocentric principle. Reducing the priority of agents far away from the player is analogous to the graphical technique of using different 'level of detail' models for rendering according to distance from the camera.
The PM allows run-time modification of both agent and behaviourset priorities. Similarly, it may be necessary to modify the execution periods of agents, and potentially of behavioursets. Modifying execution periods may unbalance the AI computational load, resulting in process peaks; nevertheless, for flexibility, facilities are provided to modify execution periods for both agents and behavioursets.
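As an illustration of the level-of-detail idea, execution periods could be derived from distance to the camera (a sketch; the distance bands and multipliers are invented for illustration, not part of the PM):

```python
def lod_period(distance, base_period=2):
    """Map distance from the camera to an execution period: nearby agents
    think often, distant agents think rarely (bands are arbitrary examples)."""
    if distance < 10.0:
        return base_period        # full AI detail
    if distance < 50.0:
        return base_period * 4    # reduced detail
    return base_period * 16       # background detail

print(lod_period(5.0), lod_period(20.0), lod_period(100.0))  # → 2 8 32
```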
AI Granularity
The PM is a non-pre-emptive multi-tasking scheduler: it only regains control of the processing thread when the code "atom" (e.g., behaviour in a behaviourset) terminates. As the PM has to wait until AI code returns control, it is inevitable that overruns will occur. For example, a behaviour might be allocated 100 clock cycles, the PM invokes the behaviour, but when control is returned 200 clock cycles have elapsed. In consequence the PM cannot guarantee to terminate within the specified AI time: it can only approximate to the specified upper bound. This can lead to some agents or behavioursets not having time to execute. To alleviate this, we allow execution of agent/behaviourset lists to resume after the last executed agent/behaviourset, as well as allowing time allocation to agents/behavioursets to be continually recalculated based on the remaining available AI time in a frame.
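Resuming after the last executed behaviourset might look like this (a sketch with simulated per-behaviour costs standing in for real clock measurements; the names and the budget units are ours):

```python
def run_frame(behaviours, start, budget):
    """Run behaviours round-robin from index `start` until the time budget
    would be exceeded. Returns (resume_index, names_run) so the next frame
    can pick up where this one left off.

    behaviours: list of (name, cost) pairs; cost is in arbitrary time units."""
    spent, i, ran = 0, start, []
    for _ in range(len(behaviours)):
        name, cost = behaviours[i]
        if spent + cost > budget:
            break
        spent += cost
        ran.append(name)
        i = (i + 1) % len(behaviours)
    return i, ran

behaviours = [("a", 3), ("b", 3), ("c", 3), ("d", 3)]
i, ran = run_frame(behaviours, start=0, budget=7)        # runs a, b; resumes at c
print(ran, run_frame(behaviours, start=i, budget=7)[1])  # → ['a', 'b'] ['c', 'd']
```

Across two frames every behaviour gets a turn, rather than the front of the list starving the back.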
We decided to avoid the complication of pre-emptive scheduling (interruption of AI processes by the PM at any time). This does not mean that pre-emptive scheduling is not worth considering. However, on machines that make extensive use of instruction and data caches, such as the PlayStation 2, pre-emptive scheduling can reduce run-time efficiency.
Implementation
We developed a test-bed consisting of a population of agents inhabiting a 2-dimensional toroidal environment. Each agent moves within the environment according to a speed and heading, which it can modify subject to certain constraints on maximum acceleration/deceleration and turn rate. Each agent has two goals: to follow a target agent, and to avoid collisions with other agents.
Three versions of the testbed were implemented to demonstrate the benefits and costs of using the PM to execute AI code. The different versions are listed in table 3.
Table 3. Testbed versions

| Version | Description |
| --- | --- |
| Non-scheduled naïve | Agents execute all their AI behaviour on every frame, without using the PM. |
| Non-scheduled | Agents execute each of their AI behaviours with the same execution periods and delays as the scheduled version, but without using the PM. |
| Scheduled by process manager | Agents execute their AI behaviours using execution periods and delays, using the PM to control execution. |
Figure 2 presents performance results for the different testbed versions. Maximum performance is the number of agents that cause "performance degradation". In the case of non-scheduled testbed versions, "performance degradation" means dropping from a frame rate of 60 frames per second. In the case of scheduled testbed versions, "performance degradation" means failing to completely execute all agents and their behaviours when scheduled to do so. "DAD" means "dynamic activation delay". Two scheduled versions of the testbed were evaluated, one with no dynamic activation delay, the other with a maximum dynamic activation delay of 2. The results show that while the performance hit from using the (non-optimised) PM is around 20%, much of this can be reclaimed through dynamic activation delay.
Figure 2. Testbed performance results
More importantly, the scheduled version can give the appearance of handling more than 67 agents by suspending the execution of some AI behaviours. The scheduled DAD version can process 169 agents before frame drop. In this situation, some of the agent behaviours are executing less frequently. By changing the priorities of agents at run-time and re-ordering the AgentList, game programmers can control which agents will lose AI resources first. A fuzzy upper limit and graceful degradation of AI performance replaces a sharp cut-off point for loss of frame rate. This is very useful for games with a variable and unpredictable number of agents that consume a variable and unpredictable amount of CPU time. This means that the need for manual optimisation of AI code is reduced. If your game allows "believable" rather than "accurate" AI processing then the limiting factor becomes drawing time rather than CPU time. You just might be able to satisfy that design brief after all.
Careful readers will note that the use of the process manager will require programmers to specify extra parameters for their code, such as whether code fragments may be interruptible, how much share of CPU time should be allocated to code fragments, and so forth. This can represent an increase in AI code development time. However, the important point is that time spent specifying process manager parameters will save time in the area of AI optimisation: due to the fuzzy upper time limit on AI processing per frame the point when hand optimisation of AI code is required is postponed. We anticipate that use of the process manager will save time overall, as hand optimisation of all types of code, not just AI, is a time consuming process.
AI Hardware Acceleration?
Universal methods are a prerequisite for hardware acceleration. Computational projective geometry is the universal basis of graphics programming, but there seems, at first glance, to be no such common basis for AI. Defining a general-purpose execution "pipeline" for AI processing is a step in the right direction. We think that the Process Manager describes such a pipeline: AI code can be run upon it, and there are benefits from doing so.
As stated, the process manager makes no assumptions about the precise nature of the AI code. The process manager is identical to an operating system in the sense that it can run multiple different threads performing different kinds of computational work, and imposes no restrictions on the type of code that may be run. However, there is an issue of code granularity. Some AI mechanisms are more amenable to time-slicing than others. For example, we are working on a rule-based, declarative language to sit on top of the process manager. Our reasoning is that rule-based languages are much easier to interrupt, and exhibit finer execution granularity than their procedural counterparts. The finer the code granularity, the better the process manager can allocate time. This is an issue of performance rather than competence. The process manager really can run any kind of code, AI or not, whether its content is evolutionary computation, finite state machines, simple flocking algorithms or any other mechanism. However, the granularity of that code can affect process manager performance.
But this is just a start. Chips that truly accelerate AI processing will need to deal with two main themes.
First, game AI code generally exhibits a coarse-grained parallelism at the agent level. Characters in a game "sense" the current game state, perform some "internal" processing and then "act" upon the game state to produce a new game state. The internal processing of each agent could occur in parallel without requiring significant changes to AI logic. Future AI accelerators could impart huge processing speed-ups with this kind of parallelisation. Note that this isn't fine-grained parallelism at the instruction level, but a coarse-grained parallelism at the agent level.
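The sense-process-act split can be made explicit by freezing a snapshot of the game state that every agent senses, then committing all the new states at once; with that discipline the per-agent work parallelises safely. A minimal sketch (the centroid-chasing rule and the thread pool are illustrative assumptions, not part of the PM):

```python
from concurrent.futures import ThreadPoolExecutor

def tick(positions):
    """One parallel AI step: every agent senses the same frozen snapshot,
    computes its action independently, and the results are committed together."""
    snapshot = tuple(positions)  # frozen game state: safe for concurrent reads
    def act(i):
        # Illustrative rule: drift towards the centroid of the other agents.
        others = [p for j, p in enumerate(snapshot) if j != i]
        centroid = sum(others) / len(others)
        return snapshot[i] + 0.1 * (centroid - snapshot[i])
    with ThreadPoolExecutor() as pool:
        return list(pool.map(act, range(len(snapshot))))

print(tick([0.0, 10.0]))  # → [1.0, 9.0]
```

Because every agent reads only the frozen snapshot, the result is deterministic regardless of how the threads are scheduled.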
Second, game AI code is essentially rule-based. In the abstract, AI code maps conditions on game state to changes to game state. In procedural languages, such as C and C++, the mapping is implemented as "if then" rules acting on data structures. Despite claims to the contrary, AI code is very ordinary in this sense: it is like other code, except it usually exhibits a higher density of condition tests that tend to predicate on a highly dynamic game state. AI hardware accelerators need to directly address the rule-based nature of AI code. Progress in this area is more difficult, as it requires developing and using new real-time, rule-based AI languages while simultaneously designing hardware that speeds the execution of such languages.
Ian Wright began creating games in the early 80's at the age of 14, developing two hit games for the ZX Spectrum home computer. He received a PhD in Artificial Intelligence from the University of Birmingham in 1997. He now develops AI technologies for PS2 at Sony Computer Entertainment Europe's "Team Soho" development studio. Ian can be reached at Ian_Wright@scee.net.
James Marshall came from a background in computer science, via artificial life research and part-time commercial game development on the Amiga, to his present position at Sony Computer Entertainment Europe's "Team Soho" development studio. He currently works on core AI technologies for PlayStation 2 games. James can be reached at James_Marshall@scee.net, unless he's climbing mountains.