Cyberspace in the 21st Century: Part Five,
Scalability With a Big 'S'

A Dynamic Hierarchy

Figure 4: A hierarchy of responsibility

Keeping Track

The distributed system is organized in a hierarchy for two key reasons: keeping track of the participating nodes, and keeping track of responsibility for all the objects distributed between the participants. The system needs to enable one node to find any other node, and for a node to understand its place in the grand scheme of things. The system also needs to store an unlimited number of objects and keep track of these, even as they get distributed around the system.

Participation

Why does a node participate? Because it has an attendant player who has an interest in playing, or more precisely, in observing and influencing the virtual world currently being modeled (consisting of a set of objects).

A node first needs to make contact with any other participating node. It may be that some nodes are well known (probably roots or relatively senior nodes), or there may be some channel upon which participating nodes are advertised. There may even be multiple virtual worlds available from which the player can select.

Once a node has made contact, it is ready to find a close node to use as an initial parent (one with a good connection, but with broad coverage of the virtual world appropriate to the topological network position of the player). Possibly unaware of the rapidly improving connection, the player selects the area of the virtual world, or the virtual character, that interests them most in terms of wishing to observe and influence (Princess Leia, Robin Hood, or Julius Caesar - if he's free). The game may provide mechanisms for allocating particular areas or characters of the virtual world to particular players or teams of players. Either way, when the player next connects, the nodes they made contact with last time are likely to be a good initial guess.

Once the area or object in the virtual world that the player is interested in has been determined, this interest will be expressed to the system, which will adjust the selection of the player's node's parent in due course.
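As a toy illustration of this initial parent selection (the scoring function, field names, and weights below are invented for the sketch, not part of any protocol described here), a node might rank candidate parents by trading connection quality off against coverage of the player's area of interest:

```python
# Hypothetical sketch: rank candidate parent nodes by connection
# quality and by how well they cover the player's area of interest.
# The latency scaling and the 50/50 weighting are arbitrary choices.

def score_parent(candidate):
    connection = 1.0 / (1.0 + candidate["latency_ms"] / 20.0)
    coverage = candidate["interest_overlap"]  # 0.0 .. 1.0
    return 0.5 * connection + 0.5 * coverage

def choose_initial_parent(candidates):
    return max(candidates, key=score_parent)

candidates = [
    {"name": "root", "latency_ms": 120, "interest_overlap": 1.0},
    {"name": "regional", "latency_ms": 15, "interest_overlap": 0.8},
    {"name": "peer", "latency_ms": 5, "interest_overlap": 0.2},
]
best = choose_initial_parent(candidates)
```

Here the well-connected "regional" node beats both the distant root and the poorly covering peer. In a real system the score would be re-evaluated as conditions change, prompting occasional changes of parent.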

Naturally, the player and the object (avatar) that they're influencing will affect and experience continuous change in the set of objects of interest to the node. This, together with the changing conditions of the network, the changing relationships of other nodes (players joining, leaving), and any other changes, will cause occasional changes in parent.

A Hierarchy of Responsibility?

We start from a single node, but instead of making the default relationship a peer one, we make it a hierarchical one. This is because we are not trying to partition or divide the system; we're just trying to spread the workload. Responsibility ultimately remains with a single computer: in some sense we always have a single server, it's just distributed over several computers.

Now, one of the things about a hierarchy that may cause concern is whether there is still some kind of client/server communications bottleneck effect. If there is any kind of aggregation performed by nodes, then it would appear that the root node would get a tad overloaded. A hierarchy would seem to act as much of a concentrator as the single server of a client/server arrangement, i.e. we would expect the same bottlenecks.

Well, I think a hierarchy has the effect of releasing that focus to dissipate throughout the system. It also allows nodes to organize themselves according to the role that best suits them. Server-like computers end up doing a lot of serving and not much clienting, and client-like computers do a lot of clienting and little serving. Computers in between do a bit of both.

Think of it like a living tree, the closer you get to the root or the heartwood, the stronger and more stable things are. Conversely, at the periphery, there is more life and less stability. The players' computers are at the periphery, and the more reliable and capacious computers reside at the center. The most recent communications are gossiped around via peer connections, whereas the state updates to parents gradually migrate to the persistent core.

There are a variety of configurations which can support a networked game. If we have a truly adaptable distributed system then we'd expect it to assume a similar communications configuration if the connectivity and participants were the same. It all depends on the way the variety of heuristics are tuned, but if they're tuned well, we'd expect them to make the system adopt a fairly optimal configuration.

16 Player Peer-to-peer

Say we have 16 players, but they all have computers of the same capacity. Well, the first player to start will become the root node. As it has all the responsibility, it will initially appear the best node for all other nodes to child onto. At some point the root may realize that it has insufficient capacity to maintain the persistence of its state and may delegate this to children (according to their interest). Node relationships will also organize according to bandwidth costs. It may be that the eighth node finds it more appropriate to child to one of the non-root nodes than the root, simply because of the better bandwidth. Ultimately, what is likely to happen is that persistence responsibility will be distributed around the nodes as necessary (if the root can't cope). Ownership is likely to follow this distribution.

As is more likely, if all computers can easily maintain all state, then the root node is likely to retain storage responsibility for all state, but each node will express interest in all objects to all other nodes. We'll end up with a fairly balanced hierarchy, with additional peer connections between nodes, such that each node communicates its owned objects' state changes to every other node. The child/parent connection acts as one of these one-to-many connections.

So while all nodes have equally balanced interests and capabilities, the connections we end up with look very similar to the connections in a peer-to-peer configuration. However, as soon as computers deviate from this, then the nodes will migrate around the hierarchy according to their relative interests and capabilities.
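A toy simulation of this settling process (the fan-out limit of four and the "attach to the shallowest node with spare capacity" rule are invented stand-ins for the bandwidth and capacity heuristics discussed above):

```python
# Toy sketch: 16 identical nodes join one at a time and each attaches
# to the shallowest existing node that still has spare child capacity.
# With equal nodes this settles into a fairly balanced hierarchy.

MAX_CHILDREN = 4  # assumed fan-out limit

def build_hierarchy(node_count):
    children = {0: []}  # node 0 is the root (the first node to start)
    depth = {0: 0}
    for new in range(1, node_count):
        # Pick the shallowest, least-loaded node with spare capacity.
        parent = min(
            (n for n in children if len(children[n]) < MAX_CHILDREN),
            key=lambda n: (depth[n], len(children[n])),
        )
        children[parent].append(new)
        children[new] = []
        depth[new] = depth[parent] + 1
    return children, depth

children, depth = build_hierarchy(16)
```

With 16 equal nodes the root takes four children and every remaining node sits at depth two, which is about as peer-like as a hierarchy gets; unequal nodes would skew this shape.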

For short-session games it would be overkill to use such a scalable system, but we'd still expect it to adopt an appropriate configuration.

Client/Server

If we had a supercomputer with great reliability, availability, connectivity, capacity, etc. and umpteen player computers, much poorer in every respect, then the hierarchy is likely to adopt a client/server configuration. The supercomputer would be the root, and each player's computer a child node to it. It's unlikely that any player's computer would get to own any objects at all (perhaps their avatar, but that's about it). It may still happen that some nodes will create connections with each other if only to obtain details of player owned avatars, but these would be pretty light-weight.

Overall, we'd end up seeing the parent/child connections becoming the most important.
However, if latency with the server becomes significant, more and more peer connections are likely. You might even end up with mini hierarchies developing between mutually interested groups of players, with ownership migration becoming likely too.

Distributed Server

With several supercomputers dispersed around the network, it's likely that you'd end up with a relatively central root (if that's possible), with object responsibility being portioned out to the dispersed super-nodes. It's expected that the game conspires to encourage players to be interested in the content held by their nearest super-node, and so players are likely to child off this one. They'll peer with any other node on a need-to-know basis.

Of course, if you really like this kind of flexibility, but prefer the intrinsic security of siting super-servers at ISPs' premises, then you could decide to prevent object ownership migrating beyond these nodes. In this way, you end up with a system fairly comparable to a 'distributed server' system.

Scalable Peer-to-peer

If you have an unknown mix of an unlimited number of computers, then this adaptive system is ideal. If any philanthropist donates a super-computer to the cause it's likely to quickly migrate towards the root of the hierarchy. ISPs that donate some of their as yet unused rack-mounted Linux boxes are likely to reduce the bandwidth used by their subscribers, saving them money at the same time as adding value to their service. Even some players that leave their computers on whilst they're not using them (ADSL) can add some useful resources to the pool.

So in this case we maximize the use of all computers, we don't make any particular assumptions about how resources are distributed, but the more resources that are available the better the system's overall performance becomes.

Responsibility

Responsibility means being answerable for objects. One node is always responsible for all objects. This doesn't necessarily require that it be a supercomputer, but it would be nice.

Responsibility entails the following duties:

- Registration: recording that an object exists.
- Persistence: maintaining the object's state in storage.
- Arbitration: determining the object's definitive state (in effect, ownership).
- Estimation: modeling a local approximation of the object's state.

Now, in terms of analogues, registration is equivalent to the land registry of a property (the state ultimately controls everything), persistence is equivalent to the freehold of a property (the freeholder is nominally in control of the property), arbitration is equivalent to the use of a property (practical ownership), and estimation is perhaps (at a stretch) equivalent to visiting the property. There are no particular terms used to describe the delegation of registration or storage, but the act of delegating arbitration (or ownership) is equivalent to leasing and sub-leasing, etc.

New objects can only be created by the root or by owned objects. In the case of an owning node, it then becomes responsible for the new object; however, it is obliged to pass this responsibility up toward the root. Its parent is similarly obliged.

At the Limit

The root node is responsible for all objects, but it may run out of capacity to perform its other duties. When the root no longer has capacity to register all objects, it delegates that duty to the child that attempted to pass responsibility up to it. Thus when it runs out of storage capacity, it knows that some objects are only registered by a child. This doesn't just apply to the root; one of its children could also run out of space. In general, if such a node gets a request for an object it doesn't know about, it checks with its children in addition to referring to its parent.

A node is more likely to run out of space for maintaining persistent storage of all the objects created underneath it. In this case (assuming no parent has the capacity), it delegates the duty of persistence to the child that passed the object up (the child gets the freehold). This means that if a node only has space for registration, it can at least service a request for an object's state by passing the request on to the child to which it delegated persistence.
If an object is created for which no node has the capacity to maintain its persistence (even the creating node), then the object never gets any state (only defaults). If an object is created for which no node even has the capacity to register it, then it can't have been created, or in other words, the creation of an object requires its registration at least on the creating node.
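A minimal sketch of this overflow behavior; the class layout, capacities, and delegation bookkeeping are all illustrative assumptions rather than a description of a real implementation:

```python
# Hypothetical sketch of registration/persistence overflow. A node
# registers and stores objects passed up by its children until it runs
# out of capacity, at which point the duty stays delegated to the
# child; lookups then check delegated children as well as the parent.

class Node:
    def __init__(self, name, reg_capacity, store_capacity, parent=None):
        self.name = name
        self.parent = parent
        self.registry = {}  # object id -> node it was passed up from
        self.store = {}     # object id -> state held locally
        self.reg_capacity = reg_capacity
        self.store_capacity = store_capacity

    def create(self, obj_id, state):
        # Creation requires registration at least on the creating node.
        self.registry[obj_id] = self
        self.store[obj_id] = state
        if self.parent:
            self.parent.pass_up(obj_id, state, self)

    def pass_up(self, obj_id, state, from_child):
        """A child passes responsibility for a new object toward the root."""
        if len(self.registry) < self.reg_capacity:
            self.registry[obj_id] = from_child
            if len(self.store) < self.store_capacity:
                self.store[obj_id] = state  # this node keeps the 'freehold'
            # otherwise persistence stays delegated to from_child
            if self.parent:
                self.parent.pass_up(obj_id, state, self)
        # otherwise even registration stays delegated to from_child

    def find(self, obj_id, asking_child=None):
        """Look locally, then at delegated children, then at the parent."""
        if obj_id in self.store:
            return self.store[obj_id]
        holder = self.registry.get(obj_id)
        if holder is not None and holder is not self and holder is not asking_child:
            return holder.find(obj_id)
        if self.parent:
            return self.parent.find(obj_id, asking_child=self)
        return None

# Example: a root with no storage capacity still knows which child to ask.
root = Node("root", reg_capacity=100, store_capacity=0)
mid = Node("mid", 100, 100, parent=root)
leaf = Node("leaf", 100, 100, parent=mid)
leaf.create("rock", {"hp": 3})
```

When another node under the root asks for "rock", the root's store is empty, but its registry points at "mid", which still holds the state.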

Given this limiting behavior, you can see that even when objects have expanded in number to fill the space available, they end up overflowing from the root back out toward the branches. Ultimately in this way, such overflow objects become more and more remote. It then becomes less reliable to get hold of them, and more prone to delay given limited bandwidth and the number of hops involved. Of course, once made, peer connections will obtain their details fine, but the objects won't be as persistent. When nodes that have responsibility for persistence of certain objects go offline, it may be that some of the node's children have stuck around and can assume responsibility instead. Otherwise, those objects' state becomes inaccessible (unless a peer node coincidentally had cached copies).

The system could then eventually run out of capacity, and no new objects could be created. That's a heck of a lot of objects if you imagine the collective capacity of a few million computers in a few years' time. The game design could do a lot to avoid this capacity getting consumed too rapidly, but it may be that we'd need a policy that purged registration of the least frequently accessed objects. Bear in mind we're looking at something comparable to the situation where web sites would be growing in size faster than ISPs could plug in extra hard disk drives. It's a race between exponential growth in demand and capacity. I have a hunch that demand only increases in response to capacity, so perhaps we'll always have just enough. Hmmmn, is an address space of only 2^64 objects enough for a few years - what do you think?

Interest

Let's look a bit more at how objects get distributed around the hierarchy of nodes - especially when the system is operating within its capacity.
All objects have a responsibility to express interest in the objects that may affect their behavior. Naturally, there are plenty of passive objects that can rely on being of interest to others (rocks, say).

An 'Interest' is a set of criteria and acts as a measure of relevance to which any number of other objects may be compared.

Interests are essentially constant entities, but they have a key feature: it is straightforward to tell whether any object meets the Interest or not (a partial match is classed as non-matching). An Interest could take the form of a template object with the criteria being 'all objects of this class that have the same property values'. This would be an equivalence operation, but in some cases it may be useful to perform a distance operation on a 3D position property, e.g. 'all objects of this class that have a 3D position less than L away from the 3D position specified'.
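As a sketch of this (the class layout is my own assumption, not a defined format), an Interest could combine a required class, exact-match template properties, and an optional distance criterion:

```python
import math

# Hypothetical sketch of an Interest: a required class, exact-match
# template properties, and an optional 3D distance criterion. A
# partial match is classed as non-matching.

class Interest:
    def __init__(self, cls, properties=None, center=None, radius=None):
        self.cls = cls
        self.properties = properties or {}
        self.center = center  # optional 'within radius of center' test
        self.radius = radius

    def matches(self, obj):
        if not isinstance(obj, self.cls):
            return False
        for key, value in self.properties.items():
            if getattr(obj, key, None) != value:
                return False
        if self.center is not None:
            if math.dist(obj.position, self.center) > self.radius:
                return False
        return True

class Rock:
    def __init__(self, position, colour):
        self.position = position
        self.colour = colour

grey_nearby = Interest(Rock, {"colour": "grey"},
                       center=(0.0, 0.0, 0.0), radius=10.0)
```

The template-equivalence case and the distance case compose in one test, and anything failing any criterion is simply non-matching.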

However, beyond the ability to determine whether an object meets an Interest, it may be useful to see how interesting or uninteresting it is. This may be useful, but perhaps for the time being we can make do with an Interest obtaining a logical response as opposed to a fuzzy one. Remember we're already allowing objects to express an Interest as a prioritized demand for details of all objects that meet particular criteria (whichever node they may reside on); it may be going a bit too far to allow the goodness of the match to further moderate the priority of individual objects.

Given that Interests deal with matching objects, it's likely that an Interest may end up being laid out in a manner very similar to the objects themselves. An Interest is therefore also likely to observe the same class hierarchy as objects. Interests may also have a variable lifetime, dependent upon the behavior of the object that's expressing them. Static objects may have long-term interests in a relatively fixed sphere, whereas dynamic objects may have repeated short-term interests in a conical region in front of them. Though I've used spatial examples here, there's no reason to hard-code interests in terms of spatial properties. A mercenary NPC may be interested in all country stats objects containing at least 1 battle in progress.

Note that Interests only get communicated if their lifetime is long enough to warrant it. Their lifetimes, however, must be designed carefully to ensure useful results. There's no point getting tons of objects downloaded if it's highly likely that they won't be interesting once they've arrived. It's also worth pointing out here that Interests are not intended to be used for modeling purposes, e.g. collision detection. There are separate services provided by each node that can monitor objects (that have caused themselves to be registered with the proximity analysis service) and then raise events as appropriate upon collision.

If you're worried about performance issues in satisfying all these interests, note that although Interests don't need to be based on spatial distance this doesn't stop us providing a low level service that does allow objects to be spatially indexed (octree, whatever). We can use this in order that spatial interests can be satisfied promptly.
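For instance, a uniform grid (a simpler cousin of the octree; the cell size and API here are assumptions for the sketch) lets a spatial interest be answered by scanning only nearby cells rather than every object:

```python
import math
from collections import defaultdict

CELL = 10.0  # assumed cell size in world units

def cell_of(pos):
    return tuple(int(math.floor(c / CELL)) for c in pos)

class GridIndex:
    """Toy uniform-grid spatial index standing in for an octree."""

    def __init__(self):
        self.cells = defaultdict(list)

    def insert(self, obj_id, pos):
        self.cells[cell_of(pos)].append((obj_id, pos))

    def query_sphere(self, center, radius):
        # Scan only the cells overlapping the sphere's bounding box.
        lo = cell_of(tuple(c - radius for c in center))
        hi = cell_of(tuple(c + radius for c in center))
        hits = []
        for x in range(lo[0], hi[0] + 1):
            for y in range(lo[1], hi[1] + 1):
                for z in range(lo[2], hi[2] + 1):
                    for obj_id, pos in self.cells.get((x, y, z), []):
                        if math.dist(pos, center) <= radius:
                            hits.append(obj_id)
        return hits
```

Distant objects never even get looked at, which is what keeps satisfying many overlapping spatial interests cheap.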

The Domino Effect of Interest

An object is interested in all other objects that are relevant to its behavior. Ultimately (as Douglas Adams might phrase it) a single piece of fairy cake can be influenced by every other atom in the universe. However, for our purposes we can get by with a decent proportion of the local environment. Though players with big enough hard disks might well end up caching nearly the entire virtual world, even if the player were just interested in a piece of fairy cake (it would have to be non-passive in this case).

If a farmer is interested in foxes and rabbits, but there is a rabbit that is too distant from the farmer's measure of interest to get downloaded, it may happen that a fox will be interested in the rabbit, and raise the importance of the rabbit sufficiently for it to be brought in to the node and thus made available to the farmer's perception.

Usually, the only way objects become able to be aware of one another in the modeling sense is by residing on the same node. An Interest implicitly represents the 'best effort' set of all objects that meet the Interest's criteria. This set of objects will naturally change as and when objects are downloaded or get pushed from the node's cache.

Each node's Interest Manager accumulates all the resident objects' interests, and does its best to obtain objects that meet these interests. It will do this by communicating with the parent (possibly children too), and the parent can do its best to satisfy one or more of the interests, but it in turn may pass an Interest on as appropriate. The Interest in effect represents a seeking tendril that feels around neighboring nodes until it finds a good source to satisfy that interest, at which point the tendril plugs in and forms a semi-permanent peer subscription to that node. These peer relationships are chosen according to the advantages outweighing the cost (of bandwidth).
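A hedged sketch of that tendril behavior (the class names and the try-locally-then-ask-the-parent flow are assumptions; a real Interest Manager would weigh bandwidth costs too):

```python
# Hypothetical sketch: each node's Interest Manager tries to satisfy an
# Interest from its own resident objects, then escalates to its parent;
# when a source node is found, a direct peer subscription is recorded.

class PredicateInterest:
    def __init__(self, predicate):
        self.predicate = predicate

    def matches(self, obj):
        return self.predicate(obj)

class InterestManager:
    def __init__(self, name, objects, parent=None):
        self.name = name
        self.objects = objects      # locally resident objects
        self.parent = parent
        self.subscriptions = {}     # Interest -> name of source node

    def satisfy(self, interest):
        matches = [o for o in self.objects if interest.matches(o)]
        if matches:
            return self, matches
        if self.parent:
            return self.parent.satisfy(interest)
        return None, []

    def express(self, interest):
        source, matches = self.satisfy(interest)
        if source is not None and source is not self:
            # The tendril plugs in: subscribe directly to the source.
            self.subscriptions[interest] = source.name
        return matches

root = InterestManager("root", ["fox", "rock"])
mid = InterestManager("mid", ["rabbit"], parent=root)
leaf = InterestManager("leaf", [], parent=mid)
```

A leaf expressing an interest in foxes ends up subscribed straight to the root, bypassing the intermediate node for subsequent updates.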

For example, some distant rocks might be passive objects, but another player's weapon might blast them to bits. The weapon may be out of range of the node's interest, and thus the weapon will not be modeled. However, in due course, there will be incoming state updates to the distant rocks that result in a 'destroyed rocks' situation.

If they were distant bunny rabbits that tended to run away from farmers with shotguns, then there's a good chance the bunnies would express interest in farmers (unlike rocks). A farmer appearing on the horizon might just get downloaded (even if only the barest of state info). This might be enough to get the bunnies to run away in a fluid motion, rather than having occasional updates make them move in a jerky fashion. There's still a good chance the bunnies run and get discontinuously relocated to the remotely arbitrated position, as opposed to the local estimate, but it's better than nothing.

An object whose behavior is affected by some aspect of its environment expresses interest accordingly. If that results in other objects turning up that have behavior and a consequent interest in their environment, then we end up with a set of objects prioritized according to their relevance for modeling the experience of the observing avatar.

Two different avatars thus have a perception of 'reality' from two different perspectives.

The greater the degree of commonality in the interests of two avatars, the closer the modeling of their experience will be. This is because they are likely to have the same set of objects (though one node may have fewer resources than the other).

Thus the closer two avatars are, the more their perceptions will agree. And at the end of the day, that's all we need to worry about. As long as interacting players have a consensual version of reality then they'll make sufficiently similar interpretations of their situation that each player will be able to believe all players are in the same 'reality'.

Note, though, that it's up to the games designer to determine where the priorities lie in contriving the best-effort modeling for an avatar. It may be more important to model enemy helicopters than weather vanes on church steeples. Only the games designer will understand the player well enough to have some idea of what is most likely to be important to their experience.

Caveat Emptor

The system I'm gradually providing more and more clues about is one that attempts to address issues of scalability. It's not one that makes minimal use of CPU or storage. It certainly doesn't attempt to provide any integrity or consistency guarantees.

People are already developing systems which address particular configurations, particular player counts, particular bandwidth and latency constraints, and because of this, such systems can achieve some degree of integrity and consistency, and can be performance optimized in many areas.

Do not mistake these articles as guidelines for developing the systems underlying contemporary multiplayer games. There are many techniques and algorithms that I haven't covered, many of which are critical to systems in use today. You'd need to get very familiar with them before embarking on such development.

So, I'm not talking about how to design a system that is going to support umpteen thousands of subscribing players and meet their demands for a glitch-free, 100% uptime, quality experience, that is profanity free, and above all fair.

For that matter, if you built a system that observed the principles I'm suggesting you'd probably end up with something that you couldn't charge for. Not only would there be no security, but by the time a million or more people participated, there'd be loads of people exchanging offensive material, and a few vandalizing the system wholesale.

It is difficult to imagine anyone who'd be 'brave' enough to invest in the development of such a system.

However, if we're going to have cyberspace, a system as large as the web that exploits the connected computers, then it doesn't matter how unsound a business prospect it is. As technologists, as games developers looking to tomorrow, the sooner we understand the issues, the sooner we'll create the entertainment system of the future.

Resilience and Security

It's one thing getting a plane to fly, it's another to stop it falling to pieces (or getting shot down).

Next time I'll be discussing methods and strategies for ensuring the system can cope with sudden and unexpected failure of nodes or parts of the network. These I'm pretty confident about. It's less certain how to obtain security in the face of concerted attacks. However, I'll have a go.

Until then, check out Dig It Network Multimedia Inc. for an example of a possible solution to security in P2P games.
