Cyberspace in the 21st Century: Part Five,
Scalability With a Big 'S'

Reviewing the Object

Before we go any further let's discuss what we're distributing.

For a comparable idea of the sort of thing I'm describing, JavaSpaces is a good term to plug into your search engine. It's an evolution of Linda, which is based on David Gelernter's tuple-space idea.

What the games programmer sees in terms of objects described in a programming language, and what goes on under the hood of the distributed system that supports it, can be somewhat different. Imagine a language much like Java for the time being. However, as is common with many virtual machines, one isn't necessarily tied to a particular language. Even if the virtual machine is object oriented, one can develop languages appropriate to particular types of application - games for instance. I'm envisaging such a language tailored to games, but to keep things simple for the time being, we only need to appreciate that the storage of our objects can be managed irrespective of the programming language and virtual machine that manipulates them.

The Object Store

In our object store we keep a record of the class inheritance hierarchy and the details concerning the definitions of each class's methods (or properties). The class is an object as well as a template that governs the form of instances of the class. Each class defines methods which either execute code or manipulate corresponding state variables. All objects (including class objects) contain details of their ownership, e.g. last known owner (lease-holding end-user), last known freeholder.

Figure 1: Single inheritance of properties and operations


Figure 2: Objects only contain non-default values


All we need to distribute then are objects and the classes that define them. The objects consist of one or more values (Figure 2). These values are held within method slots, and a value represents either an operation or a property. Each operation consists of a string of byte codes and each property consists of a value. However, as operations are the same for each object these only appear in the class object - that special object that defines the class operations and default values for each property. In this way objects will only contain values that differ from the defaults.

As you can see (Figure 1) a class object may inherit from another class object. In this case, the derived class object only contains operations or properties that differ from those in the base class. All methods are implicitly virtual. Note though that only single inheritance is supported in this scheme (it'll do for starters).
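
To make the scheme in Figures 1 and 2 a little more concrete, here is a minimal sketch in Python (standing in for the Java-like language imagined earlier). The names `ClassObject`, `Instance`, `lookup`, `get` and `set` are my own illustrative choices, not part of any real system. It assumes that a slot lookup checks the object's own non-default values first, then walks up the single-inheritance chain, and finally falls back to a null value.

```python
class ClassObject:
    """A class object: holds operations and property defaults (hypothetical layout)."""

    def __init__(self, name, base=None, slots=None):
        self.name = name
        self.base = base                 # single inheritance only
        self.slots = dict(slots or {})   # only what differs from the base class

    def lookup(self, slot):
        # Walk up the single-inheritance chain until a value is found.
        cls = self
        while cls is not None:
            if slot in cls.slots:
                return cls.slots[slot]
            cls = cls.base
        return None  # nothing defined anywhere: fall back to a null value


class Instance:
    """An object: stores only the values that differ from its class defaults."""

    def __init__(self, cls, overrides=None):
        self.cls = cls
        self.overrides = dict(overrides or {})

    def get(self, slot):
        if slot in self.overrides:
            return self.overrides[slot]
        return self.cls.lookup(slot)

    def set(self, slot, value):
        # A value equal to the inherited default need not be stored at all.
        if value == self.cls.lookup(slot):
            self.overrides.pop(slot, None)
        else:
            self.overrides[slot] = value


# Hypothetical example classes and values.
vehicle = ClassObject("Vehicle", slots={"wheels": 4, "color": "grey"})
truck = ClassObject("Truck", base=vehicle, slots={"wheels": 6})
my_truck = Instance(truck, {"color": "red"})
```

Here `my_truck.get("wheels")` is answered by `Truck`, `my_truck.get("color")` by the object itself, and setting the color back to grey removes the override so the object shrinks again.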

Figure 3: Example of Object Layout

Object Layout

When we come to implementing our object database, we're probably going to end up with something like Figure 3. I won't say it's going to look exactly like that, but there will be some similarity.

Each object will need to contain information sufficient to track down its class definition (inheritance, class methods, property defaults, etc.), i.e. both the details of the class and a good idea of which node to talk to in order to obtain those details. We also need some information to give us an idea of how up to date we are in order to specify what updates we're interested in. We can use a system of revision information which may be as simple as a revision serial number, or it may involve a timestamp of some sort, or even both. Note that time on the Internet is a problem all by itself.

Some properties of an object can be updated independently of other properties, while others may need to be marked as coherent with other properties. Some may even be marked as not needing updating at all, e.g. properties which are always the result of computations alone.

It's likely that we'll need to record both the locally computed (predicted) value of a property and the latest news (received indirectly from the owner) of its arbitrated value. This allows us to pass on this news. We'll also need to know whether we own the object and, if not, who does (or who the last known owner was).

Note that the local storage requirement of an object will be larger than the amount of data required to transmit some of its details. There are many ways of optimizing the communication overhead. For example, if the receiver communicates the extent to which it is up to date, then the sender only needs to send more recent information.
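
As a rough illustration of that last optimization, the sketch below assumes the simplest form of revision information, a per-object serial number, with each property remembering the revision at which it last changed. `ObjectRecord`, `set` and `delta_since` are hypothetical names, and a real system might well use timestamps instead of (or as well as) serial numbers.

```python
class ObjectRecord:
    """Sender-side record of an object's properties (illustrative only)."""

    def __init__(self):
        self.revision = 0      # revision serial number of the whole object
        self.values = {}       # property name -> current value
        self.changed_at = {}   # property name -> revision of its last change

    def set(self, name, value):
        self.revision += 1
        self.values[name] = value
        self.changed_at[name] = self.revision

    def delta_since(self, receiver_revision):
        # Send only properties changed after the revision the receiver reports.
        return {
            name: self.values[name]
            for name, rev in self.changed_at.items()
            if rev > receiver_revision
        }


record = ObjectRecord()
record.set("position", (0, 0))    # revision 1
record.set("cargo", "timber")     # revision 2
record.set("position", (5, 2))    # revision 3
update = record.delta_since(2)    # receiver says it is up to date as of revision 2
```

A receiver that reports revision 2 is sent only the new position; one that reports revision 0 is sent everything.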

In case you're concerned about the local storage requirement, recall that one of our guiding objectives is to prioritize the reduction of communication overheads over and above any reduction of storage or processing overheads.

Values

Values are either immediate values or references to larger data elements that are held in an appropriate repository adjoining the object store. Note though that all values are immutable (constant); the object (including the class object) is the only mutable entity that the system deals with. Series of values may be created, but once created they remain constant (until they are destroyed when their reference count reaches zero).

This allows us to easily refer to large amounts of constant data that many players already have. For example, it is likely that someone could produce a DVD-ROM containing a snapshot of the common textures and geometry that exist in the system. All objects that use these only need to transmit the references to such constant data. Of course, if a node doesn't have the data available then it must download it, but this can be done at a relatively low priority (a lesser level of detail object is likely to be sufficient in the interim).

Note that large values may only be deliberately destroyed if they have never been communicated outside the node. A similar policy exists with respect to IDs for objects, series, etc. Local IDs can be used for greater convenience until something is communicated outside the node, in which case globally unique IDs must be used. NB they can still be tokenized in cases where they are mentioned several times in a message. Of course, intermediate values that arise within a computation do not need to be stored. It is only when persistence is required that values need to be written to persistent storage.
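
Tokenizing IDs within a message could be as simple as the following sketch: each distinct globally unique ID is written once in a table, and every mention becomes a small index into that table. The function names and the ID strings are made up for illustration, and no particular wire format is implied.

```python
def tokenize_ids(ids):
    """Return (table, tokens): each distinct ID appears once in table,
    and tokens holds one table index per original mention."""
    table = []
    index = {}
    tokens = []
    for gid in ids:
        if gid not in index:
            index[gid] = len(table)
            table.append(gid)
        tokens.append(index[gid])
    return table, tokens


def expand_tokens(table, tokens):
    """Inverse of tokenize_ids, as performed by the receiving node."""
    return [table[t] for t in tokens]


# Hypothetical globally unique IDs mentioned several times in one message.
mentions = [
    "node17/obj/000a41",
    "node02/obj/00ff03",
    "node17/obj/000a41",
    "node17/obj/000a41",
]
table, tokens = tokenize_ids(mentions)
```

The long IDs cross the wire once each, and the receiver recovers the original sequence with `expand_tokens(table, tokens)`.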

Operations

Some operations of an object may be marked for execution upon a particular standard event, e.g. when a new object instance is generated, or when the containing object arrives at a node. I doubt it would be prudent to allow an operation to execute upon being flushed from the cache, however, as the object is implicitly of least interest, and any further behavior on its part is unlikely to be useful.

Aspects of Distribution

To permit the game code to be aware to some extent of the distributed nature of the underlying system, there may be a need to mark some operations as operable only if the object is owned, or not owned. In addition, some operations could be marked as auto-forwarded, i.e. the call is forwarded to the owner of the object and executed on that object, with the result returned. These could be blocking (wait for return), or non-blocking (result ignored). Such things may require different underlying communications strategies, but as long as the game developer understands what they're doing, such low-level controls may come in handy sometimes, e.g. in achieving synchronization where it's critical.
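
One way to picture such markings is as decorators on operations. The sketch below is only that, a picture: `owned_only`, `auto_forwarded`, `DistributedObject` and `Door` are invented names, and the "network" is faked by giving each replica a direct reference to the owner's copy. Only the blocking form is shown; a non-blocking variant would send the call and discard the result.

```python
import functools


class NotOwnerError(Exception):
    """Raised when an owner-only operation is called on an unowned copy."""


def owned_only(method):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if not self.is_owned:
            raise NotOwnerError(method.__name__)
        return method(self, *args, **kwargs)
    return wrapper


def auto_forwarded(method):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self.is_owned:
            return method(self, *args, **kwargs)
        # Not the owner: forward the call and block until the result returns.
        return self.forward(method.__name__, args, kwargs)
    return wrapper


class DistributedObject:
    def __init__(self, owner_copy=None):
        self.owner_copy = owner_copy  # None means this node owns the object

    @property
    def is_owned(self):
        return self.owner_copy is None

    def forward(self, name, args, kwargs):
        # Stand-in for a network round trip to the owning node.
        return getattr(self.owner_copy, name)(*args, **kwargs)


class Door(DistributedObject):
    def __init__(self, owner_copy=None):
        super().__init__(owner_copy)
        self.is_open = False

    @auto_forwarded
    def toggle(self):
        self.is_open = not self.is_open
        return self.is_open

    @owned_only
    def arbitrate(self):
        return "arbitrated"


owner = Door()
replica = Door(owner_copy=owner)
result = replica.toggle()  # executed on the owner's copy, result returned
```

After the call, the owner's door is open while the replica's local state is untouched until an update arrives from the owner.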

Persistence

Remember that the persistent storage system is a limited resource. A policy similar to 'least recently used' will remove objects or values when space runs out. In this case it will be an 'of least current interest' policy. When a property value is missing, the class default for that property is used, and null is used for missing values resulting from a computation (rare). When an object is missing, a default instance of the same class is used. When a class method is missing, the base class's method is used. Ultimately a null value is used. Generally, the best default is used in the event of missing information.

Whilst one could create diagnostic tools to catch such events, there really isn't any point in alerting the user or trying to do any recovery, because these are likely events and there's no remedy available in any case. You can't restart the system or perform a roll-back. You simply have to assume that such missing data only occurs in relation to particularly uninteresting objects. For example, the lush mahogany texture map may be missing, but then the default wood texture may be used instead.

Naturally, it is up to the games programmer to utilize the inheritance facility to create a cascade of ever more sophisticated detail, i.e. define how a simple object property of wooden is part of a hierarchy that at some point may be flat-shaded brown, but at another point is highly polished mahogany. Given that objects could have their base properties implicitly prioritized for distribution over and above their more derived properties, this can help reduce distractions caused by degradation in simulation fidelity of objects at the periphery of one's area of interest (it's better for a distant animal to appear in the correct color than in an arbitrary one if its fur details were too big to download fast enough). Similarly, it's better for a distant vehicle to have its general vehicular properties downloaded before its specific properties (behavior, damage record, cargo, current operating parameters, etc.).
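
A toy version of the 'of least current interest' policy, together with the quiet fallback to a default, might look like this. `InterestCache` and the interest scores are assumptions of mine; how interest is actually measured is left entirely open here.

```python
class InterestCache:
    """Bounded store that evicts whatever is currently of least interest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}      # key -> stored value
        self.interest = {}   # key -> current interest score (higher is kept longer)

    def put(self, key, value, interest):
        self.items[key] = value
        self.interest[key] = interest
        while len(self.items) > self.capacity:
            # The victim may even be the item just added, if nothing is less interesting.
            victim = min(self.interest, key=self.interest.get)
            del self.items[victim]
            del self.interest[victim]

    def get(self, key, default=None):
        # No alert and no recovery: a missing entry quietly yields the default.
        return self.items.get(key, default)


cache = InterestCache(capacity=2)
cache.put("texture/mahogany", "mahogany.tex", interest=0.1)
cache.put("texture/wood", "wood.tex", interest=0.9)
cache.put("texture/grass", "grass.tex", interest=0.5)   # evicts the mahogany

wood = cache.get("texture/wood")
texture = cache.get("texture/mahogany", default=wood)   # falls back to plain wood
```

The mahogany texture has gone, so the lookup degrades to the default wood texture without any error being raised.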

Ownership

Nodes come to own objects because heuristics have determined them to be the most suitable owners, and these heuristics obviously have to be contrived to encourage ownership by nodes that are interested in the objects and have enough resources to do a good job of modeling them. In other words, any glitched modeling that occurs is very likely to relate to uninteresting, and thus unowned, objects. Therefore, we can expect that such erroneously modeled objects will be overridden by incoming updates of the objects' state from the owner.

Communication

Remember that most systems have a hardwired distinction between properties necessary for the visualization of an object (position, orientation) and those necessary for the behavioral modeling of an object, communicating the former (with dead-reckoning) and not communicating the latter. In our case this distinction is not hardwired but determined by the game's designer; moreover, it is done on a priority basis. This means that if the behavioral properties are distributed at a lower priority than the salient properties (position, orientation), then at least these salient details will be communicated. When the behavioral properties get to us (as the object becomes more interesting) we obtain effective dead-reckoning. Perhaps not the kind of dead-reckoning where the server is aware of the client's prediction algorithm (and duplicates it) and then only needs to advise the client when its estimation diverges too much. Nevertheless, as both parties in our case are expected to perform the same modeling, there is still some potential for prioritizing communication according to the computed value's distance from the arbitrated value, though the game designer would have to determine the precise relationship, if any. Perhaps this would best be left as an empirical research exercise.
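
Purely as a thought experiment, prioritizing by the computed value's distance from the arbitrated value could be sketched as below. It assumes positions are simple coordinate tuples and that plain Euclidean distance is the measure, both of which are my assumptions; the real relationship, if any, is for the game designer to determine.

```python
import math


def divergence(predicted, arbitrated):
    """Euclidean distance between the locally computed and arbitrated values."""
    return math.dist(predicted, arbitrated)


def send_order(updates):
    """updates: list of (object_id, predicted, arbitrated) tuples.
    Returns the object IDs ordered with the most divergent first."""
    ranked = sorted(updates, key=lambda u: divergence(u[1], u[2]), reverse=True)
    return [object_id for object_id, _, _ in ranked]


# Hypothetical objects with predicted and arbitrated positions.
pending = [
    ("crate", (0, 0), (0, 1)),
    ("car", (0, 0), (3, 4)),
    ("tree", (5, 5), (5, 5)),
]
order = send_order(pending)
```

The car, whose prediction is furthest from the arbitrated position, is sent first, while the tree, which was predicted exactly, can wait.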

Perhaps I should note here that when you have a huge virtual world, it gets so big that the client software cannot be forewarned of all the content that is likely to come its way. A player can't be required to upgrade their software just because someone elsewhere has invented a new vehicle type. We have to design the system such that the client can obtain information about how to model a new object of which it was previously unaware. Not only will there be more objects than a single computer can store, but there will be more classes of objects than a single computer can store the modeling details of.

Like the Web, the system has to cope with live and continuous development of the underlying software, the game design, the game content, and the game state. It cannot be shut down, reset, or suspended.

________________________________________________________

A Dynamic Hierarchy