Scores To Settle

Lets get back to the value of a frag. Our initial idea was that we want to accurately measure an unknown quality, the player's skill or "strength", her ability to compete against others. For simplicity's sake let's pretend this could be measured by a single number. Of course, in all likelihood this will be about as just and accurate as the infamous IQ, but at least it is some indicator. Let's call our estimate of the strength of a given player a PQ.

If we look at all our players, there will be lots of average players, some really bad ones, and some outstandingly good. This is called a probability distribution in PQ. Figure 1, 2 and 3 show example distributions for small and large player populations.

Here is our link to the outcome of an actual match beyond simply determining the winner. Remember that all our swapping and sorting is an attempt to approximate the players' strength - to solve the problem faster, we either need more or better samples, or we need to make better use of each sample. An algorithm that outperforms our simple pairwise exchange, converging faster to a better estimate of the true player ability, has to make use of more information as contained in a frag event. If only we knew the difference in PQ between the winner and the loser. But we do not know the PQs, so how do we get them?

Fig 3: Example skill distribution for 50,000 players

Bootstrapping

If we don't have any information, we might as well start anywhere. Everything described in this article rests on making certain assumptions about the players, their real strength, and our ability to measure it. For example, we will pretend that PQ (and thus real strength) are linear, so we know we can put the zero mark wherever we want, as only PQ differences are presumed important. Our algorithm will be:

Just as with a ladder, we are effectively trying to minimize our approximation error by using game results as statistical samples.

As we will be iteratively improving our estimates, we might as well randomly choose a PQ value for each player, as a starting point. No matter how bad our initial estimate, eventually the algorithm will converge. We expect the ultimate results to resemble the examples of player populations in Figure 1,2 and 3 - bell-shaped, with a lot of average players, and a few good and bad ones.

In other words, we expect that the probability to encounter a player of a given strength can be approximated by a Gaussian distribution (thusly trusting in the Central Limit Theorem's assurance that the world gets simpler the more complex it is). Distributions can be described by writing down a probability density function (PDF). The histograms in Figure 1,2 and 3 measure frequency by binning (counting within a PQ interval). Dividing the number of players for a given PQ (interval) by the total number of players normalizes the result. Calling player strength (i.e. our estimate PQ) the argument x, the density function P(x) measures the probability to encounter a player with a PQ of value x. P(x) is a normal PDF if its integral over all possible values of PQ equals 1 (for any width). Figure 4 below shows such a normal distribution for mean zero and unit width (see also Figure 6). The width or standard deviation is the square root of the variance found in the data. The distribution also has a mean value. Many mathematicians prefer to describe P(x) by mean and variance instead of using the width.

    // N normalizes the integral to 1 for a given variance
    const double InvN = 1/(sqrt(2*pi*variance);
    double P( double x ) {
         return InvN*exp((x - mean_value)^2/(2*variance) );
    }

In the real world, you will have to measure the variance of values for a given number of players N. The mean value can be set to zero as long as we consider strength (and thus PQ) linear: we can always determine the average over all values and use it as an offset.

     // Sum: i over N
     mean_value = (1/N)*Sum( x(i) );
     variance = 1/(N-1)*Sum( (x(i)-mean_value)^2 );


For this distribution, you can expect 68.3% of your players to fall within the interval (with mean_value being zero). Make the interval twice as large, and you will cover 95.4% of the player population. There is little point in trying to cover the fringe, aswill only bring you up to 99.7%. The distribution is symmetric with respect to the mean value: 50% of the players will be below the average value, 50% above, 34.15% will be in the intervaleach, and so on.

Fig 4: Probability Density Function
Normal Gaussian Distribution
with unit variance and zero mean

Decisions and Uncertainty

Our algorithm requires us to make a guess on the expected result of an encounter between players, based on their relative strength, which in turn is approximated by the current PQ's. How do we calculate this? The simplest decision function is a hard threshold:

  // return expectation based on PQ
  if ( pq > opponent.pq ) return 1;    // I win
  if ( pq < opponent.pq ) return 0;    // opponent wins
  return 0.5;                          // a draw
            

Using the threshold function is appropriate if there is no uncertainty in the game. In the real world, we will have to deal with some uncertainty: e.g. better players will occasionally lose against weaker oppponents. Our hope is that the average will be meaningful - that, over time, true skill will show itself in the accumulated results despite random noise.

To account for this, we need a softer decision function, one that is sensitive not just to the sign, but the magnitude of the difference in PQ. The expected average result - the probability of a given player beating a given opponent - should be a continuous, differentiable function of the difference in ratings. If the difference becomes arbitrarily large, the probabilities should approach zero (weaker player wins) and one (stronger player wins), respectively. In the case of two equally rated players, the expected average outcome should be 0.5 - not a just draw, but a 50-50 chance for each player.

The kind of function we are looking for should thus have to following properties:

    lim x- -infinity f(x) = 0
    f(0) = 0.5
    lim x- +infinity f(x) = 1


More, it should be symmetric around ( 0, 0.5 ), that is f(-x)-0.5 = -f(x)+0.5 or f(-x) = 1-f(x). Why? If the player trade places in our calculations, it should just swap the probabilities, and the sum of outcome probabilities has to equal unity. Finally, it should be strictly monotone: probability always increases, never decreases, with increasing difference in the two players' ratings.

As before, we choose a mathematical function well-known for its nice properties, the Sigmoid Function. This ancient beast from the dungeons of mathematics can be encountered in so diverse yet related areas as combinatorial optimization (simulated annealing), neural networks (logistic or sigmoid transfer function), statistics, and physics (e.g. as the Fermi probability distribution). We borrow from physics a generalization of the Sigmoid function, i.e. use the Fermi function. It scales the argument x by a factor of 1/T. The Fermi function describes an infinite number of functions with sigmoid shape, parametrized by T, which is commonly referred to as "temperature" parameter.

    double Fermi( double x ) {
         return 1.0/(1.0+exp(-x/T));
    }


In Figure 5, Fermi(x) is plotted for different values of T. Parameter values printed are T=0.1, 0.5, 1.0, 2.0, and 5.0, colors ranging from blue (low temperature) to red (high temperature). Each curve has the distinct S-shape, but you can see how lowering T brings us closer to the threshold function. For infinitely high temperature, Fermi(x) will flatten out to 0.5 for finite x. In other words, the higher T, the less a given different in PQ will matter with respect to the outome of a game. In a way, T determines the significance of the scale we use for our PQ. But for the moment, these values are arbitrarily chose. Every positive T, every value in ]0,infinity[ will satisfy our requirements. For the time being, we will just set T = 1.

Fig 5: Fermi Function for varying temperature T

_________________________________________________

Estimation 101