Estimation 101
We have defined PQ as our estimate of player strength, we have chosen a density function that we expect to describe our player population, and we have chosen a function to predict probable outcomes for a given player-player encounter. All that is left to do is decide how to adjust our PQ estimates when we learn that we are off the mark.
// result: 1 means I won, 0 means opponent won, 0.5 is a draw
const double T        = 1;   // "temperature" of the Fermi function
const double max_gain = 1;   // maximum rating change per encounter

void ChangeRating( double *my_rating, double *opponent_rating, double result )
{
    double delta_rating  = *opponent_rating - *my_rating;
    double expectancy    = Fermi( delta_rating, T );   // my predicted chance of winning
    double rating_change = max_gain*( result - expectancy );

    *my_rating       += rating_change;
    *opponent_rating -= rating_change;
}
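The code relies on the Fermi function introduced earlier. As a reminder - and as a minimal sketch, assuming the form f(delta, T) = 1/(1 + e^(delta/T)) - it can be written as:

#include <math.h>

// Minimal sketch of the Fermi function, assuming the form
// f(delta, T) = 1/(1 + exp(delta/T)): with delta_rating defined as
// opponent_rating - my_rating, the expectancy approaches 1 when I am
// much stronger (delta very negative) and 0 when I am much weaker.
double Fermi( double delta_rating, double temperature )
{
    return 1.0/( 1.0 + exp( delta_rating/temperature ) );
}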
As an example, take a one-on-one deathmatch situation. For simplicity, we treat each single kill, or "frag", as one encounter. Adjusting ratings per frag is likely to give the fastest convergence towards a good estimate; it will also work reasonably well for free-for-all deathmatch, and it avoids issues of normalizing game results (e.g. 2:1 frags vs. 20:10).
Beating your equal, you will get 50% of the maximum possible gain (0.5 out of 1). If you lose the match, your loss will be 50% of the maximum possible loss (-0.5 out of -1). Diminishing losses and gains occur when a very highly ranked player crushes a bottom-ranked one; maximum gains and losses are applied in the opposite case of a weak player beating a supposedly much stronger one. Take a look at the expression
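Plugging numbers into ChangeRating() makes this concrete. A usage sketch with two hypothetical, equally rated players:

// Two equally rated players; A frags B once (hypothetical values).
double rating_a = 0.0;
double rating_b = 0.0;

ChangeRating( &rating_a, &rating_b, 1.0 );
// expectancy was 0.5, so rating_change = 1*(1 - 0.5) = 0.5:
// rating_a is now +0.5, rating_b is now -0.5.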
expected_gain = (expectancy)*(1-expectancy)
This expression resembles the derivative of the Sigmoid function. It calculates my normalized expected gain: the probability-weighted improvement in my rating. The first factor is simply my likelihood of winning the encounter, while the second factor is the reward earned at that likelihood: the more certain my victory, the smaller the reward. The expected gain peaks at an expectancy of 0.5. Keeping an eye on both expected risk and expected reward, the player is thus guided to seek out presumed equals.
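A quick (hypothetical) driver tabulating this product shows the peak at 0.5:

#include <stdio.h>

// Tabulate the normalized expected gain for a few win expectancies;
// the product e*(1-e) peaks at e = 0.5 and vanishes towards 0 and 1.
int main( void )
{
    double e;
    for ( e = 0.1; e < 1.0; e += 0.2 )
        printf( "expectancy %.1f -> expected gain %.2f\n", e, e*(1.0 - e) );
    return 0;
}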
No Pain, No Gain
The "pure" algorithm presented here differs from the Elo chess rating and the IBL Bolo rating in several ways. Most notably, we have set free parameters to 1 (e.g. T) or 0 (e.g. mean) whereever possible. This might not be justified for all such parameters. Take max_gain, the maximum gain per sample/encounter. We are free to choose max_gain=1. However, T and max_gain are not independent, and both combined express another expectation about our game.
How are these values connected? In a sense, for a given value of T and a given player distribution, max_gain depends on the amount of randomness in your game. Take two players of equal rank who have just met for the first time, and assume player A beat player B. If we are now confident that player A will beat B every time, max_gain should be large enough to make sure that the new ratings for A and B reflect this. A single win between equals moves each rating by 0.5*max_gain in opposite directions, widening the gap between them by max_gain - and that gap should be large enough that the new estimate for A beating B is sufficiently near unity.
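To get a feel for the scale, one can invert the Fermi function (again assuming the form f(delta, T) = 1/(1 + e^(delta/T))) to find the rating gap at which a favorite's expectancy reaches a given confidence p - a hypothetical helper:

#include <math.h>

// Hypothetical helper: the rating gap a favorite needs before the
// Fermi function predicts a win with probability p. Inverting
//     p = 1/(1 + exp(-gap/T))
// gives gap = T*ln(p/(1-p)); for p = 0.99 that is roughly 4.6*T.
// So if a single upset between equals should establish near-certainty,
// max_gain would have to be on the order of 4.6*T.
double GapForExpectancy( double p, double T )
{
    return T*log( p/(1.0 - p) );
}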
Yet, we do not know what difference in PQ indicates certain victory. We would need the width of the distribution to determine this. For simplicity, set max_gain to 1 for starters and see how it works out.
The Bell Tolls
It is Day One of our hypothetical Online Gaming Service, and the worldwide tournament has just started. Players have registered, and there have to be initial ratings. Usually, for purely psychological reasons, all players would get a positive, fixed rating to start with. But this is, essentially, just an offset. The bell-shaped distribution (as well as any real-world population it could safely approximate) will be centered around the average strength, and whether we call that average zero or 10,000 is merely a matter of caressing egos - our algorithm only cares about the difference delta_rating.
In our database we set each player to zero rating on Day One. Over time, winners will increase their rating, while those who lose will end up with smaller (i.e. negative) values. As we set both T and max_gain to 1, the actual rating values (the width of the distribution) will develop based on the skill differences found among your players. If every player can beat every other player occasionally, the distribution will be comparatively narrow - all players will have ratings within the range where the Fermi function changes significantly. If, however, we can find players A, B, and C, where A will always beat B and C, and B will always beat C, then the width of the distribution will be larger - if B is an average player, then A and C will have ratings in the range where the Fermi function is asymptotic, near zero or near unity.
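Two properties make the zero starting point safe: the update is zero-sum, so the sum (and mean) of all ratings never moves from its Day One value; and only differences enter the Fermi function, so a common offset cancels out. A trivial sketch with hypothetical values:

#include <stdio.h>

// Sketch: a fixed "ego" offset added to every rating cancels out of
// delta_rating, leaving all predictions unchanged (hypothetical values).
int main( void )
{
    double a = 2.0, b = -1.0;   // ratings centered around zero
    double offset = 10000.0;    // flattering Day One offset

    printf( "delta without offset: %g\n", b - a );
    printf( "delta with offset:    %g\n", (b + offset) - (a + offset) );
    return 0;
}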
Thus the width of the distribution we found, relative to the temperature parameter T, measures how discriminating the game is - all with respect to a given set of players and their skills. We can only measure player skill relative to other players. The more players participate, the better our model assumptions will fit. In the low and high "tails" of the distribution our model will fail - in the real world, there is a limit on how badly a player can perform, and the better a player, the more likely she will join. There are density functions describing such distributions, but in most cases, there should be little to gain from adding this complexity to your model.
Now, after a couple of months, when every player has played sufficiently often against a sufficient number of opponents, your rating database will approach some kind of equilibrium. If there is not much change in the player population, your rating distribution will give you an increasingly good approximation of the range of player skills you are dealing with. Your model will also aid you in predicting the outcome of matches. The more randomness in your game, the less reliable the prediction for a single match - averaged over a large number of encounters, however, skill difference will eventually show, and the distribution of results for each player pairing can also serve as a measure of how discriminating your game actually is. You could also use this information to adjust max_gain for better convergence.
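None of this needs to be taken on faith - it is cheap to simulate. The following sketch is entirely hypothetical: players get fixed hidden skills, meet at random, outcomes are drawn from the Fermi expectancy applied to the true skill difference, and ratings are updated per encounter exactly as in ChangeRating(). The reported width of the final rating distribution illustrates the equilibrium discussed above; varying max_gain shows its effect on convergence:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define NUM_PLAYERS 100
#define NUM_GAMES   100000

double Fermi( double delta, double T ) { return 1.0/( 1.0 + exp( delta/T ) ); }

int main( void )
{
    double skill[NUM_PLAYERS];    /* hidden "true" strengths */
    double rating[NUM_PLAYERS];   /* our running estimates   */
    const double T = 1.0, max_gain = 1.0;
    double mean = 0.0, var = 0.0;
    int i, g;

    srand( 1234 );
    for ( i = 0; i < NUM_PLAYERS; i++ )
    {
        skill[i]  = 3.0*( (double)rand()/RAND_MAX - 0.5 );  /* arbitrary spread */
        rating[i] = 0.0;                                    /* Day One          */
    }

    for ( g = 0; g < NUM_GAMES; g++ )
    {
        int a = rand() % NUM_PLAYERS;
        int b = rand() % NUM_PLAYERS;
        double p_a, result, change;

        if ( a == b )
            continue;

        /* draw the outcome from the hidden skills...       */
        p_a    = Fermi( skill[b] - skill[a], T );
        result = ( (double)rand()/RAND_MAX < p_a ) ? 1.0 : 0.0;

        /* ...and update the estimates as in ChangeRating() */
        change = max_gain*( result - Fermi( rating[b] - rating[a], T ) );
        rating[a] += change;
        rating[b] -= change;
    }

    /* the width of the final distribution, relative to T,
       measures how discriminating the (simulated) game is  */
    for ( i = 0; i < NUM_PLAYERS; i++ ) mean += rating[i];
    mean /= NUM_PLAYERS;
    for ( i = 0; i < NUM_PLAYERS; i++ )
        var += ( rating[i] - mean )*( rating[i] - mean );
    printf( "rating width (std dev): %g\n", sqrt( var/NUM_PLAYERS ) );
    return 0;
}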
[Fig 6: Probability Density Functions]