 Compressing Reality

# Stats and Chess

I’m not great at chess – by which I mean that I’m almost certainly below the average ranked player (I’m not even ranked so I’m not sure on the truth of this point). However, I think this analysis I did is really interesting, so here it is.

## Gathering Data

A quick note before I begin. In chess game theory, we distinguish between a move and a ply. A move is when both players complete an action. A ply is when a single player completes an action. Thus, in chess, there are always two plies per move.

I wrote a program that played through hundreds of thousands of master-level chess games. I ignored any games in which either player had a rating less than 2500. Whenever a position had no captures in the 6 prior plies or the 6 following plies, that position was saved. This was to ensure that I was only saving relatively stable positions, as opposed to a position just after, say, someone captured my rook and I was about to recapture.

A problem came up with this. Some chess games have quieter periods, whereas other chess games have captures more evenly spaced throughout. The games with longer quieter periods were given undue weight. Thus, I further required that all saved positions were at least 3 moves away from all other saved positions.
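Here is a minimal sketch of that filter, not the original program. It assumes each game has already been reduced to a list of booleans saying whether each ply was a capture. The function name, the requirement that both 6-ply windows fit entirely inside the game, and the greedy "keep the earliest position" rule for spacing are all assumptions made for illustration.

```python
def select_quiet_positions(is_capture, window=6, min_gap=6):
    """Return indices of quiet, well-spaced positions in one game.

    is_capture[i] is True if ply i (0-indexed) was a capture.
    Position p is the board after plies 0..p-1 have been played, so its
    prior plies are p-window..p-1 and its future plies are p..p+window-1.
    min_gap=6 plies corresponds to the 3-move spacing rule.
    """
    n = len(is_capture)
    saved = []
    # Assumption: both windows must lie fully inside the game.
    for p in range(window, n - window + 1):
        if any(is_capture[p - window:p + window]):
            continue  # a capture falls in the prior or future window
        if saved and p - saved[-1] < min_gap:
            continue  # too close to the previously saved position
        saved.append(p)
    return saved


# A 30-ply game with a single capture on ply 15.
game = [False] * 30
game[15] = True
quiet = select_quiet_positions(game)
```

For this toy game, only one position from the quiet stretch before the capture and one from the stretch after it survive, which is the point of the spacing rule.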

From these saved positions, I gathered a wide variety of data on material, pawn structure, and mobility. Before I get into what I found, I need to make one last note on mobility. The mobility calculation was based on how hanging piece analysis is done.

Take this position, for instance,

Each king has 5 moves. What about the white pawn? The way I counted mobility, that pawn has 2 “moves” – the capture left and the capture right. This means that each non-edge pawn is counted as having two moves, while each edge pawn is counted as having one move. If that seems weird, just wait for this: I counted the white Bishop’s mobility as 2. The first “move” is defending the white pawn. The second “move” is that the Bishop supports the white pawn’s right capture.

This may strike you as an odd way to count mobility, but I believe this is a better way to measure control of the board.
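To make the counting rule concrete, here is a small sketch for the two simplest cases, pawns and knights. It counts the squares a piece attacks whether or not they are occupied, which is one reading of the description above. The function names and the 0-indexed `(file, rank)` coordinates are illustrative assumptions, and the Bishop's counting in the example position is not reproduced here.

```python
def on_board(file, rank):
    """True if (file, rank) is a square on an 8x8 board (0-indexed)."""
    return 0 <= file < 8 and 0 <= rank < 8


def pawn_mobility(file, rank, white=True):
    """Count a pawn's capture squares: left capture and right capture."""
    step = 1 if white else -1
    targets = [(file - 1, rank + step), (file + 1, rank + step)]
    return sum(on_board(f, r) for f, r in targets)


KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def knight_mobility(file, rank):
    """Count the squares a knight attacks, occupied or not."""
    return sum(on_board(file + df, rank + dr) for df, dr in KNIGHT_JUMPS)
```

A pawn on the d-file (`file=3`) gets 2, a pawn on the a-file (`file=0`) gets 1, and a knight in the middle of the board reaches the maximum of 8.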

## Statistical Analysis

I removed all positions in which there were 14 or more pawns, to avoid positions in standard openings. Next, I split the positions into 2 groups. The END group included all positions with 6 non-pawn pieces or fewer. The START group included all positions with 8 non-pawn pieces or more.
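The grouping rule can be sketched as follows. The function name is hypothetical, and whether kings count as "non-pawn pieces" is not stated in the text, so the caller has to decide what goes into the count. Note that, read literally, a position with exactly 7 non-pawn pieces lands in neither group.

```python
def classify_position(num_pawns, num_non_pawn_pieces):
    """Return "START", "END", or None for positions that are not analyzed."""
    if num_pawns >= 14:
        return None  # likely still in a standard opening
    if num_non_pawn_pieces <= 6:
        return "END"
    if num_non_pawn_pieces >= 8:
        return "START"
    return None  # exactly 7 non-pawn pieces: in neither group as described
```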

I then analyzed these with logistic regression to determine how strongly different positional features correlated with victory. For the statistical analysis, I required a significance level of 0.1%. It is very important that I note here that correlation does not imply causation.
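For reference, the general form of a logistic regression model is shown below. The encoding is an assumption, not something stated above: here $x_i$ is taken to be White's count of feature $i$ minus Black's, and the outcome is a win for White.

```latex
P(\text{White wins}) = \frac{1}{1 + e^{-z}},
\qquad
z = \beta_0 + \sum_i \beta_i x_i
```

Each coefficient $\beta_i$ is the change in the log-odds of winning per unit of feature $i$, which is why the ratio of two coefficients can be read as an exchange rate between features.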

The full results are at the end. Here are the material and mobility coefficients:

## Material and Mobility

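| Feature | Start | End |
| --- | --- | --- |
| Pawn | 0.78 | 0.86 |
| Knight | 2.38 | 2.60 |
| Bishop | 3.12 | 2.99 |
| Rook | 4.17 | 4.72 |
| Queen | 8.85 | 9.41 |
| Bishop Pair | 0.43 | 0.67 |
| Rook Pair | -0.36 | -0.24 |
| Pawn Mobility | 0.11 | 0.07 |
| Knight Mobility | 0.10 | 0.03 |
| Rook Mobility | 0.07 | 0.02 |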

I’ve adjusted the results so that each factor’s weight can be measured in non-edge pawns. It’s important to note that for positions near the end, pretty much every factor becomes better at predicting the outcome. However, because I’m measuring importance in pawns, this fact is obscured in this table of results.
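One reading of "measured in non-edge pawns" that the table's numbers are consistent with: a non-edge pawn is the Pawn coefficient plus two units of Pawn Mobility, and that sum is the unit of 1.00. This is an inference from the published coefficients, not a description of the original normalization code, and the function name is hypothetical.

```python
def non_edge_pawn_value(pawn, pawn_mobility, moves=2):
    """Pawn coefficient plus mobility for a pawn with two capture squares."""
    return pawn + moves * pawn_mobility


start_unit = round(non_edge_pawn_value(0.78, 0.11), 2)  # Start column
end_unit = round(non_edge_pawn_value(0.86, 0.07), 2)    # End column
```

Both columns come out to 1.0, which would also explain why the bare Pawn coefficient is below 1.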

The first thing that stood out to me was that the Knight was a mere 2.38 points in the beginning, while the Bishop was three quarters of a pawn higher. This can be explained by the importance of a Knight’s mobility. A Knight with 8 moves (the maximum possible) is worth 2.38 + 8 × 0.10 = 3.18 points, just above a typical Bishop.

The Rook’s mere 4.17 points is similarly misleading. With 14 moves (the maximum possible), a Rook moves up to 4.17 + 14 × 0.07 = 5.15 points in the beginning.
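The arithmetic behind these mobility-adjusted values can be sketched as below, using the Start coefficients from the table. The dictionary and function names are illustrative. The Rook figure of 5.15 corresponds to 14 moves, which is the most squares a rook can attack along its rank and file.

```python
# (base value, value per move) taken from the Start column of the table.
START = {
    "Knight": (2.38, 0.10),
    "Rook": (4.17, 0.07),
}


def value_with_mobility(piece, moves, table=START):
    """Base piece value plus the mobility coefficient times the move count."""
    base, per_move = table[piece]
    return base + per_move * moves


best_knight = round(value_with_mobility("Knight", 8), 2)
best_rook = round(value_with_mobility("Rook", 14), 2)
```

Here `best_knight` is 3.18 and `best_rook` is 5.15, matching the figures quoted above.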

The second thing to note is that not only is having a pair of Bishops worth 0.43 points, having a pair of Rooks is worth -0.36 points. From this we can conclude that:

1. A second Rook is somewhat redundant.
2. It’s not just that a second Bishop isn’t redundant – a second Bishop adds additional power. This is probably because once I take out my opponent’s light-squared Bishop, I can position my pieces on light squares, which the remaining Bishop can never attack, and vice versa.

## Pawn Structure

| Feature | Start | End |
| --- | --- | --- |
| Doubled Pawn | -0.17 | -0.23 |
| Pawn Island | -0.11 | -0.06 |
| Passed 4 | 0.00 | 0.14 |
| Passed 5 | 0.22 | 0.00 |
| Passed 6 | 0.55 | 0.70 |
| Passed 7 | 1.05 | 0.99 |
| Outpost 4 | 0.18 | 0.33 |
| Outpost 5 | 0.22 | 0.13 |
| Outpost 6 | 0.53 | 0.19 |

For “pawn islands”, I counted the number of empty columns with pawns on either side. I think most of this is self-explanatory. For outposts, I considered only knights. The number next to each passed and outpost coefficient indicates the rank that the passed pawn or the outpost is on.
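The pawn island count can be read in more than one way. The sketch below assumes that an empty column counts when there is at least one pawn somewhere to its left and at least one somewhere to its right, which amounts to counting the empty columns between the leftmost and rightmost pawn. The function name and the 0-indexed columns are illustrative.

```python
def count_island_gaps(pawn_files):
    """Count empty columns lying between one side's outermost pawns.

    pawn_files: iterable of 0-indexed columns (0 = a, 7 = h) holding
    at least one pawn of the side being scored.
    """
    files = set(pawn_files)
    if not files:
        return 0
    lo, hi = min(files), max(files)
    return sum(1 for f in range(lo, hi + 1) if f not in files)


# Pawns on a, b, d, g, h: columns c, e, and f are empty with pawns on both sides.
gaps = count_island_gaps([0, 1, 3, 6, 7])
```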

## King Safety

| Feature | Start | End |
| --- | --- | --- |
| Pawns in Front of King | 0.11 | -0.03 |
| King Threat | -0.07 | 0.06 |

I considered a pawn “in front of the king” if it was either directly ahead of the king or diagonally ahead of the king. Thus, this had a max value of 3.
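A minimal sketch of this count, assuming "ahead" means the three squares on the rank immediately in front of the king (which is what gives a maximum of 3). The function name and the 0-indexed `(file, rank)` coordinates are hypothetical.

```python
def pawns_in_front_of_king(king, pawns, white=True):
    """Count friendly pawns directly or diagonally ahead of the king.

    king:  (file, rank) of the king, 0-indexed.
    pawns: set of (file, rank) squares holding that side's pawns.
    """
    king_file, king_rank = king
    step = 1 if white else -1  # "ahead" is up the board for White
    return sum((king_file + df, king_rank + step) in pawns
               for df in (-1, 0, 1))


# White king on g1 behind pawns on f2, g2, h2.
shield = pawns_in_front_of_king((6, 0), {(5, 1), (6, 1), (7, 1)})
```

A king on the edge of the board can score at most 2 under this rule, since one of the three squares is off the board.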

For counting “king threats”, I counted the number of moves (see my definition of “mobility”) that hit squares in the 3x3 grid surrounding the king. I did not count checks.

## Summary

| Feature | Start | End |
| --- | --- | --- |
| White Elo | 0.01 | 0.00 |
| Black Elo | -0.01 | 0.00 |
| Pawn | 0.78 | 0.86 |
| Knight | 2.38 | 2.60 |
| Bishop | 3.12 | 2.99 |
| Rook | 4.17 | 4.72 |
| Queen | 8.85 | 9.41 |
| Bishop Pair | 0.43 | 0.67 |
| Rook Pair | -0.36 | -0.24 |
| Doubled Pawn | -0.17 | -0.23 |
| Pawn Island | -0.11 | -0.06 |
| Pawns in Front of King | 0.11 | -0.03 |
| King Threat | -0.07 | 0.06 |
| Pawn Mobility | 0.11 | 0.07 |
| Knight Mobility | 0.10 | 0.03 |
| Rook Mobility | 0.07 | 0.02 |
| Passed 4 | 0.00 | 0.14 |
| Passed 5 | 0.22 | 0.00 |
| Passed 6 | 0.55 | 0.70 |
| Passed 7 | 1.05 | 0.99 |
| Outpost 4 | 0.18 | 0.33 |
| Outpost 5 | 0.22 | 0.13 |
| Outpost 6 | 0.53 | 0.19 |