Pages

Thursday, September 10, 2009

TC TCO05 Round 2 D1 500 (PartyGame)

Category: Dynamic Programming, Probability
URL: http://www.topcoder.com/stat?c=problem_statement&pm=4635

The problem asks us to compute the expected number of moves required to reach from left-most side of a row to the right-most side. The row consists of a number of R's and G's which control what actions we can take. We are also given a N-sided dice which allows us to randomly (uniformly) go 1 to N places to the right. If we land on an R we have to go back to the position we rolled from (i.e. no change in position, but a penalty of a move is applied). If we land on a G cell then all is good and we are allowed to keep our new position. If it's impossible to move from the left-most cell to the right-most cell then return -1. Also note that you can "over-hit" the target, anything which takes us beyond the last rightmost cell is considered a win.

The first thing to come into mind is dynamic programming. Why? Given a particular cell the expected number of moves does not change and is only dependent on the cells which come to the right of it. Another hint, is that there are overlapping subproblems as a result - since the number of possible paths is exponential it only makes sense to not re-calculate the same cells over and over. To apply DP to the problem we have already noted that it is sufficient to represent the expected number of cells by itself, this forms a 1-dimensional DP problem. The order in which we calculate should be obvious - we need to know all cells to the right of a particular cell, so we need to calculate from rightmost to leftmost cell.

Now defining the actual recurrence relation is more tricky, given a set of transitions from a cell some of them re-direct them back to our cell - how do we handle this case? Let's start off by defining f(x) as the expected number of moves for cell x. Now since the probability transitions are equal, i.e. 1 / N. If the next transition we land on for a given jump is 'G' then the expected number of moves grows by (1 / N) * (f(y) + 1) where y is the next cell. If the next transition we land on is 'R' then we have to go back to our current position, so the expected number of moves actually grows by (1 / N) * (f(x) + 1), where x is our current cell. Using an addition of all these expected moves defines our f(x). Hence we are left with something of the form:

f(x) = (1 / N) * (f(x) + 1) + (1 / N) * (f(x) + 1) + ... + (1 / N) * (f(x) + 1) + (1 / N) * (f(y) + 1) + (1 / N) * (f(z) + 1) + ...

Where the first part are all the cell transitions which lead them back to the current cell. The second part is the "good" transitions, in the example above these corresponds to the cells y and z. Now we can combine the first part together:

f(x) = (a / N) * (f(x) + 1) + (1 / N) * (f(y) + 1) + (1 / N) * (f(z) + 1) + ...

Where there are "a" number of cell transitions which lead it back to the current cell (i.e. 'R' cells). We need to formulate it in the form of f(x) = ... where the right hand side does not contain any f(x) terms. This requires a basic mathematical manipulation:

(1 - (a / N)) * f(x) = (1 / N) * (f(y) + 1) + (1 / N) * (f(z) + 1) + ...
f(x) = ((1 / N) * (f(y) + 1) + (1 / N) * (f(z) + 1) + ...) / (1 - (a / N))

This let's us calculate the expected number of moves including transitions which lead us back to our current cell. Hence combining all the concepts above, we come up with the following DP implementation:

class PartyGame {
public:
   double numberOfMoves(vector <string>, int);
};

double dp[2511];

double PartyGame::numberOfMoves(vector <string> field, int N) {
   string totalField;
   for (int i = 0; i < field.size(); i++) {
      totalField += field[i];
   }
   double res = 0.0;
   dp[totalField.size()-1] = totalField[totalField.size()-1] == 'R' ? 0.0 : 1.0;
   for (int i = totalField.size()-2; i >= 0; i--) {
      if (totalField[i] == 'R') continue;
      dp[i] = 0.0;
      int totBad = 0;
      double moveGood = 0.0;
      for (int j = 1; j <= N; j++) {
         if (i + j >= totalField.size()-1) {
            // instant win
            moveGood += (1.0 / N);
            continue;
         }
         if (totalField[i+j] == 'R') {
            // bad move (return to self)
            totBad++;
         } else {
            moveGood += (1.0 / N) * (dp[i+j] + 1);
         }
      }
      // solve equation
      dp[i] = (((double) totBad / N) + moveGood) / (1 - ((double)totBad / N));
   }
   
   return res = dp[0] > 1e50 ? -1.0 : dp[0];
}

No comments:

Post a Comment