Pages

Monday, September 14, 2009

Google Code Jam Round 1C

Overview:
This was the last opportunity for qualifiers to advance into Round 2. Congratulations to everyone who has made it to Round 2! Bad luck to those that didn't, there's always next year!

This round was greeted with a set of interesting problems. Problem A was the easy problem in the set, and most competitors were able to solve the small input with some incorrect submissions for the large case. Problem B proved rather interesting and had multiple correct approaches. One of which required a little mathematical manipulation to derive a short and safe implementation, the other had the danger of precision issues but was based on a simpler concept without the use of much mathematics. Problem C was a fairly subtle DP problem in disguise but this did not deter the top competitors to solve all three problems in under 40 minutes.

Links:
Google Code Jam Statistics: http://www.go-hero.net/jam/
Google Code Jam Scoreboard: http://code.google.com/codejam/contest/scoreboard?c=189252
Google Code Jam Unofficial Scoreboard/Online Source Viewer (Thanks to Zibada on TC): http://zibada.ru/gcj/2009/round1a.html, http://zibada.ru/gcj/2009/round1b.html, http://zibada.ru/gcj/2009/round1c.html

Problem A: All Your Base
Category: Greedy
URL: http://code.google.com/codejam/contest/dashboard?c=189252#s=p0

The problem asks us to determine the minimum possible number we can derive from an unknown number representation system. To do this we must assign values to each of the characters present in the symbols. For example let's say we are given "cats" as an input we need to determine what the 'c' means numerically, what the 'a' means numerically etc. Since we want the smallest possible number, it suggests a greedy strategy of assigning what each letter means. In our case, we first see 'c' and therefore wish for this to be as small as possible. However, there is an additional constraint that no number starts with a 0 and since 'c' is the first symbol it cannot be 0. Therefore, the next obvious choice is to give it a value of 1. Next, we process 'a' and assign it a value of 0 since it hasn't been used yet and it's not the first symbol. Following this, 't' gets assigned 2 and 's' gets assigned 3.

Now the next task is to determine the base system the aliens work on, it can be easily shown we want the base as small as possible - yet large enough to allow us to use all the symbols. Therefore, the base system is simply the number of unique characters/letters in the alien number. We are then tasked with converting this to "human" form (i.e. decimal representation) which can be done iteratively like any base conversion algorithm. Implementation wise, we can keep track of our character assignments in a map data structure. When we encounter a letter, we first check whether or not we have seen the letter - if we haven't then we greedily assign it the next available (and smallest) number. If we have, then we simply skip over the character.

#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
#include <stack>
#include <queue>
#include <map>

using namespace std;

map<char,int> memo;

int main() {
   int nT; cin >> nT;
   string str;
   int k = 1;
   while (nT-- && cin >> str) {
      memo.clear();
      int next = 0;
      memo[str[0]] = 1;
      for (int i = 1; i < str.size(); i++) {
         if (memo.count(str[i]) == 0) {
            memo[str[i]] = next;
            next++;
            if (next == 1) next++;
         }
      }
      long long val = 0;
      int base = next == 0 ? 2 : next;
      for (int i = 0; i < str.size(); i++) {
         val = val * base + memo[str[i]];
      }
      cout << "Case #" << k << ": " << val << "\n";
      k++;
   }
   return 0;
}

Problem B: Centre of Mass
Category: Maths, Ternary Search
URL: http://code.google.com/codejam/contest/dashboard?c=189252#s=p1

This is a tricky problem to tackle due to precision issues, so it's safer to go for a closed-form solution. Basically, we are given N fireflies each starting off in different positions and each going their own directions with a constant velocity. We need to calculate the distance and time in which the centre of mass of the N fireflies are closest to the origin (0,0,0). We need to make one key observation to heavily simplify this problem. If we look closely, we can collapse all of the N fireflies into one single entity (this is heavily suggested by the notes section in the actual problem statement). If we only have a single point (i.e. the centre of mass) we simply need to find the point in time/distance in which it comes closest to the origin. With some visualisation we can see that the centre of mass movement is divided into two cases:

- There is no movement hence the earliest time and closest distance it is to the origin is at time = 0.
- There is movement (i.e. velocity) then the earliest time and closest distance is unique (as there would be a point in which it gets closer and closer and after the minimum distance it gets larger and larger).

With these two conditions, we have multiple approaches to solve the problem. The first approach is to use a ternary search over the set of possible times, as there is only one minima for the movement case - we are guaranteed that the algorithm will converge to the spot eventually. Ternary search works on functions which have only one extrema that is either strictly increasing then strictly decreasing or the other way around. For more information on this search technique look at: http://en.wikipedia.org/wiki/Ternary_search. The second approach is to differentiate the distance equation - as we know there is only 1 solution we can confidently differentiate and solve for d/dt = 0 to obtain the unique solution. We will now proceed to demonstrate this approach.

We first need to collapse the N fireflies into the centre of mass vector with respect to time. This can be done by a straight summation over the coefficients of the starting positions and velocities. Let S = (Sx, Sy, Sz) be the summation of the starting positions of the N fireflies (this is a straight application of the formula given in the notes section). Let D = (Dx, Dy, Dz) be the summation of the velocities of the N fireflies (similar argument applies to the velocity). Then we yield a distance (with respect to time T) equation of:

D = sqrt((Sx + Dx * T)^2 + (Sy + Dy * T)^2 + (Sz + Dz * T)^2)

We can ignore the sqrt for differentiation purposes as it will get cancelled out later when it's multiplied by 0 (another reason is the function is strictly increasing, and as a result has no effect on the location of the extrema):

D = (Sx + Dx * T)^2 + (Sy + Dy * T)^2 + (Sz + Dz * T)^2

Now let's differentiate (using the Chain Rule):

dD/dT = 2*(Sx + Dx * T)*(Dx) + 2*(Sy + Dy * T)*(Dy) + 2*(Sz + Dz * T)*(Dz)

Let dD/dT = 0,

0 = 2*(Sx + Dx * T)*(Dx) + 2*(Sy + Dy * T)*(Dy) + 2*(Sz + Dz * T)*(Dz)
0 = 2*((Sx + Dx * T)*Dx + (Sy + Dy * T)*Dy + (Sz + Dz * T)*Dz)
0 = (Sx + Dx * T)*Dx + (Sy + Dy * T)*Dy + (Sz + Dz * T)*Dz

Expanding:

0 = Sx*Dx + Dx*Dx*T + Sy*Dy + Dy*Dy*T + Sz*Dz + Dz*Dz*T

Pushing all the T terms into the LHS:

-Dx*Dx*T - Dy*Dy*T - Dz*Dz*T = Sx*Dx + Sy*Dy + Sz*Dz
T(-Dx*Dx - Dy*Dy - Dz*Dz) = Sx*Dx + Sy*Dy + Sz*Dz

Therefore T is equal to:

T = Sx*Dx + Sy*Dy + Sz*Dz / (-Dx*Dx - Dy*Dy - Dz*Dz)

Now we need to be careful with dividing by 0, if (-Dx*Dx - Dy*Dy - Dz*Dz) is 0 this means that there is no movement (i.e. our first condition). We can safely make a separate case for this as the closest time is at t=0. Using these concepts we just need to do it which is pretty simple:

#include <iostream>
#include <string>
#include <algorithm>
#include <cmath>

using namespace std;

int main() {
   int T; cin >> T;
   for (int tt = 0; tt < T; tt++) {
      int N; cin >> N;
      double dX = 0.0, dY = 0.0, dZ = 0.0;
      double sX = 0.0, sY = 0.0, sZ = 0.0;
      // summation of the different components
      for (int i = 0; i < N; i++) {
         double x, y, z, tx, ty, tz;
         cin >> x >> y >> z >> tx >> ty >> tz;
         dX += tx; dY += ty; dZ += tz;
         sX += x; sY += y; sZ += z;
      }
      dX /= N; dY /= N; dZ /= N;
      sX /= N; sY /= N; sZ /= N;
      int before = 0;
      double t;
      // check for division by zero case
      if (dX * dX + dY * dY + dZ * dZ < 1e-09) {
         before = 1;
      } else {
         t = -(sX*dX + sY*dY + sZ*dZ) / (dX*dX + dY*dY + dZ*dZ);
      }
      double dist = 0;
      // time can't be negative, if it is - then the optimal time is on t = 0
      if (t < 0 || before) {
         t = 0;
         dist = sqrt(sX*sX + sY*sY + sZ*sZ);
      } else {
         dist = sqrt((sX+dX*t)*(sX+dX*t)+(sY+dY*t)*(sY+dY*t)+(sZ+dZ*t)*(sZ+dZ*t));
      }
      cout << "Case #" << (tt+1) << ": ";
      cout << dist << " " << t << "\n";
   }
   return 0;
}

Problem C: Bribe the Prisoners
Category: Dynamic Programming
URL: http://code.google.com/codejam/contest/dashboard?c=189252#s=p2

For the final problem, we are given a set of prison cells (from 1 to P) and inside each of these prison cells is one prisoner. Prisoners communicates with its neighbours if the cell is adjacent and there is at least one prisoner inside. As a prisoner is released, the neighbours find out (if any) and this effect gets echoed in both directions until it hits the end of the prison (i.e. left or right wall) or when there is no one in the neighbouring prison (because they were released earlier). We are then tasked to bribe each prisoner who hears about a prisoner getting released 1 gold coin to prevent them from destroying their cell. We need to identify the optimal sequence of prisoners to release as to minimise the number of gold coins spent.

The obvious way is to brute force all possible ordering of prisoners and compute the minimum gold coins spent. This works easily for the small case as there are only 5 prisoners awaiting to be released at the worst case. This obviously does not work for the large case as there can be as many as 100 prisoners to be released. To optimise this we turn to dynamic programming. Seeing there can be as many as 100 prisoners to be released we can't keep a bitmask of which prisoners we have released as it will certainly TLE and/or exceed memory limit (on all current systems). The key observation is to divide the prison into sub-prisons. This can be seen by noting that the ends of the prison cells act as barriers/walls in which the word does not go through. If we release prisoners from cells - the cells we have just freed implicitly acts like the ends of our original prison.

In addition, we need to make a small optimisation as the prisons can be as large as 10,000 in the large case. We need to note that we will never free from a cell which is not in the list of prisoners to be freed. Hence instead of keeping track of the index of the prison cells itself, we keep track of the index of the prisoners to be freed. A simple sort suffices to keep the implementation neat. Hence the space complexity is only O(Q*Q) and since Q is at most only 100 - this is pretty much nothing.

So we define the DP state as:

D(n,m) = the minimum number of gold spent to free prisoners between index n and m of the prisoners to be freed.

Our recurrence relation is then simply defined as:

D(n,m) = min { Cost(i) + D(n,i) + D(i,m) } where n < i < m. (free prisoner i)
D(n,m) = 0 if m - n < 2 (base case)

The base case is for situations where there are no prisoners to be freed between n and m. Cost(i) is defined as the cost to free prisoner index i given that the empty prison to the left is n and the empty prison to the right is m. This can be calculated in constant time as we know which locations index n and m corresponds to. There is a small trick for implementing this - we add explicit left and right barriers to our original prison to make it conform to our recursion. This way we don't need to handle for the original prison special case which would otherwise be unbounded.

Implementing this should be simple with either memoisation (top-down) or a bottom-up approach:

#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
#include <stack>
#include <queue>
#include <map>

using namespace std;

int memo[111][111];

vector<int> data;

int func(int leftIdx, int rightIdx) {
   // base case
   if (rightIdx - leftIdx < 2) {
      return 0;
   }
   if (memo[leftIdx][rightIdx] != -1) return memo[leftIdx][rightIdx];
   int res = 1<<30;
   for (int i = leftIdx+1; i < rightIdx; i++) {
      // split on prisoner index i
      int calc = (data[i] - data[leftIdx] - 1) + (data[rightIdx] - data[i] - 1);
      calc = calc + func(leftIdx, i) + func(i, rightIdx);
      res = min(res,calc);
   }
   return memo[leftIdx][rightIdx] = res;
}

int main() {
   int N;
   cin >> N;
   for (int t = 0; t < N; t++) {
      int P, Q; cin >> P >> Q;
      vector<int> pris;
      for (int i = 0; i < Q; i++) {
         int val; cin >> val; pris.push_back(val);
      }
      // add explicit barriers to the ends of the prisons
      pris.push_back(0);
      pris.push_back(P+1);
      // ensure that the prisoner indexes are strictly increasing
      sort(pris.begin(),pris.end());
      data = pris;
      memset(memo,-1,sizeof(memo));
      int best = func(0, data.size()-1);
      cout << "Case #" << t+1 << ": " << best << "\n";
   }
   return 0;
}

Friday, September 11, 2009

TC TCO08 Round 2 D1 500 (CreaturesTraining)

Category: Dynamic Programming
URL: http://www.topcoder.com/stat?c=problem_statement&pm=8570

This problem asks us to upgrade a set of creatures to maximise the final score after D days. The score is defined as the sum of the power level of the creatures. This is a tricky problem to solve efficiently. The key observation to make is that you never downgrade creatures and you can also skip creatures. Therefore this suggests that we can go from the leftside to rightside in a linear fashion. We can do a rough sketch of the recursive state as:

D(idx, x) = the maximum number of points we can get when we are processing level idx of the creatures and we have x days left in credit to upgrade our creatures.

When we try to formulate our recurrence relation we run into a problem, the state does not fully describe the problem state as it does not keep track of which creatures we have upgraded so far. So we begin to re-formulate our DP state by incorporating the set of monsters we have upgraded so far, i.e. the modifications we have made to the count array. One obvious yet inefficient way is to keep track of the count array as part of the DP state. However, this is too large of a memory footprint and as a result causes our solution to become inefficient. However, it is a good starting point as we can re-use our observation before that we never look back at a previous level. With this in mind, we can simply just keep track of the current extra creatures we have upgraded so far and optimally decide how many of these to keep in each level. Therefore, our DP state becomes:

D(idx, x, carry) = the maximum number of points we can get when we are processing level idx of the creatures and we have x days left in credit to upgrade our creatures. Given that we are also carrying forth carry number of monsters from the previous level.

In each state we can choose between 0 to count[idx] + carry monsters to upgrade. Each of these are valid transitions/upgrades provided we have enough day credit to upgrade them. We need to check whether or not this state space fits within the memory constraints, as there are a maximum of 50 monster levels and 100 days and carries (as we cannot carry more than 100 as this will exceed the upper limit of day credits). So we are faced with a memory footprint of 50 * 100 * 100 = 500,000 64-bit integers which easily fits within the limit.

Lastly, we consider the base/terminating case, this is when we reach the last level of the creatures. We implicitly prune any transitions where we exceed the amount of remaining day credit, so we do not need to consider this in our terminating case.

Implementing the ideas above is rather simple and short:

class CreatureTraining {
public:
   long long maximumPower(vector <int>, vector <int>, int);
};

long long dp[51][111][111];
vector<int> _count;
vector<int> _power;

long long func(int idx, int day, int carry) {
   if (idx >= _count.size()) return dp[idx][day][carry] = 0;
   if (dp[idx][day][carry] != -1) return dp[idx][day][carry];
   long long res = 0;
   int mx = carry + _count[idx];
   for (int i = 0; i <= mx; i++) {
      // move a set of 0 -> mx monsters up to the next level
      // leave mx - i behind
      if (day - i >= 0) {
         res >?= func(idx+1, day - i, i) + (mx-i) * (long long)_power[idx];
      } else {
         break;
      }
   }
   return dp[idx][day][carry] = res;
}

long long CreatureTraining::maximumPower(vector <int> count, vector <int> power, 
      int D) {
   long long res = 0;
   memset(dp,-1,sizeof(dp));
   limitD = D; _count = count; _power = power;
   return res = func(0, D, 0);
}

TC TCO08 Qualification Round 1 D1 600 (PrimeSums)

Category: Dynamic Programming
URL: http://www.topcoder.com/stat?c=problem_statement&pm=8596

This problem continues with our DP-themed run of recent problems. It asks us to compute the number of subsets (including the empty subset) in which the total bag weight is a prime number. The constraints give us an upper bound of 500,000 (50 * 10,000) for the maximum prime. So the first task is to compute all the prime numbers under this number, this is an easy and efficient task if you know the Sieve of Eratosthenes. The next step is determining how many ways we can make a sub-bag of a weight W, we can use this in our last step by adding up all the weights W which are prime to yield our final answer. Before that though, we need to determine the number of sub-bags which totals a weight of W.

We notice that there are overlapping subproblems which screams to us: dynamic programming! To see this, imagine we have a bag of {1, 1, 2, 7}. To compute the number which weights are possible for all subsets: we start off with the first element. We can choose to use this element and place it into our bag, or we can reject this element and not put it into our bag. Either way, we have {1, 2, 7} as a sub-problem however if we don't do any caching or DP we will compute this sub-problem twice. This effect carries on from the first element to the n-th element and the number of sub-problems we are forced to compute grows exponentially. So we can simply memoise our solution, or can we?

Looking at the constraints we need to keep track of our current index inside the bag (so we know which element we are considering) and also the current weight of the bag (which has an upper bound of 500,000). Since the number of elements is 50 are the worst case, using a memoisation table of 50 * 500,000 64-bit integers is too large to fit into the memory limit. There must be another way to optimise our space complexity. As with all DP problems, we only need to keep sub-solutions to as much as our recurrence relation needs them. So let's define our recurrence relation in hopes of a good optimisation:

DP State: F(n,w) = the number of sub-bags we can construct using a set of 1..n bags (1-based index) that sums to exactly a weight of w.
DP Recurrence: F(n,w) = F(n-1,w - b[n]) + F(n-1,w)

The recurrence comes from the fact that we are faced with 2 decisions at a particular state, to take the item and place it into the bag (the first recursive call) and to skip the item and retain our weight (the second recursive call). As you can see from the depth of the recurrence relation is only at most 1 (i.e. we only need to keep track of the n-1'th pre-computed values), then we can reduce our memoisation table to 2 * 500,000 64-bit integers and use modulus arithmetic to implicitly swap the rows around. See the DP: State Compression article for more information on this technique.

As usual, we also need to define a base case. This is pretty simple, we are allowed to have an empty subset so naturally there is 1 way to have a weight of 0 without using any elements in the bag. Hence F(0,0) = 1. However, we encounter a major problem when we try and implement the solution. The fact there are non-unique elements in the bag can cause our dynamic programming algorithm to double-count the same sub-bag. For example, consider the bag {1, 1, 2, 7} if we skip the first bag and decide to use all the other bags we yield {1, 2, 7} with a weight of 10. However, we can also choose to use the first bag and skip the second bag to yield the same sub-bag {1, 2, 7} also with a weight of 10. Our current algorithm does not take this into account and hence will produce incorrect results for certain inputs.

A neat way to tackle this problem is to condense the same weights into one bag and keep track of how many elements we have of that weight. In the example used previously, we can make a mapping of { {1,2}, {2,1}, {7,1} } which has a pair in each element denoting the weight of the element and how many times it is inside our bag. Then we apply the same DP algorithm to this modified mapping and introduce additional decisions to include k = 1 to n of each element, where n is defined as the number of times it is in our bag. So we can choose to use 0, 1 or 2 elements of weight 1 but we are only allowed to choose 0 or 1 elements of either weight 2 and 7. Note that this does not cause us to double count as we only allow a transition of 2 elements of weight 1 once in our algorithm.

Lastly, we just need to add up our DP table for all the prime weights which is pretty self-explanatory.

class PrimeSums {
public:
   long long getCount(vector <int>);
};

int prime[500011];
long long dp[2][500011];

long long PrimeSums::getCount(vector <int> bag) {
   long long res = 0;
   sort(bag.begin(),bag.end());
   // sieve to generate the primes
   memset(prime,1,sizeof(prime));
   prime[0] = prime[1] = 0;
   for (int i = 2; i * i <= 500000; i++) {
      if (prime[i]) {
         for (int j = i * i; j <= 500000; j += i) {
            prime[j] = 0;
         }
      }
   }
   // merge identical bag elements together
   map<int,int> mappings;
   for (int i = 0; i < bag.size(); i++) mappings[bag[i]]++;
   
   int kth = 1;
   
   // base case
   dp[0][0] = 1;
   
   for (map<int,int>::iterator it = mappings.begin(); it != mappings.end(); it++, 
   kth++) {
      for (int i = 0; i <= 500000; i++) {
         dp[kth&1][i] = dp[!(kth&1)][i];
         // loop through possible weight arrangements based on the available set
         for (int j = 1; j <= it->second; j++) {
            if (i-j*it->first < 0) break;
            dp[kth&1][i] += dp[!(kth&1)][i-j*it->first];
         }
      }
   }
   // sum up all the number of sub-bags which have a prime weight
   for (int j = 0; j <= 500000; j++) {
      if (prime[j]) res += dp[!(kth&1)][j];
   }
   return res;
}

Thursday, September 10, 2009

TC TCO06 Round 1 D1 350 (SeparateConnections)

Category: Dynamic Programming, Matching in a General Graph
URL: http://www.topcoder.com/stat?c=problem_statement&pm=6095

This problem just asks us to find the maximum number of matches in a general graph (should be a straightforward reduction given the problem constraints). The most efficient way to solve this problem is to use a maximum matching algorithm (like Edmond's). However, this is not straightforward to code and certainly not required for a 350 pointer problem. Given the constraints, one easy way to implement a maximum matching algorithm is to use dynamic programming. We will be using a bitmask to efficiently keep track of the nodes that have already been matched so far.

For those new to bitmasks, here is a very short tutorial on it. As you may know, all decimal integer numbers (base 10) have a binary representation form (base 2). As a result, each bit inside a binary representation can represent up to 1-bit of information (duh). Besides representing numbers we can use each of these bits (0 or 1) as useful book-keeping information - in our case (and in many cases) we use them to mark whether or not we have visited or used a particular resource. So for example, the bit representation 01100 could mean we have used resource 2 and 3 (0-based index). Note that we go from left to right due to the fact that the left-most bit is also the most significant bit (MSB). This bit representation (01100) is simply known as 2^3 + 2^2 = 14 in decimal. Expanding on this, let's say in our problem we have used node 1, node 4 and node 5. This is represented as 2^5 + 2^4 + 2^1 = 32 + 16 + 2 = 50. There are several important bit operations we can use to manipulate and query the bits:

For starters, to get a '1' in the n-th position from the right (remember 0-based index) - you use the shift left operator '<<'. For example, 1 << 5 yields a bit representation equivalent to 0010 0000 (32).

To set a bit, we use the bitwise-OR operator denoted as '|'. The bitwise-OR operator works by applying the OR operation to each of the corresponding bits between two numbers. The OR operation is defined as:

0 OR 0 = 0
0 OR 1 = 1
1 OR 0 = 1
1 OR 1 = 1

For example to set the 5th bit (again 0-index) from the right, we first use the shift left operator to yield the correct number (i.e. 1 << 5). Let's say the bitmask before the operation is defined as 0100 0010 (68). Then applying 1 << 5 which has a representation of 0010 0000 (32) yields:

0100 0010 (68) OR 0010 0000 (32) Equals 0110 0010 (100)

To test whether a bit is set within a bitmask, we make use of the bitwise-AND operator denoted as '&'. The AND operation is defined as:

0 AND 0 = 0
0 AND 1 = 0
1 AND 0 = 0
1 AND 1 = 1

We use a similar method as we did for setting a bit in a particular position, let's say we wanted to test if the 5th bit was set. We use the shift left operator to yield the correct number (i.e. 1 << 5). Then we simply apply the bitwise-AND operator to the bitmask we wanted to test. If our mask was 0110 0010 then it would return 0010 0000. However if our mask was 0100 0010, then it would return 0000 0000 as it has no bits both turned on at the same position. Thus, we can use this to test whether or not a bitmask contains a bit in a particular position, if it the result is positive after the AND operation then it is inside the bitmask, if it's zero then it isn't inside the bitmask.

There are other bitmasking techniques which won't be discussed here as they aren't used for this problem.

Back to the problem, given that there are only 18 nodes at maximum. We can easily create a bitmask that keeps track of which nodes we have currently used up (there are 2^18 = 262144 different combinations). So for the actual DP state, we can iteratively introduce more nodes to the set and choose any adjacent nodes which haven't been used so far. Therefore, the state is D(n,m) where n is the index of which node we are processing and m is the bitmask of which nodes we have already used up. Next, we need to define the recurrence relation as well as any base cases. The recurrence relation can be found by selecting from a set of valid decisions at a particular node.

The two available options are to:
- Not used the node in the maximum matching (in hopes that it increases the available nodes later on)
- Use the node along with another node which is adjacent to it (increases the number of matchings by 1 and decreases the available set)

We define the recurrence as: D(n,m) = max { 1 + D(n+1, m | (1 << k), D(n+1, m) } where k is the set of all unused nodes which are adjacent to n. Take note, we do not consider matching node n, if it's already been matched by the time we reach index n. Then applying a simple memoization we yield a working solution:

class SeparateConnections {
public:
   int howMany(vector <string>);
};

int adjmat[19][19];
int memo[19][1<<19];
int sz;

int func(int idx, int mask) {
   if (idx >= sz) return 0;
   if (memo[idx][mask] != -1) return memo[idx][mask];
   int res = 0;
   if (!(mask & (1 << idx))) {
      for (int i = idx+1; i < sz; i++) {
         if (mask & (1 << i)) continue;
         if (!adjmat[idx][i]) continue;
         res >?= 1 + func(idx+1, mask | (1 << i));
      }
   }
   res >?= func(idx+1, mask);
   return memo[idx][mask] = res;
}

int SeparateConnections::howMany(vector <string> mat) {
   int res = 0;
   for (int i = 0; i < mat.size(); i++) {
      for (int j = 0; j < mat.size(); j++) {
         if (mat[i][j] == 'Y') adjmat[i][j] = adjmat[j][i] = 1;
      }
   }
   sz = mat.size();
   memset(memo,-1,sizeof(memo));
   return res = func(0,0) * 2;
}

However, one can also use a general matching algorithm like Edmond's Blossoming/Shrinking Algorithm to solve the problem. See below for such an implementation (based on Pape and Conradt variation of Edmond's which is incorrect for a certain set of inputs, but it still passed system tests for this problem). No further explanation would be provided for this algorithm as I'll leave this for a later blog entry in which we build up our matching algorithms.

#define MAX_VERTICES 21

int adjmat[MAX_VERTICES+1][MAX_VERTICES+1];
int table[MAX_VERTICES+1][MAX_VERTICES+1];
int mate[MAX_VERTICES+1];
int level[MAX_VERTICES+1];
int grandparent[MAX_VERTICES+1];

bool checkAncestor(int currentNode, int checkNode) {
       int v = grandparent[currentNode];
       if (v < 0) return false;   // not an ancestor
       if (v == checkNode) return true; // ancestor
       return checkAncestor(v, checkNode);
}

int augment(int curNode) {
       // BFS an augmenting path
       memset(level,1,sizeof(level));
       memset(grandparent,-1,sizeof(grandparent));
       queue<int> q;
       q.push(curNode);
       level[curNode] = 0;
       while (!q.empty()) {
               int x = q.front(); q.pop();
               for (int y = 1; y <= MAX_VERTICES; y++) {
                  // not part of an edge in G or seen the odd-vertex in T
                  if (!adjmat[x][y] || !level[y]) continue;   
                  if (mate[y] == 0) {
                     // found an augmenting path
                     int next = 0;
                     while (true) {
                        mate[y] = x;
                        next = mate[x];
                        mate[x] = y;
                        if (grandparent[x] < 0) break;
                        x = grandparent[x];
                        mate[next] = x;
                        y = next;
                     }
                     return 1;
                 } else if (mate[y] != 0 && !checkAncestor(x,y) && 
                         mate[y] != x) {
                     // push the node into the alternating tree T
                     q.push(mate[y]);
                     grandparent[mate[y]] = x;
                     level[y] = 0;
                 }
               }
       }
       return 0;
}

int maxMatching() {
   int res = 0;
   memset(mate,0,sizeof(mate));    // all currently unmatched
   memset(grandparent,-1,sizeof(grandparent));
   // start at the vertex i if it's single
   while (true) {
      int v = 0;
      for (int i = 1; i < MAX_VERTICES; i++) {
         if (!mate[i]) v += augment(i);
      }
      if (v == 0) break;
      res += v;
   }
   return res;
}

int SeparateConnections::howMany(vector <string> mat) {
   for (int i = 0; i < mat.size(); i++) {
      for (int j = 0; j < mat.size(); j++) {
         if (mat[i][j] == 'Y') adjmat[i+1][j+1] = adjmat[j+1][i+1] = 1;
      }
   }
   return maxMatching() * 2;
}

TC TCO05 Round 2 D1 500 (PartyGame)

Category: Dynamic Programming, Probability
URL: http://www.topcoder.com/stat?c=problem_statement&pm=4635

The problem asks us to compute the expected number of moves required to reach from left-most side of a row to the right-most side. The row consists of a number of R's and G's which control what actions we can take. We are also given a N-sided dice which allows us to randomly (uniformly) go 1 to N places to the right. If we land on an R we have to go back to the position we rolled from (i.e. no change in position, but a penalty of a move is applied). If we land on a G cell then all is good and we are allowed to keep our new position. If it's impossible to move from the left-most cell to the right-most cell then return -1. Also note that you can "over-hit" the target, anything which takes us beyond the last rightmost cell is considered a win.

The first thing to come into mind is dynamic programming. Why? Given a particular cell the expected number of moves does not change and is only dependent on the cells which come to the right of it. Another hint, is that there are overlapping subproblems as a result - since the number of possible paths is exponential it only makes sense to not re-calculate the same cells over and over. To apply DP to the problem we have already noted that it is sufficient to represent the expected number of cells by itself, this forms a 1-dimensional DP problem. The order in which we calculate should be obvious - we need to know all cells to the right of a particular cell, so we need to calculate from rightmost to leftmost cell.

Now defining the actual recurrence relation is more tricky, given a set of transitions from a cell some of them re-direct them back to our cell - how do we handle this case? Let's start off by defining f(x) as the expected number of moves for cell x. Now since the probability transitions are equal, i.e. 1 / N. If the next transition we land on for a given jump is 'G' then the expected number of moves grows by (1 / N) * (f(y) + 1) where y is the next cell. If the next transition we land on is 'R' then we have to go back to our current position, so the expected number of moves actually grows by (1 / N) * (f(x) + 1), where x is our current cell. Using an addition of all these expected moves defines our f(x). Hence we are left with something of the form:

f(x) = (1 / N) * (f(x) + 1) + (1 / N) * (f(x) + 1) + ... + (1 / N) * (f(x) + 1) + (1 / N) * (f(y) + 1) + (1 / N) * (f(z) + 1) + ...

Where the first part are all the cell transitions which lead them back to the current cell. The second part is the "good" transitions, in the example above these corresponds to the cells y and z. Now we can combine the first part together:

f(x) = (a / N) * (f(x) + 1) + (1 / N) * (f(y) + 1) + (1 / N) * (f(z) + 1) + ...

Where there are "a" number of cell transitions which lead it back to the current cell (i.e. 'R' cells). We need to formulate it in the form of f(x) = ... where the right hand side does not contain any f(x) terms. This requires a basic mathematical manipulation:

(1 - (a / N)) * f(x) = (1 / N) * (f(y) + 1) + (1 / N) * (f(z) + 1) + ...
f(x) = ((1 / N) * (f(y) + 1) + (1 / N) * (f(z) + 1) + ...) / (1 - (a / N))

This let's us calculate the expected number of moves including transitions which lead us back to our current cell. Hence combining all the concepts above, we come up with the following DP implementation:

class PartyGame {
public:
   double numberOfMoves(vector <string>, int);
};

double dp[2511];

double PartyGame::numberOfMoves(vector <string> field, int N) {
   string totalField;
   for (int i = 0; i < field.size(); i++) {
      totalField += field[i];
   }
   double res = 0.0;
   dp[totalField.size()-1] = totalField[totalField.size()-1] == 'R' ? 0.0 : 1.0;
   for (int i = totalField.size()-2; i >= 0; i--) {
      if (totalField[i] == 'R') continue;
      dp[i] = 0.0;
      int totBad = 0;
      double moveGood = 0.0;
      for (int j = 1; j <= N; j++) {
         if (i + j >= totalField.size()-1) {
            // instant win
            moveGood += (1.0 / N);
            continue;
         }
         if (totalField[i+j] == 'R') {
            // bad move (return to self)
            totBad++;
         } else {
            moveGood += (1.0 / N) * (dp[i+j] + 1);
         }
      }
      // solve equation
      dp[i] = (((double) totBad / N) + moveGood) / (1 - ((double)totBad / N));
   }
   
   return res = dp[0] > 1e50 ? -1.0 : dp[0];
}

TC TCO05 Round 1 D1 500 (FibonacciSum)

Category: Greedy, Dynamic Programming
URL: http://www.topcoder.com/stat?c=problem_statement&pm=4629

The problem asks us to compute the minimum number of Fibonacci numbers required to sum up to a given number n. If you are familiar with Zeckendorf's Theorem then you might be able to see a greedy strategy. Zeckendorf's Theorem states that every positive integer can be represented in a unique way as the sum of one or more distinct Fibonacci numbers such that it does not contain two consecutive Fibonacci numbers. A more obvious approach is to use dynamic programming, let D(n) denote the minimum number of Fibonacci terms to make up a sum of exactly n. Then we can define a recurrence relation such as: F(n) = min { 1 + F(n - T[k]) } where T[k] denotes the k-th Fibonacci term. Obviously F(n - T[k]) is only defined for n - T[k] >= 0. Additionally, the base case is defined as F(0) = 0, which acts as our stopping point as this means we have no remaining sum to make up for. Coding this is relatively simple:

class FibonacciSum {
public:
   int howMany(int);
};

int tab[101];
int memo[1000001];
int cnt;

#define INF (1 << 20)

int func(int n) {
   int res = INF;
   if (n == 0) return 0;
   if (memo[n] != -1) return memo[n];
   for (int i = cnt; i >= 0; i--) {
      if (n - tab[i] >= 0) res <?= 1 + func(n - tab[i]);
   }
   return memo[n] = res;
}

int FibonacciSum::howMany(int n) {
   tab[0] = tab[1] = 1;
   cnt = 2;
   memset(memo,-1,sizeof(memo));
   for (int i = 2; tab[i-1] + tab[i-2] < 1000000; i++, cnt++) {
      tab[i] = tab[i-1] + tab[i-2];
   }
   cnt--;
   return func(n);
}


We can also translate this into a bottom-up dynamic programming algorithm. To do this, we must consider the order of calculation - this is dictated by the recurrence relation we defined earlier. As this is a 1 dimensional DP problem, it's relatively simple: given an n value, we need to compute all values below n before considering n. This can be seen by the term inside the recursive call (n - T[k]), as T[k] is strictly positive for all k values we are guaranteed to only require all values below n to be calculated beforehand.

class FibonacciSum {
public:
   int howMany(int);
};

int dp[1000001];
int tab[101];

#define INF (1 << 20)

int FibonacciSum::howMany(int n) {
   tab[0] = tab[1] = 1;
   int cnt = 2;
   for (int i = 2; tab[i-1] + tab[i-2] < 1000000; i++, cnt++) {
      tab[i] = tab[i-1] + tab[i-2];
   }
   dp[0] = 0;
   for (int i = 1; i <= n; i++) {
      dp[i] = INF;
      for (int k = 1; k < cnt; k++) {
         if (i - tab[k] < 0) break;
         dp[i] = min(dp[i], 1 + dp[i - tab[k]]);
      }
   }
   return dp[n];
}

Friday, September 4, 2009

Google Code Jam 2009 Qualification Round

Overview:
The GCJ qualification round started smoothly until the input downloading system stopped working. This isn't crucial to the qualification round as participants can just move on to the next problem and submit all the problems at the same time when the system started working again. The number of competitors came close to almost 10,000 international participants - many of which were able to solve at least 1 problem. The problems itself required no specialised algorithmic or mathematical knowledge so they proved to be accessible to most participants. A score of 33 was required to progress to Round 1, so any combination of a correct small input and a large input solution was sufficient enough to qualify.

Links:
Google Code Jam Statistics: http://www.go-hero.net/jam/
Google Code Jam Scoreboard: http://code.google.com/codejam/contest/scoreboard?c=90101

Problem A: Alien Language
Category: Brute Force, String Manipulation
URL: http://code.google.com/codejam/contest/dashboard?c=90101#s=p0

The problem asks us to determine the number of dictionary words (which are guaranteed to be unique) such that it matches the given set of token inputs. There are several ways to solve this, one way is to convert the patterns into regular expressions and run through the dictionary list and compute whether or not the word matches the regex. Programming languages like Java or scripting languages like Perl or Python support this method natively. Converting the given pattern into regex is simple: convert the left brackets into '[' and convert the right brackets into ']'. For example, (ab)(bc)(ca) becomes [ab][bc][ca].

The alternative approach is to keep a set of dictionary words that are correct so far and keep reducing this set based on the constraints imposed by the patterns. For example, given the dictionary list in the sample input:

S1 = {abc, bca, cba, dac, dbc}

Given (ab)(bc)(ca) as the pattern, we start with the first token (ab). So we eliminate all strings/words in S1 which do not have 'a' or 'b' in the first index. This yields the second set:

S2 = {abc, bca}

We proceed to process the second token (bc). Again we eliminate all strings/words in S2 which do not have a 'b' or 'c' as the second letter/index. Since both words in S2 contain this, we don't eliminate any. We simply keep repeating this process until we reach the last index. The answer is simply the size of the set which remains.

// Google Code Jam 2009 - Qualification Round (Alien Language)

#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
#include <stack>

using namespace std;

vector<string> dict;

// determine the number of dictionary words that match the given pattern
int calculate(const vector<string>& tokens, 
   const vector<string>& remain, int idx) {
   if (idx >= tokens.size()) return remain.size();
   if (remain.size() == 0) return 0;
   string t = tokens[idx];
   vector<string> next;
   for (int i = 0; i < remain.size(); i++) {
      // a valid word given the current idx comparison - 
      // append it for the next iteration
      if (t.find(remain[i][idx]) != string::npos) {
         next.push_back(remain[i]);
      }
   }
   // run the next iteration
   return calculate(tokens, next, idx+1);
}

int main() {
   int L, D, N;
   cin >> L >> D >> N;
   for (int i = 0; i < D; i++) {
      string word; cin >> word;
      dict.push_back(word);
   }
   sort(dict.begin(),dict.end());
   // get rid of the trailing newline from using cin
   string dump; getline(cin,dump);
   for (int i = 0; i < N; i++) {
      string line;
      getline(cin,line);
      vector<string> tokens;
      // parse the (xxx) patterns into separate strings
      for (int j = 0; j < line.size(); j++) {
         if (line[j] == '(') {
            string token;
            j++;
            while (line[j] != ')') {
               token += line[j];
               j++;
            }
            tokens.push_back(token);
         } else {
            string token = string(1,line[j]);
            tokens.push_back(token);
         }
      }
      cout << "Case #" << (i+1) << ": " << calculate(tokens, dict, 0) << "\n";
   }
   return 0;
}

Problem B: Watersheds
Category: Depth First Search, Recursion
URL: http://code.google.com/codejam/contest/dashboard?c=90101#s=p1

The problem asks us to determine the different drainage basins given a heightmap as input. There is a set of rules in which the rainfall falls down to which is given in the problem statement. We can go through each cell and do a simple modified depth first search (really just recursion) once we reach the sink, we check whether or not the sink has been labelled. If it hasn't, then we label it with the next smallest unused lower-case character. We then backtrack and assign this character to all the nodes which were used to get to the sink. A simple optimisation is to stop recursing if we reach a node in which has already been assigned a lower-case letter as opposed to waiting until we reach the sink. This isn't required as the constraints are quite low (100x100) in the worst case.

There are some minor implementation details: we have to ensure that we consider adjacent nodes in the correct order (i.e. North, West, East, South). This is easily achieved by a direction array/vector which prioritises the order. This way we don't have to worry about the cases in which the lowest adjacent altitudes are equal, it's implicitly done for us. The constraint of placing the colour so the map is lexicographically smallest can be achieved by greedily filling in the nodes by row-order then by column-order. We are guaranteed to fill a cell with the smallest available character if we start traversing from (x,y) as it will eventually reach a sink. The last note is to not consider nodes that are outside the heightmap (nodes which exist outside the land) as well as not considering adjacent cells which have a height larger or equal to the current cell's height.

Apart from following the given rules - it's a straightforward simulation problem. The constraints are rather lenient, so any decent algorithm should pass within the time limits.

// Google Code Jam 2009 - Qualification Round (Watersheds)

#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
#include <stack>

using namespace std;

int heightmap[111][111];   // matrix to keep track of the input height map
char output[111][111];     // the final set of basins
char comp;                 // the minimum un-used character component

// directional vectors in preference of N, W, E, S
int dx[] = {-1, 0, 0, 1};
int dy[] = {0, -1, 1, 0};

// the grid size - used for checking boundaries
int szX, szY;

char dfs(int x, int y) {
   // search for the lowest altitude given the rules defined 
   // in the problem statement
   int lowest = 20000;
   int nx = -1, ny = -1;
   for (int k = 0; k < 4; k++) {
      int mx = dx[k] + x;
      int my = dy[k] + y;
      if (mx < 0 || my < 0 || mx >= szX || my >= szY || heightmap[x][y] <= 
         heightmap[mx][my]) continue;
      if (heightmap[mx][my] < lowest) {
         lowest = heightmap[mx][my];
         nx = mx;
         ny = my;
      }
   }
   // sink reached
   if (nx < 0) {
      if (output[x][y] != 0) return output[x][y];
      return output[x][y] = comp++;
   }
   // backtrack the solution
   output[x][y] = dfs(nx,ny);
}

int main() {
   int T;
   cin >> T;
   for (int i = 0; i < T; i++) {
      memset(heightmap,0,sizeof(heightmap));
      memset(output,0,sizeof(output));
      int X, Y;
      cin >> X >> Y;
      for (int j = 0; j < X; j++) {
         for (int k = 0; k < Y; k++) {
            int val; cin >> val;
            heightmap[j][k] = val;
         }
      }
      // reset the dfs parameters
      comp = 'a';
      szX = X;
      szY = Y;
      for (int j = 0; j < X; j++) {
         for (int k = 0; k < Y; k++) {
            // if we have assigned a character to the cell - don't need to dfs
            if (output[j][k]) continue;
            dfs(j,k);
         }
      }
      // output the assigned basins
      cout << "Case #" << (i+1) << ":\n";
      for (int j = 0; j < X; j++) {
         for (int k = 0; k < Y; k++) {
            if (k != 0) cout << " ";
            cout << output[j][k];
         }
         cout << "\n";
      }
   }
   return 0;
}

Problem C: Welcome to Code Jam
Category: Recursion, Dynamic Programming
URL: http://code.google.com/codejam/contest/dashboard?c=90101#s=p2

The problem asks us to find the number of occurrences of "welcome to code jam" inside a given text. Take note that it's not necessarily a contiguous subsequence. This concept is best illustrated with a diagram for those unfamiliar:

Take a phrase P ("wel") and a text T ("wweel"). Then there are 4 occurrences:

|w|w|e|e|l| (the text T)
|w| |e| |l| (#1)
|w| | |e|l| (#2)
| |w|e| |l| (#3)
| |w| |e|l| (#4)

Hence your allowed to skip characters in T so long as you use the all the characters in P in order. This easily lends itself to a recursive solution, we just need to keep track of 2 indexes: the index of P and the index of T. This alone let's us to determine how far we are within a particular instance of the problem and lets us determine what next character in P we need to match within T. There are two conditions in which we end the recursion: we have reached the end of P (i.e. this is a valid occurrence of P inside T), we have reached the end of T but there are still characters to process in P (i.e. invalid occurrence of P in T as we haven't finished laying out the characters in P on T). We return 1 and 0 on the respective cases.

This solution will timeout for the large input as the number of possible occurrences skyrockets (this should be evident enough from the problem statement which states 400263727 occurrences of "welcome to code jam" in the small paragraph). The simple optimisation is to use memoisation and cache the indexes so we don't re-compute sub-problems we have already calculated before. Another approach is to convert the memoisation/recurrence relation into a bottom-up dynamic programming algorithm. A minor implementation note is keeping track of the last 4 digits, the safest and easiest is to use modulo arithmetic: we simply mod all our calculations with 10000. Note that it is not sufficient to do this at the end (before outputting) as the calculations may have already overflowed the integer type and modding it by 10000 will just yield the incorrect solution.

// Google Code Jam 2009 - Qualification Round (Welcome to Code Jam)

#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
#include <stack>

using namespace std;

int dp[511][21];   // memo cache
string lineR;      // used to keep track of the line we are given
string textR = "welcome to code jam";   // the phrase we are looking for

int calc(int lineIdx, int textIdx) {
   // base cases
   if (textIdx >= textR.size()) return 1;
   if (lineIdx >= lineR.size()) return 0;
   // seen it before - return the previously computed value
   if (dp[lineIdx][textIdx] != -1) return dp[lineIdx][textIdx];
   dp[lineIdx][textIdx] = 0;
   for (int i = lineIdx; i < lineR.size(); i++) {
      // valid transition - add all the combinations
      if (lineR[i] == textR[textIdx]) {
         dp[lineIdx][textIdx] = (dp[lineIdx][textIdx] + calc(i+1, textIdx+1)) % 10000;
      }
   }
   return dp[lineIdx][textIdx] % 10000;
}

int main() {
   int N;
   cin >> N;
   // get rid of the trailing newline character
   string dump; getline(cin,dump);
   for (int i = 0; i < N; i++) {
      // reset the memo cache
      memset(dp,-1,sizeof(dp));
      string line; getline(cin,line);
      lineR = line;
      int v = calc(0,0);
      // correct the output into the required 4 digits
      ostringstream oss; oss << v;
      string output = oss.str();
      while (output.size() < 4) output = "0" + output;
      cout << "Case #" << (i+1) << ": " << output << "\n";
   }
   return 0;
}