Friday, September 4, 2009

Google Code Jam 2009 Qualification Round

Overview:
The GCJ qualification round started smoothly until the input downloading system stopped working. This isn't crucial to the qualification round as participants can just move on to the next problem and submit all the problems at the same time when the system started working again. The number of competitors came close to almost 10,000 international participants - many of which were able to solve at least 1 problem. The problems itself required no specialised algorithmic or mathematical knowledge so they proved to be accessible to most participants. A score of 33 was required to progress to Round 1, so any combination of a correct small input and a large input solution was sufficient enough to qualify.

Links:
Google Code Jam Statistics: http://www.go-hero.net/jam/
Google Code Jam Scoreboard: http://code.google.com/codejam/contest/scoreboard?c=90101

Problem A: Alien Language
Category: Brute Force, String Manipulation
URL: http://code.google.com/codejam/contest/dashboard?c=90101#s=p0

The problem asks us to determine the number of dictionary words (which are guaranteed to be unique) such that it matches the given set of token inputs. There are several ways to solve this, one way is to convert the patterns into regular expressions and run through the dictionary list and compute whether or not the word matches the regex. Programming languages like Java or scripting languages like Perl or Python support this method natively. Converting the given pattern into regex is simple: convert the left brackets into '[' and convert the right brackets into ']'. For example, (ab)(bc)(ca) becomes [ab][bc][ca].

The alternative approach is to keep a set of dictionary words that are correct so far and keep reducing this set based on the constraints imposed by the patterns. For example, given the dictionary list in the sample input:

S1 = {abc, bca, cba, dac, dbc}

Given (ab)(bc)(ca) as the pattern, we start with the first token (ab). So we eliminate all strings/words in S1 which do not have 'a' or 'b' in the first index. This yields the second set:

S2 = {abc, bca}

We proceed to process the second token (bc). Again we eliminate all strings/words in S2 which do not have a 'b' or 'c' as the second letter/index. Since both words in S2 contain this, we don't eliminate any. We simply keep repeating this process until we reach the last index. The answer is simply the size of the set which remains.

// Google Code Jam 2009 - Qualification Round (Alien Language)

#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
#include <stack>

using namespace std;

vector<string> dict;

// determine the number of dictionary words that match the given pattern
int calculate(const vector<string>& tokens, 
   const vector<string>& remain, int idx) {
   if (idx >= tokens.size()) return remain.size();
   if (remain.size() == 0) return 0;
   string t = tokens[idx];
   vector<string> next;
   for (int i = 0; i < remain.size(); i++) {
      // a valid word given the current idx comparison - 
      // append it for the next iteration
      if (t.find(remain[i][idx]) != string::npos) {
         next.push_back(remain[i]);
      }
   }
   // run the next iteration
   return calculate(tokens, next, idx+1);
}

int main() {
   int L, D, N;
   cin >> L >> D >> N;
   for (int i = 0; i < D; i++) {
      string word; cin >> word;
      dict.push_back(word);
   }
   sort(dict.begin(),dict.end());
   // get rid of the trailing newline from using cin
   string dump; getline(cin,dump);
   for (int i = 0; i < N; i++) {
      string line;
      getline(cin,line);
      vector<string> tokens;
      // parse the (xxx) patterns into separate strings
      for (int j = 0; j < line.size(); j++) {
         if (line[j] == '(') {
            string token;
            j++;
            while (line[j] != ')') {
               token += line[j];
               j++;
            }
            tokens.push_back(token);
         } else {
            string token = string(1,line[j]);
            tokens.push_back(token);
         }
      }
      cout << "Case #" << (i+1) << ": " << calculate(tokens, dict, 0) << "\n";
   }
   return 0;
}

Problem B: Watersheds
Category: Depth First Search, Recursion
URL: http://code.google.com/codejam/contest/dashboard?c=90101#s=p1

The problem asks us to determine the different drainage basins given a heightmap as input. There is a set of rules in which the rainfall falls down to which is given in the problem statement. We can go through each cell and do a simple modified depth first search (really just recursion) once we reach the sink, we check whether or not the sink has been labelled. If it hasn't, then we label it with the next smallest unused lower-case character. We then backtrack and assign this character to all the nodes which were used to get to the sink. A simple optimisation is to stop recursing if we reach a node in which has already been assigned a lower-case letter as opposed to waiting until we reach the sink. This isn't required as the constraints are quite low (100x100) in the worst case.

There are some minor implementation details: we have to ensure that we consider adjacent nodes in the correct order (i.e. North, West, East, South). This is easily achieved by a direction array/vector which prioritises the order. This way we don't have to worry about the cases in which the lowest adjacent altitudes are equal, it's implicitly done for us. The constraint of placing the colour so the map is lexicographically smallest can be achieved by greedily filling in the nodes by row-order then by column-order. We are guaranteed to fill a cell with the smallest available character if we start traversing from (x,y) as it will eventually reach a sink. The last note is to not consider nodes that are outside the heightmap (nodes which exist outside the land) as well as not considering adjacent cells which have a height larger or equal to the current cell's height.

Apart from following the given rules - it's a straightforward simulation problem. The constraints are rather lenient, so any decent algorithm should pass within the time limits.

// Google Code Jam 2009 - Qualification Round (Watersheds)

#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
#include <stack>

using namespace std;

int heightmap[111][111];   // matrix to keep track of the input height map
char output[111][111];     // the final set of basins
char comp;                 // the minimum un-used character component

// directional vectors in preference of N, W, E, S
int dx[] = {-1, 0, 0, 1};
int dy[] = {0, -1, 1, 0};

// the grid size - used for checking boundaries
int szX, szY;

char dfs(int x, int y) {
   // search for the lowest altitude given the rules defined 
   // in the problem statement
   int lowest = 20000;
   int nx = -1, ny = -1;
   for (int k = 0; k < 4; k++) {
      int mx = dx[k] + x;
      int my = dy[k] + y;
      if (mx < 0 || my < 0 || mx >= szX || my >= szY || heightmap[x][y] <= 
         heightmap[mx][my]) continue;
      if (heightmap[mx][my] < lowest) {
         lowest = heightmap[mx][my];
         nx = mx;
         ny = my;
      }
   }
   // sink reached
   if (nx < 0) {
      if (output[x][y] != 0) return output[x][y];
      return output[x][y] = comp++;
   }
   // backtrack the solution
   output[x][y] = dfs(nx,ny);
}

int main() {
   int T;
   cin >> T;
   for (int i = 0; i < T; i++) {
      memset(heightmap,0,sizeof(heightmap));
      memset(output,0,sizeof(output));
      int X, Y;
      cin >> X >> Y;
      for (int j = 0; j < X; j++) {
         for (int k = 0; k < Y; k++) {
            int val; cin >> val;
            heightmap[j][k] = val;
         }
      }
      // reset the dfs parameters
      comp = 'a';
      szX = X;
      szY = Y;
      for (int j = 0; j < X; j++) {
         for (int k = 0; k < Y; k++) {
            // if we have assigned a character to the cell - don't need to dfs
            if (output[j][k]) continue;
            dfs(j,k);
         }
      }
      // output the assigned basins
      cout << "Case #" << (i+1) << ":\n";
      for (int j = 0; j < X; j++) {
         for (int k = 0; k < Y; k++) {
            if (k != 0) cout << " ";
            cout << output[j][k];
         }
         cout << "\n";
      }
   }
   return 0;
}

Problem C: Welcome to Code Jam
Category: Recursion, Dynamic Programming
URL: http://code.google.com/codejam/contest/dashboard?c=90101#s=p2

The problem asks us to find the number of occurrences of "welcome to code jam" inside a given text. Take note that it's not necessarily a contiguous subsequence. This concept is best illustrated with a diagram for those unfamiliar:

Take a phrase P ("wel") and a text T ("wweel"). Then there are 4 occurrences:

|w|w|e|e|l| (the text T)
|w| |e| |l| (#1)
|w| | |e|l| (#2)
| |w|e| |l| (#3)
| |w| |e|l| (#4)

Hence your allowed to skip characters in T so long as you use the all the characters in P in order. This easily lends itself to a recursive solution, we just need to keep track of 2 indexes: the index of P and the index of T. This alone let's us to determine how far we are within a particular instance of the problem and lets us determine what next character in P we need to match within T. There are two conditions in which we end the recursion: we have reached the end of P (i.e. this is a valid occurrence of P inside T), we have reached the end of T but there are still characters to process in P (i.e. invalid occurrence of P in T as we haven't finished laying out the characters in P on T). We return 1 and 0 on the respective cases.

This solution will timeout for the large input as the number of possible occurrences skyrockets (this should be evident enough from the problem statement which states 400263727 occurrences of "welcome to code jam" in the small paragraph). The simple optimisation is to use memoisation and cache the indexes so we don't re-compute sub-problems we have already calculated before. Another approach is to convert the memoisation/recurrence relation into a bottom-up dynamic programming algorithm. A minor implementation note is keeping track of the last 4 digits, the safest and easiest is to use modulo arithmetic: we simply mod all our calculations with 10000. Note that it is not sufficient to do this at the end (before outputting) as the calculations may have already overflowed the integer type and modding it by 10000 will just yield the incorrect solution.

// Google Code Jam 2009 - Qualification Round (Welcome to Code Jam)

#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
#include <stack>

using namespace std;

int dp[511][21];   // memo cache
string lineR;      // used to keep track of the line we are given
string textR = "welcome to code jam";   // the phrase we are looking for

int calc(int lineIdx, int textIdx) {
   // base cases
   if (textIdx >= textR.size()) return 1;
   if (lineIdx >= lineR.size()) return 0;
   // seen it before - return the previously computed value
   if (dp[lineIdx][textIdx] != -1) return dp[lineIdx][textIdx];
   dp[lineIdx][textIdx] = 0;
   for (int i = lineIdx; i < lineR.size(); i++) {
      // valid transition - add all the combinations
      if (lineR[i] == textR[textIdx]) {
         dp[lineIdx][textIdx] = (dp[lineIdx][textIdx] + calc(i+1, textIdx+1)) % 10000;
      }
   }
   return dp[lineIdx][textIdx] % 10000;
}

int main() {
   int N;
   cin >> N;
   // get rid of the trailing newline character
   string dump; getline(cin,dump);
   for (int i = 0; i < N; i++) {
      // reset the memo cache
      memset(dp,-1,sizeof(dp));
      string line; getline(cin,line);
      lineR = line;
      int v = calc(0,0);
      // correct the output into the required 4 digits
      ostringstream oss; oss << v;
      string output = oss.str();
      while (output.size() < 4) output = "0" + output;
      cout << "Case #" << (i+1) << ": " << output << "\n";
   }
   return 0;
}

1 comment:

  1. Problem B: Watersheds: Hi Wilan, thanks for the solution! I want to add that provided that from each cell, water flows down to at most one of its 4 neighboring cells recursive backtracking can be replaced with a simple brute force in a loop.

    ReplyDelete