Friday, September 11, 2009

TC TCO08 Qualification Round 1 D1 600 (PrimeSums)

Category: Dynamic Programming
URL: http://www.topcoder.com/stat?c=problem_statement&pm=8596

This problem continues with our DP-themed run of recent problems. It asks us to compute the number of subsets (including the empty subset) in which the total bag weight is a prime number. The constraints give us an upper bound of 500,000 (50 * 10,000) for the maximum prime. So the first task is to compute all the prime numbers under this number, this is an easy and efficient task if you know the Sieve of Eratosthenes. The next step is determining how many ways we can make a sub-bag of a weight W, we can use this in our last step by adding up all the weights W which are prime to yield our final answer. Before that though, we need to determine the number of sub-bags which totals a weight of W.

We notice that there are overlapping subproblems which screams to us: dynamic programming! To see this, imagine we have a bag of {1, 1, 2, 7}. To compute the number which weights are possible for all subsets: we start off with the first element. We can choose to use this element and place it into our bag, or we can reject this element and not put it into our bag. Either way, we have {1, 2, 7} as a sub-problem however if we don't do any caching or DP we will compute this sub-problem twice. This effect carries on from the first element to the n-th element and the number of sub-problems we are forced to compute grows exponentially. So we can simply memoise our solution, or can we?

Looking at the constraints we need to keep track of our current index inside the bag (so we know which element we are considering) and also the current weight of the bag (which has an upper bound of 500,000). Since the number of elements is 50 are the worst case, using a memoisation table of 50 * 500,000 64-bit integers is too large to fit into the memory limit. There must be another way to optimise our space complexity. As with all DP problems, we only need to keep sub-solutions to as much as our recurrence relation needs them. So let's define our recurrence relation in hopes of a good optimisation:

DP State: F(n,w) = the number of sub-bags we can construct using a set of 1..n bags (1-based index) that sums to exactly a weight of w.
DP Recurrence: F(n,w) = F(n-1,w - b[n]) + F(n-1,w)

The recurrence comes from the fact that we are faced with 2 decisions at a particular state, to take the item and place it into the bag (the first recursive call) and to skip the item and retain our weight (the second recursive call). As you can see from the depth of the recurrence relation is only at most 1 (i.e. we only need to keep track of the n-1'th pre-computed values), then we can reduce our memoisation table to 2 * 500,000 64-bit integers and use modulus arithmetic to implicitly swap the rows around. See the DP: State Compression article for more information on this technique.

As usual, we also need to define a base case. This is pretty simple, we are allowed to have an empty subset so naturally there is 1 way to have a weight of 0 without using any elements in the bag. Hence F(0,0) = 1. However, we encounter a major problem when we try and implement the solution. The fact there are non-unique elements in the bag can cause our dynamic programming algorithm to double-count the same sub-bag. For example, consider the bag {1, 1, 2, 7} if we skip the first bag and decide to use all the other bags we yield {1, 2, 7} with a weight of 10. However, we can also choose to use the first bag and skip the second bag to yield the same sub-bag {1, 2, 7} also with a weight of 10. Our current algorithm does not take this into account and hence will produce incorrect results for certain inputs.

A neat way to tackle this problem is to condense the same weights into one bag and keep track of how many elements we have of that weight. In the example used previously, we can make a mapping of { {1,2}, {2,1}, {7,1} } which has a pair in each element denoting the weight of the element and how many times it is inside our bag. Then we apply the same DP algorithm to this modified mapping and introduce additional decisions to include k = 1 to n of each element, where n is defined as the number of times it is in our bag. So we can choose to use 0, 1 or 2 elements of weight 1 but we are only allowed to choose 0 or 1 elements of either weight 2 and 7. Note that this does not cause us to double count as we only allow a transition of 2 elements of weight 1 once in our algorithm.

Lastly, we just need to add up our DP table for all the prime weights which is pretty self-explanatory.

class PrimeSums {
public:
   long long getCount(vector <int>);
};

int prime[500011];
long long dp[2][500011];

long long PrimeSums::getCount(vector <int> bag) {
   long long res = 0;
   sort(bag.begin(),bag.end());
   // sieve to generate the primes
   memset(prime,1,sizeof(prime));
   prime[0] = prime[1] = 0;
   for (int i = 2; i * i <= 500000; i++) {
      if (prime[i]) {
         for (int j = i * i; j <= 500000; j += i) {
            prime[j] = 0;
         }
      }
   }
   // merge identical bag elements together
   map<int,int> mappings;
   for (int i = 0; i < bag.size(); i++) mappings[bag[i]]++;
   
   int kth = 1;
   
   // base case
   dp[0][0] = 1;
   
   for (map<int,int>::iterator it = mappings.begin(); it != mappings.end(); it++, 
   kth++) {
      for (int i = 0; i <= 500000; i++) {
         dp[kth&1][i] = dp[!(kth&1)][i];
         // loop through possible weight arrangements based on the available set
         for (int j = 1; j <= it->second; j++) {
            if (i-j*it->first < 0) break;
            dp[kth&1][i] += dp[!(kth&1)][i-j*it->first];
         }
      }
   }
   // sum up all the number of sub-bags which have a prime weight
   for (int j = 0; j <= 500000; j++) {
      if (prime[j]) res += dp[!(kth&1)][j];
   }
   return res;
}

No comments:

Post a Comment