Max coverage disjoint intervals
Assume you have k <= 10^5 intervals [a_i, b_i] with 1 <= a_i <= b_i <= 10^18 (some of them may overlap), and you need to choose a set of mutually disjoint intervals whose union is as large as possible. I don't want the maximum number of disjoint intervals; I want the union to cover as much as possible.
Trying all 2^k subsets is infeasible. The greedy approaches that order by a_i (the interval covering algorithm) and by b_i (the maximum number of disjoint intervals algorithm) didn't work, and I can't figure out whether there is a dynamic programming solution. Given the size of the input, I think the solution should be O(k log k) or O(k).
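To make the failure of the order-by-end greedy concrete, here is a minimal sketch of the classic earliest-end greedy. The names Interval and greedyByEnd are hypothetical. It assumes closed intervals, so sharing an endpoint counts as overlap, and it counts coverage as b - a + 1 points. On [1,4], [3,5], [5,9], [7,18] it keeps [1,4] and [5,9], covering 9 points, whereas [1,4] ∪ [7,18] covers 16. Maximizing the number of intervals is simply a different objective.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Interval = std::pair<long long, long long>;  // closed [start, end]

// Classic greedy for the *maximum number* of disjoint intervals: scan by
// increasing end point and keep every interval that starts strictly after
// the last kept one ends. Returns the total number of points covered.
long long greedyByEnd(std::vector<Interval> v) {
    std::sort(v.begin(), v.end(), [](const Interval& x, const Interval& y) {
        return x.second < y.second;
    });
    long long covered = 0;
    long long lastEnd = 0;
    bool any = false;  // has any interval been kept yet?
    for (const Interval& iv : v) {
        assert(iv.first <= iv.second);
        if (!any || iv.first > lastEnd) {
            covered += iv.second - iv.first + 1;
            lastEnd = iv.second;
            any = true;
        }
    }
    return covered;
}
```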
Examples:
1. [1,4], [3,5], [5,9], [7,18]. Sol: [1,4] ∪ [7,18]
2. [1,2], [2,6], [3,4], [5,7]. Sol: [1,2] ∪ [3,4] ∪ [5,7]
3. [2,30], [25,39], [30,40]. Sol: [2,30]
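An exhaustive search over all 2^k subsets is useless at k = 10^5, but it is handy for checking a faster idea against tiny inputs like the examples. This is a minimal sketch. It assumes closed intervals, where sharing an endpoint counts as overlap, and measures coverage as the number of integer points b_i - a_i + 1. The names Interval, overlaps and bruteForce are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Interval = std::pair<long long, long long>;  // closed [start, end]

// Two closed intervals overlap if they share at least one point.
bool overlaps(const Interval& x, const Interval& y) {
    return x.first <= y.second && y.first <= x.second;
}

// Exhaustive reference solution: tries every subset, so it is only practical
// for very small k (say k <= 20). Returns the largest number of points
// covered by a set of pairwise disjoint intervals.
long long bruteForce(const std::vector<Interval>& v) {
    const std::size_t k = v.size();
    assert(k < 64);  // keeps the shift below well defined
    long long best = 0;
    for (unsigned long long mask = 0; mask < (1ULL << k); ++mask) {
        long long covered = 0;
        bool ok = true;
        for (std::size_t i = 0; i < k && ok; ++i) {
            if (!((mask >> i) & 1ULL)) continue;  // interval i not chosen
            for (std::size_t j = i + 1; j < k; ++j) {
                if (((mask >> j) & 1ULL) && overlaps(v[i], v[j])) {
                    ok = false;
                    break;
                }
            }
            covered += v[i].second - v[i].first + 1;
        }
        if (ok && covered > best) best = covered;
    }
    return best;
}
```

On the three examples it returns 16, 7 and 29.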
Here is an O(n log n)-time, O(n)-space algorithm, where n is the number of intervals (k in the question). First, sort the array of tuples by their starting positions if they are not already in this order. I'll assume zero-based array indices.
Let's call the beginning position of tuple i b(i) and its ending position e(i), so that its total length is e(i) - b(i) + 1. Also, let's define a function next(i) that returns the position within the tuple list of the first tuple that can appear to the right of tuple i, or n if there is no such tuple. next(i) can be calculated in O(log n) time with a binary search: keep all the tuple beginning positions b(i) in an array b[], and search for the first j in the subarray b[i+1 .. n-1] having b[j] > e(i). The comparison is strict because intervals that share an endpoint overlap. In the third example, [2,30] and [30,40] cannot both be chosen.
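Here is a minimal sketch of precomputing next(i) for every i with std::upper_bound, which returns the first element strictly greater than the key. It assumes the start and end positions are held in two parallel std::vector<long long> arrays that are already sorted by start. The name computeNext is hypothetical. long long is used because positions go up to 10^18, which does not fit in a 32-bit int.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Given start positions b (sorted in nondecreasing order) and matching end
// positions e, returns nxt where nxt[i] is the index of the first tuple j > i
// with b[j] > e[i], or n if no such tuple exists. Each entry costs one
// binary search, so the whole table costs O(n log n).
std::vector<int> computeNext(const std::vector<long long>& b,
                             const std::vector<long long>& e) {
    assert(b.size() == e.size());
    assert(std::is_sorted(b.begin(), b.end()));
    const int n = static_cast<int>(b.size());
    std::vector<int> nxt(n);
    for (int i = 0; i < n; ++i) {
        // Search only b[i+1 .. n-1] for the first start strictly after e[i].
        auto it = std::upper_bound(b.begin() + i + 1, b.end(), e[i]);
        nxt[i] = static_cast<int>(it - b.begin());
    }
    return nxt;
}
```

For the first example sorted by start (b = {1, 3, 5, 7}, e = {4, 5, 9, 18}) this gives {2, 3, 4, 4}.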
Let's define f(i) to be the maximum coverage of any nonoverlapping set of tuples chosen from tuples i, i+1, ..., n-1. Since tuple i itself is either in this optimal set or not, we have:
f(i) = max(e(i) - b(i) + 1 + f(next(i)), f(i+1)) for 0 <= i < n
We also have the boundary condition f(n) = 0.
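As a worked illustration, take the first example from the question sorted by start. Then n = 4, b = (1, 3, 5, 7) and e = (4, 5, 9, 18). Using the strict comparison b[j] > e(i), next = (2, 3, 4, 4). Evaluating the recurrence from right to left:

```latex
\begin{aligned}
f(4) &= 0\\
f(3) &= \max\bigl(12 + f(4),\ f(4)\bigr) = \max(12, 0) = 12\\
f(2) &= \max\bigl(5 + f(4),\ f(3)\bigr) = \max(5, 12) = 12\\
f(1) &= \max\bigl(3 + f(3),\ f(2)\bigr) = \max(15, 12) = 15\\
f(0) &= \max\bigl(4 + f(2),\ f(1)\bigr) = \max(16, 15) = 16
\end{aligned}
```

Following the larger argument from f(0) takes tuple 0 and jumps to next(0) = 2. It then skips tuple 2 because f(3) = 12 > 5 and takes tuple 3, giving [1,4] ∪ [7,18] with coverage 16.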
Clearly the largest possible coverage is given by f(0). This is easily calculated. In pseudo-C++:
```cpp
#include <algorithm>
#include <vector>

std::vector<long long> b;  // Tuple beginning positions, in nondecreasing order
std::vector<long long> e;  // Tuple end positions (e[i] pairs with b[i])
int n;                     // Number of tuples (b.size() == e.size() == n)

// Find the array position of the leftmost tuple that begins to the right of
// where tuple i ends, or n if there is no such tuple.
int next(int i) {
    return static_cast<int>(
        std::upper_bound(b.begin() + i + 1, b.end(), e[i]) - b.begin());
}

std::vector<long long> maxCov;  // n + 1 entries, allocated in calc()

// After running this, maxCov[i] will contain the maximum coverage of any
// nonoverlapping subset of the set of n - i tuples whose beginning positions
// are given by b[i .. n-1] and whose ending points are given by e[i .. n-1].
// In particular, maxCov[0] will be the maximum coverage of the entire set.
void calc() {
    maxCov.assign(n + 1, 0);  // maxCov[n] = 0 is the boundary condition
    for (int i = n - 1; i >= 0; --i) {
        maxCov[i] = std::max(e[i] - b[i] + 1 + maxCov[next(i)], maxCov[i + 1]);
    }
}
```
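The loop in calc() runs n times, and each iteration makes one O(log n) call to the binary search function upper_bound(). Filling the table therefore takes O(n log n) time, and with the initial sort the whole algorithm runs in O(n log n).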
We can reconstruct an actual set achieving this coverage. Calculate both arguments to max() for f(0), see which one actually produced the maximum, and record whether it implies the presence or absence of tuple 0. Then recurse to handle the remainder, which corresponds to either f(next(0)) or f(1). (If both arguments are equal, there are multiple optimal solutions and we can follow either one.)
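Putting the pieces together, here is a minimal self-contained sketch. It sorts the input, fills the table from right to left and then performs the reconstruction described above iteratively instead of recursively. It assumes closed intervals stored as std::pair<long long, long long>. The names Interval and bestDisjointSet are hypothetical. Ties are broken by taking tuple i, which is one of the equally valid choices.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Interval = std::pair<long long, long long>;  // closed [start, end]

// Returns one set of pairwise disjoint intervals with maximum total coverage.
std::vector<Interval> bestDisjointSet(std::vector<Interval> v) {
    std::sort(v.begin(), v.end());  // by start, then by end
    const int n = static_cast<int>(v.size());
    std::vector<long long> b(n), e(n);
    for (int i = 0; i < n; ++i) {
        assert(v[i].first <= v[i].second);
        b[i] = v[i].first;
        e[i] = v[i].second;
    }
    // next(i): first j > i with b[j] > e[i], or n if there is none.
    auto next = [&](int i) {
        return static_cast<int>(
            std::upper_bound(b.begin() + i + 1, b.end(), e[i]) - b.begin());
    };
    std::vector<long long> f(n + 1, 0);  // f[n] = 0 is the boundary condition
    for (int i = n - 1; i >= 0; --i) {
        f[i] = std::max(e[i] - b[i] + 1 + f[next(i)], f[i + 1]);
    }
    // Walk forward from i = 0, checking which argument of max() produced f[i].
    std::vector<Interval> chosen;
    int i = 0;
    while (i < n) {
        if (e[i] - b[i] + 1 + f[next(i)] >= f[i + 1]) {
            chosen.push_back(v[i]);  // tuple i is in the optimal set
            i = next(i);
        } else {
            ++i;                     // tuple i is left out
        }
    }
    return chosen;
}
```

On the three examples from the question it returns [1,4] ∪ [7,18], [1,2] ∪ [3,4] ∪ [5,7] and [2,30] respectively.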