讨论记录：求大于一个时间段的最大平均积分，O(n)时间实现

虽然下周就考试了，还有很多没复习……我还如此淡定地花了整整一个下午+晚上想了一个实际问题……（其实也不是那么特别实际，但的确是来自于现实生活的DP问题）

问题+讨论的URL：http://topic.csdn.net/u/20110421/17/dc8e63bf-7992-487f-a112-049cf390cb1c.html?seed=1122826158&r=72946293#r_72946293

（因为考虑到很多人很懒，不想打开URL去看，我把里面重点的内容复制过来）

问题陈述：

比如CSDN的积分形式（以下为假设），每当你获得一次积分，服务器便保留这时的时间以及总分。
现在要求在哪一段时间内获得积分的效率最高（前提这段时间内最少获得100分）。求一个时间复杂
度低于n^2的算法。除了穷举的算法O（n^2)，有没有时间复杂度更低的算法呢？

以二维数组模拟以上问题（以下为一个例子）

时间 总分
0 0
10 5
30 28
44 71
99 97
110 105
125 120
198 181
203 190
230 201

时间和总分都是严格递增的。
效率计算方法举例：比如在时间段30到110之间的效率为（105-28）/（110-30）。

于是在仔细思考了这种计算方法之后，我提出了这个质疑：

积分是以时间点为单位获得的。
比如，如果input为：
0 0
5 10
10 20
20 100

实际上的获得积分的情况为：
0 0
5 10
10 10
20 80 （在这一点获得了80分）

定义 slope = 在单位时间内获得的最大积分，那么明显在 5~20区间内，slope最大，为：100/(20-5) = 6.67。

但题目给的理解方式为，积分差 = A时间点得分 - B时间点得分；所以依照这个方法计算，slope = 100/（20-0）= 5；
这实际上是从一个得了分的时间点之后开始算的，但没有算这个时间点的得分（就是说，多算了A时间点之后到A之后第一次得
分之前的时间）。
明显这样的计算方式并不能算出最大的slope。

但既然现在已经在考虑问题本身，那么我们来看看依照原来的计算方法，怎样算。重新表述一下这个问题：

input:  
若干行 以 time,score 形式的数据，其中time表示一个时间点，score表示在这个时间点的累积得分；time、score均严格递增。

output：
end,start;
其中end，start的意义为:
定义 Δslope = (time[i] - time[j])/(score[i]-score[j])；
end,start满足：
1. score[end] - score[start] >= score interval  (这里题目给的是100)
2. Δslope = (time[end]-time[start])/(score[end]-score[start]) 的值为最大。

于是开始漫长的思考过程...发现（x-时间，y-得分）的坐标系不是很便于思考（需要考虑在Δy>score interval），所以把坐标系变成了（x-得分，y-时间）——这样只需要保证在x轴上大于score interval就好了；但slope从原来寻找最大slope，变成了现在寻找最小slope。

思维过程很漫长……建议下周期末考试，要抓紧复习，直接给出我回复的、成型了的想法：

这个方法的time complexity: O(n) ~~~

为了思考方便，首先我做了一个坐标系的转化：
把time看成纵坐标（y轴），score看成横坐标（x轴）；这样问题转化为，求在Δscore （即Δx
）>= 100 （为了一般性，我在代码里面用scoreInterval表示这个数字）的时候，使得slope最
小的 startIndex 和 endIndex。
假设一共输入了n组数据。（代码中的 score.size()等价于n）

*****************************preprocess***************************************
建立两个表：predecessor[n]， minSlopeStartIndexTable[n];
第一个表，predecessor[i]表示：
对于index i，startIndex = predecessor[i]是i之前的第一个使得 Δscore = score
[i] - score[startIndex] >= scoreInterval 的 startIndex。对应的建表方法见code。这步
需要 O(n)的时间。

第二个表，minSlopeStartIndexTable[n]表示：
对于index i，假设 sIndex = minSlopeStartIndexTable[i]，那么 对于j = 从index  
0开始，到index i，使得 所有终结于i的点中，slope(j,i)的值为最小的，记为sIndex，存储
在这个表中。
易知递归关系有：
minSlopeStartIndexTable[i] =  
  (slope(minSlopeStartIndexTable[i-1],i)<slope(i-1,i))?
  minSlopeStartIndexTable[i-1]:i-1;
这一步需要O(n)的时间。

*****************************calculation***************************************
于是我们开始进行计算：
保持一个minSlope：这是进行到当前计算状态中，能得到的最小的slope。初始化为正无穷
（INF，一个非常大的数值）
同时有bestStartIndex，bestEndIndex表示对应minSlope的两个index。

对于index i：
假设：minSlope是从0~i-1中所有可能中的最小斜率。
首先我们需要计算从predecessor[i]到i的斜率，同minSlope比较；
然后考虑predecessor[i]之前的最小的斜率（即从 minSlopeStartIndexTable
[predecessor[i]] 到 predecessor[i]的斜率），如果比（predecessor[i] 到i 的斜率）要小
，那么他们合起来的斜率是一个比predecessor[i]到i的斜率更小的斜率。所以，我们用这个和
minSlope比较。
在以上比较进行完成之后，可以保证 minSlope 是从 0~i中所有可能中的最小斜率。
每次对 index i的比较需要 O(1)的时间，一共进行n次，所以也是O(n)的时间。

*****************************Test Case***************************************
另外，为了测试方便，给出Test Case：
（测试的人把code里面的freopen、结尾的while(1)取消注释就好了；同时建立测试文件
MaxSlopeOverInterval_in.txt，保存在和这个code的相同目录下）

然后，这是做出来的code，个人感觉注释写得还是很清楚的：

#include <iostream>
#include <vector>

#define INF 1000000000
using namespace std;

vector<int> timePoint, score;
int scoreInterval = 100;    // by default the score interval is 100

double slope(int startIndex,int endIndex){
    return ((double)(timePoint[endIndex] - timePoint[startIndex]))/(
        score[endIndex] - score[startIndex]);
}


int main(){
//    freopen("MaxSlopeOverInterval_in.txt","r",stdin);
    
    // handle the input
    int t,p;
    while(scanf("%d%d",&t,&p)!=EOF){
        timePoint.push_back(t);score.push_back(p);
    }
    
    for(int i=0;i<score.size();i++){
        printf("%d: %d, %d\n",i,timePoint[i],score[i]);
    }
    
    // preporcessing: 
    // *1. store the predecessor of score[i]
    int predecessor[score.size()]; int predecessorIndex = 0;
    for(int i=0;i<score.size();i++){
        if(score[i]<scoreInterval)    predecessor[i] = -1;
        else{
            while(score[i] - score[predecessorIndex + 1] >= scoreInterval){
                predecessorIndex++;
            }
            predecessor[i] = predecessorIndex;
        }
        printf("predecessor index of %d: %d\n",i,predecessorIndex);
    }
    // *2. stores the minSlopeStartIndex from its value to i.
    //         minSlopeStartIndexTable[i] is the startIndex of the slope, and
    //        i is the endIndex of the slope, which makes the slope(j,i) to be
    //        smallest for all j from 0 to i-1
    int minSlopeStartIndexTable[score.size()];
    minSlopeStartIndexTable[0]=0;minSlopeStartIndexTable[1]=0;
    for(int i=2;i<score.size();i++){
        minSlopeStartIndexTable[i] = 
            (slope(minSlopeStartIndexTable[i-1],i)<slope(i-1,i))?
                minSlopeStartIndexTable[i-1]:i-1;
        printf("%dth minSlope: %d\n",i,minSlopeStartIndexTable[i]);
    }
    
        
    int formerBestStartIndex=0;    
        // score[i] - score[formerBestStartIndex] is the min for start = 0~i, 
        // which satisfies score[end] - score[start] > scoreInterval
    int bestStartIndex=0,bestEndIndex=0;
        // the start and end of the time period with max score/timePeriod gained
    double minSlope = INF;
        
    /* basic idea:
      to compare (minSlope) and 
          slope(iScorePredecessorIndex,i), slope(formerBestStartIndex,i)
    */
    for(int i=1;i<score.size();i++){
        int iScorePredecessorIndex = predecessor[i];
        if(iScorePredecessorIndex >= 0){
            // compare (minSlope) and slope(islope(iScorePredecessorIndex,i))
            printf("slope from %d to %d: %lf\n",iScorePredecessorIndex,i,
                slope(iScorePredecessorIndex,i));
            if(minSlope > slope(iScorePredecessorIndex,i)){
                minSlope = slope(iScorePredecessorIndex,i);
                bestStartIndex = iScorePredecessorIndex;
                bestEndIndex = i;
            }

            // compare (minSlope) and slope(formerBestStartIndex,i)            
            formerBestStartIndex = minSlopeStartIndexTable[predecessor[i]];
            printf("%dth element's former Best Start Index: %d, slope = %lf\n",
                i,formerBestStartIndex,slope(iScorePredecessorIndex,i));
            if(minSlope > slope(formerBestStartIndex,i)){
                minSlope = slope(formerBestStartIndex,i);
                bestStartIndex = formerBestStartIndex;
                bestEndIndex = i;
            }
        }
    }
    
    printf("\n\nThe best one is : start time = %d, end time = %d, slope = %lf\n",
        timePoint[bestStartIndex],timePoint[bestEndIndex],1/minSlope);
    
//    while(1);
    return 0;
}

发表于 2011-04-23 22:42 songyy 阅读(344) 评论(0) 收藏举报

公告