ZOJ 1387
Before the digital age, the most common "binary" code for radio communication was the Morse code. In Morse code, symbols are encoded as sequences of short and long pulses (called dots and dashes respectively). The following table reproduces the Morse code for the alphabet, where dots and dashes are represented as ASCII characters "." and "-":
A .-      H ....    O ---     V ...-
B -...    I ..      P .--.    W .--
C -.-.    J .---    Q --.-    X -..-
D -..     K -.-     R .-.     Y -.--
E .       L .-..    S ...     Z --..
F ..-.    M --      T -
G --.     N -.      U ..-
Notice that in the absence of pauses between letters there might be multiple interpretations of a Morse sequence. For example, the sequence -.-..-- could be decoded both as CAT or NXT (among others). A human Morse operator would use other context information (such as a language dictionary) to decide the appropriate decoding. But even provided with such dictionary one can obtain multiple phrases from a single Morse sequence.
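The ambiguity described above can be checked with a small encoder. This is only a sketch: the names kMorse and EncodeWord are hypothetical (they are not part of the problem statement), and the code assumes the input contains only the capital letters "A" to "Z".

```cpp
#include <string>

// International Morse code for 'A'..'Z', indexed by (letter - 'A').
static const char* const kMorse[26] = {
    ".-",   "-...", "-.-.", "-..",  ".",    "..-.", "--.",  "....", "..",
    ".---", "-.-",  ".-..", "--",   "-.",   "---",  ".--.", "--.-", ".-.",
    "...",  "-",    "..-",  "...-", ".--",  "-..-", "-.--", "--.."};

// Hypothetical helper: concatenates the codes of each letter with no pauses.
// Assumes every character of word is in 'A'..'Z'.
std::string EncodeWord(const std::string& word) {
    std::string out;
    for (char c : word) out += kMorse[c - 'A'];
    return out;
}
```

With this helper, EncodeWord("CAT") and EncodeWord("NXT") both yield -.-..--, which is exactly the collision mentioned in the text.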
Task
Write a program which for each data set:
reads a Morse sequence and a list of words (a dictionary),
computes the number of distinct phrases that can be obtained from the given Morse sequence using words from the dictionary,
writes the result.
Notice that we are interested in full matches, i.e. the complete Morse sequence must be matched to words in the dictionary.
Input
The first line of the input contains exactly one positive integer d equal to the number of data sets, 1 <= d <= 20. The data sets follow.
The first line of each data set contains a Morse sequence - a nonempty sequence of at most 10 000 characters "." and "-" with no spaces in between.
The second line contains exactly one integer n, 1 <= n <= 10 000, equal to the number of words in a dictionary. Each of the following n lines contains one dictionary word - a nonempty sequence of at most 20 capital letters from "A" to "Z". No word occurs in the dictionary more than once.
Output
The output should consist of exactly d lines, one line for each data set. Line i should contain one integer equal to the number of distinct phrases into which the Morse sequence from the i-th data set can be parsed. You may assume that this number is at most 2 * 10^9 for every single data set.
Sample Input
1
.---.--.-.-.-.---...-.---.
6
AT
TACK
TICK
ATTACK
DAWN
DUSK
Sample Output
2
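The DP recurrence for this problem is easy to come up with; the real work is in the optimization. I tried several optimizations without a satisfactory result, and only reached an acceptable running time after borrowing an approach from an expert online.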
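The first DP formulation I used:
F[x] is the number of distinct ways to decode the part of the Morse sequence from position x to the end into dictionary words.
Let S be the length of the Morse sequence.
Then F[S] = 1 (indices start at 0), and
F[x] = sum( F[x+len(T)] | T is in the dictionary and T matches the Morse sequence starting at position x )
where len(T) is the length of T's Morse encoding.
The Morse sequence can be up to 10000 characters long and the dictionary can hold up to 10000 words. Even with a 10 s time limit, a direct implementation without any optimization is certain to time out.
So I first used KMP to find every position in the Morse sequence where each word can occur,
then computed from F[S] down to F[0], applying the recurrence above at every position where a match occurs. F[0] is the answer.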
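The original code for this first attempt is not shown, so the following is only a rough sketch of how a KMP-based backward DP might look. KmpFindAll and CountBackward are hypothetical names, codes is assumed to hold the dictionary words already encoded as Morse strings, and long long is used purely as a precaution against overflow in intermediate sums. Storing every match position, as done here, can take a lot of memory on large inputs.

```cpp
#include <string>
#include <vector>

// Standard KMP search: returns every start index where pattern occurs in text.
std::vector<int> KmpFindAll(const std::string& text, const std::string& pattern) {
    std::vector<int> hits;
    const int m = (int)pattern.size();
    if (m == 0) return hits;
    // fail[i] = length of the longest proper border of pattern[0..i].
    std::vector<int> fail(m, 0);
    for (int i = 1, k = 0; i < m; ++i) {
        while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
        if (pattern[i] == pattern[k]) ++k;
        fail[i] = k;
    }
    for (int i = 0, k = 0; i < (int)text.size(); ++i) {
        while (k > 0 && text[i] != pattern[k]) k = fail[k - 1];
        if (text[i] == pattern[k]) ++k;
        if (k == m) {
            hits.push_back(i - m + 1);
            k = fail[k - 1];
        }
    }
    return hits;
}

// Backward DP: F[x] = number of ways to decode morse[x..S) into dictionary words.
long long CountBackward(const std::string& morse,
                        const std::vector<std::string>& codes) {
    const int S = (int)morse.size();
    // lensAt[x] = lengths of the encoded words that match starting at x.
    std::vector<std::vector<int>> lensAt(S);
    for (const std::string& code : codes)
        for (int pos : KmpFindAll(morse, code))
            lensAt[pos].push_back((int)code.size());
    std::vector<long long> F(S + 1, 0);
    F[S] = 1;  // the empty suffix decodes in exactly one way
    for (int x = S - 1; x >= 0; --x)
        for (int len : lensAt[x]) F[x] += F[x + len];
    return F[0];
}
```

Every word is searched across the whole sequence regardless of whether a match can ever contribute to F[0], which is consistent with the observation that most of the time goes into finding match positions.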
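Written this way, the C++ version took 00:04.36.
The C version took 00:01.66, but I had to write KMP myself, so the code got longer.
Most of the time went into finding the match positions.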
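Later I consulted an article by an expert.
His DP runs in exactly the opposite direction from mine:
F[x] is the number of ways to decode the prefix of the Morse sequence that ends at position x, with F[0] = 1,
so F[x] = sum( F[x-len(T)] | T is in the dictionary and T matches the Morse sequence starting at position x-len(T) )
Compute from F[0] up to F[S]; F[S] is the answer.
The crucial point is that if F[x-len(T)] is 0, it contributes nothing to F[x], so no matching needs to be attempted from that position at all.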
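The DP part of the code is below, where
data[] is the Morse sequence,
dlen is the length of the Morse sequence,
seq[][] holds all the dictionary words after encoding.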
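#include <cstring>  // memset, strlen

bool Match(const char* S, const char* T, int tlen){
    // Check whether the first tlen characters of S equal those of T
    for(int i=0; i<tlen; i++){
        if( S[i] != T[i] )
            return false;
    }
    return true;
}
int Solve(){
    // F[x] is the number of ways to decode the first x characters of the Morse sequence
    // (F and n, the number of dictionary words, are globals defined elsewhere)
    memset(F, 0, sizeof(int)*(dlen+1));
    F[0] = 1;
    int i, k;
    int seqlen;
    for(i=0; i<dlen; i++){
        if( !F[i] ) continue; // this single line saves a large amount of work
        for(k=0; k<n; k++){
            seqlen = (int)strlen(seq[k]);
            // the word must fit in the remaining sequence and match at position i
            if( seqlen+i<=dlen && Match(data+i, seq[k], seqlen) ){
                F[i+seqlen] += F[i];
            }
        }
    }
    return F[dlen];
}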
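To try the fragment above in isolation, a self-contained variant could look like the sketch below. It is not the author's full program: CountPhrases and kMorse are hypothetical names, std::string replaces the global arrays, the words are encoded inside the function, and long long is used as a precaution against overflow in intermediate counts.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// International Morse code for 'A'..'Z', indexed by (letter - 'A').
static const char* const kMorse[26] = {
    ".-",   "-...", "-.-.", "-..",  ".",    "..-.", "--.",  "....", "..",
    ".---", "-.-",  ".-..", "--",   "-.",   "---",  ".--.", "--.-", ".-.",
    "...",  "-",    "..-",  "...-", ".--",  "-..-", "-.--", "--.."};

// Forward DP: F[x] = number of ways to decode the first x characters of morse.
// Assumes every word consists only of the letters 'A'..'Z'.
long long CountPhrases(const std::string& morse,
                       const std::vector<std::string>& words) {
    // Encode every dictionary word once, up front.
    std::vector<std::string> codes;
    codes.reserve(words.size());
    for (const std::string& w : words) {
        std::string code;
        for (char c : w) code += kMorse[c - 'A'];
        codes.push_back(code);
    }

    const std::size_t dlen = morse.size();
    std::vector<long long> F(dlen + 1, 0);
    F[0] = 1;  // the empty prefix decodes in exactly one way
    for (std::size_t i = 0; i < dlen; ++i) {
        if (F[i] == 0) continue;  // unreachable position: skip all matching
        for (const std::string& code : codes) {
            const std::size_t len = code.size();
            if (i + len <= dlen && morse.compare(i, len, code) == 0)
                F[i + len] += F[i];
        }
    }
    return F[dlen];
}
```

Called with the sample sequence and the six sample words, CountPhrases returns 2, corresponding to ATTACK AT DAWN and AT TACK AT DAWN.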
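With this change it passed in 00:00.07!
An expert is an expert indeed, haha.
In fact, the work-saving trick can be applied to either DP, but I personally find the second one easier to understand.
Takeaway: DP is very flexible.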