hdu 3695 Computer Virus on Planet Pandora (Aho-Corasick automaton)

Computer Virus on Planet Pandora

Time Limit: 6000/2000 MS (Java/Others)    Memory Limit: 256000/128000 K (Java/Others)
Total Submission(s): 1609    Accepted Submission(s): 454


Problem Description
    Aliens on planet Pandora also write computer programs like us. Their programs consist only of capital letters (‘A’ to ‘Z’), which they learned from the Earth. On planet Pandora, hackers make computer viruses, so they also have anti-virus software. Of course they learned virus scanning algorithms from the Earth. Every virus has a pattern string which consists of only capital letters. If a virus’s pattern string is a substring of a program, or the pattern string is a substring of the reverse of that program, they can say the program is infected by that virus. Given a program and a list of virus pattern strings, please write a program to figure out how many viruses the program is infected by.
 

Input
There are multiple test cases. The first line in the input is an integer T ( T<= 10) indicating the number of test cases.

For each test case:

The first line is an integer n (0 < n <= 250) indicating the number of virus pattern strings.

Then n lines follow, each representing a virus pattern string. Every pattern string stands for a virus. It’s guaranteed that those n pattern strings are all different, so there are n different viruses. The length of a pattern string is no more than 1,000, and a pattern string consists of at least one letter.

The last line of a test case is the program. The program may be described in a compressed format. A compressed program consists of capital letters and “compressors”. A “compressor” is in the following format:

[qx]

q is a number (0 < q <= 5,000,000) and x is a capital letter. It means q consecutive copies of the letter x in the original uncompressed program. For example, [6K] means ‘KKKKKK’ in the original program. So, if a compressed program is like:

AB[2D]E[7K]G

It is actually ABDDEKKKKKKKG after being decompressed to the original format.

The length of the program is at least 1 and at most 5,100,000, no matter in the compressed format or after it is decompressed to original format.
 

Output
For each test case, print an integer K in a line meaning that the program is infected by K viruses.
 

Sample Input
3
2
AB
DCB
DACB
3
ABC
CDE
GHI
ABCCDEFIHG
4
ABB
ACDEE
BBB
FEEE
A[2B]CD[4E]F
 

Sample Output
0
3
2

Hint
In the second case in the sample input, the reverse of the program is ‘GHIFEDCCBA’, and ‘GHI’ is a substring of the reverse, so the program is infected by virus ‘GHI’.
 

-----------------------

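Build an Aho-Corasick automaton from the virus patterns, then run the program through it twice, once forward and once reversed; the sum of the two match counts is the answer. Each matched node is marked as counted during a query, so a virus found in both directions is only counted once. Note that in [qx] the repeat count can be large (0 < q <= 5,000,000), so the buffers must hold the fully decompressed program (up to 5,100,000 characters).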
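
The [qx] expansion can be looked at in isolation. The following is a minimal sketch, assuming well-formed input as the problem guarantees; the helper name decompress is hypothetical and is not part of the solution below, which expands the program while reading it character by character.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: expands every "[qx]" compressor into q copies of x.
// Assumes the input is well formed, as the problem statement guarantees.
std::string decompress(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '[') {
            std::size_t q = 0;
            ++i;  // move past '['
            while (i < in.size() &&
                   std::isdigit(static_cast<unsigned char>(in[i]))) {
                q = q * 10 + static_cast<std::size_t>(in[i] - '0');
                ++i;
            }
            if (i < in.size()) {
                out.append(q, in[i]);  // in[i] is the letter x
            }
            ++i;  // now at ']'; the loop increment steps past it
        } else {
            out.push_back(in[i]);  // plain capital letter
        }
    }
    return out;
}
```

For the example from the statement, decompress("AB[2D]E[7K]G") yields ABDDEKKKKKKKG.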

-------------------

#include <cstdio>
#include <cstring>
#include <iostream>
#include <queue>

using namespace std;

// Child nodes are allocated with new at insertion time;
// the queue used to compute the failure pointers is the STL queue.
const int kind = 26;
struct node
{
    node *fail;
    node *next[kind];
    bool fff;
    int count;// number of patterns that end exactly at this node
    node()
    {
        fff=false;
        fail = NULL;
        count = 0;
        memset(next,0,sizeof(next));
    }
};

void insert(char *str,node *root)
{
    node *p=root;
    int i=0,index;
    while(str[i])
    {
        index = str[i]-'A';
        if(p->next[index]==NULL) p->next[index]=new node();
        p=p->next[index];
        i++;
    }
    p->count++;

}

// Compute the failure pointers with a breadth-first traversal of the trie
void build_ac_automation(node *root)
{
    int i;
    queue<node *>Q;
    root->fail = NULL;
    Q.push(root);
    while(!Q.empty())
    {
        node *temp = Q.front();// take the node at the front of the queue
        Q.pop();
        node *p = NULL;
        for(i=0; i<kind; i++)
        {
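            if(temp->next[i]!=NULL)// compute the failure pointer of this child
            {
                p = temp->fail;
                while(p!=NULL)
                {
                    if(p->next[i]!=NULL)// found the longest proper suffix that has this transition
                    {
                        temp->next[i]->fail = p->next[i];
                        break;
                    }
                    p = p->fail;
                }

                if(p==NULL)// no such suffix exists, so the child fails to the root
                    temp->next[i]->fail = root;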

                Q.push(temp->next[i]);
            }
        }
    }
}

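// Count how many distinct inserted patterns occur in str.
// Matched nodes are marked with count=-1, so a later query does not count them again.
int query(char *str,node *root)
{
    int i = 0,cnt = 0,index;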
    node *p = root;
    while(str[i])
    {
        index = str[i]-'A';
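        while(p->next[index]==NULL&&p!=root)// mismatch: follow failure pointers
            p=p->fail;
        p=p->next[index];
        if(p==NULL)// no transition even from the root, so restart at the root
            p = root;

        node *temp = p;
        // walk the failure chain and count every pattern ending at this position; stop at nodes already counted
        while(temp!=root&&temp->count!=-1)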
        {
            cnt+=temp->count;
            temp->count=-1;
            temp=temp->fail;
        }
        i++;
    }
    return cnt;
}

char str[6100000];
char s1[6100000];
char s2[6100000];
char words[1111111];
int T,n;
node* root;

int main()
{
    scanf("%d",&T);
    while (T--)
    {
        memset(s1,0,sizeof(s1));
        memset(s2,0,sizeof(s2));
        root=new node();
        scanf("%d",&n);
        while (n--)
        {
            scanf("%s",words);
            insert(words,root);
        }
        build_ac_automation(root);
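        getchar();// consume the newline left after the last pattern
        char sc;
        int j=0;
        while (scanf("%c",&sc)==1)// also stops cleanly at end of input
        {
            // any character outside the compressed alphabet (such as '\n') ends the program line
            if (!( ((sc>='A')&&(sc<='Z')) || ((sc>='0')&&(sc<='9')) || (sc=='[') || (sc==']') )) break;
            if (sc>='A'&&sc<='Z')
            {
                s1[j++]=sc;
            }
            else if (sc=='[')
            {
                int d;
                char c;
                scanf("%d%c",&d,&c);// read the repeat count q and the letter x
                getchar();// skip the closing ']'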
                while (d--)
                {
                    s1[j++]=c;
                }
            }
        }
        s1[j]='\0';
        int l=strlen(s1);
        for (int i=0; i<l; i++)
        {
            s2[i]=s1[l-i-1];
        }
        s2[l]='\0';
        //cerr<<endl<<s1<<endl<<s2<<endl<<endl;
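        int reta=query(s1,root);// matches in the program itself
        int retb=query(s2,root);// matches in the reversed program; patterns already counted are skipped
        printf("%d\n",reta+retb);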
    }
    return 0;
}
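
To sanity-check the automaton on small inputs, a brute-force reference can be handy. The following is a minimal sketch, assuming the program has already been decompressed; the function name count_infections is hypothetical and the approach is far too slow for the full constraints.

```cpp
#include <string>
#include <vector>

// Hypothetical brute-force reference: a virus is counted once if its pattern
// occurs in the program or in the reversed program. Intended only for
// cross-checking small cases, not for the real input limits.
int count_infections(const std::vector<std::string>& viruses,
                     const std::string& program) {
    const std::string reversed(program.rbegin(), program.rend());
    int infected = 0;
    for (const std::string& v : viruses) {
        if (program.find(v) != std::string::npos ||
            reversed.find(v) != std::string::npos) {
            ++infected;
        }
    }
    return infected;
}
```

On the second sample case, count_infections({"ABC", "CDE", "GHI"}, "ABCCDEFIHG") gives 3, which matches the expected output.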






posted on 2013-04-20 20:38  电子幼体
