DNA Sequence POJ - 2778 AC自动机 && 矩阵快速幂

It's well known that DNA Sequence is a sequence only contains A, C, T and G, and it's very useful to analyze a segment of DNA Sequence，For example, if a animal's DNA sequence contains segment ATC then it may mean that the animal may have a genetic disease. Until now scientists have found several those segments, the problem is how many kinds of DNA sequences of a species don't contain those segments.

Suppose that DNA sequences of a species is a sequence that consist of A, C, T and G，and the length of sequences is a given integer n.

Input

First line contains two integer m (0 <= m <= 10), n (1 <= n <=2000000000). Here, m is the number of genetic disease segment, and n is the length of sequences.

Next m lines each line contain a DNA genetic disease segment, and length of these segments is not larger than 10.

Output

An integer, the number of DNA sequences, mod 100000.

Sample Input

4 3
AT
AC
AG
AA

Sample Output

题意：

多组输入，n代表下面有n个模式串。m代表你要构造的DNA序列长度

题目让你求DNA序列为m，序列中不出现以上n个模式串的DNA序列有多少

题解：

我的上一代码用的我自己写的矩阵快速幂，那个版本每次都要初始化数组，太耗时了就TLE。找了个结构体版本的快速幂

这一道题就是问你你能找到多少个长度最长为m的正常DNA序列(什么叫正常？就是不含病毒的那种)
对于矩阵(这里以2*2为例)
|X(11) X(12)|
|X(21) X(22)|
我们设我们构造的矩阵为ans，其中ans[i][j]表示DNA序列中从i节点到j节点（下面都直接称为i、j）走一步的方法数(就是最后找出来的DNA序列中第i个位置和第j个
个位置,因为是一步，所以i和j在DNA序列中是相邻的。可以构成正常DNA的方法数)

什么叫i、j节点，我们在构造字典树的过程中会产生节点，那个节点是第几个产生的，那个就是它的序号

然后矩阵的二次方的ans[i][j]的意思就是从DNA序列中从i先到一个中转点再到j有多少种DNA序列满足
那么矩阵的n次方的ans[i][j]意思就是DNA序列中从i到n个中转点再到j有多少种DNA序列满足

那么我们只需要求出来ans[i][j]DNA序列中从i走一步的j这一段在DNA序列中方案数
然后这个矩阵求m次方就可以了

ans[i][j]DNA序列中从i走一步的j这一段在DNA序列中方案数的这个ans矩阵怎么求？

把每一个病毒串的终止位置给标记了，在字典树上通过失败指针跳转如果能到病毒串终止位置也要把这个点标记了，然后沿着字典树的next数组跑一边就完了(具体看代码)

然后第一行ans[0][1],ans[0][1]...ans[0][99]的所有数的和就可以了
ans[0][1]的意思就是从DNA序列0号位置(即，开头)经过m-1个中转点到1号节点 //0号位置是字典树根的位置，所以不算在DNA序列内
这个样子加起来就是答案

为什么可以这样做？

ans[i][j]最初的一步矩阵：

他就代表i这个点后面可以往j这个位置走一步有多少种不重复走法

ans[i][j]的m次方矩阵：

相当于字典树就是一个有向图，你就沿着它的方向跑下去。每跑m路程就会有一个对应DNA序列，又因为你已经把病毒串终止位置标记了，所以根本不可能跑到那个位置。所以你跑出来的DNA序列中也不会包含病毒序列

代码：

  1 #include<stdio.h>  
  2 #include<iostream>
  3 #include<string.h>
  4 #include<algorithm>
  5 #include<queue>
  6 using namespace std;
  7 const int maxn=115;
  8 const int N=4;
  9 const int mod=100000;
 10 typedef long long ll;
 11 int n,m;
 12 int w[120];
 13 struct Matrix
 14 {
 15     int mat[maxn][maxn],n;
 16     Matrix() {}
 17     Matrix(int _n)
 18     {
 19         n = _n;
 20         for(int i=0; i<n; i++)
 21             for(int j=0; j<n; j++)
 22                 mat[i][j]=0;
 23     }
 24     Matrix operator *(const Matrix &b)const
 25     {
 26         Matrix ret=Matrix(n);
 27         for(int i=0; i<n; i++)
 28             for(int j=0; j<n; j++)
 29                 for(int k=0; k<n; k++)
 30                 {
 31                     int tmp=(long long)mat[i][k]*b.mat[k][j]%mod;
 32                     ret.mat[i][j]=(ret.mat[i][j]+tmp)%mod;
 33                 }
 34         return ret;
 35     }
 36 };
 37 struct Trie
 38 {
 39     int next[maxn][N],fail[maxn],ends[maxn];
 40     int root,L;
 41     int New_node() //创建一个新节点
 42     {
 43         for(int i=0; i<N; ++i)
 44         {
 45             next[L][i]=-1;
 46         }
 47         ends[L++]=0;
 48         return L-1;
 49     }
 50     void init()  //创建根节点
 51     {
 52         L=0;
 53         root=New_node();
 54     }
 55     void inserts(char s[])  //往字典树里面插入新字符串
 56     {
 57         int len=strlen(s);
 58         int now=root;
 59         for(int i=0; i<len; ++i)
 60         {
 61             if(next[now][w[s[i]]]==-1)
 62                 next[now][w[s[i]]]=New_node();
 63             now=next[now][w[s[i]]];
 64         }
 65         ends[now]=1;  //病毒串终点要标记
 66     }
 67     void build()
 68     {
 69         queue<int>r;
 70         fail[root]=root;
 71         for(int i=0; i<N; ++i)
 72         {
 73             if(next[root][i]==-1)
 74             {
 75                 next[root][i]=root;
 76             }
 77             else
 78             {
 79                 fail[next[root][i]]=root;
 80                 r.push(next[root][i]);
 81             }
 82         }
 83         while(!r.empty())
 84         {
 85             int now=r.front();
 86             r.pop();
 87             if(ends[fail[now]])  //如果now节点的失败指针指向病毒串终点，那么也要把now这个点给标记了
 88             {
 89                 ends[now]=1;
 90             }
 91             for(int i=0; i<N; ++i)
 92             {
 93                 if(next[now][i]==-1)
 94                 {
 95                     next[now][i]=next[fail[now]][i];  
 96                 }
 97                 else
 98                 {
 99                     fail[next[now][i]]=next[fail[now]][i]; 
100                     r.push(next[now][i]);
101                 }
102             }
103         }
104     }
105     Matrix Build_c()
106     {
107         Matrix res=Matrix(L);
108         for(int i=0; i<L; ++i)
109         {
110             for(int j=0; j<N; ++j)
111             {
112                 if(ends[next[i][j]]==0)  //创建走一步的ans矩阵
113                 {
114                     res.mat[i][next[i][j]]++;
115                 }
116             }
117         }
118         return res;
119     }
120 };
121 char s[20];
122 Trie ac;
123 Matrix pow_M(Matrix a,int n)
124 {
125     Matrix ret = Matrix(a.n);
126     for(int i = 0; i < ret.n; i++)
127         ret.mat[i][i]=1;
128     Matrix tmp=a;
129     while(n)
130     {
131         if(n&1)ret=ret*tmp;
132         tmp=tmp*tmp;
133         n>>=1;
134     }
135     return ret;
136 }
137 int main()
138 {
139     w['A']=0;
140     w['C']=1;
141     w['G']=2;
142     w['T']=3;
143     while(~scanf("%d%d",&n,&m))
144     {
145         ac.init();
146         while(n--)
147         {
148             scanf("%s",s);
149             ac.inserts(s);
150         }
151         ac.build();
152         Matrix ans=ac.Build_c();
153         ans=pow_M(ans,m);
154         int sum=0;
155         for(int i=0; i<ac.L; ++i)
156             sum+=ans.mat[0][i],sum%=mod;
157         printf("%d\n",sum);
158     }
159     return 0;
160 }

posted @ 2019-12-12 15:58 kongbursi 阅读(210) 评论(0) 编辑收藏举报

刷新页面返回顶部

哟吼！

DNA Sequence POJ - 2778 AC自动机 && 矩阵快速幂

公告