[Pku 3691 1625] Strings (IV) {Automaton Applications}

{

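This article builds on the AC automaton.

It introduces applications of the AC automaton

and leads to an optimization of it: the finite-state automaton.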

}

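An AC automaton is essentially a graph.

To solve certain optimization or counting problems on strings,

we can combine the AC automaton with dynamic programming,

that is, run the DP on the automaton.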

Let's start with a concrete problem:

Pku 3691 http://poj.org/problem?id=3691

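Problem

  Given N patterns (1 ≤ N ≤ 50), each of length at most 20,

  and a main string of length at most 1000,

  where the only allowed characters are the four letters {'A','T','G','C'},

  find the minimum number of characters of the main string that must be changed so that it contains none of the patterns (output -1 if this is impossible).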

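Analysis

The simplest algorithm is a search: enumerate the value of every position and check the result.

With 4 choices per position, this costs O(4^L) for a main string of length L,

which is far too slow, so let's look for an optimization.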
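For very short main strings the brute force is still handy as a reference for checking the DP. Below is a minimal Free Pascal sketch, assuming the four letters A, T, G, C; the names IsSafe and BruteMinChanges are hypothetical, not from the article's code. It tries all 4^L strings of the same length as the main string and keeps the cheapest safe one.

```pascal
{$mode objfpc}{$H+}
program BruteForce;

const
  Letters: array[0..3] of Char = ('A', 'T', 'G', 'C');

{ True if s contains none of the patterns as a substring }
function IsSafe(const s: string; const pats: array of string): Boolean;
var
  i: LongInt;
begin
  for i := 0 to High(pats) do
    if Pos(pats[i], s) > 0 then
      Exit(False);
  Result := True;
end;

{ Enumerate all 4^|s| strings of the same length as s, keep the safe ones,
  and return the fewest positions that differ from s (-1 if none is safe).
  Only feasible for very short s (|s| <= 15 keeps 4^|s| inside LongInt). }
function BruteMinChanges(const s: string; const pats: array of string): LongInt;
var
  cand: string;
  total, code, x, i, diff: LongInt;
begin
  Result := -1;
  total := 1;
  for i := 1 to Length(s) do
    total := total * 4;
  cand := s;                              { overwritten character by character }
  for code := 0 to total - 1 do
  begin
    x := code;
    diff := 0;
    for i := 1 to Length(s) do            { read code as a base-4 number }
    begin
      cand[i] := Letters[x mod 4];
      x := x div 4;
      if cand[i] <> s[i] then
        Inc(diff);
    end;
    if IsSafe(cand, pats) and ((Result = -1) or (diff < Result)) then
      Result := diff;
  end;
end;
```

The block only declares routines; a main block such as `begin writeln(BruteMinChanges('AAAG', ['AAA', 'AAG'])) end.` prints 1.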

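First, look at an example:

patterns {'AT','G'}.

Whether the string is G, AG, AAG, AAAG or AAAAAAAG,

they all share the same fate: each ends with a match of G, so none of them is a valid solution.

Now look at AA, AC, CC, CA, TA, TT, TC, CT and so on:

they are alike too, since any string that contains no G and no AT is a valid solution.

So many of the states are really the same, and that is why the search is inefficient.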

How can we merge these identical states?

Since this is a multi-pattern matching problem,

let's try building an AC automaton.

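To make the discussion easier, let's switch to another example:

patterns {'AT','TAT','C','GC'}; the AC automaton built from them is shown in the figure below.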

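When strings are matched on the AC automaton, many of them end at the same node.

For example, AA, GA, TAA, GAA and so on all end at node A, while GG, TG and so on all end at node G.

In particular, every string that contains a pattern has passed through a pink node (a node where a pattern ends, or whose fail chain reaches such a node).

Can we then summarize the state of a string by the node where it ends after being matched on the AC automaton?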
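This observation can be checked with a small sketch: a plain AC automaton (trie plus fail pointers, no filled-in transitions) for {'AT','TAT','C','GC'}, and a Walk function that matches a string from the root and reports the final node. The names Build, Walk, child, fail and pink, node 0 as the root, and the letter order A, T, G, C = 0..3 are all assumptions made for illustration.

```pascal
{$mode objfpc}{$H+}
program AcWalk;

const
  MaxNodes = 1100;

var
  child: array[0..MaxNodes - 1, 0..3] of LongInt;  { -1 = no such child }
  fail: array[0..MaxNodes - 1] of LongInt;
  pink: array[0..MaxNodes - 1] of Boolean;  { a pattern ends here or on the fail chain }
  nodeCount: LongInt;

{ Plain AC automaton over A,T,G,C (letters 0..3): trie plus fail pointers }
procedure Build(const pats: array of string);
var
  queue: array[0..MaxNodes - 1] of LongInt;
  head, tail, i, j, k, u, v, p: LongInt;
begin
  FillChar(child, SizeOf(child), $FF);   { every entry -1 }
  FillChar(pink, SizeOf(pink), 0);
  FillChar(fail, SizeOf(fail), 0);       { depth-1 nodes keep fail = root }
  nodeCount := 1;                        { node 0 is the root }
  for i := 0 to High(pats) do
  begin
    u := 0;
    for j := 1 to Length(pats[i]) do
    begin
      k := Pos(pats[i][j], 'ATGC') - 1;
      if child[u, k] = -1 then
      begin
        child[u, k] := nodeCount;
        Inc(nodeCount);
      end;
      u := child[u, k];
    end;
    pink[u] := True;
  end;
  head := 0; tail := 0;
  for k := 0 to 3 do
    if child[0, k] <> -1 then
    begin
      queue[tail] := child[0, k]; Inc(tail);
    end;
  while head < tail do
  begin
    u := queue[head]; Inc(head);
    for k := 0 to 3 do
    begin
      v := child[u, k];
      if v = -1 then
        Continue;
      p := fail[u];                      { fall back until a k-child exists }
      while (p <> 0) and (child[p, k] = -1) do
        p := fail[p];
      if child[p, k] <> -1 then
        fail[v] := child[p, k]
      else
        fail[v] := 0;
      pink[v] := pink[v] or pink[fail[v]];
      queue[tail] := v; Inc(tail);
    end;
  end;
end;

{ Match s from the root; return the last node and whether a pink node was visited }
function Walk(const s: string; out touchedPink: Boolean): LongInt;
var
  i, k, u: LongInt;
begin
  u := 0;
  touchedPink := False;
  for i := 1 to Length(s) do
  begin
    k := Pos(s[i], 'ATGC') - 1;
    while (u <> 0) and (child[u, k] = -1) do
      u := fail[u];
    if child[u, k] <> -1 then
      u := child[u, k];                  { otherwise stay at the root }
    if pink[u] then
      touchedPink := True;
  end;
  Result := u;
end;
```

After `Build(['AT', 'TAT', 'C', 'GC'])`, the calls `Walk('AA', hit)`, `Walk('GA', hit)`, `Walk('TAA', hit)` and `Walk('GAA', hit)` all return the node for A, `Walk('GG', hit)` and `Walk('TG', hit)` return the node for G, and `Walk('GTAT', hit)` sets `hit` because it reaches the pink node TAT.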

Pause here and think: can dynamic programming on a directed graph replace the search?

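We need two dimensions for the state, Opt[i][j]:

the first dimension means we have taken i steps (i.e., processed the first i characters of the main string);

the second dimension means we are at node j of the AC automaton.

Opt[i][j] records the minimum number of changes so far such that no pattern has appeared,

that is, we have never stepped on a pink node.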

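This DP is easiest to compute forward (push style):

use Opt[i][j]+0 or Opt[i][j]+1 to update Opt[i+1][Next[j]].

Next[j] is the node of the AC automaton we move to once we decide what character i+1 is.

We may get there from node j in a single step, or only after falling back along fail pointers; this works just like ordinary matching.

Whether to add 1 depends on the decision: changing character i+1 costs 1, keeping it costs 0.

Make sure a decision never moves onto a pink node!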

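Finally, Min{Opt[Length(MainString)][j]} over all nodes j is the answer (if it is still infinite, no valid string exists and the answer is -1).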
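The same recurrence written out as formulas. Here S is the main string of length L, S_{i+1} its (i+1)-th character, Next(j,k) the node reached from j on letter k (the article's Next[j] with the decision k made explicit), and [P] is 1 when P holds and 0 otherwise; this shorthand is introduced here for illustration.

```latex
\mathrm{Opt}[0][\mathrm{root}] = 0, \qquad \mathrm{Opt}[0][j] = +\infty \quad (j \neq \mathrm{root})

\mathrm{Opt}[i+1][\mathrm{Next}(j,k)] \leftarrow \min\bigl(\mathrm{Opt}[i+1][\mathrm{Next}(j,k)],\ \mathrm{Opt}[i][j] + [\,k \neq S_{i+1}\,]\bigr)
\quad \text{for } k \in \{A,T,G,C\},\ j \text{ and } \mathrm{Next}(j,k) \text{ not pink}

\text{answer} = \min_{j} \mathrm{Opt}[L][j] \quad (-1 \text{ if this minimum is } +\infty)
```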

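The first dimension, which derives step i+1 from step i, is what gives the DP a valid order: the automaton itself contains cycles, but every transition goes from layer i to layer i+1, so processing i in increasing order is a topological order.

The second dimension summarizes a string neatly by the node where it ends on the AC automaton.

This is the core idea of the DP.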

Here is the core DP code:

DPonAC

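    fillchar(opt,sizeof(opt),$3F);      { every state starts out "unreachable" }
    opt[0][root]:=0;                    { empty prefix: at the root, 0 changes }
    for i:=0 to m-1 do
        for j:=1 to tt do
            if not d[j] then            { never leave from a pink node }
                for k:=1 to 4 do
                    begin
                    { Next[j] for letter k: fall back along fail pointers }
                    { until a node with a k-child exists, or the root }
                    p:=j;
                    while (p<>root)and(s[p][k]=0) do
                        p:=f[p];
                    if s[p][k]=0
                        then p:=root
                        else p:=s[p][k];
                    if d[p] then continue;  { letter k would complete a pattern }
                    { keeping character i+1 costs 0, changing it costs 1 }
                    if k=tr[c[i+1]]
                        then temp:=opt[i][j]
                        else temp:=opt[i][j]+1;
                    if temp<opt[i+1][p]
                        then opt[i+1][p]:=temp;
                    end;
    ans:=oo;
    for j:=1 to tt do
        if ans>opt[m][j]
            then ans:=opt[m][j];
    write('Case ',ca,': ');
    if ans=oo
        then writeln(-1)
        else writeln(ans);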

The complete code is at the end of the article.

Make sure you understand what Next[j] means before reading on.

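While implementing the solution above, you may notice the following:

once node j and the decision (the letter chosen for the next position) are fixed, Next[j] is fixed as well.

This suggests an optimization: precompute, for every node j and every decision k, the edge to Next[j].

For the AC automaton above, we can build the graph shown below (fail pointers are omitted from it).

In practice, while building the AC automaton, whenever a node lacks some child,

we follow fail pointers until we reach a node on the fail chain that has such a child, or the root;

then we point the missing child pointer at that node's child, or at the root.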
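A minimal Free Pascal sketch of this construction together with the simplified DP, assuming node 0 is the root and A, T, G, C map to 0..3 (BuildDfa, MinChanges, trans and pink are hypothetical names). Instead of walking up the fail chain as the article's code does, it fills a missing child with trans[fail[u]][k]: BFS finishes shallower nodes first, so the row of fail[u] is already complete and gives the same target in one lookup. The DP keeps only two rows of Opt rather than the whole table.

```pascal
{$mode objfpc}{$H+}
program DfaMinChanges;

const
  MaxNodes = 1100;   { 50 patterns of length <= 20, plus the root }
  Inf = 1000000000;  { marks an unreachable state }

type
  TRow = array[0..MaxNodes - 1] of LongInt;

var
  trans: array[0..MaxNodes - 1, 0..3] of LongInt;  { letters A,T,G,C = 0..3 }
  fail: array[0..MaxNodes - 1] of LongInt;
  pink: array[0..MaxNodes - 1] of Boolean;  { a pattern ends here or on the fail chain }
  nodeCount: LongInt;

procedure BuildDfa(const pats: array of string);
var
  queue: TRow;
  head, tail, i, j, k, u, v: LongInt;
begin
  FillChar(trans, SizeOf(trans), $FF);     { -1 everywhere: no child yet }
  FillChar(pink, SizeOf(pink), 0);
  fail[0] := 0;
  nodeCount := 1;                          { node 0 is the root }
  for i := 0 to High(pats) do
  begin
    u := 0;
    for j := 1 to Length(pats[i]) do
    begin
      k := Pos(pats[i][j], 'ATGC') - 1;
      if trans[u, k] = -1 then
      begin
        trans[u, k] := nodeCount;
        Inc(nodeCount);
      end;
      u := trans[u, k];
    end;
    pink[u] := True;                       { a pattern ends here }
  end;
  head := 0; tail := 0;
  for k := 0 to 3 do
    if trans[0, k] = -1 then
      trans[0, k] := 0                     { missing root child: stay at the root }
    else
    begin
      fail[trans[0, k]] := 0;              { depth-1 nodes fail to the root }
      queue[tail] := trans[0, k]; Inc(tail);
    end;
  while head < tail do
  begin
    u := queue[head]; Inc(head);
    for k := 0 to 3 do
    begin
      v := trans[u, k];
      if v = -1 then
        trans[u, k] := trans[fail[u], k]   { fail[u] is shallower: its row is complete }
      else
      begin                                { a real child: fail pointer as usual }
        fail[v] := trans[fail[u], k];
        pink[v] := pink[v] or pink[fail[v]];
        queue[tail] := v; Inc(tail);
      end;
    end;
  end;
end;

{ Two rows of Opt; returns the fewest changes so that s contains no pattern, or -1 }
function MinChanges(const s: string): LongInt;
var
  cur, nxt: TRow;
  i, j, k, v, c, cost: LongInt;
begin
  for j := 0 to nodeCount - 1 do cur[j] := Inf;
  cur[0] := 0;                             { Opt[0][root] = 0 }
  for i := 1 to Length(s) do
  begin
    c := Pos(s[i], 'ATGC') - 1;            { the original character i }
    for j := 0 to nodeCount - 1 do nxt[j] := Inf;
    for j := 0 to nodeCount - 1 do
      if (cur[j] < Inf) and not pink[j] then
        for k := 0 to 3 do
        begin
          v := trans[j, k];                { Next[j] for letter k: one lookup }
          if pink[v] then Continue;        { letter k would complete a pattern }
          cost := cur[j];
          if k <> c then Inc(cost);        { character i is changed to letter k }
          if cost < nxt[v] then nxt[v] := cost;
        end;
    cur := nxt;
  end;
  Result := Inf;
  for j := 0 to nodeCount - 1 do
    if cur[j] < Result then Result := cur[j];
  if Result = Inf then Result := -1;
end;
```

For example, a main block `begin BuildDfa(['AAA', 'AAG']); writeln(MinChanges('AAAG')) end.` prints 1. Keeping two rows instead of the full Opt table is a memory choice only; the transitions are the same as in the article's reDP code.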

The core construction code is as follows:

Re Build

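    h:=1; t:=0;                      { BFS queue q[h..t] }
    f[root]:=root;
    for i:=1 to 4 do
        if s[root][i]<>0
            then begin               { depth-1 nodes fail to the root }
            inc(t);
            q[t]:=s[root][i];
            f[q[t]]:=root;
            end
            else s[root][i]:=root;   { a missing root child loops back to the root }
    while h<=t do
        begin
        for i:=1 to 4 do
            if s[q[h]][i]<>0
                then begin
                { a real child: compute its fail pointer as usual }
                inc(t);
                q[t]:=s[q[h]][i];
                p:=f[q[h]];
                while (p<>root)and(s[p][i]=0) do
                    p:=f[p];
                if s[p][i]=0
                    then f[q[t]]:=root
                    else begin
                    { a node whose fail target is pink is pink too }
                    if d[s[p][i]]=1
                        then d[q[t]]:=1;
                    f[q[t]]:=s[p][i];
                    end;
                end
                else begin
                { a missing child: point it at the i-child of the first node }
                { on the fail chain that has one, or at the root }
                p:=f[q[h]];
                while (p<>root)and(s[p][i]=0) do
                    p:=f[p];
                if s[p][i]=0
                    then s[q[h]][i]:=root
                    else s[q[h]][i]:=s[p][i];
                end;
        inc(h);
        end;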

With this, our DP becomes much simpler:

there is no need to fall back along fail pointers every time.

reDP

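    fillchar(opt,sizeof(opt),$3F);
    opt[0][root]:=0;
    for i:=0 to m-1 do
        for j:=1 to tt do
            for k:=1 to 4 do
                { s[j][k] is now defined for every node and letter, so it }
                { is Next[j] directly: no fail-pointer walk is needed }
                if (d[j]=0)and(d[s[j][k]]=0)
                    then begin
                    t:=i+1;
                    if k=tr[c[t]]
                        then temp:=opt[i][j]      { keep character t }
                        else temp:=opt[i][j]+1;   { change character t to letter k }
                    if temp<opt[t][s[j][k]]
                        then opt[t][s[j][k]]:=temp;
                    end;
    ans:=oo;
    for j:=1 to tt do
        if opt[m][j]<ans
            then ans:=opt[m][j];
    write('Case ',ca,': ');
    if ans>maxn                         { still "infinite": no valid string }
        then writeln(-1)
        else writeln(ans);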

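Note that this graph is essentially different from the AC automaton.

Every node now has a successor for every letter, and each successor is uniquely determined.

This completeness and determinism is the fundamental difference between such a graph and the AC automaton, whose missing children must be resolved through fail pointers at matching time.

It is an optimization of the AC automaton.

We call such a graph a deterministic finite automaton (DFA).

These properties let us do even more powerful things;

the next article will explore the concrete advantages of finite-state automata.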

Pku 1625 is similar to Pku 3691, except that it asks for the number of valid strings.

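Only the recurrence and the update step change slightly: instead of taking the minimum of Opt[i][j]+0/1, the count Opt[i][j] is added into Opt[i+1][Next[j]], starting from Opt[0][root]=1, and the answer is the sum over all nodes.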
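A Free Pascal sketch of that counting DP on the completed automaton, over an arbitrary alphabet (Build, CountSafe, trans and pink are hypothetical names). To stay short it keeps the counts in QWord, so it is only correct while the answer fits in 64 bits; the real problem needs big integers, as the article's Censored! code uses. It also assumes every forbidden word uses only letters of the alphabet.

```pascal
{$mode objfpc}{$H+}
program CountAvoiding;

const
  MaxNodes = 110;   { up to 10 words of length <= 10, plus the root }
  MaxSigma = 50;    { up to 50 letters in the alphabet }

type
  TCounts = array[0..MaxNodes - 1] of QWord;

var
  trans: array[0..MaxNodes - 1, 0..MaxSigma - 1] of LongInt;
  fail: array[0..MaxNodes - 1] of LongInt;
  pink: array[0..MaxNodes - 1] of Boolean;
  nodeCount, sigma: LongInt;

{ Completed automaton (every node gets all sigma successors), node 0 = root }
procedure Build(const alphabet: string; const words: array of string);
var
  queue: array[0..MaxNodes - 1] of LongInt;
  head, tail, i, j, k, u, v: LongInt;
begin
  sigma := Length(alphabet);
  FillChar(trans, SizeOf(trans), $FF);     { -1: no child yet }
  FillChar(pink, SizeOf(pink), 0);
  fail[0] := 0;
  nodeCount := 1;
  for i := 0 to High(words) do
  begin
    u := 0;
    for j := 1 to Length(words[i]) do
    begin
      k := Pos(words[i][j], alphabet) - 1; { letter -> 0..sigma-1 }
      if trans[u, k] = -1 then
      begin
        trans[u, k] := nodeCount;
        Inc(nodeCount);
      end;
      u := trans[u, k];
    end;
    pink[u] := True;
  end;
  head := 0; tail := 0;
  for k := 0 to sigma - 1 do
    if trans[0, k] = -1 then
      trans[0, k] := 0
    else
    begin
      fail[trans[0, k]] := 0;
      queue[tail] := trans[0, k]; Inc(tail);
    end;
  while head < tail do
  begin
    u := queue[head]; Inc(head);
    for k := 0 to sigma - 1 do
    begin
      v := trans[u, k];
      if v = -1 then
        trans[u, k] := trans[fail[u], k]
      else
      begin
        fail[v] := trans[fail[u], k];
        pink[v] := pink[v] or pink[fail[v]];
        queue[tail] := v; Inc(tail);
      end;
    end;
  end;
end;

{ Number of length-m strings over the alphabet that contain no forbidden word }
function CountSafe(m: LongInt): QWord;
var
  cur, nxt: TCounts;
  i, j, k, v: LongInt;
begin
  FillChar(cur, SizeOf(cur), 0);
  cur[0] := 1;                             { Opt[0][root] = 1: the empty string }
  for i := 1 to m do
  begin
    FillChar(nxt, SizeOf(nxt), 0);
    for j := 0 to nodeCount - 1 do
      if not pink[j] then
        for k := 0 to sigma - 1 do
        begin
          v := trans[j, k];
          if not pink[v] then
            nxt[v] := nxt[v] + cur[j];     { sum instead of min(+0/+1) }
        end;
    cur := nxt;
  end;
  Result := 0;
  for j := 0 to nodeCount - 1 do
    Result := Result + cur[j];
end;
```

For instance, with alphabet 'ab', length 3 and the forbidden word 'bb', `Build('ab', ['bb']); writeln(CountSafe(3))` prints 5 (of the 8 strings, abb, bba and bbb are excluded).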

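Note that big integers are required: with up to 50 letters and length up to 50, the count can be as large as 50^50.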
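A fixed-width alternative to the plus procedure in the Censored! code, offered as a sketch: each number is 16 base-10^6 digits stored least significant first (96 decimal digits, enough for 50^50, which has 85), so addition never needs to track lengths. TBig, AddTo, SetSmall and BigToStr are hypothetical names; any carry out of the top digit is silently dropped.

```pascal
{$mode objfpc}{$H+}
program BigCounts;

const
  Limbs = 16;        { 16 digits of base 10^6 = 96 decimal digits }
  Base = 1000000;

type
  TBig = array[0..Limbs - 1] of LongInt;  { digit 0 is least significant }

{ acc := acc + b, schoolbook addition with carry }
procedure AddTo(var acc: TBig; const b: TBig);
var
  i, carry: LongInt;
begin
  carry := 0;
  for i := 0 to Limbs - 1 do
  begin
    carry := carry + acc[i] + b[i];
    acc[i] := carry mod Base;
    carry := carry div Base;
  end;
end;

{ a := x, for 0 <= x < Base }
procedure SetSmall(out a: TBig; x: LongInt);
begin
  FillChar(a, SizeOf(a), 0);
  a[0] := x;
end;

{ Decimal text: the top digit unpadded, every lower digit padded to 6 places }
function BigToStr(const a: TBig): string;
var
  i, top: LongInt;
  chunk: string;
begin
  top := Limbs - 1;
  while (top > 0) and (a[top] = 0) do
    Dec(top);
  Str(a[top], Result);
  for i := top - 1 downto 0 do
  begin
    Str(a[i], chunk);
    Result := Result + StringOfChar('0', 6 - Length(chunk)) + chunk;
  end;
end;
```

In the counting DP, each state then holds a TBig and the update becomes `AddTo(next_state, current_state)`. The padding in BigToStr plays the same role as the chain of `if ... <100000 then write(0)` lines in the article's output code.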

The code is at the end of the article.

Original article by Bob Han; please credit the source when reposting: http://www.cnblogs.com/Booble/

Code:

Rdna1 (Pku 3691, DP on the completed automaton)

const    maxn=2000;
    oo=maxlongint;
var    tr:array['A'..'Z']of longint;
    root,tt,n,m,h,t,i,j,k,temp,p,ans,ca:longint;
    opt:array[0..maxn,1..maxn]of longint;
    s:array[1..maxn,1..4]of longint;
    d,q,f:array[1..maxn]of longint;
    c:array[1..maxn]of char;
    ch:char;
procedure allot(var x:longint);
var    i:longint;
begin
inc(tt); x:=tt;
d[x]:=0; f[x]:=0;
for i:=1 to 4 do
    s[x][i]:=0;
end;
begin
assign(input,'RDNA.in'); reset(input);
assign(output,'RDNA1.out'); rewrite(output);
tr['A']:=1; tr['T']:=2;
tr['G']:=3; tr['C']:=4;
ca:=0;
readln(n);
while n<>0 do
    begin
    inc(ca);
    tt:=0;
    allot(root);
    for i:=1 to n do
        begin
        p:=root;
        while not eoln do
            begin
            read(ch);
            temp:=tr[ch];
            if s[p][temp]=0
                then allot(s[p][temp]);
            p:=s[p][temp];
            end;
        d[p]:=1;
        readln;
        end;
    h:=1; t:=0;
    f[root]:=root;
    for i:=1 to 4 do
        if s[root][i]<>0
            then begin
            inc(t);
            q[t]:=s[root][i];
            f[q[t]]:=root;
            end
            else s[root][i]:=root;
    while h<=t do
        begin
        for i:=1 to 4 do
            if s[q[h]][i]<>0
                then begin
                inc(t);
                q[t]:=s[q[h]][i];
                p:=f[q[h]];
                while (p<>root)and(s[p][i]=0) do
                    p:=f[p];
                if s[p][i]=0
                    then f[q[t]]:=root
                    else begin
                    if d[s[p][i]]=1
                        then d[q[t]]:=1;
                    f[q[t]]:=s[p][i];
                    end;
                end
                else begin
                p:=f[q[h]];
                while (p<>root)and(s[p][i]=0) do
                    p:=f[p];
                if s[p][i]=0
                    then s[q[h]][i]:=root
                    else s[q[h]][i]:=s[p][i];
                end;
        inc(h);
        end;
    m:=0;
    while not eoln do
        begin
        inc(m);
        read(c[m]);
        end;
    readln;
    fillchar(opt,sizeof(opt),$3F);
    opt[0][1]:=0;
    for i:=0 to m-1 do
        for j:=1 to tt do
            for k:=1 to 4 do
                if (d[j]=0)and(d[s[j][k]]=0)
                    then begin
                    t:=i+1;
                    if k=tr[c[t]]
                        then temp:=opt[i][j]
                        else temp:=opt[i][j]+1;
                    if temp<opt[t][s[j][k]]
                        then opt[t][s[j][k]]:=temp;
                    end;
    ans:=oo;
    for j:=1 to tt do
        if opt[m][j]<ans
            then ans:=opt[m][j];
    write('Case ',ca,': ');
    if ans>maxn
        then writeln(-1)
        else writeln(ans);
    readln(n);
    end;
close(input); close(output);
end.

Rdna2 (Pku 3691, DP that falls back along fail pointers)

const    maxn=1000;
    oo=$3F3F3F3F;
var    tr:array['A'..'Z']of longint;
    opt:array[0..maxn,1..maxn]of longint;
    s:array[1..maxn,1..4]of longint;
    f,q:array[1..maxn]of longint;
    d:array[1..maxn]of boolean;
    c:array[1..maxn]of char;
    root,p,n,m,i,j,k,h,t,tt,ans,temp,ca:longint;
    ch:char;
procedure allot(var x:longint);
var    i:longint;
begin
inc(tt); x:=tt;
d[x]:=false; f[x]:=0;
for i:=1 to 4 do
    s[x][i]:=0;
end;
begin
assign(input,'RDNA.in'); reset(input);
assign(output,'RDNA2.out'); rewrite(output);
tr['A']:=1; tr['C']:=2;
tr['G']:=3; tr['T']:=4;
readln(n);
while n<>0 do
    begin
    tt:=0;
    inc(ca);
    allot(root);
    for i:=1 to n do
        begin
        p:=root;
        while not eoln do
            begin
            read(ch);
            k:=tr[ch];
            if s[p][k]=0
                then allot(s[p][k]);
            p:=s[p][k];
            end;
        d[p]:=true;
        readln;
        end;
    h:=1; t:=0;
    f[root]:=root;
    for i:=1 to 4 do
        if s[root][i]<>0
            then begin
            inc(t);
            q[t]:=s[root][i];
            f[q[t]]:=root;
            end;
    while h<=t do
        begin
        for i:=1 to 4 do
            if s[q[h]][i]<>0
                then begin
                p:=f[q[h]];
                while (p<>root)and(s[p][i]=0) do
                    p:=f[p];
                inc(t); q[t]:=s[q[h]][i];
                if s[p][i]=0
                    then f[q[t]]:=root
                    else f[q[t]]:=s[p][i];
                if d[f[q[t]]]
                    then d[q[t]]:=true;
                end;
        inc(h);
        end;
    m:=0;
    while not eoln do
        begin
        inc(m);
        read(c[m]);
        end;
    readln;
    fillchar(opt,sizeof(opt),$3F);
    opt[0][root]:=0;
    for i:=0 to m-1 do
        for j:=1 to tt do
            for k:=1 to 4 do
                if not d[j]
                    then begin
                    p:=j;
                    while (p<>root)and(s[p][k]=0) do
                        p:=f[p];
                    if s[p][k]=0
                        then p:=root
                        else p:=s[p][k];
                    if d[p] then continue;
                    t:=i+1;
                    h:=opt[i][j]+1;
                    if k=tr[c[t]]
                        then if opt[t][p]>opt[i][j]
                            then opt[t][p]:=opt[i][j]
                            else
                        else if opt[t][p]>h
                            then opt[t][p]:=h;
                    end;
    ans:=oo;
    for i:=1 to tt do
        if ans>opt[m][i]
            then ans:=opt[m][i];
    write('Case ',ca,': ');
    if ans=oo
        then writeln(-1)
        else writeln(ans);
    readln(n);
    end;
close(input); close(output);
end.

Censored! (Pku 1625)

const    maxm=50;
    maxk=50;
    maxn=101;
    maxl=16;
    base=1000000;
    maxc=char(255);
var    tr:array['!'..maxc]of longint;
    n,m,h,t,tt,root,p,i,j,k:longint;
    opt:array[0..maxm,1..maxn,1..maxl]of longint;
    l:array[0..maxm,1..maxn]of longint;
    temp:array[1..maxl]of longint;
    s:array[1..maxn,1..maxk]of longint;
    f,q:array[1..maxn]of longint;
    d:array[1..maxn]of boolean;
    ch:char;
procedure allot(var x:longint);
var    i:longint;
begin
inc(tt); x:=tt;
d[x]:=false; f[x]:=0;
for i:=1 to n do
    s[x][i]:=0;
end;
procedure plus(x,y,a,b:longint);
var    i,m:longint;
begin
fillchar(temp,sizeof(temp),0);
if l[x][y]<l[a][b]
    then m:=l[a][b]
    else m:=l[x][y];
for i:=1 to m do
    begin
    temp[i]:=temp[i]+opt[x][y][i]+opt[a][b][i];
    temp[i+1]:=temp[i+1]+temp[i] div base;
    temp[i]:=temp[i] mod base;
    end;
if temp[m+1]=0
    then l[x][y]:=m
    else l[x][y]:=m+1;
for i:=1 to l[x][y] do
    opt[x][y][i]:=temp[i];
end;
begin
assign(input,'Censor.in'); reset(input);
assign(output,'Censor.out'); rewrite(output);
readln(n,m,t);
for i:=1 to n do
    begin
    read(ch);
    tr[ch]:=i;
    end;
readln;
tt:=0;
allot(root);
for i:=1 to t do
    begin
    p:=root;
    while not eoln do
        begin
        read(ch);
        k:=tr[ch];
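        { a character outside the alphabet means this word can never }
        { occur; p:=0 marks it so it is neither extended nor marked }
        if k=0
            then p:=0;
        if p<>0
            then begin
            if s[p][k]=0
                then allot(s[p][k]);
            p:=s[p][k];
            end;
        end;
    if p<>0
        then d[p]:=true;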
    readln;
    end;
h:=1; t:=0;
f[root]:=root;
for i:=1 to n do
    if s[root][i]<>0
        then begin
        inc(t);
        q[t]:=s[root][i];
        f[q[t]]:=root;
        end;
while h<=t do
    begin
    for i:=1 to n do
        if s[q[h]][i]<>0
            then begin
            inc(t);
            q[t]:=s[q[h]][i];
            p:=f[q[h]];
            while (p<>root)and(s[p][i]=0) do
                p:=f[p];
            if s[p][i]=0
                then f[q[t]]:=root
                else f[q[t]]:=s[p][i];
            if d[f[q[t]]]
                then d[q[t]]:=true;
            end;
    inc(h);
    end;
l[0][root]:=1;
opt[0][root][1]:=1;
for i:=0 to m-1 do
    for j:=1 to tt do
        for k:=1 to n do
            if not d[j]
                then begin
                p:=j;
                while (p<>root)and(s[p][k]=0) do
                    p:=f[p];
                if s[p][k]=0
                    then p:=root
                    else p:=s[p][k];
                if d[p] then continue;
                plus(i+1,p,i,j);
                end;
opt[0][1][1]:=0;
for i:=1 to tt do
    plus(0,1,m,i);
t:=l[0][1];
write(opt[0][1][t]);
dec(t);
for i:=t downto 1 do
    begin
    if opt[0][1][i]<100000 then write(0);
    if opt[0][1][i]<10000 then write(0);
    if opt[0][1][i]<1000 then write(0);
    if opt[0][1][i]<100 then write(0);
    if opt[0][1][i]<10 then write(0);
    write(opt[0][1][i]);
    end;
writeln;
close(input); close(output);
end.

posted on 2010-12-07 20:53 Master_Chivu
