Python: batch-deleting the text between two markers in a string

While scraping Douban movie pages a while back, I ran into a problem. Take this cast block (主演 means "starring"):

<span class="actor"><span class='pl'>主演</span>: <span class='attrs'><a href="/subject_search?search_text=Arash%20Marandi" rel="v:starring">Arash Marandi</a> / <a href="/subject_search?search_text=Flor%20Eduarda%20Gurrola" rel="v:starring">Flor Eduarda Gurrola</a> / <a href="/celebrity/1352068/" rel="v:starring">路易斯·阿伯提</a> / <a href="/celebrity/1291820/" rel="v:starring">埃利希奥·梅兰德斯</a> / <a href="/subject_search?search_text=Eduardo%20Mendiz%C3%A1bal" rel="v:starring">Eduardo Mendizábal</a> / <a href="/subject_search?search_text=Edwarda%20Gurrola" rel="v:starring">Edwarda Gurrola</a> / <a href="/subject_search?search_text=Uriel%20Ledesma" rel="v:starring">Uriel Ledesma</a> / <a href="/subject_search?search_text=Ishbel%20Mata" rel="v:starring">Ishbel Mata</a></span></span><br/>

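I wanted to extract every lead actor listed here. The number of actors differs from film to film, so a regex with a fixed number of capture groups can't pick them out one by one. What does work is capturing the whole list in a single group first: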

    import re
    items = re.findall('<span class="actor"><span class=.*?>主演</span>: <span class=.*?><a href=".*?" rel="v:starring">(.*?)</a></span></span><br/>', page, re.S)
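A minimal, self-contained sketch of this capture step, assuming a shortened three-actor sample modeled on the snippet above and kept on one line (the variable names page and pattern are illustrative). Because the pattern has exactly one capture group, re.findall returns a list of plain strings, and the lazy (.*?) runs from the first actor name all the way to the </a> that closes the block:

```python
import re

# Hypothetical shortened sample based on the Douban cast block above,
# assumed to be on a single line.
page = (
    '<span class="actor"><span class=\'pl\'>主演</span>: <span class=\'attrs\'>'
    '<a href="/subject_search?search_text=Arash%20Marandi" rel="v:starring">Arash Marandi</a> / '
    '<a href="/celebrity/1352068/" rel="v:starring">路易斯·阿伯提</a> / '
    '<a href="/subject_search?search_text=Ishbel%20Mata" rel="v:starring">Ishbel Mata</a>'
    '</span></span><br/>'
)

# Same pattern as above: (.*?) stops only at '</a></span></span><br/>',
# so every name (plus the markup between them) lands in one group.
pattern = ('<span class="actor"><span class=.*?>主演</span>: <span class=.*?>'
           '<a href=".*?" rel="v:starring">(.*?)</a></span></span><br/>')
items = re.findall(pattern, page, re.S)

print(len(items))  # 1
print(items[0])    # names still separated by '</a> / <a ...>' runs
```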

Then replace the markup between consecutive names with a slash:

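    # The pattern has one capture group, so each element of items is a plain string
    items1 = items[0]
    # Replace every '</a> / <a ...>' run between two names with '/'
    print(re.sub('</a>.*?>', '/', items1))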

The result:

Arash Marandi/Flor Eduarda Gurrola/路易斯·阿伯提/埃利希奥·梅兰德斯/Eduardo Mendizábal/Edwarda Gurrola/Uriel Ledesma/Ishbel Mata
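The substitution works because '</a>.*?>' is lazy: starting at each closing </a>, it stops at the very next '>', which is the end of the following <a ...> tag. A sketch on a hypothetical captured string shaped like items[0] for a three-actor sample; the final split on '/' is an addition, not part of the original code:

```python
import re

# Hypothetical value of items[0]: names separated by '</a> / <a ...>' runs
captured = (
    'Arash Marandi</a> / '
    '<a href="/celebrity/1352068/" rel="v:starring">路易斯·阿伯提</a> / '
    '<a href="/subject_search?search_text=Ishbel%20Mata" rel="v:starring">Ishbel Mata'
)

# Each match runs from '</a>' to the next '>', i.e. '</a> / <a href=... rel="v:starring">'
actors = re.sub('</a>.*?>', '/', captured)
print(actors)  # Arash Marandi/路易斯·阿伯提/Ishbel Mata

# Optional: turn the slash-joined string into a list of names
actor_list = actors.split('/')
print(actor_list)
```

Note that re.sub is called without re.S, so '.' does not match a newline; if a page breaks a tag across lines, passing flags=re.S keeps the substitution working.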

 

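Before arriving at this, I had tried another route and written my own function, which collects every fragment between '</a>' and the next '">':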

    import re

    def delete_word(code):
        # Collect every fragment between '</a>' and the next '">'
        temp = re.findall('</a>(.*?)">', code, re.S)
        return temp
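For reference, a sketch of what delete_word returns on a hypothetical three-actor string shaped like the captured group above: the separator pieces, not the names. Deleting them leaves '</a>">' between consecutive names, which is presumably why the attempts below follow up with .replace("</a>\">", "/"):

```python
import re

def delete_word(code):
    # Fragments between each '</a>' and the next '">'
    return re.findall('</a>(.*?)">', code, re.S)

# Hypothetical captured string (same shape as items[0] above)
captured = (
    'Arash Marandi</a> / '
    '<a href="/celebrity/1352068/" rel="v:starring">路易斯·阿伯提</a> / '
    '<a href="/subject_search?search_text=Ishbel%20Mata" rel="v:starring">Ishbel Mata'
)

fragments = delete_word(captured)
print(len(fragments))  # 2
for frag in fragments:
    print(repr(frag))
```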

  

    # for i in range(len(delete_word(items[0][2]))):
    #     print(items1.replace(delete_word(items[0][2])[i], ""))
    # print(items1)
    # 导演 = director, 编剧 = screenwriter
    # print("导演:" + items[0][1].replace(str(delete_word(items[0][1])), "").replace("</a>\">", "/"))
    # print("编剧:" + items[0][2].replace(str(delete_word(items[0][2])), "").replace("</a>\">", "/"))
    # print(len(items))

That route didn't work, though. The takeaway: when one road is a dead end, take another.
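In hindsight, the commented-out attempts may have failed for mechanical reasons rather than because the idea was wrong. This is a guess, since the regex that produced items[0][1] and items[0][2] isn't shown. str(delete_word(...)) turns the list into its printed form, something like "[' / <a ...', ...]", which never occurs in the page, so that .replace() changes nothing; and the loop applies each replacement to the untouched items1, so each printed line has at most one fragment removed. A sketch of a cumulative version on a hypothetical sample:

```python
import re

def delete_word(code):
    # Fragments between each '</a>' and the next '">'
    return re.findall('</a>(.*?)">', code, re.S)

# Hypothetical captured string (same shape as items[0] above)
captured = (
    'Arash Marandi</a> / '
    '<a href="/celebrity/1352068/" rel="v:starring">路易斯·阿伯提</a> / '
    '<a href="/subject_search?search_text=Ishbel%20Mata" rel="v:starring">Ishbel Mata'
)

# Pitfall: the list's string form never appears in the text, so nothing changes
unchanged = captured.replace(str(delete_word(captured)), "")
print(unchanged == captured)  # True

# Cumulative version: each replace works on the result of the previous one
text = captured
for frag in delete_word(captured):
    text = text.replace(frag, "")
# Only '</a>">' is left between names; turn it into a slash
text = text.replace('</a>">', '/')
print(text)  # Arash Marandi/路易斯·阿伯提/Ishbel Mata
```

Even when fixed, this takes two passes plus a loop, while the single re.sub call handles any number of actors in one step.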

posted @ 2020-04-17 16:33  Double晨