Regex - Greedy vs Lazy

We need to add the prefix "prefix-" to the value of the <MsgId><Id> element in XML messages, but there may be a newline between <MsgId> and <Id>.

Examples of the XML messages:

        <MsgId>
          <Id>XXX</Id>
          <CreDtTm>2018-03-22T09:05:24.334054Z</CreDtTm>
        </MsgId>
        <Els>
              <Id>BBB</Id>
        </Els>



        <MsgId><Id>XXX</Id>
          <CreDtTm>2018-03-22T09:05:24.334054Z</CreDtTm>
        </MsgId>
        <Els> <Id>BBB</Id></Els>

1. Greedy match

 file_content = re.sub('(<MsgId>.*(\\n)?.*<Id>)', r'\1' + PARALLEL_PREFIX, file_content)

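Quantifiers in a regular expression are greedy by default: each ".*" consumes as many characters as it can ("." stops only at a newline) and gives them back only when the rest of the pattern fails. When another <Id> follows within reach, for example when the whole message is on one line, the regex above therefore matches "<MsgId>...</MsgId>..<Els><Id>" instead of "<MsgId><Id>".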
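The over-match can be reproduced with a small sketch. The single-line message and the value of PARALLEL_PREFIX below are assumptions made for illustration; they are not taken from the real data.

```python
import re

# Assumed value, based on the "prefix-" mentioned in the task description.
PARALLEL_PREFIX = "prefix-"

# Hypothetical message with everything on one line, so a second <Id>
# is within reach of the greedy ".*".
message = "<MsgId><Id>XXX</Id></MsgId><Els><Id>BBB</Id></Els>"

# Same pattern as above, written as a raw string.
greedy = re.compile(r"(<MsgId>.*(\n)?.*<Id>)")

match = greedy.search(message)
print(match.group(1))
# <MsgId><Id>XXX</Id></MsgId><Els><Id>

result = greedy.sub(lambda m: m.group(1) + PARALLEL_PREFIX, message)
print(result)
# <MsgId><Id>XXX</Id></MsgId><Els><Id>prefix-BBB</Id></Els>
```

The prefix ends up on the <Id> inside <Els>, which is not the element we wanted to change.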

2. Lazy match

Appending "?" to a quantifier such as "+", "?" or "*" makes it lazy instead of greedy, so

 file_content = re.sub('(<MsgId>.*?(\\n)??.*?<Id>)', r'\1' + PARALLEL_PREFIX, file_content)

The regex above matches "<MsgId><Id>". In general, a lazy quantifier matches as few characters as possible while still letting the whole pattern match.
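A minimal sketch of the lazy version applied to both layouts follows. The helper name add_prefix, the shortened sample messages and the value of PARALLEL_PREFIX are assumptions for illustration only.

```python
import re

# Assumed value, based on the "prefix-" mentioned in the task description.
PARALLEL_PREFIX = "prefix-"

# Same lazy pattern as above, written as a raw string.
lazy = re.compile(r"(<MsgId>.*?(\n)??.*?<Id>)")


def add_prefix(xml_text):
    """Hypothetical helper: insert the prefix right after the matched <Id>."""
    return lazy.sub(lambda m: m.group(1) + PARALLEL_PREFIX, xml_text)


# Shortened versions of the two layouts: <Id> on the next line, and
# everything on a single line.
multi_line = "<MsgId>\n  <Id>XXX</Id>\n</MsgId>\n<Els>\n  <Id>BBB</Id>\n</Els>"
single_line = "<MsgId><Id>XXX</Id></MsgId><Els><Id>BBB</Id></Els>"

print(add_prefix(multi_line))
print(add_prefix(single_line))
# <MsgId><Id>prefix-XXX</Id></MsgId><Els><Id>BBB</Id></Els>
```

In both cases only the <Id> that directly follows <MsgId> gets the prefix, and the <Id> inside <Els> is left alone.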

posted @ 小张的练习室