ABAP Study Notes (22): Using Regular Expressions

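ABAP Regular Expressions

ABAP supports regular expressions.

Regular expressions are supported in:

1. The FIND and REPLACE statements;

2. Built-in functions: count, count_xxx, contains, find, find_xxx, match, matches, replace, substring, substring_xxx;

3. Classes: CL_ABAP_REGEX and CL_ABAP_MATCHER.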

 

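1.     Regular expression syntax rules

1.1 Single Character Patterns

Single ordinary characters: A-Z, 0-9, and so on, plus special characters that a backslash (\) escapes into ordinary characters;

Special characters: ., [, ], -, ^ act as operators; - and ^ are special only inside [].

Example:

"1.Single Character Patterns
  "Examples:
  "regex:A  string:a  result: no match (matching is case-sensitive by default)
  "regex:AB string:A  result: no match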
  IF cl_abap_matcher=>matches( pattern = 'A' text = 'A' ) = abap_true.
    WRITE:/ '1.true'.
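  ENDIF.

  "Special operator characters: . [ ] - ^
  ". stands for any single character;
  "\ a backslash turns a special character into an ordinary one;
  "\ combined with certain letters stands for a set of characters (cannot be used inside []):
  "1.\C: any single character;
  "2.\d: any digit;
  "3.\D: any non-digit;
  "4.\l: any lowercase letter;
  "5.\L: any non-lowercase character;
  "6.\s: a blank (whitespace) character;
  "7.\S: a non-blank character;
  "8.\u: any uppercase letter;
  "9.\U: any non-uppercase character;
  "10.\w: any alphanumeric character or underscore;
  "11.\W: any character that is neither alphanumeric nor underscore;

  "[] defines a character set; the match succeeds if the character is any one member of the set;
  "[^x] negates the set; the match succeeds if the character is not a member of the set;
  "[x-y] defines a range inside a set, e.g. A-Z, a-z, 0-9;
  "Predefined character classes, written inside a set, e.g. [[:alnum:]]
  "1.[:alnum:] all alphanumeric characters;
  "2.[:alpha:] all letters;
  "3.[:digit:] all digits;
  "4.[:blank:] blank and horizontal tab;
  "5.[:cntrl:] all control characters;
  "6.[:graph:] all displayable characters except blank and horizontal tab;
  "7.[:lower:] all lowercase letters;
  "8.[:print:] all displayable characters (union of [:graph:] and [:blank:]);
  "9.[:punct:] all punctuation characters;
  "10.[:space:] all blank characters, tabs, and carriage feeds;
  "11.[:unicode:] all characters with a code above 255 (Unicode systems only);
  "12.[:upper:] all uppercase letters;
  "13.[:word:] all alphanumeric characters plus the underscore _;
  "14.[:xdigit:] all hexadecimal digits ("0"-"9", "A"-"F", and "a"-"f");

  "Examples:
  "regex:\. string:.  result: match
  "regex:\C string:A  result: match
  "regex:.. string:AB result: match
  "regex:[ABC]  string:A  result: match
  "regex:[AB][CD] string:AD result: match
  "regex:[^A-Z] string:1 result: match
  "regex:[A-Z-] string:- result: match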
  IF cl_abap_matcher=>matches( pattern = '[A-Z-]' text = 'A' ) = abap_true.
    WRITE:/ '2.true'.
  ENDIF.
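
The predefined classes listed above only take effect inside a set, which is why the pattern ends up with doubled brackets. A minimal sketch, assuming the built-in function matches (covered in section 2.2) and hypothetical variable names lv_hex and lv_year:

```abap
  DATA lv_hex  TYPE string.
  DATA lv_year TYPE string.
  lv_hex  = `1A2b3C`.
  lv_year = `2020`.
  "[[:xdigit:]] is the class [:xdigit:] placed inside a set [ ]
  IF matches( val = lv_hex regex = `[[:xdigit:]]+` ).
    WRITE:/ 'hex digits only'.
  ENDIF.
  "\d and [[:digit:]] describe the same characters
  IF matches( val = lv_year regex = `\d+` ) AND
     matches( val = lv_year regex = `[[:digit:]]+` ).
    WRITE:/ 'digits only'.
  ENDIF.
```

The + operator used here is explained in section 1.2; without it, each pattern would only match a single character.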

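1.2 Character string patterns

       Single-character patterns are concatenated and combined with operators to match whole character strings.

       Special characters: {, }, *, +, ?, (, ), |, \

Example:

 "2.Character string patterns
  "Examples:
  "regex:h[ae]llo  string:hello result: match;
  "regex:h[ae]llo  string:hallo result: match;
  IF cl_abap_matcher=>matches( pattern = 'h[ae]llo' text = 'hallo' ) = abap_true.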
    WRITE:/ '3.true'.
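  ENDIF.

  "Special characters: { } * + ? ( ) | \
  "x{n}: x occurs exactly n times;
  "x{n,m}: x occurs n to m times;
  "x*: x occurs {0,} times;
  "x+: x occurs {1,} times;
  "x?: x occurs {0,1} times;
  "a|b: matches a or b;
  "(): subgroup whose match is stored in a register
  "(?:xxx): subgroup without a register (xxx is grouped but not captured)
  "\1, \2, ...: back-references to the registered subgroups, numbered left to right
  "\Qxxx\E: special characters between \Q and \E are treated as ordinary characters
  "Examples:
  "regex:hi{2}o string:hiio result: match
  "regex:hi{1,3}o string:hiiio result: match
  "regex:hi?o  string:ho result: match
  "regex:hi*o    string:ho  result: match
  "regex:hi+o    string:hio  result: match
  "regex:.{0,4}  string: any string of 0 to 4 characters  result: match
  "regex:a|bb|c   string:bb    result: match
  "regex:h(a|b)o string:hao   result: match
  "regex:(a|b)(?:ac)  string:bac  result: match
  "regex:(").*\1  string:"hi"  result: match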
  IF cl_abap_matcher=>matches( pattern = '(a|b)(?:ac)' text = 'bac' ) = abap_true.
    WRITE:/ '4.true'.
  ENDIF.
  IF cl_abap_matcher=>matches( pattern = '(").*\1' text = '"hi"' ) = abap_true.
    WRITE:/ '5.true'.
  ENDIF.

  DATA:TEXT type STRING.
  DATA:result_tab TYPE match_result_tab.
  DATA:wa_result_tab TYPE match_result.
  text = 'aaaaaabaaaaaaacaaaa'.
  FIND ALL OCCURRENCES OF REGEX '(a+)(a)' IN text RESULTS result_tab.
  WRITE:/ text.
  LOOP AT result_tab INTO wa_result_tab.
    WRITE:/ wa_result_tab-line,wa_result_tab-offset,wa_result_tab-length.
  ENDLOOP.

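1.3 Search Pattern

       Anchors for the start and end of lines, character strings, and words, plus preview conditions and the cut operator.

Example:

 "3.Search Pattern
  "Special characters: ^ $ \ ( ) = !
  "Example 1: Start and end of a line
  "^ and $ anchor the start and the end of every line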
  text = |Line1\nLine2\nLine3|.
  FIND ALL OCCURRENCES OF REGEX '^'
     IN text RESULTS result_tab.
  WRITE:/ text.
  LOOP AT result_tab INTO wa_result_tab.
    WRITE:/ wa_result_tab-line,wa_result_tab-offset,wa_result_tab-length.
  ENDLOOP.
  FIND ALL OCCURRENCES OF REGEX '$'
     IN text RESULTS result_tab.
  WRITE:/ text.
  LOOP AT result_tab INTO wa_result_tab.
    WRITE:/ wa_result_tab-line,wa_result_tab-offset,wa_result_tab-length.
  ENDLOOP.

  "Example 2: Start and end of a character string
  "\A and \z anchor the start and the end of the whole character string
  DATA:t_text(10) TYPE c.
  DATA:t_text_tab LIKE TABLE OF text.
  APPEND '     Smile' TO t_text_tab.
  APPEND '     Smile' TO t_text_tab.
  APPEND '     Smile' TO t_text_tab.
  APPEND '     Smile' TO t_text_tab.
  APPEND '     Smile' TO t_text_tab.
  APPEND '     Smile' TO t_text_tab.
  FIND ALL OCCURRENCES OF regex '\A(?:Smile)|(?:Smile)\z'
       IN TABLE t_text_tab RESULTS result_tab.
  WRITE:/ 'Smile matches:'.
  LOOP AT result_tab INTO wa_result_tab.
    WRITE:/ wa_result_tab-line,wa_result_tab-offset,wa_result_tab-length.
  ENDLOOP.

  "Example 3
  "\z matches only at the very end of the string; \Z also matches before line breaks at the end of the string
  text = |... this is the end\n\n\n|.
  FIND REGEX 'end\z' IN text.
  IF sy-subrc <> 0.
    WRITE  / `There's no end.`.
  ENDIF.
  FIND  REGEX 'end\Z' IN text.
  IF sy-subrc = 0.
    WRITE / `The end is near the end.`.
  ENDIF.

  "Example 4: Start and End of Word
  "\< and \> match the start and the end of a word
  "\b matches at either the start or the end of a word
  "Find words that start with s
  text = `Sometimes snow seems so soft.`.
  FIND ALL OCCURRENCES OF regex '\<s'
       IN text IGNORING CASE
       RESULTS result_tab.
  WRITE:/ 'Starts with s:',text.
  LOOP AT result_tab INTO wa_result_tab.
    WRITE:/ wa_result_tab-line,wa_result_tab-offset,wa_result_tab-length.
  ENDLOOP.
  "Find words that end with s (an s followed by a word boundary)
  FIND ALL OCCURRENCES OF regex 's\b'
       IN text IGNORING CASE
       RESULTS result_tab.
  WRITE:/ 'Ends with s:',text.
  LOOP AT result_tab INTO wa_result_tab.
    WRITE:/ wa_result_tab-line,wa_result_tab-offset,wa_result_tab-length.
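  ENDLOOP.

  "Example 5: Preview Condition
  "The previewed text is checked but is not part of the match
  "(?=x): the match succeeds only if x follows
  "(?!x): the match succeeds only if x does not follow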
  text = `Shalalala!`.
  FIND ALL OCCURRENCES OF REGEX '(?:la)(?=!)'
       IN text RESULTS result_tab.
  WRITE:/ text.
  "Only the last 'la' is found; the '!' is not part of the match
  LOOP AT result_tab INTO wa_result_tab.
    WRITE:/ wa_result_tab-line,wa_result_tab-offset,wa_result_tab-length.
  ENDLOOP.

  "Example 6: Cut operator
  "(?>x): once x has matched, the regex does not backtrack into it
  DATA:s_text TYPE string.
  DATA:moff TYPE i.
  DATA:mlen TYPE i.
  s_text = `xxaabbaaaaxx`.

  "Without the cut operator
  FIND REGEX 'a+b+|[ab]+' IN s_text
    MATCH OFFSET moff
    MATCH LENGTH mlen.
  WRITE:/ s_text.
  IF sy-subrc = 0.
    WRITE:/ moff.
    WRITE:/ mlen.
    WRITE:/ s_text+moff(mlen).
  ENDIF.

  "With the cut operator
  FIND REGEX '(?>a+b+|[ab]+)' IN s_text
    MATCH OFFSET moff
    MATCH LENGTH mlen.
  WRITE:/ s_text.
  IF sy-subrc = 0.
    WRITE:/ moff.
    WRITE:/ mlen.
    WRITE:/ s_text+moff(mlen).
  ENDIF.

  "The cut keeps a+ from giving back a character, so the trailing a cannot match
  FIND REGEX '(?>a+|a)a' IN s_text
    MATCH OFFSET moff
    MATCH LENGTH mlen.
  WRITE:/ s_text.
  IF sy-subrc <> 0.
    WRITE:/ moff.
    WRITE:/ mlen.
    WRITE:/ 'Nothing found'.
  ENDIF.
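
Example 5 lists the negative preview condition (?!x) without showing it in use. A minimal sketch that reuses the string from that example; the variable names lv_song, lt_la, and ls_la are hypothetical:

```abap
  DATA lv_song TYPE string.
  DATA lt_la   TYPE match_result_tab.
  DATA ls_la   TYPE match_result.
  lv_song = `Shalalala!`.
  "(?!!) is a negative preview condition for a literal '!':
  "'la' is found only where no '!' follows
  FIND ALL OCCURRENCES OF REGEX 'la(?!!)' IN lv_song RESULTS lt_la.
  LOOP AT lt_la INTO ls_la.
    WRITE:/ ls_la-offset, ls_la-length.
  ENDLOOP.
```

This should report the two occurrences of 'la' at offsets 3 and 5 and skip the final one at offset 7, which is the one that (?=!) found in Example 5.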

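1.4 Replace Patterns

       Replacement patterns for the REPLACE statement.

Example:

"4.Replace Patterns
  "Replacement text used with the REPLACE statement
  "Special characters: $ & ` '
  "Example 1: Addressing the Full Occurrence ($0 and $& both stand for the whole match; result: Yeah,Yeah!)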
  text = `Yeah!`.
  REPLACE REGEX `\w+` IN text WITH `$0,$&`.
  WRITE:/ text.

  "Example 2: Addressing the Registers of Subgroups
  "Swap the subgroups; the result is `CBA'n'ABC`
  text = `ABC'n'CBA`.
  REPLACE REGEX `(\w+)(\W\w\W)(\w+)` IN text WITH `$3$2$1`.
  WRITE:/ text.

  "Example 3: Addressing the Text Before the Occurrence
  text = `ABC and BCD`.
  REPLACE REGEX 'and' IN text WITH '$0 $`'.
  "Result: ABC and ABC  BCD
  WRITE:/ text.
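
Section 1.6 also lists $' as the placeholder for the text after the occurrence, which the examples above do not exercise. A minimal sketch along the lines of Example 3, with a hypothetical variable lv_text and square brackets added only to make the inserted text visible:

```abap
  DATA lv_text TYPE string.
  lv_text = `ABC and BCD`.
  "$' inserts the text that follows the match, here ` BCD`
  REPLACE REGEX `and` IN lv_text WITH `[$']`.
  "Result: ABC [ BCD] BCD
  WRITE:/ lv_text.
```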

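1.5 Simplified Regular Expressions

       A reduced syntax that class CL_ABAP_REGEX offers as an alternative to the full syntax.

Example:

 "5.Simplified Regular Expressions
  "The simplified syntax is available only through class CL_ABAP_REGEX (simple_regex = abap_true)
  "It does not support +, |, (?=), (?!), (?:);
  "{ } must be written as \{ \}
  "( ) must be written as \( \)
  "Example 1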
  DATA:lo_regex TYPE REF TO cl_abap_regex.
  DATA:t_res   TYPE match_result_tab.
  DATA:wa_res  TYPE match_result.
  "Without the simplified syntax, + means the preceding character occurs {1,} times
  CREATE OBJECT lo_regex
    EXPORTING
      pattern      = 'a+'
      ignore_case  = abap_true "ignore case
      simple_regex = abap_false.
  FIND ALL OCCURRENCES OF REGEX lo_regex IN 'aaa+bbb' RESULTS t_res.
  LOOP AT t_res INTO wa_res.
     WRITE:/ wa_res-line,wa_res-offset,wa_res-length.
  ENDLOOP.
  "With the simplified syntax, + is an ordinary + character
  CREATE OBJECT lo_regex
    EXPORTING
      pattern      = 'a+'
      simple_regex = abap_true.
  FIND ALL OCCURRENCES OF REGEX lo_regex IN 'aaa+bbb' RESULTS t_res.
  LOOP AT t_res INTO wa_res.
     WRITE:/ wa_res-line,wa_res-offset,wa_res-length.
  ENDLOOP.

1.6 Special Characters in Regular Expressions

Example:

  "6.Special Characters in Regular Expressions
  "Special characters in search and replacement patterns
  "\ Escape character for special characters
  "$0, $& Placeholder for the whole found location
  "$1, $2, $3... Placeholder for the registration of subgroups
  "$` Placeholder for the text before the found location
  "$' Placeholder for the text after the found location

2.     Using regular expressions

2.1 The FIND and REPLACE statements

Example:

 "Using the FIND and REPLACE statements
  "FIND
  "Syntax: FIND [{FIRST OCCURRENCE}|{ALL OCCURRENCES} OF] pattern
  "  IN [section_of] dobj
  "  [IN {CHARACTER|BYTE} MODE]
  "  [find_options].
  "pattern = {[SUBSTRING] substring} | {REGEX regex}
  "Searches for a substring or for a match of regex
  "section_of = SECTION [OFFSET off] [LENGTH len] OF
  "Restricts the search to a section of dobj: off is the start offset, len is the section length
  "find_options = [{RESPECTING|IGNORING} CASE]
  "   [MATCH COUNT  mcnt]
  "   { {[MATCH OFFSET moff]
  "   [MATCH LENGTH mlen]}
  "  | [RESULTS result_tab|result_wa] }
  "  [SUBMATCHES s1 s2 ...]
  "mcnt: number of matches; always 1 for FIRST OCCURRENCE when a match is found
  "moff: offset of the last match; for FIRST OCCURRENCE, of the first match
  "mlen: length of the last match; for FIRST OCCURRENCE, of the first match
  "submatches: contents of the registered subgroups
  "Example 1
  DATA:s1   TYPE string.
  DATA:s2   TYPE string.
  text = `Hey hey, my my, Rock and roll can never die`.
  FIND REGEX `(\w+)\W+\1\W+(\w+)\W+\2` IN text
       IGNORING CASE
       MATCH OFFSET moff
       MATCH LENGTH mlen
       SUBMATCHES s1 s2.
  WRITE:/ moff,mlen,s1,s2.

  "REPLACE
  "Syntax:
  "1. REPLACE [{FIRST OCCURRENCE}|{ALL OCCURRENCES} OF] pattern
  "  IN [section_of] dobj WITH new
  "  [IN {CHARACTER|BYTE} MODE]
  "  [replace_options].
  "replace_options = [{RESPECTING|IGNORING} CASE]
  "  [REPLACEMENT COUNT  rcnt]
  "  {{[REPLACEMENT OFFSET roff][REPLACEMENT LENGTH rlen]}
  " |[RESULTS result_tab|result_wa]}

  "2. REPLACE SECTION [OFFSET off] [LENGTH len] OF dobj WITH new
  " [IN {CHARACTER|BYTE} MODE].
  text = 'hello1 world!22'.
  REPLACE
    ALL OCCURRENCES OF
    REGEX '[0-9]'
    IN SECTION OFFSET 0 LENGTH 10 OF text
    WITH '!'.
  WRITE:/ text.
  "Replace a section given by offset and length
  REPLACE SECTION OFFSET 10 LENGTH 5 OF text WITH '!'.
  WRITE:/ text.

2.2 Using built-in functions

       Built-in functions that accept regular expressions include find, count, match, and others.

Example:

 "Using built-in functions
  "find
  "Returns the offset of the match (-1 if nothing is found)
  "Syntax:
  "1.find( val = text  {sub = substring}|{regex = regex}[case = case][off = off] [len = len] [occ = occ] )
  "2.find_end( val = text regex = regex [case = case][off = off] [len = len] [occ = occ] )
  "3.find_any_of( val = text  sub = substring [off = off] [len = len] [occ = occ] )
  "4.find_any_not_of( val = text  sub = substring [off = off] [len = len] [occ = occ] )
  "occ selects which occurrence is returned: positive counts from the left, negative from the right
  "Example
  DATA:mocc TYPE I VALUE 1.
  DATA:result TYPE I.
  text = 'hello world world'.
  "Search the whole string; off + len must not exceed the length of text
  moff = 0.
  mlen = strlen( text ).
  result = find( val = text sub = 'wo' case = abap_true off = moff len = mlen occ = mocc ).
  WRITE:/ text,result,moff,mlen,mocc.

  "count
  "Returns the number of matches
  "Syntax:
  "1.count( val = text  {sub = substring}|{regex = regex} [case = case][off = off] [len = len] )
  "2.count_any_of( val = text  sub = substring [off = off] [len = len] )
  "3.count_any_not_of( val = text  sub = substring [off = off] [len = len] )
  result = count( val = text sub = 'wo' case = abap_true off = moff len = mlen ).
  WRITE:/ text,result,moff,mlen.

  "match
  "Returns the substring that matches the regex
  "Syntax:
  "match( val = text regex = regex [case = case] [occ = occ] )
  DATA:s_result TYPE string.
  s_result = match( val = text regex = 'wor' case = abap_true occ = 1 ).
  WRITE:/ s_result.

  "contains
  "Returns whether the string contains the substring or a regex match (boolean)
  "1.contains( val = text  sub|start|end = substring [case = case][off = off] [len = len] [occ = occ] )
  "2.contains( val = text regex = regex [case = case][off = off] [len = len] [occ = occ] )
  "3.contains_any_of( val = text sub|start|end = substring [off = off] [len = len] [occ = occ] )
  "4.contains_any_not_of( val = text sub|start|end = substring [off = off] [len = len] [occ = occ] )
  "off: offset where the search starts
  "len: length of the section searched, counted from off
  "occ: required number of occurrences; returns false if there are fewer than occ occurrences
  "case: case sensitivity
  text = 'abcdef egg help'.
  IF contains( val = text sub = 'e' case = abap_true off = 0 len = 15 occ = 2 ).
    WRITE:/ 'contains: match found'.
  ENDIF.

  "matches
  "Returns whether the whole string matches the regex (boolean)
  "Syntax: matches( val = text regex = regex [case = case] [off = off] [len = len] ) ...
  "Example:
  text = '33340@334.com'.
  "Validate an e-mail address
  IF matches( val   = text
              regex = `\w+(\.\w+)*@(\w+\.)+((\l|\u){2,4})` ).
    MESSAGE 'Format OK' TYPE 'S'.
  ELSEIF matches(
           val   = text
           regex = `[[:alnum:],!#\$%&'\*\+/=\?\^_``\{\|}~-]+`     &
                  `(\.[[:alnum:],!#\$%&'\*\+/=\?\^_``\{\|}~-]+)*` &
                  `@[[:alnum:]-]+(\.[[:alnum:]-]+)*`              &
                  `\.([[:alpha:]]{2,})` ).
    MESSAGE 'Syntax OK but unusual' TYPE 'S' DISPLAY LIKE 'W'.
  ELSE.
    MESSAGE 'Wrong Format' TYPE 'S' DISPLAY LIKE 'E'.
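  ENDIF.

  "replace
  "1.replace( val = text [off = off] [len = len] with = new )
  "Replaces the section given by off and len
  "If off is set and len = 0, new is inserted at off;
  "If len is set and off = 0, the first len characters are replaced;
  "If off equals the string length and len = 0, new is appended to the string;
  "2.replace( val = text {sub = substring}|{regex = regex} with = new [case = case] [occ = occ] )
  "Replaces occurrences of a substring or of a regex match
  "occ selects which occurrence is replaced; occ = 0 replaces all occurrences
  "Example: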
  text = 'hello world! welcome china!'.
  text = replace( val = text off = 0 len = 5 with = 'hi' ).
  WRITE:/ 'replace:',text.
  "Only the first '!' is replaced here
  text = replace( val = text sub = '!' with = '.' case = abap_true occ = 1 ).
  WRITE:/ 'replace:',text.

  "substring
  "Returns a substring
  "1.substring( val = text [off = off] [len = len] )
  "2.substring_from( val = text {sub = substring}|{regex = regex}[case = case] [occ = occ] [len = len]  )
  "3.substring_after( val = text {sub = substring}|{regex = regex}[case = case] [occ = occ] [len = len] )
  "4.substring_before( val = text {sub = substring}|{regex = regex}[case = case] [occ = occ] [len = len]  )
  "5.substring_to( val = text {sub = substring}|{regex = regex}[case = case] [occ = occ] [len = len]  )
  text = 'ABCDEFGHJKLMN'.
  text = substring( val = text off = 0 len = 10 ).
  WRITE:/ 'substring:',text.
  "Returns ABCDE: the substring starting at the match, limited to len characters
  text = 'ABCDEFGHJKLMN'.
  text = substring_from( val = text sub = 'ABCDEF' case = abap_true occ = 1 len = 5 ).
  WRITE:/ 'substring:',text.
  "Returns DEFGH: the len characters that follow the match
  text = 'ABCDEFGHJKLMN'.
  text = substring_after( val = text sub = 'ABC' case = abap_true occ = 1 len = 5 ).
  WRITE:/ 'substring:',text.
  "Returns DEFGH: the len characters that precede the match
  text = 'ABCDEFGHJKLMN'.
  text = substring_before( val = text sub = 'JKL' case = abap_true occ = 1 len = 5 ).
  WRITE:/ 'substring:',text.
  "Returns GHJKL: the len characters up to and including the match
  text = 'ABCDEFGHJKLMN'.
  text = substring_to( val = text sub = 'JKL' case = abap_true occ = 1 len = 5 ).
  WRITE:/ 'substring:',text. 
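
The note on occ under find says that a negative value counts occurrences from the right, but the examples above only use occ = 1. A minimal sketch with hypothetical variable names (lv_sentence, lv_first, lv_last), using the same sample string as the find example:

```abap
  DATA lv_sentence TYPE string.
  DATA lv_first    TYPE i.
  DATA lv_last     TYPE i.
  lv_sentence = `hello world world`.
  "occ = 1: first occurrence counted from the left
  lv_first = find( val = lv_sentence sub = `wo` occ = 1 ).
  "occ = -1: first occurrence counted from the right
  lv_last  = find( val = lv_sentence sub = `wo` occ = -1 ).
  "Offsets are zero-based: 6 and 12 for this string
  WRITE:/ lv_first, lv_last.
```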

2.3 Using cl_abap_regex and cl_abap_matcher

       Class cl_abap_regex creates a regular expression object; cl_abap_matcher uses it to match, find, and replace.

Example:

 "Using the classes
  "CL_ABAP_REGEX
  "CL_ABAP_MATCHER
  DATA:lo_matcher TYPE REF TO cl_abap_matcher.
  DATA:ls_match TYPE match_result.
  DATA:lv_match TYPE C LENGTH 1.
  "Call the static method matches of cl_abap_matcher directly
  IF cl_abap_matcher=>matches( pattern = 'ABC.*' text = 'ABCDABCE' ) = abap_true.
    "Get the matcher object created by the static call
    lo_matcher = cl_abap_matcher=>get_object( ).
    "Get the match result
    ls_match = lo_matcher->get_match( ).
    "Attributes of cl_abap_matcher
    "text: the character string being searched
    "table: the internal table being searched
    "regex: the regular expression object
    WRITE:/ 'cl_abap_matcher:',lo_matcher->text,ls_match-offset,ls_match-length.
  ENDIF.

  "Create a matcher object, then match
  lo_matcher = cl_abap_matcher=>create( pattern = 'A.*'
    ignore_case = abap_true
    text        = 'ABC' ).
  "Result: 'X' if the whole text matches, blank otherwise
  lv_match = lo_matcher->match( ).
  WRITE:/ 'cl_abap_matcher:',lv_match.

  "Create a cl_abap_regex object for the regular expression
  "and create the matcher from it
  "DATA: lo_regex TYPE REF TO cl_abap_regex.
  CREATE OBJECT lo_regex EXPORTING pattern = '^add.*' ignore_case = abap_true.
  lo_matcher = lo_regex->create_matcher( text = 'addition' ).
  lv_match = lo_matcher->match( ).
  WRITE:/ 'cl_abap_matcher:',lv_match.


  "Create the matcher object with its constructor
  DATA:t_result_tab TYPE MATCH_RESULT_TAB.
  DATA:s_result_tab TYPE MATCH_RESULT.
  CREATE OBJECT lo_regex EXPORTING pattern = 'A'.
  CREATE OBJECT lo_matcher EXPORTING REGEX = lo_regex TEXT = 'ABCDABCD'.
  t_result_tab = lo_matcher->find_all( ).
  LOOP AT t_result_tab INTO s_result_tab.
    WRITE:/ 'find_all:',s_result_tab-offset,s_result_tab-length.
  ENDLOOP.
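
The examples above use match and find_all. As a complement, the following sketch steps through the matches one at a time and reads the registered subgroups. It assumes the cl_abap_matcher methods find_next, get_submatch, and get_offset; the variable names (lo_rx, lo_m, lv_key, lv_val, lv_off) and the sample text are hypothetical:

```abap
  DATA lo_rx  TYPE REF TO cl_abap_regex.
  DATA lo_m   TYPE REF TO cl_abap_matcher.
  DATA lv_key TYPE string.
  DATA lv_val TYPE string.
  DATA lv_off TYPE i.
  CREATE OBJECT lo_rx EXPORTING pattern = '(\w+)=(\d+)'.
  lo_m = lo_rx->create_matcher( text = 'a=1, b=22' ).
  "find_next moves to the next match and returns abap_true while one exists
  WHILE lo_m->find_next( ) = abap_true.
    lv_key = lo_m->get_submatch( 1 ).  "first subgroup: the name
    lv_val = lo_m->get_submatch( 2 ).  "second subgroup: the digits
    lv_off = lo_m->get_offset( ).      "offset of the whole match
    WRITE:/ lv_key, lv_val, lv_off.
  ENDWHILE.
```

Stepping with find_next suits cases where each match is processed before moving on; find_all is simpler when only offsets and lengths are needed.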

 

posted @ 2020-10-16 19:17  渔歌晚唱