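1. Take the following fragment of a TTL document that has been processed with rdf3x as an example:
Note that the continuation lines are indented with exactly two spaces.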
<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363853> <http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite> <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2622>.
<http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.ebi.ac.uk/terms/chembl#BindingSite>;
  <http://www.w3.org/2000/01/rdf-schema#label> "CHEMBL_BS_2659";
  <http://rdf.ebi.ac.uk/terms/chembl#chemblId> "CHEMBL_BS_2659";
  <http://rdf.ebi.ac.uk/terms/chembl#hasTarget> <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965>;
  <http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName> "30S ribosomal protein S1".
<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965> <http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite> <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659> , <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623>.
<http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.ebi.ac.uk/terms/chembl#BindingSite>;
  <http://www.w3.org/2000/01/rdf-schema#label> "CHEMBL_BS_2623";
  <http://rdf.ebi.ac.uk/terms/chembl#chemblId> "CHEMBL_BS_2623";
  <http://rdf.ebi.ac.uk/terms/chembl#hasTarget> <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965>;
  <http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName> "16S/23S ribosomal RNA interface".
<http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.ebi.ac.uk/terms/chembl#BindingSite>;
  <http://www.w3.org/2000/01/rdf-schema#label> "CHEMBL_BS_2624";
  <http://rdf.ebi.ac.uk/terms/chembl#chemblId> "CHEMBL_BS_2624";
  <http://rdf.ebi.ac.uk/terms/chembl#hasTarget> <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022>;
  <http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName> "23S ribosomal RNA".
<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022> <http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite> <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624>.
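2. The regular-expression code, written in Java
The active Pattern line and the two commented-out Pattern lines produce the three different outputs shown further down. Enable exactly one of them at a time.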
package com.jena;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class rdfReader3 {
    public static void main(String[] args) {
        // Compile the pattern once, outside the read loop.
        // (1) Match the contents of every pair of angle brackets.
        Pattern p = Pattern.compile("<([^<>]*)>");
        // (2) Match each subject: "^" anchors the match at the start of the line,
        //     so only a "<...>" in column 0 is found. The "\n*" is harmless but
        //     redundant, because readLine() already strips line terminators.
        // Pattern p = Pattern.compile("^\n*<([^<>]*)>");
        // (3) Match two literal spaces followed by "<...>", where the bracketed
        //     content contains no angle brackets.
        // Pattern p = Pattern.compile("  <([^<>]*)>");

        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = new BufferedReader(
                new FileReader("C:/Users/Don/workspace/Jena/src/com/jena/bindingsite"))) {
            String s;
            while ((s = br.readLine()) != null) {
                Matcher m = p.matcher(s);
                while (m.find()) {
                    // group(1) is the text between "<" and ">"
                    System.out.println(m.group(1));
                }
            }
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}
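To try the three patterns without the file on disk, the sketch below applies them to a few in-memory lines. The class name TtlPatternDemo, the extract helper and the example.org IRIs are made up for illustration and are not part of the original program. Pattern (2) is written here without the redundant "\n*".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TtlPatternDemo {
    // The three patterns discussed above.
    static final Pattern ALL = Pattern.compile("<([^<>]*)>");       // (1) every <...>
    static final Pattern SUBJECT = Pattern.compile("^<([^<>]*)>");  // (2) <...> in column 0
    static final Pattern INDENTED = Pattern.compile("  <([^<>]*)>"); // (3) two spaces, then <...>

    // Hypothetical sample laid out like the TTL fragment: subject in column 0,
    // continuation lines indented with two spaces.
    static final List<String> SAMPLE = List.of(
        "<http://example.org/site/1> <http://example.org/type> <http://example.org/Site>;",
        "  <http://example.org/label> \"site 1\";",
        "  <http://example.org/hasTarget> <http://example.org/target/9>.",
        "<http://example.org/target/9> <http://example.org/hasSite> <http://example.org/site/1>.");

    /** Applies p to each line separately and collects capture group 1 of every match. */
    static List<String> extract(Pattern p, List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = p.matcher(line);
            while (m.find()) {
                out.add(m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extract(ALL, SAMPLE).size()); // number of bracketed IRIs
        System.out.println(extract(SUBJECT, SAMPLE));    // subjects only
        System.out.println(extract(INDENTED, SAMPLE));   // predicates on indented lines
    }
}
```

Returning a list instead of printing inside the loop keeps the matching logic separate from the output, which makes each pattern easy to check on its own.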
(1) Match the contents of every pair of angle brackets
Output:
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363853
http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2622
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://rdf.ebi.ac.uk/terms/chembl#BindingSite
http://www.w3.org/2000/01/rdf-schema#label
http://rdf.ebi.ac.uk/terms/chembl#chemblId
http://rdf.ebi.ac.uk/terms/chembl#hasTarget
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965
http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965
http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://rdf.ebi.ac.uk/terms/chembl#BindingSite
http://www.w3.org/2000/01/rdf-schema#label
http://rdf.ebi.ac.uk/terms/chembl#chemblId
http://rdf.ebi.ac.uk/terms/chembl#hasTarget
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965
http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://rdf.ebi.ac.uk/terms/chembl#BindingSite
http://www.w3.org/2000/01/rdf-schema#label
http://rdf.ebi.ac.uk/terms/chembl#chemblId
http://rdf.ebi.ac.uk/terms/chembl#hasTarget
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022
http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022
http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624
(2) Match each subject, i.e. the contents of the first pair of angle brackets on every line that does not start with two spaces
Output:
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363853
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022
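The subject pattern relies on "^" matching at the start of each string handed to the matcher, which works because the file is read line by line. If the whole document were held in one String instead, a possible variant is to compile the pattern with Pattern.MULTILINE so that "^" also matches after every line terminator. The sketch below assumes that setup. The class name SubjectScan and the short sample IRIs are hypothetical, and unlike the original loop it drops repeated subjects by collecting into a LinkedHashSet.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubjectScan {
    // With MULTILINE, "^" matches at the start of the input and after each line terminator.
    static final Pattern SUBJECT = Pattern.compile("^<([^<>]*)>", Pattern.MULTILINE);

    /** Returns the distinct subjects of a whole TTL text, in order of first appearance. */
    static Set<String> subjects(String text) {
        Set<String> out = new LinkedHashSet<>();
        Matcher m = SUBJECT.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical three-statement document; subject "a" appears twice.
        String text = "<a> <p> <b>;\n  <q> \"x\".\n<c> <p> <a>.\n<a> <r> <c>.\n";
        System.out.println(subjects(text));
    }
}
```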
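(3) Match two spaces followed by "<...>", where the bracketed content contains no angle brackets. In this file that selects the first IRI (the predicate) on every indented line.
Output: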
http://www.w3.org/2000/01/rdf-schema#label
http://rdf.ebi.ac.uk/terms/chembl#chemblId
http://rdf.ebi.ac.uk/terms/chembl#hasTarget
http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
http://www.w3.org/2000/01/rdf-schema#label
http://rdf.ebi.ac.uk/terms/chembl#chemblId
http://rdf.ebi.ac.uk/terms/chembl#hasTarget
http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
http://www.w3.org/2000/01/rdf-schema#label
http://rdf.ebi.ac.uk/terms/chembl#chemblId
http://rdf.ebi.ac.uk/terms/chembl#hasTarget
http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
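To match data that begins with two spaces, simply type two literal spaces at the front of the pattern. The pattern is not anchored, so it matches wherever two spaces directly precede "<"; in this file that only happens at the start of the indented lines.
Pattern p = Pattern.compile("  <([^<>]*)>");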
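Two literal spaces are easy to lose when code is copied or reformatted. One possible alternative, sketched here as an assumption and not as the original author's approach, spells the indentation as " {2}" and anchors it with "^", so the pattern only fires on lines that start with exactly two spaces followed by "<". The class name IndentedPredicates and the sample lines are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndentedPredicates {
    // "^" pins the match to the line start; " {2}" is exactly two spaces.
    static final Pattern INDENTED = Pattern.compile("^ {2}<([^<>]*)>");

    /** Collects the first bracketed IRI of every line indented by exactly two spaces. */
    static List<String> firstIri(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = INDENTED.matcher(line);
            if (m.find()) { // anchored, so at most one match per line
                out.add(m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "  <p> <o>;",        // two-space indent: matched
            "<s>  <p2> <o>.",    // two spaces in mid-line: ignored because of "^"
            "    <deep> \"x\"."); // four-space indent: ignored
        System.out.println(firstIri(lines));
    }
}
```

The unanchored pattern from the text would also pick up the second and third sample lines, so which version fits depends on whether mid-line double spaces or deeper indentation can occur in the input.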