Summary: There are many programs that can be used to extract bulk information from a website, including browser extensions and some web services. Depending on your browser, tools like Readability (which helps extract text from a page) or DownThemAll (which allows you to download many files at once) will help you...
posted @ 2014-03-06 15:23 愚人_同乐 Views (202) Comments (0) Recommendations (0)
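Summary: http://www.cnblogs.com/longwu/archive/2011/12/24/2300110.html 1) The first essential step in learning web data scraping is learning Java regular expressions (Regex). Java's regular expression classes live in the java.util.regex package, which contains three classes: Pattern, Matcher, and PatternSyntaxException. 1.1 A Pattern object is the compiled form of a regular expression. It has no public constructors; to create a Pattern object, we pass the regular expression to its public static method compile. 1.2 Matcher is a regex engine...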
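The Pattern/Matcher workflow summarized above can be sketched as follows. This is a minimal illustration, not code from the linked article. The class name `RegexScrapeDemo`, the helper methods `extractLinks` and `isValidRegex`, and the anchor-tag pattern are all hypothetical. The pattern only handles simple, double-quoted `href` attributes and is no substitute for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexScrapeDemo {
    // Pattern has no public constructor; the static factory compile() builds it.
    // Hypothetical pattern: captures the value of a double-quoted href in an <a> tag,
    // ignoring case so that <A HREF="..."> also matches.
    private static final Pattern LINK =
            Pattern.compile("<a\\s+[^>]*?href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns every href value found in the input text.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html); // a Matcher applies the compiled Pattern to one input
        while (m.find()) {              // find() advances to the next match, if any
            links.add(m.group(1));      // group(1) is the first capturing group: the URL
        }
        return links;
    }

    // Hypothetical helper: an invalid regex makes compile() throw PatternSyntaxException.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/a.html\">A</a> "
                + "<A class=\"x\" HREF=\"http://example.com/b\">B</A></p>";
        System.out.println(extractLinks(html));       // [/a.html, http://example.com/b]
        System.out.println(isValidRegex("[unclosed")); // false
    }
}
```

The Pattern is compiled once into a `static final` field because Pattern objects are immutable and safe to share across threads. Matcher objects are not thread-safe, so `extractLinks` creates a fresh one on each call.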
posted @ 2014-03-06 13:38 愚人_同乐 Views (169) Comments (0) Recommendations (0)