Basic Usage of Jsoup for Web Crawling

Add the POM dependency

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.7.3</version>
</dependency>

Java code example

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class JsoupDemo {

    public static void main(String[] args) {
        // Tianyancha (earlier experiments, left disabled)
        // String result1 = HttpRequest.get("http://open.api.tianyancha.com/services/open/cb/ic/2.0?keyword=XXXX公司").header("Authorization", "").execute().body();
        // System.err.println(result1);
        /*Document doc = Jsoup.connect("https://www.tianyancha.com/search?key=北京百度网讯科技有限公司").timeout(3000).get();
        System.err.println(doc.title());
        Elements newsHeadlines = doc.select(".cate_menu_lk");
        System.err.println(newsHeadlines.size());
        for (Element headline : newsHeadlines) {
          System.err.println(
            headline.text());
        }
        */
        try {
            // First results page of a so.com search for "java" (5-second timeout)
            Document firstPage = Jsoup.connect("https://www.so.com/s?ie=utf-8&fr=so.com&src=home_so.com&ssid=&q=java")
                    .timeout(5000)
                    .get();
            printResults(firstPage);

            // Result pages 2 through 10, selected with the pn query parameter
            for (int i = 2; i <= 10; i++) {
                Document page = Jsoup.connect("https://www.so.com/s?q=java&pn=" + i + "&src=srp_paging&fr=so.com")
                        .timeout(5000)
                        .get();
                printResults(page);
            }
        } catch (IOException e) {
            // Network failure, timeout, or non-OK HTTP status
            e.printStackTrace();
        }
    }

    // Prints the title text and the link target of every search result on the page.
    private static void printResults(Document document) {
        Elements links = document.select(".res-title a");
        links.forEach(link -> {
            System.out.println(link.text());
            System.out.println(link.attr("href"));
        });
        System.out.println("---------------------");
    }
}
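
The paging loop above builds each URL by string concatenation. That works for an ASCII keyword such as java, but a keyword with spaces or non-ASCII characters (like the company name in the commented-out Tianyancha search) should be URL-encoded first. Below is a minimal sketch that uses only the JDK. The class name SearchUrls and the methods encode and pageUrl are hypothetical helpers, not part of Jsoup, and the query parameters q, pn, src and fr are copied from the so.com URLs in the example rather than taken from any documented API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Jsoup: builds so.com result-page URLs.
class SearchUrls {

    // Encodes a query value as UTF-8 form data (a space becomes '+').
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // URL of result page `page` for `keyword`, mirroring the parameters used in the loop.
    static String pageUrl(String keyword, int page) {
        return "https://www.so.com/s?q=" + encode(keyword)
                + "&pn=" + page + "&src=srp_paging&fr=so.com";
    }

    public static void main(String[] args) {
        // Print the URLs for pages 2 to 4 of a two-word search.
        for (int page = 2; page <= 4; page++) {
            System.out.println(pageUrl("java jsoup", page));
        }
    }
}
```

The returned string can be passed to Jsoup.connect(...) in place of the concatenated URL.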

  

Posted by 韩帅
