App Phase 2 Sprint, Day 9: jsoup 1

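Today I started writing the crawler. From what I have read online, a crawler can be written with jsoup or with Python. Today I am testing the waters with jsoup first, so let's get started.

As before, we write the layout files first.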

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    android:orientation="vertical"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <!-- wrap_content heights so that all five views stay visible when stacked vertically -->
    <TextView
        android:id="@+id/title"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"/>

    <TextView
        android:id="@+id/image"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"/>

    <TextView
        android:id="@+id/author"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"/>

    <TextView
        android:id="@+id/context"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"/>

    <TextView
        android:id="@+id/articleUrl"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"/>
</LinearLayout>

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <ListView
        android:id="@+id/lv_mytest"
        android:layout_width="match_parent"
        android:layout_height="match_parent"/>

</LinearLayout>

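What comes next? Let's write the scraping code in Java first.

I was worried that adding this to the earlier project might leave that project unable to run, so I created a separate new project and implemented the crawler there.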

package com.example.crawler.Tools;

import android.util.Log;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;

// GetData parses the HTML and collects the results into a list of Article model objects
public class GetData {

    private static final String TAG = "GetData";

    /**
     * Scrapes the featured articles.
     * @param html raw HTML of the listing page
     * @return the articles parsed from the page
     */
    public static ArrayList<Article> spiderArticle(String html){
        ArrayList<Article> articles = new ArrayList<>();

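        // Jsoup.parse already returns a Document, so no cast is needed.
        Document document = Jsoup.parse(html);
        // [class=...] compares against the whole class attribute value,
        // so these selectors only match while the page keeps exactly this class string.
        Elements elements = document
                .select("ul[class=feed-list-hits feed-list-index]")
                .select("li[class=feed-row-wide J_feed_za feed-haojia]");

        Log.i(TAG, "spiderArticle: elements " + elements.html());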

        for (Element element : elements) {
            String title = element
                    .select("h5[class=feed-block-title has-price]")
                    .text();

            String author = element
                    .select("div[class=z-feed-foot]")
                    .select("span[class=feed-block-extras]")
                    .select("a")
                    .select("span")
                    .text();

            String imgurl = element
                    .select("div[class=z-feed-img]")
                    .select("a")
                    .select("img")
                    .attr("src");

            String context = element
                    .select("div[class=feed-block-descripe]")
                    .text();

            // Same container as imgurl above; the stray trailing space in the selector is removed.
            String articleUrl = element
                    .select("div[class=z-feed-img]")
                    .select("a")
                    .attr("href");

            Article article = new Article(title,author,imgurl,context,articleUrl);
            articles.add(article);
            //Log.e("DATA>>",article.toString());
        }
        return articles;
    }
}
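
The listing above constructs `Article` objects, but the class itself is not shown in this post. Below is a minimal sketch of what such a model could look like. Only the constructor's argument order (title, author, imgurl, context, articleUrl) and the existence of `toString()` come from the code above; the getter names, the `toString()` format, and the `ArticleDemo` class are assumptions for illustration. The package declaration is left out so the snippet compiles on its own.

```java
// Hypothetical sketch of the Article model used by GetData.spiderArticle.
// Getter names and the toString() format are assumptions, not the author's code.
class Article {
    private final String title;
    private final String author;
    private final String imgUrl;
    private final String context;
    private final String articleUrl;

    // Argument order mirrors the call in spiderArticle:
    // new Article(title, author, imgurl, context, articleUrl)
    public Article(String title, String author, String imgUrl,
                   String context, String articleUrl) {
        this.title = title;
        this.author = author;
        this.imgUrl = imgUrl;
        this.context = context;
        this.articleUrl = articleUrl;
    }

    public String getTitle() { return title; }
    public String getAuthor() { return author; }
    public String getImgUrl() { return imgUrl; }
    public String getContext() { return context; }
    public String getArticleUrl() { return articleUrl; }

    @Override
    public String toString() {
        return "Article{title='" + title + "', author='" + author
                + "', imgUrl='" + imgUrl + "', context='" + context
                + "', articleUrl='" + articleUrl + "'}";
    }
}

// Small usage demo with made-up values.
class ArticleDemo {
    public static void main(String[] args) {
        Article article = new Article(
                "Sample title", "Sample author",
                "https://example.com/a.png", "Short description",
                "https://example.com/article/1");
        System.out.println(article);
    }
}
```

Keeping the fields `final` makes each scraped record immutable once it is built, which is convenient when the list is later handed to a list adapter.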

Tomorrow I will continue with fetching the page HTML and wrapping up the scraped article data.
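
For orientation, here is one possible shape of the fetch step, sketched with only the JDK's `HttpURLConnection`. This is not the author's implementation, which is not shown in this post. The class name `HtmlFetcher`, the method names, the timeouts, the `User-Agent` value, and the UTF-8 assumption are all hypothetical. On Android the network call would also have to run off the main thread, and the app needs the INTERNET permission.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: downloads a page and returns its HTML as a String,
// which could then be passed to GetData.spiderArticle(html).
class HtmlFetcher {

    // Reads the whole stream into memory and decodes it with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    // Opens an HTTP GET connection and returns the response body.
    // Assumes the page is UTF-8 encoded; timeouts and User-Agent are placeholders.
    static String fetchHtml(String pageUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Splitting the stream reading into `readAll` keeps the decoding logic testable without a network connection.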

posted @ 2022-05-12 00:00 kuaiquxie
