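Assignment ①

1) Requirements: scrape the 7-day weather forecast for a given set of cities from the China Weather Network (http://www.weather.com.cn) and save it to a database.
Output: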

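No.	City	Date	Weather	Temperature
1	Beijing	7th (today)	Sunny to partly cloudy, with showers or thundershowers in the northern mountains, turning sunny, then cloudy	31℃/17℃
2	Beijing	8th (tomorrow)	Cloudy turning sunny, with scattered showers or thundershowers in the north, turning sunny	34℃/20℃
3	Beijing	9th (day after tomorrow)	Sunny turning cloudy	36℃/22℃
4	Beijing	10th (Sat)	Overcast turning to showers	30℃/19℃
5	Beijing	11th (Sun)	Showers	27℃/18℃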

Code excerpt:

import sqlite3


class WeatherDB:
    def __init__(self):
        self.cursor = None
        self.con = None

    def openDB(self):
        self.con = sqlite3.connect("weathers.db")
        self.cursor = self.con.cursor()
        try:
            self.cursor.execute("""
                create table weathers (
                    id integer primary key autoincrement,
                    wCity varchar(16),
                    wDate varchar(16),
                    wWeather varchar(64),
                    wTemp varchar(32),
                    unique (wCity, wDate))
                """)
        except sqlite3.OperationalError as err:
            if "already exists" in str(err):
                # The table is left over from an earlier run: clear the old
                # rows so this run starts from an empty table.
                print("Table 'weathers' already exists.")
                self.cursor.execute("delete from weathers")
            else:
                print(err)
        self.con.commit()

    def closeDB(self):
        self.con.commit()
        self.con.close()

    def insert(self, city, date, weather, temp):
        try:
            self.cursor.execute("""
                insert into weathers (wCity, wDate, wWeather, wTemp) 
                values (?, ?, ?, ?)
            """, (city, date, weather, temp))
        except Exception as err:
            print(err)

    def show(self):
        self.cursor.execute("select * from weathers")
        rows = self.cursor.fetchall()
        print("%-10s%-16s%-16s%-32s%-16s" % ("id", "city", "date", "weather", "temp"))
        for row in rows:
            print("%-10s%-16s%-16s%-32s%-16s" % (row[0], row[1], row[2], row[3], row[4]))
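With the unique (wCity, wDate) constraint, a second plain insert for the same city and date raises sqlite3.IntegrityError, which insert() catches and prints. If the newer forecast should overwrite the stored one instead, something like the following might work. This is only a sketch: the helper names create_table and save_weather are hypothetical, and it uses an in-memory database rather than weathers.db.

```python
import sqlite3


def create_table(con):
    # Same schema as the excerpt, but safe to call on every run.
    con.execute("""
        create table if not exists weathers (
            id integer primary key autoincrement,
            wCity varchar(16),
            wDate varchar(16),
            wWeather varchar(64),
            wTemp varchar(32),
            unique (wCity, wDate))
    """)


def save_weather(con, city, date, weather, temp):
    # "insert or replace" deletes the row that conflicts on (wCity, wDate)
    # and inserts the new one, so the replaced row gets a new id.
    con.execute("""
        insert or replace into weathers (wCity, wDate, wWeather, wTemp)
        values (?, ?, ?, ?)
    """, (city, date, weather, temp))


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")  # assumption: throwaway database
    create_table(con)
    save_weather(con, "Beijing", "7th (today)", "Sunny", "31℃/17℃")
    save_weather(con, "Beijing", "7th (today)", "Cloudy", "30℃/18℃")
    print(con.execute("select wCity, wDate, wWeather, wTemp from weathers").fetchall())
    con.close()
```

Whether overwriting is the right policy depends on the use case; keeping the first value and skipping duplicates, as the excerpt does, is equally valid.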

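Screenshot of the run:

2) Reflections:

Gains: I improved my ability to acquire and process data and became fluent with the web-scraping workflow, including sending requests with requests and parsing pages with BeautifulSoup.
Difficulties and solutions: the page structure was complex, so I worked out how to locate the elements by debugging and consulting the documentation; I was also unfamiliar with database operations and had to study them to resolve the connection and insert problems.
Summary and outlook: the required functionality works and I gained experience; going forward I would like to apply this in more areas and improve the code and the storage approach.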

Assignment ②

1) Requirements:
Use the requests and BeautifulSoup libraries to scrape targeted stock information and store it in a database.
Output:

No.	Stock code	Stock name	Latest price	Change (%)	Change	Volume	Turnover	Amplitude	High	Low	Open	Prev. close
1	688093	N世华	28.47	62.22%	10.92	261,300	760 million	22.34	32.0	28.08	30.2	17.55

Code excerpt:

import json

import requests
from bs4 import BeautifulSoup


def fetch_stock_data():
    url = "http://33.push2.eastmoney.com/api/qt/clist/get"
    params = {
        "pn": 1,
        "pz": 20,
        "po": 1,
        "np": 1,
        "ut": "bd1d9ddb04089700cf9c27f6f7426281",
        "fltt": 2,
        "invt": 2,
        "fid": "f3",
        "fs": "m:0+t:80",
        "fields": "f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f115,f152",
    }
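    # A timeout keeps the script from hanging if the server does not answer.
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # fail early on HTTP errors
    return response.text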

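def parse_stock_data(data):
    # The endpoint returns JSON, so json.loads(data) alone would also work;
    # BeautifulSoup is used here only to extract the text.
    soup = BeautifulSoup(data, 'lxml')  # use lxml as the parser
    data_json = json.loads(soup.get_text())  # convert the text to a JSON object
    stocks = data_json["data"]["diff"]
    new_stocks = []
    for stock in stocks:
        # Assumption: field codes follow the commonly used Eastmoney mapping
        # (f5 volume, f6 turnover, f7 amplitude, f15 high, f16 low,
        # f17 open, f18 previous close).
        new_stock = {
            "Stock code": stock["f12"],
            "Stock name": stock["f14"],
            "Latest price": stock["f2"],
            "Change (%)": stock["f3"],
            "Change": stock["f4"],
            "Volume": stock["f5"],
            "Turnover": stock["f6"],
            "Amplitude": stock["f7"],
            "High": stock["f15"],
            "Low": stock["f16"],
            "Open": stock["f17"],
            "Prev. close": stock["f18"]
        }
        new_stocks.append(new_stock)
    return new_stocks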

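Screenshot of the run:

2) Reflections:

Gains: I gained a deeper understanding of how web scrapers work, learned to analyze an API and parse data according to its format, improved my stock data analysis skills, and came to appreciate the importance of stock data.
Difficulties and solutions: the API is complex, so I determined the parameters through experiments and reference material; the data had accuracy and completeness problems, so I added validation and cleaning steps and set up error handling.
Summary and outlook: my technical and analytical skills improved; going forward I plan to go further into stock data analysis, optimize the scraper's performance, and apply it to other areas of finance.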
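The validation and cleaning step mentioned in the reflections is not part of the code excerpt. A minimal sketch of what such a step might look like follows. It assumes that missing numeric values arrive as a non-numeric placeholder such as "-" (an assumption about the API, not something confirmed here), and the helper names to_float and clean_records are hypothetical.

```python
def to_float(value):
    """Return value as a float, or None if it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean_records(records, required_key, numeric_keys):
    """Drop records without required_key and normalize the numeric fields."""
    cleaned = []
    for record in records:
        if not record.get(required_key):
            continue  # skip rows that cannot be identified
        row = dict(record)  # copy so the input list is left untouched
        for key in numeric_keys:
            row[key] = to_float(record.get(key))
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    # Illustrative rows keyed by raw field codes; the values are made up.
    raw = [
        {"f12": "688093", "f2": 28.47, "f3": "-"},
        {"f12": "", "f2": 1.0, "f3": 2.0},
    ]
    print(clean_records(raw, "f12", ("f2", "f3")))
```

Storing None for an unparseable value keeps the row while making the gap explicit, which maps naturally to NULL once the data is written to a database.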

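Assignment ③

1) Requirements:
Scrape the information for all institutions in the 2021 main list of the Best Chinese Universities Ranking (https://www.shanghairanking.cn/rankings/bcur/2021) and store it in a database; also record the browser F12 debugging and analysis process as a GIF and add it to the blog post.
Output: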

Rank	University	Province/City	Type	Total score
1	Tsinghua University	Beijing	Comprehensive	969.2

Code excerpt:

import json


def printUnivList(ulist, html, num):
    data = json.loads(html)
    content = data['data']['rankings']

    # Print the table header
    tplt = "{0:^10}\t{1:^20}\t{2:^10}\t{3:^10}\t{4:^10}"
    print(tplt.format("Rank", "University", "Total score", "Province/City", "Type"))

    for item in content[:num]:
        index = item['rankOverall']
        name = item['univNameCn']
        score = item['score']
        province = item['province']
        category = item['univCategory']
        ulist.append([index, name, score, province, category])
        print(tplt.format(index, name, score, province, category))
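The assignment also asks for the results to be stored in a database, which the excerpt does not show. One way this might be done with sqlite3 is sketched below, assuming rows shaped like the entries appended to ulist ([rank, name, score, province, category]). The function name save_universities, the table name universities, and the column names are all hypothetical.

```python
import sqlite3


def save_universities(con, ulist):
    """Store [rank, name, score, province, category] rows in one batch."""
    con.execute("""
        create table if not exists universities (
            ranking integer,
            name text,
            score real,
            province text,
            category text)
    """)
    # executemany runs the same insert once per row in ulist.
    con.executemany(
        "insert into universities values (?, ?, ?, ?, ?)",
        ulist,
    )
    con.commit()


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")  # assumption: throwaway database
    save_universities(con, [[1, "Tsinghua University", 969.2, "Beijing", "Comprehensive"]])
    print(con.execute("select * from universities").fetchall())
    con.close()
```

Passing the whole list to executemany avoids a Python-level loop of single inserts and keeps the write in one transaction.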

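Screenshot of the run:

2) Reflections:

Gains: I got past the scraping challenges of a more complex site, learned to find hidden APIs and handle dynamically loaded data, broadened my understanding of data mining in the education field, and learned how universities differ.
Difficulties and solutions: the API is hidden and complex, and it took patient inspection to pin it down; dynamically loaded data was hard to handle, so I used selenium to simulate browser actions, which has a performance cost but solved the data retrieval problem.
Summary and outlook: the task is complete and my skills improved in several areas; going forward I plan to optimize the scraper, analyze the data in more depth, and integrate education data into a platform.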

posted on 2024-10-22 22:02 吴鱼子