Python XML operations
1) Import the module: import xml.etree.ElementTree as ET
2) You need an XML file (sensorexpert_brand.xml) with the following structure:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <error>0</error>
  <msg>success</msg>
  <data>
    <total>10602</total>
    <pageSize>100</pageSize>
    <rows>
      <item>
        <id>11683</id>
        <full_name>Zywyn</full_name>
        <flag></flag>
        <country_name></country_name>
        <product_total>0</product_total>
        <brand_type>制造商</brand_type>
        <link>/brand/11683.html</link>
      </item>
      <item>
        <id>6005</id>
        <full_name>阿尔法电线</full_name>
        <flag></flag>
        <country_name></country_name>
        <product_total>1</product_total>
        <brand_type></brand_type>
        <link>/brand/6005.html</link>
      </item>
    </rows>
  </data>
</response>
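3) Code that reads the file
def parse(self, response):
    # Save the response body to disk, overwriting any previous copy.
    filename = 'ebs_crawler/files/sensorexpert_brand.xml'
    with open(filename, 'w', encoding='UTF-8-sig') as file_object:
        file_object.write(response.text)
    self.logger.info(f"Overwrote XML file successfully: {response.request.url}")
    # Parse only after the with block has closed (and flushed) the file.
    tree = ET.parse(filename)
    root = tree.getroot()
    links = []
    # root[2] is <data> and root[2][2] is <rows>; child[6] is each item's <link>.
    for child in root[2][2]:
        links.append('https://www.sensorexpert.com.cn' + child[6].text)
    self.save(links)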
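The positional lookups root[2][2] and child[6] only work while the elements keep exactly this order. As a minimal sketch, assuming the same file structure, the links can also be found by tag name. The helper name extract_links, the BASE_URL constant, and the temporary sample file are hypothetical and only there to make the example self-contained.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

BASE_URL = "https://www.sensorexpert.com.cn"

# Trimmed-down sample with the same nesting as sensorexpert_brand.xml.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<response>
  <error>0</error>
  <msg>success</msg>
  <data>
    <total>10602</total>
    <pageSize>100</pageSize>
    <rows>
      <item><id>11683</id><link>/brand/11683.html</link></item>
      <item><id>6005</id><link>/brand/6005.html</link></item>
    </rows>
  </data>
</response>
"""


def extract_links(root):
    """Collect absolute brand URLs by tag name instead of by position."""
    links = []
    for item in root.iterfind("./data/rows/item"):
        link = item.findtext("link")
        if link:  # skip items whose <link> is missing or empty
            links.append(BASE_URL + link)
    return links


# Write the sample to a temporary file so ET.parse can read it by path.
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", encoding="UTF-8", delete=False
) as tmp:
    tmp.write(SAMPLE_XML)
    tmp_path = tmp.name

try:
    links = extract_links(ET.parse(tmp_path).getroot())
finally:
    os.remove(tmp_path)

print(links)
```

Looking up by name keeps working if the server adds or reorders fields inside <item>, and it skips items without a link instead of raising an error on a None value.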
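Note: ET.parse takes a file path or an open file object as its argument, not a string of XML content. To parse XML that is already in memory as a string, use ET.fromstring.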
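If the XML is already in memory, writing it to disk first is optional. The sketch below shows two ways to get the root element from a string: ET.fromstring, and ET.parse on an in-memory file object. The variable xml_text is a hypothetical stand-in for response.text.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the text of an HTTP response.
xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<response>
  <error>0</error>
  <msg>success</msg>
  <data>
    <total>10602</total>
  </data>
</response>
"""

# Option 1: parse the string directly; fromstring returns the root element.
root_from_string = ET.fromstring(xml_text)

# Option 2: ET.parse also accepts a file object, here an in-memory byte buffer.
buffer = io.BytesIO(xml_text.encode("utf-8"))
root_from_buffer = ET.parse(buffer).getroot()

print(root_from_string.findtext("msg"))
print(root_from_buffer.findtext("./data/total"))
```

Both options produce an equivalent element tree; saving the file is only needed when a copy on disk is wanted for other reasons.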