Post category - Python basics
Summary: Creating an empty Excel file: import pandas as pd # a DataFrame corresponds to an Excel sheet df = pd.DataFrame() df.to_excel("D:/pycode/output/output.xlsx") df = pd.DataFrame({"ID":[1,2,3],"Name": …
Read more
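The truncated snippet above can be sketched out in full; a minimal sketch assuming pandas plus an Excel writer engine such as openpyxl are installed, with a relative output path and made-up "Name" values substituted for the truncated originals:

```python
import pandas as pd

# An empty DataFrame corresponds to an empty Excel sheet
empty = pd.DataFrame()

# Columns are built from a dict of lists; the "Name" values are
# assumptions, since the original snippet is cut off after "Name":
df = pd.DataFrame({"ID": [1, 2, 3], "Name": ["a", "b", "c"]})

# index=False keeps the automatic 0..n-1 row index out of the sheet
df.to_excel("output.xlsx", index=False)
```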
Summary: def add(x,y): return x + y sum = add(3,5) #print(sum) dict = {"add":add} sum1 = dict.get("add")(4,6) Pass a list in as an argument and append elements inside the called function; the original list gets the new elements too: def …
Read more
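Both ideas in this post fit in one self-contained sketch (names like `dispatch` and `my_list` are illustrative, not from the original):

```python
def add(x, y):
    return x + y

# Functions are first-class objects: store one in a dict, look it up, call it
dispatch = {"add": add}
result = dispatch.get("add")(4, 6)   # same as add(4, 6)

# Lists are passed by reference: appending inside the function
# also mutates the caller's list
def append_item(items, value):
    items.append(value)

my_list = [1, 2]
append_item(my_list, 3)
# my_list now contains the appended element
```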
Summary: 1. Create the project: scrapy startproject dushuproject 2. Change to the spiders path: cd dushuproject\dushuproject\spiders 3. Create the spider class: scrapy genspider read www.dushu.com import scrapy …
Read more
Summary: import scrapy import json class TransferpostSpider(scrapy.Spider): name = 'transferPost' allowed_domains = ['fanyi.baidu.com'] # start_urls = ['http:/ …
Read more
Summary: settings.py DB_HOST = 'localhost' DB_PORT = 3306 DB_USER = 'root' DB_PWD = '1234' DB_NAME = 'guli' DB_CHARSET = 'utf8' # Configure item pipelines # Se …
Read more
Summary: movie.py import scrapy from movieProject.items import MovieprojectItem class MovieSpider(scrapy.Spider): name = 'movie' allowed_domains = ['www.ygdy8. …
Read more
Summary: def parse(self, response): print('当当网') li = response.xpath('//ul[@id="component_59"]/li') # src, name and price share a common parent li, but the first li has no data-original, so iterate based on the l …
Read more
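The data-original fallback described in the comment can be sketched with lxml on a made-up fragment: lazily loaded images keep the real address in data-original, while the first li (already rendered) only has src. The HTML below is invented for illustration:

```python
from lxml import etree

# Made-up fragment: first li has only src, the second carries the
# real address in data-original (lazy loading)
html = """
<ul id="component_59">
  <li><img src="real1.jpg"/></li>
  <li><img src="placeholder.gif" data-original="real2.jpg"/></li>
</ul>
"""
tree = etree.fromstring(html)

srcs = []
for li in tree.xpath('//ul[@id="component_59"]/li'):
    # Prefer data-original; fall back to src for the first item
    src = li.xpath('./img/@data-original')
    if not src:
        src = li.xpath('./img/@src')
    srcs.append(src[0])
# srcs holds the real image address for every li
```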
Summary: 1. Create a scrapy project: in the terminal run scrapy startproject <project name> Create a py file under the spiders folder: scrapy genspider baidu http://www.baidu.com settings.py: ROBOTSTXT_OBEY = False 4. Run the spider file …
Read more
Summary: import requests from lxml import etree import urllib.request url = 'https://so.gushiwen.cn/user/login.aspx?from=http://so.gushiwen.cn/user/collect.asp …
Read more
Summary: import requests url = 'http://www.baidu.com' res = requests.get(url) # fix mojibake in the response res.encoding = 'utf-8' print(res.text) 3. Attributes and type of response. Type: models.Re …
Read more
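The `res.encoding = 'utf-8'` line matters because turning raw bytes into text requires the right codec; decoding with the wrong one produces mojibake. A stdlib-only sketch of the same effect, with no network needed:

```python
# What a server actually sends is bytes, not text
raw = "百度一下".encode("utf-8")

garbled = raw.decode("latin-1")   # wrong codec: unreadable mojibake
correct = raw.decode("utf-8")     # right codec: the original text
```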
Summary: from selenium import webdriver path = 'chromedriver.exe' browser = webdriver.Chrome(path) url = 'http://www.baidu.com' browser.get(url) Element locating: 1.find_e …
Read more
Summary: import urllib.request from lxml import etree # https://sc.chinaz.com/tupian/siwameinvtupian.html url = 'https://sc.chinaz.com/tupian/siwameinvtupian_2 …
Read more
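When saving each scraped image, a local filename is usually derived from its URL; a stdlib sketch (the image URL below is made up for illustration):

```python
import os.path
import urllib.parse

img_url = 'https://scpic.chinaz.net/files/pic/pic9/202101/apic1234.jpg'

# Take the path component of the URL and keep only its last segment
path = urllib.parse.urlparse(img_url).path
filename = os.path.basename(path)

# urllib.request.urlretrieve(img_url, filename) would then download it
```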
Summary: from lxml import etree # load a local file tree = etree.parse('bendi.html') print(tree) # / selects child elements, // selects descendants at any depth li = tree.xpath('//body/ul/li') print(li) print(len(l …
Read more
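The child vs. descendant distinction can be checked without a local bendi.html by parsing an inline string instead (the fragment below is invented for illustration):

```python
from lxml import etree

html = """
<body>
  <ul>
    <li>a</li>
    <li>b</li>
    <div><li>nested</li></div>
  </ul>
</body>
"""
tree = etree.fromstring(html)

children = tree.xpath('//body/ul/li')   # / steps to direct children only
descendants = tree.xpath('//li')        # // matches li at any depth
# the nested li is counted only by //
```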
Summary: import urllib.request import urllib.parse headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) …
Read more
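The headers dict is typically attached by wrapping the URL in a Request object, with urllib.parse handling the percent-escaping of non-ASCII query values; a minimal stdlib sketch (the search term is illustrative, and nothing is actually sent):

```python
import urllib.request
import urllib.parse

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
}

# urlencode percent-escapes the non-ASCII query value
base = 'https://www.baidu.com/s?'
query = urllib.parse.urlencode({'wd': '周杰伦'})
url = base + query

# The Request carries the custom UA header;
# urllib.request.urlopen(req) would send it
req = urllib.request.Request(url=url, headers=headers)
```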
Summary: import urllib.request import urllib.parse import json def getKenData(index): headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl …
Read more
Summary: import urllib.request url = 'https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action=&start=0&limit=20' headers = { 'User-Agent': …
Read more
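Responses from an ajax interface like this are JSON text, so the usual last step is json.loads; a sketch with a made-up payload shaped like a top-list response (the titles and scores are invented):

```python
import json

# Made-up sample shaped like the endpoint's JSON array of movies
content = '[{"title": "A", "score": "9.1"}, {"title": "B", "score": "8.7"}]'

movies = json.loads(content)           # str -> list of dicts
titles = [m["title"] for m in movies]  # pick out one field per movie
```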
Summary: Raw data: from: en to: zh query: love transtype: realtime simple_means_flag: 3 sign: 198772.518981 token: 1b434ed1e595135ac1b2959f4430a51f domain: common …
Read more
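To POST a form like this, the key/value pairs are urlencoded and then encoded to bytes; a stdlib sketch using a subset of the fields listed above:

```python
import urllib.parse

form = {
    'from': 'en',
    'to': 'zh',
    'query': 'love',
    'simple_means_flag': '3',
}

# POST bodies must be bytes: dict -> urlencoded str -> utf-8 bytes;
# the result is suitable as the data= argument of urllib.request.Request
data = urllib.parse.urlencode(form).encode('utf-8')
```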
Summary: import urllib.request import urllib.parse headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) …
Read more
Summary: import urllib.request url = "http://www.baidu.com" response = urllib.request.urlopen(url) content = response.read().decode('utf-8') print(content) If you don't …
Read more
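read() returns bytes, which is why the decode('utf-8') step is needed before printing. This can be seen offline with a data: URL, which urlopen also supports, so no network is required:

```python
import urllib.request

# A data: URL carries its payload inline, so urlopen needs no network
response = urllib.request.urlopen('data:text/plain;charset=utf-8,hello%20world')

content_bytes = response.read()          # bytes, not str
content = content_bytes.decode('utf-8')  # bytes -> readable text
```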
Summary: JSON serialization and deserialization: file1 = open('test1.txt','r') content = file1.read() print(content) result = json.loads(content) print(result) print(type(result)) for i …
Read more
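A round trip of the two steps, written with a temp directory so it runs anywhere (the test1.txt path and the sample dict are stand-ins for the post's own file and data):

```python
import json
import os
import tempfile

data = {"name": "Tom", "age": 20}

# Serialize: Python object -> JSON string -> file
path = os.path.join(tempfile.mkdtemp(), "test1.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(json.dumps(data))

# Deserialize: file -> JSON string -> Python object (a dict again)
with open(path, "r", encoding="utf-8") as f:
    result = json.loads(f.read())
```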