
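Python Web Crawling: Speeding Up the Crawler (Part 8)

Speeding up the crawler

The previous posts covered fetching pages, parsing them, and storing the data, which is enough to build a basic crawler. This post records a more advanced topic: making the crawler faster. There are three main approaches: multithreaded, multiprocess, and coroutine-based crawlers. Compared with an ordinary single-threaded crawler, any of the three can make crawling several times faster.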

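Concurrency and parallelism

Concurrency means that several events take place within the same period of time.
Parallelism means that several events take place at the same instant.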

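Synchronous and asynchronous

Synchronous means the concurrent or parallel tasks do not run independently: they take turns in a fixed order, like runners in a relay race.
Asynchronous means the concurrent or parallel tasks run independently without interfering with one another, like runners in separate lanes whose speed is not affected by the other competitors.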

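Multithreaded crawler

A multithreaded crawler runs concurrently. In other words, the threads do not truly execute at the same time; the crawler gets faster because the interpreter switches rapidly between threads, so one thread can work while another is waiting on the network.
For I/O-bound work, multithreading can improve a program's efficiency.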
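
To make the I/O point concrete, here is a minimal sketch that runs offline. It assumes the network request can be imitated by time.sleep; the names fake_download, run_serial and run_threaded are hypothetical and do not come from the crawler code in this post.

```python
import threading
import time

def fake_download(url, results):
    """Stand-in for a network request: wait 0.2 s, then record the URL."""
    time.sleep(0.2)          # while this thread waits, other threads can run
    results.append(url)      # list.append is safe across threads in CPython

def run_serial(urls):
    """Download the URLs one after another."""
    results = []
    for url in urls:
        fake_download(url, results)
    return results

def run_threaded(urls):
    """Download the URLs with one thread per URL."""
    results = []
    threads = [threading.Thread(target=fake_download, args=(u, results)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()             # wait until every thread has finished
    return results

urls = ['http://www.baidu.com', 'http://www.qq.com', 'http://www.taobao.com',
        'http://www.sohu.com', 'http://www.jd.com']

start = time.time()
run_serial(urls)
serial_time = time.time() - start

start = time.time()
run_threaded(urls)
threaded_time = time.time() - start

print('serial:   %.2f s' % serial_time)
print('threaded: %.2f s' % threaded_time)
```

The serial run waits for each simulated request in turn, while the threaded run overlaps the waits, so its total time should be close to that of a single request.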

A simple single-threaded crawler

http://www.baidu.com
http://www.qq.com
http://www.naver.com
http://www.taobao.com
http://www.reddit.com
http://www.sohu.com
http://www.tmall.com
http://www.sina.com.cn
http://www.daum.net
http://www.jd.com
http://www.360.cn
http://www.weibo.com
http://www.aliexpress.com
http://www.linkedin.com
http://www.alipay.com
http://www.hao123.com
http://www.csdn.net
http://www.youth.cn
http://www.live.com
http://www.tianya.cn
http://www.microsoftonline.com
http://www.office.com
http://www.soso.com
http://www.so.com
http://www.gmw.cn
http://www.china.com
http://www.nate.com
http://www.huaban.com
http://www.bing.com
http://www.xinhuanet.com
http://www.youku.com
http://www.zhihu.com
http://www.cctv.com
http://www.airasia.com
http://www.douyu.com
http://www.babytree.com
http://www.apple.com
http://www.sogou.com
http://www.china.com.cn
http://www.yelp.com
http://www.ocbc.com
http://www.microsoft.com
http://www.mama.cn
http://www.bitauto.com
http://www.bankofamerica.com
http://www.1688.com
http://www.stackoverflow.com
http://www.163.com
http://www.39.net
http://www.cnblogs.com
http://www.bilibili.com
http://www.interpark.com
http://www.huanqiu.com
http://www.cnzz.com
http://www.chinadaily.com.cn
http://www.openrice.com
http://www.msn.com
http://www.k618.cn
http://www.yesky.com
http://www.caijing.com.cn
http://www.emirates.com
http://www.amazon.cn
http://www.aliyun.com
http://www.eastday.com
http://www.youdao.com
http://www.oeeee.com
http://www.ci123.com
http://www.baike.com
http://www.adobe.com
http://www.rednet.cn
http://www.iqiyi.com
http://www.wemakeprice.com
http://www.douban.com
http://www.familydoctor.com.cn
http://www.agoda.com
http://www.jrj.com.cn
http://www.read01.com
http://www.17ok.com
http://www.chinaz.com
http://www.youboy.com
http://www.tesco.com
http://www.alibaba.com
http://www.gearbest.com
http://www.51sole.com
http://www.dbs.com
http://www.suning.com
http://www.oschina.net
http://www.voc.com.cn
http://www.zol.com.cn
http://www.asos.com
http://www.chinaso.com
http://www.jianshu.com
http://www.ifeng.com
http://www.stockstar.com
http://www.zhanqi.tv
http://www.52pk.com
http://www.whatsbuying.com
http://www.cqnews.net
http://www.gongchang.com
http://www.godaddy.com
http://www.godaddy.com
http://www.wtoip.com
http://www.segmentfault.com
http://www.evernote.com
http://www.dianping.com
http://www.qingdaonews.com
http://www.guancha.cn
http://www.standardchartered.com
http://www.singaporeair.com
http://www.toutiao.com
http://www.jiameng.com
http://www.dm5.com
http://www.w3school.com.cn
http://www.zhaopin.com
http://www.99.com
http://www.mi.com
http://www.b2b.cn
http://www.cathaypacific.com
http://www.southcn.com
http://www.battle.net
http://www.ups.com
http://www.jb51.net
http://www.comcast.net
http://www.alicdn.com
http://www.v2ex.com
http://www.firefoxchina.cn
http://www.360doc.com
http://www.xunlei.com
http://www.sharepoint.com
http://www.scol.com.cn
http://www.admaimai.com
http://www.v1.cn
http://www.51cto.com
http://www.jqw.com
http://www.bzw315.com
http://www.126.com
http://www.beanfun.com
http://www.chooseauto.com.cn
http://www.renren.com
http://www.taleo.net
http://www.51.la
http://www.zcool.com.cn
http://www.4399.com
http://www.duba.com
http://www.globaltimes.cn
http://www.ycwb.com
http://www.sfacg.com
http://www.hotelscombined.com
http://www.mydrivers.com
http://www.taoche.com
http://www.runoob.com
http://www.tlscontact.com
http://www.nba.com
http://www.gamebase.com.tw
http://www.zhibo8.cc
http://www.hexun.com
http://www.xiami.com
http://www.finnair.com
http://www.feng.com
http://www.cdstm.cn
http://www.uniqlo.com
http://www.iciba.com
http://www.qudong.com
http://www.panda.tv
http://www.cnbeta.com
http://www.nipic.com
http://www.sznews.com
http://www.huawei.com
http://www.tuicool.com
http://www.baimao.com
http://www.umeng.com
http://www.ccidnet.com
http://www.klm.com
http://www.qcloud.com
http://www.hupu.com
http://www.ikanman.com
http://www.3dmgame.com
http://www.icolor.com.cn
http://www.360.com
http://www.36kr.com
http://www.miui.com
http://www.boc.cn
http://www.gamersky.com
http://www.joyme.com
http://www.17173.com
http://www.uc.cn
http://www.alimama.com
http://www.oasgames.com
http://www.focus.cn
http://www.cnr.cn
http://www.miomio.tv
http://www.jjwxc.net
http://www.5dcar.com
http://www.hjenglish.com
http://www.dangdang.com
http://www.springer.com
http://www.to8to.com
http://www.xiaomi.com
http://www.ctrip.com
http://www.delta.com
http://www.anjuke.com
http://www.cnki.net
http://www.surveymonkey.com
http://www.tower.im
http://www.baiducontent.com
http://www.acfun.cn
http://www.people.com.cn
http://www.jmw.com.cn
http://www.worktile.com
http://www.newsmth.net
http://www.vmall.com
http://www.07073.com
http://www.qyer.com
http://www.hujiang.com
http://www.cnnic.cn
http://www.meituan.com
http://www.yinxiang.com
http://www.ngacn.cc
http://www.smzdm.com
http://www.ccb.com
http://www.ali213.net
http://www.alibaba-inc.com
http://www.3158.cn
http://www.vmall.com
http://www.nike.com
http://www.eqxiu.com
http://www.jandan.net
http://www.office365.com
http://www.imooc.com
http://www.ikea.com
http://www.united.com
http://www.ly.com
http://www.epwk.com
http://www.tudou.com
http://www.leagueoflegends.com
http://www.aa.com
http://www.garena.com
http://www.mafengwo.cn
http://www.ifensi.com
http://www.pptv.com
http://www.fobshanghai.com
http://www.asiamiles.com
http://www.znds.com
http://www.hc360.com
http://www.job853.com
http://www.sf-express.com
http://www.lianjia.com
http://www.guokr.com
http://www.cmbchina.com
http://www.modernweekly.com
http://www.ynet.com
http://www.dell.com
http://www.dict.cn
http://www.yinyuetai.com
http://www.aizhan.com
http://www.gome.com.cn
http://www.meishichina.com
http://www.51hejia.com
http://www.ule.com
http://www.ea3w.com
http://www.saraba1st.com
http://www.chsi.com.cn
http://www.vlive.tv
http://www.sonhoo.com
http://www.hongkongairlines.com
http://www.jxnews.com.cn
http://www.free.com.tw
http://www.docin.com
http://www.liepin.com
http://www.chinaunix.net
http://www.weibo.cn
http://www.ifanr.com
http://www.51auto.com
http://www.ebrun.com
http://www.10010.com
http://www.hebei.com.cn
http://www.tgbus.com
http://www.mtime.com
http://www.vip.com
http://www.kdslife.com
http://www.www.gov.cn
http://www.cncn.org.cn
http://www.techcrunch.com
http://www.zbj.com
http://www.ip138.com
http://www.cyol.com
http://www.pc6.com
http://www.joox.com
http://www.178.com
http://www.lagou.com
http://www.18183.com
http://www.365jia.cn
http://www.autohome.com.cn
http://www.battlenet.com.cn
http://www.oracle.com
http://www.miaopai.com
http://www.sina.cn
http://www.ch.com
http://www.yxdown.com
http://www.etao.com
http://www.vietnamairlines.com
http://www.iyiou.com
http://www.shop.com
http://www.588ku.com
http://www.le.com
http://www.sina.com
http://www.jstv.com
http://www.ceconline.com
http://www.koreanair.com
http://www.skype.com
http://www.ih5.cn
http://www.ems.com.cn
http://www.efu.com.cn
http://www.pcbaby.com.cn
http://www.shimo.im
http://www.macaolife.com
http://www.xiu.com
http://www.eastmoney.com
http://www.xiumi.us
http://www.yhd.com
http://www.jiemian.com
http://www.daikuan.com
http://www.ximalaya.com
http://www.marriott.com
http://www.d1ev.com
http://www.xitek.com
http://www.chuansong.me
http://www.alitrip.com
http://www.xiaomi.cn
http://www.51job.com
http://www.91jm.com
http://www.2cto.com
http://www.qoo10.com
http://www.centadata.com
http://www.lufthansa.com
http://www.techweb.com.cn
http://www.kugou.com
http://www.80018.cn
http://www.tmtpost.com
http://www.house365.com
http://www.hp.com
http://www.unity3d.com
http://www.zoom.us
http://www.kafan.cn
http://www.liansuo.com
http://www.netease.com
http://www.10jqka.com.cn
http://www.xiazaiba.com
http://www.fang.com
http://www.smartisan.com
http://www.photofans.cn
http://www.ooopic.com
http://www.zybang.com
http://www.gw-ec.com
http://www.wed114.cn
http://www.huomao.com
http://www.ithome.com
http://www.ccb.com.cn
http://www.chinanews.com
http://www.doc88.com
http://www.sanguosha.com
http://www.evaair.com
http://www.icbc.com.cn
http://www.youxidudu.com
http://www.verycd.com
http://www.netcoc.com
http://www.pepper.com
http://www.dygang.com
http://www.liaoxuefeng.com
http://www.flyasiana.com
http://www.sciencenet.cn
http://www.feiyang.com
http://www.800hr.com
http://www.iconfont.cn
http://www.youzan.com
http://www.360kan.com
http://www.chinabyte.com
http://www.samsung.com
http://www.zxart.cn
http://www.gucheng.com
http://www.bootcss.com
http://www.cankaoxiaoxi.com
http://www.58pic.com
http://www.81.cn
http://www.csair.com
http://www.chiphell.com
http://www.antpedia.com
http://www.xiachufang.com
http://www.winshang.com
http://www.fzg360.com
http://www.chaduo.com
http://www.12306.cn
http://www.morningpost.com.cn
http://www.soku.com
http://www.sspai.com
http://www.yoox.com
http://www.huxiu.com
http://www.nyu.edu
http://www.jiwu.com
http://www.u17.com
http://www.jiayuan.com
http://www.yy.com
http://www.duowan.com
http://www.mbalib.com
http://www.wanfangdata.com.cn
http://www.ibuying.com
http://www.chouti.com
http://www.71.net
http://www.hrloo.com
http://www.meizu.com
http://www.miercn.com
http://www.fengniao.com
http://www.fangdd.com
http://www.htc.com
http://www.jdzj.com
http://www.pcauto.com.cn
http://www.kaola.com
http://www.kuaidi100.com
http://www.yougov.com
http://www.ku6.com
http://www.sanwen8.cn
http://www.yiwugou.com
http://www.lottedfs.com
http://www.cisco.com
http://www.wallstreetcn.com
http://www.gamedog.cn
http://www.tencent.com
http://www.tvhome.com
http://www.xbox.com
http://www.cr173.com
http://www.onlinedown.net
http://www.ebay.com.hk
http://www.searchs.cn
http://www.17track.net
http://www.hyundai.com
http://www.baixing.com
http://www.258.com
http://www.cn2che.com
http://www.pudn.com
http://www.dv37.com
http://www.dv37.com
http://www.uisdc.com
http://www.sojump.com
http://www.d1net.com
http://www.ganji.com
http://www.jobbole.com
http://www.pearsoncmg.com
http://www.kongfz.com
http://www.365jilin.com
http://www.strawberrynet.com
http://www.11467.com
http://www.jobui.com
http://www.hh010.com
http://www.teambition.com
http://www.woshipm.com
http://www.lge.com
http://www.kanxi.cc
http://www.leiphone.com
http://www.d1com.com
http://www.114so.cn
http://www.d1com.com
http://www.114so.cn
http://www.duomai.com
http://www.win007.com
http://www.weidian.com
http://www.qiku.com
http://www.cli.im
http://www.flyertea.com
http://www.lenovo.com.cn
http://www.aso100.com
http://www.xueqiu.com
http://www.bp.com
http://www.dingtalk.com
http://www.processon.com
http://www.flyme.cn
http://www.a9vg.com
http://www.sinaimg.cn
http://www.saic.gov.cn
http://www.mgtv.com
http://www.nuomi.com
http://www.tiexue.net
http://www.vvvdj.com
http://www.tvmao.com
http://www.panduoduo.net
http://www.wechat.com
http://www.52pojie.cn
http://www.miwifi.com
http://www.iteye.com
http://www.kanzhun.com
http://www.mango.com
http://www.cheaa.com
http://www.13322.com
http://www.jikexueyuan.com
http://www.taisha.org
http://www.mydigit.cn
http://www.gusuwang.com
http://www.pinggu.org
http://www.lbldy.com
http://www.sgcn.com
http://www.misumi-ec.com
http://www.lofter.com
http://www.unrealengine.com
http://www.gao7.com
http://www.leju.com
http://www.home77.com
http://www.qunar.com
http://www.xdowns.com
http://www.oa.com
http://www.sgcn.com
http://www.szjy188.com
http://www.tuniu.com
http://www.135editor.com
http://www.f.com
http://www.zhibo.tv
http://www.jiyoujia.com
http://www.95516.com
http://www.yiqifa.com
http://www.cocoachina.com
http://www.babyschool.com.cn
http://www.iweihai.cn
http://www.haowu.com
http://www.hm.com
http://www.wish.com
http://www.fitbit.com
http://www.taojindi.com
http://www.koolearn.com
http://www.xabbs.com
http://www.020.com
http://www.qiniu.com
http://www.25pp.com
http://www.nga.cn
http://www.educity.cn
http://www.zealer.com
http://www.xdowns.com
http://www.liqu.com
http://www.qichacha.com
http://www.51credit.com
http://www.duomai.com
http://www.juooo.com
http://www.shanbay.com
http://www.juooo.com
http://www.shanbay.com
http://www.meishij.net
http://www.th7.cn
http://www.jia400.com
http://www.cas.cn
http://www.wenwuchina.com
http://www.189.cn
http://www.liuxue86.com
http://www.klook.com
http://www.shfft.com
http://www.8264.com
http://www.china.cn
http://www.zhifang.com
http://www.made-in-china.com
http://www.rabbitpre.com
http://www.sap.com
http://www.macx.cn
http://www.everychina.com
http://www.9game.cn
http://www.ca800.com
http://www.dgtle.com
http://www.cloudscar.com
http://www.bdhome.cn
http://www.news18a.com
http://www.shilladfs.com
http://www.net-a-porter.com
http://www.zealer.com
http://www.discoverhongkong.com
http://www.80s.tw
http://www.9ku.com
http://www.33lc.com
http://www.thepaper.cn
http://www.scswl.cn
http://www.officedepot.com
http://www.fx678.com
http://www.banma.com
http://www.eee114.com
http://www.9384.com
http://www.xuexila.com
http://www.9384.com
http://www.xuexila.com
http://www.cheshen.cn
http://www.mr-world.com
http://www.fx112.com
http://www.97665.com
http://www.chinahr.com
http://www.acs.org
http://www.mikecrm.com
http://www.checheng.com
http://www.appgame.com
http://www.linkhaitao.com
http://www.meipai.com
http://www.linuxidc.com
http://www.fliggy.com
http://www.amap.com
http://www.4px.com
http://www.qpic.cn
http://www.modao.cc
http://www.dianxiaomi.com
http://www.56.com
http://www.java.com
http://www.hdpfans.com
http://www.thinkphp.cn
http://www.2345.com
http://www.baoku.com
http://www.tiancity.com
http://www.bcsh.com
http://www.bozhong.com
http://www.zhiding.cn
http://www.longzhu.com
http://www.xjtour.com
http://www.kancloud.cn
http://www.open-open.com
http://www.itpub.net
http://www.elong.com
http://www.pchome.net
http://www.pps.tv
http://www.qinqinbaby.com
http://www.chuandong.com
http://www.coding.net
http://www.yidianzixun.com
http://www.51nb.com
http://www.dhgate.com
http://www.10086.cn
http://www.6vhao.com
http://www.5acbd.com
http://www.atobo.com.cn
http://www.kubo365.com
http://www.111cn.net
http://www.zhongmin.cn
http://www.weiyangx.com
http://www.juesheng.com
http://www.uuu9.com
http://www.siilu.com
http://www.pconline.com.cn
http://www.dji.com
http://www.west.cn
http://www.ctfile.com
http://www.idianfa.com
http://www.smm.cn
http://www.shejis.com
http://www.zhangyu.tv
http://www.17zwd.com
http://www.dhl.com
http://www.shfft.com
http://www.wanmei.com
http://www.122.gov.cn
http://www.51nb.com
http://www.xici.net
http://www.cnki.com.cn
http://www.redocn.com
http://www.qvc.com
http://www.aipai.com
http://www.dapenti.com
http://www.3lian.com
http://www.guidechem.com
http://www.jiankang.com
http://www.tgfcer.com
http://www.freebuf.com
http://www.sodao.com
http://www.zhcw.com
http://www.sh.com
http://www.ablesky.com
http://www.microsoftstore.com.cn
http://www.7k7k.com
http://www.southmoney.com
http://www.btc123.com
http://www.digitaling.com
http://www.meitu.com
http://www.chinaaet.com
http://www.kaoyan.com
http://www.aipai.com
http://www.tripadvisor.cn
http://www.colg.cn
http://www.admin5.com
http://www.ncar.cc
http://www.intel.com
http://www.wanyx.com
http://www.chmotor.cn
http://www.mxhichina.com
http://www.jzb.com
http://www.it168.com
http://www.1kkk.com
http://www.cnodejs.org
http://www.hudong.com
http://www.ucweb.com
http://www.xyw.gov.cn
http://www.airasiago.com
http://www.damai.cn
http://www.farnell.com
http://www.hi-pda.com
http://www.wenku1.com
http://www.haosou.com
http://www.ishuhui.com
http://www.paopaoche.net
http://www.csai.cn
http://www.zhaoshangbao.com
http://www.eol.cn
http://www.excelhome.net
http://www.missevan.com
http://www.cncv.org.cn
http://www.365yg.com
http://www.huim.com
http://www.zxxk.com
http://www.51yes.com
http://www.cainiao.com
http://www.nh87.cn
http://www.b0yp.com
http://www.qdaily.com
http://www.kongzhong.com
http://www.shangc.net
http://www.dongqiudi.com
http://www.jiankang.com
http://www.dzsc.com
http://www.chinaacc.com
http://www.vcg.com
http://www.oneplusbbs.com
http://www.xuetangx.com
http://www.fz222.com
http://www.cnwnews.com
http://www.chinadmd.com
http://www.b2b168.com
http://www.pingan.com
http://www.pushauction.com
http://www.sdo.com
http://www.9978.cn
http://www.ltaaa.com
http://www.gxyj.com
http://www.kuaizhan.com
http://www.airchina.com.cn
http://www.gcl-power.com
http://www.medsci.cn
http://www.lbxcn.com
http://www.lzgd.com.cn
http://www.oray.com
http://www.taobao.org
http://www.btbtdy.com
http://www.i2ya.com
http://www.istar.cn
http://www.xgo.com.cn
http://www.66law.cn
http://www.heiguang.com
http://www.ao.com
http://www.jq22.com
http://www.qidian.com
http://www.goldcarpet.cn
http://www.zxbtz.cn
http://www.jiushang.cn
http://www.cicpa.org.cn
http://www.wowenda.com
http://www.coursera.org
http://www.fangdr.com
http://www.cps.com.cn
http://www.kmf.com
http://www.cri.cn
http://www.lmjx.net
http://www.lonshinetech.cn
http://www.infoq.com
http://www.gushiwen.org
http://www.ecp888.com
http://www.tongtool.com
http://www.dajie.com
http://www.co188.com
http://www.fumanhua.net
http://www.maiche168.com
http://www.sankuai.com
http://www.ucas.ac.cn
http://www.lamabang.com
http://www.huajiao.com
http://www.accorhotels.com
http://www.wendangku.net
http://www.dragonparking.com
http://www.6789.com
http://www.xdf.cn
http://www.tucao.tv
http://www.91yunxiao.com
http://www.liebiao.com
http://www.9lianmeng.com
http://www.51240.com
http://www.zhiyoo.com
http://www.silkair.com
http://www.313.cn
http://www.ssl-images-amazon.com
http://www.eepw.com.cn
http://www.gs307.com
http://www.yindou.com
http://www.i1515.com
http://www.imiker.com
http://www.lvmama.com
http://www.louisvuitton.com
http://www.nowgoal.com
http://www.makeding.com
http://www.xz7.com
http://www.guitarchina.com
http://www.wto168.net
http://www.abchina.com
http://www.fzdm.com
http://www.ichacha.net
http://www.1024sj.com
http://www.ef43.com.cn
http://www.newrank.cn
http://www.ceair.com
http://www.zimuku.net
http://www.ppkoo.com
http://www.jc35.com
http://www.dnspod.cn
http://www.hsw.cn
http://www.caixin.com
http://www.manmanbuy.com
http://www.23us.com
http://www.asus.com
http://www.zoosnet.net
http://www.xp510.com
http://www.vgtime.com
http://www.qiushibaike.com
http://www.jinshuju.net
http://www.115.com
http://www.3367.com
http://www.fanli.com
http://www.newcger.com
http://www.kepu.net.cn
http://www.findlaw.cn
http://www.jiumei.com
http://www.gkstk.com
http://www.ihg.com
http://www.blizzard.com
http://www.lenovo.com
http://www.longau.com
http://www.seedit.com
http://www.ofweek.com
http://www.61baobao.com
http://www.400.cn
http://www.wines-info.com
http://www.innisfree.com
http://www.weather.com.cn
http://www.che168.com
http://www.dilidili.wang
http://www.7po.com
http://www.qiushibaike.com
http://www.9r.cn
http://www.weather.com.cn
http://www.107cine.com
http://www.coolapk.com
http://www.ixueshu.com
http://www.iplaysoft.com
http://www.blizzard.cn
http://www.dangbei.com
http://www.hellorf.com
http://www.21food.cn
http://www.libaclub.com
http://www.outofmemory.cn
http://www.ele.me
http://www.shihuo.cn
http://www.zmz2017.com
http://www.zybuluo.com
http://www.66ys.tv
http://www.sczw.com
http://www.xtx6.com
http://www.tutorabc.com
http://www.zhipin.com
http://www.cgdc.com.cn
http://www.61learn.com
http://www.sm.cn
http://www.571xz.com
http://www.sobt5.org
http://www.starwoodhotels.com
http://www.qqtn.com
http://www.sgamer.com
http://www.120ask.com
http://www.appinn.com
http://www.qianzhan.com
http://www.888pic.com
http://www.tianyancha.com
http://www.k73.com
http://www.yiibai.com
http://www.downxia.com
http://www.managershare.com
http://www.downcc.com
http://www.biquge.tw
http://www.fgowiki.com
http://www.p2peye.com
http://www.haosou.com
http://www.yimu100.com
http://www.fox.com
http://www.mrporter.com
http://www.genshuixue.com
http://www.jisutiyu.com
http://www.topfo.com
http://www.right.com.cn
http://www.5ewin.com
http://www.dongnanshan.com
http://www.jizhangla.com
http://www.laawoo.com
http://www.3618med.com
http://www.ahgame.com
http://www.mamicode.com
http://www.wugu.com.cn
http://www.115.com
http://www.genshuixue.com
http://www.57mh.com
http://www.oiegg.com
http://www.21csp.com.cn
http://www.kekenet.com
http://www.c5game.com
http://www.juejin.im
http://www.baofeng.com
http://www.kuwo.cn
http://www.6.cn
http://www.chayu.com
http://www.sanwen.net
http://www.962.net
http://www.etest.net.cn
http://www.innisfree.com
http://www.dragonair.com
http://www.vjshi.com
http://www.lawtime.cn
http://www.sccnn.com
http://www.qqbaobao.com
http://www.dragonair.com
http://www.vjshi.com
http://www.lawtime.cn
http://www.sccnn.com
http://www.qqbaobao.com
http://www.chinaswitch.com
http://www.5118.com
http://www.cntv.cn
http://www.knowsky.com
http://www.skyscanner.com
http://www.wrz.com
http://www.wasu.cn
http://www.mojifen.com
http://www.nvidia.com
http://www.oceanpark.com.hk
http://www.pcbeta.com
http://www.psnine.com
http://www.228.com.cn
http://www.zhuixinfan.com
http://www.okcoin.cn
http://www.huya.com
http://www.1ppt.com
http://www.fyber.com
http://www.72byte.com
http://www.cpic.com.cn
http://www.wlmq.com
http://www.lusongsong.com
http://www.fanjian.net
http://www.hopetrip.com.hk
http://www.hnjy.com.cn
http://www.8kana.com
http://www.8d.cc
http://www.linux.cn
http://www.enterprise.com
http://www.iqing.in
http://www.sg560.com
http://www.mnw.cn
http://www.trendmicro.com
http://www.sipo.gov.cn
http://www.a.com.cn
http://www.hangame.com
http://www.cngold.org
http://www.95095.com
http://www.ishuo.cn
http://www.tecenet.com
http://www.jinti.com
http://www.sobaidupan.com
http://www.ichunqiu.com
http://www.xilu.com
http://www.3987.com
http://www.rr-sc.com
http://www.99114.com
http://www.haodou.com
http://www.wolfram.com
http://www.expreview.com
http://www.myexception.cn
http://www.shixiseng.com
http://www.bjjs.gov.cn
http://www.xxbiquge.com
http://www.lesports.com
http://www.hea.cn
http://www.24home.com
http://www.yeah.net
http://www.qcw.com
http://www.shoes.net.cn
http://www.9c9v.com
http://www.bjhjyd.gov.cn
http://www.ecvv.com
http://www.fanlibang.com
http://www.jxmall.com
http://www.xcar.com.cn
http://www.go108.com.cn
http://www.divcss5.com
http://www.sc.com
http://www.watchstore.com.cn
http://www.mexgroup.com
http://www.xunyingwang.com
http://www.chinagate.cn
http://www.zdic.net
http://www.bdimg.com
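
import time
import requests

# Read the links; each line of alexa.txt is split on a tab and the second field is the URL
link_list = []
with open(r'C:\Users\K1567\Desktop\alexa.txt', 'r') as file:
    file_list = file.readlines()
    for e in file_list:
        link = e.split('\t')[1]
        link = link.replace('\n', '')
        link_list.append(link)

# Fetch every link one after another and time the whole run
start = time.time()
for e in link_list:
    try:
        r = requests.get(e)
        print(r.status_code, e)
    except Exception as error:
        print('Error:', error)
end = time.time()
print('Total time for the serial crawler:', end - start)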
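
The loading step above relies on every line having a tab and a second field. As a hedged sketch, the same parsing can be pulled into a small function that skips blank or malformed lines. The function name parse_links and the sample lines are hypothetical, and the "rank, tab, URL" layout is only inferred from the split('\t')[1] call.

```python
def parse_links(lines):
    """Return the URL column from lines shaped like 'rank<TAB>url'."""
    links = []
    for line in lines:
        parts = line.rstrip('\n').split('\t')
        if len(parts) < 2 or not parts[1].strip():
            continue            # skip blank or malformed lines instead of raising IndexError
        links.append(parts[1].strip())
    return links

# Hypothetical sample input, including one blank and one malformed line
sample = ['1\thttp://www.baidu.com\n', '\n', '2\thttp://www.qq.com\n', 'broken line\n']
print(parse_links(sample))
```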

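Learning Python multithreading

Python offers two ways to use threads:
Function style: call start_new_thread() from the _thread module.
Class style: create threads with the threading library by subclassing threading.Thread.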

import _thread
import threading
import time

# Function run by each thread: print the thread name and the time three times
def print_time(threadName, delay):
    counter = 0
    while counter < 3:
        time.sleep(delay)
        print(threadName, time.ctime())
        counter += 1

# Method 1: function style with _thread (left commented out; the main thread
# does not wait for these threads, so it would have to be kept alive)
# _thread.start_new_thread(print_time, ("Thread-1", 1))
# _thread.start_new_thread(print_time, ("Thread-2", 2))
# print("Main Finished")

# Method 2: class style, subclassing threading.Thread
class myThread(threading.Thread):
    def __init__(self, name, delay):
        threading.Thread.__init__(self)
        self.name = name
        self.delay = delay

    def run(self):
        print("Starting " + self.name)
        print_time(self.name, self.delay)
        print("Exiting " + self.name)

threads = []
# Create the new threads
thread1 = myThread("Thread-1", 1)
thread2 = myThread("Thread-2", 2)
# Start the new threads
thread1.start()
thread2.start()
# Add the threads to the thread list
threads.append(thread1)
threads.append(thread2)
# Wait for all threads to finish
for t in threads:
    t.join()
print("Exiting Main Thread")

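run(): the method that represents the thread's activity
start(): starts the thread's activity
join([timeout]): blocks the calling thread until the thread whose join() was called finishes, or until the optional timeout expires
is_alive(): returns whether the thread is still running (spelled isAlive() before Python 3.9)
getName(): returns the thread's name (the name attribute does the same)
setName(): sets the thread's name
In the code above, thread1 = myThread("Thread-1", 1) creates a thread, and the myThread class configures it: run() defines what the thread does, here printing the thread name and the time while counter is less than 3. thread1.start() then starts the thread, threads.append(thread1) adds it to the thread list, and t.join() makes the main thread wait until every thread has finished before it continues.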
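
As a small illustration of join() with a timeout together with is_alive(), here is a sketch in which the worker function and the delays are made up for the example:

```python
import threading
import time

def worker(delay):
    """Hypothetical task that just sleeps for `delay` seconds."""
    time.sleep(delay)

t = threading.Thread(target=worker, args=(0.5,), name='Thread-1')
t.start()

t.join(0.1)                       # wait at most 0.1 s, then return even if t is still running
alive_after_timeout = t.is_alive()
print(t.name, 'alive after join(0.1):', alive_after_timeout)

t.join()                          # no timeout: block until t has finished
alive_after_join = t.is_alive()
print(t.name, 'alive after join():', alive_after_join)
```

join() with a timeout returns None whether or not the thread finished, so is_alive() is the way to find out which of the two happened.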

A simple multithreaded crawler example

import threading
import time
import requests

link_list = []
with open(r'C:\Users\K1567\Desktop\alexa.txt', 'r') as file:
    file_list = file.readlines()
    for e in file_list:
        link = e.split('\t')[1]
        link = link.replace('\n', '')
        link_list.append(link)

start = time.time()

class myThread(threading.Thread):
    def __init__(self, name, link_range):
        threading.Thread.__init__(self)
        self.name = name
        self.link_range = link_range

    def run(self):
        print("Starting " + self.name)
        crawler(self.name, self.link_range)
        print("Exiting " + self.name)

def crawler(threadName, link_range):
    # link_range is (first index, last index), both inclusive; stop at the end of the list
    for i in range(link_range[0], min(link_range[1] + 1, len(link_list))):
        try:
            r = requests.get(link_list[i], timeout=20)
            print(threadName, r.status_code, link_list[i])
        except Exception as e:
            print(threadName, 'Error:', e)

thread_list = []
# Five non-overlapping ranges of 200 links each (indices 0 to 999)
link_range_list = [(0, 199), (200, 399), (400, 599), (600, 799), (800, 999)]
# Create and start the threads
for i in range(1, 6):
    thread = myThread("Thread-" + str(i), link_range_list[i - 1])
    thread.start()
    thread_list.append(thread)
# Wait for all threads to finish
for t in thread_list:
    t.join()
end = time.time()
print('Total time for the simple multithreaded crawler:', end - start)

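The code above splits the 1000 pages into five parts, then uses a for loop to create five threads and assigns one part to each thread. A thread that finishes its share early then sits idle while the others are still working.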
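
The index ranges do not have to be typed by hand. Below is a minimal sketch of computing them for any number of links and threads; the helper name split_ranges is hypothetical, and the end index is inclusive to match the range(start, end + 1) loop in crawler.

```python
def split_ranges(total, parts):
    """Split indices 0..total-1 into `parts` contiguous (start, end) pairs, end inclusive."""
    base, extra = divmod(total, parts)
    ranges = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)   # the first `extra` parts get one more item
        if size == 0:
            continue                            # more parts than items: skip empty ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(split_ranges(1000, 5))
print(split_ranges(999, 5))
```

Every index lands in exactly one range, so no link is fetched twice and none is skipped, even when the list length is not a multiple of the thread count.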

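Multithreaded crawler using Queue

Python's queue module (named Queue in Python 2) provides synchronized, thread-safe queue classes: the FIFO (first in, first out) Queue, the LIFO (last in, first out) LifoQueue, and the priority queue PriorityQueue.
Example:
Start five threads and hand the 1000 pages out to them through a queue; each thread takes the next link as soon as it is free.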

import queue
import threading
import time
import requests

link_list = []  # page links
with open(r'C:\Users\K1567\Desktop\alexa.txt', 'r') as file:
    file_list = file.readlines()
    for e in file_list:
        link = e.split('\t')[1]
        link = link.replace('\n', '')
        link_list.append(link)

# Start time
start = time.time()

# Subclass Thread
class myThread(threading.Thread):
    def __init__(self, name, q):
        threading.Thread.__init__(self)
        self.name = name
        self.q = q

    def run(self):
        print("Starting " + self.name)
        while True:
            try:
                crawler(self.name, self.q)
            except queue.Empty:
                # No link arrived within the timeout: the queue has been drained
                break
        print("Exiting " + self.name)

def crawler(threadName, q):
    # Take a link from the queue; raises queue.Empty after 2 seconds without one
    url = q.get(timeout=2)
    try:
        r = requests.get(url, timeout=20)
        print(q.qsize(), threadName, r.status_code, url)
    except Exception as e:
        print(q.qsize(), threadName, url, 'Error', e)

threadlist = ['Thread-1', 'Thread-2', 'Thread-3', 'Thread-4', 'Thread-5']
# Create the queue object
workQueue = queue.Queue(1000)
threads = []
# Create the new threads
for tName in threadlist:
    thread = myThread(tName, workQueue)
    thread.start()
    threads.append(thread)
# Fill the queue
for url in link_list:
    workQueue.put(url)
# Wait for all threads to finish
for t in threads:
    t.join()
end = time.time()
print('Total time for the Queue-based multithreaded crawler:', end - start)
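
To show how the three queue classes mentioned above differ, here is a short offline sketch using Python 3's queue module. The ranks attached to the URLs are made-up values for illustration.

```python
import queue

fifo = queue.Queue()
lifo = queue.LifoQueue()
prio = queue.PriorityQueue()

# Hypothetical (rank, url) pairs, deliberately put in a shuffled order
for rank, url in [(2, 'http://www.qq.com'), (1, 'http://www.baidu.com'), (3, 'http://www.naver.com')]:
    fifo.put(url)
    lifo.put(url)
    prio.put((rank, url))      # tuples are ordered by their first element, the rank

fifo_order = [fifo.get() for _ in range(3)]      # same order as inserted
lifo_order = [lifo.get() for _ in range(3)]      # reverse of the insertion order
prio_order = [prio.get()[1] for _ in range(3)]   # lowest rank first

print(fifo_order)
print(lifo_order)
print(prio_order)

# get() with a timeout raises queue.Empty once nothing is left
try:
    fifo.get(timeout=0.1)
    drained = False
except queue.Empty:
    drained = True
print('queue drained:', drained)
```

The queue.Empty exception at the end is the signal a worker thread can use to stop once get(timeout=...) finds nothing left to fetch.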

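Multiprocess crawler

A multithreaded Python crawler effectively runs on a single core, with the threads running concurrently and asynchronously. Because of the GIL (global interpreter lock), multiple threads cannot make use of a multi-core CPU.
As another way to speed up a Python crawler, a multiprocess crawler can use several CPU cores. This requires the multiprocessing library.
There are two ways to use multiprocessing: Process + Queue, and Pool + Queue.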

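Multiprocess crawler using multiprocessing

When there are more processes than CPU cores, the extra processes have to wait for a running process to give up a core. So we need to know how many cores the machine has.

Check the number of CPU cores on the current machine: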
from multiprocessing import cpu_count
print(cpu_count())

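Multiprocess crawler examples:
1. Process + Queue. With multiple processes, each process can have its attributes set individually. If daemon is set to True, the child process is terminated automatically when the parent process ends.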

import queue
import time
from multiprocessing import Process, Queue
import requests

# Child process that keeps taking links from the shared queue
class MyProcess(Process):
    def __init__(self, q):
        Process.__init__(self)
        self.q = q

    def run(self):
        print("Starting", self.pid)
        while True:
            try:
                crawler(self.q)
            except queue.Empty:
                # No link arrived within the timeout: the queue has been drained
                break
        print("Exiting " + str(self.pid))

def crawler(q):
    # Take a link from the queue; raises queue.Empty after 2 seconds without one
    url = q.get(timeout=2)
    try:
        r = requests.get(url, timeout=20)
        print(q.qsize(), r.status_code, url)
    except Exception as e:
        print(q.qsize(), url, 'Error', e)

if __name__ == '__main__':
    link_list = []
    with open(r'C:\Users\K1567\Desktop\alexa.txt', 'r') as file:
        file_list = file.readlines()
        for e in file_list:
            link = e.split('\t')[1]
            link = link.replace('\n', '')
            link_list.append(link)
    start = time.time()
    workQueue = Queue(1000)
    # Fill the queue
    for url in link_list:
        workQueue.put(url)
    processes = []
    for i in range(0, 5):
        p = MyProcess(workQueue)
        p.daemon = True
        p.start()
        processes.append(p)
    # Wait for every process, not only the last one started
    for p in processes:
        p.join()
    end = time.time()
    print('Total time for the simple multiprocess crawler:', end - start)

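2. Pool + Queue multiprocess crawler: when the number of targets is small, you can create processes on the fly with Process from multiprocessing. A dozen or so is fine, but with hundreds or thousands of targets, limiting the number of processes by hand becomes tedious, and this is where Pool, a process pool, is useful.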
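
As a rough sketch of the process-pool idea only, the following uses multiprocessing.Pool with pool.map instead of a shared queue, which is a simpler, swapped-in technique and not the Pool + Queue approach named above. To stay runnable offline it assumes a hypothetical fake_fetch that sleeps instead of calling requests.get; crawl_with_pool and the pool size of 3 are likewise illustrative choices.

```python
import time
from multiprocessing import Pool

def fake_fetch(url):
    """Hypothetical stand-in for requests.get(url, timeout=20)."""
    time.sleep(0.05)            # pretend to wait for the server
    return url, 200             # pretend every page answers 200

def crawl_with_pool(urls, processes=3):
    """Run fake_fetch over urls using a fixed-size pool of worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(fake_fetch, urls)   # results come back in input order

if __name__ == '__main__':
    urls = ['http://www.baidu.com', 'http://www.qq.com', 'http://www.taobao.com',
            'http://www.sohu.com', 'http://www.jd.com', 'http://www.360.cn']
    for url, status in crawl_with_pool(urls):
        print(status, url)
```

The pool caps the number of worker processes at the value passed in, no matter how many URLs are submitted, which is the convenience the paragraph above describes. The worker function is defined at module level so that it can be sent to the child processes.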

posted @ 小旺first