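Python Web Crawlers: Speeding Up the Crawler (Part 8)
Speeding Up the Crawler
The previous posts covered enough to build a basic crawler: fetching pages, parsing them, and storing the data. This post records a more advanced topic, speeding up the crawler. There are three main approaches: multithreaded crawlers, multiprocess crawlers, and coroutine-based crawlers. Compared with an ordinary single-threaded crawler, any of the three can make crawling several times faster.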
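Concurrency and parallelism
Concurrency means several events take place within the same period of time.
Parallelism means several events take place at the same instant.
Synchronous and asynchronous
Synchronous means the concurrent or parallel tasks do not run independently: they proceed in a set order, each handing off to the next, like runners in a relay race.
Asynchronous means the concurrent or parallel tasks run independently without interfering with one another, like runners in separate lanes whose speed is not affected by the other runners.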
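Multithreaded crawlers
A multithreaded crawler runs concurrently. In other words, the threads do not truly execute at the same time; the interpreter switches rapidly between threads, and the crawler gets faster because one thread can work while another is waiting on the network.
Multithreading improves execution efficiency for I/O-bound operations.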
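A minimal sketch of why threads help with I/O-bound work. It assumes a network request can be stood in for by time.sleep; the fake_download function, the 0.2-second delay, and the five tasks are illustrative assumptions, not part of the crawler in this post.

```python
import threading
import time


def fake_download(delay):
    # Stand-in for a network request: the thread just waits, as it would on I/O.
    time.sleep(delay)


def run_serial(n, delay):
    # Run the n fake downloads one after another and return the elapsed time.
    start = time.time()
    for _ in range(n):
        fake_download(delay)
    return time.time() - start


def run_threaded(n, delay):
    # Run the n fake downloads in n threads and return the elapsed time.
    start = time.time()
    threads = [threading.Thread(target=fake_download, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    return time.time() - start


serial_time = run_serial(5, 0.2)
threaded_time = run_threaded(5, 0.2)
print("serial:", round(serial_time, 2), "threaded:", round(threaded_time, 2))
```

The serial run has to wait for each delay in turn, while the threaded run overlaps the waits, so its total time should be close to a single delay.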
A simple single-threaded crawler
http://www.baidu.com http://www.qq.com http://www.naver.com http://www.taobao.com http://www.reddit.com http://www.sohu.com http://www.tmall.com http://www.sina.com.cn http://www.daum.net http://www.jd.com http://www.360.cn http://www.weibo.com http://www.aliexpress.com http://www.linkedin.com http://www.alipay.com http://www.hao123.com http://www.csdn.net http://www.youth.cn http://www.live.com http://www.tianya.cn http://www.microsoftonline.com http://www.office.com http://www.soso.com http://www.so.com http://www.gmw.cn http://www.china.com http://www.nate.com http://www.huaban.com http://www.bing.com http://www.xinhuanet.com http://www.youku.com http://www.zhihu.com http://www.cctv.com http://www.airasia.com http://www.douyu.com http://www.babytree.com http://www.apple.com http://www.sogou.com http://www.china.com.cn http://www.yelp.com http://www.ocbc.com http://www.microsoft.com http://www.mama.cn http://www.bitauto.com http://www.bankofamerica.com http://www.1688.com http://www.stackoverflow.com http://www.163.com http://www.39.net http://www.cnblogs.com http://www.bilibili.com http://www.interpark.com http://www.huanqiu.com http://www.cnzz.com http://www.chinadaily.com.cn http://www.openrice.com http://www.msn.com http://www.k618.cn http://www.yesky.com http://www.caijing.com.cn http://www.emirates.com http://www.amazon.cn http://www.aliyun.com http://www.eastday.com http://www.youdao.com http://www.oeeee.com http://www.ci123.com http://www.baike.com http://www.adobe.com http://www.rednet.cn http://www.iqiyi.com http://www.wemakeprice.com http://www.douban.com http://www.familydoctor.com.cn http://www.agoda.com http://www.jrj.com.cn http://www.read01.com http://www.17ok.com http://www.chinaz.com http://www.youboy.com http://www.tesco.com http://www.alibaba.com http://www.gearbest.com http://www.51sole.com http://www.dbs.com http://www.suning.com http://www.oschina.net http://www.voc.com.cn http://www.zol.com.cn http://www.asos.com http://www.chinaso.com 
http://www.jianshu.com http://www.ifeng.com http://www.stockstar.com http://www.zhanqi.tv http://www.52pk.com http://www.whatsbuying.com http://www.cqnews.net http://www.gongchang.com http://www.godaddy.com http://www.godaddy.com http://www.wtoip.com http://www.segmentfault.com http://www.evernote.com http://www.dianping.com http://www.qingdaonews.com http://www.guancha.cn http://www.standardchartered.com http://www.singaporeair.com http://www.toutiao.com http://www.jiameng.com http://www.dm5.com http://www.w3school.com.cn http://www.zhaopin.com http://www.99.com http://www.mi.com http://www.b2b.cn http://www.cathaypacific.com http://www.southcn.com http://www.battle.net http://www.ups.com http://www.jb51.net http://www.comcast.net http://www.alicdn.com http://www.v2ex.com http://www.firefoxchina.cn http://www.360doc.com http://www.xunlei.com http://www.sharepoint.com http://www.scol.com.cn http://www.admaimai.com http://www.v1.cn http://www.51cto.com http://www.jqw.com http://www.bzw315.com http://www.126.com http://www.beanfun.com http://www.chooseauto.com.cn http://www.renren.com http://www.taleo.net http://www.51.la http://www.zcool.com.cn http://www.4399.com http://www.duba.com http://www.globaltimes.cn http://www.ycwb.com http://www.sfacg.com http://www.hotelscombined.com http://www.mydrivers.com http://www.taoche.com http://www.runoob.com http://www.tlscontact.com http://www.nba.com http://www.gamebase.com.tw http://www.zhibo8.cc http://www.hexun.com http://www.xiami.com http://www.finnair.com http://www.feng.com http://www.cdstm.cn http://www.uniqlo.com http://www.iciba.com http://www.qudong.com http://www.panda.tv http://www.cnbeta.com http://www.nipic.com http://www.sznews.com http://www.huawei.com http://www.tuicool.com http://www.baimao.com http://www.umeng.com http://www.ccidnet.com http://www.klm.com http://www.qcloud.com http://www.hupu.com http://www.ikanman.com http://www.3dmgame.com http://www.icolor.com.cn http://www.360.com http://www.36kr.com 
http://www.miui.com http://www.boc.cn http://www.gamersky.com http://www.joyme.com http://www.17173.com http://www.uc.cn http://www.alimama.com http://www.oasgames.com http://www.focus.cn http://www.cnr.cn http://www.miomio.tv http://www.jjwxc.net http://www.5dcar.com http://www.hjenglish.com http://www.dangdang.com http://www.springer.com http://www.to8to.com http://www.xiaomi.com http://www.ctrip.com http://www.delta.com http://www.anjuke.com http://www.cnki.net http://www.surveymonkey.com http://www.tower.im http://www.baiducontent.com http://www.acfun.cn http://www.people.com.cn http://www.jmw.com.cn http://www.worktile.com http://www.newsmth.net http://www.vmall.com http://www.07073.com http://www.qyer.com http://www.hujiang.com http://www.cnnic.cn http://www.meituan.com http://www.yinxiang.com http://www.ngacn.cc http://www.smzdm.com http://www.ccb.com http://www.ali213.net http://www.alibaba-inc.com http://www.3158.cn http://www.vmall.com http://www.nike.com http://www.eqxiu.com http://www.jandan.net http://www.office365.com http://www.imooc.com http://www.ikea.com http://www.united.com http://www.ly.com http://www.epwk.com http://www.tudou.com http://www.leagueoflegends.com http://www.aa.com http://www.garena.com http://www.mafengwo.cn http://www.ifensi.com http://www.pptv.com http://www.fobshanghai.com http://www.asiamiles.com http://www.znds.com http://www.hc360.com http://www.job853.com http://www.sf-express.com http://www.lianjia.com http://www.guokr.com http://www.cmbchina.com http://www.modernweekly.com http://www.ynet.com http://www.dell.com http://www.dict.cn http://www.yinyuetai.com http://www.aizhan.com http://www.gome.com.cn http://www.meishichina.com http://www.51hejia.com http://www.ule.com http://www.ea3w.com http://www.saraba1st.com http://www.chsi.com.cn http://www.vlive.tv http://www.sonhoo.com http://www.hongkongairlines.com http://www.jxnews.com.cn http://www.free.com.tw http://www.docin.com http://www.liepin.com http://www.chinaunix.net 
http://www.weibo.cn http://www.ifanr.com http://www.51auto.com http://www.ebrun.com http://www.10010.com http://www.hebei.com.cn http://www.tgbus.com http://www.mtime.com http://www.vip.com http://www.kdslife.com http://www.www.gov.cn http://www.cncn.org.cn http://www.techcrunch.com http://www.zbj.com http://www.ip138.com http://www.cyol.com http://www.pc6.com http://www.joox.com http://www.178.com http://www.lagou.com http://www.18183.com http://www.365jia.cn http://www.autohome.com.cn http://www.battlenet.com.cn http://www.oracle.com http://www.miaopai.com http://www.sina.cn http://www.ch.com http://www.yxdown.com http://www.etao.com http://www.vietnamairlines.com http://www.iyiou.com http://www.shop.com http://www.588ku.com http://www.le.com http://www.sina.com http://www.jstv.com http://www.ceconline.com http://www.koreanair.com http://www.skype.com http://www.ih5.cn http://www.ems.com.cn http://www.efu.com.cn http://www.pcbaby.com.cn http://www.shimo.im http://www.macaolife.com http://www.xiu.com http://www.eastmoney.com http://www.xiumi.us http://www.yhd.com http://www.jiemian.com http://www.daikuan.com http://www.ximalaya.com http://www.marriott.com http://www.d1ev.com http://www.xitek.com http://www.chuansong.me http://www.alitrip.com http://www.xiaomi.cn http://www.51job.com http://www.91jm.com http://www.2cto.com http://www.qoo10.com http://www.centadata.com http://www.lufthansa.com http://www.techweb.com.cn http://www.kugou.com http://www.80018.cn http://www.tmtpost.com http://www.house365.com http://www.hp.com http://www.unity3d.com http://www.zoom.us http://www.kafan.cn http://www.liansuo.com http://www.netease.com http://www.10jqka.com.cn http://www.xiazaiba.com http://www.fang.com http://www.smartisan.com http://www.photofans.cn http://www.ooopic.com http://www.zybang.com http://www.gw-ec.com http://www.wed114.cn http://www.huomao.com http://www.ithome.com http://www.ccb.com.cn http://www.chinanews.com http://www.doc88.com http://www.sanguosha.com 
http://www.evaair.com http://www.icbc.com.cn http://www.youxidudu.com http://www.verycd.com http://www.netcoc.com http://www.pepper.com http://www.dygang.com http://www.liaoxuefeng.com http://www.flyasiana.com http://www.sciencenet.cn http://www.feiyang.com http://www.800hr.com http://www.iconfont.cn http://www.youzan.com http://www.360kan.com http://www.chinabyte.com http://www.samsung.com http://www.zxart.cn http://www.gucheng.com http://www.bootcss.com http://www.cankaoxiaoxi.com http://www.58pic.com http://www.81.cn http://www.csair.com http://www.chiphell.com http://www.antpedia.com http://www.xiachufang.com http://www.winshang.com http://www.fzg360.com http://www.chaduo.com http://www.12306.cn http://www.morningpost.com.cn http://www.soku.com http://www.sspai.com http://www.yoox.com http://www.huxiu.com http://www.nyu.edu http://www.jiwu.com http://www.u17.com http://www.jiayuan.com http://www.yy.com http://www.duowan.com http://www.mbalib.com http://www.wanfangdata.com.cn http://www.ibuying.com http://www.chouti.com http://www.71.net http://www.hrloo.com http://www.meizu.com http://www.miercn.com http://www.fengniao.com http://www.fangdd.com http://www.htc.com http://www.jdzj.com http://www.pcauto.com.cn http://www.kaola.com http://www.kuaidi100.com http://www.yougov.com http://www.ku6.com http://www.sanwen8.cn http://www.yiwugou.com http://www.lottedfs.com http://www.cisco.com http://www.wallstreetcn.com http://www.gamedog.cn http://www.tencent.com http://www.tvhome.com http://www.xbox.com http://www.cr173.com http://www.onlinedown.net http://www.ebay.com.hk http://www.searchs.cn http://www.17track.net http://www.hyundai.com http://www.baixing.com http://www.258.com http://www.cn2che.com http://www.pudn.com http://www.dv37.com http://www.dv37.com http://www.uisdc.com http://www.sojump.com http://www.d1net.com http://www.ganji.com http://www.jobbole.com http://www.pearsoncmg.com http://www.kongfz.com http://www.365jilin.com http://www.strawberrynet.com 
http://www.11467.com http://www.jobui.com http://www.hh010.com http://www.teambition.com http://www.woshipm.com http://www.lge.com http://www.kanxi.cc http://www.leiphone.com http://www.d1com.com http://www.114so.cn http://www.d1com.com http://www.114so.cn http://www.duomai.com http://www.win007.com http://www.weidian.com http://www.qiku.com http://www.cli.im http://www.flyertea.com http://www.lenovo.com.cn http://www.aso100.com http://www.xueqiu.com http://www.bp.com http://www.dingtalk.com http://www.processon.com http://www.flyme.cn http://www.a9vg.com http://www.sinaimg.cn http://www.saic.gov.cn http://www.mgtv.com http://www.nuomi.com http://www.tiexue.net http://www.vvvdj.com http://www.tvmao.com http://www.panduoduo.net http://www.wechat.com http://www.52pojie.cn http://www.miwifi.com http://www.iteye.com http://www.kanzhun.com http://www.mango.com http://www.cheaa.com http://www.13322.com http://www.jikexueyuan.com http://www.taisha.org http://www.mydigit.cn http://www.gusuwang.com http://www.pinggu.org http://www.lbldy.com http://www.sgcn.com http://www.misumi-ec.com http://www.lofter.com http://www.unrealengine.com http://www.gao7.com http://www.leju.com http://www.home77.com http://www.qunar.com http://www.xdowns.com http://www.oa.com http://www.sgcn.com http://www.szjy188.com http://www.tuniu.com http://www.135editor.com http://www.f.com http://www.zhibo.tv http://www.jiyoujia.com http://www.95516.com http://www.yiqifa.com http://www.cocoachina.com http://www.babyschool.com.cn http://www.iweihai.cn http://www.haowu.com http://www.hm.com http://www.wish.com http://www.fitbit.com http://www.taojindi.com http://www.koolearn.com http://www.xabbs.com http://www.020.com http://www.qiniu.com http://www.25pp.com http://www.nga.cn http://www.educity.cn http://www.zealer.com http://www.xdowns.com http://www.liqu.com http://www.qichacha.com http://www.51credit.com http://www.duomai.com http://www.juooo.com http://www.shanbay.com http://www.juooo.com 
http://www.shanbay.com http://www.meishij.net http://www.th7.cn http://www.jia400.com http://www.cas.cn http://www.wenwuchina.com http://www.189.cn http://www.liuxue86.com http://www.klook.com http://www.shfft.com http://www.8264.com http://www.china.cn http://www.zhifang.com http://www.made-in-china.com http://www.rabbitpre.com http://www.sap.com http://www.macx.cn http://www.everychina.com http://www.9game.cn http://www.ca800.com http://www.dgtle.com http://www.cloudscar.com http://www.bdhome.cn http://www.news18a.com http://www.shilladfs.com http://www.net-a-porter.com http://www.zealer.com http://www.discoverhongkong.com http://www.80s.tw http://www.9ku.com http://www.33lc.com http://www.thepaper.cn http://www.scswl.cn http://www.officedepot.com http://www.fx678.com http://www.banma.com http://www.eee114.com http://www.9384.com http://www.xuexila.com http://www.9384.com http://www.xuexila.com http://www.cheshen.cn http://www.mr-world.com http://www.fx112.com http://www.97665.com http://www.chinahr.com http://www.acs.org http://www.mikecrm.com http://www.checheng.com http://www.appgame.com http://www.linkhaitao.com http://www.meipai.com http://www.linuxidc.com http://www.fliggy.com http://www.amap.com http://www.4px.com http://www.qpic.cn http://www.modao.cc http://www.dianxiaomi.com http://www.56.com http://www.java.com http://www.hdpfans.com http://www.thinkphp.cn http://www.2345.com http://www.baoku.com http://www.tiancity.com http://www.bcsh.com http://www.bozhong.com http://www.zhiding.cn http://www.longzhu.com http://www.xjtour.com http://www.kancloud.cn http://www.open-open.com http://www.itpub.net http://www.elong.com http://www.pchome.net http://www.pps.tv http://www.qinqinbaby.com http://www.chuandong.com http://www.coding.net http://www.yidianzixun.com http://www.51nb.com http://www.dhgate.com http://www.10086.cn http://www.6vhao.com http://www.5acbd.com http://www.atobo.com.cn http://www.kubo365.com http://www.111cn.net http://www.zhongmin.cn 
http://www.weiyangx.com http://www.juesheng.com http://www.uuu9.com http://www.siilu.com http://www.pconline.com.cn http://www.dji.com http://www.west.cn http://www.ctfile.com http://www.idianfa.com http://www.smm.cn http://www.shejis.com http://www.zhangyu.tv http://www.17zwd.com http://www.dhl.com http://www.shfft.com http://www.wanmei.com http://www.122.gov.cn http://www.51nb.com http://www.xici.net http://www.cnki.com.cn http://www.redocn.com http://www.qvc.com http://www.aipai.com http://www.dapenti.com http://www.3lian.com http://www.guidechem.com http://www.jiankang.com http://www.tgfcer.com http://www.freebuf.com http://www.sodao.com http://www.zhcw.com http://www.sh.com http://www.ablesky.com http://www.microsoftstore.com.cn http://www.7k7k.com http://www.southmoney.com http://www.btc123.com http://www.digitaling.com http://www.meitu.com http://www.chinaaet.com http://www.kaoyan.com http://www.aipai.com http://www.tripadvisor.cn http://www.colg.cn http://www.admin5.com http://www.ncar.cc http://www.intel.com http://www.wanyx.com http://www.chmotor.cn http://www.mxhichina.com http://www.jzb.com http://www.it168.com http://www.1kkk.com http://www.cnodejs.org http://www.hudong.com http://www.ucweb.com http://www.xyw.gov.cn http://www.airasiago.com http://www.damai.cn http://www.farnell.com http://www.hi-pda.com http://www.wenku1.com http://www.haosou.com http://www.ishuhui.com http://www.paopaoche.net http://www.csai.cn http://www.zhaoshangbao.com http://www.eol.cn http://www.excelhome.net http://www.missevan.com http://www.cncv.org.cn http://www.365yg.com http://www.huim.com http://www.zxxk.com http://www.51yes.com http://www.cainiao.com http://www.nh87.cn http://www.b0yp.com http://www.qdaily.com http://www.kongzhong.com http://www.shangc.net http://www.dongqiudi.com http://www.jiankang.com http://www.dzsc.com http://www.chinaacc.com http://www.vcg.com http://www.oneplusbbs.com http://www.xuetangx.com http://www.fz222.com http://www.cnwnews.com 
http://www.chinadmd.com http://www.b2b168.com http://www.pingan.com http://www.pushauction.com http://www.sdo.com http://www.9978.cn http://www.ltaaa.com http://www.gxyj.com http://www.kuaizhan.com http://www.airchina.com.cn http://www.gcl-power.com http://www.medsci.cn http://www.lbxcn.com http://www.lzgd.com.cn http://www.oray.com http://www.taobao.org http://www.btbtdy.com http://www.i2ya.com http://www.istar.cn http://www.xgo.com.cn http://www.66law.cn http://www.heiguang.com http://www.ao.com http://www.jq22.com http://www.qidian.com http://www.goldcarpet.cn http://www.zxbtz.cn http://www.jiushang.cn http://www.cicpa.org.cn http://www.wowenda.com http://www.coursera.org http://www.fangdr.com http://www.cps.com.cn http://www.kmf.com http://www.cri.cn http://www.lmjx.net http://www.lonshinetech.cn http://www.infoq.com http://www.gushiwen.org http://www.ecp888.com http://www.tongtool.com http://www.dajie.com http://www.co188.com http://www.fumanhua.net http://www.maiche168.com http://www.sankuai.com http://www.ucas.ac.cn http://www.lamabang.com http://www.huajiao.com http://www.accorhotels.com http://www.wendangku.net http://www.dragonparking.com http://www.6789.com http://www.xdf.cn http://www.tucao.tv http://www.91yunxiao.com http://www.liebiao.com http://www.9lianmeng.com http://www.51240.com http://www.zhiyoo.com http://www.silkair.com http://www.313.cn http://www.ssl-images-amazon.com http://www.eepw.com.cn http://www.gs307.com http://www.yindou.com http://www.i1515.com http://www.imiker.com http://www.lvmama.com http://www.louisvuitton.com http://www.nowgoal.com http://www.makeding.com http://www.xz7.com http://www.guitarchina.com http://www.wto168.net http://www.abchina.com http://www.fzdm.com http://www.ichacha.net http://www.1024sj.com http://www.ef43.com.cn http://www.newrank.cn http://www.ceair.com http://www.zimuku.net http://www.ppkoo.com http://www.jc35.com http://www.dnspod.cn http://www.hsw.cn http://www.caixin.com http://www.manmanbuy.com 
http://www.23us.com http://www.asus.com http://www.zoosnet.net http://www.xp510.com http://www.vgtime.com http://www.qiushibaike.com http://www.jinshuju.net http://www.115.com http://www.3367.com http://www.fanli.com http://www.newcger.com http://www.kepu.net.cn http://www.findlaw.cn http://www.jiumei.com http://www.gkstk.com http://www.ihg.com http://www.blizzard.com http://www.lenovo.com http://www.longau.com http://www.seedit.com http://www.ofweek.com http://www.61baobao.com http://www.400.cn http://www.wines-info.com http://www.innisfree.com http://www.weather.com.cn http://www.che168.com http://www.dilidili.wang http://www.7po.com http://www.qiushibaike.com http://www.9r.cn http://www.weather.com.cn http://www.107cine.com http://www.coolapk.com http://www.ixueshu.com http://www.iplaysoft.com http://www.blizzard.cn http://www.dangbei.com http://www.hellorf.com http://www.21food.cn http://www.libaclub.com http://www.outofmemory.cn http://www.ele.me http://www.shihuo.cn http://www.zmz2017.com http://www.zybuluo.com http://www.66ys.tv http://www.sczw.com http://www.xtx6.com http://www.tutorabc.com http://www.zhipin.com http://www.cgdc.com.cn http://www.61learn.com http://www.sm.cn http://www.571xz.com http://www.sobt5.org http://www.starwoodhotels.com http://www.qqtn.com http://www.sgamer.com http://www.120ask.com http://www.appinn.com http://www.qianzhan.com http://www.888pic.com http://www.tianyancha.com http://www.k73.com http://www.yiibai.com http://www.downxia.com http://www.managershare.com http://www.downcc.com http://www.biquge.tw http://www.fgowiki.com http://www.p2peye.com http://www.haosou.com http://www.yimu100.com http://www.fox.com http://www.mrporter.com http://www.genshuixue.com http://www.jisutiyu.com http://www.topfo.com http://www.right.com.cn http://www.5ewin.com http://www.dongnanshan.com http://www.jizhangla.com http://www.laawoo.com http://www.3618med.com http://www.ahgame.com http://www.mamicode.com http://www.wugu.com.cn http://www.115.com 
http://www.genshuixue.com http://www.57mh.com http://www.oiegg.com http://www.21csp.com.cn http://www.kekenet.com http://www.c5game.com http://www.juejin.im http://www.baofeng.com http://www.kuwo.cn http://www.6.cn http://www.chayu.com http://www.sanwen.net http://www.962.net http://www.etest.net.cn http://www.innisfree.com http://www.dragonair.com http://www.vjshi.com http://www.lawtime.cn http://www.sccnn.com http://www.qqbaobao.com http://www.dragonair.com http://www.vjshi.com http://www.lawtime.cn http://www.sccnn.com http://www.qqbaobao.com http://www.chinaswitch.com http://www.5118.com http://www.cntv.cn http://www.knowsky.com http://www.skyscanner.com http://www.wrz.com http://www.wasu.cn http://www.mojifen.com http://www.nvidia.com http://www.oceanpark.com.hk http://www.pcbeta.com http://www.psnine.com http://www.228.com.cn http://www.zhuixinfan.com http://www.okcoin.cn http://www.huya.com http://www.1ppt.com http://www.fyber.com http://www.72byte.com http://www.cpic.com.cn http://www.wlmq.com http://www.lusongsong.com http://www.fanjian.net http://www.hopetrip.com.hk http://www.hnjy.com.cn http://www.8kana.com http://www.8d.cc http://www.linux.cn http://www.enterprise.com http://www.iqing.in http://www.sg560.com http://www.mnw.cn http://www.trendmicro.com http://www.sipo.gov.cn http://www.a.com.cn http://www.hangame.com http://www.cngold.org http://www.95095.com http://www.ishuo.cn http://www.tecenet.com http://www.jinti.com http://www.sobaidupan.com http://www.ichunqiu.com http://www.xilu.com http://www.3987.com http://www.rr-sc.com http://www.99114.com http://www.haodou.com http://www.wolfram.com http://www.expreview.com http://www.myexception.cn http://www.shixiseng.com http://www.bjjs.gov.cn http://www.xxbiquge.com http://www.lesports.com http://www.hea.cn http://www.24home.com http://www.yeah.net http://www.qcw.com http://www.shoes.net.cn http://www.9c9v.com http://www.bjhjyd.gov.cn http://www.ecvv.com http://www.fanlibang.com http://www.jxmall.com 
http://www.xcar.com.cn http://www.go108.com.cn http://www.divcss5.com http://www.sc.com http://www.watchstore.com.cn http://www.mexgroup.com http://www.xunyingwang.com http://www.chinagate.cn http://www.zdic.net http://www.bdimg.com
import time

import requests

# Read the URL list: take the second tab-separated field of each line
link_list = []
with open(r'C:\Users\K1567\Desktop\alexa.txt', 'r') as file:
    file_list = file.readlines()
    for e in file_list:
        link = e.split('\t')[1]
        link = link.replace('\n', '')
        link_list.append(link)

stat = time.time()
# Request the pages one after another
for e in link_list:
    try:
        r = requests.get(e)
        print(r.status_code, e)
    except Exception as erro:
        print('Error:', erro)
end = time.time()
print('Total time for the serial crawler:', end - stat)
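Learning Python multithreading
Python offers two ways to use threads.
Function style: call start_new_thread() from the _thread module.
Class style: create threads with the threading module by subclassing threading.Thread.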
1. Function style (the _thread calls are left commented out, as in the original):
import _thread
import time


# Define a function for the thread to run
def print_time(threadName, delay):
    count = 0
    while count < 3:
        time.sleep(delay)
        count += 1
        print(threadName, time.ctime())


# _thread.start_new_thread(print_time, ("Thread-1", 1))
# _thread.start_new_thread(print_time, ("Thread-2", 2))
# print("Main Finished")

2. Class style:
import threading
import time


class myThread(threading.Thread):
    def __init__(self, name, delay):
        threading.Thread.__init__(self)
        self.name = name
        self.delay = delay

    def run(self):
        print("Starting " + self.name)
        print_time(self.name, self.delay)
        print("Exiting " + self.name)


def print_time(threadName, delay):
    counter = 0
    while counter < 3:
        time.sleep(delay)
        print(threadName, time.ctime())
        counter += 1


threads = []

# Create the new threads
thread1 = myThread("Thread-1", 1)
thread2 = myThread("Thread-2", 2)

# Start the new threads
thread1.start()
thread2.start()

# Add the threads to the thread list
threads.append(thread1)
threads.append(thread2)

# Wait for all threads to finish
for t in threads:
    t.join()

print("Exiting Main Thread")
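run(): the method that represents the thread's activity
start(): starts the thread's activity
join([timeout]): blocks the calling thread until the thread whose join() was called terminates, or until the optional timeout elapses
is_alive(): returns whether the thread is alive (the older spelling isAlive() was removed in Python 3.9)
getName(): returns the thread name (deprecated since Python 3.10 in favor of the name attribute)
setName(): sets the thread name (deprecated since Python 3.10 in favor of the name attribute)
In the code above, thread1 = myThread("Thread-1", 1) creates a thread. The myThread class configures it, and run() defines what the thread does: while counter is less than 3, print the thread name and the current time. thread1.start() then starts the thread, threads.append(thread1) adds it to the thread list, and t.join() makes the main thread wait until all threads have finished before it continues.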
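A short sketch of how join() with a timeout and is_alive() interact. The worker function, the 0.5-second sleep, and the thread name are hypothetical, chosen only to make the behavior visible.

```python
import threading
import time


def worker(delay):
    # Hypothetical task: sleep to simulate a slow request.
    time.sleep(delay)


t = threading.Thread(target=worker, args=(0.5,), name="Thread-demo")
t.start()

alive_while_running = t.is_alive()   # the worker is still sleeping
t.join(0.05)                         # wait at most 0.05 s, then return
alive_after_timeout = t.is_alive()   # join() timed out, the thread is still running
t.join()                             # no timeout: block until the thread ends
alive_after_join = t.is_alive()      # the thread has finished

print(t.name, alive_while_running, alive_after_timeout, alive_after_join)
```

Because join() returns None whether or not the timeout expired, checking is_alive() afterwards is the way to tell the two cases apart.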
A simple multithreaded crawler example
import threading
import time

import requests

# Read the URL list: take the second tab-separated field of each line
link_list = []
with open(r'C:\Users\K1567\Desktop\alexa.txt', 'r') as file:
    file_list = file.readlines()
    for e in file_list:
        link = e.split('\t')[1]
        link = link.replace('\n', '')
        link_list.append(link)

stat = time.time()


class myThread(threading.Thread):
    def __init__(self, name, link_range):
        threading.Thread.__init__(self)
        self.name = name
        self.link_range = link_range

    def run(self):
        print("Starting " + self.name)
        crawler(self.name, self.link_range)
        print("Exiting " + self.name)


def crawler(threaName, link_range):
    # link_range is an inclusive (start, end) pair of indexes into link_list
    for i in range(link_range[0], link_range[1] + 1):
        try:
            r = requests.get(link_list[i], timeout=20)
            print(threaName, r.status_code, link_list[i])
        except Exception as e:
            print(threaName, 'Error:', e)


thread_list = []
# Five inclusive ranges of 200 links each (indexes 0..999, assuming 1000 URLs)
link_range_list = [(0, 199), (200, 399), (400, 599), (600, 799), (800, 999)]

# Create and start the threads
for i in range(1, 6):
    thread = myThread("Thread-" + str(i), link_range_list[i - 1])
    thread.start()
    thread_list.append(thread)

# Wait for all threads to finish
for i in thread_list:
    i.join()

end = time.time()
print('Total time for the simple multithreaded crawler:', end - stat)
The code above splits the 1,000 pages into 5 parts, creates 5 threads in a for loop, and assigns one part to each thread. The drawback is that a thread that finishes its part early sits idle while the others are still working.
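The index ranges do not have to be written by hand. As a sketch, a hypothetical helper (split_ranges is not part of the original code) can compute inclusive (start, end) pairs in the form that crawler() expects, for any list length and thread count:

```python
def split_ranges(total, parts):
    """Return inclusive (start, end) index ranges that together cover 0..total-1."""
    base, extra = divmod(total, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # Spread the remainder over the first `extra` ranges.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more parts than items: skip empty ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges


print(split_ranges(1000, 5))
# [(0, 199), (200, 399), (400, 599), (600, 799), (800, 999)]
print(split_ranges(10, 3))
# [(0, 3), (4, 6), (7, 9)]
```

Calling split_ranges(len(link_list), 5) would keep the ranges valid even when the file does not contain exactly 1,000 links.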
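A multithreaded crawler using Queue
Python's queue module (named Queue in Python 2) provides synchronized, thread-safe queue classes: the FIFO (first in, first out) queue Queue, the LIFO (last in, first out) queue LifoQueue, and the priority queue PriorityQueue.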
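A small sketch of how the three queue classes differ in retrieval order. In Python 3 the module is imported as queue; the integers used as items are arbitrary placeholders.

```python
import queue

fifo = queue.Queue()          # first in, first out
lifo = queue.LifoQueue()      # last in, first out
prio = queue.PriorityQueue()  # smallest item first

# Put the same items into each queue in the same order.
for item in (3, 1, 2):
    fifo.put(item)
    lifo.put(item)
    prio.put(item)

fifo_order = [fifo.get() for _ in range(3)]
lifo_order = [lifo.get() for _ in range(3)]
prio_order = [prio.get() for _ in range(3)]

print(fifo_order)  # [3, 1, 2]
print(lifo_order)  # [2, 1, 3]
print(prio_order)  # [1, 2, 3]
```

For a crawler, the FIFO queue is usually the natural choice, since URLs are then fetched in the order they were added.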
Example:
Start five threads and feed them the 1,000 pages through a queue. Each thread takes the next URL as soon as it is free, so the work is shared among the five threads without any of them sitting idle.
import queue
import threading
import time

import requests

link_list = []
# Read the page links: take the second tab-separated field of each line
with open(r'C:\Users\K1567\Desktop\alexa.txt', 'r') as file:
    file_list = file.readlines()
    for e in file_list:
        link = e.split('\t')[1]
        link = link.replace('\n', '')
        link_list.append(link)

# Start time
stat = time.time()


# Subclass Thread
class myThread(threading.Thread):
    def __init__(self, name, q):
        threading.Thread.__init__(self)
        self.name = name
        self.q = q

    def run(self):
        print("Starting " + self.name)
        while True:
            try:
                crawler(self.name, self.q)
            except queue.Empty:
                # No URL arrived within the timeout: the queue is drained
                break
        print("Exiting " + self.name)


def crawler(threaName, q):
    # Take the next link from the queue; raises queue.Empty after 2 seconds
    url = q.get(timeout=2)
    try:
        r = requests.get(url, timeout=20)
        print(q.qsize(), threaName, r.status_code, url)
    except Exception as e:
        print(q.qsize(), threaName, url, 'Error', e)


threadlist = ['Thread-1', 'Thread-2', 'Thread-3', 'Thread-4', 'Thread-5']

# Create a queue object that holds at most 1000 items
workQueue = queue.Queue(1000)
threads = []

# Create and start the new threads
for tName in threadlist:
    thread = myThread(tName, workQueue)
    thread.start()
    threads.append(thread)

# Fill the queue
for url in link_list:
    workQueue.put(url)

# Wait for all threads to finish
for t in threads:
    t.join()

end = time.time()
print('Total time for the queue-based multithreaded crawler:', end - stat)
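Multiprocess crawlers
A multithreaded Python crawler effectively runs on a single core: in CPython, the GIL (global interpreter lock) lets only one thread execute Python bytecode at a time, so the threads run concurrently rather than in parallel and cannot take advantage of a multi-core CPU.
As another way to speed up a Python crawler, a multiprocess crawler can use multiple CPU cores. This requires the multiprocessing library.
There are two ways to use multiprocessing: Process + Queue, and Pool + Queue.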
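A multiprocess crawler using multiprocessing
When there are more processes than CPU cores, the processes that cannot get a core have to wait until the others give one up. So we first need to know how many cores the machine has.
Check the number of CPU cores on the current machine:
from multiprocessing import cpu_count

print(cpu_count())
Multiprocess crawler example:
1. Process + Queue. With multiple processes, the attributes of each process can be set individually. If daemon is set to True, the child process is terminated automatically when the parent process exits.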
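import queue
import time
from multiprocessing import Queue, Process

import requests

# Read the URL list: take the second tab-separated field of each line
link_list = []
with open(r'C:\Users\K1567\Desktop\alexa.txt', 'r') as file:
    file_list = file.readlines()
    for e in file_list:
        link = e.split('\t')[1]
        link = link.replace('\n', '')
        link_list.append(link)

stat = time.time()


# Process subclass
class MyProcess(Process):
    def __init__(self, q):
        Process.__init__(self)
        self.q = q

    def run(self):
        print("Starting", self.pid)
        while not self.q.empty():
            try:
                crawler(self.q)
            except queue.Empty:
                # Another process took the last URL first
                break
        print("Exiting " + str(self.pid))


def crawler(q):
    # Take the next link from the shared queue
    url = q.get(timeout=2)
    try:
        r = requests.get(url, timeout=20)
        print(q.qsize(), r.status_code, url)
    except Exception as e:
        print(q.qsize(), url, 'Error', e)


if __name__ == '__main__':
    workQueue = Queue(1000)

    # Fill the queue
    for url in link_list:
        workQueue.put(url)

    processes = []
    for i in range(0, 5):
        p = MyProcess(workQueue)
        p.daemon = True
        p.start()
        processes.append(p)

    # Wait for every child, not only the last one started; otherwise the
    # daemon children are terminated as soon as the parent exits
    for p in processes:
        p.join()

    end = time.time()
    print('Total time for the simple multiprocess crawler:', end - stat)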
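2. Pool + Queue multiprocess crawler: when the number of targets is small, you can use Process from multiprocessing to spawn the processes dynamically. A dozen or so is manageable, but with hundreds or thousands of targets, limiting the number of processes by hand becomes tedious. This is where a process pool (Pool) is useful.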
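A minimal sketch of the process-pool idea. Note that it swaps in Pool.map instead of an explicit queue, and that fake_fetch, crawl_all, and the example.com URLs are hypothetical placeholders so the sketch runs without network access; in a real crawler fake_fetch would call requests.get.

```python
from multiprocessing import Pool, cpu_count


def fake_fetch(url):
    # Hypothetical stand-in for requests.get(url): returns the URL and its
    # length so the sketch runs without network access.
    return url, len(url)


def crawl_all(urls, workers):
    # The pool keeps `workers` processes alive and splits the URLs among them;
    # map() returns the results in the same order as the input.
    with Pool(processes=workers) as pool:
        return pool.map(fake_fetch, urls)


if __name__ == '__main__':
    urls = ['http://www.example.com/page%d' % i for i in range(20)]  # placeholder URLs
    results = crawl_all(urls, workers=min(4, cpu_count()))
    print(len(results), results[0])
```

The pool size is fixed once, so the number of running processes stays bounded no matter how many URLs are passed in.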