Developing Crawlers with Scrapy: Setting Up the Environment
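1. Install a Python 3.4 virtual environment
python -m virtualenv -p D:\python3.4.3\python.exe Scrapys
This creates an environment named Scrapys from the Python 3.4.3 interpreter at D:\python3.4.3. The virtualenv package must already be installed for the python you run this command with. Activate the environment with Scrapys\Scripts\activate so that the pip commands below install into it.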
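To confirm that a command really runs inside the new environment, a small check like the one below can help. It is a minimal sketch and not part of virtualenv itself. Legacy virtualenv releases set sys.real_prefix, while the stdlib venv module and newer virtualenv releases make sys.prefix differ from sys.base_prefix. Run it with the environment's interpreter, e.g. Scrapys\Scripts\python.exe check_env.py (check_env.py is a hypothetical file name).

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Legacy virtualenv (releases before 20.0) records the base install here.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases: prefix differs from base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("sys.prefix:", sys.prefix)
```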
2. Install the Scrapy framework
2.1 Upgrade pip
Once the environment is set up, run the following in cmd to bring pip up to the latest version:
python -m pip install --upgrade pip
2.2 Download lxml and Twisted
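lxml is the library used to parse web pages, and Scrapy depends on it. It is a third-party library, and a recommended source for third-party Python packages is http://www.lfd.uci.edu/~gohlke/pythonlibs/ . Everything there comes precompiled, which is a real boon for Python users on Windows.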
After opening the page, press Ctrl+F, search for lxml, and choose the build that matches your Python version.
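Once it is downloaded, cd into the download directory in cmd and install the file directly with pip, for example:
pip install lxml-3.6.4-cp35-cp35m-win32.whl
Here cp35 means the wheel is built for Python 3.5. For the Python 3.4 environment above, pick the cp34 file instead.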
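Picking the right file is the error-prone part. The cpXY part of a wheel name must match your Python version (cp34 for Python 3.4, cp35 for 3.5), and the last part must match the interpreter's architecture (win32 or win_amd64). The sketch below prints the tags of the running interpreter and checks a downloaded filename against them. The helper names are hypothetical, and it deliberately ignores the ABI tag and pure-Python tags such as py2.py3.

```python
import sys
import sysconfig


def current_tags():
    """Return (python_tag, platform_tag) for the running interpreter, e.g. ('cp34', 'win32')."""
    python_tag = "cp%d%d" % sys.version_info[:2]
    # Wheel platform tags replace '-' and '.' in get_platform(), e.g. 'win-amd64' -> 'win_amd64'.
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


def wheel_fits(filename, python_tag, platform_tag):
    """Check the python and platform tags of a wheel filename (ABI tag not checked)."""
    if not filename.endswith(".whl"):
        return False
    parts = filename[:-4].split("-")
    if len(parts) < 5:  # at least name-version-python-abi-platform
        return False
    py_tags, _abi_tag, plat_tags = parts[-3:]
    return python_tag in py_tags.split(".") and platform_tag in plat_tags.split(".")


if __name__ == "__main__":
    print("look for wheels tagged:", current_tags())
```

For example, wheel_fits("lxml-3.6.4-cp35-cp35m-win32.whl", "cp34", "win32") is False, because that wheel was built for Python 3.5.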
Twisted is an event-driven networking engine written in Python. Download and install it the same way as lxml; Scrapy depends on both.
The Scrapy installation documentation describes the dependencies it needs. They are listed here:

Things that are good to know
Scrapy is written in pure Python and depends on a few key Python packages (among others):

lxml, an efficient XML and HTML parser
parsel, an HTML/XML data extraction library written on top of lxml,
w3lib, a multi-purpose helper for dealing with URLs and web page encodings
twisted, an asynchronous networking framework
cryptography and pyOpenSSL, to deal with various network-level security needs

The minimal versions which Scrapy is tested against are:

Twisted 14.0
lxml 3.4
pyOpenSSL 0.14

Scrapy may work with older versions of these packages but it is not guaranteed it will continue working because it’s not being tested against them.
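To compare what you have installed against the minimum versions quoted above, a rough check can be scripted. This is a sketch with hypothetical helper names. The installed versions are passed in by hand (for example, copied from pip list), and the parser only handles plain dotted numbers, so pre-release suffixes such as rc1 are ignored.

```python
import re

# Minimum versions Scrapy is tested against, as quoted above.
MINIMUM_VERSIONS = {"Twisted": "14.0", "lxml": "3.4", "pyOpenSSL": "0.14"}


def version_tuple(version):
    """Turn '3.6.4' into (3, 6, 4); non-numeric suffixes like 'rc1' are dropped."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two versions, padding with zeros so that '14' equals '14.0'."""
    a, b = version_tuple(installed), version_tuple(minimum)
    length = max(len(a), len(b))
    a += (0,) * (length - len(a))
    b += (0,) * (length - len(b))
    return a >= b


def check_versions(installed):
    """Return the packages that are missing from `installed` or older than the minimum."""
    problems = []
    for name, minimum in MINIMUM_VERSIONS.items():
        version = installed.get(name)
        if version is None or not meets_minimum(version, minimum):
            problems.append(name)
    return problems
```

For example, check_versions({"Twisted": "17.9.0", "lxml": "3.6.4", "pyOpenSSL": "0.13"}) returns ["pyOpenSSL"].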
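Note
Before installing Scrapy or Twisted, check whether Visual C++ Build Tools are installed on your computer. Otherwise the installation may fail with an error when a package has to be compiled, and the download link for Visual C++ Build Tools is given in that error message.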
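2.3 Install Scrapy
Installing with pip is the easiest way:
pip install scrapy
Check the version with:
scrapy version
The output should be Scrapy 1.5, the latest release on the official site at the time of writing.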
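The same check can be done from Python, which is handy in scripts. This sketch reads scrapy.__version__ from the current interpreter. parse_version_output is a hypothetical helper that assumes the output of scrapy version has the form Scrapy X.Y.Z.

```python
import importlib


def installed_scrapy_version():
    """Return scrapy.__version__ for this interpreter, or None if Scrapy is not importable."""
    try:
        scrapy = importlib.import_module("scrapy")
    except ImportError:
        return None
    return scrapy.__version__


def parse_version_output(text):
    """Turn the output of `scrapy version`, e.g. 'Scrapy 1.5.0', into (1, 5, 0)."""
    name, _, version = text.strip().partition(" ")
    if name != "Scrapy":
        raise ValueError("unexpected output: %r" % text)
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    print("installed Scrapy:", installed_scrapy_version())
```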
Possible problems
The following is compiled from posts on CSDN and Stack Overflow.
1. Python error: Unable to find vcvarsall.bat
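Solution: install MinGW (tested myself)
1. Download and install MinGW.
2. In the MinGW installation directory, open the bin folder, find mingw32-make.exe, make a copy of it, and rename the copy to make.exe.
3. Add MinGW to the PATH environment variable. For example, if MinGW is installed in D:\MinGW\, add D:\MinGW\bin to PATH.
4. In <Python install directory>\Lib\distutils (if you cannot find it, search the Python install directory for distutils with Ctrl+F), create a file named distutils.cfg containing
[build]
compiler=mingw32
and save it.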
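Step 4 can also be scripted. The function below is a hypothetical helper that writes the two lines above into distutils.cfg inside a directory you pass in. It assumes you have already located the right distutils folder, and it overwrites any existing distutils.cfg there.

```python
import os

# The two lines from step 4: build C extensions with the MinGW compiler.
DISTUTILS_CFG = "[build]\ncompiler=mingw32\n"


def write_distutils_cfg(distutils_dir):
    """Write distutils.cfg into distutils_dir (overwriting it) and return its path."""
    path = os.path.join(distutils_dir, "distutils.cfg")
    with open(path, "w") as f:
        f.write(DISTUTILS_CFG)
    return path
```

Assuming the interpreter from step 1, a call would look like write_distutils_cfg(r"D:\python3.4.3\Lib\distutils").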
2. error: command 'gcc' failed: No such file or directory
Solution: also add D:\MinGW\lib to PATH.
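After editing PATH, open a new cmd window (running processes keep their old environment) and check which MinGW tools are actually found. This sketch uses shutil.which. The tool list is an assumption based on the steps above: gcc, mingw32-make, and the make.exe copy.

```python
import shutil

# Tools the steps above expect to be reachable through PATH.
MINGW_TOOLS = ("gcc", "mingw32-make", "make")


def locate_tools(tools=MINGW_TOOLS):
    """Map each tool name to its full path on PATH, or None if it is not found."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in locate_tools().items():
        print("%-13s %s" % (tool, path or "NOT FOUND on PATH"))
```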
3. ValueError: Unknown MS Compiler version 1900
This shows up with Python 3.5, which is built with MSC v.1900 (Visual Studio 2015). The get_msvcr() function in cygwinccompiler.py, in the same distutils folder, does not know that version. In get_msvcr(), after
elif msc_ver == '1600':
    # VS2010 / MSVC 10.0
    return ['msvcr100']
add the following, at the same indentation as the existing elif branches:
elif msc_ver == '1700':
    # Visual Studio 2012 / Visual C++ 11.0
    return ['msvcr110']
elif msc_ver == '1800':
    # Visual Studio 2013 / Visual C++ 12.0
    return ['msvcr120']
elif msc_ver == '1900':
    # Visual Studio 2015 / Visual C++ 14.0
    # "msvcr140.dll no longer exists" http://blogs.msdn.com/b/vcblog/archive/2014/06/03/visual-studio-14-ctp.aspx
    return ['vcruntime140']
Then copy vcruntime140.dll from the Python 3.5 folder into D:\MinGW\mingw32\lib.
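To see why version 1900 triggers this error, here is a standalone reimplementation of the lookup that the patched get_msvcr() performs. It is an illustration, not the stdlib code. It reads the four digits after "MSC v." in a sys.version string (Python 3.4 reports MSC v.1600, Python 3.5 reports MSC v.1900) and contains only the entries shown in this section.

```python
import sys

# Only the entries shown in this section, after the patch.
MSVC_RUNTIMES = {
    "1600": ["msvcr100"],      # VS2010 / MSVC 10.0
    "1700": ["msvcr110"],      # Visual Studio 2012 / Visual C++ 11.0
    "1800": ["msvcr120"],      # Visual Studio 2013 / Visual C++ 12.0
    "1900": ["vcruntime140"],  # Visual Studio 2015 / Visual C++ 14.0
}


def msvc_runtime_for(version_string=sys.version):
    """Return the runtime list for a sys.version string.

    Returns None if the string has no 'MSC v.' tag (not an MSVC build) and
    raises ValueError for versions missing from the table.
    """
    pos = version_string.find("MSC v.")
    if pos == -1:
        return None
    msc_ver = version_string[pos + 6:pos + 10]  # the four digits after 'MSC v.'
    if msc_ver not in MSVC_RUNTIMES:
        raise ValueError("Unknown MS Compiler version %s " % msc_ver)
    return MSVC_RUNTIMES[msc_ver]
```

Without the '1900' entry, a Python 3.5 version string falls through to the ValueError, which is exactly the error above.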
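4. TypeError: unorderable types: NoneType() >= str()
Try restarting (at the very least, open a new cmd window so the PATH changes above take effect), then run the installation again.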
5. error: The 'pyasn1' distribution was not found and is required by service-identity
Try running setup.py again. Installing the missing package directly with pip install pyasn1 may also fix it.
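To check whether the packages from this error can be imported in the current environment, here is a quick sketch using importlib.util.find_spec. The import names pyasn1 and service_identity are assumed to correspond to the distributions named in the error message.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the pyasn1 and service-identity distributions.
    print("missing:", missing_modules(["pyasn1", "service_identity"]))
```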