Which is best in Python: urllib2, PycURL or mechanize? - Stack Overflow
Ok so I need to download some web pages using Python and did a quick investigation of my options.
Included with Python:
urllib - seems to me that I should use urllib2 instead. urllib has no cookie support, HTTP/FTP/local files only (no SSL)
urllib2 - complete HTTP/FTP client, supports most needed things like cookies, does not support all HTTP verbs (only GET and POST, no TRACE, etc.)
Full featured:
mechanize - can use/save Firefox/IE cookies, take actions like follow second link, actively maintained (0.2.5 released in March 2011)
PycURL - supports everything curl does (FTP, FTPS, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and LDAP), bad news: not updated since Sep 9, 2008 (7.19.0)
New possibilities:
urllib3 - supports connection re-using/pooling and file posting
Deprecated (a.k.a. use urllib/urllib2 instead):
httplib - HTTP/HTTPS only (no FTP)
httplib2 - HTTP/HTTPS only (no FTP)
The first thing that strikes me is that urllib/urllib2/PycURL/mechanize are all pretty mature solutions that work well. mechanize and PycURL ship with a number of Linux distributions (e.g. Fedora 13) and BSDs, so installation is typically a non-issue (which is good).
urllib2 looks good, but I'm wondering why PycURL and mechanize both seem very popular. Is there something I am missing (i.e. if I use urllib2, will I paint myself into a corner at some point)? I'd really like some feedback on the pros/cons of these things so I can make the best choice for myself.
Edit: added note on verb support in urllib2
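On the verb limitation mentioned above: urllib2's Request picks GET or POST depending on whether data is supplied, and a common workaround is to subclass Request and override get_method(). The following is only a minimal sketch of that idea. MethodRequest and the example.com URL are made-up names for illustration, and the import fallback assumes that on Python 3 urllib.request stands in for urllib2.

```python
try:
    from urllib2 import Request          # Python 2
except ImportError:
    from urllib.request import Request   # Python 3 (assumed stand-in for urllib2)


class MethodRequest(Request):
    """Hypothetical helper: a Request whose HTTP verb can be forced."""

    def __init__(self, url, method=None, **kwargs):
        Request.__init__(self, url, **kwargs)
        # Remember the verb the caller asked for, if any.
        self._forced_method = method

    def get_method(self):
        # Fall back to the stock GET/POST choice when no verb was forced.
        return self._forced_method or Request.get_method(self)


# Build (but do not send) a PUT request.
req = MethodRequest("http://example.com/resource", method="PUT")
print(req.get_method())
```

Nothing is sent here. Passing the request to urlopen() would issue it with the overridden verb, though whether a given server accepts that verb is another matter.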
What does "best" mean? Best with respect to what? Fastest? Largest? Best use of Cookies? What do you need to do? – S.Lott Mar 5 '10 at 11:03
httplib isn't "deprecated". It is a lower level module that urllib2 is built on top of. You can use it directly, but it is easier via urllib2 – Corey Goldberg Mar 5 '10 at 16:48
What Corey said, e.g. urllib3 is a layer on top of httplib. Also, httplib2 is not deprecated - in fact it's newer than urllib2 and fixes problems like connection reuse (same with urllib3). – Yang Apr 21 '11 at 1:03
There is a newer library called requests. See docs.python-requests.org/en/latest/index.html – ustun Jun 30 '11 at 21:11
urllib2 - is found in every Python install everywhere, so is a good base upon which to start.
PycURL - is useful for people already used to using libcurl, exposes more of the low-level details of HTTP, plus it gains any fixes or improvements applied to libcurl.
mechanize - is used to persistently drive a connection much like a browser would.
It's not a matter of one being better than the other, it's a matter of choosing the appropriate tool for the job.
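To make the "good base" point a bit more concrete, here is a rough sketch of how far the standard library alone goes toward browser-like persistence: an opener that keeps cookies across requests. make_session_opener and the User-Agent string are hypothetical names chosen for this example, and the imports assume urllib.request/http.cookiejar on Python 3 in place of urllib2/cookielib.

```python
try:
    import urllib2 as request_mod            # Python 2
    import cookielib as cookiejar_mod
except ImportError:
    import urllib.request as request_mod     # Python 3 (assumed stand-ins)
    import http.cookiejar as cookiejar_mod


def make_session_opener(user_agent="example-fetcher/0.1"):
    """Hypothetical helper: return (opener, jar) sharing one cookie jar."""
    jar = cookiejar_mod.CookieJar()
    # HTTPCookieProcessor stores cookies from responses in the jar and
    # sends them back on later requests made through this opener.
    opener = request_mod.build_opener(request_mod.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


opener, jar = make_session_opener()
# opener.open("http://example.com/") would fetch a page (network needed);
# any cookies the server sets would then show up in `jar`.
print(len(jar))
```

This covers cookie persistence only. Form filling and link following, which is where mechanize comes in, would still have to be written by hand.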