20.5. urllib — Open arbitrary resources by URL — Python v2.7.2 documentation

20.5.4. urllib Restrictions

  • Currently, only the following protocols are supported: HTTP (versions 0.9 and
    1.0), FTP, and local files.

  • The caching feature of urlretrieve() has been disabled until I find the
    time to hack proper processing of Expiration time headers.

  • There should be a function to query whether a particular URL is in the cache.

  • For backward compatibility, if a URL appears to point to a local file but the
    file can’t be opened, the URL is re-interpreted using the FTP protocol. This
    can sometimes cause confusing error messages.

  • The urlopen() and urlretrieve() functions can cause arbitrarily
    long delays while waiting for a network connection to be set up. This means
    that it is difficult to build an interactive Web client using these functions
    without using threads.

  • The data returned by urlopen() or urlretrieve() is the raw data
    returned by the server. This may be binary data (such as an image), plain text
    or (for example) HTML. The HTTP protocol provides type information in the reply
    header, which can be inspected by looking at the Content-Type
    header. If the returned data is HTML, you can use the module htmllib to
    parse it.

  • The code handling the FTP protocol cannot differentiate between a file and a
    directory. This can lead to unexpected behavior when attempting to read a URL
    that points to a file that is not accessible. If the URL ends in a /, it is
    assumed to refer to a directory and will be handled accordingly. But if an
    attempt to read a file leads to a 550 error (meaning the URL cannot be found or
    is not accessible, often for permission reasons), then the path is treated as a
    directory in order to handle the case when a directory is specified by a URL but
    the trailing / has been left off. This can cause misleading results when
    you try to fetch a file whose read permissions make it inaccessible; the FTP
    code will try to read it, fail with a 550 error, and then perform a directory
    listing for the unreadable file. If fine-grained control is needed, consider
    using the ftplib module, subclassing FancyURLopener, or changing
    _urlopener to meet your needs.

  • This module does not support the use of proxies which require authentication.
    This may be implemented in the future.

  • Although the urllib module contains (undocumented) routines to parse
    and unparse URL strings, the recommended interface for URL manipulation is in
    module urlparse.
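
As a rough illustration of the last point, the sketch below splits a URL into its components with urlparse and puts it back together. The URL is a made-up example, and the import fallback is an assumption added so the snippet also runs on Python 3, where the same functions live in urllib.parse.

```python
try:
    from urlparse import urlparse, urlunparse  # Python 2, as documented here
except ImportError:
    from urllib.parse import urlparse, urlunparse  # Python 3 location (assumption)

# Hypothetical URL used only for illustration.
url = "http://www.example.com:8080/cgi-bin/query?spam=1&eggs=2#results"

parts = urlparse(url)
scheme = parts.scheme      # protocol name
netloc = parts.netloc      # host and port
path = parts.path          # hierarchical path
query = parts.query        # everything between '?' and '#'
fragment = parts.fragment  # everything after '#'

# urlunparse() reverses the split; for this URL the round trip is lossless.
rebuilt = urlunparse(parts)
```

Working with the named fields avoids hand-written string slicing, which is the main reason to prefer urlparse over the undocumented helpers in urllib.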

 

20.5.5. Examples

Here is an example session that uses the GET method to retrieve a URL
containing parameters:

>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
>>> print f.read()
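
To see what the session above sends, it can help to build the URL without opening it. This is a minimal sketch that assumes a list of pairs instead of a dict, so that the parameter order is predictable. The import fallback for Python 3 and the second query string are additions for illustration, not part of the 2.7 documentation.

```python
try:
    from urllib import urlencode  # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3 location (assumption)

# A sequence of pairs keeps the parameters in the order given.
pairs = [("spam", 1), ("eggs", 2), ("bacon", 0)]
params = urlencode(pairs)
url = "http://www.musi-cal.com/cgi-bin/query?%s" % params

# Values are quoted for use in a query string: spaces become '+',
# and reserved characters such as '&' are percent-encoded.
quoted = urlencode([("q", "green eggs & ham")])
```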

The following example uses the POST method instead:

>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
>>> print f.read()
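
The only difference between the two sessions is the second argument: with a URL alone the request is a GET, and with a data argument it becomes a POST. The sketch below makes that visible without touching the network. Note the substitution: it uses the Request class from urllib2 (urllib.request on Python 3) rather than urllib.urlopen() itself, only because Request exposes get_method() for inspection.

```python
try:
    from urllib import urlencode          # Python 2
    from urllib2 import Request
except ImportError:
    from urllib.parse import urlencode    # Python 3 locations (assumption)
    from urllib.request import Request

url = "http://www.musi-cal.com/cgi-bin/query"
params = urlencode([("spam", 1), ("eggs", 2), ("bacon", 0)])

# Parameters in the URL, no request body: a GET request.
get_request = Request(url + "?" + params)

# Parameters in the request body: a POST request.
# The body is encoded to bytes, which Python 3 requires when sending.
post_request = Request(url, data=params.encode("ascii"))

get_method = get_request.get_method()
post_method = post_request.get_method()
```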

The following example uses an explicitly specified HTTP proxy, overriding
environment settings:

>>> import urllib
>>> proxies = {'http': 'http://proxy.example.com:8080/'}
>>> opener = urllib.FancyURLopener(proxies)
>>> f = opener.open("http://www.python.org")
>>> f.read()

The following example uses no proxies at all, overriding environment settings:

>>> import urllib
>>> opener = urllib.FancyURLopener({})
>>> f = opener.open("http://www.python.org/")
>>> f.read()
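
Both proxy examples replace whatever the environment would otherwise supply. As a hedged sketch of what "environment settings" means here, the snippet below sets http_proxy for the current process and reads it back with getproxies_environment(), a helper in urllib that collects <scheme>_proxy variables. The proxy address is the placeholder from the example above, and the Python 3 import path is an assumption for readers on newer interpreters.

```python
import os

try:
    from urllib import getproxies_environment          # Python 2
except ImportError:
    from urllib.request import getproxies_environment  # Python 3 location (assumption)

# Simulate an environment-level proxy setting for this process only.
os.environ["http_proxy"] = "http://proxy.example.com:8080/"
# Newer Python versions ignore http_proxy when REQUEST_METHOD is set
# (a CGI context), so make sure it is absent for this demonstration.
os.environ.pop("REQUEST_METHOD", None)

# Mapping of scheme name to proxy URL, as read from the environment.
env_proxies = getproxies_environment()

# An empty mapping is what the last example passes to FancyURLopener
# to ignore the environment entirely.
no_proxies = {}
```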

Posted on 2012-03-25 15:23 by lexus