【URL Parsing】
urllib.parse.
urlparse
(urlstring, scheme='', allow_fragments=True)
Parse a URL into six components, returning a 6-tuple. This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment
. Each tuple item is a string, possibly empty. The components are not broken up in smaller parts (for example, the network location is a single string), and % escapes are not expanded. The delimiters as shown above are not part of the result, except for a leading slash in the path component, which is retained if present. For example:
Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.
The scheme argument gives the default addressing scheme.
If the allow_fragments argument is false, fragment identifiers are not recognized. Instead, they are parsed as part of the path, parameters or query component, and fragment
is set to the empty string in the return value.
urllib.parse.
parse_qs
(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')
Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a dictionary. The dictionary keys are the unique query variable names and the values are lists of values for each name.
urllib.parse.
parse_qsl
(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')
Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a list of name, value pairs.
urllib.parse.
urlunparse
(parts)
Construct a URL from a tuple as returned by urlparse()
. The parts argument can be any six-item iterable.
urllib.parse.
urlsplit
(urlstring, scheme='', allow_fragments=True)
This is similar to urlparse()
, but does not split the params from the URL.
urllib.parse.
urlunsplit
(parts)
Combine the elements of a tuple as returned by urlsplit()
into a complete URL as a string. The parts argument can be any five-item iterable.
urllib.parse.
urldefrag
(url)
If url contains a fragment identifier, return a modified version of url with no fragment identifier, and the fragment identifier as a separate string. If there is no fragment identifier in url, return url unmodified and an empty string.