QUrl
Detailed Description
The QUrl class provides a convenient interface for working with URLs.
It can parse and construct URLs in both encoded and unencoded form. QUrl also has support for internationalized domain names (IDNs).
The most common way to use QUrl is to initialize it via the constructor by passing a QString. Otherwise, setUrl() can also be used.
URLs can be represented in two forms: encoded or unencoded. The unencoded representation is suitable for showing to users, but the encoded representation is typically what you would send to a web server. For example, the unencoded URL "http://bühler.example.com/List of applicants.xml" would be sent to the server as "http://xn--bhler-kva.example.com/List%20of%20applicants.xml".
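As a rough illustration of what the encoded form of a path looks like, the sketch below percent-encodes a UTF-8 path using only the C++ standard library. It is not how QUrl is implemented, and the helper name percentEncodePath is hypothetical. It assumes that RFC 3986 unreserved characters and the '/' separator stay literal while every other byte becomes "%XX". It does not attempt the host name (IDN) conversion shown in the example above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of Qt: percent-encodes every byte of a
// UTF-8 path except RFC 3986 unreserved characters and the '/' separator.
std::string percentEncodePath(const std::string& path) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : path) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') || c == '-' || c == '.' ||
            c == '_' || c == '~';
        if (unreserved || c == '/') {
            out += static_cast<char>(c);   // kept as a literal
        } else {
            out += '%';                    // encoded as %XX (uppercase hex)
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

With this helper, percentEncodePath("/List of applicants.xml") yields "/List%20of%20applicants.xml", matching the path part of the encoded URL above.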
A URL can also be constructed piece by piece by calling setScheme(), setUserName(), setPassword(), setHost(), setPort(), setPath(), setQuery() and setFragment(). Some convenience functions are also available: setAuthority() sets the user name, password, host and port. setUserInfo() sets the user name and password at once.
Call isValid() to check if the URL is valid. This can be done at any point while constructing a URL. If isValid() returns false, you should clear() the URL before proceeding, or start over by parsing a new URL with setUrl().
Constructing a query is particularly convenient through the use of the QUrlQuery class and its methods QUrlQuery::setQueryItems(), QUrlQuery::addQueryItem() and QUrlQuery::removeQueryItem(). Use QUrlQuery::setQueryDelimiters() to customize the delimiters used for generating the query string.
For the convenience of generating encoded URL strings or query strings, there are two static functions called fromPercentEncoding() and toPercentEncoding() which deal with percent encoding and decoding of QString objects.
Calling isRelative() will tell whether or not the URL is relative. A relative URL can be resolved by passing it as argument to resolved(), which returns an absolute URL. isParentOf() is used for determining whether one URL is a parent of another.
fromLocalFile() constructs a QUrl by parsing a local file path. toLocalFile() converts a URL to a local file path.
The human-readable representation of the URL is fetched with toString(). This representation is appropriate for displaying a URL to a user in unencoded form. The encoded form, however, as returned by toEncoded(), is for internal use, passing to web servers, mail clients and so on. Both forms are technically correct and represent the same URL unambiguously -- in fact, passing either form to QUrl's constructor or to setUrl() will yield the same QUrl object.
Error checking
QUrl is capable of detecting many errors in URLs while parsing them or when components of the URL are set with individual setter methods (like setScheme(), setHost() or setPath()). If the parsing or setter function is successful, any previously recorded error conditions will be discarded.
By default, QUrl setter methods operate in QUrl::TolerantMode, which means they accept some common mistakes and misrepresentation of data. An alternate method of parsing is QUrl::StrictMode, which applies further checks. See QUrl::ParsingMode for a description of the differences between the parsing modes.
QUrl only checks for conformance with the URL specification. It does not try to verify that high-level protocol URLs are in the format they are expected to be by handlers elsewhere. For example, the following URIs are all considered valid by QUrl, even if they do not make sense when used:
- "http:/filename.html"
- "mailto://example.com"
When the parser encounters an error, it signals the event by making isValid() return false and toString() / toEncoded() return an empty string. If it is necessary to show the user the reason why the URL failed to parse, the error condition can be obtained from QUrl by calling errorString(). Note that this message is highly technical and may not make sense to end-users.
QUrl is capable of recording only one error condition. If more than one error is found, it is undefined which error is reported.
Character Conversions
Follow these rules to avoid erroneous character conversion when dealing with URLs and strings:
When creating a QString to contain a URL from a QByteArray or a char*, always use QString::fromUtf8().
enum QUrl::ParsingMode
The parsing mode controls the way QUrl parses strings.
Constant | Value | Description |
---|---|---|
QUrl::TolerantMode | 0 | QUrl will try to correct some common errors in URLs. This mode is useful for parsing URLs coming from sources not known to be strictly standards-conforming. |
QUrl::StrictMode | 1 | Only valid URLs are accepted. This mode is useful for general URL validation. |
QUrl::DecodedMode | 2 | QUrl will interpret the URL component in the fully-decoded form, where percent characters stand for themselves, not as the beginning of a percent-encoded sequence. This mode is only valid for the setters setting components of a URL; it is not permitted in the QUrl constructor, in fromEncoded() or in setUrl(). For more information on this mode, see the documentation for QUrl::FullyDecoded. |
In TolerantMode, the parser has the following behaviour:
- Spaces and "%20": unencoded space characters will be accepted and will be treated as equivalent to "%20".
- Single "%" characters: Any occurrences of a percent character "%" not followed by exactly two hexadecimal characters (e.g., "13% coverage.html") will be replaced by "%25". Note that one lone "%" character will trigger the correction mode for all percent characters.
- Reserved and unreserved characters: An encoded URL should only contain a few characters as literals; all other characters should be percent-encoded. In TolerantMode, these characters will be accepted if they are found in the URL:
space / double-quote / "<" / ">" / "\" /
"^" / "`" / "{" / "|" / "}"
Those same characters can be decoded again by passing QUrl::DecodeReserved to toString() or toEncoded(). In the getters of individual components, those characters are often returned in decoded form.
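The first two TolerantMode rules above (spaces, and stray "%" characters) can be sketched in plain C++ as follows. This is only an illustration of the documented behaviour, not QUrl's parser, and the function name tolerantFix is hypothetical. It assumes that a single stray "%" causes every "%" in the input to be rewritten as "%25", as the note in the list states.

```cpp
#include <cassert>
#include <string>

// True when c is an ASCII hexadecimal digit.
bool isHexDigit(char c) {
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

// Hypothetical helper mirroring the documented TolerantMode rules:
// spaces become "%20", and if any '%' is not followed by two hex digits,
// every '%' in the input is rewritten as "%25".
std::string tolerantFix(const std::string& input) {
    // Pass 1: look for a '%' that does not start a valid %XX sequence.
    bool strayPercent = false;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] != '%') continue;
        const bool wellFormed = i + 2 < input.size() &&
                                isHexDigit(input[i + 1]) &&
                                isHexDigit(input[i + 2]);
        if (!wellFormed) {
            strayPercent = true;
            break;
        }
    }
    // Pass 2: rewrite spaces, and percent characters if correction is on.
    std::string out;
    for (char c : input) {
        if (c == ' ') {
            out += "%20";
        } else if (c == '%' && strayPercent) {
            out += "%25";
        } else {
            out += c;
        }
    }
    return out;
}
```

For the file name used in the list, tolerantFix("13% coverage.html") gives "13%25%20coverage.html", while an already well-formed input such as "a%20b" is returned unchanged.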
When in StrictMode, if a parsing error is found, isValid() will return false and errorString() will return a message describing the error. If more than one error is detected, it is undefined which error gets reported.
Note that TolerantMode is not usually enough for parsing user input, which often contains more errors and expectations than the parser can deal with. When dealing with data coming directly from the user -- as opposed to data coming from data-transfer sources, such as other programs -- it is recommended to use fromUserInput().
enum QUrl::ComponentFormattingOption
The component formatting options define how the components of a URL will be formatted when written out as text. They can be combined with the options from QUrl::FormattingOptions when used in toString() and toEncoded().
Constant | Value | Description |
---|---|---|
QUrl::PrettyDecoded | 0x000000 | The component is returned in a "pretty form", with most percent-encoded characters decoded. The exact behavior of PrettyDecoded varies from component to component and may also change from Qt release to Qt release. This is the default. |
QUrl::EncodeSpaces | 0x100000 | Leave space characters in their encoded form ("%20"). |
QUrl::EncodeUnicode | 0x200000 | Leave non-US-ASCII characters encoded in their UTF-8 percent-encoded form (e.g., "%C3%A9" for the U+00E9 codepoint, LATIN SMALL LETTER E WITH ACUTE). |
QUrl::EncodeDelimiters | 0x400000 \| 0x800000 | Leave certain delimiters in their encoded form, as would appear in the URL when the full URL is represented as text. The delimiters affected by this option change from component to component. This flag has no effect in toString() or toEncoded(). |
QUrl::EncodeReserved | 0x1000000 | Leave US-ASCII characters not permitted in the URL by the specification in their encoded form. This is the default on toString() and toEncoded(). |
QUrl::DecodeReserved | 0x2000000 | Decode the US-ASCII characters that the URL specification does not allow to appear in the URL. This is the default on the getters of individual components. |
QUrl::FullyEncoded | EncodeSpaces \| EncodeUnicode \| EncodeDelimiters \| EncodeReserved | Leave all characters in their properly-encoded form, as this component would appear as part of a URL. When used with toString(), this produces a fully-compliant URL in QString form, exactly equal to the result of toEncoded(). |
QUrl::FullyDecoded | FullyEncoded \| DecodeReserved \| 0x4000000 | Attempt to decode as much as possible. For individual components of the URL, this decodes every percent encoding sequence, including control characters (U+0000 to U+001F) and UTF-8 sequences found in percent-encoded form. Use of this mode may cause data loss, see below for more information. |
The values of EncodeReserved and DecodeReserved should not be used together in one call. The behavior is undefined if that happens. They are provided as separate values because the behavior of the "pretty mode" with regard to reserved characters is different on certain components and especially on the full URL.
Full decoding
The FullyDecoded mode is similar to the behavior of the functions returning QString in Qt 4.x, in that every character represents itself and never has any special meaning. This is true even for the percent character ('%'), which should be interpreted to mean a literal percent, not the beginning of a percent-encoded sequence. The same actual character, in all other decoding modes, is represented by the sequence "%25".
Whenever re-applying data obtained with QUrl::FullyDecoded into a QUrl, care must be taken to use the QUrl::DecodedMode parameter to the setters (like setPath() and setUserName()). Failure to do so may cause re-interpretation of the percent character ('%') as the beginning of a percent-encoded sequence.
This mode is quite useful when portions of a URL are used in a non-URL context. For example, to extract the username, password or file paths in an FTP client application, the FullyDecoded mode should be used.
This mode should be used with care, since there are two conditions that cannot be reliably represented in the returned QString. They are:
- Non-UTF-8 sequences: URLs may contain sequences of percent-encoded characters that do not form valid UTF-8 sequences. Since URLs need to be decoded using UTF-8, any decoder failure will result in the QString containing one or more replacement characters where the sequence existed.
- Encoded delimiters: URLs are also allowed to make a distinction between a delimiter found in its literal form and its equivalent in percent-encoded form. This is most commonly found in the query, but is permitted in most parts of the URL.
The following example illustrates the problem:
QUrl original("http://example.com/?q=a%2B%3Db%26c");
QUrl copy(original);
copy.setQuery(copy.query(QUrl::FullyDecoded), QUrl::DecodedMode);
qDebug() << original.toString(); // prints: http://example.com/?q=a%2B%3Db%26c
qDebug() << copy.toString(); // prints: http://example.com/?q=a+=b&c
If the two URLs were used via HTTP GET, the interpretation by the web server would probably be different. In the first case, it would interpret as one parameter, with a key of "q" and value "a+=b&c". In the second case, it would probably interpret as two parameters, one with a key of "q" and value "a =b", and the second with a key "c" and no value.
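To make the difference concrete, here is a minimal standard-C++ sketch of how a receiver might split a query on "&" and "=" before percent-decoding each part. The function names are hypothetical and this is not what any particular web server does. In particular, it assumes "+" is left untouched, whereas the server described above would additionally turn "+" into a space.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Returns the numeric value of a hex digit, or -1 if c is not one.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes well-formed %XX sequences; anything else is copied verbatim.
std::string percentDecode(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size() &&
            hexValue(s[i + 1]) >= 0 && hexValue(s[i + 2]) >= 0) {
            out += static_cast<char>(hexValue(s[i + 1]) * 16 + hexValue(s[i + 2]));
            i += 2;
        } else {
            out += s[i];
        }
    }
    return out;
}

// Hypothetical receiver-side parser: split on '&', then on the first '=',
// and only then percent-decode the key and the value.
std::vector<std::pair<std::string, std::string>> parseQuery(const std::string& query) {
    std::vector<std::pair<std::string, std::string>> items;
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        const std::string item = query.substr(start, end - start);
        if (!item.empty()) {
            const std::size_t eq = item.find('=');
            const std::string key = item.substr(0, eq);
            const std::string value =
                eq == std::string::npos ? std::string() : item.substr(eq + 1);
            items.emplace_back(percentDecode(key), percentDecode(value));
        }
        start = end + 1;
    }
    return items;
}
```

Under these assumptions, parseQuery("q=a%2B%3Db%26c") returns a single pair with value "a+=b&c", whereas parseQuery("q=a+=b&c") returns two pairs, the second being the key "c" with an empty value.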
Other Functions
QUrl::QUrl(const QString &url, ParsingMode parsingMode = TolerantMode)
Constructs a URL by parsing url. QUrl will automatically percent encode all characters that are not allowed in a URL and decode the percent-encoded sequences that represent an unreserved character (letters, digits, hyphens, underscores, dots and tildes). All other characters are left in their original forms.
Parses the url using the parser mode parsingMode. In TolerantMode (the default), QUrl will correct certain mistakes, notably the presence of a percent character ('%') not followed by two hexadecimal digits, and it will accept any character in any position. In StrictMode, encoding mistakes will not be tolerated and QUrl will also check that certain forbidden characters are not present in unencoded form. If an error is detected in StrictMode, isValid() will return false. The parsing mode DecodedMode is not permitted in this context.
Example:
QUrl url("http://www.example.com/List of holidays.xml");
// url.toEncoded() == "http://www.example.com/List%20of%20holidays.xml"
To construct a URL from an encoded string, you can also use fromEncoded():
QUrl url = QUrl::fromEncoded("http://qt-project.org/List%20of%20holidays.xml");
Both functions are equivalent and, in Qt 5, both functions accept encoded data. Usually, the choice of the QUrl constructor or setUrl() versus fromEncoded() will depend on the source data: the constructor and setUrl() take a QString, whereas fromEncoded() takes a QByteArray.
QString QUrl::fileName(ComponentFormattingOptions options = FullyDecoded) const
Returns the name of the file, excluding the directory path.
Note that, if this QUrl object is given a path ending in a slash, the name of the file is considered empty.
If the path doesn't contain any slash, it is fully returned as the fileName.
Example:
QUrl url("http://qt-project.org/support/file.html");
// url.adjusted(RemoveFilename) == "http://qt-project.org/support/"
// url.fileName() == "file.html"
The options argument controls how to format the file name component. All values produce an unambiguous result. With QUrl::FullyDecoded, all percent-encoded sequences are decoded; otherwise, the returned value may contain some percent-encoded sequences for some control sequences not representable in decoded form in QString.
QUrl QUrl::resolved(const QUrl &relative) const
Returns the result of the merge of this URL with relative. This URL is used as a base to convert relative to an absolute URL.
If relative is not a relative URL, this function will return relative directly. Otherwise, the paths of the two URLs are merged, and the new URL returned has the scheme and authority of the base URL, but with the merged path, as in the following example:
QUrl baseUrl("http://qt.digia.com/Support/");
QUrl relativeUrl("../Product/Library/");
qDebug() << baseUrl.resolved(relativeUrl).toString(); // requires #include <QDebug>
// prints "http://qt.digia.com/Product/Library/"
Calling resolved() with ".." returns a QUrl whose directory is one level higher than the original. Similarly, calling resolved() with "../.." removes two levels from the path. If relative is "/", the path becomes "/".
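The path-merging step described above can be sketched without Qt as follows. This is a simplified illustration in the spirit of RFC 3986 reference resolution, not QUrl's actual code. The names removeDotSegments and resolvePath are hypothetical, the sketch only deals with the path component, it assumes a non-empty relative path, and it collapses empty segments such as "//".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: drops "." segments and lets ".." cancel the
// preceding segment. Empty segments are collapsed in this sketch.
std::string removeDotSegments(const std::string& path) {
    const bool absolute = !path.empty() && path.front() == '/';
    std::vector<std::string> kept;
    bool endsWithSlash = false;
    std::size_t start = 0;
    while (start <= path.size()) {
        std::size_t end = path.find('/', start);
        if (end == std::string::npos) end = path.size();
        const std::string segment = path.substr(start, end - start);
        if (segment == "..") {
            if (!kept.empty()) kept.pop_back();   // go up one level
            endsWithSlash = true;
        } else if (segment == "." || segment.empty()) {
            endsWithSlash = true;                 // nothing to keep
        } else {
            kept.push_back(segment);
            endsWithSlash = false;
        }
        start = end + 1;
    }
    std::string result = absolute ? "/" : "";
    for (std::size_t k = 0; k < kept.size(); ++k) {
        result += kept[k];
        if (k + 1 < kept.size() || endsWithSlash) result += '/';
    }
    return result;
}

// Hypothetical helper: merges a relative path onto the directory part of
// basePath; a relative path starting with '/' replaces the base path.
std::string resolvePath(const std::string& basePath, const std::string& relative) {
    if (!relative.empty() && relative.front() == '/') {
        return removeDotSegments(relative);
    }
    const std::size_t lastSlash = basePath.rfind('/');
    const std::string directory =
        lastSlash == std::string::npos ? std::string() : basePath.substr(0, lastSlash + 1);
    return removeDotSegments(directory + relative);
}
```

For the paths in the example above, resolvePath("/Support/", "../Product/Library/") returns "/Product/Library/", and resolvePath("/a/b/c", "..") returns "/a/".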
void QUrl::setAuthority(const QString &authority, ParsingMode mode = TolerantMode)
Sets the authority of the URL to authority.
The authority of a URL is the combination of user info, a host name and a port. All of these elements are optional; an empty authority is therefore valid.
The user info and host are separated by a '@', and the host and port are separated by a ':'. If the user info is empty, the '@' must be omitted; although a stray ':' is permitted if the port is empty.
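An authority string has the general form "user:password@host:port"; for instance, "user:secret@ftp.example.com:2021" (an illustrative value) is a valid authority string.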
The authority data is interpreted according to mode: in StrictMode, any '%' characters must be followed by exactly two hexadecimal characters and some characters (including space) are not allowed in undecoded form. In TolerantMode (the default), all characters are accepted in undecoded form and the tolerant parser will correct stray '%' not followed by two hex characters.
This function does not allow mode to be QUrl::DecodedMode. To set fully decoded data, call setUserName(), setPassword(), setHost() and setPort() individually.
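The layout of an authority string can be illustrated with a small standard-C++ splitter. This is a sketch under simplifying assumptions rather than QUrl's parser. The struct and function names are hypothetical, no percent-decoding or validation is done, and bracketed IPv6 hosts such as "[::1]" are not handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical container for the pieces of an authority string.
struct AuthorityParts {
    std::string userName;
    std::string password;
    std::string host;
    std::string port;   // kept as text; empty when no port is given
};

// Splits "user:password@host:port". Every element is optional.
// Assumption: the host is not a bracketed IPv6 literal.
AuthorityParts splitAuthority(const std::string& authority) {
    AuthorityParts parts;
    std::string hostPort = authority;
    const std::size_t at = authority.rfind('@');
    if (at != std::string::npos) {
        // Everything before '@' is the user info: "user" or "user:password".
        const std::string userInfo = authority.substr(0, at);
        hostPort = authority.substr(at + 1);
        const std::size_t userColon = userInfo.find(':');
        parts.userName = userInfo.substr(0, userColon);
        if (userColon != std::string::npos) {
            parts.password = userInfo.substr(userColon + 1);
        }
    }
    // The last ':' separates the host from the (possibly empty) port.
    const std::size_t portColon = hostPort.rfind(':');
    parts.host = hostPort.substr(0, portColon);
    if (portColon != std::string::npos) {
        parts.port = hostPort.substr(portColon + 1);
    }
    return parts;
}
```

For example, splitAuthority("user:secret@ftp.example.com:2021") yields the user name "user", the password "secret", the host "ftp.example.com" and the port "2021"; with "example.com:" the stray colon simply leaves the port empty.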
void QUrl::setFragment(const QString &fragment, ParsingMode mode = TolerantMode)
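Sets the fragment of the URL to fragment. The fragment is the last part of the URL, represented by a '#' followed by a string of characters. It is typically used in HTTP for referring to a certain link or point on a page; for instance, in "http://www.example.com/index.html#section2" (an illustrative URL) the fragment is "section2".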
The fragment is sometimes also referred to as the URL "reference".
Passing an argument of QString() (a null QString) will unset the fragment. Passing an argument of QString("") (an empty but not null QString) will set the fragment to an empty string (as if the original URL had a lone "#").
The fragment data is interpreted according to mode: in StrictMode, any '%' characters must be followed by exactly two hexadecimal characters and some characters (including space) are not allowed in undecoded form. In TolerantMode, all characters are accepted in undecoded form and the tolerant parser will correct stray '%' not followed by two hex characters. In DecodedMode, '%' stand for themselves and encoded characters are not possible.
QUrl::DecodedMode should be used when setting the fragment from a data source which is not a URL or with a fragment obtained by calling fragment() with the QUrl::FullyDecoded formatting option.
void QUrl::setPath(const QString &path, ParsingMode mode = DecodedMode)
Sets the path of the URL to path. The path is the part of the URL that comes after the authority but before the query string.
For non-hierarchical schemes, the path will be everything following the scheme declaration; for instance, in "mailto:postmaster@example.com" (an illustrative URL) the path is "postmaster@example.com".
void QUrl::setScheme(const QString &scheme)
Sets the scheme of the URL to scheme. As a scheme can only contain ASCII characters, no conversion or decoding is done on the input. It must also start with an ASCII letter.
The scheme describes the type (or protocol) of the URL. It's represented by one or more ASCII characters at the start of the URL.
A scheme is strictly RFC 3986-compliant:
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
For instance, in "ftp://ftp.example.com/pub/file.txt" (an illustrative URL) the scheme is "ftp".
To set the scheme, the following call is used:
QUrl url;
url.setScheme("ftp");
The scheme can also be empty, in which case the URL is interpreted as relative.
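The RFC 3986 scheme grammar quoted above is simple enough to check by hand. The following standard-C++ sketch does so; the function name isValidScheme is hypothetical and is not a Qt API. It checks non-empty schemes only, so the empty scheme of a relative URL is reported as not matching the grammar.

```cpp
#include <cassert>
#include <string>

// Hypothetical checker for: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
bool isValidScheme(const std::string& scheme) {
    auto isAlpha = [](char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    };
    auto isDigit = [](char c) { return c >= '0' && c <= '9'; };

    // The first character must be an ASCII letter.
    if (scheme.empty() || !isAlpha(scheme[0])) return false;

    // The remaining characters may be letters, digits, '+', '-' or '.'.
    for (std::size_t i = 1; i < scheme.size(); ++i) {
        const char c = scheme[i];
        if (!(isAlpha(c) || isDigit(c) || c == '+' || c == '-' || c == '.')) {
            return false;
        }
    }
    return true;
}
```

Under this check, "ftp" and "svn+ssh" are accepted, while "1http" (starts with a digit) and "my_scheme" (contains an underscore) are rejected.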
void QUrl::setUserInfo(const QString &userInfo, ParsingMode mode = TolerantMode)
Sets the user info of the URL to userInfo. The user info is an optional part of the authority of the URL, as described in setAuthority().
The user info consists of a user name and optionally a password, separated by a ':'. If the password is empty, the colon must be omitted. For instance, "user:secret" (an illustrative value) is a valid user info string.
QString QUrl::topLevelDomain(ComponentFormattingOptions options = FullyDecoded) const
Returns the TLD (Top-Level Domain) of the URL (e.g. .co.uk, .net). The return value is prefixed with a '.' unless the URL does not contain a valid TLD, in which case the function returns an empty string.
Note that this function considers a TLD to be any domain under which users can register subdomains, including many home, dynamic DNS and blogging providers. This is useful for deciding whether two websites belong to the same infrastructure and whether communication between them, such as sharing browser cookies, should be allowed: two domains should be considered part of the same website if they share at least one label in addition to the value returned by this function. For example:
- foo.co.uk and foo.com do not share a top-level domain
- foo.co.uk and bar.co.uk share the .co.uk domain, but the next label is different
- www.foo.co.uk and ftp.foo.co.uk share the same top-level domain and one more label, so they are considered part of the same site
If options includes EncodeUnicode, the returned string will be in ASCII Compatible Encoding.
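The same-site rule described above can be written down as a small standard-C++ sketch. It assumes the TLD strings have already been obtained elsewhere (for instance from topLevelDomain()) and are passed in with their leading dot. The function names are hypothetical, and the comparison is a plain case-sensitive string comparison.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the label immediately before the TLD,
// e.g. "foo" for host "www.foo.co.uk" and tld ".co.uk"; empty on mismatch.
std::string labelBeforeTld(const std::string& host, const std::string& tld) {
    if (tld.empty() || host.size() <= tld.size()) return std::string();
    if (host.compare(host.size() - tld.size(), tld.size(), tld) != 0) {
        return std::string();   // host does not end with this TLD
    }
    const std::string rest = host.substr(0, host.size() - tld.size());
    const std::size_t dot = rest.rfind('.');
    return dot == std::string::npos ? rest : rest.substr(dot + 1);
}

// Two hosts count as the same site when they share the TLD and the
// label right before it.
bool sameSite(const std::string& hostA, const std::string& tldA,
              const std::string& hostB, const std::string& tldB) {
    if (tldA.empty() || tldA != tldB) return false;
    const std::string labelA = labelBeforeTld(hostA, tldA);
    const std::string labelB = labelBeforeTld(hostB, tldB);
    return !labelA.empty() && labelA == labelB;
}
```

Applied to the three cases in the list, only the pair www.foo.co.uk and ftp.foo.co.uk is reported as the same site.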
QUrlQuery
Detailed Description
The QUrlQuery class provides a way to manipulate key-value pairs in a URL's query.
It is used to parse the query strings found in URLs like the following:
http://www.example.com/cgi-bin/drawgraph.cgi?type=pie&color=green
Query strings like the above are used to transmit options in the URL and are usually decoded into multiple key-value pairs. The one above would contain two entries in its list, with keys "type" and "color". QUrlQuery can also be used to create a query string suitable for use in QUrl::setQuery() from the individual components of the query.
The most common way of parsing a query string is to initialize it in the constructor by passing it the query string. Otherwise, the setQuery() method can be used to set the query to be parsed. That method can also be used to parse a query with non-standard delimiters, after having set them using the setQueryDelimiters() function.
The encoded query string can be obtained again using query(). This will take all the internally-stored items and encode the string using the delimiters.
Encoding
All of the getter methods in QUrlQuery, including query(), support an optional parameter of type QUrl::ComponentFormattingOptions, which dictates how to encode the data in question. Except for QUrl::FullyDecoded, the returned value must still be considered a percent-encoded string, as there are certain values which cannot be expressed in decoded form (like control characters or byte sequences not decodable to UTF-8). For that reason, the percent character is always represented by the string "%25".
Handling of spaces and plus ("+")
Web browsers usually encode spaces found in HTML FORM elements to a plus sign ("+") and plus signs to their percent-encoded form (%2B). However, the Internet specifications governing URLs do not consider spaces and the plus character equivalent.
For that reason, QUrlQuery never encodes the space character to "+" and will never decode "+" to a space character. Instead, space characters will be rendered "%20" in encoded form.
To support encoding like that of HTML forms, QUrlQuery also never decodes the "%2B" sequence to a plus sign nor encodes a plus sign. In fact, any "%2B" or "+" sequences found in the keys, values, or query string are left exactly as written (except for the uppercasing of "%2b" to "%2B").
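Because QUrlQuery leaves "+" and "%2B" alone, an application that receives HTML-form-style data has to apply the form convention itself. The following standard-C++ sketch shows one way this could look. The function name formDecode is hypothetical, and the sketch assumes the input is a single key or value that has already been split off from the query.

```cpp
#include <cassert>
#include <string>

// Returns the numeric value of a hex digit, or -1 if c is not one.
int hexDigitValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical HTML-form decoder: a literal '+' means a space, and
// well-formed %XX sequences (such as "%2B" for '+') are decoded.
std::string formDecode(const std::string& encoded) {
    std::string out;
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        const char c = encoded[i];
        if (c == '+') {
            out += ' ';
        } else if (c == '%' && i + 2 < encoded.size() &&
                   hexDigitValue(encoded[i + 1]) >= 0 &&
                   hexDigitValue(encoded[i + 2]) >= 0) {
            out += static_cast<char>(hexDigitValue(encoded[i + 1]) * 16 +
                                     hexDigitValue(encoded[i + 2]));
            i += 2;
        } else {
            out += c;   // stray '%' and ordinary characters are kept
        }
    }
    return out;
}
```

With this helper, formDecode("a+b%2Bc") returns "a b+c": the literal plus becomes a space and the encoded plus becomes a plus sign.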
Full decoding
With QUrl::FullyDecoded formatting, all percent-encoded sequences will be decoded fully and the '%' character is used to represent itself. QUrl::FullyDecoded should be used with care, since it may cause data loss. See the documentation of QUrl::FullyDecoded for information on what data may be lost.
This formatting mode should be used only when dealing with text presented to the user in contexts where percent-encoding is not desired. Note that QUrlQuery setters and query methods do not support the counterpart QUrl::DecodedMode parsing, so using QUrl::FullyDecoded to obtain a listing of keys may result in keys not found in the object.
Non-standard delimiters
By default, QUrlQuery uses an equal sign ("=") to separate a key from its value, and an ampersand ("&") to separate key-value pairs from each other. It is possible to change the delimiters that QUrlQuery uses for parsing and for reconstructing the query by calling setQueryDelimiters().
Non-standard delimiters should be chosen from among what RFC 3986 calls "sub-delimiters". They are:
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
Use of other characters is not supported and may result in unexpected behaviour. QUrlQuery does not verify that you passed a valid delimiter.
Other Functions
void QUrlQuery::setQueryDelimiters(QChar valueDelimiter, QChar pairDelimiter)
Sets the characters used for delimiting between keys and values, and between key-value pairs in the URL's query string. The default value delimiter is '=' and the default pair delimiter is '&'.
valueDelimiter will be used for separating keys from values, and pairDelimiter will be used to separate key-value pairs. Any occurrences of these delimiting characters in the encoded representation of the keys and values of the query string are percent encoded when returned in query().
If valueDelimiter is set to '(' and pairDelimiter is ')', the query string of "http://www.example.com/cgi-bin/drawgraph.cgi?type=pie&color=green" would instead be represented like this:
http://www.example.com/cgi-bin/drawgraph.cgi?type(pie)color(green)
Note: Non-standard delimiters should be chosen from among what RFC 3986 calls "sub-delimiters". They are:
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
Use of other characters is not supported and may result in unexpected behaviour. This method does not verify that you passed a valid delimiter.
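The interaction between custom delimiters and percent-encoding can be illustrated with a small standard-C++ sketch that joins key-value pairs and escapes any delimiter occurring inside a key or value. This is not QUrlQuery's implementation: the function name buildQuery is hypothetical, and the sketch escapes only the two delimiter characters, leaving every other character untouched.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical builder: joins items as key<valueDelimiter>value, separated
// by pairDelimiter, percent-encoding delimiters found inside keys/values.
std::string buildQuery(const std::vector<std::pair<std::string, std::string>>& items,
                       char valueDelimiter, char pairDelimiter) {
    static const char hex[] = "0123456789ABCDEF";
    auto escape = [&](const std::string& text) {
        std::string out;
        for (unsigned char c : text) {
            if (c == static_cast<unsigned char>(valueDelimiter) ||
                c == static_cast<unsigned char>(pairDelimiter)) {
                out += '%';
                out += hex[c >> 4];
                out += hex[c & 0x0F];
            } else {
                out += static_cast<char>(c);
            }
        }
        return out;
    };

    std::string query;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0) query += pairDelimiter;   // separator between pairs only
        query += escape(items[i].first);
        query += valueDelimiter;
        query += escape(items[i].second);
    }
    return query;
}
```

For instance, with '=' and ';' as delimiters, the pairs ("type", "pie") and ("note", "a;b") produce "type=pie;note=a%3Bb", so the ';' inside the value cannot be mistaken for a pair separator.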