IIS7 URL Rewrite 2.0 Module Configuration Reference
Functionality Overview
Microsoft URL Rewrite Module 2.0 for IIS is an incremental release that includes all the features of version 1.1 and adds support for rewriting response headers and content. The module applies regular expression or wildcard patterns to the HTTP response to locate and replace parts of the content based on the rewriting logic expressed by outbound rewrite rules. More specifically, the module can be used to:
- Replace the URLs generated by a web application in the response HTML with a more user friendly and search engine friendly equivalent.
- Modify the links in the HTML markup generated by a web application behind a reverse proxy.
- Modify the existing and set new response HTTP headers.
- Fix up the content of any HTTP response, including JavaScript, CSS, RSS, etc.
WARNING: When response headers or response content are modified by an outbound rewrite rule, extra caution should be taken to ensure that the text inserted into the response does not contain any client-side executable code, which can result in cross-site scripting vulnerabilities. This is especially important when a rewrite rule uses untrusted data, such as HTTP headers or the query string, to build the string that will be inserted into the HTTP response. In such cases the replacement string should be HTML encoded by using the HtmlEncode function, for example:
<action type="Rewrite" value="{HtmlEncode:{HTTP_REFERER}}" />
Outbound Rules Overview
The main configuration concept used for response rewriting is the concept of an outbound rule. An outbound rule is used to express the logic of what to compare or match the response content with and what to do if the comparison was successful.
Conceptually, an outbound rule consists of the following parts:
- Pre-condition - The optional pre-condition is used to check the request metadata before any rule evaluation begins. A pre-condition may consist of several conditional checks against request metadata, and it can be used to filter out responses that should not be rewritten, for example, images or video files.
- Tag filters - The tag filters are used to narrow down the search within the response to a set of well known or custom defined tags. With tag filters only the content of specified tags is matched against the rule pattern, as opposed to matching the entire response content against the pattern.
- Pattern – The rule pattern is used to specify either the regular expression or a wildcard pattern that will be used for searching within the response content.
- Conditions – The optional conditions collection is used to specify additional logical operations to perform if a pattern match has been found within the response content. Within the conditions you can check for certain values of HTTP headers or server variables.
- Action – The action is used to specify what to do if the pattern match has been found and all the rule conditions were evaluated successfully.
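As a rough illustration, the parts listed above might come together in configuration like the sketch below. The rule name, the host name, and the URL patterns are hypothetical and were chosen only for this example.

```xml
<rewrite>
  <outboundRules>
    <!-- Pre-condition: the rule is only considered for HTML responses -->
    <rule name="Rewrite article links" preCondition="IsHTML">
      <!-- Tag filter + pattern: look at href values of anchor tags -->
      <match filterByTags="A" pattern="^/article\.aspx\?id=([0-9]+)$" />
      <!-- Conditions: additional checks after the pattern has matched -->
      <conditions>
        <add input="{HTTP_HOST}" pattern="^www\.example\.com$" />
      </conditions>
      <!-- Action: replace the matched attribute value -->
      <action type="Rewrite" value="/article/{R:1}" />
    </rule>
    <preConditions>
      <preCondition name="IsHTML">
        <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" />
      </preCondition>
    </preConditions>
  </outboundRules>
</rewrite>
```

With a configuration along these lines, a link such as /article.aspx?id=7 in an HTML response served for www.example.com would be expected to be rewritten to /article/7.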
Rules Execution
The process of executing outbound rules is different from the one used for inbound rules. The inbound ruleset is evaluated only once per request because its input is just a single request URL string. The outbound ruleset may be evaluated many times per response because it is applied in multiple places within the HTTP response content. For example, if there is a ruleset as below:
Rule 1: applies to <a> tag and <img> tag
Rule 2: applies to <a> tag
and the HTML response contains this markup:
<a href="/default.aspx"><img src="/logo.jpg" />Home Page</a>
Then URL Rewrite Module 2.0 will evaluate Rule 1 against the "/default.aspx" string. If the rule was executed successfully, then the output of Rule 1 will be given to Rule 2. If Rule 2 was executed successfully, then the output of Rule 2 will be used to replace the content of the href attribute in the <a> tag in the response.
After that, URL Rewrite Module 2.0 will evaluate Rule 1 against the "/logo.jpg" string. If the rule was executed successfully, then its output will be used to replace the content of the src attribute in the <img> tag in the response.
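The ruleset in this walkthrough could be written roughly as follows. This is only a sketch: the patterns and replacement values are hypothetical, since the walkthrough does not say what Rule 1 and Rule 2 actually rewrite.

```xml
<outboundRules>
  <!-- Rule 1: evaluated for URL attributes of both <a> and <img> tags -->
  <rule name="Rule 1">
    <match filterByTags="A, Img" pattern="^/(.*)" />
    <action type="Rewrite" value="/site/{R:1}" />
  </rule>
  <!-- Rule 2: evaluated only for <a> tags; its input is the output of Rule 1 -->
  <rule name="Rule 2">
    <match filterByTags="A" pattern="^/site/default\.aspx$" />
    <action type="Rewrite" value="/site/home" />
  </rule>
</outboundRules>
```

Following the evaluation order described above, the href value "/default.aspx" would be expected to pass through both rules and end up as "/site/home", while the src value "/logo.jpg" would only pass through Rule 1 and become "/site/logo.jpg".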
Rules Inheritance
If rules are defined on multiple configuration levels, then the URL Rewrite Module evaluates a rule set that includes distributed rules from parent configuration levels as well as rules from the current configuration level. The evaluation is performed in parent-to-child order, which means that parent rules are evaluated first and the rules defined at the last child level are evaluated last.
Outbound Rule Configuration
Pre-conditions Collection
Pre-conditions are used to check whether a rule should be evaluated against the response content. A pre-conditions collection is defined as a named collection within the <preConditions> section, and it may contain one or more pre-condition checks. The outbound rule references the pre-conditions collection by name.
A pre-conditions collection has an attribute called logicalGrouping that controls how conditions are evaluated. A pre-conditions collection evaluates to true if:
- All pre-conditions within it were evaluated to true, provided that logicalGrouping="MatchAll" was used.
- At least one of the pre-conditions was evaluated to true, provided that logicalGrouping="MatchAny" was used.
A pre-condition is defined by specifying the following properties:
- Input string - Pre-condition input specifies which item to use as an input for the condition evaluation. Pre-condition input is an arbitrary string that can include server variables and back-references to prior pre-condition patterns.
- Pattern - Pre-condition pattern can be specified by using either regular expression syntax or by using wildcard syntax. The type of pattern to use in a pre-condition depends on the value of the patternSyntax flag defined for the pre-condition collection.
In addition, the result of the pre-condition evaluation can be negated by using the negate attribute.
An example of a pre-condition that checks if the response content type is text/html:
<preConditions>
<preCondition name="IsHTML">
<add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" />
</preCondition>
</preConditions>
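The logicalGrouping and negate attributes described above might be used as in the following sketch. The collection names and the second content type are assumptions made for illustration.

```xml
<preConditions>
  <!-- True if either check matches -->
  <preCondition name="IsHtmlOrXhtml" logicalGrouping="MatchAny">
    <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" />
    <add input="{RESPONSE_CONTENT_TYPE}" pattern="^application/xhtml\+xml" />
  </preCondition>
  <!-- negate="true" inverts the result: true when the response is not an image -->
  <preCondition name="IsNotImage">
    <add input="{RESPONSE_CONTENT_TYPE}" pattern="^image/" negate="true" />
  </preCondition>
</preConditions>
```

A rule would then select one of these collections by name, for example preCondition="IsHtmlOrXhtml" on its <rule> element.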
Tag Filters
Tag filters are used to narrow down the search within the response content to a set of well-known or custom-defined HTML tags. When a rewrite rule uses tag filters, then instead of matching the rule pattern against the entire response, URL Rewrite Module 2.0 looks for the HTML tags that are listed in the rule's tag filter, takes the content of the URL attribute of each such tag, and evaluates it against the rule's pattern. Tag filters are specified within the filterByTags attribute of the <match> element of an outbound rule. For example:
<match filterByTags="A" pattern="^/(article\.aspx.*)" />
If an HTTP response contains an anchor tag such as:
<a href="/article.aspx?id=1">link</a>
Then the rewrite rule pattern will be evaluated against the string: "/article.aspx?id=1".
Pre-defined tags
URL Rewrite Module 2.0 includes a set of pre-defined tags that can be used with outbound rules. The table below lists all the pre-defined tags and the attributes whose values will be used as input for the outbound rule pattern:
Tag | Attributes |
---|---|
A | href |
Area | href |
Base | href |
Form | action |
Frame | src, longdesc |
Head | profile |
IFrame | src, longdesc |
Img | src, longdesc, usemap |
Input | src, usemap |
Link | href |
Script | src |
Custom Tags
If rewriting needs to be performed within an attribute of a tag that is not included in the pre-defined tags collection, then a custom tags collection can be used to specify the tag name and the corresponding attribute that needs to be rewritten. A custom tags collection is defined as a named collection within the <customTags> section. An outbound rule references a custom tags collection by name.
The following example shows a definition of a custom tags collection:
<customTags>
<tags name="My Tags">
<tag name="item" attribute="src" />
<tag name="element" attribute="src" />
</tags>
</customTags>
This custom tags collection can be referenced from an outbound rule as shown in the example below:
<match filterByTags="A, CustomTags" customTags="My Tags" pattern="^/(article\.aspx.*)" />
Rule Pattern
A rule pattern is used to specify what the rule input string should be matched to. Rule input differs based on the rule configuration:
- If the rule uses tag filters, then the content of the matched tag attribute will be passed as input for the pattern matching.
- If the rule does not use any tag filters, then the entire response content will be passed as input for the pattern matching.
The pattern is specified within the <match> element of a rewrite rule.
Full response pattern matching
When filterByTags attribute is not specified in the match element of the rule then the pattern will be applied on the entire response content. Evaluation of regular expression patterns on the entire response content is a CPU intensive operation and may affect the performance of the web application. There are several options to reduce the performance overhead introduced by the full response pattern matching:
- Use the IIS user mode caching and set the rewriteBeforeCache attribute of <outboundRules> element to true:
<outboundRules rewriteBeforeCache="true">
Note that this setting should not be used if chunked transfer encoding is used for responses.
- Use the occurrences attribute of the match element of the rule. For example, when a rule inserts an HTML fragment into the <head> element and its pattern searches for the closing </head> tag, you can set occurrences="1". This tells the rewrite module to stop searching the remainder of the response after the </head> tag has been found:
<match pattern="&lt;/head&gt;" occurrences="1" />
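The two options can also be combined, as in the sketch below. The rule name and the inserted meta tag are hypothetical; the angle brackets and quotes inside attribute values are written as XML entities so that the configuration stays well-formed.

```xml
<outboundRules rewriteBeforeCache="true">
  <rule name="Insert meta tag into head">
    <!-- No filterByTags, so the pattern runs against the whole response;
         occurrences="1" stops the search after the first match. -->
    <match pattern="&lt;/head&gt;" occurrences="1" />
    <!-- {R:0} is the matched closing tag, so the fragment is placed just before it -->
    <action type="Rewrite" value="&lt;meta name=&quot;generator&quot; content=&quot;example&quot; /&gt;{R:0}" />
  </rule>
</outboundRules>
```

Either option can be used on its own; rewriteBeforeCache is subject to the chunked transfer encoding caveat noted above.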
Rule Pattern Syntax
Rule pattern syntax can be specified by using the patternSyntax attribute of a rule. This attribute can be set to one of the following options:
- ECMAScript – Perl compatible (ECMAScript standard compliant) regular expression syntax. This is the default option for any rule. This is an example of the pattern format: "^([_0-9a-zA-Z-]+/)?(wp-.*)"
- Wildcard – Wildcard syntax used in the IIS HTTP redirection module. This is an example of a pattern in this format: "/Scripts/*.js", where the asterisk ("*") means "match any number of any characters and capture them in a back-reference". Note that the wildcard pattern type cannot be used when the rule does not have any tag filters.
- ExactMatch – An exact string search is performed within the input string.
The scope of the patternSyntax attribute is per rule, meaning that it applies to the current rule’s pattern and to all patterns used within conditions of that rule.
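As a minimal sketch of the Wildcard option, the rule below uses the example pattern from the list above together with a tag filter, which wildcard patterns require. The rule name and the target path are hypothetical.

```xml
<rule name="Move script links" patternSyntax="Wildcard">
  <!-- The asterisk captures the file name (without extension) in {R:1} -->
  <match filterByTags="Script" pattern="/Scripts/*.js" />
  <action type="Rewrite" value="/static/js/{R:1}.js" />
</rule>
```

Under these assumptions, a script reference such as /Scripts/menu.js would be expected to be rewritten to /static/js/menu.js.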
Rule pattern properties
Pattern can be negated by using the negate attribute of the <match> element. When this attribute is used then the rule action will be performed only if the input string does NOT match the specified pattern.
By default, case-insensitive pattern matching is used. To enable case sensitivity, set the ignoreCase attribute of the <match> element of the rule to false.
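Both properties can be set on the same <match> element. The following line is only a sketch; the /Admin/ path is a hypothetical example.

```xml
<!-- Applies to anchor links that do NOT start with /Admin/, compared case-sensitively -->
<match filterByTags="A" pattern="^/Admin/" negate="true" ignoreCase="false" />
```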
Rule Conditions
Rule conditions allow defining additional logic for rule evaluation, which can be based on inputs other than just a current input string. Any rule can have zero or more conditions. Rule conditions are evaluated after the rule pattern match is successful.
Conditions are defined within a <conditions> collection of a rewrite rule. This collection has an attribute called logicalGrouping that controls how conditions are evaluated. If a rule has conditions, then the rule action will be performed only if rule pattern is matched and:
- All conditions were evaluated to true, provided that logicalGrouping="MatchAll" was used.
- At least one of the conditions was evaluated to true, provided that logicalGrouping="MatchAny" was used.
A condition is defined by specifying the following properties:
- Input string - Condition input specifies which item to use as an input for the condition evaluation. Condition input is an arbitrary string that can include server variables and back-references to prior condition patterns and/or to rule patterns.
- Pattern – A pattern to look for in the condition input. A pattern can be specified by using either regular expression syntax or by using wildcard syntax. The type of pattern to use in a condition depends on the value of the patternSyntax flag defined for the rule to which this condition belongs. This condition type has two related attributes that control pattern matching:
Rule Action
A rewrite rule action is performed when the input string matches the rule pattern and the condition evaluation has succeeded (depending on rule configuration, either all conditions matched or any one or more of the conditions matched). There are two types of actions available, and the type attribute of the <action> configuration element specifies which action the rule has to perform. The following sections describe the action types and the configuration options related to each of them.
Rewrite Action
Rewrite action replaces the current rule input string with a substitution string. The substitution string is specified within the value attribute of the <action> element of the rule. Substitution string is a free form string that can include the following:
- Back-references to the condition and rule patterns. (For more information, see the section about how to use back-references.)
- Server variables. (For more information, see the section about how to use server variables.)
None Action
None action is used to specify that no action should be performed.
Accessing Response Headers from Rewrite Rules
The content of any response HTTP header can be obtained from within a rewrite rule by using the same syntax as for accessing server variables, but with a special naming convention. If a server variable starts with "RESPONSE_", then it stores the content of an HTTP response header whose name is determined by using the following naming convention:
- All underscore (“_”) symbols in the name are converted to dash symbols (“-”).
- “RESPONSE_” prefix is removed
For example, the following pre-condition is used to evaluate the content of the Content-Type header:
<preCondition name="IsHTML">
<add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" />
</preCondition>
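The same naming convention should apply to other response headers. As a sketch, and assuming the response may carry a Content-Encoding header, the hypothetical pre-condition below reads it through the {RESPONSE_CONTENT_ENCODING} variable:

```xml
<preConditions>
  <!-- RESPONSE_CONTENT_ENCODING: prefix removed, underscores become dashes,
       which gives the Content-Encoding response header -->
  <preCondition name="IsNotGzipped">
    <add input="{RESPONSE_CONTENT_ENCODING}" pattern="gzip" negate="true" />
  </preCondition>
</preConditions>
```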
Setting Request Headers and Server Variables
Inbound rewrite rules in URL Rewrite Module 2.0 can be used to set request headers and server variables.
Allowed Server Variables List
Global rewrite rules can be used to set any request headers and server variables, as well as overwrite any existing ones. Distributed rewrite rules can only set or overwrite the request headers and server variables that are defined in the allowed list for server variables, <allowedServerVariables>. If a distributed rewrite rule attempts to set a server variable or an HTTP header that is not listed in the <allowedServerVariables> collection, a runtime error will be generated by the URL Rewrite Module. By default the <allowedServerVariables> collection is stored in the applicationHost.config file and can be modified only by an IIS server administrator.
Using Inbound Rewrite Rules to Set Request Headers and Server Variables
The <serverVariables> element of a rule is used to define a collection of server variables and HTTP headers to set. They will be set only if the rule pattern has matched and the condition evaluation has succeeded (depending on rule configuration, either all conditions matched or any one or more of the conditions matched). Each item in the <serverVariables> collection consists of the following:
- Name - specifies the name of the server variable to set.
- Value - specifies the value of the server variable. Value is a free form string that can include:
  - Back-references to the condition and rule patterns. (For more information, see the section about how to use back-references.)
  - Server variables. (For more information, see the section about how to use server variables.)
- Replace flag - specifies whether to overwrite the value of the server variable if it already exists. By default the replace functionality is enabled.
The following example rule rewrites the requested URL and also sets the server variable with name X_REQUESTED_URL_PATH:
<rule name="Rewrite to index.php" stopProcessing="true">
<match url="(.*)\.htm$" />
<serverVariables>
<set name="X_REQUESTED_URL_PATH" value="{R:1}" />
</serverVariables>
<action type="Rewrite" url="index.php?path={R:1}" />
</rule>
Note: for the above example to work it is required to add X_REQUESTED_URL_PATH to the <allowedServerVariables> collection:
<rewrite>
<allowedServerVariables>
<add name="X_REQUESTED_URL_PATH" />
</allowedServerVariables>
</rewrite>
Note About Request Headers
The request headers are set by using the same mechanism as for server variables, but with a special naming convention. If a server variable name in the <serverVariables> collection starts with "HTTP_", then an HTTP request header is set in accordance with the following naming convention:
- All underscore (“_”) symbols in the name are converted to dash symbols (“-”).
- All letters are converted to lower case.
- “HTTP_” prefix is removed
For example, the following configuration is used to set the custom x-original-host header on the request:
<set name="HTTP_X_ORIGINAL_HOST" value="{HTTP_HOST}" />
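Placed in context, that <set> element might appear in a configuration like the sketch below. The rule name and the backend host are hypothetical. Rewriting to an absolute http:// URL generally assumes a reverse proxy setup such as Application Request Routing, and the <allowedServerVariables> entry would normally be added at the server level.

```xml
<rewrite>
  <allowedServerVariables>
    <!-- Needed so that a distributed rule is allowed to set this header -->
    <add name="HTTP_X_ORIGINAL_HOST" />
  </allowedServerVariables>
  <rules>
    <rule name="Forward original host">
      <match url="(.*)" />
      <serverVariables>
        <!-- Sets the x-original-host request header; replace="true" is the default -->
        <set name="HTTP_X_ORIGINAL_HOST" value="{HTTP_HOST}" replace="true" />
      </serverVariables>
      <action type="Rewrite" url="http://backend.example.internal/{R:1}" />
    </rule>
  </rules>
</rewrite>
```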
Setting Response Headers
Outbound rewrite rules in URL Rewrite Module 2.0 can be used to set new or modify existing response HTTP headers. The response HTTP headers are accessed within the outbound rules by using the same syntax as for server variables and by using the naming convention as described in Accessing Response Headers from Rewrite Rules.
If the serverVariable attribute of the <match> element of an outbound rewrite rule has a value then it indicates that this rewrite rule will operate on the content of the corresponding response header. For example, the following rule sets the response header "x-custom-header":
<outboundRules>
<rule name="Set Custom Header">
<match serverVariable="RESPONSE_X_Custom_Header" pattern="^$" />
<action type="Rewrite" value="Something" />
</rule>
</outboundRules>
The pattern of the rewrite rule will be applied to the content of the specified response header, and if the rule's pattern and optional conditions evaluate successfully, then the value of that response header will be rewritten.
Regular expression patterns and easy access to existing request and response headers within a rewrite rule provide a lot of flexibility when defining the logic for rewriting response HTTP headers. For example, the following rewrite rule can be used to modify the content of the Location header in redirection responses:
<outboundRules>
<!-- This rule changes the domain in the HTTP location header for redirection responses -->
<rule name="Change Location Header">
<match serverVariable="RESPONSE_LOCATION" pattern="^http://[^/]+/(.*)" />
<conditions>
<add input="{RESPONSE_STATUS}" pattern="^301" />
</conditions>
<action type="Rewrite" value="http://{HTTP_HOST}/{R:1}"/>
</rule>
</outboundRules>
Using Back-references in Rewrite Rules
Parts of rule or condition inputs can be captured in back-references. These can then be used to construct substitution URLs within rule actions or to construct input strings for rule conditions.
Back-references are generated in different ways, depending on which kind of pattern syntax is used for the rule. When ECMAScript pattern syntax is used, a back-reference can be created by putting parentheses around the part of the pattern that must capture the back-reference. For example, the pattern ([0-9]+)/([a-z]+)\.html will capture 07 and article in back-references from this string: 07/article.html. When Wildcard pattern syntax is used, back-references are always created when an asterisk symbol (*) is used in the pattern.
Usage of back-references is the same regardless of which pattern syntax was used to capture them. Back-references can be used in the following locations within rewrite rules:
- In condition input string
- In rule action, specifically:
  - url attribute of Rewrite and Redirect action in inbound rules
  - value attribute of Rewrite action in outbound rules
  - statusLine and responseLine of CustomResponse action
- In key parameter to the rewrite map
Back-references to condition patterns are identified by {C:N}, where N is from 0 to 9; back-references to the rule pattern are identified by {R:N}, where N is from 0 to 9. Note that for both types of back-references, {R:0} and {C:0} contain the entire matched string.
For example, given this condition pattern:
^(www\.)(.*)$
and the input string www.foo.com, the back-references will be indexed as follows:
{C:0} - www.foo.com
{C:1} - www.
{C:2} - foo.com
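An inbound rule could put these back-references to use as in the sketch below, which assumes the pattern above is applied to {HTTP_HOST}. The rule name and the choice of a permanent redirect are illustrative only.

```xml
<rule name="Redirect to host without www" stopProcessing="true">
  <match url="(.*)" />
  <conditions>
    <!-- For www.foo.com: {C:1} = "www." and {C:2} = "foo.com" -->
    <add input="{HTTP_HOST}" pattern="^(www\.)(.*)$" />
  </conditions>
  <!-- {C:2} is the host without the www. prefix; {R:1} is the requested URL path -->
  <action type="Redirect" url="http://{C:2}/{R:1}" redirectType="Permanent" />
</rule>
```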
Tracking Capture Groups Across Conditions
By default, within a rule action, you can use the back-references to the rule pattern and to the last matched condition of that rule. For example, in this rule:
<rule name="Back-references with trackAllCaptures set to false">
<match url="^article\.aspx" />
<conditions>
<add input="{QUERY_STRING}" pattern="p1=([0-9]+)" />
<add input="{QUERY_STRING}" pattern="p2=([a-z]+)" />
</conditions>
<action type="Rewrite" url="article.aspx/{C:1}" /> <!-- rewrite action uses back-references to the last matched condition -->
</rule>
The back-reference {C:1} will always contain the value of the capture group from the second condition, which will be the value of query string parameter p2. The value of p1 will not be available as a back-reference.
In URL Rewrite Module 2.0, it is possible to change how capture groups are indexed. Setting trackAllCaptures to true on the <conditions> collection makes the capture groups from all matched conditions available through back-references. For example, in this rule:
<rule name="Back-references with trackAllCaptures set to true">
<match url="^article\.aspx" />
<conditions trackAllCaptures="true">
<add input="{QUERY_STRING}" pattern="p1=([0-9]+)" />
<add input="{QUERY_STRING}" pattern="p2=([a-z]+)" />
</conditions>
<action type="Rewrite" url="article.aspx/{C:1}/{C:2}" /> <!-- rewrite action uses back-references to both conditions -->
</rule>
The back-reference {C:1} will contain the value of the capture group from the first condition, and the back-reference {C:2} will contain the value of the capture group from the second condition.
When trackAllCaptures is set to true, the condition capture back-references are identified by {C:N}, where N is from 0 to the total number of capture groups across all the rule's conditions. {C:0} contains the entire matched string from the first matched condition. For example for these two conditions:
<conditions trackAllCaptures="true">
<add input="{REQUEST_URI}" pattern="^/([a-zA-Z]+)/([0-9]+)/$" />
<add input="{QUERY_STRING}" pattern="p2=([a-z]+)" />
</conditions>
If {REQUEST_URI} contains "/article/23/" and {QUERY_STRING} contains "p1=123&p2=abc" then the condition back-references will be indexed as follows:
{C:0} - "/article/23/"
{C:1} - "article"
{C:2} - "23"
{C:3} - "abc"
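A rule action could then reference all three capture groups, as in this sketch. It reuses the two conditions above unchanged; the rule name, the content.aspx target, and its query string parameter names are hypothetical.

```xml
<rule name="Use captures from all conditions">
  <match url=".*" />
  <conditions trackAllCaptures="true">
    <add input="{REQUEST_URI}" pattern="^/([a-zA-Z]+)/([0-9]+)/$" />
    <add input="{QUERY_STRING}" pattern="p2=([a-z]+)" />
  </conditions>
  <!-- {C:1} and {C:2} come from the first condition, {C:3} from the second;
       ampersands are written as &amp; to keep the XML well-formed -->
  <action type="Rewrite" url="content.aspx?type={C:1}&amp;id={C:2}&amp;p2={C:3}" appendQueryString="false" />
</rule>
```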
Logging of Rewritten URLs into IIS logs
A distributed inbound rewrite rule can be configured to log rewritten URLs into the IIS log files, instead of logging the original URLs requested by the HTTP client. To enable logging of rewritten URLs, use the logRewrittenUrl attribute of the rule's <action> element, for example:
<rule name="set server variables">
<match url="^article/(\d+)$" />
<action type="Rewrite" url="article.aspx?id={R:1}" logRewrittenUrl="true" />
</rule>
Source:
http://www.iis.net/learn/extensions/url-rewrite-module/url-rewrite-module-20-configuration-reference