Nginx - Rewrite Module
The original purpose of this module (as the name suggests) is to perform URL rewriting. This mechanism allows you to get rid of ugly URLs containing multiple parameters, for instance, http://website.com/article.php?id=1234&comment=32. Such URLs are particularly uninformative and meaningless for a regular visitor. Instead, links to your website will contain useful information that indicates the nature of the page you are about to visit. The URL given in the example becomes http://website.com/article-1234-32-US-economy-strengthens.html. This solution is more interesting not only for your visitors but also for search engines: URL rewriting is a key element of Search Engine Optimization (SEO).
The principle behind this mechanism is simple — it consists of rewriting the URI of the client request after it is received, before serving the file. Once rewritten, the URI is matched against location blocks in order to find the configuration that should be applied to the request. The technique is further detailed in the coming sections.
Reminder on Regular Expressions
First and foremost, this module requires a certain understanding of regular expressions, also known as regexes or regexps. Indeed, URL rewriting is performed by the rewrite directive, which accepts a pattern followed by the replacement URI.
Purpose
The first question we must answer is: What's the purpose of regular expressions? To put it simply, the main purpose is to verify that a string matches a pattern. The pattern itself is written in a dedicated syntax that allows you to define extremely complex and precise rules.
String | Pattern | Matches? | Explanation |
hello | ^hello$ | Yes | The string begins with the character h (^h), followed by e, l, l, and then ends with o (o$). |
hell | ^hello$ | No | The string begins with the character h (^h), followed by e, l, l, but does not end with o. |
Hello | ^hello$ | Depends | If the engine performing the match is case-sensitive, the string doesn't match the pattern; a case-insensitive engine accepts it. |
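The three rows above can be reproduced with Python's re module, whose syntax is close to PCRE for simple patterns like this one. In this sketch, the re.IGNORECASE flag stands in for a case-insensitive engine; the matches helper is a hypothetical name used only for illustration.

```python
import re

PATTERN = r"^hello$"

def matches(string, case_sensitive=True):
    """Return True if `string` matches PATTERN, optionally ignoring case."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(PATTERN, string, flags) is not None

print(matches("hello"))                        # True
print(matches("hell"))                         # False: does not end with o
print(matches("Hello"))                        # False with a case-sensitive engine
print(matches("Hello", case_sensitive=False))  # True with a case-insensitive engine
```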
This concept becomes a lot more interesting when complex patterns are employed, such as one that validates e-mail addresses: ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$ (applied case-insensitively, since it only lists uppercase letters). Checking that an e-mail address is well-formed would require a great deal of procedural code, while a single regular expression does all of the work.
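As a quick illustration, the e-mail pattern quoted above can be tried in Python. This is a sketch only: the looks_like_email helper and the sample addresses are hypothetical, and the pattern checks the general shape of an address rather than full standards compliance.

```python
import re

# Pattern quoted from the text; it lists only uppercase letters,
# so it is compiled with re.IGNORECASE to accept lowercase addresses too.
EMAIL = re.compile(r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$", re.IGNORECASE)

def looks_like_email(address):
    """Return True if the address has the general shape accepted by the pattern."""
    return EMAIL.match(address) is not None

for candidate in ["john.doe@website.com", "jane+news@mail.example.org",
                  "no-at-sign.com", "trailing@dot."]:
    print(candidate, looks_like_email(candidate))
```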
PCRE Syntax
The syntax that Nginx employs originates from the Perl Compatible Regular Expression (PCRE) library. It's the most commonly used form of regular expression, and nearly everything you learn here remains valid for other language variations.
In its simplest form, a pattern is composed of one character, for example, x. We can match strings against this pattern. Does example match the pattern x? Yes, example contains the character x. A pattern can also accept a choice of characters: the pattern [a-z] matches any character between a and z, and a set can even combine letters and digits: [a-z0-9]. Consequently, the pattern hell[a-z0-9] matches the strings hello and hell4, but not hell or hell!.
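The same checks can be run with Python's re.search, which, like an unanchored PCRE match, looks for the pattern anywhere in the string. A minimal sketch:

```python
import re

# A one-character pattern: "example" contains an x, so a match is found.
print(re.search(r"x", "example") is not None)  # True

# "hell" followed by exactly one character from the ranges a-z or 0-9.
PATTERN = r"hell[a-z0-9]"
results = {s: re.search(PATTERN, s) is not None
           for s in ["hello", "hell4", "hell", "hell!"]}
print(results)  # {'hello': True, 'hell4': True, 'hell': False, 'hell!': False}
```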
You probably noticed that we employed the characters [ and ]. These are called metacharacters and have a special effect on the pattern. There are a total of 11 metacharacters, and all play a different role. If you want to actually create a pattern containing one of these characters, you need to escape them with the \ character.
Metacharacter | Description |
^ Beginning | The entity after this character must be found at the beginning. Example pattern: ^h Matching strings: hello, h, hh Non-matching strings: character, ssh |
$ End | The entity before this character must be found at the end. Example pattern: e$ Matching strings: sample, e, file Non-matching strings: extra, shell |
. Any | Matches any character. Example pattern: hell. Matching strings: hello, hellx, hell5, hell! Non-matching strings: hell, helo |
[ ] Set | Matches any character within the specified set. Syntax: [a-z] for a range, [abcd] for a set, and [a-z0-9] for two ranges. Note that if you want to include the - character in a set, you need to insert it right after the [ or just before the ]. Example pattern: hell[a-y123-] Matching strings: hello, hell1, hell2, hell3, hell- Non-matching strings: hellz, hell4, heloo, he-llo |
[^ ] Negate set | Matches any character that is not within the specified set. Example pattern: hell[^a-np-z0-9] Matching strings: hello, hell; Non-matching strings: hella, hell5 |
| Alternation | Matches the entity placed either before or after the |. Example pattern: hello|welcome Matching strings: hello, welcome, helloes, awelcome Non-matching strings: hell, ellow, owelcom |
( ) Grouping | Groups a set of entities, often to be used in conjunction with |. Example pattern: ^(hello|hi) there$ Matching strings: hello there, hi there Non-matching strings: hey there, ahoy there |
\ Escape | Allows you to escape special characters. Example pattern: Hello\. Matching strings: Hello., Hello. How are you?, Hi! Hello... Non-matching strings: Hello, Hello, how are you? |
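To double-check the table, its examples can be run through Python's re.search, which looks for the pattern anywhere in the string, as an unanchored PCRE match does. The patterns used here behave identically in both engines. In this sketch, EXAMPLES and check are hypothetical names.

```python
import re

# (pattern, strings that should match, strings that should not), from the table
EXAMPLES = [
    (r"^h", ["hello", "h", "hh"], ["character", "ssh"]),
    (r"e$", ["sample", "e", "file"], ["extra", "shell"]),
    (r"hell.", ["hello", "hellx", "hell5", "hell!"], ["hell", "helo"]),
    (r"hell[a-y123-]", ["hello", "hell1", "hell2", "hell3", "hell-"],
     ["hellz", "hell4", "heloo", "he-llo"]),
    (r"hell[^a-np-z0-9]", ["hello", "hell;"], ["hella", "hell5"]),
    (r"hello|welcome", ["hello", "welcome", "helloes", "awelcome"],
     ["hell", "ellow", "owelcom"]),
    (r"^(hello|hi) there$", ["hello there", "hi there"], ["hey there", "ahoy there"]),
    (r"Hello\.", ["Hello.", "Hello. How are you?", "Hi! Hello..."],
     ["Hello", "Hello, how are you?"]),
]

def check(examples):
    """Return the (pattern, string) pairs whose result disagrees with the table."""
    errors = []
    for pattern, good, bad in examples:
        errors += [(pattern, s) for s in good if not re.search(pattern, s)]
        errors += [(pattern, s) for s in bad if re.search(pattern, s)]
    return errors

print(check(EXAMPLES))  # [] when every example behaves as described
```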
Quantifiers
So far, you are able to express simple patterns with a limited number of characters. Quantifiers let you specify how many times an entity may be repeated:
Quantifier | Description |
* 0 or more times | The entity preceding * must be found 0 or more times. Example pattern: he*llo Matching strings: hllo, hello, heeeello Non-matching strings: hallo, ello |
+ 1 or more times | The entity preceding + must be found 1 or more times. Example pattern: he+llo Matching strings: hello, heeeello Non-matching strings: hllo, helo |
? 0 or 1 time | The entity preceding ? must be found 0 or 1 time. Example pattern: he?llo Matching strings: hello, hllo Non-matching strings: heello, heeeello |
{x} x times | The entity preceding {x} must be found x times. Example pattern: he{3}llo Matching strings: heeello, oh heeello there! Non-matching strings: hello, heello, heeeello |
{x,} At least x times | The entity preceding {x,} must be found at least x times. Example pattern: he{3,}llo Matching strings: heeello, heeeeeeello Non-matching strings: hllo, hello, heello |
{x,y} x to y times | The entity preceding {x,y} must be found between x and y times. Example pattern: he{2,4}llo Matching strings: heello, heeello, heeeello Non-matching strings: hello, heeeeello |
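One way to compare the quantifiers side by side is to count how many e's each one accepts between h and llo. The following Python sketch anchors each pattern (^...$) so that the whole string must match, unlike the unanchored examples in the table. The accepted_counts helper is a hypothetical name.

```python
import re

QUANTIFIERS = ["he*llo", "he+llo", "he?llo", "he{3}llo", "he{3,}llo", "he{2,4}llo"]

def accepted_counts(pattern, max_e=7):
    """Return the numbers of e's (0..max_e) that the anchored pattern accepts."""
    anchored = re.compile("^" + pattern + "$")  # the whole string must match
    return [n for n in range(max_e + 1) if anchored.match("h" + "e" * n + "llo")]

for q in QUANTIFIERS:
    print(f"{q:<12} {accepted_counts(q)}")
```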
As you probably noticed, the { and } characters used in regular expressions conflict with the block delimiters of the Nginx configuration syntax. If you want to write a regular expression pattern that includes curly brackets, you need to place the pattern between quotes (single or double):
rewrite hel{2,}o /hello.php; # invalid
rewrite "hel{2,}o" /hello.php; # valid
rewrite 'hel{2,}o' /hello.php; # valid
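When Nginx configuration is generated by a script, the quoting rule above can be applied automatically. The following Python sketch uses a hypothetical helper, rewrite_directive, that wraps the pattern in double quotes whenever it contains a curly bracket. It covers only the rule stated here, not every case in which Nginx may require quoting.

```python
import re

def rewrite_directive(regex, replacement):
    """Build a rewrite line, quoting the regex if it contains { or } (hypothetical helper)."""
    if "{" in regex or "}" in regex:
        regex = f'"{regex}"'
    return f"rewrite {regex} {replacement};"

print(rewrite_directive("hel{2,}o", "/hello.php"))         # rewrite "hel{2,}o" /hello.php;
print(rewrite_directive("^/search/(.*)$", "/search.php"))  # rewrite ^/search/(.*)$ /search.php;

# What the quoted pattern accepts: two or more l's between "he" and "o".
print([s for s in ["helo", "hello", "helllo"] if re.search("hel{2,}o", s)])
```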
Captures
One last feature of the regular expression mechanism is the ability to capture sub-expressions. Whatever text matches the part of the pattern placed between parentheses ( ) is captured and can be used after the matching process.
Here are a couple of examples to illustrate the principle:
Pattern | String | Captured |
^(hello|hi) (sir|mister)$ | hello sir | $1 = hello, $2 = sir |
^(hello (sir))$ | hello sir | $1 = hello sir, $2 = sir |
^(.*)$ | nginx rocks | $1 = nginx rocks |
^(.{1,3})([0-9]{1,4})([?!]{1,2})$ | abc1234!? | $1 = abc, $2 = 1234, $3 = !? |
^/(?<folder>[^/]*)/(?<file>.*)$ | /admin/doc | $folder = admin, $file = doc |
Named captures are also supported, as shown in the last row: the (?<name>...) syntax stores the captured text in a variable called $name.
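The captures in the table can be reproduced in Python, where match.groups() returns $1, $2, and so on in order. One syntax difference is worth noting: Python's re module spells named groups (?P<name>...), whereas PCRE (and therefore Nginx) also accepts the (?<name>...) form used above. A sketch:

```python
import re

CASES = [
    (r"^(hello|hi) (sir|mister)$", "hello sir"),
    (r"^(hello (sir))$", "hello sir"),
    (r"^(.*)$", "nginx rocks"),
    (r"^(.{1,3})([0-9]{1,4})([?!]{1,2})$", "abc1234!?"),
]
for pattern, string in CASES:
    print(pattern, re.match(pattern, string).groups())

# Python-specific named-group syntax: (?P<name>...)
named = re.match(r"^/(?P<folder>[^/]*)/(?P<file>.*)$", "/admin/doc")
print(named.groupdict())  # {'folder': 'admin', 'file': 'doc'}
```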
When you use a regular expression in Nginx, for example, in the context of a location block, the buffers that you capture can be employed in later directives:
server {
server_name website.com;
location ~* ^/(downloads|files)/(.*)$ {
add_header Capture1 $1;
add_header Capture2 $2;
}
}
In the preceding example, the location block will match the request URI against a regular expression. A couple of URIs that would apply here: /downloads/file.txt, /files/archive.zip, or even /files/docs/report.doc. Two parts are captured: $1 will contain either downloads or files and $2 will contain whatever comes after /downloads/ or /files/. Note that the add_header directive is employed here to append arbitrary headers to the client response for the sole purpose of demonstration.
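To see which values the two headers would receive, the location pattern can be applied in Python with re.IGNORECASE, mirroring the case-insensitive ~* modifier. The capture_headers function is a hypothetical stand-in for the two add_header directives, not part of Nginx.

```python
import re

# Location pattern from the example; ~* means case-insensitive matching.
LOCATION = re.compile(r"^/(downloads|files)/(.*)$", re.IGNORECASE)

def capture_headers(uri):
    """Return the headers the add_header directives would add, or {} if no match."""
    m = LOCATION.match(uri)
    if m is None:
        return {}  # the location block does not apply to this URI
    return {"Capture1": m.group(1), "Capture2": m.group(2)}

for uri in ["/downloads/file.txt", "/files/archive.zip",
            "/files/docs/report.doc", "/images/logo.png"]:
    print(uri, capture_headers(uri))
```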
Internal requests
Nginx differentiates external and internal requests. External requests directly originate from the client; the URI is then matched against possible location blocks:
server {
server_name website.com;
location = /document.html {
deny all; # example directive
}
}
A client request to http://website.com/document.html would directly fall into the above location block.
In contrast, internal requests are triggered by Nginx itself via specific directives. Among the default Nginx modules, several directives are capable of producing internal requests: error_page, index, rewrite, try_files, add_before_body, add_after_body, the include SSI command, and more.
There are two different kinds of internal requests:
- Internal redirects: Nginx redirects the client request internally. The URI is changed, and the request may therefore match another location block and become eligible for different settings. The most common case of internal redirects is the use of the rewrite directive, which allows you to rewrite the request URI.
- Sub-requests: Additional requests that are triggered internally to generate content that is complementary to the main request. A simple example would be with the Addition module. The add_after_body directive allows you to specify a URI that will be processed after the original one, the resulting content being appended to the body of the original request. The SSI module also makes use of sub-requests to insert content with the include command.
error_page
Detailed in the module directives of the Nginx HTTP Core module, error_page allows you to define the server behavior when a specific error code occurs. The simplest form is to assign a URI to an error code:
server {
server_name website.com;
error_page 403 /errors/forbidden.html;
error_page 404 /errors/not_found.html;
}
When a client attempts to access a URI that triggers one of these errors, Nginx serves the page corresponding to the error code. However, it does not simply send the error page to the client: it initiates a completely new request based on the new URI.
Consequently, you can end up falling back on a different configuration, like in the following example:
server {
server_name website.com;
root /var/www/vhosts/website.com/httpdocs/;
error_page 404 /errors/404.html;
location /errors/ {
alias /var/www/common/errors/;
internal;
}
}
When a client attempts to load a document that does not exist, the request initially results in a 404 error. We employed the error_page directive to specify that 404 errors should create an internal redirect to /errors/404.html. As a result, Nginx generates a new request with the URI /errors/404.html. This URI falls under the location /errors/ block, so that block's configuration applies.
Logs can prove to be particularly useful when working with redirects and URL rewrites. Be aware that information on internal redirects shows up in the logs only if you set the level of the error_log directive to debug, which requires an Nginx binary built with the --with-debug option. Alternatively, specifying rewrite_log on; wherever you need it makes the rewrite engine log its operations at the notice level.
A raw, but trimmed, excerpt from the debug log summarizes the mechanism:
->http request line: "GET /page.html HTTP/1.1"
->http uri: "/page.html"
->test location: "/errors/"
->using configuration ""
->http filename: "/var/www/vhosts/website.com/httpdocs/page.html"
-> open() "/var/www/vhosts/website.com/httpdocs/page.html" failed (2: No such file or directory), client: 127.0.0.1, server: website.com, request: "GET /page.html HTTP/1.1", host:"website.com"
->http finalize request: 404, "/page.html?" 1
->http special response: 404, "/page.html?"
->internal redirect: "/errors/404.html?"
->test location: "/errors/"
->using configuration "/errors/"
->http filename: "/var/www/common/errors/404.html"
->http finalize request: 0, "/errors/404.html?" 1
Note that the use of the internal directive in the location block forbids clients from accessing the /errors/ directory. This location can only be accessed from an internal redirect.
The mechanism is the same for the index directive: if the request URI ends with a slash, designating a directory rather than a file, Nginx attempts to serve the specified index page by triggering an internal redirect.
Rewrite
Although error_page is not actually part of the Rewrite module, detailing its functionality provides a solid introduction to the way Nginx handles requests.
Similar to how the error_page directive redirects to another location, rewriting the URI with the rewrite directive generates an internal redirect:
server {
server_name website.com;
root /var/www/vhosts/website.com/httpdocs/;
location /storage/ {
internal;
alias /var/www/storage/;
}
location /documents/ {
rewrite ^/documents/(.*)$ /storage/$1;
}
}
A client query to http://website.com/documents/file.txt initially matches the second location block (location /documents/). However, the block contains a rewrite instruction that transforms the URI from /documents/file.txt to /storage/file.txt. The URI transformation reinitializes the process — the new URI is matched against the location blocks. This time, the first location block (location /storage/) matches the URI (/storage/file.txt).
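The following Python sketch replays this sequence in a deliberately simplified model. It assumes that only prefix locations exist (the longest matching prefix wins) and that each location carries at most one rewrite rule. The LOCATIONS table and the function names are hypothetical. The sketch illustrates how a rewritten URI is matched against the locations again; it does not reproduce Nginx's actual location search.

```python
import re

# Hypothetical model of the configuration above.
LOCATIONS = {
    "/storage/": {"internal": True, "rewrite": None},
    "/documents/": {"internal": False,
                    "rewrite": (r"^/documents/(.*)$", r"/storage/\1")},
}

def find_location(uri):
    """Return the longest location prefix that matches uri, or None."""
    candidates = [prefix for prefix in LOCATIONS if uri.startswith(prefix)]
    return max(candidates, key=len, default=None)

def process(uri):
    """Return the list of locations used to handle an external request."""
    trace, internal = [], False
    while True:
        location = find_location(uri)
        if location is None:
            return trace + ["server"]  # no location matched: server-level settings apply
        if LOCATIONS[location]["internal"] and not internal:
            return trace + ["404"]  # internal-only location requested by a client
        trace.append(location)
        rule = LOCATIONS[location]["rewrite"]
        match = re.search(rule[0], uri) if rule else None
        if match is None:
            return trace  # no rewrite applies: this location serves the request
        uri = match.expand(rule[1])  # the replacement becomes the whole new URI
        internal = True  # the new URI is processed as an internal redirect

print(process("/documents/file.txt"))  # ['/documents/', '/storage/']
print(process("/storage/file.txt"))    # ['404']
```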
Again, a quick peek at the debug log confirms the mechanism:
->http request line: "GET /documents/file.txt HTTP/1.1"
->http uri: "/documents/file.txt"
->test location: "/storage/"
->test location: "/documents/"
->using configuration "/documents/"
->http script regex: "^/documents/(.*)$"
->"^/documents/(.*)$" matches "/documents/file.txt", client: 127.0.0.1, server: website.com, request: "GET /documents/file.txt HTTP/1.1", host: "website.com"
->rewritten data: "/storage/file.txt", args: "", client: 127.0.0.1, server: website.com, request: "GET /documents/file.txt HTTP/1.1", host: "website.com"
->test location: "/storage/"
->using configuration "/storage/"
->http filename: "/var/www/storage/file.txt"
->HTTP/1.1 200 OK
->http output filter "/storage/file.txt?"
Infinite Loops
With all of the different syntaxes and directives, you may easily get confused. Worse — you might get Nginx confused. This happens, for instance, when your rewrite rules are redundant and cause internal redirects to loop infinitely:
server {
server_name website.com;
location /documents/ {
rewrite ^(.*)$ /documents/$1;
}
}
You thought you were doing well, but this configuration actually redirects /documents/anything internally to /documents//documents/anything. Moreover, since the location patterns are re-evaluated after an internal redirect, /documents//documents/anything in turn becomes /documents//documents//documents/anything, and so on.
Here is the corresponding excerpt from the debug log:
->test location: "/documents/"
->using configuration "/documents/"
->rewritten data: "/documents//documents/file.txt", [...]
->test location: "/documents/"
->using configuration "/documents/"
->rewritten data: "/documents//documents//documents/file.txt" [...]
->test location: "/documents/"
->using configuration "/documents/"
->rewritten data: "/documents//documents//documents//documents/file.txt" [...]
->[...]
You probably wonder whether this goes on indefinitely. The answer is no: the number of cycles is restricted to 10. Nginx allows only 10 internal redirects per request; past this limit, it produces a 500 Internal Server Error.
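A small Python simulation makes the limit visible. It repeatedly applies the looping rule from the example for as long as the URI still falls under /documents/, and gives up with a 500 status once the 10 allowed internal redirects have been used. The simulate_loop function is a hypothetical illustration, not Nginx code.

```python
import re

MAX_INTERNAL_REDIRECTS = 10  # limit described in the text

def simulate_loop(uri):
    """Apply `rewrite ^(.*)$ /documents/$1;` inside `location /documents/` repeatedly.

    Returns (number of internal redirects performed, resulting status)."""
    redirects = 0
    while uri.startswith("/documents/"):  # the location block matches again
        if redirects == MAX_INTERNAL_REDIRECTS:
            return redirects, "500 Internal Server Error"
        uri = re.search(r"^(.*)$", uri).expand(r"/documents/\1")
        redirects += 1
    return redirects, "request served"

print(simulate_loop("/documents/file.txt"))  # (10, '500 Internal Server Error')
```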
Server Side Includes (SSI)
A potential source of sub-requests is the Server Side Include (SSI) module. The purpose of SSI is for the server to parse documents before sending the response to the client in a somewhat similar fashion to PHP or other preprocessors.
Within a regular HTML file (for example), you have the possibility to insert tags corresponding to commands interpreted by Nginx:
<html>
<head>
<!--# include file="header.html" -->
</head>
<body>
<!--# include file="body.html" -->
</body>
</html>
Nginx processes these two commands; in this case, it reads the contents of header.html and body.html and inserts them into the document source, which is then sent to the client.
Several commands are at your disposal; they are detailed in the SSI Module section. The one we are interested in for now is the include command — including a file into another file:
<!--# include virtual="/footer.php?id=123" -->
The specified file is not just opened and read from a static location. Instead, a whole subrequest is processed by Nginx, and the body of the response is inserted instead of the include tag.
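The substitution step can be pictured with a toy Python processor. Each include tag is located with a regular expression and replaced by whatever a hypothetical subrequest callback returns for its URI. Nginx's real SSI parser is far more complete; fake_subrequest merely stands in for the body Nginx would obtain by processing a sub-request.

```python
import re

INCLUDE = re.compile(r'<!--#\s*include\s+(?:file|virtual)="([^"]*)"\s*-->')

def process_ssi(document, subrequest):
    """Replace each include tag with the body returned by subrequest(uri)."""
    return INCLUDE.sub(lambda m: subrequest(m.group(1)), document)

def fake_subrequest(uri):
    """Hypothetical stand-in for the response body of a sub-request."""
    return f"[body of {uri}]"

page = '<body>\n<!--# include virtual="/footer.php?id=123" -->\n</body>'
print(process_ssi(page, fake_subrequest))
```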
Conditional Structure
The Rewrite module introduces a new set of directives and blocks, among which is the if conditional structure:
server {
if ($request_method = POST) {
[…]
}
}
This gives you the possibility to apply a configuration according to the specified condition. If the condition is true, the configuration is applied; otherwise, it isn't.
The following table describes the different syntaxes accepted when forming a condition:
Operator | Description |
None | The condition is true if the specified variable or data is neither an empty string nor the string 0 (before version 1.0.1, any string starting with the character 0 was considered false): if ($string) { |
=, != | The condition is true if the argument preceding the = symbol is equal to the argument following it. The following example can be read as "if the request_method is equal to POST, then apply the configuration": if ($request_method = POST) { The != operator does the opposite: "if the request method is different from GET, then apply the configuration": if ($request_method != GET) { |
~, ~*, !~, !~* | The condition is true if the argument preceding the ~ symbol matches the regular expression pattern placed after it: if ($request_filename ~ "\.txt$") { ~ is case-sensitive, ~* is case-insensitive. Use the ! symbol to negate the matching: if ($request_filename !~* "\.php$") { Note that you can insert capture buffers in the regular expression: if ($uri ~ "^/search/(.*)$") { |
-f, !-f | Tests the existence of the specified file: if (-f $request_filename) { Use !-f to test the non-existence of the file: if (!-f $request_filename) { |
-d, !-d | Similar to the -f operator, for testing the existence of a directory. |
-e, !-e | Similar to the -f operator, for testing the existence of a file, directory, or symbolic link. |
-x, !-x | Similar to the -f operator, for testing if a file exists and is executable. |
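To make the operator table concrete, here is a Python sketch of a hypothetical evaluate function covering each row. It assumes the current rule for bare variables (false only for an empty string or "0"). The file tests are approximated with os.path and os.access, which may differ from Nginx's own checks in edge cases such as symbolic links or the permissions of the worker process.

```python
import os
import re

# Approximate equivalents of the file-test operators (not Nginx's exact checks).
FILE_TESTS = {
    "-f": os.path.isfile,
    "-d": os.path.isdir,
    "-e": os.path.exists,
    "-x": lambda path: os.path.isfile(path) and os.access(path, os.X_OK),
}

def evaluate(operator, left, right=""):
    """Evaluate a simplified if condition of the form `left operator right`."""
    if operator == "":  # if ($var): false for an empty string or "0"
        return left not in ("", "0")
    if operator == "=":
        return left == right
    if operator == "!=":
        return left != right
    if operator in ("~", "~*", "!~", "!~*"):
        flags = re.IGNORECASE if operator.endswith("*") else 0
        found = re.search(right, left, flags) is not None
        return not found if operator.startswith("!") else found
    negate = operator.startswith("!")
    test = FILE_TESTS.get(operator.lstrip("!"))
    if test is None:
        raise ValueError(f"unsupported operator: {operator}")
    return test(left) != negate  # != on booleans acts as "xor negate"
```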
As of version 1.2.9, there is no else- or else if-like instruction. However, other directives allowing you to control the flow sequencing are available.
You might wonder: what are the advantages of using a location block over an if block? Indeed, in the following example, both seem to have the same effect:
if ($uri ~ /search/) {
[…]
}
location ~ /search/ {
[…]
}
As a matter of fact, the main difference lies in the directives that can be employed within either block: some can be inserted in an if block and some can't, whereas almost all directives are authorized within a location block, as you probably noticed in the directive listings. In general, it's best to only insert directives from the Rewrite module within an if block, as other directives were not originally intended for such usage.
Directives
The Rewrite module provides you with a set of directives that do more than just rewrite a URI. The following list describes these directives along with the context in which they can be employed:
rewrite
Context: server, location, if
As discussed previously, the rewrite directive allows you to rewrite the URI of the current request, thus restarting the processing of that request.
Syntax: rewrite regexp replacement [flag];
Where regexp is the regular expression the URI should match in order for the replacement to apply.
Flag may take one of the following values:
- last: The current rewrite rule should be the last to be applied. After its application, the new URI is processed by Nginx and a location block is searched for. However, further rewrite instructions will be disregarded.
- break: The current rewrite rule is applied, but Nginx does not initiate a new request for the modified URI (does not restart the search for matching location blocks). All further rewrite directives are ignored.
- redirect: Returns a 302 Moved Temporarily HTTP response, with the replacement URI set as the value of the Location header.
- permanent: Returns a 301 Moved Permanently HTTP response, with the replacement URI set as the value of the Location header.
If the replacement URI begins with http:// (or https://), Nginx automatically uses the redirect flag.
Note that the request URI processed by the directive:
- Is a relative URI: It does not contain the hostname and protocol. For a request such as http://website.com/documents/page.html, the request URI is /documents/page.html.
- Is decoded: The URI corresponding to a request such as http://website.com/my%20page.html would be /my page.html.
- Does not contain arguments: For a request such as http://website.com/page.php?id=1&p=2, the URI would be /page.php. When rewriting the URI, you don't need to include the arguments in the replacement URI, as Nginx appends them automatically. If you do not want Nginx to include the original arguments in the rewritten URI, insert a ? at the end of the replacement URI, as in rewrite ^/search/(.*)$ /search.php?q=$1?;
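These rules can be modeled in Python with urllib.parse. This sketch assumes the behavior documented for current Nginx versions: when the replacement already contains its own arguments, the original ones are appended after an &. The request_uri and with_arguments helpers are hypothetical names.

```python
from urllib.parse import unquote, urlsplit

def request_uri(url):
    """Return (decoded URI without host or arguments, argument string) for a URL."""
    parts = urlsplit(url)
    return unquote(parts.path), parts.query

def with_arguments(replacement, args):
    """Attach the original arguments to a rewritten URI unless it ends with '?'."""
    if replacement.endswith("?"):
        return replacement[:-1]  # trailing ?: the original arguments are dropped
    if not args:
        return replacement
    separator = "&" if "?" in replacement else "?"
    return replacement + separator + args

print(request_uri("http://website.com/my%20page.html"))    # ('/my page.html', '')
print(request_uri("http://website.com/page.php?id=1&p=2"))  # ('/page.php', 'id=1&p=2')
print(with_arguments("/search.php?q=nginx", "id=1&p=2"))    # /search.php?q=nginx&id=1&p=2
print(with_arguments("/search.php?q=nginx?", "id=1&p=2"))   # /search.php?q=nginx
```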
Examples:
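rewrite ^/search/(.*)$ /search.php?q=$1; # original arguments, if any, are appended after &
rewrite ^/search/(.*)$ /search.php?q=$1?; # the trailing ? discards the original arguments
rewrite ^ http://website.com; # 302 redirect, as the replacement starts with http://
rewrite ^ http://website.com permanent; # 301 redirect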
break
Context: server, location, if
The break directive is used to prevent further rewrite directives. Past this point, the URI is fixed and cannot be altered.
Example:
if (-f $request_filename) {
break; # break if the requested file exists
}
if ($uri ~ ^/search/(.*)$) {
set $query $1; # save the capture before the next regex match replaces $1
rewrite ^ /search.php?q=$query?;
}
This example rewrites URIs such as /search/anything to /search.php?q=anything. However, if the requested file exists (such as /search/index.html), the break instruction prevents Nginx from rewriting the URI.
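The decision made by this example can be traced with a short Python sketch. The FILES set is a hypothetical stand-in for the files present under the document root, and handle mirrors the two if blocks: an existing file stops processing, and otherwise /search/ URIs are rewritten.

```python
import re

FILES = {"/search/index.html"}  # hypothetical files present under the document root

def handle(uri, existing_files):
    """Return the URI after the example's rules are applied."""
    if uri in existing_files:  # file test succeeds: break, URI left untouched
        return uri
    match = re.search(r"^/search/(.*)$", uri)
    if match:  # set $query $1; then rewrite to search.php without the old arguments
        return "/search.php?q=" + match.group(1)
    return uri

print(handle("/search/nginx", FILES))       # /search.php?q=nginx
print(handle("/search/index.html", FILES))  # /search/index.html
```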
return
Context: server, location, if
Stops processing the request and returns the specified HTTP status code, optionally followed by a text.
Syntax: return code [text];
Where code is picked among the following status codes: 204, 400, 402 to 406, 408, 410, 411, 413, 416, and 500 to 504. In addition, you may use the Nginx-specific code 444 in order to close the connection without sending any response header or body. You may also specify, after the code, raw text that will be returned to the user as the response body.
Example:
if ($uri ~ ^/admin/) {
return 403;
# the instruction below is NOT executed
# as Nginx already completed the request
rewrite ^ http://website.com;
}
set
Context: server, location, if
Initializes or redefines a variable. Note that some variables cannot be redefined, for example, you are not allowed to alter $uri.
Syntax: set $variable value;
Examples:
set $var1 "some text";
if ($var1 ~ "^(.*) (.*)$") { # quoted because the pattern contains a space
set $var2 $1$2; #concatenation
rewrite ^ http://website.com/$var2;
}
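Tracing the example in Python shows what each step produces. The variable names mirror the Nginx variables, and the regex is applied with re.search, which behaves like PCRE for this pattern. Because the string contains a single space, $1 captures some and $2 captures text.

```python
import re

var1 = "some text"                                          # set $var1 "some text";
match = re.search(r"^(.*) (.*)$", var1)                     # the if block's regex test
var2 = match.group(1) + match.group(2) if match else None   # set $var2 $1$2;
target = f"http://website.com/{var2}"                       # rewrite ^ http://website.com/$var2;
print(var2, target)  # sometext http://website.com/sometext
```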
uninitialized_variable_warn
Context: http, server, location, if
If set to on, Nginx will issue log messages when the configuration employs a variable that has not yet been initialized.
Syntax: on or off
Examples:
uninitialized_variable_warn on;
rewrite_log
Context: http, server, location, if
If set to on, Nginx will issue log messages for every operation performed by the rewrite engine at the notice error level (see error_log directive).
Syntax: on or off
Default value: off
Examples:
rewrite_log off;
Common Rewrite Rules
Here is a set of rewrite rules that satisfy basic needs for dynamic websites that wish to beautify their page links thanks to the URL rewriting mechanism. You will obviously need to adjust these rules according to your particular situation as every website is different.
Performing a Search
This rewrite rule is intended for search queries. Search keywords are included in the URL.
Input URI http://website.com/search/some-search-keywords
Rewritten URI http://website.com/search.php?q=some-search-keywords
Rewrite rule rewrite ^/search/(.*)$ /search.php?q=$1?;
User Profile Page
Most dynamic websites that allow visitors to register offer a profile view page. URLs of this form can be employed, containing both the user ID and the username.
Input URI http://website.com/user/31/James
Rewritten URI http://website.com/user.php?id=31&name=James
Rewrite rule rewrite ^/user/([0-9]+)/(.+)$ /user.php?id=$1&name=$2?;
Multiple Parameters
Some websites may use different syntaxes for the argument string, for example, by separating non-named arguments with slashes.
Input URI http://website.com/index.php/param1/param2/param3
Rewritten URI http://website.com/index.php?p1=param1&p2=param2&p3=param3
Rewrite rule rewrite ^/index.php/(.*)/(.*)/(.*)$ /index.php?p1=$1&p2=$2&p3=$3?;
News Website Article
This URL structure is often employed by news websites as URLs contain indications of the articles' contents. It is formed of an article identifier, followed by a slash, then a list of keywords. The keywords can usually be ignored and not included in the rewritten URI.
Input URI http://website.com/33526/us-economy-strengthens
Rewritten URI http://website.com/article.php?id=33526
Rewrite rule rewrite ^/([0-9]+)/.*$ /article.php?id=$1?;
Discussion Board
Modern bulletin boards now use pretty URLs for the most part. This example shows how to create a topic view URL with two parameters — the topic identifier and the starting post. Once again, keywords are ignored:
Input URI http://website.com/topic-1234-50-some-keywords.html
Rewritten URI http://website.com/viewtopic.php?topic=1234&start=50
Rewrite rule rewrite ^/topic-([0-9]+)-([0-9]+)-(.*)\.html$ /viewtopic.php?topic=$1&start=$2?;
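Before deploying rules like these, it can help to check them against sample URIs outside Nginx. The following Python sketch uses a hypothetical nginx_rewrite helper that imitates the parts of rewrite used above. It replaces $N with capture N, and it strips a trailing ? (which, in Nginx, discards the original arguments). Like the rewrite directive, it works on the URI without the host.

```python
import re

def nginx_rewrite(uri, regex, replacement):
    """Apply an Nginx-style rewrite to uri; return None when the regex does not match."""
    match = re.search(regex, uri)
    if match is None:
        return None
    if replacement.endswith("?"):
        replacement = replacement[:-1]  # trailing ?: original arguments are discarded
    # Substitute $1..$9 with the corresponding capture groups.
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))), replacement)

RULES = [
    ("/search/some-search-keywords", r"^/search/(.*)$", "/search.php?q=$1?"),
    ("/user/31/James", r"^/user/([0-9]+)/(.+)$", "/user.php?id=$1&name=$2?"),
    ("/index.php/param1/param2/param3", r"^/index.php/(.*)/(.*)/(.*)$",
     "/index.php?p1=$1&p2=$2&p3=$3?"),
    ("/33526/us-economy-strengthens", r"^/([0-9]+)/.*$", "/article.php?id=$1?"),
    ("/topic-1234-50-some-keywords.html", r"^/topic-([0-9]+)-([0-9]+)-(.*)\.html$",
     "/viewtopic.php?topic=$1&start=$2?"),
]

for uri, regex, replacement in RULES:
    print(uri, "->", nginx_rewrite(uri, regex, replacement))
```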