DNS Prefetching

Problem

DNS resolution time can add a significant amount of user-perceived latency. The time that DNS resolution takes is highly variable: latency ranges from around 1ms (locally cached results) to commonly reported times of several seconds.

Solution

DNS prefetching is an attempt to resolve domain names before a user tries to follow a link. This is done using the computer's normal DNS resolution mechanism; no connection to Google is used. Once a domain name has been resolved, if the user does navigate to that domain, there will be no effective delay due to DNS resolution time. The most obvious example where DNS prefetching can help is when a user is looking at a page with many links to various domains, such as a search results page. When we encounter hyperlinks in pages, we extract the domain name from each one and resolve each domain to an IP address. All this work is done in parallel with the user's reading of the page, using minimal CPU and network resources. When a user clicks on any of these pre-resolved names, they will on average save about 200 milliseconds in their navigation (assuming the user hadn't already visited the domain recently). More important than the average savings is that users tend not to experience the "worst-case" DNS resolution delays, which regularly exceed 1 second.
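
The link-scanning step can be sketched in Python as follows. This is a minimal illustration, not Chromium's renderer code: the names `LinkHostCollector` and `hosts_to_prefetch` are hypothetical, Python's standard `html.parser` stands in for a real HTML parser, and only http/https links are considered (an assumption).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkHostCollector(HTMLParser):
    """Collect the distinct host names targeted by <a href> links on a page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.hosts: list[str] = []  # first-seen order, no duplicates

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL, then keep only the host.
        parts = urlsplit(urljoin(self.page_url, href))
        if parts.scheme in ("http", "https") and parts.hostname:
            if parts.hostname not in self.hosts:
                self.hosts.append(parts.hostname)


def hosts_to_prefetch(page_url: str, html: str) -> list[str]:
    """Return the hosts a prefetcher would hand to the resolver for this page."""
    collector = LinkHostCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.hosts
```

For example, `hosts_to_prefetch("http://example.com/page", '<a href="http://a.com/x">A</a><a href="/local">L</a>')` returns `["a.com", "example.com"]`. De-duplicating matters because a results page often links to the same host many times, and only one resolution per host is useful.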

Architecture

Chromium's implementation of DNS prefetching does not use the browser's network stack at all. Instead, it relies on external threads to resolve the names, thereby warming the operating system's DNS cache (completely ignoring any cache in the application's network stack). The advantage of this approach is that it is compatible with all network stacks (because it is external) and prevents accidental regressions in the main network stack.
 
 
Since some DNS resolutions can take a long time, it is paramount that a delay in one resolution not cause delays in other resolutions. Toward this end (on Windows, where there is no native support for asynchronous DNS resolution), Chromium currently employs 8 independent worker threads that do nothing but perform DNS prefetch resolution. Each worker thread simply waits on a queue, gets the next requested domain name, and then blocks on a synchronous Windows resolution function. Eventually the operating system responds with a DNS resolution; the thread then discards it (leaving the OS cache warmed!) and waits for the next prefetch request. With 8 threads, it is rare that more than one or two threads will block for long, and most resolutions proceed rather quickly (or as quickly as DNS can service them!). On Debug builds, the "about:histograms/DNS.PrefetchQueue" page has current stats on the queueing delay.
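
The queue-and-worker design above can be sketched in Python. This is a minimal illustration under stated assumptions, not Chromium's code: the class `PrefetchPool` and its methods are hypothetical, and `socket.getaddrinfo` stands in for the synchronous Windows resolution function. Whether a discarded answer actually warms an OS-level cache depends on the platform; many Linux systems, for instance, cache lookups only when a local caching service such as systemd-resolved or nscd is running.

```python
import queue
import socket
import threading
from typing import Callable, Optional

NUM_WORKERS = 8  # the text describes 8 worker threads on Windows


class PrefetchPool:
    """Hypothetical pool of threads that each block on a synchronous lookup,
    so one slow resolution cannot stall the others."""

    def __init__(self, resolve: Optional[Callable[[str], object]] = None,
                 num_workers: int = NUM_WORKERS) -> None:
        self._queue: "queue.Queue[Optional[str]]" = queue.Queue()
        # Default resolver: a blocking OS lookup whose result we throw away.
        self._resolve = resolve or (lambda host: socket.getaddrinfo(host, None))
        self._threads = [threading.Thread(target=self._worker, daemon=True)
                         for _ in range(num_workers)]
        for thread in self._threads:
            thread.start()

    def _worker(self) -> None:
        while True:
            host = self._queue.get()  # wait for the next requested name
            if host is None:          # sentinel: this worker should exit
                self._queue.task_done()
                return
            try:
                self._resolve(host)   # blocks; the answer is discarded
            except OSError:
                pass                  # a failed prefetch is simply ignored
            finally:
                self._queue.task_done()

    def prefetch(self, host: str) -> None:
        """Queue a host name for background resolution."""
        self._queue.put(host)

    def shutdown(self) -> None:
        """Stop all workers after the queued names have been processed."""
        for _ in self._threads:
            self._queue.put(None)
        for thread in self._threads:
            thread.join()
```

Because each worker blocks independently, a single name that takes seconds to resolve ties up one thread while the remaining threads keep draining the queue.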

Manual Prefetch

Chromium uses the "href" attribute of hyperlinks to find host names to prefetch. However, some of those hyperlinks may be redirects, for example if the site is trying to count how many times the link is clicked. In those situations, the "true" targeted domain is not necessarily discernible by examining the content of a web page, and so Chromium is not able to prefetch the final targeted domain.

To improve the speed of redirects, content authors can add the following tag to their page:
 
<link rel="dns-prefetch" href="//host_name_to_prefetch.com">
 
The above "link rel" tag has no impact on the visual rendering of the page, but causes Chromium to prefetch the DNS resolution of "host_name_to_prefetch.com" as though there were an actual href targeting a path in that domain. The double slash indicates that the URL starts with a host name (as specified in RFC 1808). Using a full URL such as "http://host_name_to_prefetch.com/" is equivalent, but unnecessary.
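
A minimal sketch of how a page scanner might pick up these tags, assuming Python's standard `html.parser` and `urllib.parse`; the class name `DnsPrefetchLinkCollector` is hypothetical. It shows why "//host" and "http://host/" are equivalent: joining a network-path reference with the page URL only borrows the page's scheme, so both forms yield the same host.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class DnsPrefetchLinkCollector(HTMLParser):
    """Collect hosts named by <link rel="dns-prefetch" href="..."> tags."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.hosts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "dns-prefetch" not in rel_tokens or not href:
            return
        # "//host" is joined with the page's scheme; "http://host/" is already absolute.
        host = urlsplit(urljoin(self.page_url, href)).hostname
        if host and host not in self.hosts:
            self.hosts.append(host)
```

Only the host matters here; any path in the href is ignored, since prefetching resolves the name and does not fetch anything from it.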

DNS Prefetch Control

By default, Chromium does not prefetch host names in hyperlinks that appear in HTTPS pages. This restriction helps prevent an eavesdropper from inferring the host names of hyperlinks that appear in HTTPS pages based on DNS prefetch traffic. The one exception is that Chromium may periodically re-resolve the domain of the HTTPS page itself.

An inquisitive content author (for example, a commenter on a blog) may abuse DNS prefetching to attempt to monitor viewing of content containing links. For example, links with novel subdomains, when resolved during a prefetch, may notify a domain's resolver that a link was viewed, even if it was not clicked.  In some such cases, the authority serving the content (such as a blog owner, or webmail server) may wish to preclude such abusive monitoring.

To allow webmasters to control whether DNS prefetch is enabled or disabled, Chromium includes a DNS Prefetch Control mechanism. It can be used to turn DNS prefetch on for HTTPS pages, or turn it off for HTTP pages.

Chromium watches for an HTTP header named "X-DNS-Prefetch-Control" (case-insensitive) with a value of either "on" or "off". This setting changes the default behavior for the rendered content. A "meta http-equiv" tag with the same name can be used to make policy changes within a page. If a page explicitly opts out of DNS prefetch, further attempts to opt in are ignored.
 
For example, the following page from https://content_author.com/ would cause Chromium to prefetch "b.com" but not "a.com", "c.com", or "d.com".

<a href="http://a.com"> A) Default HTTPS: No prefetching </a>
<meta http-equiv="x-dns-prefetch-control" content="on">
<a href="http://b.com"> B) Manual opt-in: Prefetch domain resolution. </a>
<meta http-equiv="x-dns-prefetch-control" content="off">
<a href="http://c.com"> C) Manual opt-out: Don't prefetch domain resolution </a>
<meta http-equiv="x-dns-prefetch-control" content="on">
<a href="http://d.com"> D) Already opted out: Don't prefetch domain resolution. </a>

Child frames also inherit the DNS prefetch control opt-out setting from their parent. The DNS prefetch control setting applies only to hyperlinks and not to the manual prefetch mechanism. 
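
The rules above (HTTPS default off, header and meta tags applied in order, opt-out is sticky, child frames inherit an opt-out) can be modeled as a small state object. This is a hypothetical Python sketch, not Chromium's implementation; the class `PrefetchControl` and its methods are illustrative names, and treating the "on"/"off" values case-insensitively is an assumption. The manual `<link rel="dns-prefetch">` mechanism is deliberately outside this model, since the control setting does not apply to it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrefetchControl:
    """Hypothetical model of the hyperlink prefetch policy for one document."""

    enabled: bool
    opted_out: bool = False

    @classmethod
    def for_document(cls, is_https: bool,
                     parent: Optional["PrefetchControl"] = None,
                     header_value: Optional[str] = None) -> "PrefetchControl":
        # Default: prefetch hyperlinks on HTTP pages, not on HTTPS pages.
        control = cls(enabled=not is_https)
        # Child frames inherit an explicit opt-out from their parent.
        if parent is not None and parent.opted_out:
            control.enabled, control.opted_out = False, True
        # An X-DNS-Prefetch-Control header changes the default for the content.
        if header_value is not None:
            control.apply(header_value)
        return control

    def apply(self, value: str) -> None:
        """Apply an "on"/"off" value from the header or a meta http-equiv tag."""
        value = value.strip().lower()  # case-insensitive values: an assumption
        if value == "off":
            self.enabled, self.opted_out = False, True
        elif value == "on" and not self.opted_out:
            self.enabled = True  # opt-in is ignored once the page has opted out
        # Any other value is ignored (also an assumption).

    def prefetch_links(self) -> bool:
        return self.enabled
```

Walking the example page from https://content_author.com/ with this model (check before link A, apply "on", check B, apply "off", check C, apply "on", check D) gives prefetching for "b.com" only, matching the outcome described above.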

Browser Startup

Chromium automatically remembers the first 10 domains that were resolved the last time Chromium was started, and automatically starts to resolve these names very early in the startup process. As a result, the domains for a user's home page(s), along with any embedded domains (or anything the user "always" visits just after startup), are generally resolved before much of Chromium has even loaded. When Chromium finally starts to load and render those pages, there is typically no DNS-induced latency, and the application effectively "starts up" (becomes usable) faster. Average startup savings are 200ms or more, and savings of over 1 second are common.
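
The record-and-replay idea can be sketched as follows. This is a hypothetical Python illustration: the class `StartupHostRecorder`, the function `prefetch_at_startup`, and the use of a JSON file for storage are all assumptions, not a description of how Chromium persists the list.

```python
import json
from pathlib import Path
from typing import Callable, List

STARTUP_LIST_SIZE = 10  # the text says the first 10 resolved domains are remembered


class StartupHostRecorder:
    """Remember the first N distinct hosts resolved in a session (hypothetical)."""

    def __init__(self, path: Path, limit: int = STARTUP_LIST_SIZE) -> None:
        self.path = path
        self.limit = limit
        self.recorded: List[str] = []

    def load_previous(self) -> List[str]:
        """Return the hosts saved by the previous session, or [] if none."""
        try:
            return json.loads(self.path.read_text())
        except (FileNotFoundError, ValueError):
            return []

    def note_resolution(self, host: str) -> None:
        """Called for each resolution; only the first `limit` distinct hosts count."""
        if len(self.recorded) < self.limit and host not in self.recorded:
            self.recorded.append(host)

    def save(self) -> None:
        self.path.write_text(json.dumps(self.recorded))


def prefetch_at_startup(recorder: StartupHostRecorder,
                        prefetch: Callable[[str], None]) -> None:
    """Very early in startup, hand last session's hosts to the prefetcher."""
    for host in recorder.load_previous():
        prefetch(host)
```

Recording only the first few resolutions of a session is what makes the list track the home page and whatever the user "always" opens right after launch.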

Omnibox

Prefetching is also used in Chromium's omnibox, where URLs and/or search queries are entered. As the user types, the omnibox automatically proposes an action, in the form of either a search query or a URL navigation. This proposed action is considered the autocompletion of what the user is typing. Each time the omnibox makes a proposal (suggests an autocompletion), the domain for the underlying URL is automatically pre-resolved. This means that while a user types a search query (typically when they enter a space between words), Chromium will automatically prefetch the resolution of the domain in their search provider's URL. Assuming they haven't done a search in a while, this can save a user an average of over 100ms, if not more, in getting a search result. Similarly, if the URL is a "common" URL that they've typed in the past (e.g., their favorite web site, bank, email provider, or company), the savings in resolution time can be quite significant.
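
A rough Python sketch of "pre-resolve the host behind each suggestion". Everything here is hypothetical: `suggest_url` is a crude stand-in for real autocompletion, `search.example` is a placeholder search provider, and skipping a host that was just prefetched (so repeated keystrokes on the same query don't re-request it) is an assumed refinement, not something the text describes.

```python
from typing import Callable, Optional
from urllib.parse import quote_plus, urlsplit

# Hypothetical search provider URL template.
SEARCH_URL_TEMPLATE = "https://search.example/search?q={}"


def suggest_url(typed: str) -> str:
    """Crude stand-in for autocompletion: host-like input becomes a URL,
    anything else becomes a search on the provider."""
    text = typed.strip()
    if text and " " not in text and "." in text:
        return text if "://" in text else "http://" + text
    return SEARCH_URL_TEMPLATE.format(quote_plus(text))


class OmniboxPrefetcher:
    """Pre-resolve the host behind each new autocompletion suggestion."""

    def __init__(self, prefetch: Callable[[str], None]) -> None:
        self._prefetch = prefetch
        self._last_host: Optional[str] = None

    def on_suggestion(self, url: str) -> None:
        host = urlsplit(url).hostname
        # Assumed refinement: skip a host that was just prefetched.
        if host and host != self._last_host:
            self._last_host = host
            self._prefetch(host)
```

Typing "w", "weather", "weather to", "weather today", then "mybank.example" would trigger just two prefetches in this sketch: the search provider's host once, then the bank's host.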

Effectiveness

Typical Usage

If a user resolved a domain name to an IP address recently, their operating system will remember (cache) the answer, and resolution time can be as low as 0-1ms (a thousandth of a second). If the resolution is not locally cached and needs to "go out over the network," then a bare minimum for resolution time is about 15ms, assuming a nearby firewall (home router?) has a cached answer to the question. Common names like google.com and yahoo.com are resolved so often that most local ISPs' name resolvers can answer in about 80-120ms. If the domain name in question is uncommon, a query may have to go through numerous resolvers up and down the hierarchy, and the delay can average closer to 200-300ms. More interestingly, for any of these queries that access the internet, dropped packets and overworked (under-provisioned) name resolvers regularly increase the total resolution time to between 1 and 10 seconds.
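
To see where a given name falls in these ranges on one's own machine, a blocking lookup can simply be timed. A minimal Python sketch, assuming `socket.getaddrinfo` as the resolver; the function name `time_resolution_ms` is hypothetical, and results depend heavily on whether the platform keeps a local DNS cache.

```python
import socket
import time
from typing import Callable


def _os_resolve(host: str) -> object:
    # Blocking lookup through the operating system's resolver.
    return socket.getaddrinfo(host, None)


def time_resolution_ms(host: str,
                       resolve: Callable[[str], object] = _os_resolve) -> float:
    """Return wall-clock milliseconds spent in one blocking resolution of host."""
    start = time.perf_counter()
    resolve(host)
    return (time.perf_counter() - start) * 1000.0
```

Calling `time_resolution_ms("example.com")` twice in a row typically shows the difference between a lookup that goes out over the network and one answered from a local cache, where such a cache exists.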

Network Usage

DNS resolution uses a hierarchical system, where each level in the hierarchy has a cache to remember previous resolutions. As a result, extra resolutions generally don't cause end-to-end internet traffic: the resolutions only go as far as needed toward the "authoritative" resolver, stopping when they reach a resolver that already has the resolution in its cache. In addition, DNS resolution requests are very lightweight. Each request typically involves sending out a single UDP packet of under 100 bytes and getting back a response of around 100 bytes. This small cost in network usage is outweighed by a significant improvement in user experience.
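
The "under 100 bytes" figure is easy to check by encoding a query in the standard RFC 1035 wire format. A minimal Python sketch; the function name `build_dns_query` is hypothetical, and it builds a plain query without EDNS options.

```python
import struct


def build_dns_query(hostname: str, query_id: int, qtype: int = 1) -> bytes:
    """Encode a single-question DNS query in RFC 1035 wire format.

    qtype 1 is an A (IPv4 address) record; the class is always IN (1).
    """
    # 12-byte header: ID, flags (only RD = recursion desired), QDCOUNT=1, rest 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in hostname.rstrip(".").split("."):
        encoded = label.encode("ascii")
        if not 0 < len(encoded) <= 63:
            raise ValueError(f"invalid DNS label: {label!r}")
        qname += bytes([len(encoded)]) + encoded  # length-prefixed label
    qname += b"\x00"  # root label terminates the name
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    return header + question
```

For "www.google.com" the payload is 32 bytes (12-byte header, 16-byte name, 4 bytes of type and class); adding the 8-byte UDP header and a 20-byte IPv4 header gives 60 bytes on the wire.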

Cache Eviction

The local machine's DNS cache is fairly limited. Current estimates for the number of resolutions remembered on a Windows XP box are in the range of 50-200 domain names. As a result, if "too many" resolutions are made, some "necessary" resolutions might be "evicted" from the cache to make room for the new prefetches. Chromium tries to model the underlying cache and guess when there is a chance that a "soon to be needed" domain resolution has been evicted. When Chromium decides an eviction may have taken place, it can automatically resolve the domain name again, ensuring it is either reloaded into the cache or marked as "recently used" so that it won't be evicted for a "while." Cache evictions caused by Chromium can have a negative impact on other applications that don't (yet) use prefetching techniques to keep the underlying cache "warm."
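
One way to "model the underlying cache" is sketched below in Python. It assumes the OS cache behaves like a least-recently-used (LRU) cache of a fixed, guessed capacity; real OS caches may use a different policy, and the names `ModeledDnsCache` and `prefetch_guarding_evictions` are hypothetical rather than Chromium's.

```python
from collections import OrderedDict
from typing import Callable, List, Set


class ModeledDnsCache:
    """Hypothetical LRU model of the OS resolver cache.

    Real OS caches may not be LRU; the capacity is a guess (the text cites
    estimates of 50-200 names on Windows XP).
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[str, None]" = OrderedDict()

    def note_resolution(self, host: str) -> List[str]:
        """Record a resolution and return the hosts the model thinks were evicted."""
        if host in self._entries:
            self._entries.move_to_end(host)  # now most recently used
            return []
        self._entries[host] = None
        evicted = []
        while len(self._entries) > self.capacity:
            oldest, _ = self._entries.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def probably_cached(self, host: str) -> bool:
        return host in self._entries


def prefetch_guarding_evictions(cache: ModeledDnsCache, host: str,
                                important: Set[str],
                                resolve: Callable[[str], object]) -> None:
    """Resolve host, then re-resolve any important host the model says was evicted."""
    pending = [host]
    refreshed: Set[str] = set()
    while pending:
        name = pending.pop()
        resolve(name)
        for lost in cache.note_resolution(name):
            if lost in important and lost not in refreshed:
                refreshed.add(lost)  # refresh each important host at most once
                pending.append(lost)
```

Refreshing an evicted name can itself push another name out; limiting each important host to one refresh keeps the loop bounded, at the cost of possibly leaving some names evicted when the important set is larger than the modeled cache.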
 
Many large internet sites (google.com, yahoo.com, etc.) commonly mark their domain name resolutions with an expiration time (TTL) in the neighborhood of 5 minutes. They probably set the expiration time to be short so that they can better respond to changes in supply and demand for their services. This in turn tends to cause cache evictions. As a result, most applications already face the problem of cache eviction and employ various methods to reduce its impact.
 
The "about:histograms/DNS" and "about:dns" pages contain more information about DNS prefetch activity.

posted on 2018-11-19 19:23 huangguanyuan