RFC1738
Original text: http://www.ietf.org/rfc/rfc1738.txt
Network Working Group T. Berners-Lee
Request for Comments: 1738 CERN
Category: Standards Track L. Masinter
Xerox Corporation
M. McCahill
University of Minnesota
Editors
December 1994
Uniform Resource Locators (URL)
Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Abstract
This document specifies a Uniform Resource Locator (URL), the syntax
and semantics of formalized information for location and access of
resources via the Internet.
1. Introduction
This document describes the syntax and semantics for a compact string
representation for a resource available via the Internet. These
strings are called "Uniform Resource Locators" (URLs).
The specification is derived from concepts introduced by the World-
Wide Web global information initiative, whose use of such objects
dates from 1990 and is described in "Universal Resource Identifiers
in WWW", RFC 1630. The specification of URLs is designed to meet the
requirements laid out in "Functional Requirements for Internet
Resource Locators" [12].
This document was written by the URI working group of the Internet
Engineering Task Force. Comments may be addressed to the editors, or
to the URI-WG <uri@bunyip.com>. Discussions of the group are archived
at <URL:http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>
Berners-Lee, Masinter & McCahill [Page 1]
RFC 1738 Uniform Resource Locators (URL) December 1994
2. General URL Syntax
Just as there are many different methods of access to resources,
there are several schemes for describing the location of such
resources.
The generic syntax for URLs provides a framework for new schemes to
be established using protocols other than those defined in this
document.
URLs are used to `locate' resources, by providing an abstract
identification of the resource location. Having located a resource,
a system may perform a variety of operations on the resource, as
might be characterized by such words as `access', `update',
`replace', `find attributes'. In general, only the `access' method
needs to be specified for any URL scheme.
2.1. The main parts of URLs
A full BNF description of the URL syntax is given in Section 5.
In general, URLs are written as follows:
<scheme>:<scheme-specific-part>
A URL contains the name of the scheme being used (<scheme>) followed
by a colon and then a string (the <scheme-specific-part>) whose
interpretation depends on the scheme.
Scheme names consist of a sequence of characters. The lower case
letters "a"--"z", digits, and the characters plus ("+"), period
("."), and hyphen ("-") are allowed. For resiliency, programs
interpreting URLs should treat upper case letters as equivalent to
lower case in scheme names (e.g., allow "HTTP" as well as "http").
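As a minimal sketch of the rule above, the following Python splits a URL at the first colon and folds the scheme name to lower case. The function name split_scheme and the constant SCHEME_CHARS are illustrative assumptions, not part of this specification.

```python
import string

# Characters this section allows in a scheme name: lower case letters,
# digits, plus, period, and hyphen. (SCHEME_CHARS is a hypothetical name.)
SCHEME_CHARS = set(string.ascii_lowercase + string.digits + "+.-")


def split_scheme(url):
    """Return (scheme, scheme_specific_part) with the scheme lowercased."""
    scheme, sep, rest = url.partition(":")
    if not sep:
        raise ValueError("no ':' separating the scheme from the rest")
    # Treat upper case as equivalent to lower case, e.g. "HTTP" == "http".
    scheme = scheme.lower()
    if not scheme or not set(scheme) <= SCHEME_CHARS:
        raise ValueError("invalid scheme name: %r" % scheme)
    return scheme, rest
```

The scheme-specific part is returned untouched, since its interpretation depends on the scheme.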
2.2. URL Character Encoding Issues
URLs are sequences of characters, i.e., letters, digits, and special
characters. A URL may be represented in a variety of ways: e.g., ink
on paper, or a sequence of octets in a coded character set. The
interpretation of a URL depends only on the identity of the
characters used.
In most URL schemes, the sequences of characters in different parts
of a URL are used to represent sequences of octets used in Internet
protocols. For example, in the ftp scheme, the host name, directory
name and file names are such sequences of octets, represented by
parts of the URL. Within those parts, an octet may be represented by
the character which has that octet as its code within the US-ASCII
[20] coded character set.
In addition, octets may be encoded by a character triplet consisting
of the character "%" followed by the two hexadecimal digits (from
"0123456789ABCDEF") which form the hexadecimal value of the octet.
(The characters "abcdef" may also be used in hexadecimal encodings.)
Octets must be encoded if they have no corresponding graphic
character within the US-ASCII coded character set, if the use of the
corresponding character is unsafe, or if the corresponding character
is reserved for some other interpretation within the particular URL
scheme.
No corresponding graphic US-ASCII:
URLs are written only with the graphic printable characters of the
US-ASCII coded character set. The octets 80-FF hexadecimal are not
used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
control characters; these must be encoded.
Unsafe:
Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".
All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.
Reserved:
Many URL schemes reserve certain characters for a special meaning:
their appearance in the scheme-specific part of the URL has a
designated semantics. If the character corresponding to an octet is
reserved in a scheme, the octet must be encoded. The characters ";",
"/", "?", ":", "@", "=" and "&" are the characters which may be
reserved for special meaning within a scheme. No other characters may
be reserved within a scheme.
Usually a URL has the same interpretation when an octet is
represented by a character and when it is encoded. However, this is not
true for reserved characters: encoding a character reserved for a
particular scheme may change the semantics of a URL.
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.
On the other hand, characters that are not required to be encoded
(including alphanumerics) may be encoded within the scheme-specific
part of a URL, as long as they are not being used for a reserved
purpose.
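The encoding rules of this section can be illustrated with a short Python sketch. It encodes every octet except alphanumerics and the special characters "$-_.+!*'(),", unless the caller lists a reserved character that is being used for its reserved purpose. The names encode_octets, decode_octets, ALWAYS_SAFE, and keep_reserved are hypothetical, and the decoder assumes well-formed US-ASCII input.

```python
import string

# Octets that may always appear unencoded: alphanumerics plus the
# special characters "$-_.+!*'(),". (ALWAYS_SAFE is a hypothetical name.)
ALWAYS_SAFE = (string.ascii_letters + string.digits + "$-_.+!*'(),").encode("ascii")


def encode_octets(octets, keep_reserved=b""):
    """Encode a bytes value as URL characters.

    keep_reserved lists reserved characters (such as b"/") that are being
    used for their reserved purpose and so stay unencoded.
    """
    out = []
    for octet in octets:
        if octet in ALWAYS_SAFE or octet in keep_reserved:
            out.append(chr(octet))
        else:
            # "%" followed by the two hexadecimal digits of the octet.
            out.append("%%%02X" % octet)
    return "".join(out)


def decode_octets(text):
    """Reverse encode_octets; accepts upper or lower case hex digits."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)
```

Passing keep_reserved=b"/" leaves a hierarchical path readable, while the default encodes "/" so that it loses its reserved meaning.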
2.3 Hierarchical schemes and relative links
In some cases, URLs are used to locate resources that contain
pointers to other resources. In some cases, those pointers are
represented as relative links where the expression of the location of
the second resource is in terms of "in the same place as this one
except with the following relative path". Relative links are not
described in this document. However, the use of relative links
depends on the original URL containing a hierarchical structure
against which the relative link is based.
Some URL schemes (such as the ftp, http, and file schemes) contain
names that can be considered hierarchical; the components of the
hierarchy are separated by "/".
3. Specific Schemes
The mapping for some existing standard and experimental protocols is
outlined in the BNF syntax definition. Notes on particular protocols
follow. The schemes covered are:
ftp                     File Transfer protocol
http                    Hypertext Transfer Protocol
gopher                  The Gopher protocol
mailto                  Electronic mail address
news                    USENET news
nntp                    USENET news using NNTP access
telnet                  Reference to interactive sessions
wais                    Wide Area Information Servers
file                    Host-specific file names
prospero                Prospero Directory Service
Other schemes may be specified by future specifications. Section 4 of
this document describes how new schemes may be registered, and lists
some scheme names that are under development.
3.1. Common Internet Scheme Syntax
While the syntax for the rest of the URL may vary depending on the
particular scheme selected, URL schemes that involve the direct use
of an IP-based protocol to a specified host on the Internet use a
common syntax for the scheme-specific data:
//<user>:<password>@<host>:<port>/<url-path>
Some or all of the parts "<user>:<password>@", ":<password>",
":<port>", and "/<url-path>" may be excluded. The scheme specific
data start with a double slash "//" to indicate that it complies with
the common Internet scheme syntax. The different components obey the
following rules:
user
An optional user name. Some schemes (e.g., ftp) allow the
specification of a user name.
password
An optional password. If present, it follows the user
name separated from it by a colon.
The user name (and password), if present, are followed by a
commercial at-sign "@". Within the user and password field, any ":",
"@", or "/" must be encoded.
Note that an empty user name or password is different than no user
name or password; there is no way to specify a password without
specifying a user name. E.g., <URL:ftp://@host.com/> has an empty
user name and no password, <URL:ftp://host.com/> has no user name,
while <URL:ftp://foo:@host.com/> has a user name of "foo" and an
empty password.
host
The fully qualified domain name of a network host, or its IP
address as a set of four decimal digit groups separated by
".". Fully qualified domain names take the form as described
in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123
[5]: a sequence of domain labels separated by ".", each domain
label starting and ending with an alphanumerical character and
possibly also containing "-" characters. The rightmost domain
label will never start with a digit, though, which
syntactically distinguishes all domain names from the IP
addresses.
port
The port number to connect to. Most schemes designate
protocols that have a default port number. Another port number
may optionally be supplied, in decimal, separated from the
host by a colon. If the port is omitted, the colon is as well.
url-path
The rest of the locator consists of data specific to the
scheme, and is known as the "url-path". It supplies the
details of how the specified resource can be accessed. Note
that the "/" between the host (or port) and the url-path is
NOT part of the url-path.
The url-path syntax depends on the scheme being used, as does the
manner in which it is interpreted.
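The component rules above can be sketched in Python as follows. The function parse_common and the dictionary keys are illustrative assumptions. None marks an absent component and an empty string marks a present but empty one, which preserves the distinction drawn above between an empty user name and no user name. Components are returned still encoded.

```python
def parse_common(schemepart):
    """Split '//<user>:<password>@<host>:<port>/<url-path>' into a dict."""
    if not schemepart.startswith("//"):
        raise ValueError("common Internet scheme syntax starts with '//'")
    # Any "/" inside user or password must be encoded, so the first "/"
    # after the "//" ends the login part; it is not part of the url-path.
    login, slash, url_path = schemepart[2:].partition("/")
    user = password = None
    if "@" in login:
        userinfo, _, hostport = login.rpartition("@")
        user, colon, pw = userinfo.partition(":")
        password = pw if colon else None
    else:
        hostport = login
    host, colon, port = hostport.partition(":")
    return {
        "user": user,
        "password": password,
        "host": host,
        "port": int(port) if colon else None,  # None: use the scheme default
        "url_path": url_path if slash else None,
    }
```

For instance, parse_common("//foo:@host.com/") reports a user name of "foo" and an empty password, while parse_common("//host.com/") reports no user name at all.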
3.2. FTP
The FTP URL scheme is used to designate files and directories on
Internet hosts accessible using the FTP protocol (RFC959).
An FTP URL follows the syntax described in Section 3.1. If :<port> is
omitted, the port defaults to 21.
3.2.1. FTP Name and Password
A user name and password may be supplied; they are used in the ftp
"USER" and "PASS" commands after first making the connection to the
FTP server. If no user name or password is supplied and one is
requested by the FTP server, the conventions for "anonymous" FTP are
to be used, as follows:
The user name "anonymous" is supplied.
The password is supplied as the Internet e-mail address
of the end user accessing the resource.
If the URL supplies a user name but no password, and the remote
server requests a password, the program interpreting the FTP URL
should request one from the user.
3.2.2. FTP url-path
The url-path of an FTP URL has the following syntax:
<cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode>
Where <cwd1> through <cwdN> and <name> are (possibly encoded) strings
and <typecode> is one of the characters "a", "i", or "d". The part
";type=<typecode>" may be omitted. The <cwdx> and <name> parts may be
empty. The whole url-path may be omitted, including the "/"
delimiting it from the prefix containing user, password, host, and
port.
The url-path is interpreted as a series of FTP commands as follows:
Each of the <cwd> elements is to be supplied, sequentially, as the
argument to a CWD (change working directory) command.
If the typecode is "d", perform a NLST (name list) command with
<name> as the argument, and interpret the results as a file
directory listing.
Otherwise, perform a TYPE command with <typecode> as the argument,
and then access the file whose name is <name> (for example, using
the RETR command.)
Within a name or CWD component, the characters "/" and ";" are
reserved and must be encoded. The components are decoded prior to
their use in the FTP protocol. In particular, if the appropriate FTP
sequence to access a particular file requires supplying a string
containing a "/" as an argument to a CWD or RETR command, it is
necessary to encode each "/".
For example, the URL <URL:ftp://myname@host.dom/%2Fetc/motd> is
interpreted by FTP-ing to "host.dom", logging in as "myname"
(prompting for a password if it is asked for), and then executing
"CWD /etc" and then "RETR motd". This has a different meaning from
<URL:ftp://myname@host.dom/etc/motd> which would "CWD etc" and then
"RETR motd"; the initial "CWD" might be executed relative to the
default directory for "myname". On the other hand,
<URL:ftp://myname@host.dom//etc/motd>, would "CWD " with a null
argument, then "CWD etc", and then "RETR motd".
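The three examples above can be reproduced with a small Python sketch that maps an FTP url-path to the command sequence described in this section. The function name ftp_commands is hypothetical. When no typecode is given, the sketch simply omits the TYPE command rather than guessing a transfer mode, which is a simplification.

```python
from urllib.parse import unquote


def ftp_commands(url_path):
    """Map an FTP url-path to the FTP commands a client would issue."""
    path, sep, typecode = url_path.partition(";type=")
    # Split on the unencoded "/" first, then decode each component, so an
    # encoded "%2F" ends up inside a single CWD or RETR argument.
    *cwds, name = path.split("/")
    commands = ["CWD " + unquote(cwd) for cwd in cwds]
    if sep and typecode.lower() == "d":
        commands.append("NLST " + unquote(name))
    else:
        if sep:
            commands.append("TYPE " + typecode)
        commands.append("RETR " + unquote(name))
    return commands
```

Decoding after splitting is the design point: decoding first would turn "%2Fetc/motd" into the same string as "/etc/motd" and lose the difference between the first and third examples.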
FTP URLs may also be used for other operations; for example, it is
possible to update a file on a remote file server, or infer
information about it from the directory listings. The mechanism for
doing so is not spelled out here.
3.2.3. FTP Typecode is Optional
The entire ;type=<typecode> part of an FTP URL is optional. If it is
omitted, the client program interpreting the URL must guess the
appropriate mode to use. In general, the data content type of a file
can only be guessed from the name, e.g., from the suffix of the name;
the appropriate type code to be used for transfer of the file can
then be deduced from the data content of the file.
3.2.4 Hierarchy
For some file systems, the "/" used to denote the hierarchical
structure of the URL corresponds to the delimiter used to construct a
file name hierarchy, and thus, the filename will look similar to the
URL path. This does NOT mean that the URL is a Unix filename.
3.2.5. Optimization
Clients accessing resources via FTP may employ additional heuristics
to optimize the interaction. For some FTP servers, for example, it
may be reasonable to keep the control connection open while accessing
multiple URLs from the same server. However, there is no common
hierarchical model to the FTP protocol, so if a directory change
command has been given, it is impossible in general to deduce what
sequence should be given to navigate to another directory for a
second retrieval, if the paths are different. The only reliable
algorithm is to disconnect and reestablish the control connection.
3.3. HTTP
The HTTP URL scheme is used to designate Internet resources
accessible using HTTP (HyperText Transfer Protocol).
The HTTP protocol is specified elsewhere. This specification only
describes the syntax of HTTP URLs.
An HTTP URL takes the form:
http://<host>:<port>/<path>?<searchpart>
where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 80. No user name or password is
allowed. <path> is an HTTP selector, and <searchpart> is a query
string. The <path> is optional, as is the <searchpart> and its
preceding "?". If neither <path> nor <searchpart> is present, the "/"
may also be omitted.
Within the <path> and <searchpart> components, "/", ";", "?" are
reserved. The "/" character may be used within HTTP to designate a
hierarchical structure.
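A minimal Python sketch of the HTTP URL form described in this section follows. The function name parse_http and the dictionary keys are assumptions made for illustration; the path and searchpart are returned still encoded, and None marks a component that was omitted.

```python
def parse_http(url):
    """Split 'http://<host>:<port>/<path>?<searchpart>' into its parts."""
    prefix = "http://"
    if url[:len(prefix)].lower() != prefix:
        raise ValueError("not an http URL")
    hostport, slash, rest = url[len(prefix):].partition("/")
    host, colon, port = hostport.partition(":")
    # "?" is reserved in HTTP URLs, so the first one starts the searchpart.
    path, qmark, search = rest.partition("?")
    return {
        "host": host,
        "port": int(port) if colon else 80,  # default port for HTTP
        "path": path if slash else None,
        "searchpart": search if qmark else None,
    }
```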
3.4. GOPHER
The Gopher URL scheme is used to designate Internet resources
accessible using the Gopher protocol.
The base Gopher protocol is described in RFC 1436 and supports items
and collections of items (directories). The Gopher+ protocol is a set
of upward compatible extensions to the base Gopher protocol and is
described in [2]. Gopher+ supports associating arbitrary sets of
attributes and alternate data representations with Gopher items.
Gopher URLs accommodate both Gopher and Gopher+ items and item
attributes.
3.4.1. Gopher URL syntax
A Gopher URL takes the form:
gopher://<host>:<port>/<gopher-path>
where <gopher-path> is one of
<gophertype><selector>
<gophertype><selector>%09<search>
<gophertype><selector>%09<search>%09<gopher+_string>
If :<port> is omitted, the port defaults to 70. <gophertype> is a
single-character field to denote the Gopher type of the resource to
which the URL refers. The entire <gopher-path> may also be empty, in
which case the delimiting "/" is also optional and the <gophertype>
defaults to "1".
<selector> is the Gopher selector string. In the Gopher protocol,
Gopher selector strings are a sequence of octets which may contain
any octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal
(US-ASCII character LF), and 0D (US-ASCII character CR).
Gopher clients specify which item to retrieve by sending the Gopher
selector string to a Gopher server.
Within the <gopher-path>, no characters are reserved.
Note that some Gopher <selector> strings begin with a copy of the
<gophertype> character, in which case that character will occur twice
consecutively. The Gopher selector string may be an empty string;
this is how Gopher clients refer to the top-level directory on a
Gopher server.
3.4.2 Specifying URLs for Gopher Search Engines
If the URL refers to a search to be submitted to a Gopher search
engine, the selector is followed by an encoded tab (%09) and the
search string. To submit a search to a Gopher search engine, the
Gopher client sends the <selector> string (after decoding), a tab,
and the search string to the Gopher server.
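As a rough sketch of how a client might turn a <gopher-path> into the octets it sends, the Python below strips the leading <gophertype> and decodes the remainder, so that each "%09" becomes a literal tab. The function name gopher_request is hypothetical, and the trailing CR LF is an assumption taken from the base Gopher protocol (RFC 1436) rather than from this section.

```python
from urllib.parse import unquote_to_bytes


def gopher_request(gopher_path):
    """Return (gophertype, octets to send) for a <gopher-path>."""
    if not gopher_path:
        # Empty path: the type defaults to "1" and the selector is empty,
        # which refers to the top-level directory.
        return "1", b"\r\n"
    gophertype, rest = gopher_path[0], gopher_path[1:]
    # Decoding turns each encoded tab (%09) into the tab that separates
    # the selector from the search string.
    return gophertype, unquote_to_bytes(rest) + b"\r\n"
```

The <gophertype> character is used only by the client to decide how to handle the response; it is not sent to the server.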
3.4.3 URL syntax for Gopher+ items
URLs for Gopher+ items have a second encoded tab (%09) and a Gopher+
string. Note that in this case, the %09<search> string must be
supplied, although the <search> element may be the empty string.
The <gopher+_string> is used to represent information required for
retrieval of the Gopher+ item. Gopher+ items may have alternate
views, arbitrary sets of attributes, and may have electronic forms
associated with them.
To retrieve the data associated with a Gopher+ URL, a client will
connect to the server and send the Gopher selector, followed by a tab
and the search string (which may be empty), followed by a tab and the
Gopher+ commands.
3.4.4 Default Gopher+ data representation
When a Gopher server returns a directory listing to a client, the
Gopher+ items are tagged with either a "+" (denoting Gopher+ items)
or a "?" (denoting Gopher+ items which have a +ASK form associated
with them). A Gopher URL with a Gopher+ string consisting of only a
"+" refers to the default view (data representation) of the item
while a Gopher+ string containing only a "?" refers to an item with a
Gopher electronic form associated with it.
3.4.5 Gopher+ items with electronic forms
Gopher+ items which have a +ASK associated with them (i.e. Gopher+
items tagged with a "?") require the client to fetch the item's +ASK
attribute to get the form definition, and then ask the user to fill
out the form and return the user's responses along with the selector
string to retrieve the item. Gopher+ clients know how to do this but
depend on the "?" tag in the Gopher+ item description to know when to
handle this case. The "?" is used in the Gopher+ string to be
consistent with Gopher+ protocol's use of this symbol.
3.4.6 Gopher+ item attribute collections
To refer to the Gopher+ attributes of an item, the Gopher URL's
Gopher+ string consists of "!" or "$". "!" refers to all of a
Gopher+ item's attributes. "$" refers to all the item attributes for
all items in a Gopher directory.
3.4.7 Referring to specific Gopher+ attributes
To refer to specific attributes, the URL's gopher+_string is
"!<attribute_name>" or "$<attribute_name>". For example, to refer to
the attribute containing the abstract of an item, the gopher+_string
would be "!+ABSTRACT".
To refer to several attributes, the gopher+_string consists of the
attribute names separated by coded spaces. For example,
"!+ABSTRACT%20+SMELL" refers to the +ABSTRACT and +SMELL attributes
of an item.
3.4.8 URL syntax for Gopher+ alternate views
Gopher+ allows for optional alternate data representations (alternate
views) of items. To retrieve a Gopher+ alternate view, a Gopher+
client sends the appropriate view and language identifier (found in
the item's +VIEW attribute). To refer to a specific Gopher+ alternate
view, the URL's Gopher+ string would be in the form:
+<view_name>%20<language_name>
For example, a Gopher+ string of "+application/postscript%20Es_ES"
refers to the Spanish language postscript alternate view of a Gopher+
item.
3.4.9 URL syntax for Gopher+ electronic forms
The gopher+_string for a URL that refers to an item referenced by a
Gopher+ electronic form (an ASK block) filled out with specific
values is a coded version of what the client sends to the server.
The gopher+_string is of the form:
+%091%0D%0A+-1%0D%0A<ask_item1_value>%0D%0A<ask_item2_value>%0D%0A.%0D%0A
To retrieve this item, the Gopher client sends:
<a_gopher_selector><tab>+<tab>1<cr><lf>
+-1<cr><lf>
<ask_item1_value><cr><lf>
<ask_item2_value><cr><lf>
.<cr><lf>
to the Gopher server.
3.5. MAILTO
The mailto URL scheme is used to designate the Internet mailing
address of an individual or service. No additional information other
than an Internet mailing address is present or implied.
A mailto URL takes the form:
mailto:<rfc822-addr-spec>
where <rfc822-addr-spec> is (the encoding of an) addr-spec, as
specified in RFC 822 [6]. Within mailto URLs, there are no reserved
characters.
Note that the percent sign ("%") is commonly used within RFC 822
addresses and must be encoded.
Unlike many URLs, the mailto scheme does not represent a data object
to be accessed directly; there is no sense in which it designates an
object. It has a different use than the message/external-body type in
MIME.
3.6. NEWS
The news URL scheme is used to refer to either news groups or
individual articles of USENET news, as specified in RFC 1036.
A news URL takes one of two forms:
news:<newsgroup-name>
news:<message-id>
A <newsgroup-name> is a period-delimited hierarchical name, such as
"comp.infosystems.www.misc". A <message-id> corresponds to the
Message-ID of section 2.1.5 of RFC 1036, without the enclosing "<"
and ">"; it takes the form <unique>@<full_domain_name>. A message
identifier may be distinguished from a news group name by the
presence of the commercial at "@" character. No additional characters
are reserved within the components of a news URL.
If <newsgroup-name> is "*" (as in <URL:news:*>), it is used to refer
to "all available news groups".
The news URLs are unusual in that by themselves, they do not contain
sufficient information to locate a single resource, but, rather, are
location-independent.
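The rule for telling the two forms of news URL apart can be sketched in a few lines of Python. The function name classify_news and the returned labels are hypothetical.

```python
def classify_news(url):
    """Return (kind, value) for a news URL.

    kind is "all-groups", "message-id", or "newsgroup".
    """
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "news":
        raise ValueError("not a news URL")
    if rest == "*":
        return "all-groups", rest
    # A message identifier is distinguished by the presence of "@".
    if "@" in rest:
        return "message-id", rest
    return "newsgroup", rest
```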
3.7. NNTP
The nntp URL scheme is an alternative method of referencing news
articles, useful for specifying news articles from NNTP servers (RFC
977).
An nntp URL takes the form:
nntp://<host>:<port>/<newsgroup-name>/<article-number>
where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 119.
The <newsgroup-name> is the name of the group, while the
<article-number> is the numeric id of the article within that newsgroup.
Note that while nntp: URLs specify a unique location for the article
resource, most NNTP servers currently on the Internet today are
configured only to allow access from local clients, and thus nntp
URLs do not designate globally accessible resources. Thus, the news:
form of URL is preferred as a way of identifying news articles.
3.8. TELNET
The Telnet URL scheme is used to designate interactive services that
may be accessed by the Telnet protocol.
A telnet URL takes the form:
telnet://<user>:<password>@<host>:<port>/
as specified in Section 3.1. The final "/" character may be omitted.
If :<port> is omitted, the port defaults to 23. The :<password> can
be omitted, as well as the whole <user>:<password> part.
This URL does not designate a data object, but rather an interactive
service. Remote interactive services vary widely in the means by
which they allow remote logins; in practice, the <user> and
<password> supplied are advisory only: clients accessing a telnet URL
merely advise the user of the suggested username and password.
3.9. WAIS
The WAIS URL scheme is used to designate WAIS databases, searches, or
individual documents available from a WAIS database. WAIS is
described in [7]. The WAIS protocol is described in RFC 1625 [17].
Although the WAIS protocol is based on Z39.50-1988, the WAIS URL
scheme is not intended for use with arbitrary Z39.50 services.
A WAIS URL takes one of the following forms:
wais://<host>:<port>/<database>
wais://<host>:<port>/<database>?<search>
wais://<host>:<port>/<database>/<wtype>/<wpath>
where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 210. The first form designates a
WAIS database that is available for searching. The second form
designates a particular search. <database> is the name of the WAIS
database being queried.
The third form designates a particular document within a WAIS
database to be retrieved. In this form <wtype> is the WAIS
designation of the type of the object. Many WAIS implementations
require that a client know the "type" of an object prior to
retrieval, the type being returned along with the internal object
identifier in the search response. The <wtype> is included in the
URL in order to allow the client interpreting the URL adequate
information to actually retrieve the document.
The <wpath> of a WAIS URL consists of the WAIS document-id, encoded
as necessary using the method described in Section 2.2. The WAIS
document-id should be treated opaquely; it may only be decomposed by
the server that issued it.
3.10 FILES
The file URL scheme is used to designate files accessible on a
particular host computer. This scheme, unlike most other URL schemes,
does not designate a resource that is universally accessible over the
Internet.
A file URL takes the form:
file://<host>/<path>
where <host> is the fully qualified domain name of the system on
which the <path> is accessible, and <path> is a hierarchical
directory path of the form <directory>/<directory>/.../<name>.
For example, a VMS file
DISK$USER:[MY.NOTES]NOTE123456.TXT
might become
<URL:file://vms.host.edu/disk$user/my/notes/note12345.txt>
As a special case, <host> can be the string "localhost" or the empty
string; this is interpreted as `the machine from which the URL is
being interpreted'.
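The file URL form and its special case for the local machine might be handled as in the Python sketch below. The function name parse_file_url is hypothetical, and comparing "localhost" without regard to case is an assumption rather than something this section states.

```python
def parse_file_url(url):
    """Return (host, path, is_local) for 'file://<host>/<path>'."""
    prefix = "file://"
    if url[:len(prefix)].lower() != prefix:
        raise ValueError("not a file URL")
    host, slash, path = url[len(prefix):].partition("/")
    if not slash:
        raise ValueError("file URL needs a '/' after the host")
    # An empty host or "localhost" means the machine interpreting the URL.
    is_local = host == "" or host.lower() == "localhost"
    return host, path, is_local
```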
The file URL scheme is unusual in that it does not specify an
Internet protocol or access method for such files; as such, its
utility in network protocols between hosts is limited.
3.11 PROSPERO
The Prospero URL scheme is used to designate resources that are
accessed via the Prospero Directory Service. The Prospero protocol is
described elsewhere [14].
A prospero URL takes the form:
prospero://<host>:<port>/<hsoname>;<field>=<value>
where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 1525. No username or password is
allowed.
The <hsoname> is the host-specific object name in the Prospero
protocol, suitably encoded. This name is opaque and interpreted by
the Prospero server. The semicolon ";" is reserved and may not
appear without quoting in the <hsoname>.
Prospero URLs are interpreted by contacting a Prospero directory
server on the specified host and port to determine appropriate access
methods for a resource, which might themselves be represented as
different URLs. External Prospero links are represented as URLs of
the underlying access method and are not represented as Prospero
URLs.
Note that a slash "/" may appear in the <hsoname> without quoting and
no significance may be assumed by the application. Though slashes
may indicate hierarchical structure on the server, such structure is
not guaranteed. Note that many <hsoname>s begin with a slash, in
which case the host or port will be followed by a double slash: the
slash from the URL syntax, followed by the initial slash from the
<hsoname>. (E.g., <URL:prospero://host.dom//pros/name> designates a
<hsoname> of "/pros/name".)
In addition, after the <hsoname>, optional fields and values
associated with a Prospero link may be specified as part of the URL.
When present, each field/value pair is separated from each other and
from the rest of the URL by a ";" (semicolon). The name of the field
and its value are separated by a "=" (equal sign). If present, these
fields serve to identify the target of the URL. For example, the
OBJECT-VERSION field can be specified to identify a specific version
of an object.
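A Prospero URL of the form described in this section could be taken apart roughly as in the Python sketch below. The function name parse_prospero and the dictionary keys are hypothetical, and the field value "2" used in the usage note is made up for illustration. Field names and values are left encoded; only the <hsoname> is decoded.

```python
from urllib.parse import unquote


def parse_prospero(url):
    """Split 'prospero://<host>:<port>/<hsoname>;<field>=<value>'."""
    prefix = "prospero://"
    if url[:len(prefix)].lower() != prefix:
        raise ValueError("not a prospero URL")
    hostport, slash, rest = url[len(prefix):].partition("/")
    host, colon, port = hostport.partition(":")
    # ";" is reserved, so every unencoded ";" separates field/value pairs.
    hsoname, *pairs = rest.split(";")
    fields = {}
    for pair in pairs:
        name, _, value = pair.partition("=")
        fields[name] = value
    return {
        "host": host,
        "port": int(port) if colon else 1525,  # default Prospero port
        "hsoname": unquote(hsoname),
        "fields": fields,
    }
```

For instance, "prospero://host.dom//pros/name;OBJECT-VERSION=2" yields an <hsoname> of "/pros/name" and a single OBJECT-VERSION field.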
4. REGISTRATION OF NEW SCHEMES
A new scheme may be introduced by defining a mapping onto a
conforming URL syntax, using a new prefix. URLs for experimental
schemes may be used by mutual agreement between parties. Scheme names
starting with the characters "x-" are reserved for experimental
purposes.
The Internet Assigned Numbers Authority (IANA) will maintain a
registry of URL schemes. Any submission of a new URL scheme must
include a definition of an algorithm for accessing of resources
within that scheme and the syntax for representing such a scheme.
URL schemes must have demonstrable utility and operability. One way
to provide such a demonstration is via a gateway which provides
objects in the new scheme for clients using an existing protocol. If
the new scheme does not locate resources that are data objects, the
properties of names in the new space must be clearly defined.
New schemes should try to follow the same syntactic conventions of
existing schemes, where appropriate. It is likewise recommended
that, where a protocol allows for retrieval by URL, that the client
software have provision for being configured to use specific gateway
locators for indirect access through new naming schemes.
The following schemes have been proposed at various times, but this
document does not define their syntax or use at this time. It is
suggested that IANA reserve their scheme names for future definition:
afs              Andrew File System global filenames.
mid              Message identifiers for electronic mail.
cid              Content identifiers for MIME body parts.
nfs              Network File System (NFS) filenames.
tn3270           Interactive 3270 emulation sessions.
mailserver       Access to data available from mailservers.
z39.50           Access to ANSI Z39.50 services.
5. BNF for specific URL schemes
This is a BNF-like description of the Uniform Resource Locator
syntax, using the conventions of RFC822, except that "|" is used to
designate alternatives, and brackets [] are used around optional or
repeated elements. Briefly, literals are quoted with "", optional
elements are enclosed in [brackets], and elements may be preceded
with <n>* to designate n or more repetitions of the following
element; n defaults to 0.
; The generic form of a URL is:
genericurl     = scheme ":" schemepart
; Specific predefined schemes are defined here; new schemes
; may be registered with IANA
url            = httpurl | ftpurl | newsurl |
                 nntpurl | telneturl | gopherurl |
                 waisurl | mailtourl | fileurl |
                 prosperourl | otherurl
; new schemes follow the general syntax
otherurl       = genericurl
; the scheme is in lower case; interpreters should use case-ignore
scheme         = 1*[ lowalpha | digit | "+" | "-" | "." ]
schemepart     = *xchar | ip-schemepart
; URL schemeparts for ip based protocols:
ip-schemepart  = "//" login [ "/" urlpath ]
login          = [ user [ ":" password ] "@" ] hostport
hostport       = host [ ":" port ]
host           = hostname | hostnumber
hostname       = *[ domainlabel "." ] toplabel
domainlabel    = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
toplabel       = alpha | alpha *[ alphadigit | "-" ] alphadigit
alphadigit     = alpha | digit
hostnumber     = digits "." digits "." digits "." digits
port           = digits
user           = *[ uchar | ";" | "?" | "&" | "=" ]
password       = *[ uchar | ";" | "?" | "&" | "=" ]
urlpath        = *xchar    ; depends on protocol see section 3.1
; The predefined schemes:
; FTP (see also RFC959)
ftpurl         = "ftp://" login [ "/" fpath [ ";type=" ftptype ]]
fpath          = fsegment *[ "/" fsegment ]
fsegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]
ftptype        = "A" | "I" | "D" | "a" | "i" | "d"
; FILE
fileurl        = "file://" [ host | "localhost" ] "/" fpath
; HTTP
httpurl        = "http://" hostport [ "/" hpath [ "?" search ]]
hpath          = hsegment *[ "/" hsegment ]
hsegment       = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
search         = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
; GOPHER (see also RFC1436)
gopherurl      = "gopher://" hostport [ "/" [ gtype [ selector
                 [ "%09" search [ "%09" gopher+_string ] ] ] ] ]
gtype          = xchar
selector       = *xchar
gopher+_string = *xchar
; MAILTO (see also RFC822)
mailtourl      = "mailto:" encoded822addr
encoded822addr = 1*xchar    ; further defined in RFC822
; NEWS (see also RFC1036)
newsurl        = "news:" grouppart
grouppart      = "*" | group | article
group          = alpha *[ alpha | digit | "-" | "." | "+" | "_" ]
article        = 1*[ uchar | ";" | "/" | "?" | ":" | "&" | "=" ] "@" host
; NNTP (see also RFC977)
nntpurl        = "nntp://" hostport "/" group [ "/" digits ]
; TELNET
telneturl      = "telnet://" login [ "/" ]
; WAIS (see also RFC1625)
waisurl        = waisdatabase | waisindex | waisdoc
waisdatabase   = "wais://" hostport "/" database
waisindex      = "wais://" hostport "/" database "?" search
waisdoc        = "wais://" hostport "/" database "/" wtype "/" wpath
database       = *uchar
wtype          = *uchar
wpath          = *uchar
; PROSPERO
prosperourl    = "prospero://" hostport "/" ppath *[ fieldspec ]
ppath          = psegment *[ "/" psegment ]
psegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]
fieldspec      = ";" fieldname "=" fieldvalue
fieldname      = *[ uchar | "?" | ":" | "@" | "&" ]
fieldvalue     = *[ uchar | "?" | ":" | "@" | "&" ]
; Miscellaneous definitions
lowalpha       = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                 "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                 "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                 "y" | "z"
hialpha        = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
alpha          = lowalpha | hialpha
digit          = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"
safe           = "$" | "-" | "_" | "." | "+"
extra          = "!" | "*" | "'" | "(" | ")" | ","
national       = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
punctuation    = "<" | ">" | "#" | "%" | <">
reserved       = ";" | "/" | "?" | ":" | "@" | "&" | "="
hex            = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                 "a" | "b" | "c" | "d" | "e" | "f"
escape         = "%" hex hex
unreserved     = alpha | digit | safe | extra
uchar          = unreserved | escape
xchar          = unreserved | reserved | escape
digits         = 1*digit
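
As an illustration only, part of the grammar above can be transcribed
into regular expressions. The following Python sketch covers the
scheme, hostport and httpurl productions. The names SCHEME_RE,
HTTPURL_RE and is_http_url are hypothetical and are not defined by
this document; the sketch is one possible reading of the BNF, not a
reference implementation.

```python
import re

# Character classes transcribed from the "Miscellaneous definitions".
ALPHA = "A-Za-z"
DIGIT = "0-9"
SAFE = r"$\-_.+"
EXTRA = r"!*'(),"
ESCAPE = r"%[0-9A-Fa-f]{2}"
UNRESERVED = f"{ALPHA}{DIGIT}{SAFE}{EXTRA}"

# scheme = 1*[ lowalpha | digit | "+" | "-" | "." ]
SCHEME_RE = re.compile(r"[a-z0-9+\-.]+")

# hostport = host [ ":" port ], with host = hostname | hostnumber
ALPHADIGIT = f"[{ALPHA}{DIGIT}]"
DOMAINLABEL = f"{ALPHADIGIT}(?:[{ALPHA}{DIGIT}-]*{ALPHADIGIT})?"
TOPLABEL = f"[{ALPHA}](?:[{ALPHA}{DIGIT}-]*{ALPHADIGIT})?"
HOSTNAME = f"(?:{DOMAINLABEL}\\.)*{TOPLABEL}"
HOSTNUMBER = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
HOSTPORT = f"(?:{HOSTNAME}|{HOSTNUMBER})(?::[0-9]+)?"

# hsegment and search share the same character set in the BNF.
HSEGMENT = f"(?:[{UNRESERVED};:@&=]|{ESCAPE})*"
HPATH = f"{HSEGMENT}(?:/{HSEGMENT})*"

# httpurl = "http://" hostport [ "/" hpath [ "?" search ]]
HTTPURL_RE = re.compile(
    f"http://{HOSTPORT}(?:/{HPATH}(?:\\?{HSEGMENT})?)?"
)


def is_http_url(text):
    """Return True if text matches the httpurl production in full."""
    return HTTPURL_RE.fullmatch(text) is not None
```

Under this transcription, is_http_url("http://example.com/~user")
returns False, because "~" belongs to the national class and is not
part of uchar; it would have to be written as "%7E".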
6. Security Considerations
The URL scheme does not in itself pose a security threat. Users
should beware that there is no general guarantee that a URL which at
one time points to a given object continues to do so, and does not
even at some later time point to a different object due to the
movement of objects on servers.
A URL-related security threat is that it is sometimes possible to
construct a URL such that an attempt to perform a harmless idempotent
operation such as the retrieval of the object will in fact cause a
possibly damaging remote operation to occur. The unsafe URL is
typically constructed by specifying a port number other than that
reserved for the network protocol in question. The client
unwittingly contacts a server which is in fact running a different
protocol. The content of the URL contains instructions which when
interpreted according to this other protocol cause an unexpected
operation. An example has been the use of gopher URLs to cause a rude
message to be sent via an SMTP server. Caution should be used when
using any URL which specifies a port number other than the default
for the protocol, especially when it is a number within the reserved
space.
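
A client might apply the caution described above with a check along
the following lines. This is a minimal Python sketch, not a mandated
procedure. The default ports are those given for each scheme in
section 3; the function name port_warning is hypothetical, and
treating "the reserved space" as ports below 1024 is an assumption of
this sketch.

```python
from urllib.parse import urlsplit

# Default ports as given for each scheme in section 3 of this document.
DEFAULT_PORTS = {
    "ftp": 21, "http": 80, "gopher": 70, "nntp": 119,
    "telnet": 23, "wais": 210, "prospero": 1525,
}

# Assumption: "the reserved space" means ports below 1024.
RESERVED_LIMIT = 1024


def port_warning(url):
    """Return a warning string for a suspicious port, or None."""
    parts = urlsplit(url)
    default = DEFAULT_PORTS.get(parts.scheme)
    port = parts.port  # None when the URL names no port
    if port is None or default is None or port == default:
        return None
    if port < RESERVED_LIMIT:
        return (f"port {port} is in the reserved range and is not "
                f"the {parts.scheme} default ({default})")
    return f"port {port} is not the {parts.scheme} default ({default})"
```

A gopher URL that names port 25, the SMTP port, is reported as being
in the reserved range, while a URL with no port or with the default
port yields None.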
Care should be taken when URLs contain embedded encoded delimiters
for a given protocol (for example, CR and LF characters for telnet
protocols) that these are not unencoded before transmission. This
would violate the protocol but could be used to simulate an extra
operation or parameter, again causing an unexpected and possibly
harmful remote operation to be performed.
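
One way a client could notice such URLs before acting on them is to
look for the encoded forms of CR and LF. The following Python sketch
assumes that CR and LF are the delimiters of interest, as in the
example above; the name has_encoded_line_delimiter is hypothetical.

```python
import re

# Percent-encoded CR (%0D) and LF (%0A), in either hex case.
ENCODED_CRLF = re.compile(r"%0[AaDd]")


def has_encoded_line_delimiter(url):
    """Return True if the URL carries an encoded CR or LF."""
    return ENCODED_CRLF.search(url) is not None
```

A URL flagged this way is not necessarily hostile; the check only
identifies URLs whose encoded delimiters must be kept encoded when the
request is transmitted.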
The use of URLs containing passwords that should be secret is clearly
unwise.
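
Software that logs or displays URLs may wish to detect an embedded
password first. A minimal Python sketch, using the standard
urllib.parse module; the name carries_password is hypothetical.

```python
from urllib.parse import urlsplit


def carries_password(url):
    """Return True if the login part of the URL includes a password."""
    return urlsplit(url).password is not None
```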
7. Acknowledgements
This paper builds on the basic WWW design (RFC 1630) and much
discussion of these issues by many people on the network. The
discussion was particularly stimulated by articles by Clifford Lynch,
Brewster Kahle [10] and Wengyik Yeong [18]. Contributions from John
Curran, Clifford Neuman, Ed Vielmetti and later the IETF URL BOF and
URI working group were incorporated.
Most recently, careful readings and comments by Dan Connolly, Ned
Freed, Roy Fielding, Guido van Rossum, Michael Dolan, Bert Bos, John
Kunze, Olle Jarnefors, Peter Svanberg and many others have helped
refine this RFC.
APPENDIX: Recommendations for URLs in Context
URIs, including URLs, are intended to be transmitted through
protocols which provide a context for their interpretation.
In some cases, it will be necessary to distinguish URLs from other
possible data structures in a syntactic structure. In this case, it is
recommended that URLs be preceded with a prefix consisting of the
characters "URL:". For example, this prefix may be used to
distinguish URLs from other kinds of URIs.
In addition, there are many occasions when URLs are included in other
kinds of text; examples include electronic mail, USENET news
messages, or printed on paper. In such cases, it is convenient to
have a separate syntactic wrapper that delimits the URL and separates
it from the rest of the text, and in particular from punctuation
marks that might be mistaken for part of the URL. For this purpose,
it is recommended that angle brackets ("<" and ">"), along with the
prefix "URL:", be used to delimit the boundaries of the URL. This
wrapper does not form part of the URL and should not be used in
contexts in which delimiters are already specified.
In the case where a fragment/anchor identifier is associated with a
URL (following a "#"), the identifier would be placed within the
brackets as well.
In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may
need to be added to break long URLs across lines. The whitespace
should be ignored when extracting the URL.
No whitespace should be introduced after a hyphen ("-") character.
Because some typesetters and printers may (erroneously) introduce a
hyphen at the end of line when breaking a line, the interpreter of a
URL containing a line break immediately after a hyphen should ignore
all unencoded whitespace around the line break, and should be aware
that the hyphen may or may not actually be part of the URL.
Examples:
Yes, Jim, I found it under <URL:ftp://info.cern.ch/pub/www/doc;
type=d> but you can probably pick it up from <URL:ftp://ds.in
ternic.net/rfc>. Note the warning in <URL:http://ds.internic.
net/instructions/overview.html#WARNING>.
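
The extraction rule recommended in this appendix can be sketched in
Python as follows. The name extract_urls is hypothetical. The sketch
removes all whitespace found inside the wrapper and makes no attempt
to decide whether a hyphen at a line break belongs to the URL, which
remains ambiguous as noted above.

```python
import re

# "<URL:" ... ">" wrapper; the body may span several lines.
WRAPPED_URL = re.compile(r"<URL:([^<>]*)>")


def extract_urls(text):
    """Return wrapped URLs with all embedded whitespace removed."""
    return ["".join(body.split()) for body in WRAPPED_URL.findall(text)]
```

Applied to the example text above, this yields the three URLs with
their line breaks removed, the last one including its "#WARNING"
fragment identifier.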
References
[1] Anklesaria, F., McCahill, M., Lindner, P., Johnson, D.,
    Torrey, D., and B. Alberti, "The Internet Gopher Protocol
    (a distributed document search and retrieval protocol)",
    RFC 1436, University of Minnesota, March 1993.
    <URL:ftp://ds.internic.net/rfc/rfc1436.txt;type=a>
[2] Anklesaria, F., Lindner, P., McCahill, M., Torrey, D.,
    Johnson, D., and B. Alberti, "Gopher+: Upward compatible
    enhancements to the Internet Gopher protocol",
    University of Minnesota, July 1993.
    <URL:ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol
    /Gopher+/Gopher+.txt>
[3] Berners-Lee, T., "Universal Resource Identifiers in WWW: A
    Unifying Syntax for the Expression of Names and Addresses of
    Objects on the Network as used in the World-Wide Web", RFC
    1630, CERN, June 1994.
    <URL:ftp://ds.internic.net/rfc/rfc1630.txt>
[4] Berners-Lee, T., "Hypertext Transfer Protocol (HTTP)",
    CERN, November 1993.
    <URL:ftp://info.cern.ch/pub/www/doc/http-spec.txt.Z>
[5] Braden, R., Editor, "Requirements for Internet Hosts --
    Application and Support", STD 3, RFC 1123, IETF, October 1989.
    <URL:ftp://ds.internic.net/rfc/rfc1123.txt>
[6] Crocker, D., "Standard for the Format of ARPA Internet Text
    Messages", STD 11, RFC 822, UDEL, April 1982.
    <URL:ftp://ds.internic.net/rfc/rfc822.txt>
[7] Davis, F., Kahle, B., Morris, H., Salem, J., Shen, T., Wang, R.,
    Sui, J., and M. Grinbaum, "WAIS Interface Protocol Prototype
    Functional Specification", (v1.5), Thinking Machines
    Corporation, April 1990.
    <URL:ftp://quake.think.com/pub/wais/doc/protspec.txt>
[8] Horton, M. and R. Adams, "Standard For Interchange of USENET
    Messages", RFC 1036, AT&T Bell Laboratories, Center for Seismic
    Studies, December 1987.
    <URL:ftp://ds.internic.net/rfc/rfc1036.txt>
[9] Huitema, C., "Naming: Strategies and Techniques", Computer
    Networks and ISDN Systems 23 (1991) 107-110.
[10] Kahle, B., "Document Identifiers, or International Standard
     Book Numbers for the Electronic Age", 1991.
     <URL:ftp://quake.think.com/pub/wais/doc/doc-ids.txt>
[11] Kantor, B. and P. Lapsley, "Network News Transfer Protocol:
     A Proposed Standard for the Stream-Based Transmission of News",
     RFC 977, UC San Diego & UC Berkeley, February 1986.
     <URL:ftp://ds.internic.net/rfc/rfc977.txt>
[12] Kunze, J., "Functional Requirements for Internet Resource
     Locators", Work in Progress, December 1994.
     <URL:ftp://ds.internic.net/internet-drafts
     /draft-ietf-uri-irl-fun-req-02.txt>
[13] Mockapetris, P., "Domain Names - Concepts and Facilities",
     STD 13, RFC 1034, USC/Information Sciences Institute,
     November 1987.
     <URL:ftp://ds.internic.net/rfc/rfc1034.txt>
[14] Neuman, B., and S. Augart, "The Prospero Protocol",
     USC/Information Sciences Institute, June 1993.
     <URL:ftp://prospero.isi.edu/pub/prospero/doc
     /prospero-protocol.PS.Z>
[15] Postel, J. and J. Reynolds, "File Transfer Protocol (FTP)",
     STD 9, RFC 959, USC/Information Sciences Institute,
     October 1985.
     <URL:ftp://ds.internic.net/rfc/rfc959.txt>
[16] Sollins, K. and L. Masinter, "Functional Requirements for
     Uniform Resource Names", RFC 1737, MIT/LCS, Xerox Corporation,
     December 1994.
     <URL:ftp://ds.internic.net/rfc/rfc1737.txt>
[17] St. Pierre, M., Fullton, J., Gamiel, K., Goldman, J., Kahle, B.,
     Kunze, J., Morris, H., and F. Schiettecatte, "WAIS over
     Z39.50-1988", RFC 1625, WAIS, Inc., CNIDR, Thinking Machines
     Corp., UC Berkeley, FS Consulting, June 1994.
     <URL:ftp://ds.internic.net/rfc/rfc1625.txt>
[18] Yeong, W., "Towards Networked Information Retrieval", Technical
     report 91-06-25-01, Performance Systems International, Inc.
     <URL:ftp://uu.psi.com/wp/nir.txt>, June 1991.
[19] Yeong, W., "Representing Public Archives in the Directory",
     Work in Progress, November 1991.
[20] "Coded Character Set -- 7-bit American Standard Code for
     Information Interchange", ANSI X3.4-1986.
Editors' Addresses
Tim Berners-Lee
World-Wide Web project
CERN,
1211 Geneva 23,
Switzerland
Phone: +41 (22)767 3755
Fax: +41 (22)767 7155
EMail: timbl@info.cern.ch
Larry Masinter
Xerox PARC
3333 Coyote Hill Road
Palo Alto, CA 94034
Phone: (415) 812-4365
Fax: (415) 812-4333
EMail: masinter@parc.xerox.com
Mark McCahill
Computer and Information Services,
University of Minnesota
Room 152 Shepherd Labs
100 Union Street SE
Minneapolis, MN 55455
Phone: (612) 625 1300
EMail: mpm@boombox.micro.umn.edu