RFC1738

Original source: http://www.ietf.org/rfc/rfc1738.txt

 

 

Network Working Group                                     T. Berners-Lee
Request for Comments: 1738                                          CERN
Category: Standards Track                                    L. Masinter
                                                       Xerox Corporation
                                                             M. McCahill
                                                 University of Minnesota
                                                                 Editors
                                                           December 1994

                    Uniform Resource Locators (URL)

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract

   This document specifies a Uniform Resource Locator (URL), the syntax
   and semantics of formalized information for location and access of
   resources via the Internet.

 

1. Introduction

 

   This document describes the syntax and semantics for a compact string
   representation for a resource available via the Internet. These
   strings are called "Uniform Resource Locators" (URLs).

   The specification is derived from concepts introduced by the
   World-Wide Web global information initiative, whose use of such
   objects dates from 1990 and is described in "Universal Resource
   Identifiers in WWW", RFC 1630. The specification of URLs is designed
   to meet the requirements laid out in "Functional Requirements for
   Internet Resource Locators" [12].

   This document was written by the URI working group of the Internet
   Engineering Task Force.  Comments may be addressed to the editors, or
   to the URI-WG <uri@bunyip.com>. Discussions of the group are archived
   at <URL:http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>

 

Berners-Lee, Masinter & McCahill                                [Page 1]

RFC 1738            Uniform Resource Locators (URL)        December 1994

2. General URL Syntax

   Just as there are many different methods of access to resources,
   there are several schemes for describing the location of such
   resources.

   The generic syntax for URLs provides a framework for new schemes to
   be established using protocols other than those defined in this
   document.

   URLs are used to `locate' resources, by providing an abstract
   identification of the resource location.  Having located a resource,
   a system may perform a variety of operations on the resource, as
   might be characterized by such words as `access', `update',
   `replace', `find attributes'. In general, only the `access' method
   needs to be specified for any URL scheme.

 

2.1. The main parts of URLs

 

   A full BNF description of the URL syntax is given in Section 5.

   In general, URLs are written as follows:

       <scheme>:<scheme-specific-part>

   A URL contains the name of the scheme being used (<scheme>) followed
   by a colon and then a string (the <scheme-specific-part>) whose
   interpretation depends on the scheme.

   Scheme names consist of a sequence of characters. The lower case
   letters "a"--"z", digits, and the characters plus ("+"), period
   ("."), and hyphen ("-") are allowed. For resiliency, programs
   interpreting URLs should treat upper case letters as equivalent to
   lower case in scheme names (e.g., allow "HTTP" as well as "http").
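
   The split into scheme and scheme-specific part can be sketched in a
   few lines of Python. This is only an illustration of the rules
   above, not part of the specification; the function name split_scheme
   is an assumption made for the example.

```python
import re

# Scheme names: lower case letters, digits, "+", "." and "-".
_SCHEME_RE = re.compile(r"[a-z0-9+.-]+")


def split_scheme(url):
    """Split a URL into (scheme, scheme_specific_part).

    split_scheme is a hypothetical helper, not a standard API. The
    scheme is lower-cased first, so "HTTP" is accepted as "http".
    Raises ValueError if the colon is missing or the name is invalid.
    """
    scheme, colon, rest = url.partition(":")
    if not colon:
        raise ValueError("missing ':' after scheme")
    scheme = scheme.lower()
    if not _SCHEME_RE.fullmatch(scheme):
        raise ValueError("invalid scheme name: %r" % scheme)
    return scheme, rest
```

   Only the first colon is treated as the delimiter, so any later colon
   (for example, one before a port number) stays in the scheme-specific
   part.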

2.2. URL Character Encoding Issues

   URLs are sequences of characters, i.e., letters, digits, and special
   characters. A URL may be represented in a variety of ways: e.g., ink
   on paper, or a sequence of octets in a coded character set. The
   interpretation of a URL depends only on the identity of the
   characters used.

   In most URL schemes, the sequences of characters in different parts
   of a URL are used to represent sequences of octets used in Internet
   protocols. For example, in the ftp scheme, the host name, directory
   name and file names are such sequences of octets, represented by
   parts of the URL.  Within those parts, an octet may be represented by
   the character which has that octet as its code within the US-ASCII
   [20] coded character set.

   In addition, octets may be encoded by a character triplet consisting
   of the character "%" followed by the two hexadecimal digits (from
   "0123456789ABCDEF") which form the hexadecimal value of the octet.
   (The characters "abcdef" may also be used in hexadecimal encodings.)
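
   As an illustration of the triplet form, the following Python sketch
   converts a single octet to and from "%XX". The helper names
   encode_octet and decode_triplet are assumptions made for this
   example.

```python
import string

HEX_DIGITS = "0123456789ABCDEF"


def encode_octet(octet):
    """Return the "%XX" triplet for an integer octet (0-255).

    Hypothetical helper for illustration only.
    """
    if not 0 <= octet <= 255:
        raise ValueError("not an octet: %r" % octet)
    return "%" + HEX_DIGITS[octet >> 4] + HEX_DIGITS[octet & 0x0F]


def decode_triplet(triplet):
    """Return the integer octet for a "%XX" triplet.

    Both upper and lower case hexadecimal digits are accepted.
    """
    if len(triplet) != 3 or triplet[0] != "%":
        raise ValueError("not a triplet: %r" % triplet)
    if any(ch not in string.hexdigits for ch in triplet[1:]):
        raise ValueError("not hexadecimal: %r" % triplet)
    return int(triplet[1:], 16)
```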

 

   Octets must be encoded if they have no corresponding graphic
   character within the US-ASCII coded character set, if the use of the
   corresponding character is unsafe, or if the corresponding character
   is reserved for some other interpretation within the particular URL
   scheme.

   No corresponding graphic US-ASCII:

   URLs are written only with the graphic printable characters of the
   US-ASCII coded character set. The octets 80-FF hexadecimal are not
   used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
   control characters; these must be encoded.

   Unsafe:

   Characters can be unsafe for a number of reasons.  The space
   character is unsafe because significant spaces may disappear and
   insignificant spaces may be introduced when URLs are transcribed or
   typeset or subjected to the treatment of word-processing programs.
   The characters "<" and ">" are unsafe because they are used as the
   delimiters around URLs in free text; the quote mark (""") is used to
   delimit URLs in some systems.  The character "#" is unsafe and should
   always be encoded because it is used in World Wide Web and in other
   systems to delimit a URL from a fragment/anchor identifier that might
   follow it.  The character "%" is unsafe because it is used for
   encodings of other characters.  Other characters are unsafe because
   gateways and other transport agents are known to sometimes modify
   such characters. These characters are "{", "}", "|", "\", "^", "~",
   "[", "]", and "`".

   All unsafe characters must always be encoded within a URL. For
   example, the character "#" must be encoded within URLs even in
   systems that do not normally deal with fragment or anchor
   identifiers, so that if the URL is copied into another system that
   does use them, it will not be necessary to change the URL encoding.


   Reserved:

   Many URL schemes reserve certain characters for a special meaning:
   their appearance in the scheme-specific part of the URL has a
   designated semantics. If the character corresponding to an octet is
   reserved in a scheme, the octet must be encoded.  The characters ";",
   "/", "?", ":", "@", "=" and "&" are the characters which may be
   reserved for special meaning within a scheme. No other characters may
   be reserved within a scheme.

   Usually a URL has the same interpretation when an octet is
   represented by a character and when it is encoded. However, this is
   not true for reserved characters: encoding a character reserved for a
   particular scheme may change the semantics of a URL.

   Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
   reserved characters used for their reserved purposes may be used
   unencoded within a URL.

   On the other hand, characters that are not required to be encoded
   (including alphanumerics) may be encoded within the scheme-specific
   part of a URL, as long as they are not being used for a reserved
   purpose.
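
   The encoding rules of this section can be summarized in a short
   Python sketch. It is a simplified illustration, not a normative
   algorithm: the name encode_octets and the keep_reserved parameter are
   assumptions, and the caller is trusted to know which reserved
   characters are being used for their reserved purpose.

```python
ALPHANUMERICS = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
)
SPECIAL = "$-_.+!*'(),"
RESERVED = ";/?:@=&"


def encode_octets(data, keep_reserved=""):
    """Encode a bytes object for use inside one part of a URL.

    Hypothetical helper for illustration. Alphanumerics and the
    special characters are left as they are. Reserved characters named
    in keep_reserved are also left alone. Every other octet, which
    covers control, non-ASCII and unsafe characters, becomes "%XX".
    """
    for ch in keep_reserved:
        if ch not in RESERVED:
            raise ValueError("not a reserved character: %r" % ch)
    allowed = ALPHANUMERICS + SPECIAL + keep_reserved
    out = []
    for octet in data:
        ch = chr(octet)
        if ch in allowed:
            out.append(ch)
        else:
            out.append("%%%02X" % octet)
    return "".join(out)
```

   With the default arguments every reserved character is encoded,
   which is the conservative choice when the octets are plain data
   rather than delimiters.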

2.3 Hierarchical schemes and relative links

   In some cases, URLs are used to locate resources that contain
   pointers to other resources. In some cases, those pointers are
   represented as relative links where the expression of the location of
   the second resource is in terms of "in the same place as this one
   except with the following relative path". Relative links are not
   described in this document. However, the use of relative links
   depends on the original URL containing a hierarchical structure
   against which the relative link is based.

   Some URL schemes (such as the ftp, http, and file schemes) contain
   names that can be considered hierarchical; the components of the
   hierarchy are separated by "/".

 


3. Specific Schemes

   The mapping for some existing standard and experimental protocols is
   outlined in the BNF syntax definition.  Notes on particular protocols
   follow. The schemes covered are:

   ftp                     File Transfer protocol
   http                    Hypertext Transfer Protocol
   gopher                  The Gopher protocol
   mailto                  Electronic mail address
   news                    USENET news
   nntp                    USENET news using NNTP access
   telnet                  Reference to interactive sessions
   wais                    Wide Area Information Servers
   file                    Host-specific file names
   prospero                Prospero Directory Service

   Other schemes may be specified by future specifications. Section 4 of
   this document describes how new schemes may be registered, and lists
   some scheme names that are under development.

3.1. Common Internet Scheme Syntax

   While the syntax for the rest of the URL may vary depending on the
   particular scheme selected, URL schemes that involve the direct use
   of an IP-based protocol to a specified host on the Internet use a
   common syntax for the scheme-specific data:

       //<user>:<password>@<host>:<port>/<url-path>

   Some or all of the parts "<user>:<password>@", ":<password>",
   ":<port>", and "/<url-path>" may be excluded.  The scheme specific
   data start with a double slash "//" to indicate that it complies with
   the common Internet scheme syntax. The different components obey the
   following rules:

    user
        An optional user name. Some schemes (e.g., ftp) allow the
        specification of a user name.

    password
        An optional password. If present, it follows the user
        name separated from it by a colon.

   The user name (and password), if present, are followed by a
   commercial at-sign "@". Within the user and password field, any ":",
   "@", or "/" must be encoded.


   Note that an empty user name or password is different than no user
   name or password; there is no way to specify a password without
   specifying a user name. E.g., <URL:ftp://@host.com/> has an empty
   user name and no password, <URL:ftp://host.com/> has no user name,
   while <URL:ftp://foo:@host.com/> has a user name of "foo" and an
   empty password.

    host
        The fully qualified domain name of a network host, or its IP
        address as a set of four decimal digit groups separated by
        ".". Fully qualified domain names take the form as described
        in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123
        [5]: a sequence of domain labels separated by ".", each domain
        label starting and ending with an alphanumerical character and
        possibly also containing "-" characters. The rightmost domain
        label will never start with a digit, though, which
        syntactically distinguishes all domain names from the IP
        addresses.

    port
        The port number to connect to. Most schemes designate
        protocols that have a default port number. Another port number
        may optionally be supplied, in decimal, separated from the
        host by a colon. If the port is omitted, the colon is as well.

    url-path
        The rest of the locator consists of data specific to the
        scheme, and is known as the "url-path". It supplies the
        details of how the specified resource can be accessed. Note
        that the "/" between the host (or port) and the url-path is
        NOT part of the url-path.

   The url-path syntax depends on the scheme being used, as does the
   manner in which it is interpreted.
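
   The component rules of this section can be sketched as a small
   Python parser. This is an illustrative sketch under simplifying
   assumptions (no validation of the host syntax, values returned still
   encoded); the name split_common_syntax and the dictionary layout are
   not defined by this document.

```python
def split_common_syntax(scheme_specific):
    """Split "//<user>:<password>@<host>:<port>/<url-path>".

    Hypothetical helper for illustration. Returns a dict with the keys
    user, password, host, port and url_path. A part that is absent is
    None, which keeps "no user name" distinct from an empty user name.
    """
    if not scheme_specific.startswith("//"):
        raise ValueError("expected a leading '//'")
    rest = scheme_specific[2:]

    # The first "/" ends the host part; it is not part of the url-path.
    authority, slash, url_path = rest.partition("/")
    if not slash:
        url_path = None

    user = password = None
    if "@" in authority:
        # "@", ":" and "/" must be encoded inside user and password,
        # so the first "@" is the delimiter.
        login, _, authority = authority.partition("@")
        user, colon, pw = login.partition(":")
        password = pw if colon else None

    host, colon, port_text = authority.partition(":")
    port = None
    if colon:
        if not (port_text.isascii() and port_text.isdigit()):
            raise ValueError("port must be decimal: %r" % port_text)
        port = int(port_text)
    return {"user": user, "password": password, "host": host,
            "port": port, "url_path": url_path}
```

   Returning None for a missing port leaves the choice of default to the
   caller, since the default port depends on the scheme.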

3.2. FTP

   The FTP URL scheme is used to designate files and directories on
   Internet hosts accessible using the FTP protocol (RFC 959).

   An FTP URL follows the syntax described in Section 3.1.  If :<port>
   is omitted, the port defaults to 21.


3.2.1. FTP Name and Password

   A user name and password may be supplied; they are used in the ftp
   "USER" and "PASS" commands after first making the connection to the
   FTP server.  If no user name or password is supplied and one is
   requested by the FTP server, the conventions for "anonymous" FTP are
   to be used, as follows:

        The user name "anonymous" is supplied.

        The password is supplied as the Internet e-mail address
        of the end user accessing the resource.

   If the URL supplies a user name but no password, and the remote
   server requests a password, the program interpreting the FTP URL
   should request one from the user.

3.2.2. FTP url-path

   The url-path of an FTP URL has the following syntax:

       <cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode>

   Where <cwd1> through <cwdN> and <name> are (possibly encoded) strings
   and <typecode> is one of the characters "a", "i", or "d".  The part
   ";type=<typecode>" may be omitted. The <cwdx> and <name> parts may be
   empty. The whole url-path may be omitted, including the "/"
   delimiting it from the prefix containing user, password, host, and
   port.

   The url-path is interpreted as a series of FTP commands as follows:

      Each of the <cwd> elements is to be supplied, sequentially, as the
      argument to a CWD (change working directory) command.

      If the typecode is "d", perform a NLST (name list) command with
      <name> as the argument, and interpret the results as a file
      directory listing.

      Otherwise, perform a TYPE command with <typecode> as the argument,
      and then access the file whose name is <name> (for example, using
      the RETR command.)

   Within a name or CWD component, the characters "/" and ";" are
   reserved and must be encoded. The components are decoded prior to
   their use in the FTP protocol.  In particular, if the appropriate FTP
   sequence to access a particular file requires supplying a string
   containing a "/" as an argument to a CWD or RETR command, it is
   necessary to encode each "/".

   For example, the URL <URL:ftp://myname@host.dom/%2Fetc/motd> is
   interpreted by FTP-ing to "host.dom", logging in as "myname"
   (prompting for a password if it is asked for), and then executing
   "CWD /etc" and then "RETR motd". This has a different meaning from
   <URL:ftp://myname@host.dom/etc/motd> which would "CWD etc" and then
   "RETR motd"; the initial "CWD" might be executed relative to the
   default directory for "myname". On the other hand,
   <URL:ftp://myname@host.dom//etc/motd>, would "CWD " with a null
   argument, then "CWD etc", and then "RETR motd".
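
   The translation from url-path to FTP commands described above can be
   sketched in Python as follows. This is an illustration only: the
   function name ftp_commands is an assumption, decoding relies on the
   standard urllib.parse.unquote (which decodes to text as UTF-8), and
   no TYPE command is produced when the typecode is omitted, leaving
   the choice of transfer mode to the client.

```python
from urllib.parse import unquote


def ftp_commands(url_path):
    """Translate an FTP url-path into (command, argument) pairs.

    Hypothetical helper for illustration. url_path is the text after
    the "/" that follows the host or port.
    """
    path, _, param = url_path.partition(";")
    typecode = None
    if param:
        key, _, value = param.partition("=")
        if key != "type" or value not in ("a", "i", "d"):
            raise ValueError("bad parameter: %r" % param)
        typecode = value

    # Split on the literal "/" first and decode each component after,
    # so that an encoded "%2F" stays inside a single argument.
    *cwds, name = [unquote(part) for part in path.split("/")]
    commands = [("CWD", cwd) for cwd in cwds]
    if typecode == "d":
        commands.append(("NLST", name))
    else:
        if typecode is not None:
            commands.append(("TYPE", typecode))
        commands.append(("RETR", name))
    return commands
```

   For the three example URLs above, the sketch yields "CWD /etc" then
   "RETR motd"; "CWD etc" then "RETR motd"; and a CWD with an empty
   argument, "CWD etc", then "RETR motd".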

   FTP URLs may also be used for other operations; for example, it is
   possible to update a file on a remote file server, or infer
   information about it from the directory listings. The mechanism for
   doing so is not spelled out here.

3.2.3. FTP Typecode is Optional

   The entire ;type=<typecode> part of an FTP URL is optional. If it is
   omitted, the client program interpreting the URL must guess the
   appropriate mode to use. In general, the data content type of a file
   can only be guessed from the name, e.g., from the suffix of the name;
   the appropriate type code to be used for transfer of the file can
   then be deduced from the data content of the file.

3.2.4 Hierarchy

   For some file systems, the "/" used to denote the hierarchical
   structure of the URL corresponds to the delimiter used to construct a
   file name hierarchy, and thus, the filename will look similar to the
   URL path. This does NOT mean that the URL is a Unix filename.

3.2.5. Optimization

   Clients accessing resources via FTP may employ additional heuristics
   to optimize the interaction. For some FTP servers, for example, it
   may be reasonable to keep the control connection open while accessing
   multiple URLs from the same server. However, there is no common
   hierarchical model to the FTP protocol, so if a directory change
   command has been given, it is impossible in general to deduce what
   sequence should be given to navigate to another directory for a
   second retrieval, if the paths are different.  The only reliable
   algorithm is to disconnect and reestablish the control connection.


3.3. HTTP

   The HTTP URL scheme is used to designate Internet resources
   accessible using HTTP (HyperText Transfer Protocol).

   The HTTP protocol is specified elsewhere. This specification only
   describes the syntax of HTTP URLs.

   An HTTP URL takes the form:

      http://<host>:<port>/<path>?<searchpart>

   where <host> and <port> are as described in Section 3.1. If :<port>
   is omitted, the port defaults to 80.  No user name or password is
   allowed.  <path> is an HTTP selector, and <searchpart> is a query
   string. The <path> is optional, as is the <searchpart> and its
   preceding "?". If neither <path> nor <searchpart> is present, the "/"
   may also be omitted.

   Within the <path> and <searchpart> components, "/", ";", "?" are
   reserved.  The "/" character may be used within HTTP to designate a
   hierarchical structure.
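
   A minimal Python sketch of the HTTP URL form follows. It only
   illustrates the syntax rules of this section (default port, optional
   path and searchpart, no user name or password); the function name
   split_http_url is an assumption.

```python
def split_http_url(url):
    """Split "http://<host>:<port>/<path>?<searchpart>".

    Hypothetical helper for illustration. Returns the tuple
    (host, port, path, searchpart). path and searchpart are None when
    absent, and port falls back to 80.
    """
    prefix = "http://"
    if url[:len(prefix)].lower() != prefix:
        raise ValueError("not an http URL")
    rest = url[len(prefix):]

    hostport, slash, tail = rest.partition("/")
    if "@" in hostport:
        raise ValueError("no user name or password is allowed")

    host, colon, port_text = hostport.partition(":")
    port = int(port_text) if colon else 80

    path = searchpart = None
    if slash:
        path, question, search = tail.partition("?")
        if question:
            searchpart = search
    return host, port, path, searchpart
```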

 

3.4. GOPHER

 

   The Gopher URL scheme is used to designate Internet resources
   accessible using the Gopher protocol.

   The base Gopher protocol is described in RFC 1436 and supports items
   and collections of items (directories). The Gopher+ protocol is a set
   of upward compatible extensions to the base Gopher protocol and is
   described in [2]. Gopher+ supports associating arbitrary sets of
   attributes and alternate data representations with Gopher items.
   Gopher URLs accommodate both Gopher and Gopher+ items and item
   attributes.

 

3.4.1. Gopher URL syntax

 

   A Gopher URL takes the form:

      gopher://<host>:<port>/<gopher-path>

   where <gopher-path> is one of

       <gophertype><selector>
       <gophertype><selector>%09<search>
       <gophertype><selector>%09<search>%09<gopher+_string>

   If :<port> is omitted, the port defaults to 70.  <gophertype> is a
   single-character field to denote the Gopher type of the resource to
   which the URL refers. The entire <gopher-path> may also be empty, in
   which case the delimiting "/" is also optional and the <gophertype>
   defaults to "1".

   <selector> is the Gopher selector string.  In the Gopher protocol,
   Gopher selector strings are a sequence of octets which may contain
   any octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal
   (US-ASCII character LF), and 0D (US-ASCII character CR).

   Gopher clients specify which item to retrieve by sending the Gopher
   selector string to a Gopher server.

   Within the <gopher-path>, no characters are reserved.

   Note that some Gopher <selector> strings begin with a copy of the
   <gophertype> character, in which case that character will occur twice
   consecutively. The Gopher selector string may be an empty string;
   this is how Gopher clients refer to the top-level directory on a
   Gopher server.

 

3.4.2 Specifying URLs for Gopher Search Engines

 

   If the URL refers to a search to be submitted to a Gopher search
   engine, the selector is followed by an encoded tab (%09) and the
   search string. To submit a search to a Gopher search engine, the
   Gopher client sends the <selector> string (after decoding), a tab,
   and the search string to the Gopher server.

 

3.4.3 URL syntax for Gopher+ items

 

   URLs for Gopher+ items have a second encoded tab (%09) and a Gopher+
   string. Note that in this case, the %09<search> string must be
   supplied, although the <search> element may be the empty string.

   The <gopher+_string> is used to represent information required for
   retrieval of the Gopher+ item. Gopher+ items may have alternate
   views, arbitrary sets of attributes, and may have electronic forms
   associated with them.

   To retrieve the data associated with a Gopher+ URL, a client will
   connect to the server and send the Gopher selector, followed by a tab
   and the search string (which may be empty), followed by a tab and the
   Gopher+ commands.
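
   The three <gopher-path> forms, and the text a client sends for each,
   can be sketched in Python as follows. This is an illustration under
   simplifying assumptions: the names split_gopher_path and
   gopher_request are hypothetical, decoding uses the standard
   urllib.parse.unquote, and the line terminator sent after the request
   is left out.

```python
from urllib.parse import unquote


def split_gopher_path(gopher_path):
    """Split a <gopher-path> into its decoded parts.

    Hypothetical helper for illustration. Returns the tuple
    (gophertype, selector, search, gopher_plus); search and
    gopher_plus are None when the URL does not carry them.
    """
    if gopher_path == "":
        # An empty path means the top-level directory, type "1".
        return "1", "", None, None

    gophertype, rest = gopher_path[0], gopher_path[1:]
    # Fields are separated by an encoded tab, so split before decoding.
    fields = rest.split("%09", 2)
    selector = unquote(fields[0])
    search = unquote(fields[1]) if len(fields) > 1 else None
    gopher_plus = unquote(fields[2]) if len(fields) > 2 else None
    return gophertype, selector, search, gopher_plus


def gopher_request(selector, search, gopher_plus):
    """Join the decoded parts with tabs, as sent to the server."""
    parts = [selector]
    if search is not None:
        parts.append(search)
    if gopher_plus is not None:
        parts.append(gopher_plus)
    return "\t".join(parts)
```

   Splitting on the literal "%09" before decoding keeps the field
   boundaries separate from the field contents.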


 

3.4.4 Default Gopher+ data representation

 

   When a Gopher server returns a directory listing to a client, the
   Gopher+ items are tagged with either a "+" (denoting Gopher+ items)
   or a "?" (denoting Gopher+ items which have a +ASK form associated
   with them). A Gopher URL with a Gopher+ string consisting of only a
   "+" refers to the default view (data representation) of the item
   while a Gopher+ string containing only a "?" refers to an item with a
   Gopher electronic form associated with it.

 

3.4.5 Gopher+ items with electronic forms

 

   Gopher+ items which have a +ASK associated with them (i.e. Gopher+
   items tagged with a "?") require the client to fetch the item's +ASK
   attribute to get the form definition, and then ask the user to fill
   out the form and return the user's responses along with the selector
   string to retrieve the item.  Gopher+ clients know how to do this but
   depend on the "?" tag in the Gopher+ item description to know when to
   handle this case. The "?" is used in the Gopher+ string to be
   consistent with Gopher+ protocol's use of this symbol.

 

3.4.6 Gopher+ item attribute collections

 

   To refer to the Gopher+ attributes of an item, the Gopher URL's
   Gopher+ string consists of "!" or "$". "!" refers to all of a
   Gopher+ item's attributes. "$" refers to all the item attributes for
   all items in a Gopher directory.

 

3.4.7 Referring to specific Gopher+ attributes

 

   To refer to specific attributes, the URL's gopher+_string is
   "!<attribute_name>" or "$<attribute_name>". For example, to refer to
   the attribute containing the abstract of an item, the gopher+_string
   would be "!+ABSTRACT".

   To refer to several attributes, the gopher+_string consists of the
   attribute names separated by coded spaces. For example,
   "!+ABSTRACT%20+SMELL" refers to the +ABSTRACT and +SMELL attributes
   of an item.

 

3.4.8 URL syntax for Gopher+ alternate views

 

   Gopher+ allows for optional alternate data representations (alternate
   views) of items. To retrieve a Gopher+ alternate view, a Gopher+
   client sends the appropriate view and language identifier (found in
   the item's +VIEW attribute). To refer to a specific Gopher+ alternate
   view, the URL's Gopher+ string would be in the form:

       +<view_name>%20<language_name>

   For example, a Gopher+ string of "+application/postscript%20Es_ES"
   refers to the Spanish language postscript alternate view of a Gopher+
   item.

 

3.4.9 URL syntax for Gopher+ electronic forms

 

   The gopher+_string for a URL that refers to an item referenced by a
   Gopher+ electronic form (an ASK block) filled out with specific
   values is a coded version of what the client sends to the server.
   The gopher+_string is of the form:

   +%091%0D%0A+-1%0D%0A<ask_item1_value>%0D%0A<ask_item2_value>%0D%0A.%0D%0A

   To retrieve this item, the Gopher client sends:

       <a_gopher_selector><tab>+<tab>1<cr><lf>
       +-1<cr><lf>
       <ask_item1_value><cr><lf>
       <ask_item2_value><cr><lf>
       .<cr><lf>

   to the Gopher server.

 

3.5. MAILTO

 

   The mailto URL scheme is used to designate the Internet mailing
   address of an individual or service. No additional information other
   than an Internet mailing address is present or implied.

   A mailto URL takes the form:

        mailto:<rfc822-addr-spec>

   where <rfc822-addr-spec> is (the encoding of an) addr-spec, as
   specified in RFC 822 [6]. Within mailto URLs, there are no reserved
   characters.

   Note that the percent sign ("%") is commonly used within RFC 822
   addresses and must be encoded.

   Unlike many URLs, the mailto scheme does not represent a data object
   to be accessed directly; there is no sense in which it designates an
   object. It has a different use than the message/external-body type in
   MIME.

 


3.6. NEWS

 

   The news URL scheme is used to refer to either news groups or
   individual articles of USENET news, as specified in RFC 1036.

   A news URL takes one of two forms:

     news:<newsgroup-name>
     news:<message-id>

   A <newsgroup-name> is a period-delimited hierarchical name, such as
   "comp.infosystems.www.misc". A <message-id> corresponds to the
   Message-ID of section 2.1.5 of RFC 1036, without the enclosing "<"
   and ">"; it takes the form <unique>@<full_domain_name>.  A message
   identifier may be distinguished from a news group name by the
   presence of the commercial at "@" character. No additional characters
   are reserved within the components of a news URL.

   If <newsgroup-name> is "*" (as in <URL:news:*>), it is used to refer
   to "all available news groups".

   The news URLs are unusual in that by themselves, they do not contain
   sufficient information to locate a single resource, but, rather, are
   location-independent.
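
   Because the two news forms differ only in the presence of "@", a
   client can tell them apart with a simple test. The Python sketch
   below illustrates this; the function name classify_news_url and the
   returned labels are assumptions made for the example.

```python
def classify_news_url(url):
    """Return (kind, value) for a news URL.

    Hypothetical helper for illustration. kind is one of
    "message-id", "all-groups" or "newsgroup".
    """
    prefix = "news:"
    if url[:len(prefix)].lower() != prefix:
        raise ValueError("not a news URL")
    value = url[len(prefix):]
    if "@" in value:
        # Only a message identifier contains the commercial at.
        return "message-id", value
    if value == "*":
        return "all-groups", value
    return "newsgroup", value
```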

 

3.7. NNTP

 

   The nntp URL scheme is an alternative method of referencing news
   articles, useful for specifying news articles from NNTP servers (RFC
   977).

   An nntp URL takes the form:

      nntp://<host>:<port>/<newsgroup-name>/<article-number>

   where <host> and <port> are as described in Section 3.1. If :<port>
   is omitted, the port defaults to 119.

   The <newsgroup-name> is the name of the group, while the
   <article-number> is the numeric id of the article within that
   newsgroup.

   Note that while nntp: URLs specify a unique location for the article
   resource, most NNTP servers currently on the Internet are
   configured only to allow access from local clients, and thus nntp
   URLs do not designate globally accessible resources. Thus, the news:
   form of URL is preferred as a way of identifying news articles.

 


 

3.8. TELNET

 

   The Telnet URL scheme is used to designate interactive services that
   may be accessed by the Telnet protocol.

   A telnet URL takes the form:

       telnet://<user>:<password>@<host>:<port>/

   as specified in Section 3.1. The final "/" character may be omitted.
   If :<port> is omitted, the port defaults to 23.  The :<password> can
   be omitted, as well as the whole <user>:<password> part.

   This URL does not designate a data object, but rather an interactive
   service. Remote interactive services vary widely in the means by
   which they allow remote logins; in practice, the <user> and
   <password> supplied are advisory only: clients accessing a telnet URL
   merely advise the user of the suggested username and password.

 

3.9.  WAIS

 

   The WAIS URL scheme is used to designate WAIS databases, searches, or
   individual documents available from a WAIS database. WAIS is
   described in [7]. The WAIS protocol is described in RFC 1625 [17].
   Although the WAIS protocol is based on Z39.50-1988, the WAIS URL
   scheme is not intended for use with arbitrary Z39.50 services.

   A WAIS URL takes one of the following forms:

     wais://<host>:<port>/<database>
     wais://<host>:<port>/<database>?<search>
     wais://<host>:<port>/<database>/<wtype>/<wpath>

   where <host> and <port> are as described in Section 3.1. If :<port>
   is omitted, the port defaults to 210.  The first form designates a
   WAIS database that is available for searching. The second form
   designates a particular search.  <database> is the name of the WAIS
   database being queried.

   The third form designates a particular document within a WAIS
   database to be retrieved. In this form <wtype> is the WAIS
   designation of the type of the object. Many WAIS implementations
   require that a client know the "type" of an object prior to
   retrieval, the type being returned along with the internal object
   identifier in the search response.  The <wtype> is included in the
   URL in order to allow the client interpreting the URL adequate
   information to actually retrieve the document.

   The <wpath> of a WAIS URL consists of the WAIS document-id, encoded
   as necessary using the method described in Section 2.2. The WAIS
   document-id should be treated opaquely; it may only be decomposed by
   the server that issued it.
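
   The three WAIS forms can be told apart by the delimiter that follows
   <database>. The Python sketch below is an illustration only; the
   function name split_wais_url and the dictionary keys are assumptions,
   and the <wpath> is deliberately returned still encoded because the
   document-id is opaque to the client.

```python
def split_wais_url(url):
    """Classify a WAIS URL and return its parts as a dict.

    Hypothetical helper for illustration. The "form" key is one of
    "database", "search" or "document".
    """
    prefix = "wais://"
    if url[:len(prefix)].lower() != prefix:
        raise ValueError("not a wais URL")
    hostport, slash, rest = url[len(prefix):].partition("/")
    if not slash or not rest:
        raise ValueError("missing <database>")

    host, colon, port_text = hostport.partition(":")
    parts = {"host": host, "port": int(port_text) if colon else 210}

    if "?" in rest:
        database, _, search = rest.partition("?")
        parts.update(form="search", database=database, search=search)
    elif "/" in rest:
        pieces = rest.split("/", 2)
        if len(pieces) != 3:
            raise ValueError("expected <database>/<wtype>/<wpath>")
        database, wtype, wpath = pieces
        parts.update(form="document", database=database,
                     wtype=wtype, wpath=wpath)
    else:
        parts.update(form="database", database=rest)
    return parts
```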

 

3.10 FILES

 

   The file URL scheme is used to designate files accessible on a
   particular host computer. This scheme, unlike most other URL schemes,
   does not designate a resource that is universally accessible over the
   Internet.

   A file URL takes the form:

       file://<host>/<path>

   where <host> is the fully qualified domain name of the system on
   which the <path> is accessible, and <path> is a hierarchical
   directory path of the form <directory>/<directory>/.../<name>.

   For example, a VMS file

     DISK$USER:[MY.NOTES]NOTE123456.TXT

   might become

     <URL:file://vms.host.edu/disk$user/my/notes/note123456.txt>

   As a special case, <host> can be the string "localhost" or the empty
   string; this is interpreted as `the machine from which the URL is
   being interpreted'.

   The file URL scheme is unusual in that it does not specify an
   Internet protocol or access method for such files; as such, its
   utility in network protocols between hosts is limited.
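
   A short Python sketch of the file URL form, including the special
   treatment of "localhost" and the empty host, might look as follows.
   The function name split_file_url is an assumption, and mapping the
   returned components onto a host-specific file name (as in the VMS
   example) is left to the caller.

```python
from urllib.parse import unquote


def split_file_url(url):
    """Split "file://<host>/<path>" into (host, components).

    Hypothetical helper for illustration. An empty host or
    "localhost" is reported as None, meaning the machine on which the
    URL is being interpreted. components is the decoded list of
    directory names followed by the file name.
    """
    prefix = "file://"
    if url[:len(prefix)].lower() != prefix:
        raise ValueError("not a file URL")
    host, slash, path = url[len(prefix):].partition("/")
    if not slash:
        raise ValueError("missing '/' before the path")
    if host == "" or host.lower() == "localhost":
        host = None
    return host, [unquote(part) for part in path.split("/")]
```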

 

3.11 PROSPERO

 

   The Prospero URL schemeis used to designate resources that are

   accessed via the ProsperoDirectory Service. The Prospero protocol is

   described elsewhere [14].

 

   A prospero URLs takes theform:

 

     prospero://<host>:<port>/<hsoname>;<field>=<value>

 

   where <host> and <port> are as described in Section 3.1. If :<port>
   is omitted, the port defaults to 1525. No username or password is
   allowed.

 

   The <hsoname> is the host-specific object name in the Prospero
   protocol, suitably encoded.  This name is opaque and interpreted by
   the Prospero server.  The semicolon ";" is reserved and may not
   appear without quoting in the <hsoname>.

 

   Prospero URLs are interpreted by contacting a Prospero directory
   server on the specified host and port to determine appropriate access
   methods for a resource, which might themselves be represented as
   different URLs. External Prospero links are represented as URLs of
   the underlying access method and are not represented as Prospero
   URLs.

 

   Note that a slash "/" may appear in the <hsoname> without quoting and
   no significance may be assumed by the application.  Though slashes
   may indicate hierarchical structure on the server, such structure is
   not guaranteed. Note that many <hsoname>s begin with a slash, in
   which case the host or port will be followed by a double slash: the
   slash from the URL syntax, followed by the initial slash from the
   <hsoname>. (E.g., <URL:prospero://host.dom//pros/name> designates a
   <hsoname> of "/pros/name".)

 

   In addition, after the <hsoname>, optional fields and values
   associated with a Prospero link may be specified as part of the URL.
   When present, each field/value pair is separated from each other and
   from the rest of the URL by a ";" (semicolon).  The name of the field
   and its value are separated by a "=" (equal sign). If present, these
   fields serve to identify the target of the URL.  For example, the
   OBJECT-VERSION field can be specified to identify a specific version
   of an object.
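
   The decomposition described in this section can be sketched in
   Python as follows. This is a minimal illustration, not a Prospero
   client: the function name parse_prospero_url, the returned tuple
   layout, and the choice of a dict for the fields (which keeps only
   the last value of a repeated field name) are assumptions of this
   sketch.

```python
from urllib.parse import unquote

DEFAULT_PROSPERO_PORT = 1525  # used when :<port> is omitted


def parse_prospero_url(url):
    """Return (host, port, hsoname, fields) for a Prospero URL."""
    prefix = "prospero://"
    if not url.lower().startswith(prefix):
        raise ValueError("not a Prospero URL: %r" % url)
    rest = url[len(prefix):]
    # The first "/" belongs to the URL syntax; a second one, if
    # present, is the initial slash of the <hsoname> itself.
    hostport, slash, tail = rest.partition("/")
    if not slash:
        raise ValueError("missing '/' after host: %r" % url)
    host, colon, port_text = hostport.partition(":")
    port = int(port_text) if colon else DEFAULT_PROSPERO_PORT
    # An unquoted ";" is reserved, so it is safe to split on it
    # before decoding any %XX escapes.
    hsoname, *fieldspecs = tail.split(";")
    fields = {}
    for spec in fieldspecs:
        name, equals, value = spec.partition("=")
        if not equals:
            raise ValueError("field without '=': %r" % spec)
        fields[unquote(name)] = unquote(value)
    return host, port, unquote(hsoname), fields
```

   For the example URL in the text, the sketch returns the <hsoname>
   "/pros/name" with the default port 1525 and no fields. The
   <hsoname> is returned as a single opaque string, since no
   significance may be attached to the slashes it contains.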

 

4. REGISTRATION OF NEW SCHEMES

 

   A new scheme may be introduced by defining a mapping onto a
   conforming URL syntax, using a new prefix. URLs for experimental
   schemes may be used by mutual agreement between parties. Scheme names
   starting with the characters "x-" are reserved for experimental
   purposes.

   The Internet Assigned Numbers Authority (IANA) will maintain a
   registry of URL schemes. Any submission of a new URL scheme must
   include a definition of an algorithm for accessing resources
   within that scheme and the syntax for representing such a scheme.

   URL schemes must have demonstrable utility and operability. One way
   to provide such a demonstration is via a gateway which provides
   objects in the new scheme for clients using an existing protocol. If
   the new scheme does not locate resources that are data objects, the
   properties of names in the new space must be clearly defined.

   New schemes should try to follow the same syntactic conventions as
   existing schemes, where appropriate.  It is likewise recommended
   that, where a protocol allows for retrieval by URL, the client
   software have provision for being configured to use specific gateway
   locators for indirect access through new naming schemes.

   The following schemes have been proposed at various times, but this
   document does not define their syntax or use at this time. It is
   suggested that IANA reserve their scheme names for future definition:

   afs              Andrew File System global file names.
   mid              Message identifiers for electronic mail.
   cid              Content identifiers for MIME body parts.
   nfs              Network File System (NFS) file names.
   tn3270           Interactive 3270 emulation sessions.
   mailserver       Access to data available from mail servers.
   z39.50           Access to ANSI Z39.50 services.

5. BNF for specific URL schemes

   This is a BNF-like description of the Uniform Resource Locator
   syntax, using the conventions of RFC822, except that "|" is used to
   designate alternatives, and brackets [] are used around optional or
   repeated elements. Briefly, literals are quoted with "", optional
   elements are enclosed in [brackets], and elements may be preceded
   with <n>* to designate n or more repetitions of the following
   element; n defaults to 0.

; The generic form of a URL is:

genericurl     = scheme ":" schemepart

; Specific predefined schemes are defined here; new schemes
; may be registered with IANA

url            = httpurl | ftpurl | newsurl |
                 nntpurl | telneturl | gopherurl |
                 waisurl | mailtourl | fileurl |
                 prosperourl | otherurl

; new schemes follow the general syntax
otherurl       = genericurl

; the scheme is in lower case; interpreters should use case-ignore
scheme         = 1*[ lowalpha | digit | "+" | "-" | "." ]
schemepart     = *xchar | ip-schemepart

; URL schemeparts for ip based protocols:

ip-schemepart  = "//" login [ "/" urlpath ]

login          = [ user [ ":" password ] "@" ] hostport
hostport       = host [ ":" port ]
host           = hostname | hostnumber
hostname       = *[ domainlabel "." ] toplabel
domainlabel    = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
toplabel       = alpha | alpha *[ alphadigit | "-" ] alphadigit
alphadigit     = alpha | digit
hostnumber     = digits "." digits "." digits "." digits
port           = digits
user           = *[ uchar | ";" | "?" | "&" | "=" ]
password       = *[ uchar | ";" | "?" | "&" | "=" ]
urlpath        = *xchar    ; depends on protocol see section 3.1

 

; The predefined schemes:

 

; FTP (see also RFC959)

ftpurl         = "ftp://" login [ "/" fpath [ ";type=" ftptype ]]
fpath          = fsegment *[ "/" fsegment ]
fsegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]
ftptype        = "A" | "I" | "D" | "a" | "i" | "d"

; FILE

fileurl        = "file://" [ host | "localhost" ] "/" fpath

; HTTP

httpurl        = "http://" hostport [ "/" hpath [ "?" search ]]
hpath          = hsegment *[ "/" hsegment ]
hsegment       = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
search         = *[ uchar | ";" | ":" | "@" | "&" | "=" ]

 

; GOPHER (see also RFC1436)

gopherurl      = "gopher://" hostport [ "/" [ gtype [ selector
                 [ "%09" search [ "%09" gopher+_string ] ] ] ] ]
gtype          = xchar
selector       = *xchar
gopher+_string = *xchar


; MAILTO (see also RFC822)

mailtourl      = "mailto:" encoded822addr
encoded822addr = 1*xchar               ; further defined in RFC822

; NEWS (see also RFC1036)

newsurl        = "news:" grouppart
grouppart      = "*" | group | article
group          = alpha *[ alpha | digit | "-" | "." | "+" | "_" ]
article        = 1*[ uchar | ";" | "/" | "?" | ":" | "&" | "=" ] "@" host

; NNTP (see also RFC977)

nntpurl        = "nntp://" hostport "/" group [ "/" digits ]

; TELNET

telneturl      = "telnet://" login [ "/" ]

; WAIS (see also RFC1625)

waisurl        = waisdatabase | waisindex | waisdoc
waisdatabase   = "wais://" hostport "/" database
waisindex      = "wais://" hostport "/" database "?" search
waisdoc        = "wais://" hostport "/" database "/" wtype "/" wpath
database       = *uchar
wtype          = *uchar
wpath          = *uchar

; PROSPERO

prosperourl    = "prospero://" hostport "/" ppath *[ fieldspec ]
ppath          = psegment *[ "/" psegment ]
psegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]
fieldspec      = ";" fieldname "=" fieldvalue
fieldname      = *[ uchar | "?" | ":" | "@" | "&" ]
fieldvalue     = *[ uchar | "?" | ":" | "@" | "&" ]

 

; Miscellaneous definitions

lowalpha       = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                 "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                 "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                 "y" | "z"
hialpha        = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

alpha          = lowalpha | hialpha
digit          = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"
safe           = "$" | "-" | "_" | "." | "+"
extra          = "!" | "*" | "'" | "(" | ")" | ","
national       = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
punctuation    = "<" | ">" | "#" | "%" | <">

reserved       = ";" | "/" | "?" | ":" | "@" | "&" | "="
hex            = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                 "a" | "b" | "c" | "d" | "e" | "f"
escape         = "%" hex hex

unreserved     = alpha | digit | safe | extra
uchar          = unreserved | escape
xchar          = unreserved | reserved | escape
digits         = 1*digit
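
   To show how the productions compose, the following Python sketch
   transcribes the httpurl rule and the rules it depends on into a
   regular expression. It is one possible reading of the BNF, offered
   as an illustration only: the names is_http_url, HTTPURL and the
   intermediate constants are assumptions of this sketch, and the
   sketch matches the scheme in lower case only.

```python
import re

# unreserved = alpha | digit | safe | extra (as a character-class body)
UNRESERVED = r"A-Za-z0-9$\-_.+!*'(),"
# escape = "%" hex hex
ESCAPE = r"%[0-9A-Fa-f]{2}"

# host = hostname | hostnumber
DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
HOSTNAME = r"(?:" + DOMAINLABEL + r"\.)*" + TOPLABEL
HOSTNUMBER = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
# hostport = host [ ":" port ]
HOSTPORT = r"(?:" + HOSTNAME + r"|" + HOSTNUMBER + r")(?::[0-9]+)?"

# hsegment and search use the same character set in the BNF.
HSEGMENT = r"(?:[" + UNRESERVED + r";:@&=]|" + ESCAPE + r")*"
# hpath = hsegment *[ "/" hsegment ]
HPATH = HSEGMENT + r"(?:/" + HSEGMENT + r")*"

# httpurl = "http://" hostport [ "/" hpath [ "?" search ]]
HTTPURL = re.compile(
    r"http://" + HOSTPORT + r"(?:/" + HPATH + r"(?:\?" + HSEGMENT + r")?)?"
)


def is_http_url(text):
    """Return True if the whole of text matches the httpurl rule."""
    return HTTPURL.fullmatch(text) is not None
```

   Under this reading, characters in the "national" and "punctuation"
   classes, such as "~" and "#", are rejected unless they are encoded,
   because hsegment and search admit only uchar and a few reserved
   characters.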

 

6. Security Considerations

 

   The URL scheme does not in itself pose a security threat. Users
   should beware that there is no general guarantee that a URL which at
   one time points to a given object continues to do so, and does not
   even at some later time point to a different object due to the
   movement of objects on servers.

 

   A URL-related security threat is that it is sometimes possible to
   construct a URL such that an attempt to perform a harmless idempotent
   operation such as the retrieval of the object will in fact cause a
   possibly damaging remote operation to occur.  The unsafe URL is
   typically constructed by specifying a port number other than that
   reserved for the network protocol in question.  The client
   unwittingly contacts a server which is in fact running a different
   protocol.  The content of the URL contains instructions which when
   interpreted according to this other protocol cause an unexpected
   operation. An example has been the use of gopher URLs to cause a rude
   message to be sent via a SMTP server.  Caution should be used when
   using any URL which specifies a port number other than the default
   for the protocol, especially when it is a number within the reserved
   space.

   Care should be taken when URLs contain embedded encoded delimiters
   for a given protocol (for example, CR and LF characters for telnet
   protocols) that these are not unencoded before transmission.  This
   would violate the protocol but could be used to simulate an extra
   operation or parameter, again causing an unexpected and possibly
   harmful remote operation to be performed.


   The use of URLs containing passwords that should be secret is clearly
   unwise.
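
   The cautions in this section about non-default ports and encoded
   CR and LF characters could be checked mechanically along the
   following lines. This Python sketch is only an illustration: the
   name url_cautions and its labels are assumptions, the table lists
   the default ports given in the scheme descriptions of this
   document, and "the reserved space" is assumed here to mean port
   numbers below 1024.

```python
import re
from urllib.parse import urlsplit

# Default ports given in the scheme descriptions of this document.
DEFAULT_PORTS = {
    "ftp": 21, "http": 80, "gopher": 70, "nntp": 119,
    "telnet": 23, "wais": 210, "prospero": 1525,
}

# %0D (CR) and %0A (LF), in either hex case.
ENCODED_CR_LF = re.compile(r"%0[AaDd]")


def url_cautions(url):
    """Return a list of caution labels for url (empty if none apply)."""
    cautions = []
    parts = urlsplit(url)
    default = DEFAULT_PORTS.get(parts.scheme)
    port = parts.port  # None when the URL gives no explicit port
    if default is not None and port is not None and port != default:
        cautions.append("non-default-port")
        if port < 1024:  # assumed meaning of "the reserved space"
            cautions.append("reserved-port")
    if ENCODED_CR_LF.search(url):
        cautions.append("encoded-cr-lf")
    return cautions
```

   A URL that raises cautions is not necessarily harmful; the sketch
   only flags the patterns named above so that a client or a user can
   decide whether to proceed.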

 

7. Acknowledgements

 

   This paper builds on the basic WWW design (RFC 1630) and much
   discussion of these issues by many people on the network. The
   discussion was particularly stimulated by articles by Clifford Lynch,
   Brewster Kahle [10] and Wengyik Yeong [18]. Contributions from John
   Curran, Clifford Neuman, Ed Vielmetti and later the IETF URL BOF and
   URI working group were incorporated.

 

   Most recently, careful readings and comments by Dan Connolly, Ned
   Freed, Roy Fielding, Guido van Rossum, Michael Dolan, Bert Bos, John
   Kunze, Olle Jarnefors, Peter Svanberg and many others have helped
   refine this RFC.


APPENDIX: Recommendations for URLs in Context

 

   URIs, including URLs, are intended to be transmitted through
   protocols which provide a context for their interpretation.

 

   In some cases, it will be necessary to distinguish URLs from other
   possible data structures in a syntactic structure. In this case, it
   is recommended that URLs be preceded with a prefix consisting of the
   characters "URL:". For example, this prefix may be used to
   distinguish URLs from other kinds of URIs.

 

   In addition, there are many occasions when URLs are included in other
   kinds of text; examples include electronic mail, USENET news
   messages, or printed on paper. In such cases, it is convenient to
   have a separate syntactic wrapper that delimits the URL and separates
   it from the rest of the text, and in particular from punctuation
   marks that might be mistaken for part of the URL. For this purpose,
   it is recommended that angle brackets ("<" and ">"), along with the
   prefix "URL:", be used to delimit the boundaries of the URL. This
   wrapper does not form part of the URL and should not be used in
   contexts in which delimiters are already specified.

 

   In the case where a fragment/anchor identifier is associated with a
   URL (following a "#"), the identifier would be placed within the
   brackets as well.

 

   In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may
   need to be added to break long URLs across lines.  The whitespace
   should be ignored when extracting the URL.

   No whitespace should be introduced after a hyphen ("-") character.
   Because some typesetters and printers may (erroneously) introduce a
   hyphen at the end of line when breaking a line, the interpreter of a
   URL containing a line break immediately after a hyphen should ignore
   all unencoded whitespace around the line break, and should be aware
   that the hyphen may or may not actually be part of the URL.

 

   Examples:

 

      Yes, Jim, I found it under <URL:ftp://info.cern.ch/pub/www/doc;
      type=d> but you can probably pick it up from <URL:ftp://ds.in
      ternic.net/rfc>.  Note the warning in <URL:http://ds.internic.
      net/instructions/overview.html#WARNING>.
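
   A reader of such text might recover the URLs with a sketch like the
   following Python fragment, which finds each "<URL:" ... ">" wrapper
   and discards the whitespace inside it. The names WRAPPED_URL and
   extract_urls are assumptions of this sketch. It does not attempt to
   resolve the hyphen ambiguity described above; a hyphen before a
   line break is simply kept.

```python
import re

# Everything between "<URL:" and the next ">" is taken as the URL;
# the negated class also matches line breaks inside the wrapper.
WRAPPED_URL = re.compile(r"<URL:([^<>]*)>")


def extract_urls(text):
    """Return the URLs found in <URL:...> wrappers, whitespace removed."""
    return [re.sub(r"\s+", "", body) for body in WRAPPED_URL.findall(text)]
```

   Applied to the example text above, the sketch yields three URLs,
   the last of which keeps its "#WARNING" fragment identifier because
   the identifier lies within the brackets.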


References

 

   [1] Anklesaria, F., McCahill, M., Lindner, P., Johnson, D.,
       Torrey, D., and B. Alberti, "The Internet Gopher Protocol
       (a distributed document search and retrieval protocol)",
       RFC 1436, University of Minnesota, March 1993.
       <URL:ftp://ds.internic.net/rfc/rfc1436.txt;type=a>

   [2] Anklesaria, F., Lindner, P., McCahill, M., Torrey, D.,
       Johnson, D., and B. Alberti, "Gopher+: Upward compatible
       enhancements to the Internet Gopher protocol",
       University of Minnesota, July 1993.
       <URL:ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol
       /Gopher+/Gopher+.txt>

   [3] Berners-Lee, T., "Universal Resource Identifiers in WWW: A
       Unifying Syntax for the Expression of Names and Addresses of
       Objects on the Network as used in the World-Wide Web", RFC
       1630, CERN, June 1994.
       <URL:ftp://ds.internic.net/rfc/rfc1630.txt>

   [4] Berners-Lee, T., "Hypertext Transfer Protocol (HTTP)",
       CERN, November 1993.
       <URL:ftp://info.cern.ch/pub/www/doc/http-spec.txt.Z>

   [5] Braden, R., Editor, "Requirements for Internet Hosts --
       Application and Support", STD 3, RFC 1123, IETF, October 1989.
       <URL:ftp://ds.internic.net/rfc/rfc1123.txt>

   [6] Crocker, D., "Standard for the Format of ARPA Internet Text
       Messages", STD 11, RFC 822, UDEL, April 1982.
       <URL:ftp://ds.internic.net/rfc/rfc822.txt>

   [7] Davis, F., Kahle, B., Morris, H., Salem, J., Shen, T., Wang, R.,
       Sui, J., and M. Grinbaum, "WAIS Interface Protocol Prototype
       Functional Specification", (v1.5), Thinking Machines
       Corporation, April 1990.
       <URL:ftp://quake.think.com/pub/wais/doc/protspec.txt>

   [8] Horton, M. and R. Adams, "Standard For Interchange of USENET
       Messages", RFC 1036, AT&T Bell Laboratories, Center for Seismic
       Studies, December 1987.
       <URL:ftp://ds.internic.net/rfc/rfc1036.txt>

   [9] Huitema, C., "Naming: Strategies and Techniques", Computer
       Networks and ISDN Systems 23 (1991) 107-110.


  [10] Kahle, B., "Document Identifiers, or International Standard
       Book Numbers for the Electronic Age", 1991.
       <URL:ftp://quake.think.com/pub/wais/doc/doc-ids.txt>

  [11] Kantor, B. and P. Lapsley, "Network News Transfer Protocol:
       A Proposed Standard for the Stream-Based Transmission of News",
       RFC 977, UC San Diego & UC Berkeley, February 1986.
       <URL:ftp://ds.internic.net/rfc/rfc977.txt>

  [12] Kunze, J., "Functional Requirements for Internet Resource
       Locators", Work in Progress, December 1994.
       <URL:ftp://ds.internic.net/internet-drafts
       /draft-ietf-uri-irl-fun-req-02.txt>

  [13] Mockapetris, P., "Domain Names - Concepts and Facilities",
       STD 13, RFC 1034, USC/Information Sciences Institute,
       November 1987.
       <URL:ftp://ds.internic.net/rfc/rfc1034.txt>

  [14] Neuman, B., and S. Augart, "The Prospero Protocol",
       USC/Information Sciences Institute, June 1993.
       <URL:ftp://prospero.isi.edu/pub/prospero/doc
       /prospero-protocol.PS.Z>

  [15] Postel, J. and J. Reynolds, "File Transfer Protocol (FTP)",
       STD 9, RFC 959, USC/Information Sciences Institute,
       October 1985.
       <URL:ftp://ds.internic.net/rfc/rfc959.txt>

  [16] Sollins, K. and L. Masinter, "Functional Requirements for
       Uniform Resource Names", RFC 1737, MIT/LCS, Xerox Corporation,
       December 1994.
       <URL:ftp://ds.internic.net/rfc/rfc1737.txt>

  [17] St. Pierre, M., Fullton, J., Gamiel, K., Goldman, J., Kahle, B.,
       Kunze, J., Morris, H., and F. Schiettecatte, "WAIS over
       Z39.50-1988", RFC 1625, WAIS, Inc., CNIDR, Thinking Machines
       Corp., UC Berkeley, FS Consulting, June 1994.
       <URL:ftp://ds.internic.net/rfc/rfc1625.txt>

  [18] Yeong, W., "Towards Networked Information Retrieval", Technical
       report 91-06-25-01, Performance Systems International, Inc.
       <URL:ftp://uu.psi.com/wp/nir.txt>, June 1991.

  [19] Yeong, W., "Representing Public Archives in the Directory",
       Work in Progress, November 1991.


  [20] "Coded Character Set -- 7-bit American Standard Code for
       Information Interchange", ANSI X3.4-1986.

 

Editors' Addresses

 

Tim Berners-Lee

World-Wide Web project

CERN,

1211 Geneva 23,

Switzerland

 

Phone: +41 (22)767 3755

Fax: +41 (22)767 7155

EMail: timbl@info.cern.ch

 

Larry Masinter

Xerox PARC

3333 Coyote Hill Road

Palo Alto, CA 94034

Phone: (415) 812-4365

Fax: (415) 812-4333

EMail: masinter@parc.xerox.com

Mark McCahill

Computer and Information Services,

University of Minnesota

Room 152 Shepherd Labs

100 Union Street SE

Minneapolis, MN 55455

 

Phone: (612) 625 1300

EMail: mpm@boombox.micro.umn.edu
