RFC1738

Original source: http://www.ietf.org/rfc/rfc1738.txt

Network Working Group                                     T. Berners-Lee
Request for Comments: 1738                                          CERN
Category: Standards Track                                    L. Masinter
                                                       Xerox Corporation
                                                             M. McCahill
                                                 University of Minnesota
                                                                 Editors
                                                           December 1994

Uniform Resource Locators (URL)

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Abstract

This document specifies a Uniform Resource Locator (URL), the syntax
and semantics of formalized information for location and access of
resources via the Internet.

1. Introduction

This document describes the syntax and semantics for a compact string
representation for a resource available via the Internet. These
strings are called "Uniform Resource Locators" (URLs).

The specification is derived from concepts introduced by the World-
Wide Web global information initiative, whose use of such objects
dates from 1990 and is described in "Universal Resource Identifiers
in WWW", RFC 1630. The specification of URLs is designed to meet the
requirements laid out in "Functional Requirements for Internet
Resource Locators" [12].

This document was written by the URI working group of the Internet
Engineering Task Force. Comments may be addressed to the editors, or
to the URI-WG <uri@bunyip.com>. Discussions of the group are archived
at <URL:http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>

Berners-Lee, Masinter & McCahill [Page 1]

RFC 1738 Uniform Resource Locators (URL) December 1994

2. General URL Syntax

Just as there are many different methods of access to resources,
there are several schemes for describing the location of such
resources.

The generic syntax for URLs provides a framework for new schemes to
be established using protocols other than those defined in this
document.

URLs are used to `locate' resources, by providing an abstract
identification of the resource location. Having located a resource,
a system may perform a variety of operations on the resource, as
might be characterized by such words as `access', `update',
`replace', `find attributes'. In general, only the `access' method
needs to be specified for any URL scheme.

2.1. The main parts of URLs

A full BNF description of the URL syntax is given in Section 5.

In general, URLs are written as follows:

    <scheme>:<scheme-specific-part>

A URL contains the name of the scheme being used (<scheme>) followed
by a colon and then a string (the <scheme-specific-part>) whose
interpretation depends on the scheme.

Scheme names consist of a sequence of characters. The lower case
letters "a"--"z", digits, and the characters plus ("+"), period
("."), and hyphen ("-") are allowed. For resiliency, programs
interpreting URLs should treat upper case letters as equivalent to
lower case in scheme names (e.g., allow "HTTP" as well as "http").
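
The split described above can be sketched in a few lines of Python.
This is an illustration only, not part of the specification; the name
split_scheme and the regular expression are assumptions made for the
example. It accepts upper case letters in the scheme and folds them
to lower case, as the text recommends.

```python
import re

# Scheme characters per Section 2.1: letters, digits, "+", "." and "-".
# Upper case is accepted on input and folded to lower case afterwards.
_SCHEME_RE = re.compile(r"^([A-Za-z0-9+.-]+):(.*)$", re.DOTALL)


def split_scheme(url):
    """Return (scheme, scheme_specific_part); the scheme is lower-cased."""
    match = _SCHEME_RE.match(url)
    if match is None:
        raise ValueError("no valid scheme in %r" % (url,))
    return match.group(1).lower(), match.group(2)


if __name__ == "__main__":
    print(split_scheme("HTTP://www.example.com/"))
```

Everything after the first colon is returned untouched, since its
interpretation depends on the scheme.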

2.2. URL Character Encoding Issues

URLs are sequences of characters, i.e., letters, digits, and special
characters. A URL may be represented in a variety of ways: e.g., ink
on paper, or a sequence of octets in a coded character set. The
interpretation of a URL depends only on the identity of the
characters used.

In most URL schemes, the sequences of characters in different parts
of a URL are used to represent sequences of octets used in Internet
protocols. For example, in the ftp scheme, the host name, directory
name and file names are such sequences of octets, represented by
parts of the URL. Within those parts, an octet may be represented by
the character which has that octet as its code within the US-ASCII
[20] coded character set.

In addition, octets may be encoded by a character triplet consisting
of the character "%" followed by the two hexadecimal digits (from
"0123456789ABCDEF") which form the hexadecimal value of the octet.
(The characters "abcdef" may also be used in hexadecimal encodings.)

Octets must be encoded if they have no corresponding graphic
character within the US-ASCII coded character set, if the use of the
corresponding character is unsafe, or if the corresponding character
is reserved for some other interpretation within the particular URL
scheme.

No corresponding graphic US-ASCII:

URLs are written only with the graphic printable characters of the
US-ASCII coded character set. The octets 80-FF hexadecimal are not
used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
control characters; these must be encoded.

Unsafe:

Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".

All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.

Reserved:

Many URL schemes reserve certain characters for a special meaning:
their appearance in the scheme-specific part of the URL has a
designated semantics. If the character corresponding to an octet is
reserved in a scheme, the octet must be encoded. The characters ";",
"/", "?", ":", "@", "=" and "&" are the characters which may be
reserved for special meaning within a scheme. No other characters may
be reserved within a scheme.

Usually a URL has the same interpretation when an octet is
represented by a character and when it is encoded. However, this is
not true for reserved characters: encoding a character reserved for a
particular scheme may change the semantics of a URL.

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

On the other hand, characters that are not required to be encoded
(including alphanumerics) may be encoded within the scheme-specific
part of a URL, as long as they are not being used for a reserved
purpose.
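
As an illustration of the encoding rules in this section, the
following Python sketch encodes a sequence of octets and decodes it
again. The function names encode_octets and decode_octets and the
"keep" parameter are assumptions made for this example; they are not
defined by this document. The caller uses "keep" to name reserved
characters that are being used for their reserved purpose and must
therefore stay unencoded.

```python
import re

# Characters that may always appear unencoded: alphanumerics plus the
# special characters $-_.+!*'(),
SAFE = frozenset(
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789"
    b"$-_.+!*'(),"
)


def encode_octets(data, keep=b""):
    """Encode bytes as URL text, leaving SAFE and `keep` octets as-is."""
    allowed = SAFE | frozenset(keep)
    return "".join(
        chr(octet) if octet in allowed else "%%%02X" % octet
        for octet in data
    )


def decode_octets(text):
    """Turn %XX triplets back into octets (hex digits in either case)."""
    return re.sub(
        rb"%([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        text.encode("ascii"),
    )


if __name__ == "__main__":
    print(encode_octets(b"a b#c"))
    print(encode_octets(b"/etc/motd", keep=b"/"))
```

Control characters, octets 80-FF, unsafe characters and reserved
characters not listed in "keep" all fall outside the allowed set, so
they are written as "%" triplets.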

2.3 Hierarchical schemes and relative links

In some cases, URLs are used to locate resources that contain
pointers to other resources. In some cases, those pointers are
represented as relative links where the expression of the location of
the second resource is in terms of "in the same place as this one
except with the following relative path". Relative links are not
described in this document. However, the use of relative links
depends on the original URL containing a hierarchical structure
against which the relative link is based.

Some URL schemes (such as the ftp, http, and file schemes) contain
names that can be considered hierarchical; the components of the
hierarchy are separated by "/".

3. Specific Schemes

The mapping for some existing standard and experimental protocols is
outlined in the BNF syntax definition. Notes on particular protocols
follow. The schemes covered are:

ftp                     File Transfer protocol
http                    Hypertext Transfer Protocol
gopher                  The Gopher protocol
mailto                  Electronic mail address
news                    USENET news
nntp                    USENET news using NNTP access
telnet                  Reference to interactive sessions
wais                    Wide Area Information Servers
file                    Host-specific file names
prospero                Prospero Directory Service

Other schemes may be specified by future specifications. Section 4 of
this document describes how new schemes may be registered, and lists
some scheme names that are under development.

3.1. Common Internet Scheme Syntax

While the syntax for the rest of the URL may vary depending on the
particular scheme selected, URL schemes that involve the direct use
of an IP-based protocol to a specified host on the Internet use a
common syntax for the scheme-specific data:

    //<user>:<password>@<host>:<port>/<url-path>

Some or all of the parts "<user>:<password>@", ":<password>",
":<port>", and "/<url-path>" may be excluded. The scheme specific
data start with a double slash "//" to indicate that it complies with
the common Internet scheme syntax. The different components obey the
following rules:

user
    An optional user name. Some schemes (e.g., ftp) allow the
    specification of a user name.

password
    An optional password. If present, it follows the user
    name separated from it by a colon.

The user name (and password), if present, are followed by a
commercial at-sign "@". Within the user and password field, any ":",
"@", or "/" must be encoded.

Note that an empty user name or password is different than no user
name or password; there is no way to specify a password without
specifying a user name. E.g., <URL:ftp://@host.com/> has an empty
user name and no password, <URL:ftp://host.com/> has no user name,
while <URL:ftp://foo:@host.com/> has a user name of "foo" and an
empty password.

host
    The fully qualified domain name of a network host, or its IP
    address as a set of four decimal digit groups separated by
    ".". Fully qualified domain names take the form as described
    in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123
    [5]: a sequence of domain labels separated by ".", each domain
    label starting and ending with an alphanumerical character and
    possibly also containing "-" characters. The rightmost domain
    label will never start with a digit, though, which
    syntactically distinguishes all domain names from the IP
    addresses.

port
    The port number to connect to. Most schemes designate
    protocols that have a default port number. Another port number
    may optionally be supplied, in decimal, separated from the
    host by a colon. If the port is omitted, the colon is as well.

url-path
    The rest of the locator consists of data specific to the
    scheme, and is known as the "url-path". It supplies the
    details of how the specified resource can be accessed. Note
    that the "/" between the host (or port) and the url-path is
    NOT part of the url-path.

The url-path syntax depends on the scheme being used, as does the
manner in which it is interpreted.
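
The common syntax can be taken apart with a single regular
expression. The sketch below is illustrative only: the name
parse_common, the pattern and the default_port parameter are
assumptions for this example. It keeps the difference between an
absent field (None) and an empty one (""), and it leaves every field
in its encoded form.

```python
import re

# Named groups mirror //<user>:<password>@<host>:<port>/<url-path>.
_COMMON_RE = re.compile(
    r"^//"
    r"(?:(?P<user>[^:@/]*)(?::(?P<password>[^:@/]*))?@)?"
    r"(?P<host>[^:/@]+)"
    r"(?::(?P<port>[0-9]+))?"
    r"(?:/(?P<url_path>.*))?$",
    re.DOTALL,
)


def parse_common(scheme_part, default_port=None):
    """Split a scheme-specific part that starts with "//" into fields.

    Fields that are absent come back as None; fields that are present
    but empty come back as "". Nothing is decoded here.
    """
    match = _COMMON_RE.match(scheme_part)
    if match is None:
        raise ValueError("not common Internet scheme syntax: %r" % (scheme_part,))
    fields = match.groupdict()
    fields["port"] = int(fields["port"]) if fields["port"] else default_port
    return fields


if __name__ == "__main__":
    for part in ("//@host.com/", "//host.com/", "//foo:@host.com/"):
        print(part, parse_common(part, default_port=21))
```

The "/" that separates the host (or port) from the url-path is
consumed by the pattern and is not returned as part of url_path.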

3.2. FTP

The FTP URL scheme is used to designate files and directories on
Internet hosts accessible using the FTP protocol (RFC 959).

An FTP URL follows the syntax described in Section 3.1. If :<port> is
omitted, the port defaults to 21.

3.2.1. FTP Name and Password

A user name and password may be supplied; they are used in the ftp
"USER" and "PASS" commands after first making the connection to the
FTP server. If no user name or password is supplied and one is
requested by the FTP server, the conventions for "anonymous" FTP are
to be used, as follows:

    The user name "anonymous" is supplied.

    The password is supplied as the Internet e-mail address
    of the end user accessing the resource.

If the URL supplies a user name but no password, and the remote
server requests a password, the program interpreting the FTP URL
should request one from the user.

3.2.2. FTP url-path

The url-path of an FTP URL has the following syntax:

    <cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode>

Where <cwd1> through <cwdN> and <name> are (possibly encoded) strings
and <typecode> is one of the characters "a", "i", or "d". The part
";type=<typecode>" may be omitted. The <cwdx> and <name> parts may be
empty. The whole url-path may be omitted, including the "/"
delimiting it from the prefix containing user, password, host, and
port.

The url-path is interpreted as a series of FTP commands as follows:

    Each of the <cwd> elements is to be supplied, sequentially, as the
    argument to a CWD (change working directory) command.

    If the typecode is "d", perform an NLST (name list) command with
    <name> as the argument, and interpret the results as a file
    directory listing.

    Otherwise, perform a TYPE command with <typecode> as the argument,
    and then access the file whose name is <name> (for example, using
    the RETR command.)

Within a name or CWD component, the characters "/" and ";" are
reserved and must be encoded. The components are decoded prior to
their use in the FTP protocol. In particular, if the appropriate FTP
sequence to access a particular file requires supplying a string
containing a "/" as an argument to a CWD or RETR command, it is
necessary to encode each "/".

For example, the URL <URL:ftp://myname@host.dom/%2Fetc/motd> is
interpreted by FTP-ing to "host.dom", logging in as "myname"
(prompting for a password if it is asked for), and then executing
"CWD /etc" and then "RETR motd". This has a different meaning from
<URL:ftp://myname@host.dom/etc/motd> which would "CWD etc" and then
"RETR motd"; the initial "CWD" might be executed relative to the
default directory for "myname". On the other hand,
<URL:ftp://myname@host.dom//etc/motd>, would "CWD " with a null
argument, then "CWD etc", and then "RETR motd".
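
The interpretation above can be sketched as a function that turns a
url-path into the list of FTP commands a client would issue. This is
an illustration under stated assumptions: the names ftp_commands and
_decode are made up for the example, decoded octets are mapped to
characters one-to-one, and no TYPE command is emitted when the
typecode is omitted (the client has to guess the mode in that case).
The path is split on "/" and ";type=" before any decoding takes
place, so an encoded "/" stays inside its component.

```python
import re


def _decode(component):
    # Decode %XX triplets only after the path has been split.
    return re.sub(
        r"%([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        component,
    )


def ftp_commands(url_path):
    """Map an FTP url-path (text after the delimiting "/") to commands."""
    path, _, typecode = url_path.partition(";type=")
    *cwds, name = path.split("/")
    commands = ["CWD " + _decode(cwd) for cwd in cwds]
    name = _decode(name)
    if typecode.lower() == "d":
        commands.append("NLST " + name)
    else:
        if typecode:
            commands.append("TYPE " + typecode)
        commands.append("RETR " + name)
    return commands


if __name__ == "__main__":
    for path in ("%2Fetc/motd", "etc/motd", "/etc/motd"):
        print(repr(path), ftp_commands(path))
```

The three paths in the usage lines correspond to the three example
URLs in the preceding paragraph.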

FTP URLs may also be used for other operations; for example, it is
possible to update a file on a remote file server, or infer
information about it from the directory listings. The mechanism for
doing so is not spelled out here.

3.2.3. FTP Typecode is Optional

The entire ;type=<typecode> part of an FTP URL is optional. If it is
omitted, the client program interpreting the URL must guess the
appropriate mode to use. In general, the data content type of a file
can only be guessed from the name, e.g., from the suffix of the name;
the appropriate type code to be used for transfer of the file can
then be deduced from the data content of the file.

3.2.4 Hierarchy

For some file systems, the "/" used to denote the hierarchical
structure of the URL corresponds to the delimiter used to construct a
file name hierarchy, and thus, the filename will look similar to the
URL path. This does NOT mean that the URL is a Unix filename.

3.2.5. Optimization

Clients accessing resources via FTP may employ additional heuristics
to optimize the interaction. For some FTP servers, for example, it
may be reasonable to keep the control connection open while accessing
multiple URLs from the same server. However, there is no common
hierarchical model to the FTP protocol, so if a directory change
command has been given, it is impossible in general to deduce what
sequence should be given to navigate to another directory for a
second retrieval, if the paths are different. The only reliable
algorithm is to disconnect and reestablish the control connection.

3.3. HTTP

The HTTP URL scheme is used to designate Internet resources
accessible using HTTP (HyperText Transfer Protocol).

The HTTP protocol is specified elsewhere. This specification only
describes the syntax of HTTP URLs.

An HTTP URL takes the form:

    http://<host>:<port>/<path>?<searchpart>

where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 80. No user name or password is
allowed. <path> is an HTTP selector, and <searchpart> is a query
string. The <path> is optional, as is the <searchpart> and its
preceding "?". If neither <path> nor <searchpart> is present, the "/"
may also be omitted.

Within the <path> and <searchpart> components, "/", ";", "?" are
reserved. The "/" character may be used within HTTP to designate a
hierarchical structure.
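
A minimal Python sketch of this form follows. The name parse_http and
the shape of the returned dictionary are assumptions made for the
example. The sketch applies the default port of 80, treats the path
and search part as optional, and leaves both in encoded form; it does
not validate the host.

```python
def parse_http(url):
    """Split an HTTP URL into host, port, path and searchpart."""
    prefix = "http://"
    if url[:len(prefix)].lower() != prefix:
        raise ValueError("not an http URL: %r" % (url,))
    rest = url[len(prefix):]
    # The first "/" ends <host>:<port>; it is not part of <path>.
    hostport, _, tail = rest.partition("/")
    host, colon, port = hostport.partition(":")
    path, question, search = tail.partition("?")
    return {
        "host": host,
        "port": int(port) if colon else 80,
        "path": path,
        "searchpart": search if question else None,
    }


if __name__ == "__main__":
    print(parse_http("http://www.example.com"))
    print(parse_http("http://www.example.com:8080/a/b?x=1"))
```

A missing search part is reported as None so that it can be told
apart from a "?" followed by an empty query string.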

3.4. GOPHER

The Gopher URL scheme is used to designate Internet resources
accessible using the Gopher protocol.

The base Gopher protocol is described in RFC 1436 and supports items
and collections of items (directories). The Gopher+ protocol is a set
of upward compatible extensions to the base Gopher protocol and is
described in [2]. Gopher+ supports associating arbitrary sets of
attributes and alternate data representations with Gopher items.
Gopher URLs accommodate both Gopher and Gopher+ items and item
attributes.

3.4.1. Gopher URL syntax

A Gopher URL takes the form:

    gopher://<host>:<port>/<gopher-path>

where <gopher-path> is one of

    <gophertype><selector>
    <gophertype><selector>%09<search>
    <gophertype><selector>%09<search>%09<gopher+_string>

If :<port> is omitted, the port defaults to 70. <gophertype> is a
single-character field to denote the Gopher type of the resource to
which the URL refers. The entire <gopher-path> may also be empty, in
which case the delimiting "/" is also optional and the <gophertype>
defaults to "1".

<selector> is the Gopher selector string. In the Gopher protocol,
Gopher selector strings are a sequence of octets which may contain
any octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal
(US-ASCII character LF), and 0D (US-ASCII character CR).

Gopher clients specify which item to retrieve by sending the Gopher
selector string to a Gopher server.

Within the <gopher-path>, no characters are reserved.

Note that some Gopher <selector> strings begin with a copy of the
<gophertype> character, in which case that character will occur twice
consecutively. The Gopher selector string may be an empty string;
this is how Gopher clients refer to the top-level directory on a
Gopher server.

3.4.2 Specifying URLs for Gopher Search Engines

If the URL refers to a search to be submitted to a Gopher search
engine, the selector is followed by an encoded tab (%09) and the
search string. To submit a search to a Gopher search engine, the
Gopher client sends the <selector> string (after decoding), a tab,
and the search string to the Gopher server.

3.4.3 URL syntax for Gopher+ items

URLs for Gopher+ items have a second encoded tab (%09) and a Gopher+
string. Note that in this case, the %09<search> string must be
supplied, although the <search> element may be the empty string.

The <gopher+_string> is used to represent information required for
retrieval of the Gopher+ item. Gopher+ items may have alternate
views, arbitrary sets of attributes, and may have electronic forms
associated with them.

To retrieve the data associated with a Gopher+ URL, a client will
connect to the server and send the Gopher selector, followed by a tab
and the search string (which may be empty), followed by a tab and the
Gopher+ commands.
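
As a sketch of the steps above, the following Python function splits
a <gopher-path> into its Gopher type and the octets a client would
send. The names gopher_request and _decode are assumptions for this
example. The fields are separated at the encoded tabs first and
decoded afterwards; the line terminator that ends the request is left
to the caller.

```python
import re


def _decode(text):
    # Replace each %XX triplet with the octet it stands for.
    return re.sub(
        rb"%([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        text.encode("ascii"),
    )


def gopher_request(gopher_path):
    """Return (gophertype, request_octets) for a <gopher-path>.

    An empty path selects the top-level directory with type "1".
    """
    if not gopher_path:
        return "1", b""
    gophertype, rest = gopher_path[0], gopher_path[1:]
    # At most three fields: selector, search string, Gopher+ string.
    fields = re.split(r"%09", rest, maxsplit=2)
    return gophertype, b"\t".join(_decode(field) for field in fields)


if __name__ == "__main__":
    print(gopher_request("7/search%09url%20syntax"))
    print(gopher_request("0/doc%09%09+"))
```

The second usage line shows a Gopher+ URL with an empty search string
between the two tabs.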

3.4.4 Default Gopher+ data representation

When a Gopher server returns a directory listing to a client, the
Gopher+ items are tagged with either a "+" (denoting Gopher+ items)
or a "?" (denoting Gopher+ items which have a +ASK form associated
with them). A Gopher URL with a Gopher+ string consisting of only a
"+" refers to the default view (data representation) of the item
while a Gopher+ string containing only a "?" refers to an item with a
Gopher electronic form associated with it.

3.4.5 Gopher+ items with electronic forms

Gopher+ items which have a +ASK associated with them (i.e. Gopher+
items tagged with a "?") require the client to fetch the item's +ASK
attribute to get the form definition, and then ask the user to fill
out the form and return the user's responses along with the selector
string to retrieve the item. Gopher+ clients know how to do this but
depend on the "?" tag in the Gopher+ item description to know when to
handle this case. The "?" is used in the Gopher+ string to be
consistent with Gopher+ protocol's use of this symbol.

3.4.6 Gopher+ item attribute collections

To refer to the Gopher+ attributes of an item, the Gopher URL's
Gopher+ string consists of "!" or "$". "!" refers to all of a
Gopher+ item's attributes. "$" refers to all the item attributes for
all items in a Gopher directory.

3.4.7 Referring to specific Gopher+ attributes

To refer to specific attributes, the URL's gopher+_string is
"!<attribute_name>" or "$<attribute_name>". For example, to refer to
the attribute containing the abstract of an item, the gopher+_string
would be "!+ABSTRACT".

To refer to several attributes, the gopher+_string consists of the
attribute names separated by coded spaces. For example,
"!+ABSTRACT%20+SMELL" refers to the +ABSTRACT and +SMELL attributes
of an item.

3.4.8 URL syntax for Gopher+ alternate views

Gopher+ allows for optional alternate data representations (alternate
views) of items. To retrieve a Gopher+ alternate view, a Gopher+
client sends the appropriate view and language identifier (found in
the item's +VIEW attribute). To refer to a specific Gopher+ alternate
view, the URL's Gopher+ string would be in the form:

    +<view_name>%20<language_name>

For example, a Gopher+ string of "+application/postscript%20Es_ES"
refers to the Spanish language postscript alternate view of a Gopher+
item.

3.4.9 URL syntax for Gopher+ electronic forms

The gopher+_string for a URL that refers to an item referenced by a
Gopher+ electronic form (an ASK block) filled out with specific
values is a coded version of what the client sends to the server.
The gopher+_string is of the form:

    +%091%0D%0A+-1%0D%0A<ask_item1_value>%0D%0A<ask_item2_value>%0D%0A.%0D%0A

To retrieve this item, the Gopher client sends:

    <a_gopher_selector><tab>+<tab>1<cr><lf>
    +-1<cr><lf>
    <ask_item1_value><cr><lf>
    <ask_item2_value><cr><lf>
    .<cr><lf>

to the Gopher server.

3.5. MAILTO

The mailto URL scheme is used to designate the Internet mailing
address of an individual or service. No additional information other
than an Internet mailing address is present or implied.

A mailto URL takes the form:

    mailto:<rfc822-addr-spec>

where <rfc822-addr-spec> is (the encoding of an) addr-spec, as
specified in RFC 822 [6]. Within mailto URLs, there are no reserved
characters.

Note that the percent sign ("%") is commonly used within RFC 822
addresses and must be encoded.

Unlike many URLs, the mailto scheme does not represent a data object
to be accessed directly; there is no sense in which it designates an
object. It has a different use than the message/external-body type in
MIME.

3.6. NEWS

The news URL scheme is used to refer to either news groups or
individual articles of USENET news, as specified in RFC 1036.

A news URL takes one of two forms:

    news:<newsgroup-name>
    news:<message-id>

A <newsgroup-name> is a period-delimited hierarchical name, such as
"comp.infosystems.www.misc". A <message-id> corresponds to the
Message-ID of section 2.1.5 of RFC 1036, without the enclosing "<"
and ">"; it takes the form <unique>@<full_domain_name>. A message
identifier may be distinguished from a news group name by the
presence of the commercial at "@" character. No additional characters
are reserved within the components of a news URL.

If <newsgroup-name> is "*" (as in <URL:news:*>), it is used to refer
to "all available news groups".

The news URLs are unusual in that by themselves, they do not contain
sufficient information to locate a single resource, but, rather, are
location-independent.
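
The rule for telling the two forms apart is short enough to show
directly. In this sketch the function name classify_news and the
labels it returns are assumptions made for the example.

```python
def classify_news(url):
    """Return (kind, value) for a news URL.

    kind is "message-id", "all-groups" or "newsgroup".
    """
    scheme, _, rest = url.partition(":")
    if scheme.lower() != "news" or not rest:
        raise ValueError("not a news URL: %r" % (url,))
    if "@" in rest:
        # Only a message identifier contains the commercial at sign.
        return "message-id", rest
    if rest == "*":
        return "all-groups", rest
    return "newsgroup", rest


if __name__ == "__main__":
    print(classify_news("news:comp.infosystems.www.misc"))
    print(classify_news("news:*"))
```

The message identifier is returned as written in the URL, that is,
without the enclosing "<" and ">".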

3.7. NNTP

The nntp URL scheme is an alternative method of referencing news
articles, useful for specifying news articles from NNTP servers (RFC
977).

An nntp URL takes the form:

    nntp://<host>:<port>/<newsgroup-name>/<article-number>

where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 119.

The <newsgroup-name> is the name of the group, while the
<article-number> is the numeric id of the article within that
newsgroup.

Note that while nntp: URLs specify a unique location for the article
resource, most NNTP servers currently on the Internet today are
configured only to allow access from local clients, and thus nntp
URLs do not designate globally accessible resources. Thus, the news:
form of URL is preferred as a way of identifying news articles.

3.8. TELNET

The Telnet URL scheme is used to designate interactive services that
may be accessed by the Telnet protocol.

A telnet URL takes the form:

    telnet://<user>:<password>@<host>:<port>/

as specified in Section 3.1. The final "/" character may be omitted.
If :<port> is omitted, the port defaults to 23. The :<password> can
be omitted, as well as the whole <user>:<password> part.

This URL does not designate a data object, but rather an interactive
service. Remote interactive services vary widely in the means by
which they allow remote logins; in practice, the <user> and
<password> supplied are advisory only: clients accessing a telnet URL
merely advise the user of the suggested username and password.

3.9. WAIS

The WAIS URL scheme is used to designate WAIS databases, searches, or
individual documents available from a WAIS database. WAIS is
described in [7]. The WAIS protocol is described in RFC 1625 [17].
Although the WAIS protocol is based on Z39.50-1988, the WAIS URL
scheme is not intended for use with arbitrary Z39.50 services.

A WAIS URL takes one of the following forms:

    wais://<host>:<port>/<database>
    wais://<host>:<port>/<database>?<search>
    wais://<host>:<port>/<database>/<wtype>/<wpath>

where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 210. The first form designates a
WAIS database that is available for searching. The second form
designates a particular search. <database> is the name of the WAIS
database being queried.

The third form designates a particular document within a WAIS
database to be retrieved. In this form <wtype> is the WAIS
designation of the type of the object. Many WAIS implementations
require that a client know the "type" of an object prior to
retrieval, the type being returned along with the internal object
identifier in the search response. The <wtype> is included in the
URL in order to allow the client interpreting the URL adequate
information to actually retrieve the document.

The <wpath> of a WAIS URL consists of the WAIS document-id, encoded
as necessary using the method described in Section 2.2. The WAIS
document-id should be treated opaquely; it may only be decomposed by
the server that issued it.
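
The three forms can be told apart by the delimiters that follow the
database name. The sketch below is illustrative; the name
parse_wais_path and the "form" labels are assumptions for the
example. It takes the text after the "/" that follows the host (or
port) and leaves the <wpath> encoded, since the document-id is opaque
to the client.

```python
def parse_wais_path(path):
    """Classify the part of a WAIS URL after <host>:<port>/."""
    if "?" in path:
        database, _, search = path.partition("?")
        return {"form": "search", "database": database, "search": search}
    parts = path.split("/", 2)
    if len(parts) == 1:
        return {"form": "database", "database": parts[0]}
    if len(parts) == 3:
        database, wtype, wpath = parts
        return {
            "form": "document",
            "database": database,
            "wtype": wtype,
            "wpath": wpath,  # opaque document-id, left encoded
        }
    raise ValueError("unrecognized WAIS path: %r" % (path,))


if __name__ == "__main__":
    print(parse_wais_path("mydb"))
    print(parse_wais_path("mydb?url+syntax"))
    print(parse_wais_path("mydb/TEXT/0%3D1"))
```

The database name, type and document-id in the usage lines are
placeholders.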

3.10 FILES

The file URL scheme is used to designate files accessible on a
particular host computer. This scheme, unlike most other URL schemes,
does not designate a resource that is universally accessible over the
Internet.

A file URL takes the form:

    file://<host>/<path>

where <host> is the fully qualified domain name of the system on
which the <path> is accessible, and <path> is a hierarchical
directory path of the form <directory>/<directory>/.../<name>.

For example, a VMS file

    DISK$USER:[MY.NOTES]NOTE123456.TXT

might become

    <URL:file://vms.host.edu/disk$user/my/notes/note123456.txt>

As a special case, <host> can be the string "localhost" or the empty
string; this is interpreted as `the machine from which the URL is
being interpreted'.

The file URL scheme is unusual in that it does not specify an
Internet protocol or access method for such files; as such, its
utility in network protocols between hosts is limited.

3.11 PROSPERO

The Prospero URL scheme is used to designate resources that are
accessed via the Prospero Directory Service. The Prospero protocol is
described elsewhere [14].

A prospero URL takes the form:

    prospero://<host>:<port>/<hsoname>;<field>=<value>

where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 1525. No username or password is
allowed.

The <hsoname> is the host-specific object name in the Prospero
protocol, suitably encoded. This name is opaque and interpreted by
the Prospero server. The semicolon ";" is reserved and may not
appear without quoting in the <hsoname>.

Prospero URLs are interpreted by contacting a Prospero directory
server on the specified host and port to determine appropriate access
methods for a resource, which might themselves be represented as
different URLs. External Prospero links are represented as URLs of
the underlying access method and are not represented as Prospero
URLs.

Note that a slash "/" may appear in the <hsoname> without quoting and
no significance may be assumed by the application. Though slashes
may indicate hierarchical structure on the server, such structure is
not guaranteed. Note that many <hsoname>s begin with a slash, in
which case the host or port will be followed by a double slash: the
slash from the URL syntax, followed by the initial slash from the
<hsoname>. (E.g., <URL:prospero://host.dom//pros/name> designates a
<hsoname> of "/pros/name".)

In addition, after the <hsoname>, optional fields and values
associated with a Prospero link may be specified as part of the URL.
When present, each field/value pair is separated from each other and
from the rest of the URL by a ";" (semicolon). The name of the field
and its value are separated by a "=" (equal sign). If present, these
fields serve to identify the target of the URL. For example, the
OBJECT-VERSION field can be specified to identify a specific version
of an object.
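
Separating the <hsoname> from its field/value pairs is a matter of
splitting on ";" and then on "=". The Python sketch below shows one
way to do it; the name parse_prospero_path is an assumption for this
example, the version value in the usage line is a placeholder, and
nothing is decoded.

```python
def parse_prospero_path(path):
    """Split "<hsoname>;<field>=<value>;..." into (hsoname, fields).

    `path` is the text after the "/" that follows the host (or port),
    so an <hsoname> that begins with a slash keeps that slash.
    """
    hsoname, *pairs = path.split(";")
    fields = {}
    for pair in pairs:
        name, _, value = pair.partition("=")
        fields[name] = value
    return hsoname, fields


if __name__ == "__main__":
    print(parse_prospero_path("/pros/name;OBJECT-VERSION=3"))
```

Because ";" is reserved in the <hsoname>, splitting before decoding
keeps an encoded semicolon inside the name.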

4. REGISTRATION OF NEW SCHEMES

A new scheme may be introduced by defining a mapping onto a
conforming URL syntax, using a new prefix. URLs for experimental
schemes may be used by mutual agreement between parties. Scheme names
starting with the characters "x-" are reserved for experimental
purposes.

The Internet Assigned Numbers Authority (IANA) will maintain a
registry of URL schemes. Any submission of a new URL scheme must
include a definition of an algorithm for accessing resources
within that scheme and the syntax for representing such a scheme.

URL schemes must have demonstrable utility and operability. One way
to provide such a demonstration is via a gateway which provides
objects in the new scheme for clients using an existing protocol. If
the new scheme does not locate resources that are data objects, the
properties of names in the new space must be clearly defined.

New schemes should try to follow the same syntactic conventions of
existing schemes, where appropriate. It is likewise recommended
that, where a protocol allows for retrieval by URL, the client
software have provision for being configured to use specific gateway
locators for indirect access through new naming schemes.

The following schemes have been proposed at various times, but this
document does not define their syntax or use at this time. It is
suggested that IANA reserve their scheme names for future definition:

afs              Andrew File System global file names.
mid              Message identifiers for electronic mail.
cid              Content identifiers for MIME body parts.
nfs              Network File System (NFS) file names.
tn3270           Interactive 3270 emulation sessions.
mailserver       Access to data available from mail servers.
z39.50           Access to ANSI Z39.50 services.

5. BNF for specific URL schemes

This is a BNF-like description of the Uniform Resource Locator

syntax, using the conventions of RFC822, except that "|" is used to

designate alternatives, and brackets [] are used around optional or

repeated elements. Briefly, literals are quoted with "", optional

elements are enclosed in [brackets], and elements may be preceded

with <n>* to designate n or more repetitions of the following

element; n defaults to 0.

; The generic form of a URL is:

genericurl     = scheme ":" schemepart

; Specific predefined schemes are defined here; new schemes

; may be registered with IANA

url            = httpurl | ftpurl | newsurl |

                 nntpurl | telneturl | gopherurl |

                 waisurl | mailtourl | fileurl |

                 prosperourl | otherurl

; new schemes follow the general syntax

otherurl       = genericurl

; the scheme is in lower case; interpreters should use case-ignore

scheme         = 1*[ lowalpha | digit | "+" | "-" | "." ]

schemepart     = *xchar | ip-schemepart
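
The generic form above can be illustrated with a minimal sketch in Python. The names split_generic_url and is_experimental are hypothetical helpers, not part of this specification; the sketch checks only the scheme rule and the "x-" convention of Section 4, and leaves the schemepart unvalidated.

```python
import re

# scheme = 1*[ lowalpha | digit | "+" | "-" | "." ]
SCHEME_RE = re.compile(r"[a-z0-9+.-]+")


def split_generic_url(url):
    """Split a URL into (scheme, schemepart) at the first ":"."""
    scheme, colon, schemepart = url.partition(":")
    scheme = scheme.lower()  # interpreters should ignore case in the scheme
    if not colon or not SCHEME_RE.fullmatch(scheme):
        raise ValueError("not a generic URL: %r" % (url,))
    return scheme, schemepart


def is_experimental(scheme):
    """Scheme names starting with "x-" are reserved for experiments."""
    return scheme.lower().startswith("x-")
```

Splitting at the first colon is sufficient here because ":" cannot occur inside a scheme name.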

; URL schemeparts for ip based protocols:

ip-schemepart  = "//" login [ "/" urlpath ]

login          = [ user [ ":" password ] "@" ] hostport

hostport       = host [ ":" port ]

host           = hostname | hostnumber

hostname       = *[ domainlabel "." ] toplabel

domainlabel    = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit

toplabel       = alpha | alpha *[ alphadigit | "-" ] alphadigit

alphadigit     = alpha | digit

hostnumber     = digits "." digits "." digits "." digits

port           = digits

user           = *[ uchar | ";" | "?" | "&" | "=" ]

password       = *[ uchar | ";" | "?" | "&" | "=" ]

urlpath        = *xchar    ; depends on protocol see section 3.1
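
As an illustration of the ip-schemepart rules, the following Python sketch separates the optional user and password, the host, the optional port, and the url-path. It is a simplification under stated assumptions: parse_ip_schemepart is a hypothetical name, and the regular expression only locates the delimiters; it does not enforce the hostname, uchar, or xchar productions.

```python
import re

# ip-schemepart = "//" login [ "/" urlpath ]
# login         = [ user [ ":" password ] "@" ] hostport
# hostport      = host [ ":" port ]
IP_SCHEMEPART_RE = re.compile(
    r"//"
    r"(?:(?P<user>[^:@/]*)(?::(?P<password>[^:@/]*))?@)?"
    r"(?P<host>[^:@/]+)"
    r"(?::(?P<port>[0-9]+))?"
    r"(?:/(?P<urlpath>.*))?"
)


def parse_ip_schemepart(schemepart):
    """Return a dict with user, password, host, port and urlpath."""
    match = IP_SCHEMEPART_RE.fullmatch(schemepart)
    if match is None:
        raise ValueError("not an ip-schemepart: %r" % (schemepart,))
    parts = match.groupdict()
    if parts["port"] is not None:
        parts["port"] = int(parts["port"])  # port = digits
    return parts
```

Components that are absent from the input are reported as None, so a caller can tell an omitted port from an explicit one.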

; The predefined schemes:

; FTP (see also RFC959)

ftpurl         = "ftp://" login [ "/" fpath [ ";type=" ftptype ]]

fpath          = fsegment *[ "/" fsegment ]

fsegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]

ftptype        = "A" | "I" | "D" | "a" | "i" | "d"
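
A rough Python sketch of how the fpath and ftptype rules might be applied follows. The function name parse_ftp_path is hypothetical, and its input is assumed to be the text after "ftp://" login "/". Each fsegment is decoded only after the path has been split on "/", so an encoded "%2F" stays inside its segment.

```python
from urllib.parse import unquote

FTP_TYPES = ("A", "I", "D", "a", "i", "d")


def parse_ftp_path(url_path):
    """Return (decoded segments, type code or None) for an FTP url-path."""
    # ";" is not allowed inside an fsegment, so ";type=" marks the typecode.
    fpath, marker, ftptype = url_path.partition(";type=")
    if marker and ftptype not in FTP_TYPES:
        raise ValueError("unknown ftptype: %r" % (ftptype,))
    segments = [unquote(segment) for segment in fpath.split("/")]
    return segments, (ftptype.lower() if marker else None)
```

Lower-casing the type code is a choice made for this sketch; the grammar accepts both cases.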

; FILE

fileurl        = "file://" [ host | "localhost" ] "/" fpath

; HTTP

httpurl        = "http://" hostport [ "/" hpath [ "?" search ]]

hpath          = hsegment *[ "/" hsegment ]

hsegment       = *[ uchar | ";" | ":" | "@" | "&" | "=" ]

search         = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
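
The httpurl rule translates fairly directly into a regular expression. The Python sketch below is one possible transcription, not a reference implementation: parse_http_url is a hypothetical name, the HOST pattern is a deliberate simplification of the hostname and hostnumber rules, and the fallback to port 80 reflects the default given for HTTP in Section 3.3.

```python
import re

# uchar = unreserved | escape
UCHAR = r"(?:[A-Za-z0-9$\-_.+!*'(),]|%[0-9A-Fa-f]{2})"
# hsegment and search share the same character repertoire
HSEGMENT = r"(?:" + UCHAR + r"|[;:@&=])*"
HOST = r"[A-Za-z0-9.-]+"  # simplified stand-in for the host rule

HTTPURL_RE = re.compile(
    r"http://(?P<host>" + HOST + r")(?::(?P<port>[0-9]+))?"
    r"(?:/(?P<hpath>" + HSEGMENT + r"(?:/" + HSEGMENT + r")*)"
    r"(?:\?(?P<search>" + HSEGMENT + r"))?)?"
)


def parse_http_url(url):
    """Return (host, port, hpath, search), or None if url does not match."""
    match = HTTPURL_RE.fullmatch(url)
    if match is None:
        return None
    port = int(match.group("port") or 80)
    return match.group("host"), port, match.group("hpath"), match.group("search")
```

Under this grammar a second "?" or a raw space causes the match to fail, since neither character is permitted in hsegment or search.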

; GOPHER (see also RFC1436)

gopherurl      = "gopher://" hostport [ "/" [ gtype [ selector

                 [ "%09" search [ "%09" gopher+_string ] ] ] ] ]

gtype          = xchar

selector       = *xchar

gopher+_string = *xchar

; MAILTO (see also RFC822)

mailtourl      = "mailto:" encoded822addr

encoded822addr = 1*xchar               ; further defined in RFC822

; NEWS (see also RFC1036)

newsurl        = "news:" grouppart

grouppart      = "*" | group | article

group          = alpha *[ alpha | digit | "-" | "." | "+" | "_" ]

article        = 1*[ uchar | ";" | "/" | "?" | ":" | "&" | "=" ] "@" host
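
The three alternatives of grouppart can be told apart mechanically, because "@" may appear in an article but never in a group. The Python sketch below shows one way to do this; classify_news_url and the labels it returns are hypothetical, and the article form is not validated beyond the presence of "@".

```python
import re

# group = alpha *[ alpha | digit | "-" | "." | "+" | "_" ]
GROUP_RE = re.compile(r"[A-Za-z][A-Za-z0-9\-.+_]*")


def classify_news_url(url):
    """Return (kind, value) where kind is all-groups, group or article."""
    prefix = "news:"
    if not url.lower().startswith(prefix):
        raise ValueError("not a news URL: %r" % (url,))
    part = url[len(prefix):]
    if part == "*":
        return "all-groups", None
    if "@" in part:  # article = ... "@" host
        return "article", part
    if GROUP_RE.fullmatch(part):
        return "group", part
    raise ValueError("invalid grouppart: %r" % (part,))
```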

; NNTP (see also RFC977)

nntpurl        = "nntp://" hostport "/" group [ "/" digits ]

; TELNET

telneturl      = "telnet://" login [ "/" ]

; WAIS (see also RFC1625)

waisurl        = waisdatabase | waisindex | waisdoc

waisdatabase   = "wais://" hostport "/" database

waisindex      = "wais://" hostport "/" database "?" search

waisdoc        = "wais://" hostport "/" database "/" wtype "/" wpath

database       = *uchar

wtype          = *uchar

wpath          = *uchar

; PROSPERO

prosperourl    = "prospero://" hostport "/" ppath *[ fieldspec ]

ppath          = psegment *[ "/" psegment ]

psegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]

fieldspec      = ";" fieldname "=" fieldvalue

fieldname      = *[ uchar | "?" | ":" | "@" | "&" ]

fieldvalue     = *[ uchar | "?" | ":" | "@" | "&" ]
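
The fieldspec rules lend themselves to a short illustration. In the Python sketch below, parse_prospero_path is a hypothetical helper whose input is assumed to be the text after "prospero://" hostport "/". Since ";" cannot occur in a psegment, fieldname, or fieldvalue, splitting on it separates the path from its fields.

```python
from urllib.parse import unquote


def parse_prospero_path(rest):
    """Return (ppath, fields) for the part of a Prospero URL after the host."""
    ppath, *fieldspecs = rest.split(";")
    fields = {}
    for spec in fieldspecs:
        # fieldspec = ";" fieldname "=" fieldvalue
        name, equals, value = spec.partition("=")
        if not equals:
            raise ValueError("fieldspec without '=': %r" % (spec,))
        fields[unquote(name)] = unquote(value)
    return ppath, fields
```

The ppath is returned still encoded; only field names and values are decoded in this sketch.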

; Miscellaneous definitions

lowalpha       = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |

                 "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |

                 "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |

                 "y" | "z"

hialpha        = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |

                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |

                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

alpha          = lowalpha | hialpha

digit          = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |

                 "8" | "9"

safe           = "$" | "-" | "_" | "." | "+"

extra          = "!" | "*" | "'" | "(" | ")" | ","

national       = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"

punctuation    = "<" | ">" | "#" | "%" | <">

reserved       = ";" | "/" | "?" | ":" | "@" | "&" | "="

hex            = digit | "A" | "B" | "C" | "D" | "E" | "F" |

                 "a" | "b" | "c" | "d" | "e" | "f"

escape         = "%" hex hex

unreserved     = alpha | digit | safe | extra

uchar          = unreserved | escape

xchar          = unreserved | reserved | escape

digits         = 1*digit
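
The character classes above suggest a simple encoder: anything outside the unreserved set is written as an escape. The Python sketch below is offered under two assumptions that are not part of the grammar: the function name escape_text and its keep parameter are hypothetical, and non-ASCII input is first converted to octets with UTF-8, which this document does not prescribe.

```python
import string

SAFE = "$-_.+"
EXTRA = "!*'(),"
# unreserved = alpha | digit | safe | extra
UNRESERVED = set(string.ascii_letters + string.digits + SAFE + EXTRA)


def escape_text(text, keep=""):
    """Encode text so that only unreserved characters (and keep) stay raw."""
    out = []
    for byte in text.encode("utf-8"):  # assumption: UTF-8 for non-ASCII input
        char = chr(byte)
        if byte < 128 and (char in UNRESERVED or char in keep):
            out.append(char)
        else:
            out.append("%%%02X" % byte)  # escape = "%" hex hex
    return "".join(out)
```

The keep argument is intended for reserved characters that carry their special meaning in a given scheme, for example "/" when encoding a whole path.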

6. Security Considerations

The URL scheme does not in itself pose a security threat. Users

should beware that there is no general guarantee that a URL which at

one time points to a given object continues to do so, and does not

even at some later time point to a different object due to the

movement of objects on servers.

A URL-related security threat is that it is sometimes possible to

construct a URL such that an attempt to perform a harmless idempotent

operation such as the retrieval of the object will in fact cause a

possibly damaging remote operation to occur. The unsafe URL is

typically constructed by specifying a port number other than that

reserved for the network protocol in question. The client

unwittingly contacts a server which is in fact running a different

protocol. The content of the URL contains instructions which when

interpreted according to this other protocol cause an unexpected

operation. An example has been the use of gopher URLs to cause a rude

message to be sent via an SMTP server. Caution should be used when

using any URL which specifies a port number other than the default

for the protocol, especially when it is a number within the reserved

space.
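
A client might apply the caution above with a check along the following lines. This Python sketch is illustrative only: port_warning is a hypothetical name, the table lists the default ports given for each scheme in Section 3, and treating ports below 1024 as "the reserved space" is an assumption of the sketch rather than a definition from this document.

```python
# Default ports given for each scheme in Section 3.
DEFAULT_PORTS = {
    "ftp": 21,
    "http": 80,
    "gopher": 70,
    "nntp": 119,
    "telnet": 23,
    "wais": 210,
    "prospero": 1525,
}

RESERVED_PORT_LIMIT = 1024  # assumption: ports below this count as reserved


def port_warning(scheme, port):
    """Return a warning string for a suspicious port, or None."""
    default = DEFAULT_PORTS.get(scheme.lower())
    if port is None or port == default:
        return None
    if port < RESERVED_PORT_LIMIT:
        return "non-default port in reserved range"
    return "non-default port"
```

A gopher URL naming port 25, as in the SMTP example, would be reported as a non-default port in the reserved range.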

Care should be taken when URLs contain embedded encoded delimiters

for a given protocol (for example, CR and LF characters for telnet

protocols) that these are not unencoded before transmission. This

would violate the protocol but could be used to simulate an extra

operation or parameter, again causing an unexpected and possibly

harmful remote operation to be performed.
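
One way to notice such URLs before they are processed is to look for the encoded forms of CR and LF. The Python sketch below is a minimal example; has_encoded_line_break is a hypothetical name, and a real client would likely check other protocol-specific delimiters as well.

```python
import re

# "%0D" is an encoded CR and "%0A" an encoded LF; hex digits may be
# written in either case.
ENCODED_CR_LF_RE = re.compile(r"%0[AaDd]")


def has_encoded_line_break(url):
    """Return True if the URL contains an encoded CR or LF."""
    return ENCODED_CR_LF_RE.search(url) is not None
```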

The use of URLs containing passwords that should be secret is clearly

unwise.

7. Acknowledgements

This paper builds on the basic WWW design (RFC 1630) and much

discussion of these issues by many people on the network. The

discussion was particularly stimulated by articles by Clifford Lynch,

Brewster Kahle [10] and Wengyik Yeong [18]. Contributions from John

Curran, Clifford Neuman, Ed Vielmetti and later the IETF URL BOF and

URI working group were incorporated.

Most recently, careful readings and comments by Dan Connolly, Ned

Freed, Roy Fielding, Guido van Rossum, Michael Dolan, Bert Bos, John

Kunze, Olle Jarnefors, Peter Svanberg and many others have helped

refine this RFC.

APPENDIX: Recommendations for URLs in Context

URIs, including URLs, are intended to be transmitted through

protocols which provide a context for their interpretation.

In some cases, it will be necessary to distinguish URLs from other

possible data structures in a syntactic structure. In this case, it is

recommended that URLs be preceded with a prefix consisting of the

characters "URL:". For example, this prefix may be used to

distinguish URLs from other kinds of URIs.

In addition, there are many occasions when URLs are included in other

kinds of text; examples include electronic mail, USENET news

messages, or printed on paper. In such cases, it is convenient to

have a separate syntactic wrapper that delimits the URL and separates

it from the rest of the text, and in particular from punctuation

marks that might be mistaken for part of the URL. For this purpose,

it is recommended that angle brackets ("<" and ">"), along with the

prefix "URL:", be used to delimit the boundaries of the URL. This

wrapper does not form part of the URL and should not be used in

contexts in which delimiters are already specified.

In the case where a fragment/anchor identifier is associated with a

URL (following a "#"), the identifier would be placed within the

brackets as well.

In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may

need to be added to break long URLs across lines. The whitespace

should be ignored when extracting the URL.

No whitespace should be introduced after a hyphen ("-") character.

Because some typesetters and printers may (erroneously) introduce a

hyphen at the end of line when breaking a line, the interpreter of a

URL containing a line break immediately after a hyphen should ignore

all unencoded whitespace around the line break, and should be aware

that the hyphen may or may not actually be part of the URL.

Examples:

Yes, Jim, I found it under <URL:ftp://info.cern.ch/pub/www/doc;

type=d> but you can probably pick it up from <URL:ftp://ds.in

ternic.net/rfc>. Note the warning in <URL:http://ds.internic.

net/instructions/overview.html#WARNING>.
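
Extraction of URLs wrapped in this way can be sketched in a few lines of Python. The name extract_urls is hypothetical. The sketch relies on "<" and ">" never appearing unencoded inside a URL, removes all whitespace found between the delimiters, and makes no attempt to resolve the hyphen ambiguity described above.

```python
import re

# "<" and ">" are unsafe characters, so they cannot occur raw inside the URL.
WRAPPED_URL_RE = re.compile(r"<URL:([^<>]*)>")
WHITESPACE_RE = re.compile(r"\s+")


def extract_urls(text):
    """Return the URLs found in <URL:...> wrappers, with whitespace removed."""
    return [WHITESPACE_RE.sub("", body) for body in WRAPPED_URL_RE.findall(text)]
```

Applied to the example message above, this yields the FTP and HTTP URLs with their line breaks removed.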

References

[1] Anklesaria, F., McCahill, M., Lindner, P., Johnson, D.,

Torrey, D., and B. Alberti, "The Internet Gopher Protocol

(a distributed document search and retrieval protocol)",

RFC 1436, University of Minnesota, March 1993.

<URL:ftp://ds.internic.net/rfc/rfc1436.txt;type=a>

[2] Anklesaria, F., Lindner, P., McCahill, M., Torrey, D.,

Johnson, D., and B. Alberti, "Gopher+: Upward compatible

enhancements to the Internet Gopher protocol",

University of Minnesota, July 1993.

<URL:ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol

/Gopher+/Gopher+.txt>

[3] Berners-Lee, T., "Universal Resource Identifiers in WWW: A

Unifying Syntax for the Expression of Names and Addresses of

Objects on the Network as used in the World-Wide Web", RFC

1630, CERN, June 1994.

<URL:ftp://ds.internic.net/rfc/rfc1630.txt>

[4] Berners-Lee, T., "Hypertext Transfer Protocol (HTTP)",

CERN, November 1993.

<URL:ftp://info.cern.ch/pub/www/doc/http-spec.txt.Z>

[5] Braden, R., Editor, "Requirements for Internet Hosts --

Application and Support", STD 3, RFC 1123, IETF, October 1989.

<URL:ftp://ds.internic.net/rfc/rfc1123.txt>

[6] Crocker, D., "Standard for the Format of ARPA Internet Text

Messages", STD 11, RFC 822, UDEL, April 1982.

<URL:ftp://ds.internic.net/rfc/rfc822.txt>

[7] Davis, F., Kahle, B., Morris, H., Salem, J., Shen, T., Wang, R.,

Sui, J., and M. Grinbaum, "WAIS Interface Protocol Prototype

Functional Specification", (v1.5), Thinking Machines

Corporation, April 1990.

<URL:ftp://quake.think.com/pub/wais/doc/protspec.txt>

[8] Horton, M. and R. Adams, "Standard For Interchange of USENET

Messages", RFC 1036, AT&T Bell Laboratories, Center for Seismic

Studies, December 1987.

<URL:ftp://ds.internic.net/rfc/rfc1036.txt>

[9] Huitema, C., "Naming: Strategies and Techniques", Computer

Networks and ISDN Systems 23 (1991) 107-110.

[10] Kahle, B., "Document Identifiers, or International Standard

Book Numbers for the Electronic Age", 1991.

<URL:ftp://quake.think.com/pub/wais/doc/doc-ids.txt>

[11] Kantor, B. and P. Lapsley, "Network News Transfer Protocol:

A Proposed Standard for the Stream-Based Transmission of News",

RFC 977, UC San Diego & UC Berkeley, February 1986.

<URL:ftp://ds.internic.net/rfc/rfc977.txt>

[12] Kunze, J., "Functional Requirements for Internet Resource

Locators", Work in Progress, December 1994.

<URL:ftp://ds.internic.net/internet-drafts

/draft-ietf-uri-irl-fun-req-02.txt>

[13] Mockapetris, P., "Domain Names - Concepts and Facilities",

STD 13, RFC 1034, USC/Information Sciences Institute,

November 1987.

<URL:ftp://ds.internic.net/rfc/rfc1034.txt>

[14] Neuman, B., and S. Augart, "The Prospero Protocol",

USC/Information Sciences Institute, June 1993.

<URL:ftp://prospero.isi.edu/pub/prospero/doc

/prospero-protocol.PS.Z>

[15] Postel, J. and J. Reynolds, "File Transfer Protocol (FTP)",

STD 9, RFC 959, USC/Information Sciences Institute,

October 1985.

<URL:ftp://ds.internic.net/rfc/rfc959.txt>

[16] Sollins, K. and L. Masinter, "Functional Requirements for

Uniform Resource Names", RFC 1737, MIT/LCS, Xerox Corporation,

December 1994.

<URL:ftp://ds.internic.net/rfc/rfc1737.txt>

[17] St. Pierre, M, Fullton, J., Gamiel, K., Goldman, J., Kahle, B.,

Kunze, J., Morris, H., and F. Schiettecatte, "WAIS over

Z39.50-1988", RFC 1625, WAIS, Inc., CNIDR, Thinking Machines

Corp., UC Berkeley, FS Consulting, June 1994.

<URL:ftp://ds.internic.net/rfc/rfc1625.txt>

[18] Yeong, W., "Towards Networked Information Retrieval", Technical

report 91-06-25-01, Performance Systems International, Inc.

<URL:ftp://uu.psi.com/wp/nir.txt>, June 1991.

[19] Yeong, W., "Representing Public Archives in the Directory",

Work in Progress, November 1991.

[20] "Coded Character Set -- 7-bit American Standard Code for

Information Interchange", ANSI X3.4-1986.

Editors' Addresses

Tim Berners-Lee

World-Wide Web project

CERN,

1211 Geneva 23,

Switzerland

Phone: +41 (22)767 3755

Fax: +41 (22)767 7155

EMail: timbl@info.cern.ch

Larry Masinter

Xerox PARC

3333 Coyote Hill Road

Palo Alto, CA 94034

Phone: (415) 812-4365

Fax: (415) 812-4333

EMail: masinter@parc.xerox.com

Mark McCahill

Computer and Information Services,

University of Minnesota

Room 152 Shepherd Labs

100 Union Street SE

Minneapolis, MN 55455

Phone: (612) 625 1300

EMail: mpm@boombox.micro.umn.edu
