The Open Archives Initiative Protocol for Metadata Harvesting
Protocol Version 2.0 of 2002-06-14
Editors
The OAI Executive:
Carl Lagoze <lagoze@cs.cornell.edu> -- Cornell University - Computer Science
Herbert Van de Sompel <herbertv@lanl.gov> -- Los Alamos National Laboratory - Research Library
From the OAI Technical Committee:
Michael Nelson <m.l.nelson@larc.nasa.gov> -- NASA - Langley Research Center
Simeon Warner <simeon@cs.cornell.edu> -- Cornell University - Computer Science
Table of Contents
1. Introduction
2. Definitions and Concepts
2.1. Harvester
2.2. Repository
2.3. Item
2.4. Unique Identifier
2.5. Record
2.5.1 Deleted records
2.6. Set
2.7. Selective Harvesting
2.7.1 Selective Harvesting and Datestamps
2.7.2 Selective Harvesting and Sets
3. Protocol Features
3.1. HTTP Embedding of OAI-PMH requests
3.1.1. HTTP Request Format
3.1.2. HTTP Response Format
3.1.3. Response Compression
3.2. XML Response Format
3.2.1. XML Schema for Validating Responses to OAI-PMH Requests
3.3. UTCdatetime
3.3.1. UTCdatetime in Protocol Requests
3.3.2. UTCdatetime in Protocol Responses
3.4. metadataPrefix and Metadata Schema
3.5. Flow Control
3.5.1 Idempotency of resumptionTokens
3.6. Error and Exception Conditions
4. Protocol Requests and Responses
4.1. GetRecord
4.2. Identify
4.3. ListIdentifiers
4.4. ListMetadataFormats
4.5. ListRecords
4.6. ListSets
5. Dublin Core
6. Implementation Guidelines
Acknowledgements
Document History
1. Introduction
The Open Archives Initiative Protocol for Metadata Harvesting (referred to as the OAI-PMH in the remainder of this document) provides an application-independent interoperability framework based on metadata harvesting. There are two classes of participants in the OAI-PMH framework:
- Data Providers administer systems that support the OAI-PMH as a means of exposing metadata; and
- Service Providers use metadata harvested via the OAI-PMH as a basis for building value-added services.
In this document the key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" in bold face are to be interpreted as described in RFC 2119. An implementation is not conformant if it fails to satisfy one or more of the "must" or "required" level requirements for the protocols it implements.
This document refers in several places to "community-specific" practices to which individual protocol implementations may conform. These practices are described in an accompanying Implementation Guidelines document.
2. Definitions and Concepts
2.1 Harvester
A harvester is a client application that issues OAI-PMH requests. A harvester is operated by a service provider as a means of collecting metadata from repositories.
2.2 Repository
A repository is a network-accessible server that can process the six OAI-PMH requests in the manner described in this document. A repository is managed by a data provider to expose metadata to harvesters. To allow various repository configurations, the OAI-PMH distinguishes between three distinct entities related to the metadata made accessible by the OAI-PMH.
- resource - A resource is the object or "stuff" that metadata is "about". The nature of a resource, whether it is physical or digital, or whether it is stored in the repository or is a constituent of another database, is outside the scope of the OAI-PMH.
- item - An item is a constituent of a repository from which metadata about a resource can be disseminated. That metadata may be disseminated on-the-fly from the associated resource, cross-walked from some canonical form, actually stored in the repository, etc.
- record - A record is metadata in a specific metadata format. A record is returned as an XML-encoded byte stream in response to a protocol request to disseminate a specific metadata format from a constituent item.
2.3 Item
An item is a constituent of a repository from which metadata about a resource can be disseminated. An item is conceptually a container that stores or dynamically generates metadata about a single resource in multiple formats, each of which can be harvested as records via the OAI-PMH. Each item has an identifier that is unique within the scope of the repository of which it is a constituent.
2.4 Unique Identifier
A unique identifier unambiguously identifies an item within a repository; the unique identifier is used in OAI-PMH requests for extracting metadata from the item. Items may contain metadata in multiple formats. The unique identifier maps to the item, and all possible records available from a single item share the same unique identifier.
The format of the unique identifier must correspond to that of the URI (Uniform Resource
Identifier) syntax. Individual communities may develop
community-specific URI schemes for
coordinated use across repositories. The scheme component of the unique
identifiers must not correspond to that of a recognized URI scheme unless
the identifiers conform to that scheme. Repositories may implement the oai-identifier
syntax described in the accompanying Implementation
Guidelines document.
Unique identifiers play two roles in the protocol:
- Response: Identifiers are returned by both the ListIdentifiers and ListRecords requests.
- Request: An identifier, in combination with a metadataPrefix, is used in the GetRecord request as a means of requesting a record in a specific metadata format from an item.
Note that the identifier described here is not that of a
resource. The nature of a resource identifier is outside the scope of the
OAI-PMH. To facilitate access to the resource associated with harvested
metadata, repositories should use an element in metadata records to
establish a linkage between the record (and the identifier of its item) and the
identifier (URL, URN, DOI, etc.) of the associated resource. The mandatory
Dublin Core format provides the identifier element that
should be used for this purpose.
2.5 Record
A record is metadata expressed in a single format. A record is returned in
an XML-encoded byte stream in response to an OAI-PMH request for metadata from
an item. A record is identified unambiguously by the combination of the unique identifier of the item from which the record
is available, the metadataPrefix identifying the metadata format of
the record, and the datestamp of the record. The
XML-encoding of records is organized into the following parts:
- header -- contains the unique identifier of the item and properties necessary for selective harvesting. The header consists of the following parts:
  - the unique identifier -- the unique identifier of an item in a repository;
  - the datestamp -- the date of creation, modification or deletion of the record for the purpose of selective harvesting;
  - zero or more setSpec elements -- the set membership of the item for the purpose of selective harvesting;
  - an optional status attribute with a value of deleted, which indicates the withdrawal of availability of the specified metadata format for the item, dependent on the repository support for deletions.
- metadata -- a single manifestation of the metadata from an item. The OAI-PMH supports items with multiple manifestations (formats) of metadata. At a minimum, repositories must be able to return records with metadata expressed in the Dublin Core format, without any qualification. Optionally, a repository may also disseminate other formats of metadata. The specific metadata format of the record to be disseminated is specified by means of an argument -- the metadataPrefix -- in the GetRecord or ListRecords request that produces the record. The ListMetadataFormats request returns the list of all metadata formats available from a repository, or for a specific item (which can be specified as an argument to the ListMetadataFormats request).
- about -- an optional and repeatable container to hold data about the metadata part of the record. The contents of an about container must conform to an XML Schema. Individual implementation communities may create XML Schema that define specific uses for the contents of about containers. Two common uses of about containers are:
  - rights statements: some repositories may find it desirable to attach terms of use to the metadata they make available through the OAI-PMH. No specific set of XML tags for rights expression is defined by OAI-PMH, but the about container is provided to allow for encapsulating community-defined rights tags.
  - provenance statements: One suggested use of the about container is to indicate the provenance of a metadata record, e.g. whether it has been harvested itself and if so from which repository, and when. An XML Schema for such a provenance container, as well as some supporting information is available from the accompanying Implementation Guidelines document.
The following example shows an XML-encoding of a record and its components:
- the header part with:
  - a unique identifier of the item from which the record was disseminated, equal to oai:arXiv.org:cs/0112017;
  - the datestamp of the record equal to 2002-02-28;
  - two setSpecs, respectively cs and math, indicating that the item from which the record was disseminated belongs to two sets of the repository;
- the metadata part. This consists of a single root tag - in the example the tag oai_dc:dc - with the nested tags belonging to the corresponding metadata format - in the example, Dublin Core elements such as dc:title. Note that the root tag within the metadata part includes a number of attributes that are common to all XML documents that use namespaces and schema validity:
  - namespace declarations -- the declarations of the namespaces used within the metadata part, each of which is prefixed with xmlns. Namespace declarations within the metadata part fall into two categories:
    - metadata format specific namespace(s) - every metadata part must include one or more xmlns prefixed attributes that define the correspondence between a metadata format prefix -- e.g. dc -- and the namespace URI (as defined by the XML namespace specification) of the respective metadata format. Some metadata formats employ tags from multiple namespaces, requiring multiple xmlns prefixed attributes -- in the example, there are declarations for both oai_dc and dc.
    - xml schema namespace - every metadata part must include the attribute xmlns:xsi, the value of which must always be the URI shown in the example, which is the namespace URI for XML schema.
  - xsi:schemaLocation -- the value of which is a URI, URL pair; the first is the namespace URI (as defined by the XML namespace specification) of the metadata that follows in this part, and the second is the URL of the XML schema for validation of the metadata that follows.
- one about part of the record which uses the oai_provenance.xsd schema, described in the accompanying Implementation Guidelines document, as a means to provide information regarding the origins of the metadata part of the record. Note that the root element within each about part has the same structure as the root element in the metadata part.
<header> |
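As a non-normative illustration of the header part described above, the following Python sketch reads the identifier, datestamp, setSpecs and optional status attribute from a record. The XML string is hand-written for this sketch from the values listed above, with the metadata and about parts left out; the helper name parse_header is hypothetical and not part of the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the OAI-PMH, in the "{uri}tag" form ElementTree uses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written sample built from the values listed in the text above;
# the metadata and about parts are omitted to keep the sketch short.
SAMPLE_RECORD = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:arXiv.org:cs/0112017</identifier>
    <datestamp>2002-02-28</datestamp>
    <setSpec>cs</setSpec>
    <setSpec>math</setSpec>
  </header>
</record>
"""


def parse_header(record_xml):
    """Return the header fields of one record as a dict (hypothetical helper)."""
    record = ET.fromstring(record_xml)
    header = record.find(OAI_NS + "header")
    return {
        "identifier": header.findtext(OAI_NS + "identifier"),
        "datestamp": header.findtext(OAI_NS + "datestamp"),
        "setSpecs": [s.text for s in header.findall(OAI_NS + "setSpec")],
        # The status attribute is optional; a value of "deleted" marks a deleted record.
        "deleted": header.get("status") == "deleted",
    }


print(parse_header(SAMPLE_RECORD))
```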
2.5.1 Deleted records
If a record is no longer available then it is said to be deleted.
Repositories must declare one of three levels of support for deleted
records in the deletedRecord element of the Identify response:
- no - the repository does not maintain information about deletions. A repository that indicates this level of support must not reveal a deleted status in any response.
- persistent - the repository maintains information about deletions with no time limit. A repository that indicates this level of support must persistently keep track of the full history of deletions and consistently reveal the status of a deleted record over time.
- transient - the repository does not guarantee that a list of deletions is maintained persistently or consistently. A repository that indicates this level of support may reveal a deleted status for records.
If a repository does not keep track of deletions then such records will
simply vanish from responses and there will be no way for a harvester to
discover deletions through continued incremental harvesting. If a repository
does keep track of deletions then the datestamp of the deleted record
must be the date and time that it was deleted. Responses to a GetRecord request for a deleted record must
then include a header with the attribute
status="deleted", and must not include metadata
or about parts. Similarly, responses to selective harvesting requests with set
membership and date range criteria that include deleted records must
include the headers of these records. Incremental harvesting will thus discover
deletions from repositories that keep track of them.
Deleted status is a property of individual records. Like a normal record, a deleted record is identified by a unique identifier, a metadataPrefix and a datestamp. Other records, with different metadataPrefix but the same unique identifier, may remain available for the item.
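Because deleted status belongs to an individual record rather than to the whole item, a harvester would typically key its local copy on the pair of unique identifier and metadataPrefix. The following Python sketch shows one possible way to do that; the in-memory dict and the function name apply_header are assumptions made for illustration only.

```python
def apply_header(store, identifier, metadata_prefix, deleted, metadata=None):
    """Update a local store (a plain dict, assumed here) from one harvested header.

    The key is (identifier, metadataPrefix), so deleting the record in one
    metadata format leaves records in other formats of the same item untouched.
    """
    key = (identifier, metadata_prefix)
    if deleted:
        store.pop(key, None)   # record withdrawn in this format only
    else:
        store[key] = metadata  # record created or modified
    return store


store = {}
apply_header(store, "oai:arXiv.org:cs/0112017", "oai_dc", False, "<dc record>")
apply_header(store, "oai:arXiv.org:cs/0112017", "other_format", False, "<other record>")
apply_header(store, "oai:arXiv.org:cs/0112017", "other_format", True)
print(sorted(store))
```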
2.6 Set
A set is an optional construct for grouping items for the purpose
of selective harvesting.
Repositories may organize items into sets. Set organization
may be flat, i.e. a simple list, or hierarchical. Multiple hierarchies
with distinct, independent top-level nodes are allowed. Hierarchical
organization of sets is expressed in the syntax of the setSpec
parameter as described below. When a repository defines a set organization it
must include set membership information in the headers of items returned in response to the ListIdentifiers, ListRecords and GetRecord requests.
Each node in a set organization of a repository has:
- a setSpec -- a colon [:] separated list indicating the path from the root of the set hierarchy to the respective node. Each element in the list is a string consisting of any valid URI unreserved characters, which must not contain any colons [:]. Since a setSpec forms a unique identifier for the set within the repository, it must be unique for each set. Flat set organizations have only sets with setSpec that do not contain any colons [:].
- a setName -- a short human-readable string naming the set.
- a setDescription -- an optional and repeatable container that may hold community-specific XML-encoded data about the set; the accompanying Implementation Guidelines document provides suggestions regarding the usage of this container.
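The setSpec syntax above can be checked with a short regular expression. The following Python sketch is non-normative and rests on one assumption: that "URI unreserved characters" means letters, digits and the marks - _ . ! ~ * ' ( ). The function name is_valid_setspec is hypothetical.

```python
import re

# Assumption: unreserved characters are letters, digits and - _ . ! ~ * ' ( )
_ELEMENT = r"[A-Za-z0-9\-_.!~*'()]+"

# One or more elements separated by single colons.
SETSPEC_RE = re.compile(rf"{_ELEMENT}(?::{_ELEMENT})*")


def is_valid_setspec(value):
    """Return True if value looks like a colon-separated setSpec path."""
    return SETSPEC_RE.fullmatch(value) is not None


for candidate in ["institution", "institution:florida", "institution::florida", "a b"]:
    print(candidate, is_valid_setspec(candidate))
```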
The following is an example of a possible set hierarchy in a repository:
- Institutions
  - Oceanside University of Nebraska
  - Valley View University of Florida
- Subjects
  - Existential Kenesiology
  - Quantum Psychology
The following table shows a possible representation of the above set
hierarchy by means of setName and respective setSpec
values.
| setName | setSpec |
| Institutions | institution |
| Oceanside University of Nebraska | institution:nebraska |
| Valley View University of Florida | institution:florida |
| Subjects | subject |
| Existential Kenesiology | subject:kenesiology |
| Quantum Psychology | subject:quantum |
An item may be organized in one set, several sets, or no sets at all.
In the example above, it is conceivable that an individual item is organized in
both subject and institution:florida. A harvester
should not assume that harvesting every set in a repository will retrieve
metadata from all items in the repository. Items may also be assigned to
interior nodes in the set hierarchy.
The actual meaning of a set or of the arrangement of sets in a repository is
not defined in the protocol. It is expected that individual communities may
formulate well-defined set configurations with perhaps a controlled vocabulary
for setNames and setSpecs, and may even develop
mechanisms for exposing these to harvesters. For example, a group of cooperating
e-print archives in a specific discipline may agree on sets that arrange
metadata in their repositories based on a controlled subject classification.
A repository's set hierarchy is represented in the protocol via
setSpecs. ListSets returns a
list indicating the configuration of sets in a repository. Each member of this
list must include a setSpec and a setName and
may include a setDescription. ListRecords and ListIdentifiers requests may include
an optional set argument, the value of which is a
setSpec, to specify the target set for selective harvesting. In the
previous example of a set hierarchy, the
setSpec institution:nebraska could be used in a
request to return only those records that are disseminated from items organized
in the set represented by this setSpec. Five issues should be noted
here:
- If a repository supports sets then it must include set membership information in response to ListIdentifiers, ListRecords and GetRecord requests. The list of setSpec should include only the minimum number of setSpec required to specify the set membership. Using the previous example of a set hierarchy, the header for an item organized in set institution:florida should not include setSpec institution since that is implied by the setSpec institution:florida.
- An item may be organized in more than one set; meaning that different setSpec arguments may return the same record(s).
- An item need not be organized in any set; meaning that an exhaustive repetition of ListRecords requests with all possible setSpecs is not guaranteed to return all records in the repository. The only guaranteed methods of harvesting all records or headers are ListRecords or ListIdentifiers requests with no setSpec argument.
- When a setSpec is used as an argument, the response must include records or headers from all items in the set specified by the setSpec, and all records or headers from items in sets that are descendant from the specified set. Using the previous example of a set hierarchy, a setSpec of institution to the ListRecords request will return all records from metadata organized within the set with a setSpec value equal to institution and within the descendent sets with setSpec values equal to institution:florida and institution:nebraska.
- The set hierarchy of a repository may include sets that are empty.
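Two of the issues above, minimal set membership and inclusion of descendant sets, can be expressed as simple string operations on setSpecs. The following Python sketch is non-normative; both function names are hypothetical, and the setSpec values come from the example hierarchy above.

```python
def in_requested_set(item_setspecs, requested):
    """True if an item belongs to the requested set or to one of its descendants."""
    return any(
        spec == requested or spec.startswith(requested + ":")
        for spec in item_setspecs
    )


def minimal_setspecs(setspecs):
    """Drop every setSpec that is implied by a more specific one in the same list."""
    unique = set(setspecs)
    return sorted(
        spec for spec in unique
        if not any(other.startswith(spec + ":") for other in unique)
    )


# An item in institution:florida is returned for a request on institution.
print(in_requested_set(["institution:florida"], "institution"))
# The ancestor "institution" is implied and can be left out of the header.
print(minimal_setspecs(["institution", "institution:florida", "subject"]))
```

Appending the colon before the prefix test keeps an unrelated set whose setSpec merely starts with the same letters from matching.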
2.7 Selective Harvesting
Selective harvesting allows harvesters to limit harvest requests to portions of the metadata available from a repository. The OAI-PMH supports selective harvesting with two types of harvesting criteria that may be combined in an OAI-PMH request: datestamps and set membership.
2.7.1 Selective Harvesting and Datestamps
Harvesters may use datestamps to harvest only those records
that were created, deleted, or modified within a specified date range. To
specify datestamp-based selective harvesting, datestamps are included as values
of the optional arguments, from and until, in
the ListRecords and ListIdentifiers requests. Harvesting is
restricted to the range specified by the from and
until arguments, extending back to the earliest datestamp if
from is omitted, and forward to the most recent datestamp if
until is omitted. Range limits are inclusive:
from specifies a bound that must be interpreted as
"greater than or equal to", until specifies a bound that
must be interpreted as "less than or equal to". Therefore, the
from argument must be less than or equal to the
until argument. Otherwise, a repository must issue a
badArgument error.
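The inclusive range rule can be sketched as follows. This non-normative Python example assumes that all datestamps and both bounds use the same granularity, so that plain string comparison of the ISO8601 values matches chronological order; the exception class BadArgument and the function select_by_datestamp are hypothetical names.

```python
class BadArgument(ValueError):
    """Stands in for the badArgument error condition (hypothetical class)."""


def select_by_datestamp(headers, from_=None, until=None):
    """Return the headers whose datestamp lies in the inclusive range.

    Assumes every datestamp and bound has the same granularity, so comparing
    the strings directly gives the chronological order.
    """
    if from_ is not None and until is not None and from_ > until:
        raise BadArgument("from must be less than or equal to until")
    return [
        h for h in headers
        if (from_ is None or h["datestamp"] >= from_)
        and (until is None or h["datestamp"] <= until)
    ]


headers = [
    {"identifier": "a", "datestamp": "2002-01-15"},
    {"identifier": "b", "datestamp": "2002-02-28"},
    {"identifier": "c", "datestamp": "2002-06-14"},
]
print(select_by_datestamp(headers, from_="2002-02-28"))
```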
Repositories must support selective harvesting with the
from and until arguments expressed at day granularity.
Optional support for seconds granularity is indicated in the response to
the Identify request. The value of
datestamps in both requests and responses must comply with the
specifications for UTCdatetime in this document. A
repository must update the datestamp of a record if a change occurs, the
result of which would be a change to the metadata
part of the XML-encoding of the record. Such changes include, but are not
limited to, changes to the metadata of the record, changes to the metadata
format of the record, introduction of a new metadata format, termination of
support for a metadata format, etc.
Datestamp ranges for selective harvesting are expressed in the
from and until arguments that may be submitted
in the ListRecords and ListIdentifiers requests. Repositories
must use the following rules to create a ListRecords response matching the specified
datestamp range according to the type of change that occurred within the
repository. The response to a ListIdentifiers request follows the same
rules but is abbreviated to include only headers rather than records.
- modification - the response must include records, corresponding to the metadataPrefix argument, which have changed within the bounds of the from and until arguments.
- creation - the response must include records, corresponding to the metadataPrefix argument, that have become available from the repository within the bounds of the from and until arguments.
- deletion - depending on the level at which a repository keeps track of deleted records, the response may include headers of records, corresponding to the metadataPrefix argument, which have been withdrawn from the repository within the bounds of the from and until arguments. Deleted status is indicated via the status attribute of the header element and no metadata is included.
Every header returned by the GetRecord, ListRecords or ListIdentifiers requests contains a
datestamp, which reflects the most recent date and time of the creation,
modification, or deletion according to the rules defined above.
2.7.2 Selective Harvesting and Sets
Harvesters may specify set membership as a criterion for
selective harvesting. To specify set-based selective harvesting, a setSpec is included as the value of the
optional set argument to the ListRecords and ListIdentifiers requests, thereby
specifying selective harvesting of records from items within the respective
set.
When a setSpec is used as an argument, the response must
include:
- the records corresponding to the metadataPrefix argument, or headers thereof in the case of deleted records, available from those items in the set specified by the setSpec;
- the records corresponding to the metadataPrefix argument, or headers thereof in the case of deleted records, available from those items in sets that are descendant from the specified set.
3. Protocol Features
3.1 HTTP Embedding of OAI-PMH requests
OAI-PMH requests are expressed as HTTP requests. A typical implementation uses a standard Web server that is configured to dispatch OAI-PMH requests to the software handling these requests. The remainder of this section describes the aspects of the protocol that are specific to the HTTP embedding.
3.1.1 HTTP Request Format
OAI-PMH requests must be submitted using either the HTTP
GET or POST methods. POST has the
advantage of imposing no limitations on the length of arguments. Repositories
must support both the GET and POST methods.
There is a single base URL for all requests. The base URL specifies the Internet
host and port, and optionally a path, of an HTTP server acting as a
repository. Repositories expose their base URL as the value of the
baseURL element in the Identify response. Note that the composition
of any path is determined by the configuration of the repository's HTTP
server.
In addition to the base URL, all requests consist of a list of keyword
arguments, which take the form of key=value pairs. Arguments
may appear in any order and multiple arguments must be separated by
ampersands [&]. Each OAI-PMH request must have at least
one key=value pair that specifies the OAI-PMH request issued by the
harvester:
- key is the string 'verb';
- value is one of the defined OAI-PMH requests.
The number and nature of additional key=value pairs depends on
the arguments for the individual request.
3.1.1.1 Encoding an OAI-PMH request in a URL for an HTTP GET
URLs for GET requests have keyword arguments appended to the
base URL, separated from it by a question mark [?]. For example,
the URL of a GetRecord request to a
repository with base URL that is http://an.oa.org/OAI-script might
be:
http://an.oa.org/OAI-script?
verb=GetRecord&identifier=oai:arXiv.org:hep-th/9901001&metadataPrefix=oai_dc
However, since special characters in URIs must be encoded, the correct form of the above
GET request URL is:
http://an.oa.org/OAI-script?
verb=GetRecord&identifier=oai%3AarXiv.org%3Ahep-th%2F9901001&metadataPrefix=oai_dc
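The encoded GET URL above can be produced with a standard URL library. The following Python sketch is non-normative; build_get_url is a hypothetical helper, and quote_via=quote is chosen so that a space would be written as %20 rather than as a plus sign.

```python
from urllib.parse import quote, urlencode


def build_get_url(base_url, verb, **arguments):
    """Append the verb and the other key=value pairs to the base URL (hypothetical helper)."""
    # urlencode passes safe="" to quote here, so ":" and "/" in values are escaped.
    query = urlencode({"verb": verb, **arguments}, quote_via=quote)
    return f"{base_url}?{query}"


url = build_get_url(
    "http://an.oa.org/OAI-script",
    "GetRecord",
    identifier="oai:arXiv.org:hep-th/9901001",
    metadataPrefix="oai_dc",
)
print(url)
```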
3.1.1.2 Encoding an OAI-PMH request in an HTTP POST
Keyword arguments are carried in the message body of the HTTP
POST. The Content-Type of the request must be
application/x-www-form-urlencoded. For example, submitting the same
request as above using the POST method would use just the base URL
as the URL, with the format of the POST being:
POST http://an.oa.org/OAI-script HTTP/1.0
Content-Length: 82
Content-Type: application/x-www-form-urlencoded

verb=GetRecord&identifier=oai%3AarXiv.org%3Ahep-th%2F9901001&metadataPrefix=oai_dc
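The same request can be prepared as a POST with the keyword arguments in the message body. This non-normative Python sketch only builds the request object and does not send it; build_post_request is a hypothetical helper.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_post_request(base_url, verb, **arguments):
    """Prepare, without sending, a POST carrying the arguments in its body."""
    body = urlencode({"verb": verb, **arguments}, quote_via=quote).encode("ascii")
    return Request(
        base_url,
        data=body,  # supplying data makes urllib use the POST method
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Content-Length": str(len(body)),
        },
    )


req = build_post_request(
    "http://an.oa.org/OAI-script",
    "GetRecord",
    identifier="oai:arXiv.org:hep-th/9901001",
    metadataPrefix="oai_dc",
)
print(req.get_method(), len(req.data))
```

For the example arguments the body is 82 bytes long, which agrees with the Content-Length shown above.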
3.1.1.3 Encoding of special characters in keyword arguments of OAI-PMH requests
The syntax rules for URIs restrict a few characters to special roles in certain contexts, and require that if these characters are used in any other way that they must be written as an escape sequence, i.e. a percent sign followed by the character code in hexadecimal. The reserved characters include:
| Character | URI Role | Escape Sequence |
| / | Path Component Separator | %2F |
| ? | Query Component Separator | %3F |
| # | Fragment Identifier | %23 |
| = | Name/Value Separator | %3D |
| & | Argument Separator in Query Component | %26 |
| : | Host Port Separator | %3A |
| ; | Authority Namespace Separator | %3B |
| (space) | Space Character | %20 |
| % | Escape Indicator | %25 |
| + | Escaped Space | %2B |
As a result, these characters must be represented by their respective
escape sequence if their use does not correspond to their established URI
role. In case of the OAI-PMH, this means that the reserved characters
must be encoded when they appear in the value part of the
key=value pairs of the request. This applies for both the
GET and POST encoding of the OAI-PMH requests.
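The escape sequences in the table can be reproduced with a standard percent-encoding routine. In this non-normative Python sketch, escape_value is a hypothetical helper; passing safe="" asks the library to escape every reserved character, including the slash that it would otherwise leave alone.

```python
from urllib.parse import quote, unquote

# The ten characters listed in the table above, in the same order.
RESERVED = "/?#=&:; %+"


def escape_value(value):
    """Percent-encode the value part of a key=value pair (hypothetical helper)."""
    return quote(value, safe="")


for character in RESERVED:
    print(repr(character), escape_value(character))

# Decoding restores the original value.
print(unquote(escape_value("oai:arXiv.org:hep-th/9901001")))
```

Note that quote also escapes characters beyond those in the table whenever they fall outside the unreserved set.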
3.1.2 HTTP Response Format
Responses to requests are formatted as HTTP responses, with appropriate HTTP header fields.
3.1.2.1 Content-Type
The Content-Type returned for all OAI-PMH requests must
be text/xml.
3.1.2.2 Status-Code
OAI-PMH errors are distinguished from HTTP
Status-Codes. Since OAI-PMH uses HTTP as a transport layer, servers
implementing OAI-PMH must conform to HTTP status code definitions and
report relevant HTTP transport layer status via those Status-Codes.
OAI-PMH repositories may employ HTTP Status-Codes in
addition to "200 OK". For instance, the following
Status-Codes may be useful for load balancing in OAI
repositories:
- 302 - Allows the repository to temporarily redirect an OAI-PMH request to another repository. The URI of the temporary repository should be given by the Location field in the HTTP response.
- 503 - Service unavailable, a Retry-After period is specified. Harvesters should wait this period before attempting another OAI-PMH request.
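A harvester might translate these Status-Codes into its next step along the following lines. This Python sketch is non-normative and makes several assumptions: the response headers arrive as a plain dict with the exact field names shown, only the delay-in-seconds form of Retry-After is handled, and the fallback wait of 60 seconds is an arbitrary choice. The function name next_step is hypothetical.

```python
def next_step(status, headers, default_wait=60):
    """Map an HTTP status to a (action, detail) pair for the harvester.

    Assumptions: headers is a plain dict with exact-case field names, and
    Retry-After, when present, is a number of seconds.
    """
    if status == 200:
        return ("parse", None)
    if status == 302:
        # Follow the temporary redirect given in the Location field.
        return ("redirect", headers.get("Location"))
    if status == 503:
        retry_after = headers.get("Retry-After", "")
        wait = int(retry_after) if retry_after.isdigit() else default_wait
        return ("wait", wait)
    return ("fail", None)


print(next_step(503, {"Retry-After": "120"}))
print(next_step(302, {"Location": "http://an.oa.org/OAI-script"}))
```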
3.1.3 Response Compression
Response compression is optional in OAI-PMH. Compression of responses to OAI-PMH requests is handled at the level of HTTP, with the following restrictions:
- Harvesters may include an Accept-Encoding header in their OAI-PMH requests to specify response compression preferences.
- Harvesters that do not include an Accept-Encoding header in their requests will always receive uncompressed responses.
- When a request includes an Accept-Encoding header the list of encodings must include the identity (no compression) encoding (with a non-zero qvalue).
- Repositories must support the HTTP identity encoding.
- Repositories should express the encodings they support in addition to identity by including compression elements in the Identify response.
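The requirement that identity always appears with a non-zero qvalue can be enforced when the header value is assembled. In this non-normative Python sketch the function name accept_encoding_value, the default preference for gzip and the qvalue of 0.5 are all illustrative assumptions.

```python
def accept_encoding_value(preferred=("gzip",), identity_q="0.5"):
    """Build an Accept-Encoding value that always lists identity with a non-zero qvalue."""
    if float(identity_q) <= 0:
        raise ValueError("identity must have a non-zero qvalue")
    # Drop any caller-supplied identity entry so it is listed exactly once.
    encodings = [e for e in preferred if e != "identity"]
    encodings.append(f"identity;q={identity_q}")
    return ", ".join(encodings)


print(accept_encoding_value())
print(accept_encoding_value(preferred=()))
```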
3.2 XML Response Format
All responses to OAI-PMH requests must be well-formed XML instance documents. Encoding of the XML must use the UTF-8 representation of Unicode. Character references, rather than entity references, must be used. Character references allow XML responses to be treated as stand-alone documents that can be manipulated without dependency on entity declarations external to the document.
The XML data for all responses to OAI-PMH requests must validate against the XML Schema shown at the end of this section. As can be seen from that schema, responses to OAI-PMH requests have the following common markup:
- The first tag output is an XML declaration where the version is always 1.0 and the encoding is always UTF-8, e.g.: <?xml version="1.0" encoding="UTF-8" ?>
- The remaining content is enclosed in a root element with the name OAI-PMH. This element must have three attributes that define the XML namespaces used in the remainder of the response and the location of the validating schema:
  - xmlns -- the value of which must be the namespace URI of the OAI-PMH (http://www.openarchives.org/OAI/2.0/).
  - xmlns:xsi -- the value of which must be the namespace URI for XML schema (http://www.w3.org/2001/XMLSchema-instance).
  - xsi:schemaLocation -- is a pair, the first part of which is the namespace URI (as defined by the XML namespace specification) of the OAI-PMH (http://www.openarchives.org/OAI/2.0/), and the second part is the URL of the XML schema for validation of the response (http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd).
- For all responses, the first two children of the root element are:
  - responseDate -- a UTCdatetime indicating the time and date that the response was sent. This must be expressed in UTC.
  - request -- indicating the protocol request that generated this response. The rules for generating the request element are as follows:
    - The content of the request element must always be the base URL of the protocol request;
    - The only valid attributes for the request element are the keys of the key=value pairs of protocol request. The attribute values must be the corresponding values of those key=value pairs;
    - In cases where the request that generated this response did not result in an error or exception condition, the attributes and attribute values of the request element must match the key=value pairs of the protocol request;
    - In cases where the request that generated this response resulted in a badVerb or badArgument error condition, the repository must return the base URL of the protocol request only. Attributes must not be provided in these cases.
- The third child of the root element is either:
  - an error element that must be used in case of an error or exception condition;
  - an element with the same name as the verb of the respective OAI-PMH request.
An example of a successful reply to the GetRecord request shown
above is of the form:
<?xml version="1.0" encoding="UTF-8" ?> |
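The common markup described above can be checked by a harvester before it looks at the verb-specific content. The following Python sketch is non-normative. Its sample response is hand-written and abbreviated for this illustration: the responseDate value is a placeholder and the GetRecord element is left empty, so the sample would not validate against the schema. The function name summarize_response is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written, abbreviated sample; the responseDate is a placeholder value.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8" ?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
                             http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-06-14T12:00:00Z</responseDate>
  <request verb="GetRecord" identifier="oai:arXiv.org:hep-th/9901001"
           metadataPrefix="oai_dc">http://an.oa.org/OAI-script</request>
  <GetRecord/>
</OAI-PMH>
"""


def summarize_response(xml_bytes):
    """Extract the parts common to every response (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    if root.tag != OAI_NS + "OAI-PMH":
        raise ValueError("root element is not OAI-PMH")
    response_date, request, payload = list(root)[:3]
    return {
        "responseDate": response_date.text,
        "baseURL": request.text,
        "requestArgs": dict(request.attrib),
        # The third child is either an error element or one named after the verb.
        "isError": payload.tag == OAI_NS + "error",
        "payload": payload.tag[len(OAI_NS):],
    }


print(summarize_response(SAMPLE_RESPONSE.encode("utf-8")))
```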
3.2.1 XML Schema for Validating Responses to OAI-PMH Requests
<schema targetNamespace="http://www.openarchives.org/OAI/2.0/" |
| This Schema is available at http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd |
3.3 UTCdatetime
Dates and times are uniformly encoded using ISO8601 and are expressed in UTC
throughout the protocol. When time is included, the special UTC designator
("Z") must be used. UTC is implied for dates although no
timezone designator is specified. For example, 1957-03-20T20:30:00Z
is UTC 8:30:00 PM on March 20th 1957. UTCdatetime is used in both protocol
requests and protocol replies, in the way described in the following
sections.
3.3.1 UTCdatetime in Protocol Requests
Datestamps used as values of the optional
arguments from and until in the ListIdentifiers and ListRecords requests are encoded using ISO8601 and are expressed in UTC.
These arguments are used to specify datestamp-based
selective harvesting. These arguments support the "Complete date" and the
"Complete date plus hours, minutes and seconds" granularities defined in
ISO8601. The legitimate formats are YYYY-MM-DD and
YYYY-MM-DDThh:mm:ssZ. Both arguments must have the same
granularity. All repositories must support YYYY-MM-DD. A
repository that supports YYYY-MM-DDThh:mm:ssZ should
indicate so in the Identify response. A
request by a harvester with finer granularity than that supported by a
repository must produce an error.
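The rules above for from and until can be checked mechanically. The following is a minimal sketch, not part of the protocol: the names granularity_of and check_range are assumptions made for illustration, and a ValueError stands in for the error a repository would report (the ListRecords examples in this document show badArgument for an unsupported granularity).

```python
import re
from datetime import datetime

# The two granularities the protocol allows for from/until.
DAY = "YYYY-MM-DD"
SECOND = "YYYY-MM-DDThh:mm:ssZ"

# For each granularity: a strict shape check plus a strptime format.
_FORMATS = {
    DAY: (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),
    SECOND: (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"),
             "%Y-%m-%dT%H:%M:%SZ"),
}


def granularity_of(value):
    """Return DAY or SECOND for a well-formed UTCdatetime, else raise ValueError."""
    for name, (pattern, fmt) in _FORMATS.items():
        if pattern.fullmatch(value):
            # strptime rejects impossible dates such as 2002-02-30.
            datetime.strptime(value, fmt)
            return name
    raise ValueError(f"not a UTCdatetime: {value!r}")


def check_range(from_=None, until=None, supported=DAY):
    """Validate a from/until pair against the finest granularity a repository supports.

    Returns the granularity used (or None if neither argument is given).
    """
    used = [granularity_of(v) for v in (from_, until) if v is not None]
    if len(set(used)) > 1:
        raise ValueError("from and until must have the same granularity")
    if SECOND in used and supported == DAY:
        raise ValueError("repository supports day granularity only")
    return used[0] if used else None
```

A repository that advertises YYYY-MM-DD only would call check_range with the default supported value, so a request such as from=2002-06-01T02:00:00Z is rejected before any records are selected.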
3.3.2 UTCdatetime in Protocol Responses
Datestamps appear in the headers of records that are
returned in response to ListIdentifiers, GetRecord and ListRecords requests. These datestamps are encoded using ISO8601 and are expressed in UTC;
they must be expressed in the finest granularity supported by the
repository. The value of the datestamp must correspond to the rules for datestamp-based selective harvesting.
Each protocol response includes a responseDate element, which
must be the time and date of the response in UTC. This is encoded using
the "Complete date plus hours, minutes, and seconds" variant of ISO8601. This format is
YYYY-MM-DDThh:mm:ssZ.
A resumptionToken in a protocol reply
may include an optional argument expirationDate,
which is expressed in UTC. This is encoded using the "Complete date plus hours,
minutes, and seconds" variant of ISO8601. This format is
YYYY-MM-DDThh:mm:ssZ.
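As an illustration of producing these values, the sketch below renders a timezone-aware time in either of the two formats. The function name format_utcdatetime and the sample time zone are assumptions, not part of the protocol.

```python
from datetime import datetime, timedelta, timezone


def format_utcdatetime(moment, granularity="YYYY-MM-DDThh:mm:ssZ"):
    """Render a timezone-aware datetime as a UTCdatetime string."""
    if moment.tzinfo is None:
        # A naive datetime has no defined relation to UTC, so refuse it.
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    if granularity == "YYYY-MM-DD":
        return utc.strftime("%Y-%m-%d")
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


# Usage: 15:30 local time at UTC-05:00 is 20:30 UTC.
local = datetime(1957, 3, 20, 15, 30, tzinfo=timezone(timedelta(hours=-5)))
stamp = format_utcdatetime(local)

# A responseDate would be produced from the current time in the same way.
response_date = format_utcdatetime(datetime.now(timezone.utc))
```

Converting to UTC before formatting matters: appending "Z" to a local time would silently shift every datestamp by the local offset.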
3.4 metadataPrefix and Metadata Schema
OAI-PMH supports the dissemination of records in multiple metadata formats
from a repository. The ListMetadataFormats request returns the
list of all metadata formats available from a repository, each of which has the
following properties:
- The metadataPrefix - a string to specify the metadata format in OAI-PMH requests issued to the repository. metadataPrefix consists of any valid URI unreserved characters. metadataPrefix arguments are used in ListRecords, ListIdentifiers, and GetRecord requests to retrieve records, or the headers of records that include metadata in the format specified by the metadataPrefix;
- The metadata schema URL - the URL of an XML schema to test validity of metadata expressed according to the format;
- The XML namespace URI that is a global identifier of the metadata format.
The metadata in each record returned by ListRecords and GetRecord must comply with the conventions
of the XML
namespace specification. This means that the root element of the metadata
part must contain an xmlns attribute, the value of which is
the XML namespace URI of the metadata format. The root element must also
contain an xsi:schemaLocation attribute that has a value that
includes the URL of the XML schema for validation of the metadata. This URL
must match the URL of the metadata schema for the
metadataPrefix included as an argument to the ListRecords or GetRecord request (the mapping from
metadataPrefix to metadata schema is defined by the repository's
response to the ListMetadataFormats request).
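A harvester can verify these two requirements on the root element of a metadata part. The following is a sketch under stated assumptions: check_metadata_root is a hypothetical helper, and because Python's ElementTree folds namespace declarations into element names, the check tests that the root element is in the expected namespace rather than inspecting the xmlns attribute text itself.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def check_metadata_root(xml_text, namespace_uri, schema_url):
    """Return a list of problems found on the root element (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    # ElementTree exposes a namespaced element as "{uri}localname".
    if not root.tag.startswith("{" + namespace_uri + "}"):
        problems.append("root element is not in namespace " + namespace_uri)
    # xsi:schemaLocation is a whitespace-separated list of namespace/URL pairs.
    locations = root.get("{" + XSI + "}schemaLocation", "").split()
    if schema_url not in locations:
        problems.append("xsi:schemaLocation does not include " + schema_url)
    return problems


# Usage with the reserved oai_dc format (the element content is omitted here).
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
OAI_DC_XSD = "http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
SAMPLE = (
    '<oai_dc:dc xmlns:oai_dc="' + OAI_DC_NS + '" '
    'xmlns:xsi="' + XSI + '" '
    'xsi:schemaLocation="' + OAI_DC_NS + " " + OAI_DC_XSD + '"/>'
)
problems = check_metadata_root(SAMPLE, OAI_DC_NS, OAI_DC_XSD)
```

The namespace URI and schema URL passed in would normally come from the repository's ListMetadataFormats response for the metadataPrefix that was requested.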
For purposes of interoperability, repositories must disseminate Dublin Core,
without any qualification.
Therefore, the protocol reserves the metadataPrefix
`oai_dc', and the URL of a metadata schema for unqualified Dublin
Core, which is http://www.openarchives.org/OAI/2.0/oai_dc.xsd.
The corresponding XML
namespace URI is http://www.openarchives.org/OAI/2.0/oai_dc/.
The metadataPrefix `all' is reserved for future
use. Implementations should not use this metadataPrefix.
Communities should adopt guidelines for sharing of
metadataPrefixes, metadata schemas and XML namespace URIs of
metadata formats. Such guidelines are outside of the scope of the OAI-PMH. The
accompanying Implementation
Guidelines document provides some sample XML Schema and instance documents
for common metadata formats such as MARC and RFC 1807.
3.5 Flow Control
A number of OAI-PMH requests return a list of discrete entities:
ListRecords returns a list of records, ListIdentifiers returns a list of headers, and ListSets
returns a list of sets. Collectively these requests are
called list requests. In some cases, these lists may be large and it may
be practical to partition them among a series of requests and responses. This
partitioning is accomplished as follows:
- A repository replies to a request with an incomplete list and a resumptionToken;
- In order to make the response a complete list, the harvester will need to issue one or more requests with resumptionTokens as arguments. The complete list then consists of the concatenation of the incomplete lists from the sequence of requests, known as a list request sequence.
Details of flow control and the resumptionToken are as
follows:
- The only defined use of resumptionToken is as follows:
  - a repository must include a resumptionToken element as part of each response that includes an incomplete list;
  - in order to retrieve the next portion of the complete list, the next request must use the value of that resumptionToken element as the value of the resumptionToken argument of the request;
  - the response containing the incomplete list that completes the list must include an empty resumptionToken element;
  All other uses of resumptionToken by a harvester are illegal and must return an error.
- In all cases when a resumptionToken is issued, the incomplete list must consist of complete entities; e.g., all individual records returned in an incomplete record list from a ListRecords request must be intact.
- The format of the resumptionToken is not defined by the OAI-PMH and should be considered opaque by the harvester.
- The protocol does not define the semantics of incompleteness. Therefore, a harvester should not assume that the members in an incomplete list conform to some selection criteria (e.g., date ordering).
- Before including a resumptionToken in the URL of a subsequent request, a harvester must encode any special characters in it.
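The last point, encoding special characters in a resumptionToken, can be sketched as follows. The helper name build_request_url and the sample token are assumptions for illustration; since the token is opaque, the harvester encodes it exactly as received without interpreting it.

```python
from urllib.parse import quote, urlencode


def build_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH GET URL with every argument value percent-encoded."""
    # quote_via=quote encodes a space as %20 (not "+"); urlencode's default
    # safe="" means "/" and ":" are percent-encoded as well.
    query = urlencode({"verb": verb, **arguments}, quote_via=quote)
    return base_url + "?" + query


# Usage: a hypothetical token containing characters that are special in URLs.
url = build_request_url(
    "http://an.oa.org/OAI-script",
    "ListIdentifiers",
    resumptionToken="a/b c&d=e",
)
```

Without the encoding step, the "&" and "=" inside this token would be read as an additional argument, and the repository would have grounds to reject the request.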
The following optional attributes may be included as part of
the resumptionToken element along with the
resumptionToken itself:
- expirationDate -- a UTCdatetime indicating when the resumptionToken ceases to be valid.
- completeListSize -- an integer indicating the cardinality of the complete list (i.e., the sum of the cardinalities of the incomplete lists). Because there may be changes in a repository during a list request sequence, as described under Idempotency of resumptionTokens, the value of completeListSize may be only an estimate of the actual cardinality of the complete list and may be revised during the list request sequence.
- cursor -- a count of the number of elements of the complete list thus far returned (i.e. cursor starts at 0).
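A harvester might read these optional attributes as in the sketch below. The function name and the keys of the returned dictionary are assumptions; the sample element uses the values from the ListIdentifiers example in section 4.3.

```python
import xml.etree.ElementTree as ET


def read_resumption_token(element):
    """Extract the token value and its optional attributes from a resumptionToken element."""
    size = element.get("completeListSize")
    cursor = element.get("cursor")
    token = (element.text or "").strip()
    return {
        # An empty element marks the end of the list request sequence.
        "token": token or None,
        "expiration_date": element.get("expirationDate"),
        # completeListSize may be an estimate, so treat it as advisory only.
        "complete_list_size": int(size) if size is not None else None,
        "cursor": int(cursor) if cursor is not None else None,
    }


SAMPLE_TOKEN = ET.fromstring(
    '<resumptionToken xmlns="http://www.openarchives.org/OAI/2.0/" '
    'expirationDate="2002-06-01T23:20:00Z" completeListSize="6" cursor="0">'
    "xxx45abttyz</resumptionToken>"
)
token_info = read_resumption_token(SAMPLE_TOKEN)
```

All three attributes are optional, so each is returned as None when absent rather than raising an error.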
The following example is a series of ListRecords requests where the complete list consists of 175 records and the repository only returns 100 records per response.
- The harvester issues a ListRecords request.
- The repository responds with an incomplete list of 100 records. The repository marks this list as incomplete by including in the response a non-empty resumptionToken element, with two attributes: a completeListSize of 175, and a cursor of 0.
- The harvester issues a subsequent ListRecords request that includes the resumptionToken that it received in the previous response.
- The repository responds with an incomplete list of 75 records. The repository marks this list as the final incomplete list by including in the response an empty resumptionToken element with two attributes: a completeListSize of 175, and a cursor of 100.
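The exchange above can be sketched as a harvester loop. This is an illustration under stated assumptions, not a reference implementation: harvest and make_stub_repository are hypothetical names, the stub stands in for a real repository (it omits the responseDate and request elements), and the stub's choice of a numeric offset as token is arbitrary, since the harvester treats the token as opaque.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def harvest(fetch, verb, **arguments):
    """Yield every entity of a list request sequence.

    `fetch` is any callable that maps a dict of request arguments to the
    XML text of the response (for example, a function that issues HTTP GET).
    """
    params = {"verb": verb, **arguments}
    while True:
        root = ET.fromstring(fetch(params))
        container = root.find(f"{{{OAI_NS}}}{verb}")
        if container is None:
            raise RuntimeError(f"response has no {verb} element")
        token = None
        for child in container:
            if child.tag == f"{{{OAI_NS}}}resumptionToken":
                token = (child.text or "").strip()
            else:
                yield child
        if not token:
            return  # no token, or an empty one: the list is complete
        # A resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": verb, "resumptionToken": token}


def make_stub_repository(total=175, page_size=100):
    """Return a fake `fetch` that serves `total` records in pages."""
    def fetch(params):
        start = int(params.get("resumptionToken", 0))
        end = min(start + page_size, total)
        records = "".join(
            f"<record><header><identifier>oai:example:{i}</identifier></header></record>"
            for i in range(start, end)
        )
        next_token = str(end) if end < total else ""
        token = (
            f'<resumptionToken completeListSize="{total}" cursor="{start}">'
            f"{next_token}</resumptionToken>"
        )
        return f'<OAI-PMH xmlns="{OAI_NS}"><ListRecords>{records}{token}</ListRecords></OAI-PMH>'
    return fetch


records = list(harvest(make_stub_repository(), "ListRecords", metadataPrefix="oai_dc"))
```

With the default stub, the loop issues two requests and collects 100 and then 75 records, mirroring the example.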
This flow control mechanism, in combination with HTTP transport layer facilities, provides some basic tools with which a repository can enforce an acceptable use policy for its harvesting interface. Communities implementing the OAI-PMH may need more extensive tools to enforce acceptable use policies for either the harvesting interface of their repositories or for the metadata harvested from those repositories. The enforcement of such additional policies is outside of the scope of the OAI-PMH.
3.5.1 Idempotency of resumptionTokens
Repositories that implement resumptionTokens must do so
in a manner that allows harvesters to resume a sequence of requests for
incomplete lists by re-issuing a list request with the most recent
resumptionToken. The purpose of this is to allow harvesters to
recover from network or other errors that would otherwise mean that the list
request sequence would have to be started again. A re-issue of a list request
with a resumptionToken occurs in two contexts:
- When there are no changes in the repository. There are no changes to the complete list returned by the list request sequence. In this case, the repository must return the same incomplete list when the most recent list request, i.e. the one with the most recent non-expired resumptionToken, is re-issued.
- When there are changes in the repository. There may be changes to the complete list returned by the list request sequence. These changes occur when the records disseminated in the list move in or out of the datestamp range of the request because of changes, modifications, or deletions in the repository. In this case, strict idempotency of the incomplete-list requests using resumptionToken values is not required. Instead, the incomplete list returned in response to a re-issued request must include all records with unchanged datestamps within the range of the initial list request. The incomplete list returned in response to a re-issued request may contain records with datestamps that either moved into or out of the range of the initial request. In cases where there are substantial changes to the repository, it may be appropriate for a repository to return a badResumptionToken error, signaling that the harvester should restart the list request sequence.
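From the harvester's side, this guarantee means a failed request can simply be re-issued with the same resumptionToken. The sketch below assumes a hypothetical fetch callable that raises OSError on network failure (in Python, urllib.error.URLError and ConnectionError are both OSError subclasses); FlakyFetch is a stand-in used only to exercise the retry.

```python
import time


def fetch_with_retry(fetch, params, attempts=3, delay_seconds=0.0):
    """Re-issue the same request after transient failures.

    The request arguments (including any resumptionToken) are sent unchanged
    on each attempt, so the sequence resumes where it stopped.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(params)
        except OSError as error:
            last_error = error
            time.sleep(delay_seconds)
    raise last_error


class FlakyFetch:
    """Hypothetical stand-in for a network call: fails twice, then answers."""

    def __init__(self):
        self.seen = []

    def __call__(self, params):
        self.seen.append(dict(params))
        if len(self.seen) < 3:
            raise ConnectionError("simulated network failure")
        return "<OAI-PMH/>"


flaky = FlakyFetch()
request = {"verb": "ListRecords", "resumptionToken": "xxx45abttyz"}
body = fetch_with_retry(flaky, request)
```

A badResumptionToken error is a different case: it is reported inside a successful HTTP response, and the appropriate reaction is to restart the list request sequence rather than to retry.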
3.6 Error and Exception Conditions
In the event of an error or exception condition, repositories must
indicate OAI-PMH errors, distinguished from HTTP
Status-Codes, by including one or more error
elements in the response. While one error element is sufficient to
indicate the presence of the error or exception condition, repositories
should report all errors or exceptions that arise from processing the
request. Each error element must have a code
attribute that must be from the following table; each error
element may also have a free text string value to provide information
about the error that is useful to a human reader. These strings are not defined
by the OAI-PMH.
| Error Codes | Description | Applicable Verbs |
| badArgument | The request includes illegal arguments, is missing required arguments, includes a repeated argument, or values for arguments have an illegal syntax. | all verbs |
| badResumptionToken | The value of the resumptionToken argument is invalid or expired. | ListIdentifiers, ListRecords, ListSets |
| badVerb | Value of the verb argument is not a legal OAI-PMH verb, the verb argument is missing, or the verb argument is repeated. | N/A |
| cannotDisseminateFormat | The metadata format identified by the value given for the metadataPrefix argument is not supported by the item or by the repository. | GetRecord, ListIdentifiers, ListRecords |
| idDoesNotExist | The value of the identifier argument is unknown or illegal in this repository. | GetRecord, ListMetadataFormats |
| noRecordsMatch | The combination of the values of the from, until, set and metadataPrefix arguments results in an empty list. | ListIdentifiers, ListRecords |
| noMetadataFormats | There are no metadata formats available for the specified item. | ListMetadataFormats |
| noSetHierarchy | The repository does not support sets. | ListSets, ListIdentifiers, ListRecords |
The following example demonstrates error handling in the case of an illegal verb argument. All request URLs shown from now on will be wrapped to make them more readable.
Request
http://arXiv.org/oai2?
verb=nastyVerb
Response
<?xml version="1.0" encoding="UTF-8"?> |
The following example demonstrates error handling in the case of a
ListSets request to a repository that does not handle sets.
Request
http://arXiv.org/oai2?
verb=ListSets
Response
<?xml version="1.0" encoding="UTF-8"?> |
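A harvester can detect these conditions by looking for error elements directly under the root element. The sketch below is illustrative only: extract_errors, raise_for_errors and OAIError are hypothetical names, and the sample response (including its free-text message and responseDate) is invented to resemble a reply to the nastyVerb request.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


class OAIError(Exception):
    """Raised when a response carries one or more error elements."""

    def __init__(self, errors):
        self.errors = errors  # list of (code, free-text message) pairs
        super().__init__("; ".join(f"{code}: {text}" for code, text in errors))


def extract_errors(xml_text):
    """Return every (code, message) pair reported in the response."""
    root = ET.fromstring(xml_text)
    return [(e.get("code"), (e.text or "").strip()) for e in root.findall(OAI + "error")]


def raise_for_errors(xml_text):
    """Raise OAIError if the response reports any error; otherwise return None."""
    errors = extract_errors(xml_text)
    if errors:
        raise OAIError(errors)


# Hypothetical response; for badVerb the request element carries no attributes.
SAMPLE_ERROR_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-05-01T09:18:29Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">Illegal OAI verb</error>
</OAI-PMH>"""

found = extract_errors(SAMPLE_ERROR_RESPONSE)
```

Because a repository should report all errors that arise, the helper collects a list rather than stopping at the first error element.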
4. Protocol Requests and Responses
This section lists the requests, or verbs, defined in the
OAI-PMH. The documentation for each request is organized as follows:
- A section title corresponding to the token used to specify the request as the required verb argument to an HTTP request.
- A brief summary of the meaning of the verb and notes on its usage.
- The list of additional arguments for the request. Arguments are of three types:
  - required, the argument must be included with the request (the verb argument is always required, as described in HTTP Request Format).
  - optional, the argument may be included with the request.
  - exclusive, the argument may be included with the request, but must be the only argument (in addition to the verb argument).
- Error and exception conditions specific to the protocol request.
- One or more example requests and corresponding responses, with explanatory notes if appropriate.
An XML Schema defines the format of valid replies to all OAI-PMH requests.
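The three argument types can be captured in a small table and checked before a request is processed. The sketch below is one possible arrangement, with the table transcribed from the Arguments lists in sections 4.1 to 4.6; ARGUMENTS and check_arguments are hypothetical names, and repeated arguments (also a badArgument condition) are not modelled because a dict cannot hold them.

```python
# verb -> (required arguments, optional arguments, exclusive argument or None)
ARGUMENTS = {
    "GetRecord": ({"identifier", "metadataPrefix"}, set(), None),
    "Identify": (set(), set(), None),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}, "resumptionToken"),
    "ListMetadataFormats": (set(), {"identifier"}, None),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}, "resumptionToken"),
    "ListSets": (set(), set(), "resumptionToken"),
}


def check_arguments(verb, arguments):
    """Return None for a legal argument set, else the applicable error code.

    `arguments` holds every request argument except `verb`.
    """
    if verb not in ARGUMENTS:
        return "badVerb"
    required, optional, exclusive = ARGUMENTS[verb]
    names = set(arguments)
    if exclusive is not None and exclusive in names:
        # An exclusive argument must be the only argument besides verb.
        return None if names == {exclusive} else "badArgument"
    if not required <= names:
        return "badArgument"  # a required argument is missing
    if names - required - optional:
        return "badArgument"  # an argument not defined for this verb
    return None
```

Note that this checks only which arguments are present; the syntax of their values (for example, the UTCdatetime format of from and until) needs a separate check.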
4.1 GetRecord
Summary and Usage Notes
This verb is used to retrieve an individual metadata record from a
repository. Required arguments specify the identifier of the item from which the
record is requested and the format of the metadata that should be included in
the record. Depending on the level at which a repository tracks deletions, a header with a "deleted" value for the
status attribute may be returned, in case the metadata
format specified by the metadataPrefix is no longer available from
the repository or from the specified item.
Arguments
- identifier: a required argument that specifies the unique identifier of the item in the repository from which the record must be disseminated.
- metadataPrefix: a required argument that specifies the metadataPrefix of the format that should be included in the metadata part of the returned record. A record should only be returned if the format specified by the metadataPrefix can be disseminated from the item identified by the value of the identifier argument. The metadata formats supported by a repository and for a particular record can be retrieved using the ListMetadataFormats request.
Error and Exception Conditions
- badArgument - The request includes illegal arguments or is missing required arguments.
- cannotDisseminateFormat - The value of the metadataPrefix argument is not supported by the item identified by the value of the identifier argument.
- idDoesNotExist - The value of the identifier argument is unknown or illegal in this repository.
Examples
Request
Request a record in the Dublin Core metadata format [URL shown without encoding to be more readable].
http://arXiv.org/oai2?
verb=GetRecord&identifier=oai:arXiv.org:cs/0112017&metadataPrefix=oai_dc
Response
<?xml version="1.0" encoding="UTF-8"?> |
Request
Request a record in the Dublin Core metadata format. The requested record,
however, can not be returned because the identifier does not exist. Therefore,
the response does not contain a record container. It does have an
error element with a code attribute that has the value
idDoesNotExist. [URL shown without encoding for better readability].
http://arXiv.org/oai2?
verb=GetRecord&identifier=oai:arXiv.org:quant-ph/02131001&metadataPrefix=oai_dc
Response
<?xml version="1.0" encoding="UTF-8"?> |
Request
Request a record in the oai_marc metadata format. However, the requested
metadata format can not be disseminated for this identifier. Therefore, the
response contains no record. It does contain an error element with
a code attribute that has the value
cannotDisseminateFormat. [URL shown without encoding for better readability].
http://arXiv.org/oai2?
verb=GetRecord&identifier=oai:arXiv.org:quant-ph/9901001&metadataPrefix=oai_marc
Response
<?xml version="1.0" encoding="UTF-8"?> |
4.2 Identify
Summary and Usage Notes
This verb is used to retrieve information about a repository. Some of the information returned is required as part of the OAI-PMH. Repositories may also employ the Identify verb to return additional descriptive information.
Arguments
None
Error and Exception Conditions
badArgument- The request includes illegal arguments.
Response Format
The response must include one instance of the following elements:
- repositoryName: a human readable name for the repository;
- baseURL: the base URL of the repository;
- protocolVersion: the version of the OAI-PMH supported by the repository;
- earliestDatestamp: a UTCdatetime that is the guaranteed lower limit of all datestamps recording changes, modifications, or deletions in the repository. A repository must not use datestamps lower than the one specified by the content of the earliestDatestamp element. earliestDatestamp must be expressed at the finest granularity supported by the repository.
- deletedRecord: the manner in which the repository supports the notion of deleted records. Legitimate values are no, transient and persistent, with meanings defined in the section on deletion.
- granularity: the finest harvesting granularity supported by the repository. The legitimate values are YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ with meanings as defined in ISO8601.
The response must include one or more instances of the following element:
adminEmail: the e-mail address of an administrator of the repository.
The response may include multiple instances of the following optional elements:
- compression: a compression encoding supported by the repository. The recommended values are those defined for the Content-Encoding header in Section 14.11 of RFC 2616 describing HTTP 1.1. A compression element should not be included for the identity encoding, which is implied.
- description: an extensible mechanism for communities to describe their repositories. For example, the description container could be used to include collection-level metadata in the response to the Identify request. Implementation Guidelines are available to give directions in this respect. Each description container must be accompanied by the URL of an XML schema describing the structure of the description container.
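The cardinality rules above (exactly one, one or more, zero or more) can be enforced when reading an Identify response. The following sketch is an assumption-laden illustration: parse_identify is a hypothetical helper, description containers are ignored, and every value in the sample response is invented.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
SINGLE = ("repositoryName", "baseURL", "protocolVersion",
          "earliestDatestamp", "deletedRecord", "granularity")


def _texts(parent, name):
    """Return the stripped text of every child element called `name`."""
    return [(e.text or "").strip() for e in parent.findall(OAI + name)]


def parse_identify(xml_text):
    """Return the Identify fields as a dict, enforcing the element counts."""
    identify = ET.fromstring(xml_text).find(OAI + "Identify")
    if identify is None:
        raise ValueError("no Identify element in response")
    info = {}
    for name in SINGLE:
        values = _texts(identify, name)
        if len(values) != 1:
            raise ValueError(f"expected exactly one {name}, found {len(values)}")
        info[name] = values[0]
    info["adminEmail"] = _texts(identify, "adminEmail")
    if not info["adminEmail"]:
        raise ValueError("at least one adminEmail is required")
    info["compression"] = _texts(identify, "compression")  # may be empty
    return info


# Hypothetical response; all values are made up for this example.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://an.oa.org/OAI-script</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@an.oa.org</adminEmail>
    <earliestDatestamp>1990-02-01T12:00:00Z</earliestDatestamp>
    <deletedRecord>transient</deletedRecord>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
    <compression>deflate</compression>
  </Identify>
</OAI-PMH>"""

identity = parse_identify(SAMPLE_IDENTIFY)
```

A harvester would typically consult the granularity value before choosing the format of its from and until arguments, and the deletedRecord value before deciding how to interpret missing records.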
Examples
Request
http://memory.loc.gov/cgi-bin/oai?
verb=Identify
Response
The example below of a response to the Identify request contains
three description containers:
- The oai-identifier container complies with an XML Schema, which is available at http://www.openarchives.org/OAI/2.0/oai-identifier.xsd. This schema, provided in the accompanying Implementation Guidelines document, is used by repositories that choose to comply with a specific format of unique identifiers for items. The format of that identifier is explained by means of comments in the oai-identifier.xsd XML Schema.
- The eprints container complies with an XML Schema, which is available at http://www.openarchives.org/OAI/1.1/eprints.xsd. This schema, provided in the accompanying Implementation Guidelines document, has been agreed upon by the OAI e-print community, and contains information specific to repositories in that community.
- The friends container complies with an XML Schema, which is available at http://www.openarchives.org/OAI/2.0/friends.xsd. This schema, provided in the accompanying Implementation Guidelines document, is used by repositories that want to point harvesters to other repositories, by listing their base URLs. Usage of the friends container is recommended; it may support harvesters in discovering the network-location of repositories.
<?xml version="1.0" encoding="UTF-8"?> |
4.3 ListIdentifiers
Summary and Usage Notes
This verb is an abbreviated form of ListRecords, retrieving only headers rather than records. Optional
arguments permit selective harvesting of headers based on
set membership and/or datestamp. Depending on the
repository's support for deletions, a returned header may have a status attribute of
"deleted" if a record matching the arguments specified in the request has been
deleted.
Arguments
- from: an optional argument with a UTCdatetime value, which specifies a lower bound for datestamp-based selective harvesting.
- until: an optional argument with a UTCdatetime value, which specifies an upper bound for datestamp-based selective harvesting.
- metadataPrefix: a required argument, which specifies that headers should be returned only if the metadata format matching the supplied metadataPrefix is available or, depending on the repository's support for deletions, has been deleted. The metadata formats supported by a repository and for a particular item can be retrieved using the ListMetadataFormats request.
- set: an optional argument with a setSpec value, which specifies set criteria for selective harvesting.
- resumptionToken: an exclusive argument with a value that is the flow control token returned by a previous ListIdentifiers request that issued an incomplete list.
Error and Exception Conditions
- badArgument - The request includes illegal arguments or is missing required arguments.
- badResumptionToken - The value of the resumptionToken argument is invalid or expired.
- cannotDisseminateFormat - The value of the metadataPrefix argument is not supported by the repository.
- noRecordsMatch - The combination of the values of the from, until, and set arguments results in an empty list.
- noSetHierarchy - The repository does not support sets.
Examples
Request
List the headers of records in the oldArXiv metadata format that have been added, modified or deleted since January 15, 1998 in the set physics:hep. [URL shown without encoding for better readability].
http://an.oa.org/OAI-script?
verb=ListIdentifiers&from=1998-01-15&metadataPrefix=oldArXiv&set=physics:hep
Response
A list of four headers is returned. One header has a deleted
status, indicating that a record in the metadata format specified by the
metadataPrefix is no longer available. In addition, a
resumptionToken (non-empty, value xxx45abttyz) has
been returned, indicating that the list of headers is incomplete and that
one or more subsequent requests will need to be issued to retrieve a
complete list. In the example, the resumptionToken comes
with all of the 3 optional attributes: expirationDate indicates
that the resumptionToken will become unusable after 11:20 PM UTC on
June 1st 2002; completeListSize indicates that the complete
list consists of 6 identifiers; the zero-value for cursor indicates
that no headers have been returned previous to this reply.
<?xml version="1.0" encoding="UTF-8"?> |
Request
Issue a subsequent request to the one issued above. The single
resumptionToken argument has the value returned in the previous
response. [URL shown without encoding for
better readability].
http://an.oa.org/OAI-script?
verb=ListIdentifiers&resumptionToken=xxx45abttyz
Response
Two more headers are returned. The resumptionToken element at
the end of the list has no value, indicating that the list is now complete. The
value of the completeListSize attribute remains 6, while the value
of the cursor attribute has changed to 4, indicating that a
previous reply has (or previous replies have) already delivered 4
identifiers.
<?xml version="1.0" encoding="UTF-8"?> |
Request
List the headers of olac-formatted records, added or modified on January 1, 2001 in the set Perseus:collection:PersInfo. There are no matches for this request, hence, the response contains an error tag and does not contain any header elements [URL shown without encoding for better readability].
http://www.perseus.tufts.edu/cgi-bin/pdataprov?
verb=ListIdentifiers&metadataPrefix=olac&from=2001-01-01&until=2001-01-01
&set=Perseus:collection:PersInfo
Response
<?xml version="1.0" encoding="UTF-8"?> |
4.4 ListMetadataFormats
Summary and Usage Notes
This verb is used to retrieve the metadata formats available from a repository. An optional argument restricts the request to the formats available for a specific item.
Arguments
- identifier: an optional argument that specifies the unique identifier of the item for which available metadata formats are being requested. If this argument is omitted, then the response includes all metadata formats supported by this repository. Note that the fact that a metadata format is supported by a repository does not mean that it can be disseminated from all items in the repository.
Error and Exception Conditions
- badArgument - The request includes illegal arguments or is missing required arguments.
- idDoesNotExist - The value of the identifier argument is unknown or illegal in this repository.
- noMetadataFormats - There are no metadata formats available for the specified item.
Examples
Request
List the metadata formats that can be disseminated from the repository
http://www.perseus.tufts.edu/cgi-bin/pdataprov for the item with
unique identifier oai:perseus.tufts.edu:Perseus:text:1999.02.0119
[URL shown without encoding for better
readability].
http://www.perseus.tufts.edu/cgi-bin/pdataprov?
verb=ListMetadataFormats&identifier=oai:perseus.tufts.edu:Perseus:text:1999.02.0119
Response
The response shows that 3 metadata formats are supported for the given identifier: oai_dc, olac and perseus. For each of the formats, the location of an XML Schema describing the format, as well as the XML Namespace URI is given.
<?xml version="1.0" encoding="UTF-8"?> |
Request
List the metadata formats that can be disseminated from the repository
http://memory.loc.gov/cgi-bin/oai.
http://memory.loc.gov/cgi-bin/oai?
verb=ListMetadataFormats
Response
The response shows that the repository supports two metadata formats:
oai_dc, and oai_marc. For each of the formats, the
location of an XML Schema describing the format is given. The support of these
formats at the repository-level does not imply support of each format for each
item of the repository.
<?xml version="1.0" encoding="UTF-8"?> |
Request
List the metadata formats that can be disseminated for the unique identifier
oai:lcoa1.loc.gov:loc.rbc/rbpe.00000111 in the repository
http://memory.loc.gov/cgi-bin/oai. The identifier, however, does
not exist and therefore, the response contains an error element and
no metadataFormat container. [URL shown without encoding for better readability].
http://memory.loc.gov/cgi-bin/oai?
verb=ListMetadataFormats&identifier=oai:lcoa1.loc.gov:loc.rbc/rbpe.00000111
Response
<?xml version="1.0" encoding="UTF-8"?> |
4.5 ListRecords
Summary and Usage Notes
This verb is used to harvest records from a repository. Optional arguments
permit selective harvesting of records based on set membership and/or
datestamp. Depending on the repository's support for deletions, a returned header
may have a status attribute of "deleted" if a record
matching the arguments specified in the request has been deleted. No metadata
will be present for records with deleted status.
Arguments
- from: an optional argument with a UTCdatetime value, which specifies a lower bound for datestamp-based selective harvesting.
- until: an optional argument with a UTCdatetime value, which specifies an upper bound for datestamp-based selective harvesting.
- set: an optional argument with a setSpec value, which specifies set criteria for selective harvesting.
- resumptionToken: an exclusive argument with a value that is the flow control token returned by a previous ListRecords request that issued an incomplete list.
- metadataPrefix: a required argument (unless the exclusive argument resumptionToken is used) that specifies the metadataPrefix of the format that should be included in the metadata part of the returned records. Records should be included only for items from which the metadata format matching the metadataPrefix can be disseminated. The metadata formats supported by a repository and for a particular item can be retrieved using the ListMetadataFormats request.
Error and Exception Conditions
- badArgument - The request includes illegal arguments or is missing required arguments.
- badResumptionToken - The value of the resumptionToken argument is invalid or expired.
- cannotDisseminateFormat - The value of the metadataPrefix argument is not supported by the repository.
- noRecordsMatch - The combination of the values of the from, until, set and metadataPrefix arguments results in an empty list.
- noSetHierarchy - The repository does not support sets.
Examples
Request
List the records expressed in oai_rfc1807 metadata format, that
have been added or modified since January 15, 1998 in the hep
subset of the physics set [URL shown without encoding for better readability].
http://an.oa.org/OAI-script?
verb=ListRecords&from=1998-01-15&set=physics:hep&metadataPrefix=oai_rfc1807
Response
Two records are returned:
- The first record is expressed in the oai_rfc1807 metadata format. This record also has an about part, and the item from which it was disseminated belongs to two sets (physics:hep and math).
- The second has a header with a status="deleted" attribute (and therefore no metadata part).
Note: The reply only includes records for those items from which metadata in
oai_rfc1807 can be disseminated. No records are returned for those
items that fit the from, until, and set
arguments but from which the specified format can not be disseminated.
<?xml version="1.0" encoding="UTF-8"?> |
Request
Request records in the oai_dc metadata format, modified or added
between 2:15pm and 2:20pm UTC on May 1st 2002. [URL shown without encoding for better readability].
http://www.perseus.tufts.edu/cgi-bin/pdataprov?
verb=ListRecords&from=2002-05-01T14:15:00Z&until=2002-05-01T14:20:00Z&
metadataPrefix=oai_dc
Response
Two records are returned. The second one has a provenance
container in its about element, giving insight into its chain of
provenance.
<?xml version="1.0" encoding="UTF-8"?> |
Request
Request records in the oai_marc metadata format, modified or
added between 2:00am and 3:00am UTC on June 1st 2002. The specified granularity
is not supported by the repository and therefore, an error with
code attribute of badArgument is returned. [URL shown
without encoding for better readability].
http://memory.loc.gov/cgi-bin/oai?
verb=ListRecords&from=2002-06-01T02:00:00Z&until=2002-06-01T03:00:00Z&metadataPrefix=oai_marc
Response
<?xml version="1.0" encoding="UTF-8"?> |
4.6 ListSets
Summary and Usage Notes
This verb is used to retrieve the set structure of a repository, useful for selective harvesting.
Arguments
- resumptionToken: an exclusive argument with a value that is the flow control token returned by a previous ListSets request that issued an incomplete list.
Error and Exception Conditions
- badArgument - The request includes illegal arguments or is missing required arguments.
- badResumptionToken - The value of the resumptionToken argument is invalid or expired.
- noSetHierarchy - The repository does not support sets.
Examples
Request
http://an.oa.org/OAI-script?
verb=ListSets
Response
The following response indicates a set hierarchy with two top level sets with
respective setSpec values music and video. The
music set has two subsets, with setSpec values music:(muzak)
and music:(elec). The subset identified by the setSpec
music:(elec) has a setDescription element which holds a
Dublin Core container, used to describe its contents.
<?xml version="1.0" encoding="UTF-8"?> |
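The hierarchy described in this example can be recovered from the setSpec values alone, since each colon separates one level of the hierarchy from the next. The sketch below assumes the setSpec values have already been extracted from the response; build_set_tree is a hypothetical helper, not something defined by the protocol.

```python
def build_set_tree(set_specs):
    """Arrange colon-separated setSpec values into a nested dict of subsets."""
    tree = {}
    for spec in set_specs:
        node = tree
        # Walk down one level per path component, creating levels as needed.
        for part in spec.split(":"):
            node = node.setdefault(part, {})
    return tree


# The setSpec values from the response described above.
set_tree = build_set_tree(["music", "music:(muzak)", "music:(elec)", "video"])
```

Because intermediate levels are created on demand, the result does not depend on the order in which the repository lists its sets.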
Request
http://purl.org/alcme/etdcat/servlet/OAIHandler?
verb=ListSets
Response
The response shows that the repository does not have a set hierarchy.
<?xml version="1.0" encoding="UTF-8"?> |
5. Dublin Core
The following table shows the XML Schema for Dublin Core without
qualification, which is associated with the reserved metadataPrefix
oai_dc in the OAI-PMH. All examples in this document that include Dublin
Core metadata validate against this XML schema. Schemas for other metadata
formats are provided in the accompanying Implementation
Guidelines document.
|
A XML schema for validating Unqualified Dublin Core metadata associated
|
<schema targetNamespace="http://www.openarchives.org/OAI/2.0/oai_dc/" |
| This Schema is available at http://www.openarchives.org/OAI/2.0/oai_dc.xsd |
Examples
<?xml version="1.0" encoding="UTF-8"?> |
<?xml version="1.0" encoding="UTF-8"?> |
6. Implementation Guidelines
Some passages in this document refer to the existence and goals of the accompanying Implementation Guidelines document.
Acknowledgements
Support for the development of the OAI-PMH and for other Open Archives Initiative activities comes from the Digital Library Federation, the Coalition for Networked Information, and from the National Science Foundation through Grant No. IIS-9817416.
This document is based on the deliberations of the OAI Technical Committee: Caroline Arms (Library of Congress), Thomas Baron (CERN), Steven Bird (University of Pennsylvania), Les Carr (University of Southampton), Tim Cole (University of Illinois at Urbana Champaign), Thomas Krichel (Long Island University), Carl Lagoze (Cornell University), Michael Nelson (NASA), Andy Powell (UKOLN & University of Bath), Mogens Sandfaer (Danmarks Tekniske Videncenter), Hussein Suleman (Virginia Tech), Robert Tansley (HP), Herbert Van de Sompel (Los Alamos National Laboratory), Simeon Warner (Cornell University), Muhammad Zubair (Old Dominion University) and Jeff Young (OCLC).
Many thanks to all involved in alpha-testing of version 2.0 of the OAI-PMH. In addition to the above: Tim Brody (University of Southampton), Irena Dijour (Ex Libris), Naomi Dushay (Cornell University), Susanne Dobratz (Humboldt Universität zu Berlin), Curtis Fornadley (UCLA), Christopher Gutteridge (University of Southampton), Alan Kent (InQuirion Pty Ltd & RMIT University), David Letts (The British Library), Xiaoming Liu (Old Dominion University), Jon Phipps (Cornell University) and Francois Schiettecatte (FS Consulting Inc).
Special thanks to Pete Johnston (UKOLN & University of Bath) and Andy Powell (UKOLN & University of Bath) for work on the Dublin Core schema, and to Donna Bergmark (Cornell University) for work on the OAI validation and registration service.
Many thanks to everyone involved in the compilation and alpha-testing of versions 1.0 and 1.1 of the OAI-PMH, and to all of you using this protocol.
Document History
2004-10-12: Changed wording and schema definition for characters allowed in setSpec and metadataPrefix to agree.
2004-09-15: Added section 2.5.1. Corrected section 2.6. Corrected second example in section 5. Changed schema to define a type for protocolVersion and to enforce use of Z notation for UTC datetime.
2003-02-21: Changed identifiers in the examples so that they conform to version 2.0 of the oai-identifier specification.
2002-12-19: Updated oai_dc schema to use revised Dublin Core schema simpledc20021212.xsd. Corrected provenance blocks in examples (sections 2.5 and 4.5).
2002-06-14: Release of OAI-PMH version 2.0.
2002-05-02: Release of beta version of OAI-PMH version 2.0.
2002-05-06: Release of alpha-4 version of OAI-PMH version 2.0. Changed document to reflect association of datestamps and deleted status with records as opposed to items. Changed requestURL to request. Changed schema location of oai-identifier and oai_dc schema. Changed validation of about, metadata, description and setDescription to strict.
2002-04-07: Changed document to reflect the usage of a single schema to validate all OAI-PMH responses.
2002-03-30: Release of alpha two version of OAI-PMH version 2.0.
2002-03-01: Release of alpha version of OAI-PMH version 2.0.
