An Introduction to RDF and the Jena RDF API

Preface

This is a tutorial introduction to both W3C's Resource Description Framework (RDF) and Jena, a Java API for RDF. It is written for the programmer who is unfamiliar with RDF and who learns best by prototyping, or, for other reasons, wishes to move quickly to implementation. Some familiarity with both XML and Java is assumed.

Implementing too quickly, without first understanding the RDF data model, leads to frustration and disappointment. Yet studying the data model alone is dry stuff and often leads to tortuous metaphysical conundrums. It is better to approach understanding both the data model and how to use it in parallel. Learn a bit of the data model and try it out. Then learn a bit more and try that out. Then the theory informs the practice and the practice the theory. The data model is quite simple, so this approach does not take long.

RDF has an XML syntax and many who are familiar with XML will think of RDF in terms of that syntax. This is mistake. RDF should be understood in terms of its data model. RDF data can be represented in XML, but understanding the syntax is secondary to understanding the data model.

An implementation of the Jena API, including the working source code for all the examples used in this tutorial can be downloaded from http://jena.sourceforge.net/downloads.html.

Introduction

The Resource Description Framework (RDF) is a standard (technically a W3C Recommendation) for describing resources. What is a resource? That is rather a deep question and the precise definition is still the subject of debate. For our purposes we can think of it as anything we can identify. You are a resource, as is your home page, this tutorial, the number one and the great white whale in Moby Dick.

Our examples in this tutorial will be about people. They use an RDF representation of VCARDS. RDF is best thought of in the form of node and arc diagrams. A simple vcard might look like this in RDF:

The resource, John Smith, is shown as an elipse and is identified by a Uniform Resource Identifier (URI)¹, in this case "http://.../JohnSmith". If you try to access that resource using your browser, you are unlikely to be successful; April the first jokes not withstanding, you would be rather surprised if your browser were able to deliver John Smith to your desk top. If you are unfamiliar with URI's, think of them simply as rather strange looking names.

Resources have properties. In these examples we are interested in the sort of properties that would appear on John Smith's business card. Figure 1 shows only one property, John Smith's full name. A property is represented by an arc, labeled with the name of a property. The name of a property is also a URI, but as URI's are rather long and cumbersome, the diagram shows it in XML qname form. The part before the ':' is called a namespace prefix and represents a namespace. The part after the ':' is called a local name and represents a name in that namespace. Properties are usually represented in this qname form when written as RDF XML and it is a convenient shorthand for representing them in diagrams and in text. Strictly, however, properties are identified by a URI. The nsprefix:localname form is a shorthand for the URI of the namespace concatenated with the localname. There is no requirement that the URI of a property resolve to anything when accessed by a browser.

Each property has a value. In this case the value is a literal, which for now we can think of as a strings of characters². Literals are shown in rectangles.

Jena is a Java API which can be used to create and manipulate RDF graphs like this one. Jena has object classes to represent graphs, resources, properties and literals. The interfaces representing resources, properties and literals are called Resource, Property and Literal respectively. In Jena, a graph is called a model and is represented by the Model interface.

The code to create this graph, or model, is simple:

// some definitions
static String personURI    = "http://somewhere/JohnSmith";
static String fullName     = "John Smith";
// create an empty Model
Model model = ModelFactory.createDefaultModel();
// create the resource
Resource johnSmith = model.createResource(personURI);
// add the property
johnSmith.addProperty(VCARD.FN, fullName);

It begins with some constant definitions and then creates an empty Model or model, using the ModelFactory method createDefaultModel() to create a memory-based model. Jena contains other implementations of the Model interface, e.g one which uses a relational database: these types of Model are also available from ModelFactory.

The John Smith resource is then created and a property added to it. The property is provided by a "constant" class VCARD which holds objects representing all the definitions in the VCARD schema. Jena provides constant classes for other well known schemas, such as RDF and RDF schema themselves, Dublin Core and DAML.

The code to create the resource and add the property, can be more compactly written in a cascading style:

Resource johnSmith =
model.createResource(personURI)
.addProperty(VCARD.FN, fullName);

The working code for this example can be found in the /src-examples directory of the Jena distribution as tutorial 1. As an exercise, take this code and modify it to create a simple VCARD for yourself.

Now let's add some more detail to the vcard, exploring some more features of RDF and Jena.

In the first example, the property value was a literal. RDF properties can also take other resources as their value. Using a common RDF technique, this example shows how to represent the different parts of John Smith's name:

Here we have added a new property, vcard:N, to represent the structure of John Smith's name. There are several things of interest about this Model. Note that the vcard:N property takes a resource as its value. Note also that the ellipse representing the compound name has no URI. It is known as an blank Node.

The Jena code to construct this example, is again very simple. First some declarations and the creation of the empty model.

// some definitions
String personURI    = "http://somewhere/JohnSmith";
String givenName    = "John";
String familyName   = "Smith";
String fullName     = givenName + " " + familyName;
// create an empty Model
Model model = ModelFactory.createDefaultModel();
// create the resource
//   and add the properties cascading style
Resource johnSmith
= model.createResource(personURI)
.addProperty(VCARD.FN, fullName)
.addProperty(VCARD.N,
model.createResource()
.addProperty(VCARD.Given, givenName)
.addProperty(VCARD.Family, familyName));

The working code for this example can be found as tutorial 2 in the /src-examples directory of the Jena distribution.

Statements

Each arc in an RDF Model is called a statement. Each statement asserts a fact about a resource. A statement has three parts:

the subject is the resource from which the arc leaves
the predicate is the property that labels the arc
the object is the resource or literal pointed to by the arc

A statement is sometimes called a triple, because of its three parts.

An RDF Model is represented as a set of statements. Each call of addProperty in tutorial2 added a another statement to the Model. (Because a Model is set of statements, adding a duplicate of a statement has no effect.) The Jena model interface defines a listStatements() method which returns an StmtIterator, a subtype of Java's Iterator over all all the statements in a Model. StmtIterator has a method nextStatement() which returns the next statement from the iterator (the same one that next() would deliver, already cast to Statement). The Statement interface provides accessor methods to the subject, predicate and object of a statement.

Now we will use that interface to extend tutorial2 to list all the statements created and print them out. The complete code for this can be found in tutorial 3.

// list the statements in the Model
StmtIterator iter = model.listStatements();
// print out the predicate, subject and object of each statement
while (iter.hasNext()) {
Statement stmt      = iter.nextStatement();  // get next statement
Resource  subject   = stmt.getSubject();     // get the subject
Property  predicate = stmt.getPredicate();   // get the predicate
RDFNode   object    = stmt.getObject();      // get the object
System.out.print(subject.toString());
System.out.print(" " + predicate.toString() + " ");
if (object instanceof Resource) {
System.out.print(object.toString());
} else {
// object is a literal
System.out.print(" \"" + object.toString() + "\"");
}
System.out.println(" .");
}

Since the object of a statement can be either a resource or a literal, the getObject() method returns an object typed as RDFNode, which is a common superclass of both Resource and Literal. The underlying object is of the appropriate type, so the code uses instanceof to determine which and processes it accordingly.

When run, this program should produce output resembling:

http://somewhere/JohnSmith http://www.w3.org/2001/vcard-rdf/3.0#N anon:14df86:ecc3dee17b:-7fff .
anon:14df86:ecc3dee17b:-7fff http://www.w3.org/2001/vcard-rdf/3.0#Family  "Smith" .
anon:14df86:ecc3dee17b:-7fff http://www.w3.org/2001/vcard-rdf/3.0#Given  "John" .
http://somewhere/JohnSmith http://www.w3.org/2001/vcard-rdf/3.0#FN  "John Smith" .

Now you know why it is clearer to draw Models. If you look carefully, you will see that each line consists of three fields representing the subject, predicate and object of each statement. There are four arcs in the Model, so there are four statements. The "anon:14df86:ecc3dee17b:-7fff" is an internal identifier generated by Jena. It is not a URI and should not be confused with one. It is simply an internal label used by the Jena implementation.

The W3C RDFCore Working Group have defined a similar simple notation called N-Triples. The name means "triple notation". We will see in the next section that Jena has an N-Triples writer built in.

Writing RDF

Jena has methods for reading and writing RDF as XML. These can be used to save an RDF model to a file and later read it back in again.

Tutorial 3 created a model and wrote it out in triple form. Tutorial 4 modifies tutorial 3 to write the model in RDF XML form to the standard output stream. The code again, is very simple: model.write can take an OutputStream argument.

// now write the model in XML form to a file
model.write(System.out);

The output should look something like this:

<rdf:RDF
xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:vcard='http://www.w3.org/2001/vcard-rdf/3.0#'
>
<rdf:Description rdf:about='http://somewhere/JohnSmith'>
<vcard:FN>John Smith</vcard:FN>
<vcard:N rdf:nodeID="A0"/>
</rdf:Description>
<rdf:Description rdf:nodeID="A0">
<vcard:Given>John</vcard:Given>
<vcard:Family>Smith</vcard:Family>
</rdf:Description>
</rdf:RDF>

The RDF specifications specify how to represent RDF as XML. The RDF XML syntax is quite complex. The reader is referred to the primer being developed by the RDFCore WG for a more detailed introduction. However, let's take a quick look at how to interpret the above.

RDF is usually embedded in an <rdf:RDF> element. The element is optional if there are other ways of know that some XML is RDF, but it is usually present. The RDF element defines the two namespaces used in the document. There is then an <rdf:Description> element which describes the resource whose URI is "http://somewhere/JohnSmith". If the rdf:about attribute was missing, this element would represent a blank node.

The <vcard:FN> element describes a property of the resource. The property name is the "FN" in the vcard namespace. RDF converts this to a URI reference by concatenating the URI reference for the namespace prefix and "FN", the local name part of the name. This gives a URI reference of "http://www.w3.org/2001/vcard-rdf/3.0#FN". The value of the property is the literal "John Smith".

The <vcard:N> element is a resource. In this case the resource is represented by a relative URI reference. RDF converts this to an absolute URI reference by concatenating it with the base URI of the current document.

There is an error in this RDF XML; it does not exactly represent the Model we created. The blank node in the Model has been given a URI reference. It is no longer blank. The RDF/XML syntax is not capable of representing all RDF Models; for example it cannot represent a blank node which is the object of two statements. The 'dumb' writer we used to write this RDF/XML makes no attempt to write correctly the subset of Models which can be written correctly. It gives a URI to each blank node, making it no longer blank.

Jena has an extensible interface which allows new writers for different serialization languages for RDF to be easily plugged in. The above call invoked the standard 'dumb' writer. Jena also includes a more sophisticated RDF/XML writer which can be invoked by specifying another argument to the write() method call:

// now write the model in XML form to a file
model.write(System.out, "RDF/XML-ABBREV");

This writer, the so called PrettyWriter, takes advantage of features of the RDF/XML abbreviated syntax to write a Model more compactly. It is also able to preserve blank nodes where that is possible. It is however, not suitable for writing very large Models, as its performance is unlikely to be acceptable. To write large files and preserve blank nodes, write in N-Triples format:

// now write the model in XML form to a file
model.write(System.out, "N-TRIPLE");

This will produce output similar to that of tutorial 3 which conforms to the N-Triples specification.

Reading RDF

Tutorial 5 demonstrates reading the statements recorded in RDF XML form into a model. With this tutorial, we have provided a small database of vcards in RDF/XML form. The following code will read it in and write it out. Note that for this application to run, the input file must be in the current directory.


// create an empty model
Model model = ModelFactory.createDefaultModel();
// use the FileManager to find the input file
InputStream in = FileManager.get().open( inputFileName );
if (in == null) {
throw new IllegalArgumentException(
"File: " + inputFileName + " not found");
}
// read the RDF/XML file
model.read(in, "");
// write it to standard out
model.write(System.out);

The second argument to the read() method call is the URI which will be used for resolving relative URI's. As there are no relative URI references in the test file, it is allowed to be empty. When run, tutorial 5 will produce XML output which looks like:

<rdf:RDF
xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:vcard='http://www.w3.org/2001/vcard-rdf/3.0#'
>
<rdf:Description rdf:nodeID="A0">
<vcard:Family>Smith</vcard:Family>
<vcard:Given>John</vcard:Given>
</rdf:Description>
<rdf:Description rdf:about='http://somewhere/JohnSmith/'>
<vcard:FN>John Smith</vcard:FN>
<vcard:N rdf:nodeID="A0"/>
</rdf:Description>
<rdf:Description rdf:about='http://somewhere/SarahJones/'>
<vcard:FN>Sarah Jones</vcard:FN>
<vcard:N rdf:nodeID="A1"/>
</rdf:Description>
<rdf:Description rdf:about='http://somewhere/MattJones/'>
<vcard:FN>Matt Jones</vcard:FN>
<vcard:N rdf:nodeID="A2"/>
</rdf:Description>
<rdf:Description rdf:nodeID="A3">
<vcard:Family>Smith</vcard:Family>
<vcard:Given>Rebecca</vcard:Given>
</rdf:Description>
<rdf:Description rdf:nodeID="A1">
<vcard:Family>Jones</vcard:Family>
<vcard:Given>Sarah</vcard:Given>
</rdf:Description>
<rdf:Description rdf:nodeID="A2">
<vcard:Family>Jones</vcard:Family>
<vcard:Given>Matthew</vcard:Given>
</rdf:Description>
<rdf:Description rdf:about='http://somewhere/RebeccaSmith/'>
<vcard:FN>Becky Smith</vcard:FN>
<vcard:N rdf:nodeID="A3"/>
</rdf:Description>
</rdf:RDF>

Controlling Prefixes

explicit prefix definitions

In the previous section, we saw that the output XML declared a namespace prefix vcard and used that prefix to abbreviate URIs. While RDF uses only the full URIs, and not this shortened form, Jena provides was of controlling the namespaces used on output with its prefix mappings. Here's a simple example.


Model m = ModelFactory.createDefaultModel();
String nsA = "http://somewhere/else#";
String nsB = "http://nowhere/else#";
Resource root = m.createResource( nsA + "root" );
Property P = m.createProperty( nsA + "P" );
Property Q = m.createProperty( nsB + "Q" );
Resource x = m.createResource( nsA + "x" );
Resource y = m.createResource( nsA + "y" );
Resource z = m.createResource( nsA + "z" );
m.add( root, P, x ).add( root, P, y ).add( y, Q, z );
System.out.println( "# -- no special prefixes defined" );
m.write( System.out );
System.out.println( "# -- nsA defined" );
m.setNsPrefix( "nsA", nsA );
m.write( System.out );
System.out.println( "# -- nsA and cat defined" );
m.setNsPrefix( "cat", nsB );
m.write( System.out );

The output from this fragment is three lots of RDF/XML, with three different prefix mappings. First the default, with no prefixes other than the standard ones:


# -- no special prefixes defined
<rdf:RDF
xmlns:j.0="http://nowhere/else#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:j.1="http://somewhere/else#" >
<rdf:Description rdf:about="http://somewhere/else#root">
<j.1:P rdf:resource="http://somewhere/else#x"/>
<j.1:P rdf:resource="http://somewhere/else#y"/>
</rdf:Description>
<rdf:Description rdf:about="http://somewhere/else#y">
<j.0:Q rdf:resource="http://somewhere/else#z"/>
</rdf:Description>
</rdf:RDF>

We see that the rdf namespace is declared automatically, since it is required for tags such as <RDF:rdf> and <rdf:resource>. Namespace declarations are also needed for using the two properties P and Q, but since their namespaces have not been introduced to the model, they get invented namespace names: j.0 and j.1.

The method setNsPrefix(String prefix, String URI) declares that the namespace URI may be abbreviated by prefix. Jena requires that prefix be a legal XML namespace name, and that URI ends with a non-name character. The RDF/XML writer will turn these prefix declarations into XML namespace declarations and use them in its output:


# -- nsA defined
<rdf:RDF
xmlns:j.0="http://nowhere/else#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:nsA="http://somewhere/else#" >
<rdf:Description rdf:about="http://somewhere/else#root">
<nsA:P rdf:resource="http://somewhere/else#x"/>
<nsA:P rdf:resource="http://somewhere/else#y"/>
</rdf:Description>
<rdf:Description rdf:about="http://somewhere/else#y">
<j.0:Q rdf:resource="http://somewhere/else#z"/>
</rdf:Description>
</rdf:RDF>

The other namespace still gets the constructed name, but the nsA name is now used in the property tags. There's no need for the prefix name to have anything to do with the variables in the Jena code:


# -- nsA and cat defined
<rdf:RDF
xmlns:cat="http://nowhere/else#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:nsA="http://somewhere/else#" >
<rdf:Description rdf:about="http://somewhere/else#root">
<nsA:P rdf:resource="http://somewhere/else#x"/>
<nsA:P rdf:resource="http://somewhere/else#y"/>
</rdf:Description>
<rdf:Description rdf:about="http://somewhere/else#y">
<cat:Q rdf:resource="http://somewhere/else#z"/>
</rdf:Description>
</rdf:RDF>

Both prefixes are used for output, and no generated prefixes are needed.

implicit prefix definitions

As well as prefix declarations provided by calls to setNsPrefix, Jena will remember the prefixes that were used in input to model.read().

Take the output produced by the previous fragment, and paste it into some file, with URL file:/tmp/fragment.rdf say. Then run the code:


Model m2 = ModelFactory.createDefaultModel();
m2.read( "file:/tmp/fragment.rdf" );
m2.write( System.out );

You'll see that the prefixes from the input are preserved in the output. All the prefixes are written, even if they're not used anywhere. You can remove a prefix with removeNsPrefix(String prefix) if you don't want it in the output.

Since NTriples doesn't have any short way of writing URIs, it takes no notice of prefixes on output and doesn't provide any on input. The notation N3, also supported by Jena, does have short prefixed names, and records them on input and uses them on output.

Jena has further operations on the prefix mappings that a model holds, such as extracting a Java Map of the exiting mappings, or adding a whole group of mappings at once; see the documentation for PrefixMapping for details.

Jena RDF Packages

Jena is a Java API for semantic web applications. The key RDF package for the application developer is com.hp.hpl.jena.rdf.model. The API has been defined in terms of interfaces so that application code can work with different implementations without change. This package contains interfaces for representing models, resources, properties, literals, statements and all the other key concepts of RDF, and a ModelFactory for creating models. So that application code remains independent of the implementation, it is best if it uses interfaces wherever possible, not specific class implementations.

The com.hp.hpl.jena.tutorial package contains the working source code for all the examples used in this tutorial.

The com.hp.hpl.jena...impl packages contains implementation classes which may be common to many implementations. For example, they defines classes ResourceImpl, PropertyImpl, and LiteralImpl which may be used directly or subclassed by different implementations. Applications should rarely, if ever, use these classes directly. For example, rather than creating a new instance of ResourceImpl, it is better to use the createResource method of whatever model is being used. That way, if the model implementation has used an optimized implementation of Resource, then no conversions between the two types will be necessary.

Navigating a Model

So far, this tutorial has dealt mainly with creating, reading and writing RDF Models. It is now time to deal with accessing information held in a Model.

Given the URI of a resource, the resource object can be retrieved from a model using the Model.getResource(String uri) method. This method is defined to return a Resource object if one exists in the model, or otherwise to create a new one. For example, to retrieve the Adam Smith resource from the model read in from the file in tutorial 5:

// retrieve the John Smith vcard resource from the model
Resource vcard = model.getResource(johnSmithURI);

The Resource interface defines a number of methods for accessing the properties of a resource. The Resource.getProperty(Property p) method accesses a property of the resource. This method does not follow the usual Java accessor convention in that the type of the object returned is Statement, not the Property that you might have expected. Returning the whole statement allows the application to access the value of the property using one of its accessor methods which return the object of the statement. For example to retrieve the resource which is the value of the vcard:N property:

// retrieve the value of the N property
Resource name = (Resource) vcard.getProperty(VCARD.N)
.getObject();

In general, the object of a statement could be a resource or a literal, so the application code, knowing the value must be a resource, casts the returned object. One of the things that Jena tries to do is to provide type specific methods so the application does not have to cast and type checking can be done at compile time. The code fragment above, can be more conveniently written:

// retrieve the value of the FN property
Resource name = vcard.getProperty(VCARD.N)
.getResource();

Similarly, the literal value of a property can be retrieved:

// retrieve the given name property
String fullName = vcard.getProperty(VCARD.FN)
.getString();

In this example, the vcard resource has only one vcard:FN and one vcard:N property. RDF permits a resource to repeat a property; for example Adam might have more than one nickname. Let's give him two:

// add two nickname properties to vcard
vcard.addProperty(VCARD.NICKNAME, "Smithy")
.addProperty(VCARD.NICKNAME, "Adman");

As noted before, Jena represents an RDF Model as set of statements, so adding a statement with the subject, predicate and object as one already in the Model will have no effect. Jena does not define which of the two nicknames present in the Model will be returned. The result of calling vcard.getProperty(VCARD.NICKNAME) is indeterminate. Jena will return one of the values, but there is no guarantee even that two consecutive calls will return the same value.

If it is possible that a property may occur more than once, then the Resource.listProperties(Property p) method can be used to return an iterator which will list them all. This method returns an iterator which returns objects of type Statement. We can list the nicknames like this:

// set up the output
System.out.println("The nicknames of \""
+ fullName + "\" are:");
// list the nicknames
StmtIterator iter = vcard.listProperties(VCARD.NICKNAME);
while (iter.hasNext()) {
System.out.println("    " + iter.nextStatement()
.getObject()
.toString());
}

This code can be found in tutorial 6. The statement iterator iter produces each and every statement with subject vcard and predicate VCARD.NICKNAME, so looping over it allows us to fetch each statement by using nextStatement(), get the object field, and convert it to a string. The code produces the following output when run:

The nicknames of "John Smith" are:
Smithy
Adman

All the properties of a resource can be listed by using the listProperties() method without an argument.

Querying a Model

The previous section dealt with the case of navigating a model from a resource with a known URI. This section deals with searching a model. The core Jena API supports only a limited query primitive. The more powerful query facilities of RDQL are described elsewhere.

The Model.listStatements() method, which lists all the statements in a model, is perhaps the crudest way of querying a model. Its use is not recommended on very large Models. Model.listSubjects() is similar, but returns an iterator over all resources that have properties, ie are the subject of some statement.

Model.listSubjectsWithProperty(Property p, RDFNode o) will return an iterator over all the resources which have property p with value o. If we assume that only vcard resources will have vcard:FN property, and that in our data, all such resources have such a property, then we can find all the vcards like this:

// list vcards
ResIterator iter = model.listSubjectsWithProperty(VCARD.FN);
while (iter.hasNext()) {
Resource r = iter.nextResource();
...
}

All these query methods are simply syntactic sugar over a primitive query method model.listStatements(Selector s). This method returns an iterator over all the statements in the model 'selected' by s. The selector interface is designed to be extensible, but for now, there is only one implementation of it, the class SimpleSelector from the package com.hp.hpl.jena.rdf.model. Using SimpleSelector is one of the rare occasions in Jena when it is necessary to use a specific class rather than an interface. The SimpleSelector constructor takes three arguments:

Selector selector = new SimpleSelector(subject, predicate, object)

This selector will select all statements with a subject that matches subject, a predicate that matches predicate and an object that matches object. If a null is supplied in any of the positions, it matches anything; otherwise they match corresponding equal resources or literals. (Two resources are equal if they have equal URIs or are the same blank node; two literals are the same if all their components are equal.) Thus:

Selector selector = new SimpleSelector(null, null, null);

will select all the statements in a Model.

Selector selector = new SimpleSelector(null, VCARD.FN, null);

will select all the statements with VCARD.FN as their predicate, whatever the subject or object. As a special shorthand,

listStatements( S, P, O )

is equivalent to

listStatements( new SimpleSelector( S, P, O ) )

The following code, which can be found in full in tutorial 7 lists the full names on all the vcards in the database.

// select all the resources with a VCARD.FN property
ResIterator iter = model.listSubjectsWithProperty(VCARD.FN);
if (iter.hasNext()) {
System.out.println("The database contains vcards for:");
while (iter.hasNext()) {
System.out.println("  " + iter.nextStatement()
.getProperty(VCARD.FN)
.getString());
}
} else {
System.out.println("No vcards were found in the database");
}

This should produce output similar to the following:

The database contains vcards for:
Sarah Jones
John Smith
Matt Jones
Becky Smith

Your next exercise is to modify this code to use SimpleSelector instead of listSubjectsWithProperty.

Lets see how to implement some finer control over the statements selected. SimpleSelector can be subclassed and its selects method modified to perform further filtering:

// select all the resources with a VCARD.FN property
// whose value ends with "Smith"
StmtIterator iter = model.listStatements(
new SimpleSelector(null, VCARD.FN, (RDFNode) null) {
public boolean selects(Statement s)
{return s.getString().endsWith("Smith");}
});

This sample code uses a neat Java technique of overridding a method definition inline when creating an instance of the class. Here the selects(...) method checks to ensure that the full name ends with "Smith". It is important to note that filtering based on the subject, predicate and object arguments takes place before the selects(...) method is called, so the extra test will only be applied to matching statements.

The full code can be found in tutorial 8 and produces output like this:

The database contains vcards for:
John Smith
Becky Smith

You might think that:

// do all filtering in the selects method
StmtIterator iter = model.listStatements(
new
SimpleSelector(null, null, (RDFNode) null) {
public boolean selects(Statement s) {
return (subject == null   || s.getSubject().equals(subject))
&& (predicate == null || s.getPredicate().equals(predicate))
&& (object == null    || s.getObject().equals(object))
}
}
});

is equivalent to:

StmtIterator iter =
model.listStatements(new SimpleSelector(subject, predicate, object)

Whilst functionally they may be equivalent, the first form will list all the statements in the Model and test each one individually, whilst the second allows indexes maintained by the implementation to improve performance. Try it on a large Model and see for yourself, but make a cup of coffee first.

Operations on Models

Jena provides three operations for manipulating Models as a whole. These are the common set operations of union, intersection and difference.

The union of two Models is the union of the sets of statements which represent each Model. This is one of the key operations that the design of RDF supports. It enables data from disparate data sources to be merged. Consider the following two Models:

and

When these are merged, the two http://...JohnSmith nodes are merged into one and the duplicate vcard:FN arc is dropped to produce:

Lets look at the code to do this (the full code is in tutorial 9) and see what happens.

// read the RDF/XML files
model1.read(new InputStreamReader(in1), "");
model2.read(new InputStreamReader(in2), "");
// merge the Models
Model model = model1.union(model2);
// print the Model as RDF/XML
model.write(system.out, "RDF/XML-ABBREV");

The output produced by the pretty writer looks like this:

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#">
<rdf:Description rdf:about="http://somewhere/JohnSmith/">
<vcard:EMAIL>
<vcard:internet>
<rdf:value>John@somewhere.com</rdf:value>
</vcard:internet>
</vcard:EMAIL>
<vcard:N rdf:parseType="Resource">
<vcard:Given>John</vcard:Given>
<vcard:Family>Smith</vcard:Family>
</vcard:N>
<vcard:FN>John Smith</vcard:FN>
</rdf:Description>
</rdf:RDF>

Even if you are unfamiliar with the details of the RDF/XML syntax, it should be reasonably clear that the Models have merged as expected. The intersection and difference of the Models can be computed in a similar manner, using the methods .intersection(Model) and .difference(Model); see the difference and intersection Javadocs for more details.

Containers

RDF defines a special kind of resources for representing collections of things. These resources are called containers. The members of a container can be either literals or resources. There are three kinds of container:

a BAG is an unordered collection
an ALT is an unordered collection intended to represent alternatives
a SEQ is an ordered collection

A container is represented by a resource. That resource will have an rdf:type property whose value should be one of rdf:Bag, rdf:Alt or rdf:Seq, or a subclass of one of these, depending on the type of the container. The first member of the container is the value of the container's rdf:_1 property; the second member of the container is the value of the container's rdf:_2 property and so on. The rdf:_nnn properties are known as the ordinal properties.

For example, the Model for a simple bag containing the vcards of the Smith's might look like this:

Whilst the members of the bag are represented by the properties rdf:_1, rdf:_2 etc the ordering of the properties is not significant. We could switch the values of the rdf:_1 and rdf:_2 properties and the resulting Model would represent the same information.

Alt's are intended to represent alternatives. For example, lets say a resource represented a software product. It might have a property to indicate where it might be obtained from. The value of that property might be an Alt collection containing various sites from which it could be downloaded. Alt's are unordered except that the rdf:_1 property has special significance. It represents the default choice.

Whilst containers can be handled using the basic machinery of resources and properties, Jena has explicit interfaces and implementation classes to handle them. It is not a good idea to have an object manipulating a container, and at the same time to modify the state of that container using the lower level methods.

Let's modify tutorial 8 to create this bag:

// create a bag
Bag smiths = model.createBag();
// select all the resources with a VCARD.FN property
// whose value ends with "Smith"
StmtIterator iter = model.listStatements(
new SimpleSelector(null, VCARD.FN, (RDFNode) null) {
public boolean selects(Statement s) {
return s.getString().endsWith("Smith");
}
});
// add the Smith's to the bag
while (iter.hasNext()) {
smiths.add(iter.nextStatement().getSubject());
}

If we write out this Model, it contains something like the following:

<rdf:RDF
xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:vcard='http://www.w3.org/2001/vcard-rdf/3.0#'
>
...
<rdf:Description rdf:nodeID="A3">
<rdf:type rdf:resource='http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag'/>
<rdf:_1 rdf:resource='http://somewhere/JohnSmith/'/>
<rdf:_2 rdf:resource='http://somewhere/RebeccaSmith/'/>
</rdf:Description>
</rdf:RDF>

which represents the Bag resource.

The container interface provides an iterator to list the contents of a container:

// print out the members of the bag
NodeIterator iter2 = smiths.iterator();
if (iter2.hasNext()) {
System.out.println("The bag contains:");
while (iter2.hasNext()) {
System.out.println("  " +
((Resource) iter2.next())
.getProperty(VCARD.FN)
.getString());
}
} else {
System.out.println("The bag is empty");
}

which produces the following output:

The bag contains:
John Smith
Becky Smith

Executable example code can be found in tutorial 10, which glues together the fragments above into a complete example.

The Jena classes offer methods for manipulating containers including adding new members, inserting new members into the middle of a container and removing existing members. The Jena container classes currently ensure that the the list of ordinal properties used starts at rdf:_1 and is contiguous. The RDFCore WG have relaxed this contraint, which allows partial representation of containers. This therefore is an area of Jena may be changed in the future.

More about Literals and Datatypes

RDF literals are not just simple strings. Literals may have a language tag to indicate the language of the literal. The literal "chat" with an English language tag is considered different to the literal "chat" with a French language tag. This rather strange behaviour is an artefact of the original RDF/XML syntax.

Further there are really two sorts of Literals. In one, the string component is just that, an ordinary string. In the other the string component is expected to be a well balanced fragment of XML. When an RDF Model is written as RDF/XML a special construction using a parseType='Literal' attribute is used to represent it.

In Jena, these attributes of a literal may be set when the literal is constructed, e.g. in tutorial 11:

// create the resource
Resource r = model.createResource();
// add the property
r.addProperty(RDFS.label, model.createLiteral("chat", "en"))
.addProperty(RDFS.label, model.createLiteral("chat", "fr"))
.addProperty(RDFS.label, model.createLiteral("<em>chat</em>", true));
// write out the Model
model.write(system.out);

produces

<rdf:RDF
xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'
>
<rdf:Description rdf:nodeID="A0">
<rdfs:label xml:lang='en'>chat</rdfs:label>
<rdfs:label xml:lang='fr'>chat</rdfs:label>
<rdfs:label rdf:parseType='Literal'><em>chat</em></rdfs:label>
</rdf:Description>
</rdf:RDF>

For two literals to be considered equal, they must either both be XML literals or both be simple literals. In addition, either both must have no language tag, or if language tags are present they must be equal. For simple literals the strings must be equal. XML literals have two notions of equality. The simple notion is that the conditions previously mentioned are true and the strings are also equal. The other notion is that they can be equal if the cannonicalization of their strings is equal.

Jena's interfaces also support typed literals. The old-fashioned way (shown below) treats typed literals as shorthand for strings: typed values are converted in the usual Java way to strings and these strings are stored in the Model. For example, try (noting that for simple literals, we can omit the model.createLiteral(...) call):

// create the resource
Resource r = model.createResource();
// add the property
r.addProperty(RDFS.label, "11")
.addProperty(RDFS.label, 11);
// write out the Model
model.write(system.out, "N-TRIPLE");

The output produced is:

_:A... <http://www.w3.org/2000/01/rdf-schema#label> "11" .

Since both literals are really just the string "11", then only one statement is added.

The RDFCore WG has defined mechanisms for supporting datatypes in RDF. Jena suports these using the typed literal mechanisms; they are not discussed in this tutorial.

Glossary

Blank Node: Represents a resource, but does not indicate a URI for the resource. Blank nodes act like existentially qualified variables in first order logic.
Dublin Core: A standard for metadata about web resources. Further information can be found at the Dublin Core web site.
Literal: A string of characters which can be the value of a property.
Object: The part of a triple which is the value of the statement
Predicate: The property part of a triple.
Property: A property is an attribute of a resource. For example DC.title is a property, as is RDF.type.
Resource: Some entity. It could be a web resource such as web page, or it could be a concrete physical thing such as a tree or a car. It could be an abstract idea such as chess or football. Resources are named by URI's.
Statement: An arc in an RDF Model, normally interpreted as a fact.
Subject: The resource which is the source of an arc in an RDF Model
Triple: A structure containing a subject, a predicate and an object. Another term for a statement.

译文:
RDF和Jena RDF API入门
________________________________________
前言
本文是一篇对W3C的资源描述框架(RDF)和 Jena(一个Java的RDF API)的教程性介绍. 本文是为那些不熟悉RDF的, 以及那些通过建立原形可以达到最好学习效果的, 或是因为其他原因希望能快速操作Jena的程序员而写的. 我们假设读者在阅读本文前已具有一定的XML和Java知识.
如果读者在没有理解RDF数据模型的基础上就迅速进入操作阶段,往往会导致失败和失望. 然而,如果光学习数据模型又是十分枯燥乏味的, 并常常会导致曲折的形而上学的难题. 更好的学习办法是在理解数据模型的同时练习操作它. 可以先学习一点数据模型再动手试一试.然后在学习一点再试一试. 这样一来就能达到理论实践相结合的效果.数据模型本身十分简单,所以学习过程不会太长.
RDF具有XML的语法, 所以许多熟悉XML的人就会认为以XML语法的形式来思考RDF. 然而, 这是不对的. RDF应该以它数据模型的形式来被理解. RDF数据可是用XML来表示, 但是理解数据模型的重要性更在理解此语法重要性之上.
Jena API的一个运行例子, 包括本教程中所有例子的工作源代码都可以在http://www.hpl.hp.com/semweb/下载.
________________________________________
目录
1. 导言
2. 陈述Statements
3. RDF写操作
4. RDF读操作
5. Jena RDF 包
6. 操纵模型
7. 查询模型
8. 对模型的操作
9. 容器Containers
10. 关于Literals和数据类型的更多探讨
11. 术语表
________________________________________
导言
资源描述框架是(RDF)是描述资源的一项标准(在技术上是W3C的推荐标准). 什么是资源? 这实在是一个很难回答的问题, 其精确的定义目前尚在争论中. 出于我们的目的, 我们可以把资源想象成任何我们可以确定识别的东西. 在本教程中,读者你本身就是一个资源, 而你的主页也是一个资源, 数字1和故事中巨大的白鲸都是资源.
在本教程中, 我们的例子会围绕人们展开. 假设人们会使用VCARDS, 而VCARD将由RDF表示机制来描述. 我们最好把RDF考虑成由结点和箭头的形式构成的图. 一个简单的vcard在RDF中可能看起来是这样的:

资源John Smith在图中用椭圆表示, 并被一个统一资源定位符(URI) 所标识, 在本例中是"http://.../JohnSmith"). 如果你想要通过你的浏览器来访问这个资源的话,你很有可能会失败. 四月的愚人节笑话并不经得起考验, 相反如果你的浏览器把John Smith传递到你的桌面的话, 你才该感到惊讶. 如果你并不熟悉URI's的话, 你可以把它们想象成简单的陌生名字.
资源拥有属性(property). 在这些例子中, 我们对John Smith名片上出现的那些属性很感兴趣.图1只显示了一个属性, John Smith的全名. 属性是由标有属性名的箭头表示的. 属性的名字也是一个URI, 但是由于URI十分冗长笨重, 所以图中将它显示为XML qname的形式. 在':'之前的部分称为命名空间前缀并表示了一个命名空间. 在':'之后的部分称为局部名, 并表示在命名空间中的一个名字. 在写成RDF XML形式时, 属性常常以qname的形式表示, 这是一个在图形和文本中的简单的缩写方法. 然而, 严格地讲, 属性应该用URI来标识. 命名空间前缀:局部名的形式是一种命名空间连接局部名的URI缩写. 当浏览器访问时, 用并没有强制属性的URI必须指向一些具体的事物.
每个属性都有一个值. 在此例中, 值为一个文本(literal), 我们现在可以把它看成一个字符串.文本在图中显示为长方形.
Jena是一个Java API, 我们可以用它来创建和操纵诸如上述例图的RDF图. Jena设有表示图(graph), 资源(resource), 属性和文本(literal)的对象类. 表示资源, 属性和文本的接口分别称为Resource, Property, 和Literal. 在Jena中, 一个图(graph)被称为一个模型并被Model接口所表示.
创建上述例图或称为上述模型的代码很简单:
// some definitions
static String personURI = "http://somewhere/JohnSmith";
static String fullName = "John Smith";

// create an empty Model
Model model = ModelFactory.createDefaultModel();

// create the resource
Resource johnSmith = model.createResource(personURI);

// add the property
johnSmith.addProperty(VCARD.FN, fullName);

这些代码先定义了一些常量, 然后使用了ModelFactory类中的createDefaultMode()方法创建了一个空的基于内存存储的模型(Model 或 model). Jena还包含了Model接口的其他实现方式. 例如, 使用关系数据库的, 这些类型 Model接口也可以从ModelFactory中创建.
于是John Smith这个资源就被创建了, 并向其添加了一个属性. 此属性由一个"常" ("constant")类VCARD提供, 这个类保存了在VCARD模式(schema)中所有定义的表示对象. Jena也为其他一些著名的模式提供了常类的表示方法, 例如是RDF和RDF模式, Dublin 核心标准和DAML.
创建资源和添加属性的代码可以写成更紧凑的层叠形式:
Resource johnSmith =
        model.createResource(personURI)
             .addProperty(VCARD.FN, fullName);
这个例子的工作代码可以在Jena发布的材料的教程包中的Tutorial1中找到. 作为练习, 你自己可以获得此代码并修改其以创建一个简单VCARD.
现在让我们为vcard再增加一些更详细的内容, 以便探索更多的RDF和Jena的特性.
在第一个例子里, 属性值为一个文本. 然而RDF属性也可以采用其他的资源作为其属性值. 下面这个例子使用常用的RDF技术展示了如何表示John Smith名字的不同部分:

在这里我们增加了一个新的属性, vcard:N, 来表示John Smith名字的结构. 这个模型有几点有趣之处. 注意属性vcard:N使用一个资源作为起属性值. 同时注意代表复合名字的椭圆并没有URI标识. 它被认为是一个空白结点(blank Node).
创建此例的Jena代码也十分简单. 首先是一些声明和对空模型的创建.
// some definitions
String personURI    = "http://somewhere/JohnSmith";
String givenName    = "John";
String familyName   = "Smith";
String fullName     = givenName + " " + familyName;

// create an empty Model
Model model = ModelFactory.createDefaultModel();

// create the resource
//   and add the properties cascading style
Resource johnSmith
  = model.createResource(personURI)
         .addProperty(VCARD.FN, fullName)
         .addProperty(VCARD.N,
                      model.createResource()
                           .addProperty(VCARD.Given, givenName)
                           .addProperty(VCARD.Family, familyName));
此例的工作代码可以在Jena发布材料的教程包的Tutorial2中得到.
________________________________________
陈述
RDF模型中的每一个箭头表示为一个陈述(statement). 每一个陈述声明了关于某个资源的某个事实. 一个陈述有三部分组成.
主体, 也就是箭头的出发的资源.
谓词, 也就是标识箭头的属性.
客体, 也就是箭头所指向的那个资源或文本.
一个陈述有时也叫做一个三元组的原因就是它由三部分组成.
一个RDF模型(译者注: 指Jena中的接口Model)是由一组陈述所组成的. 在Tutorial2中, 每调用一次addProperty函数就会在模型中增加另一个陈述. (因为一个模型是由一组陈述组成的, 所以增加一个重复的陈述并不会产生任何意义.) Jena模型接口定义了一个listStatements()方法, 此方法会返回一个StmtIterator类型的变量. StmtItor是Java中Iterator的一个子类型, 这个StmtIterator变量重复迭代了该接口模型中的所有陈述. StmtIterator类型中有一个方法nextStatement(), 该方法会从iterator返回下一个陈述. (就和next()返回的一样, 但是已将其映射为Statement类型). 接口Statement提供了访问陈述中主体, 谓词和客体的方法.
现在我们会用使用那个接口来扩展Tutorial2, 使起列出所有的创建的陈述并将它们打印出来. 此例完整的代码可以在Tutorial3中找到.
// list the statements in the Model
StmtIterator iter = model.listStatements();

// print out the predicate, subject and object of each statement
while (iter.hasNext()) {
    Statement stmt      = iter.nextStatement();  // get next statement
    Resource  subject   = stmt.getSubject();     // get the subject
    Property  predicate = stmt.getPredicate();   // get the predicate
    RDFNode   object    = stmt.getObject();      // get the object

    System.out.print(subject.toString());
    System.out.print(" " + predicate.toString() + " ");
    if (object instanceof Resource) {
       System.out.print(object.toString());
    } else {
        // object is a literal
        System.out.print(" \"" + object.toString() + "\"");
    }

    System.out.println(" .");
}
因为一个陈述的客体可以是一个资源也可以是一个文本. getObject()方法会返回一个类型为RDFNode的客体, RDFNode是Resource和Literal类共同的超类. 为了确定本例中的客体确切的类型, 代码中使用 instanceof来确定其类型和相应的处理.
运行后, 此程序回产生与此相似的输出:
http://somewhere/JohnSmith http://www.w3.org/2001/vcard-rdf/3.0#N anon:14df86:ecc3dee17b:-7fff.
anon:14df86:ecc3dee17b:-7fff http://www.w3.org/2001/vcard-rdf/3.0#Family  "Smith".
anon:14df86:ecc3dee17b:-7fff http://www.w3.org/2001/vcard-rdf/3.0#Given  "John" .
http://somewhere/JohnSmith http://www.w3.org/2001/vcard-rdf/3.0#FN  "John Smith".

现在你明白了为什么模型构建会更加清晰. 如果你仔细观察, 就会发现上面每一行都由三个域组成, 这三个域分别代表了每一个陈述的主体, 谓词和客体. 在此模型中有四个箭头, 所以会有四个陈述. "anon:14df86:ecc3dee17b:-7fff"是有Jena产生的一个内部标识符, 它不是一个URI, 也不应该与URI混淆. 它只是Jena处理时使用的一个内部标号.
W3C的RDF核心工作小组定义了一个类似的表示符号称为N-三元组(N-Triples). 这个名字表示会使用"三元组符号". 在下一节中我们会看到Jena有一个内置的N-三元组写机制(writer).
________________________________________
写RDF
Jena设有读写XML形式的RDF方法. 这些方法可以被用来将一个RDF模型保存到文件并在日后重新将其读回.
Tutorial3创建了一个模型并将其以三元组的形式输出. Tutorial4对Tutorial3做了修改, 使其将此模型以RDF XML的形式输出到标准输出流中. 这个代码依然十分简单: model.write可以带一个OutputStream的参数.
// now write the model in XML form to a file
model.write(System.out);

应该有类似的输出:
<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:vcard='http://www.w3.org/2001/vcard-rdf/3.0#'
>
  <rdf:Description rdf:about='http://somewhere/JohnSmith'>
    <vcard:FN>John Smith</vcard:FN>
    <vcard:N rdf:nodeID="A0"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A0">
    <vcard:Given>John</vcard:Given>
    <vcard:Family>Smith</vcard:Family>
  </rdf:Description>
</rdf:RDF>

W3C的RDF规格说明书规定了如何用 XML的形式来表示RDF. RDF XML的语法十分复杂. 读者可以在RDF核心工作小组制定的RDF入门篇(primer)中找到更详细的指导. 但是不管怎么样, 让我们先迅速看一下应该如何解释上面的RDF XML输出
RDF常常嵌入在一个<rdf:RDF>元素中. 如果有其他的方法知道此XML是RDF的话,该元素是可以不写的. 然而我们常常会使用它. 在这个RDF元素中定义了两个在本文档中使用的命名空间. 接下来是一个<rdf:Description>元素, 此元素描述了URI为"http://somewhere/JohnSmith"的资源. 如果其中的rdf:about属性被省略的话, 这个元素就表示一个空白结点.
<vcard:FN>元素描述了此资源的一个属性. 属性的名字"FN"是属于vcard命名空间的. RDF会通过连接命名空间前缀的URI和名字局部名"FN"来形成该资源的URI "http://www.w3.org/2001/vcard-rdf/3.0#FN". 这个属性的值为文本"John Smith".
<vcard:N>元素是一个资源. 在此例中, 这个资源是用一个相对URI来表示的. RDF会通过连接这个相对URI和此文档的基准URI来把它转换为一个绝对URI.
但是, 在这个RDF XML输出中有一个错误, 它并没有准确地表示我们所创建的模型. 模型中的空白结点被分配了一个URI. 它不再是空白的了. RDF/XML语法并不能表示所有的RDF模型. 例如它不能表示一个同时是两个陈述的客体的空白结点. 我们用来写这个RDF/XML的'哑'writer方法并没有试图去正确的书写这个模型的子集, 虽然其原本可以被正确书写. 它给每一个空白结点一个URI, 使其不再空白.
Jena有一个扩展的接口, 它允许新的为不同的RDF串行化语言设计的writer可以被轻易地插入. 以上的调用会激发一个标准的'哑'writer方法. Jena也包含了一个更加复杂的RDF/XML writer, 它可以被用携带另一个参数的write()方法所调用.
// now write the model in XML form to a file
model.write(System.out, "RDF/XML-ABBREV");

此writer, 也就是所谓的PrettyWriter, 利用RDF/XML缩写语法把模型写地更为紧凑. 它也能保存尽可能保留空白结点. 然而, 它并不合适来输出大的模型. 因为它的性能不可能被人们所接受. 要输出大的文件和保留空白结点, 可以用N-三元组的形式输出:
// now write the model in XML form to a file
model.write(System.out, "N-TRIPLE");

这会产生类似于Tutorial3的输出, 此输出会遵循N-三元组的规格.

________________________________________
读RDF
Tutorial 5 演示了如何将用RDF XML记录的陈述读入一个模型. 在此例中, 我们提供了一个小型RDF/XML形式的vcard的数据库. 下面代码会将其读入和写出. 注意: 如果要运行这个小程序, 应该把输入文件放在你的classpath所指向的目录或jar中.

// create an empty model
Model model = ModelFactory.createDefaultModel();

// use the class loader to find the input file
InputStream in = Tutorial05.class
                               .getClassLoader()
                               .getResourceAsStream(inputFileName);
if (in == null) {
    throw new IllegalArgumentException(
                                 "File: " + inputFileName + " not found");
}

// read the RDF/XML file
model.read(new InputStreamReader(in), "");

// write it to standard out
model.write(System.out);

read()方法中的第二个参数是一个URI, 它是被用来解决相对URI的. 因为在测试文件中没有使用相对URI, 所以它允许被置为空值. 运行时, Tutorial5会产生类似如下的XML输出
<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:vcard='http://www.w3.org/2001/vcard-rdf/3.0#'
>
  <rdf:Description rdf:nodeID="A0">
    <vcard:Family>Smith</vcard:Family>
    <vcard:Given>John</vcard:Given>
  </rdf:Description>
  <rdf:Description rdf:about='http://somewhere/JohnSmith/'>
    <vcard:FN>John Smith</vcard:FN>
    <vcard:N rdf:nodeID="A0"/>
  </rdf:Description>
  <rdf:Description rdf:about='http://somewhere/SarahJones/'>
    <vcard:FN>Sarah Jones</vcard:FN>
    <vcard:N rdf:nodeID="A1"/>
  </rdf:Description>
  <rdf:Description rdf:about='http://somewhere/MattJones/'>
    <vcard:FN>Matt Jones</vcard:FN>
    <vcard:N rdf:nodeID="A2"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A3">
    <vcard:Family>Smith</vcard:Family>
    <vcard:Given>Rebecca</vcard:Given>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A1">
    <vcard:Family>Jones</vcard:Family>
    <vcard:Given>Sarah</vcard:Given>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A2">
    <vcard:Family>Jones</vcard:Family>
    <vcard:Given>Matthew</vcard:Given>
  </rdf:Description>
  <rdf:Description rdf:about='http://somewhere/RebeccaSmith/'>
    <vcard:FN>Becky Smith</vcard:FN>
    <vcard:N rdf:nodeID="A3"/>
  </rdf:Description>
</rdf:RDF>

________________________________________
Jena RDF 包
Jena是一个为语义网应用设计的一个Java API. 对应用开发者而言, 主要可用的RDF包是com.hp.hpl.jena.rdf.model. 因为API是以接口的方式定义的, 所以应用代码可以使用不同的实现机制而不用改变代码本身. 这个包包含了可以表示模型, 资源, 属性, 文本, 陈述和其他RDF关键概念的接口, 还有一个用来创建模型的ModelFactory. 所以如果要应用代码与实现类保持独立, 最好尽可能地使用接口, 而不要使用特定的实现类.
com.hp.hpl.jena.Tutorial包包含了本教程所有例子中使用到的工作源代码.
com.hp.hpl.jena.impl这些包包含了许多执行时所常用的执行类. 比如, 它们定义了诸如ResourseImpl, PropertyImpl和LiteralImpl的类, 这些类可以被不同的应用直接使用也可以被继承使用. 应用程序应该尽可能少地直接使用这些类. 例如, 与其使用ResouceImpl来创建一个新的实例, 更好的办法是使用任何正在使用的模型的createResource方法来完成. 那样的话, 如果模型的执行采用了一个优化的Resouce执行, 那么在这两种类型中不需要有任何的转换工作.

________________________________________
操纵模型
到目前为止, 本教程主要讲述的是如何创建, 读入和输出RDF模型. 现在是时候要讲述如何访问模型中的信息.
如果有了一个资源的URI, 那么就可以用Model.getResource(String uri)来从模型获取这个资源对象. 这个方法被定义来返回一个资源对象, 如果它确实存在于模型中, 否则的话就创建一个新的. 例如, 如何从模型中获取Adam Smith资源, 这个模型是Tutorial5中从文件读入的:
// retrieve the John Smith vcard resource from the model
Resource vcard = model.getResource(johnSmithURI);

Resouce接口定义了一系列用于访问某个资源的属性的方法. Resource.getProperty(Property p)方法访问了该资源的属性. 这个方法不允许通常的Java访问的转换, 因为所返回的对象是Statement, 而不是你所预计的Property. 返回整个陈述的好处是允许应用程序通过使用它的某个访问方法来访问该陈述的客体来访问这个属性值. 例如如何获取作为vcard:N属性值的资源:
// retrieve the value of the N property
Resource name = (Resource) vcard.getProperty(VCARD.N)
.getObject();

一般而言, 一个陈述的客体可以是一个资源或是一个文本. 所以此应用程序代码知道这个值一定是个资源, 就将类型资源映射到返回的对象上. Jena的目标之一是提供会返回值为特定类型的方法, 这样,应用程序就不必再做类型转换工作, 也不必再编译时做类型检查工作. 以上的代码片段也可以写成更方便的形式:
// retrieve the value of the FN property
Resource name = vcard.getProperty(VCARD.N)
.getResource();
类似地, 属性的文本值也可以被获取:
// retrieve the given name property
String fullName = vcard.getProperty(VCARD.FN)
.getString();

在这个例子中, 资源vcard只有一个vcard:FN属性和一个vcard:N属性. RDF允许资源有重复的属性, 例如Adam可能有超过一个的昵称. 让我们假设他有两个昵称:
// add two nickname properties to vcard
vcard.addProperty(VCARD.NICKNAME, "Smithy")
.addProperty(VCARD.NICKNAME, "Adman");

正如前面所提到的那样, Jena将RDF模型表示为一组陈述, 所以在模型中新增一个与原有陈述有着相同的主体,谓词和客体的陈述并不会后什么作用. Jena没有定义会返回模型中存在的两个昵称中的哪一个. Vcard.getProperty(VCARD.NICKNAME)调用的结果是不确定的. Jena会返回这些值中的某一个, 但是并不保证两次连续的调用会同一个值.
一个属性很有可能会出现多次, 而方法Resource.listProperty(Property p)可以用来返回一个iterator, 这个iterator会列出所有的值. 此方法所返回的iterator返回的对象的类型为Statement.我们可以像这样列出所有的昵称:
// set up the output
System.out.println("The nicknames of \""
                      + fullName + "\" are:");
// list the nicknames
StmtIterator iter = vcard.listProperties(VCARD.NICKNAME);
while (iter.hasNext()) {
    System.out.println("    " + iter.nextStatement()
                                    .getObject()
                                    .toString());
}

此代码可以在Tutorial6中找到, 运行后会产生如下输出:
The nicknames of "John Smith" are:
Smithy
Adman
一个资源的所有属性可以用不带参数的listStatement()方法列出.
________________________________________
查询模型
前一节讨论了如何通过一个有着已知URI的资源来操纵模型. 本节要讨论查询模型. 核心的Jena API只支持一些有限的查询原语. 对于更强大查询设备RDQL的介绍不在此文档中.
列出模型所有陈述的Model.listStatements()方法也许是最原始的查询模型方式. 然而并不推荐在大型的模型上使用这个方法. 类似的有Model.listSubjects(), 但其所返回的iterator会迭代所有含有属性的资源, 例如是一些陈述的主体.
Model.listSubjectsWithProperty(Property p, RDFNode o)方法所返回的iterator跌代了所有具有属性p且p属性的值为o的资源. 我们可能会预计使用rdf:type属性来搜索资源的类型属性以获得所有的vcard资源:
// retrieve all resource of type Vcard.
ResIterator iter = model.listSubjectsWithProperty(RDF.type, VCARD.Vcard);

然而, 不幸的是, 我们现在正在使用的vcard模式并没有为vcard定义类型. 然而, 如果我们假设只有类型为vcard的资源才会使用vcard:FN属性, 并且在我们的数据中, 所有此类资源都有这样一个属性, 那么我们就可以像这样找到所有的vcard:
// list vcards
ResIterator iter = model.listSubjectsWithProperty(VCARD.FN);
while (iter.hasNext()) {
Resource r = iter.nextResource();
...
}

所有的这些查询方法不过是在原语查询方法model.listStatements(Select s)上稍做变化而已. 此方法会返回一个iterator, 该iterator会跌代模型中所有被s选中的陈述. 这个selector接口被设计成可扩展的, 但是目前, 它只有一个执行类,那就是com.hp.hpl.jena.rdf.model包中的SimpleSelector类. 在Jena中使用SimpleSelector是很少见的情况, 即当需要直接使用一个特定类而不是使用接口. SimpleSelector的构造函数带有三个参数:

Selector selector = new SimpleSelector(subject, predicate, object)

这个selector会选择所有主体与参数subject相配, 谓词与参数predicate相配, 且客体与参数object相配的陈述.
如果某个参数值为null, 则它表示与任何值均匹配; 否则的话, 它们就会匹配一样的资源或文本. (当两个资源有相同的URI或是同一个空白结点时, 这两个资源就是一样的; 当两个文本是一样的当且仅当它们的所有成分都是一样的.) 所以:
Selector selector = new SimpleSelector(null, null, null);

会选择模型中所有的陈述.
Selector selector = new SimpleSelector(null, VCARD.FN, null);

会选择所有谓词为VCARD.FN的陈述, 而对主体和客体没有要求. 作为一个特殊的缩写,
listStatements( S, P, O )
等同与
listStatements( new SimpleSelector( S, P, O ) )
下面的代码可以在Tutorial7中找到, 列举了数据库中所有vcard中的全名.
// select all the resources with a VCARD.FN property
ResIterator iter = model.listSubjectsWithProperty(VCARD.FN);
if (iter.hasNext()) {
    System.out.println("The database contains vcards for:");
    while (iter.hasNext()) {
        System.out.println("  " + iter.nextStatement()
                                      .getProperty(VCARD.FN)
                                      .getString());
    }
} else {
    System.out.println("No vcards were found in the database");
}

这个会产生类似如下的输出:
The database contains vcards for:
  Sarah Jones
  John Smith
  Matt Jones
  Becky Smith

你的下一个练习是修改此代码以使用SimpleSelector而不是使用listSubjectsWithProperty来达到相同的效果.
让我们看看如何对所选择的陈述实行更好的控制. SimpleSelector可以被继承, 它的select方法可以被修改来实现更好的过滤:
// select all the resources with a VCARD.FN property
// whose value ends with "Smith"
StmtIterator iter = model.listStatements(
    new SimpleSelector(null, VCARD.FN, (RDFNode) null) {
        public boolean selects(Statement s)
            {return s.getString().endsWith("Smith");}
});

这个示例使用了一个简洁的Java技术, 就是当创建此类的一个实例时重载一个内联的方法. 这里selects(…)方法会检查以保证全名以"Smith"做结尾. 重要的是要注意对主体, 谓语和客体的过滤是在调用selects(…)方法之前的执行的, 所以额外的测试只会被应用于匹配的陈述.
完整的代码可以在Tutorial8中找到, 并会产生如下的输出:
The database contains vcards for:
  John Smith
  Becky Smith
你也许会认为下面的代码:
// do all filtering in the selects method
StmtIterator iter = model.listStatements(
  new
      SimpleSelector(null, null, (RDFNode) null) {
          public boolean selects(Statement s) {
              return (subject == null   || s.getSubject().equals(subject))
                  && (predicate == null || s.getPredicate().equals(predicate))
                  && (object == null    || s.getObject().equals(object))
          }
     }
});
等同于:
StmtIterator iter =
  model.listStatements(new SimpleSelector(subject, predicate, object)

虽然在功能上它们可能是等同的, 但是第一种形式会列举出模型中所有的陈述, 然后再对它们进行逐一的测试. 而第二种形式则允许执行时建立索引来提供性能. 你可以在一个大模型中自己试试验证一下, 但是现在先为自己倒一杯咖啡休息一下吧.
________________________________________
对模型的操作
Jena提供了把模型当作一个集合整体来操纵的三种操作方法. 即三种常用的集合操作:并, 交和差.
两个模型的并操作就是把表示两个模型的陈述集的并操作. 这是RDF设计所支持的关键操作之一. 它此操作允许把分离的数据源合并到一起. 考虑下面两个模型:

和
当它们被合并时, 两个http://...JohnSmith会合并成一个, 重复的vcard:FN箭头会被丢弃, 此时就会产生:

让我们看一下这个代码的功能(完整的代码在Tutorial9中), 再看看会发生什么.
// read the RDF/XML files
model1.read(new InputStreamReader(in1), "");
model2.read(new InputStreamReader(in2), "");

// merge the Models
Model model = model1.union(model2);

// print the Model as RDF/XML
model.write(system.out, "RDF/XML-ABBREV");

petty writer会产生如下的输出:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#">
  <rdf:Description rdf:about="http://somewhere/JohnSmith/">
    <vcard:EMAIL>
      <vcard:internet>
        <rdf:value>John@somewhere.com</rdf:value>
      </vcard:internet>
    </vcard:EMAIL>
    <vcard:N rdf:parseType="Resource">
      <vcard:Given>John</vcard:Given>
      <vcard:Family>Smith</vcard:Family>
    </vcard:N>
    <vcard:FN>John Smith</vcard:FN>
  </rdf:Description>
</rdf:RDF>

即便你不熟悉RDF/XML的语法细节, 你仍然可以清楚的看见模型如同预期般被合并了. 我们可以在类似的方式下运算模型的交操作和差操作.
________________________________________
容器
RDF定义了一类特殊的资源来表示事物的集合. 这些资源称为容器. 一个容器的成员可以是资源也可以是文本. 有三类容器:
一个BAG是一个无序的集合.
一个ALT是一个用来表示备选项的无序的集合.
一个SEQ是一个有序的集合.
一个容器由一个资源表示. 该资源会有一个rdf:type属性, 属性值为rdf:Bag, 或rdf:Alt, 或是rdf:Seq, 再或是这些类型的子类型, 这取决于容器的类型. 容器的第一个成员是容器的rdf:_1的属性所对应的属性值; 第二个成员是容器的rdf:_2属性的值, 依此类推. 这些rdf:_nnn属性被称为序数属性.
例如, 一个含有Smith的vcard的简单bag容器的模型可能会看起来是这样的:

虽然bag容器的成员们被rdf:_1,rdf_2等等的属性所表示, 但是这些属性的顺序却并不重要. 即便我们将rdf:_1和rdf:_2的属性值交换, 但是交换后的模型仍然表示相同的信息.
Alt是设计用来表示被选项的. 例如, 我们假定有一个表示软件产品的资源. 它可能有一个属性指示从哪里可以获得次软件产品. 这个属性值可能是一个包含各种下载地址的Alt集合. Alt是无序的, 除了rdf:_1属性有着特殊的意义, 它表示默认的选项.
尽管我们可以用基本的资源和属性机制来处理容器, Jena为处理容器设计了显式的接口和执行类. 要避免在使用一个对象来操作容器的同时去使用低层方法来改变容器.
让我们修改Tutorial8以创建一个bag:
// create a bag
Bag smiths = model.createBag();

// select all the resources with a VCARD.FN property
// whose value ends with "Smith"
StmtIterator iter = model.listStatements(
    new SimpleSelector(null, VCARD.FN, (RDFNode) null) {
        public boolean selects(Statement s) {
                return s.getString().endsWith("Smith");
        }
    });
// add the Smith's to the bag
while (iter.hasNext()) {
    smiths.add(iter.next().getSubject());
}

如果我们将次模型输出可以看见它含有类似如下的成分:
<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:vcard='http://www.w3.org/2001/vcard-rdf/3.0#'
>
...
  <rdf:Description rdf:nodeID="A3">
    <rdf:type rdf:resource='http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag'/>
    <rdf:_1 rdf:resource='http://somewhere/JohnSmith/'/>
    <rdf:_2 rdf:resource='http://somewhere/RebeccaSmith/'/>
  </rdf:Description>
</rdf:RDF>

这表示了Bag资源.
容器接口提供了一个iterator来列举容器的内容:
// print out the members of the bag
NodeIterator iter2 = smiths.iterator();
if (iter2.hasNext()) {
    System.out.println("The bag contains:");
    while (iter2.hasNext()) {
        System.out.println("  " +
            (Resource) iter2.next())
                            .getProperty(VCARD.FN)
                            .getString());
    }
} else {
    System.out.println("The bag is empty");
}

并会产生如下的输出:
The bag contains:
John Smith
Becky Smith

本例的可执行代码可以在Tutorial10中找到.
Jena类所提供的操纵容器的方法包括增加新成员, 在容器中间插入新成员和删除已有的成员. Jena容器类目前保证所使用的序数属性列表会从rdf:_1开始并且是相邻的. RDF核心工作小组放松了此项限制, 以允许有局部的容器表示. 所以这是Jena将来可能会修改的地方之一.

________________________________________
关于文本(Literals)和数据类型的更多探讨
RDF文本(literals)并不仅仅是简单的字符串而已. 文本可能有一个语言标签来指示该文本使用的语言. 有英语语言标签的文本"chat"会被认为与有着法语语言标签的文本"chat"是不同的. 这个奇怪的特性是原有RDF/XML语法产生的赝象(artefact).
另外, 事实上共有两种文本. 在一种里, 字符串成分只是简单的字符串. 而在另一种里, 字符串成分被预计为格式良好的XML片段. 当一个RDF模型被写成RDF/XML形式时, 一个特殊的使用parseType='Literal'的属性(attribute)构造会被使用来表示它.
在Jena中, 当一个文本被创建时, 这些属性就被设置了. 例如, 在Tutorial11中:
// create the resource
Resource r = model.createResource();

// add the property
r.addProperty(RDFS.label, model.createLiteral("chat", "en"))
.addProperty(RDFS.label, model.createLiteral("chat", "fr"))
.addProperty(RDFS.label, model.createLiteral("<em>chat</em>", true));

// write out the Model
model.write(system.out);

会产生:

<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'
>
  <rdf:Description rdf:nodeID="A0">
    <rdfs:label xml:lang='en'>chat</rdfs:label>
    <rdfs:label xml:lang='fr'>chat</rdfs:label>
    <rdfs:label xml:lang='en' rdf:parseType='Literal'><em>chat</em></rdfs:label>
  </rdf:Description>
</rdf:RDF>

如果两个文本被认为是相同的, 它们一定都是XML的文本或都是简单的文本. 另外, 它们两个要么都没有语言标签, 要么有相同的语言标签. 对简单的文本而言, 两者的字符串一定是相同的. XML文本的等同有两点要求. 第一, 前面提到的要求必须被满足并且字符串必须相同. 第二, 如果它们字符串的cannonicalization一样的话, 它们就是一样的.(译者注: 找不到cannonicalization中文的解释.).
Jena接口也支持类型文字. 老式的对待类型文字的方法是把他们当成是字符串的缩写: 有类型的值会通过Java的常用方法转换为字符串, 并且这些字符串被存储在模型中. 例如, 可以试一试(注意:对于简单类型文字, 我们可以省略对model.createLiteral(…)的调用):
// create the resource
Resource r = model.createResource();

// add the property
r.addProperty(RDFS.label, "11")
.addProperty(RDFS.label, 11);

// write out the Model
model.write(system.out, "N-TRIPLE");

产生的输出如下:
_:A... <http://www.w3.org/2000/01/rdf-schema#label> "11" .

因为两个文本都是字符串"11", 所以只会有一个陈述被添加.
RDF核心工作小组定义了支持RDF数据类型的机制. Jena支持那些使用类型文字的机制; 但是本教程中不会对此讨论.
________________________________________
术语表
空白结点
表示一个资源, 但是并没有指示该资源的URI. 空白结点的作用如同第一逻辑中的存在符合变量.
Dublin 核心
一个关于网络资源的元数据标准. 更详细的信息可以在Dublin Core web site找到.
文本
一个可以作为属性值的字符串.
客体
三元组的一部分, 也就是陈述的值.
谓词
三元组的属性部分.
属性
属性(property)是资源的一个属性(attribute). 例如, DC.title是一个属性, RDF.type也是一个属性.
资源
某个实体. 它可以是一个网络资源, 例如一个网页; 它也可以是一个具体的物理对象, 例如一棵树或一辆车; 它也可以是一个抽象的概念, 例如国际象棋或足球. 资源由URI命名.
陈述
RDF模型中的一个箭头, 通常被解理解为一个事实
主体
RDF模型中箭头出发点的那个资源.
三元组
一个含有主体, 谓词和客体的结构. 是陈述的另一种称呼.
________________________________________
脚注
RDF资源的标签可以包括一个片段标签, 例如http://hostname/rdf/Tutorial/#ch-Introduction, 所以, 严格地讲, 一个RDF资源是由一个URI表示的.
文本可以表示为一个字符串, 也可以有一个可选的语言编码来表示该字符串的语言. 例如, 文本可能有一个表示英语的语言编码"en", 而文本"deux"可能有一个表示法语的语言编码"fr".

posted on 2007-06-07 14:44 小姜阅读(1287) 评论(1) 编辑收藏举报

刷新页面返回顶部

小姜在线