XML

XML (Extensible Markup Language) was first formulated in the late 1990s by the World Wide Web Consortium (W3C), which published XML 1.0 as a Recommendation in 1998.

XML has been used to create a variety of widely used internet languages. Among these are:

  • Wireless Markup Language – or WML – the markup language of the Wireless Application Protocol – or WAP – for handheld devices
  • Extensible HyperText Markup Language – or XHTML – a reformulation of HTML as an XML language
  • Really Simple Syndication – or RSS – a format for news feeds
  • Web Services Description Language – or WSDL – for describing available web services
  • Synchronized Multimedia Integration Language – or SMIL – for describing multimedia for the Web
  • Resource Description Framework – or RDF – and Web Ontology Language – or OWL – for describing resources and ontologies

When elements contain only other elements, they are said to have element content. When a child element contains only text and no child elements of its own, it’s said to have simple content.
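
As a small illustration of the two kinds of content, consider the sketch below. The element names and values are hypothetical and chosen only for this example.

```xml
<!-- <member> contains only other elements, so it has element content.
     <name> and <month> contain only text, so they have simple content. -->
<member>
  <name>Ana Ruiz</name>
  <month>June</month>
</member>
```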

 

DTD

A Document Type Definition, or DTD, is a set of declarations that contain the structure rules for the validation of XML. It lists the elements a document can use, the child elements each parent element can contain, and the attributes each element can carry. A DTD can’t assign data types such as numbers or dates to element content.

To include an internal or inline DTD in an XML document, you insert the DTD declaration after the XML version declaration but before the root XML element.
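
A minimal sketch of that placement might look like the following. The membership, join, and month elements and their values are used purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- The inline DTD sits after the XML declaration and before the root element. -->
<!DOCTYPE membership [
  <!ELEMENT membership (join, month)>
  <!ELEMENT join (#PCDATA)>
  <!ELEMENT month (#PCDATA)>
]>
<membership>
  <join>2024</join>
  <month>June</month>
</membership>
```

The name after DOCTYPE has to match the root element, which is why both read membership here.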

You can set the standalone attribute in the XML declaration to yes to indicate that the document doesn’t rely on external declarations, which in practice restricts it to an internal DTD. If you leave the attribute out, it defaults to no, which allows either internal or external DTDs.

When you view an XML document that includes an inline DTD in a browser, the DTD isn’t shown in the rendered view. However, the DTD is still included in the source code for the page.

External DTD

There are two ways to refer to an external DTD document. You can use an absolute address, such as a URL or file path. Or you can use a relative file path.

Inline DTD Syntax: <!DOCTYPE rootnode [inline_DTD]>

To use an external DTD, you must specify either the SYSTEM or PUBLIC keyword in the DOCTYPE declaration. SYSTEM typically specifies a private DTD, such as one that you wrote, whereas PUBLIC specifies a DTD defined by a standards body like the W3C and made available to the public.

For a SYSTEM DTD, you specify the location of the DTD file using either an absolute or relative path.

You can also use a combination of external and inline validation. To do this, you refer to the external DTD, then you extend it using an inline DTD.

For a PUBLIC declaration, you need to include an identifier. The identifier starts with a plus or minus sign. The plus sign indicates that the owner of the DTD is formally registered, whereas the minus sign indicates an unregistered owner, which is what most published DTDs, including those from the W3C, use. The sign is followed by the name of the body that created the DTD. The DTD name and the language used make up the rest of the identifier, with the parts separated by double slashes.

Syntax: <!DOCTYPE rootnode PUBLIC "identifier" "URI">
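
The three alternative declarations below sketch the external forms side by side; a real document would carry only one of them. The file name members.dtd and the note element are hypothetical, while the PUBLIC example uses the published XHTML 1.0 Strict identifier and URI.

```xml
<!-- 1. SYSTEM with a relative path to a DTD file -->
<!DOCTYPE membership SYSTEM "members.dtd">

<!-- 2. PUBLIC with an identifier followed by a URI -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- 3. External DTD extended by an inline declaration -->
<!DOCTYPE membership SYSTEM "members.dtd" [
  <!ELEMENT note (#PCDATA)>
]>
```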

Defining elements in DTD

Syntax: <!ELEMENT [element name] [specifications]>

In the specifications for an element, you can include:

  • ANY

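Any combination of character data and declared child elements is allowed

  • (#PCDATA)

Parsed character data only; text is allowed but child elements are not

  • EMPTY

The element has no content, although it can still carry attributes

  • Child elements

In a comma-separated list enclosed in parentheses; the child elements must appear in the order listed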

  • Re-used elements

If two child elements have the same name, you list them twice in the declaration for the parent element. However, you need to declare the child element only once if it’s a child of two different parent elements.

  • Qualifiers

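There are a number of qualifiers you can include after a child element name. You include + to specify that the child element must appear once or more than once for that parent element. You include ? to specify that there can be only a single instance or no instance of the child within that parent element. And you include * to indicate that the child element can appear any number of times, or not at all. With no qualifier, the child element must appear exactly once.

  • Choice of child elements

You separate the alternatives with the pipe character inside the parentheses; exactly one of them can appear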
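
The element specifications above can be combined in a single DTD. The following is a sketch with hypothetical element names, not a definitive model.

```xml
<?xml version="1.0"?>
<!DOCTYPE library [
  <!-- owner exactly once, then one or more book elements (+) -->
  <!ELEMENT library (owner, book+)>
  <!ELEMENT owner (#PCDATA)>
  <!-- subtitle is optional (?), author may repeat or be absent (*),
       and exactly one of isbn or issn must appear (|) -->
  <!ELEMENT book (title, subtitle?, author*, (isbn | issn), cover)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT subtitle (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT isbn (#PCDATA)>
  <!ELEMENT issn (#PCDATA)>
  <!-- cover carries no content at all -->
  <!ELEMENT cover EMPTY>
]>
<library>
  <owner>City Library</owner>
  <book>
    <title>XML Basics</title>
    <author>A. Writer</author>
    <author>B. Editor</author>
    <isbn>0000000000</isbn>
    <cover/>
  </book>
</library>
```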

 

Instead of using a child element, you can include an attribute for an element in the DTD. You use <!ATTLIST> to declare an attribute. You include the element name, the attribute name, and the attribute type. And you specify a default value.

Syntax: <!ATTLIST element_name attribute_name attribute_type "default_value">

There are a number of attribute data type declarations:

  • CDATA

Any character data is allowed. However, the reserved characters < and & must still be written as entities.

  • ID

A unique name that identifies the element; no other ID value in the document can be the same

  • IDREF

A reference to an ID value, which must have been provided for another element in the document

  • IDREFS

Multiple ID references separated by whitespace are allowed

  • ENTITY

The name of an unparsed entity, such as an image file, that has been declared in the DTD

  • ENTITIES

You can provide multiple entity names, each separated by a space

  • NMTOKEN

An XML name token can be provided. This allows only a smaller set of characters than CDATA to be used: letters, digits, periods, hyphens, underscores, and colons

  • NMTOKENS

Multiple name tokens separated by whitespace are allowed

  • NOTATION

The name of a notation can be provided. However, the notation must have been declared in the DTD

There are three keywords you can use after the attribute data type:

The #REQUIRED keyword makes the attribute mandatory; the DTD supplies no default, so every instance of the element must provide a value.

The #IMPLIED keyword makes the attribute optional.

The #FIXED keyword provides a value that can’t be changed for the attribute.

Instead of specifying an attribute type, you can provide a list of possible attribute values. You list the options in parentheses and separate each of them using the pipe character.

You can provide a default value in quotation marks, which is used if none is specified in the XML.
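
To make the attribute types, keywords, and value lists more concrete, here is a sketch of one ATTLIST declaration. The catalog and item elements and all attribute names are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!ELEMENT catalog (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item
    id       ID            #REQUIRED
    related  IDREF         #IMPLIED
    note     CDATA         #IMPLIED
    unit     CDATA         #FIXED "each"
    status   (new | used)  "new">
]>
<catalog>
  <!-- status is omitted here, so the declared default "new" applies -->
  <item id="i1">Lamp</item>
  <!-- related must match the id of another element -->
  <item id="i2" related="i1" status="used">Shade</item>
</catalog>
```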

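Certain characters have special meanings in XML and in a DTD. To use them as ordinary text, you write the predefined entities instead.

&lt; = < (less than)

&gt; = > (greater than)

&quot; = " (quotation mark)

&apos; = ' (apostrophe)

&amp; = & (ampersand)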

Creating your own entity

Syntax: <!ENTITY entity_key "entity_translated_value">

To use the entity in a document, you wrap the entity key in an ampersand and a semicolon, as in &entity_key;.

You can create a separate document that contains all the entities you want to use. You then refer to the external entities XML file using the SYSTEM keyword. You include the entity key and the URI of the external entity XML file.

Syntax: <!ENTITY entity_key SYSTEM "URI">
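
The sketch below shows both kinds of entity declaration and how they are referenced. The company and terms keys and the file terms.xml are hypothetical, and the example assumes that terms.xml holds plain text. Note that non-validating parsers are not obliged to fetch external entities.

```xml
<?xml version="1.0"?>
<!DOCTYPE notice [
  <!ELEMENT notice (#PCDATA)>
  <!-- Internal entity: the replacement text is given inline -->
  <!ENTITY company "Example Widgets Ltd">
  <!-- External entity: the replacement text lives in another file -->
  <!ENTITY terms SYSTEM "terms.xml">
]>
<notice>&company; sells nuts &amp; bolts. &terms;</notice>
```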

XML Schemas

XML Schemas are a W3C-supported, XML-based alternative to DTDs. They define the required structure, content, and semantics of XML documents.

XML Schemas aim to resolve issues experienced with DTDs.

  • DTDs don’t allow you to define datatypes extensively.
  • DTDs don’t let you specify exactly how many times an element can appear in an XML document. However, you can use a qualifier to provide some indication of this. For instance, the plus (+) qualifier specifies that the element must appear one or more times. You can also use the “?” qualifier to specify zero or one time and the “*” qualifier to specify zero or more times.
  • The DTD explicitly defines the element hierarchy. E.g. <!ELEMENT membership (join, month)>. In this case, the <join> element must be followed by <month>. However, it doesn’t allow you to reuse declarations once they’ve been set, so if you later add an element that uses the same set of subelements, you have to repeat the full list in its declaration.
  • A DTD isn’t very precise in specifying possible element values.

 

An XML Schema document consists of several key components and is saved with the .xsd file extension.

Creating an XML Schema

1) Declaration Syntax: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

Although you’re only required to include the first part of the XML declaration, it’s advisable to also specify the encoding and whether the XML Schema is a standalone document.

2) Specify the XML schema root element. In addition to the element declaration, you must include a namespace declaration. You include the xmlns:xs attribute to specify that a document is associated with the W3C’s XML Schema namespace. Namespaces help to prevent naming conflicts by allowing you to use prefixes – the xs prefix in this instance – to denote all the XML elements of an XML Schema document.

Syntax: <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

3) Specify various attributes

The attributes you can use with the root element in an XML Schema are

  • attributeFormDefault

You can use attributeFormDefault to specify whether attributes declared locally in the schema must be qualified with a namespace in instance documents. You use one of two values to do this – qualified or unqualified. The default value is unqualified, which means you don’t have to add a prefix to qualify the attribute. However, if the value is set to qualified, each such attribute in an instance document must carry a namespace prefix.

  • elementFormDefault

You can use elementFormDefault to specify whether elements declared locally in the schema must be qualified with the target namespace in instance documents. You do this by assigning it one of two possible values – qualified or unqualified. The default value is unqualified, which means only globally declared elements have to be qualified.

  • targetNamespace

You can use targetNamespace within the <schema> element to name the namespace that the schema’s declarations belong to. Instance documents that use this namespace can then be matched with the schema, which helps with the XML validation process.

  • version

You can use version to indicate the version of the XML Schema document in the XML root element. Note that this isn’t the version of the W3C XML Schema Language used, but rather the version of the XML Schema document itself.

  • xml:lang

You can use xml:lang to specify the natural language, such as en, used for an XML Schema document. This is particularly useful if you’re maintaining similar versions of the same XML Schema document, but with differing element and attribute names.
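
Putting the declaration, the root element, and its attributes together, a schema skeleton might look like the sketch below. The namespace URI http://example.com/membership, the version number, and the membership element are placeholders chosen for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/membership"
           xmlns="http://example.com/membership"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified"
           version="1.2"
           xml:lang="en">
  <!-- Element declarations go here -->
  <xs:element name="membership" type="xs:string"/>
</xs:schema>
```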

4) Declare the elements. In general, XML documents that use an XML Schema document can contain only the elements the schema declares, so you need to declare every element that those documents require.

Elements can be either simple or complex in type.

You can define simple types using the <simpleType> element. Elements that don’t contain child elements or attributes are considered simple types. You can use three types when declaring simple XML elements, namely

  • atomic types

Atomic types hold a single, indivisible value, such as a string, a number, or a date. They don’t contain any child elements or attributes.

  • list types

List types enable you to define multiple values, separated by whitespace, within a single XML element.

  • union types

Union types enable you to combine several simple types, so that a single XML element accepts a value that is valid for any one of them.
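
The three kinds of simple type can be sketched as follows. The type and element names are hypothetical; xs:integer and xs:date are built-in types.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Atomic: one indivisible value of a built-in type -->
  <xs:element name="month" type="xs:integer"/>

  <!-- List: whitespace-separated values such as "1 6 12" -->
  <xs:simpleType name="MonthList">
    <xs:list itemType="xs:integer"/>
  </xs:simpleType>
  <xs:element name="months" type="MonthList"/>

  <!-- Union: either an integer or a date is accepted -->
  <xs:simpleType name="MonthOrDate">
    <xs:union memberTypes="xs:integer xs:date"/>
  </xs:simpleType>
  <xs:element name="joined" type="MonthOrDate"/>
</xs:schema>
```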

Complex elements contain other child elements or attributes. You define these using the <complexType> element.

You can declare a complex type as an anonymous complex type or a named complex type.

Declaring a named complex type is similar to declaring an anonymous complex type, but you specify a name using the name attribute.

Two of the elements you can use to construct complex types are:

  • <sequence>

Using <sequence> means all the items in the list must appear in the instance document in the order in which they are declared within the complex type.

  • <all>

You can use the <all> element when constructing complex types to allow you to place the elements in any order in the instance document. However, all the required elements must still exist in the construction.
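
As a sketch of these options, the schema below declares one named complex type built with <sequence> and one anonymous complex type built with <all>. All names are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Named complex type: any element can reuse it through the type attribute -->
  <xs:complexType name="MembershipType">
    <xs:sequence>
      <!-- join must come before month in the instance document -->
      <xs:element name="join" type="xs:string"/>
      <xs:element name="month" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="membership" type="MembershipType"/>

  <!-- Anonymous complex type: declared inline and usable only by contact -->
  <xs:element name="contact">
    <xs:complexType>
      <xs:all>
        <!-- email and phone must both appear, in either order -->
        <xs:element name="email" type="xs:string"/>
        <xs:element name="phone" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```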

Element cardinality

You can expand on the built-in restrictions of an XML Schema document by including two attributes that restrict the XML even more – minOccurs and maxOccurs. Both minOccurs and maxOccurs have a default value of one.

If the maxOccurs attribute is set to unbounded, an element can occur any number of times. If you assign the minOccurs attribute a value of zero and leave maxOccurs at its default, it means the element may occur once or not at all.
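
A short sketch of how the two attributes combine, using a hypothetical order element:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <!-- Neither attribute given: exactly once, since both default to 1 -->
        <xs:element name="customer" type="xs:string"/>
        <!-- Optional: zero or one occurrence -->
        <xs:element name="note" type="xs:string" minOccurs="0"/>
        <!-- One or more occurrences, with no upper limit -->
        <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
        <!-- Between two and four occurrences -->
        <xs:element name="signature" type="xs:string" minOccurs="2" maxOccurs="4"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```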

Default value

In addition to minOccurs and maxOccurs values, you can specify whether an element has a default value. You do this using the default attribute within an element.

In addition to default values, you can assign two other types of values to elements:

  • Fixed values

Fixed values are similar to default values except that they can’t be altered. When you assign an element a fixed value and an instance document uses another value, the new value will be detected as invalid.

  • Null values

You use the nillable attribute, with a value of either true or false. A value of true means the element can be marked as null in an instance document by using the xsi:nil attribute, and false, which is the default, means it can’t be assigned a null value.
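
The three kinds of value can be sketched with the hypothetical element declarations below. The xsi prefix mentioned in the last comment is assumed to be bound to the http://www.w3.org/2001/XMLSchema-instance namespace in the instance document.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Default: an empty <country/> is treated as "Unknown" -->
  <xs:element name="country" type="xs:string" default="Unknown"/>

  <!-- Fixed: any value other than "USD" is invalid -->
  <xs:element name="currency" type="xs:string" fixed="USD"/>

  <!-- Nillable: an instance may write <endDate xsi:nil="true"/> -->
  <xs:element name="endDate" type="xs:date" nillable="true"/>
</xs:schema>
```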

 

Declaring attributes

Another way you can describe elements in an XML Schema is by declaring attributes for them. An attribute is a named value contained within an element’s start tag.

Similar to an element declaration, you can declare an attribute by providing the datatype as well as its name.

You can declare an attribute within the <schema>, <complexType>, or <attributeGroup> elements. If the <complexType> element contains other declarations, such as other elements, you must put the attribute declarations after the other declarations, at the bottom.

When declaring attributes for elements, there are two more attributes you can use within the declaration.

  • You can include the use attribute to specify whether an element’s attribute is required. Declaring this attribute value is optional, but if you do decide to use it you can assign the attribute one of three values – optional, prohibited, or required. By default this value is set to optional.
  • You can assign an attribute a default value. This specifies the attribute’s initial value when it’s being created.
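
The following sketch declares three attributes on a hypothetical item element, showing the use and default attributes and the position of attribute declarations after the child elements.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="item">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
      <!-- Attribute declarations come after the child element declarations -->
      <xs:attribute name="id" type="xs:ID" use="required"/>
      <xs:attribute name="note" type="xs:string" use="optional"/>
      <!-- A default value applies when the attribute is left out -->
      <xs:attribute name="status" type="xs:string" default="new"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```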

In some instances you might want to prevent end users from entering any value they like for an attribute. To do this, you can put a restriction on an attribute’s values using this syntax.

Syntax:   <xs:restriction base="xs:string">
                <xs:[restricting element] value="n"/>
          </xs:restriction>

You can restrict attribute values to a range using the <minInclusive> and <maxInclusive> elements, or to a fixed set of values using <enumeration> elements.

For string values, you can place a restriction on attribute values by using

  • <minLength>
  • <maxLength>
  • <enumeration>
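
As a sketch of these restrictions in context, the hypothetical item element below restricts one attribute to a list of values, one to a string length, and one to a numeric range.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="item">
    <xs:complexType>
      <!-- Only "new" or "used" is accepted -->
      <xs:attribute name="status">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="new"/>
            <xs:enumeration value="used"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <!-- Between 3 and 8 characters long -->
      <xs:attribute name="code">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:minLength value="3"/>
            <xs:maxLength value="8"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <!-- A whole number from 1 to 99 -->
      <xs:attribute name="quantity">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="1"/>
            <xs:maxInclusive value="99"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>
```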

Group of attributes

You declare a group of attributes using the <attributeGroup> element with a name attribute. Within the declaration of an element’s complex type, you simply point to the attribute group using another <attributeGroup> tag with the ref attribute as a reference. If you’re using attribute groups to define your elements, you can still declare an element with attributes other than those specified in the attribute group, if you prefer.

If you don’t want to use an attribute that’s defined in an attribute group for a particular element, you can use the prohibited value with the use attribute to disable the attribute.
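
A minimal sketch of declaring and referencing an attribute group follows. The group name tracking and the attribute names are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Declare the group once, with a name -->
  <xs:attributeGroup name="tracking">
    <xs:attribute name="createdBy" type="xs:string"/>
    <xs:attribute name="createdOn" type="xs:date"/>
  </xs:attributeGroup>

  <xs:element name="item">
    <xs:complexType>
      <!-- Point to the group with ref -->
      <xs:attributeGroup ref="tracking"/>
      <!-- An extra attribute that is not part of the group -->
      <xs:attribute name="status" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```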

Reuse XML Schema

You can reuse XML Schema documents if you want to reference multiple XSD files to act as the foundation for a particular XML document.

You can do this in one of two ways, using either

  • <import>

You would typically do this if the two XML Schema documents use different namespaces.

  • <include>

You can use the <include> element if the other XML Schema has the same target namespace. You can also use <include> if the other document doesn’t have a defined namespace, in which case its declarations take on the namespace of the including schema.
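
The sketch below uses both elements in one schema. The namespace URIs, the file names order-types.xsd and address.xsd, and the AddressType type are assumptions made for this example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com/orders"
           xmlns:addr="http://example.com/address"
           targetNamespace="http://example.com/orders">
  <!-- Same target namespace, or none: include -->
  <xs:include schemaLocation="order-types.xsd"/>

  <!-- Different namespace: import, naming that namespace -->
  <xs:import namespace="http://example.com/address"
             schemaLocation="address.xsd"/>

  <!-- AddressType is assumed to be declared in address.xsd -->
  <xs:element name="shipTo" type="addr:AddressType"/>
</xs:schema>
```

Both elements have to appear before any element, type, or attribute declarations in the schema.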

Comments

You use the <!-- and --> sequences of characters to enclose a comment, which will then be ignored by the parser.

Another way to add comments to an XML Schema is by using the <annotation> element. This element can be a child element of the <element> element, among others.

The <annotation> element has two child elements you can use to add comments.

<documentation>: document a specific element using text intended for human readers.

<appinfo>: include information intended for applications that process the schema.
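
Both commenting styles are sketched below on a hypothetical month element. The content of the application information is free-form, and the form-label text is invented for this example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- An ordinary XML comment -->
  <xs:element name="month" type="xs:string">
    <xs:annotation>
      <xs:documentation xml:lang="en">Month in which the member joined.</xs:documentation>
      <xs:appinfo>form-label: Joining month</xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
```

In the schema language itself the element name is written in lowercase, as xs:appinfo.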

Tools

Some of the tools you can use to create, edit, and validate XML Schema include

  • Altova XML Spy
  • Saxon
  • Stylus Studio
  • Visual Schema
  • Validome