Reading Notes -- Core Java (11th Edition, Volume II), Chapter 3: XML

3.1 XML Introduction

An XML document is made up of elements.

  • An element can have attributes (key/value pairs) and child nodes.
  • Child nodes can be elements or text.

XML is widely used for structured documents, configuration files, communication protocols, data interchange, and so on.

The JSON people think that JSON is better for data interchange: http://www.json.org/xml.html


3.3  Parsing an XML Document

1. Three kinds of parsers:

  • 1.1 DOM (Document Object Model) -- produces a tree structure; suited to small and medium-sized documents
  • 1.2 SAX (Simple API for XML) -- notifies you whenever it encounters another feature (such as the start or end of an element)
  • 1.3 "pull parser" (StAX) -- you program a loop that pulls each feature in turn

 

2. Unless the data set is huge, it is simplest to use the DOM. Here is how:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();

// usage 1): parse a file
File f = ...;
Document doc = builder.parse(f);

// usage 2): parse a URL (DocumentBuilder.parse has no URL overload, so pass the URL as a string)
URL u = ...;
Document doc = builder.parse(u.toString());

// usage 3): parse an InputStream
InputStream in = ...;
Document doc = builder.parse(in);
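
The InputStream variant above can be tried without any file on disk. The following is a minimal sketch, assuming the XML arrives as a string; the class name DomParseDemo and the sample document are made up for illustration and are not from the book.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class DomParseDemo {
    // Parses an XML string into a DOM Document through the InputStream overload of parse().
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
                return builder.parse(in);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrap the checked exceptions so callers get a single failure type.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<config><entry id=\"background\"/></config>");
        System.out.println(doc.getDocumentElement().getTagName()); // prints "config"
    }
}
```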

 

3. Analyzing the DOM

3.1 To analyze the DOM tree, start with the root element:

Element elem = doc.getDocumentElement();

3.2 When you have any element, you care about three pieces of information:

  • 1) Tag name
  • 2) Attributes
  • 3) Children

3.2.1 Get the tag name by calling:  elem.getTagName()

3.2.2 This code walks through all attributes:

NamedNodeMap attributes = elem.getAttributes();
for (int i = 0; i < attributes.getLength(); i++) {
    Node attribute = attributes.item(i);
    String name = attribute.getNodeName();
    String value = attribute.getNodeValue();
    ...
}

3.2.3 This code walks through all children:

NodeList children = elem.getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
    Node child = children.item(i);
    ...
}

 

4. Node Types

4.1 Most applications only need to process Element and Text nodes. Unless your XML document has a DTD or schema and the parser validates against it, the DOM includes all whitespace between elements as Text nodes.

4.2 You can filter out the text nodes like this:

Node child = children.item(i);
if (child instanceof Element) {
    Element childElement = (Element) child;
    ...
}

4.3 To get the text of an element whose first child is a Text node (for example, <name>Helvetica</name>), call:

Text textNode = (Text) childElement.getFirstChild();
String text = textNode.getData().trim();
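
The three pieces of information (tag name, attributes, children) and the Element/Text filtering can be combined in one pass. This is a sketch under assumptions: the class name DomWalkDemo, the output format, and the sample <font> document are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;

class DomWalkDemo {
    // Describes each child element of the root as "tag attr=value...: text".
    public static List<String> describeChildren(String xml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<String> lines = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (!(child instanceof Element)) continue; // skip whitespace Text nodes
            Element childElement = (Element) child;
            StringBuilder line = new StringBuilder(childElement.getTagName());
            NamedNodeMap attributes = childElement.getAttributes();
            for (int j = 0; j < attributes.getLength(); j++) {
                Node attribute = attributes.item(j);
                line.append(' ').append(attribute.getNodeName())
                    .append('=').append(attribute.getNodeValue());
            }
            Node first = childElement.getFirstChild();
            if (first instanceof Text) {
                line.append(": ").append(((Text) first).getData().trim());
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<font>\n  <name>Helvetica</name>\n  <size unit=\"pt\">36</size>\n</font>";
        describeChildren(xml).forEach(System.out::println);
    }
}
```

The instanceof check matters here because the sample document has no DTD or schema, so the line breaks and indentation between the child elements show up as Text nodes.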

3.4 Validation

1. Many XML documents have specific rules about valid elements and attributes.

 

2. The Java API supports two mechanisms for describing these rules:

  • Document type definitions (DTD) -- old style
  • XML Schema

 

3. When a parser validates a document, it checks that the document conforms to the rules.

 

4. To turn on DTD validation, call:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);                           // all builders produced by this factory validate their input against the DTD
factory.setIgnoringElementContentWhitespace(true);     // the builder will not report whitespace in element content as text nodes

Then use an XML document that references the DTD:

<?xml version="1.0"?>
<!DOCTYPE config SYSTEM "config.dtd">
<config>
    <entry id="background">
        <construct class="java.awt.Color">
            <int>55</int>
            <int>200</int>
            <int>100</int>
        </construct>
    </entry>
    <entry id="currency">
        <factory class="java.util.Currency">
            <string>USD</string>
        </factory>
    </entry>
</config>
------ excerpt from config.dtd ------
<!ELEMENT entry (string|int|boolean|construct|factory)>      <!-- rule for element content -->
<!ATTLIST entry id ID #IMPLIED>    <!-- rule for element attributes (ATT = attribute) -->
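
The effect of the two factory settings can be observed by counting the root's child nodes with and without validation. The sketch below is an illustration, not the book's program: it assumes a small made-up DTD placed in the document's internal subset so that no external config.dtd file is needed, and the class name DtdValidationDemo is hypothetical. It also installs an error handler, because by default validation errors are only reported, not thrown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidationDemo {
    // Sample document with an internal DTD subset (made up for this sketch).
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE config [\n"
        + "  <!ELEMENT config (entry)*>\n"
        + "  <!ELEMENT entry (#PCDATA)>\n"
        + "  <!ATTLIST entry id ID #IMPLIED>\n"
        + "]>\n"
        + "<config>\n"
        + "  <entry id=\"a\">1</entry>\n"
        + "  <entry id=\"b\">2</entry>\n"
        + "</config>";

    // Returns the number of child nodes of the root element.
    public static int countRootChildren(String xml, boolean validate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            if (validate) {
                factory.setValidating(true);
                factory.setIgnoringElementContentWhitespace(true);
            }
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // turn validation errors into exceptions
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid or unparsable XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countRootChildren(XML, true));  // element children only
        System.out.println(countRootChildren(XML, false)); // whitespace Text nodes included
    }
}
```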

 

5. For XML Schema, use the following code. Unfortunately, the parser doesn't discard element content whitespace.

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// The following two calls are also needed for XML Schema
factory.setValidating(true);
factory.setIgnoringElementContentWhitespace(true);

factory.setNamespaceAware(true);
final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);

An XML document that uses an XML Schema:

<?xml version="1.0"?>
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="config.xsd">
    <entry id="background">
        <construct class="java.awt.Color">
            <int>55</int>
            <int>200</int>
            <int>100</int>
        </construct>
    </entry>
    <entry id="currency">
        <factory class="java.util.Currency">
            <string>USD</string>
        </factory>
    </entry>
</config>
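
To experiment with schema validation without a config.xsd file, the schema can be supplied in memory. The sketch below makes one assumption beyond the notes: it uses the JAXP schemaSource property (http://java.sun.com/xml/jaxp/properties/schemaSource) to hand the parser a schema stream instead of relying on xsi:noNamespaceSchemaLocation. The tiny schema, the class name SchemaValidationDemo, and the isValid helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SchemaValidationDemo {
    static final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Made-up schema: a config element containing one or more integer entries.
    static final String XSD =
          "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xsd:element name=\"config\"><xsd:complexType><xsd:sequence>"
        + "<xsd:element name=\"entry\" type=\"xsd:int\" maxOccurs=\"unbounded\"/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Returns true if the document is well-formed and conforms to XSD.
    public static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setNamespaceAware(true);
            // The schema language must be set before the schema source.
            factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
            factory.setAttribute(JAXP_SCHEMA_SOURCE,
                new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8)));
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // turn validation errors into exceptions
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed or invalid document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<config><entry>55</entry></config>"));
        System.out.println(isValid("<config><entry>abc</entry></config>"));
    }
}
```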

3.5 XPath

1. It can be tedious to analyze the DOM tree by visiting descendants one level at a time. XPath is a standard way of locating node sets. For example, /html/body/table describes all table elements that are children of the body element in an XHTML document.

 

2. To use XPath, create an XPath object from a factory, then call the evaluate method to get a string result:

// to create a factory
XPathFactory xpFactory = XPathFactory.newInstance();
XPath path = xpFactory.newXPath();

// call evaluate
String title = path.evaluate("/html/head/title", doc);

 

3. To get a result as a node list, node, or number, call:

// node list
XPathNodes result = path.evaluateExpression("/html/body/form", doc, XPathNodes.class);      // >= Java 9
NodeList nodes = (NodeList) path.evaluate("/html/body/form", doc, XPathConstants.NODESET);  // < Java 9

// node
Node node = path.evaluateExpression("/html/body/form[1]", doc, Node.class);                 // >= Java 9
Node node = (Node) path.evaluate("/html/body/form[1]", doc, XPathConstants.NODE);           // < Java 9

// number
int count = path.evaluateExpression("count(/html/body/form)", doc, Integer.class);          // >= Java 9
int count = ((Number) path.evaluate("count(/html/body/form)", doc, XPathConstants.NUMBER)).intValue();  // < Java 9

 

4. You don't have to start a search at the top of the document:

result = path.evaluate(expression, node);
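
The string, node list, number, and relative-search variants can be exercised together on a small document. This is a sketch that sticks to the pre-Java 9 evaluate overloads so it runs on any version; the class name XPathDemo, the summarize helper, and the sample XHTML are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {
    static final String XHTML =
          "<html><head><title>Demo</title></head>"
        + "<body><form id=\"a\"/><form id=\"b\"/></body></html>";

    // Returns "<title> <number of form nodes> <count() result> <id of second form>".
    public static String summarize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath path = XPathFactory.newInstance().newXPath();

            // String result
            String title = path.evaluate("/html/head/title", doc);
            // Node list result
            NodeList forms = (NodeList) path.evaluate("/html/body/form", doc, XPathConstants.NODESET);
            // Number result
            int count = ((Number) path.evaluate("count(/html/body/form)", doc,
                XPathConstants.NUMBER)).intValue();
            // Relative search that starts at the body node instead of the document
            Node body = (Node) path.evaluate("/html/body", doc, XPathConstants.NODE);
            String secondId = path.evaluate("form[2]/@id", body);

            return title + " " + forms.getLength() + " " + count + " " + secondId;
        } catch (Exception e) { // parser or XPath failure
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(summarize(XHTML)); // prints "Demo 2 2 b"
    }
}
```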

3.6 Namespaces

1. With namespaces, XML documents can use elements from two grammars. The following document contains XHTML and SVG:

// Note the xmlns:prefix attribute in the root node
<html xmlns="http://www.w3.org/1999/xhtml"
    xmlns:svg="http://www.w3.org/2000/svg">

// The document body is shown below: an unprefixed element (body) is XHTML, and the SVG elements have an svg prefix
<body>
    <svg:svg width="100" height="100">
        <svg:circle cx="50" cy="50" r="40" stroke="green" stroke-width="4"/>
    </svg:svg>
</body>

 

2. To turn on namespace processing in Java, call

factory.setNamespaceAware(true);

 

3. Now, getNodeName yields the qualified name such as svg:circle in our example.

4. There are methods to get the namespace URI and the unprefixed tag name:

Node.getNamespaceURI()   // Gets the namespace URI, such as http://www.w3.org/2000/svg
Node.getLocalName()      // Gets the unprefixed name, such as circle
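
The three name-related methods can be compared on the XHTML/SVG document from this section. A minimal sketch, assuming the document is held in a string; the class name NamespaceDemo and the describeFirst helper are hypothetical.

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    static final String SVG_NS = "http://www.w3.org/2000/svg";
    static final String XML =
          "<html xmlns=\"" + XHTML_NS + "\" xmlns:svg=\"" + SVG_NS + "\">"
        + "<body><svg:svg width=\"100\" height=\"100\">"
        + "<svg:circle cx=\"50\" cy=\"50\" r=\"40\"/>"
        + "</svg:svg></body></html>";

    // Returns {qualified name, namespace URI, local name} of the first matching element.
    public static String[] describeFirst(String xml, String namespaceUri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, namespace URI and local name are not reported
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Node node = doc.getElementsByTagNameNS(namespaceUri, localName).item(0);
            return new String[] { node.getNodeName(), node.getNamespaceURI(), node.getLocalName() };
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(describeFirst(XML, SVG_NS, "circle")));
        System.out.println(Arrays.toString(describeFirst(XML, XHTML_NS, "body")));
    }
}
```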

3.7 Streaming parser

3.7.1 SAX Parser

1. Streaming parsers are useful for parsing huge documents.

2. Instead of building a tree structure (as the DOM parser does; see section 3.3, item 2), the SAX parser reports events. You supply a handler with methods:

  • startElement and endElement
  • characters
  • startDocument and endDocument

3. In the callback methods, you get the element name and attributes, or the text content.

4. Start the parsing process like this:

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
saxParser.parse(source, handler);
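
A handler is usually written as a subclass of DefaultHandler that overrides only the callbacks it needs. The sketch below records the events it receives; the class name SaxDemo, the event strings, and the sample document are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxDemo {
    // Collects a readable trace of the SAX events for the given document.
    public static List<String> events(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text may arrive in several chunks; this sketch records each non-blank chunk.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) events.add("text " + text);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        events("<config><entry id=\"a\">55</entry></config>").forEach(System.out::println);
    }
}
```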

 

3.7.2 StAX Parser

1. The StAX parser is a "pull parser". Instead of installing an event handler, you iterate through the events:

InputStream in = ...;

XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);

while (parser.hasNext()) {
    int eventType = parser.next();
    ...
}

 

2. Then branch on the event type (START_ELEMENT, END_ELEMENT, CHARACTERS, START_DOCUMENT, END_DOCUMENT, and so on).

2.1 Element -- Analyze an element like this:

String name = parser.getLocalName();    // the local name; call getName() if you need the QName with namespace data
int attrCount = parser.getAttributeCount();
for (int i = 0; i < attrCount; i++) {
    String attrName = parser.getAttributeLocalName(i);
    String attrValue = parser.getAttributeValue(i);
    ...   // process attrName and attrValue
}

 You can also look up an attribute by name:

String value = parser.getAttributeValue(null, attributeName);

2.2 Characters -- For a CHARACTERS event, parser.getText() returns the text.
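
Putting the pull loop and the branches together gives something like the following sketch. The class name StaxReadDemo, the event strings, and the sample document are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxReadDemo {
    // Pulls events from the parser and records elements, attributes, and text.
    public static List<String> events(String xml) {
        List<String> events = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader parser = factory.createXMLStreamReader(new StringReader(xml));
            while (parser.hasNext()) {
                int eventType = parser.next();
                switch (eventType) {
                    case XMLStreamConstants.START_ELEMENT:
                        StringBuilder line = new StringBuilder("start " + parser.getLocalName());
                        for (int i = 0; i < parser.getAttributeCount(); i++) {
                            line.append(' ').append(parser.getAttributeLocalName(i))
                                .append('=').append(parser.getAttributeValue(i));
                        }
                        events.add(line.toString());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!parser.isWhiteSpace()) events.add("text " + parser.getText().trim());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        events.add("end " + parser.getLocalName());
                        break;
                    default:
                        break; // END_DOCUMENT, comments, and so on are ignored here
                }
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        events("<config><entry id=\"a\">55</entry></config>").forEach(System.out::println);
    }
}
```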


3.8 Generating XML Documents

Two approaches:

  • 1. build a DOM tree (section 3.8.1/3.8.2) -> write the document (section 3.8.3)
  • 2. write the XML document directly with StAX (section 3.8.4)

 

Approach 1:

1. Building a DOM tree

1.1 without namespace (section 3.8.1)

// step1: Don't write XML with print statements; build a DOM tree with the following code instead:
Document doc = builder.newDocument();
Element rootElement = doc.createElement(rootName);    // create the root element
Element childElement = doc.createElement(childName);  // create a child element
Text textNode = doc.createTextNode(textContents);     // create a text node

// step2: Set attributes like this:
rootElement.setAttribute(name, value);

// step3: Attach the children to the parents:
doc.appendChild(rootElement);
rootElement.appendChild(childElement);
childElement.appendChild(textNode);

 

1.2 with namespace (section 3.8.2)

// Make the builder factory namespace-aware, then create the builder
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();

// Create the nodes
String namespace = "http://www.w3.org/2000/svg";
Document doc = builder.newDocument();
Element rootElement = doc.createElementNS(namespace, "svg");

// Set attributes
rootElement.setAttributeNS(namespace, qualifiedName, value);

// Attach the nodes
doc.appendChild(rootElement);
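
The namespace-aware calls can be checked by reading the names back from the element that was built. In this sketch the xlink:href attribute, its value, and the class name NamespaceBuildDemo are made up for illustration; only the SVG namespace URI comes from the notes.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NamespaceBuildDemo {
    static final String SVG_NS = "http://www.w3.org/2000/svg";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Builds a document whose root is an svg element in the SVG namespace.
    public static Element buildSvgRoot() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();

            Document doc = builder.newDocument();
            Element rootElement = doc.createElementNS(SVG_NS, "svg");
            // Hypothetical attribute in a second namespace, given as a qualified name
            rootElement.setAttributeNS(XLINK_NS, "xlink:href", "#shape");
            doc.appendChild(rootElement);
            return rootElement;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = buildSvgRoot();
        System.out.println(root.getNamespaceURI());                // the SVG namespace URI
        System.out.println(root.getLocalName());                   // svg
        System.out.println(root.getAttributeNS(XLINK_NS, "href")); // #shape
    }
}
```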

 

2. Writing documents with the LSSerializer interface (section 3.8.3)

There is no simple one-call way to write a DOM tree to an output stream. The easiest approach is the LSSerializer interface:

// 1. get an instance with this magic incantation:
DOMImplementation impl = doc.getImplementation();
var implLS = (DOMImplementationLS) impl.getFeature("LS", "3.0");
LSSerializer ser = implLS.createLSSerializer();

// 2. If you want spaces and line breaks, set this flag:
ser.getDomConfig().setParameter("format-pretty-print", true);

// 3.1 You can save the document to a file:
LSOutput out = implLS.createLSOutput();
out.setEncoding("UTF-8");
out.setByteStream(Files.newOutputStream(path));
ser.write(doc, out);

// 3.2 Or you can turn the document into a string:
String str = ser.writeToString(doc);
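
Building a tree and turning it into a string can be combined end to end as in the sketch below. The element names, the attribute, and the class name DomWriteDemo are assumptions; pretty-printing is left off so that the element markup comes out without added whitespace.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class DomWriteDemo {
    // Builds <config><entry id="background">55</entry></config> and serializes it.
    public static String buildAndSerialize() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // Create the nodes
        Element rootElement = doc.createElement("config");
        Element childElement = doc.createElement("entry");
        childElement.setAttribute("id", "background");
        childElement.appendChild(doc.createTextNode("55"));
        // Attach the children to the parents
        rootElement.appendChild(childElement);
        doc.appendChild(rootElement);

        // Serialize with LSSerializer; the result starts with an XML declaration
        DOMImplementationLS implLS =
            (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer ser = implLS.createLSSerializer();
        return ser.writeToString(doc);
    }

    public static void main(String[] args) {
        System.out.println(buildAndSerialize());
    }
}
```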

 

Approach 2:

(Easier) Writing an XML Document with StAX

It is wasteful to build a DOM tree just to write a document. The StAX API lets you write an XML document directly.

// 1. Construct an XMLStreamWriter from an OutputStream instance:
XMLOutputFactory factory = XMLOutputFactory.newInstance();
XMLStreamWriter writer = factory.createXMLStreamWriter(out);

// 2. To produce the XML header, call:
writer.writeStartDocument();

// 3. Then start the first element:
writer.writeStartElement(name);

// 4. Add attributes by calling:
writer.writeAttribute(name, value);

// 5. Now you can add child elements by calling writeStartElement again, or write characters with:
writer.writeCharacters(text);

// 6. You can write a self-closing tag (such as <img .../>) with the writeEmptyElement method:
writer.writeEmptyElement(name);

// 7. Call writeEndElement to end an element:
writer.writeEndElement();

// 8. Call writeEndDocument at the end of the document:
writer.writeEndDocument();
// 9. When you are all done, close the XMLStreamWriter -- it isn't auto-closable.
writer.close();
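
The numbered steps can be run in sequence against an in-memory target. This sketch assumes a StringWriter as the destination (the factory also accepts a Writer); the element names, attribute, and class name StaxWriteDemo are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxWriteDemo {
    // Writes a small document directly, without building a DOM tree first.
    public static String write() {
        StringWriter out = new StringWriter();
        try {
            XMLOutputFactory factory = XMLOutputFactory.newInstance();
            XMLStreamWriter writer = factory.createXMLStreamWriter(out);
            writer.writeStartDocument();
            writer.writeStartElement("config");
            writer.writeStartElement("entry");
            writer.writeAttribute("id", "background");
            writer.writeCharacters("55 < 100");   // special characters are escaped on output
            writer.writeEndElement();             // closes entry
            writer.writeEmptyElement("marker");   // self-closing element
            writer.writeEndElement();             // closes config
            writer.writeEndDocument();
            writer.close();                       // does not close the underlying writer
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(write());
    }
}
```

Attributes have to be written right after the start tag they belong to, before any text or child element, which is why writeAttribute follows writeStartElement("entry") directly.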

 

  

posted on 2023-02-10 10:31  bruce_he