Learning XML Bilingually, Part 1: What Is XML? (Section 2)
A Brief History of Markup
You can see that there are advantages to binary file formats (easy for a computer to understand, compact, and able to carry meta data), as well as advantages to text files (universally interchangeable). Wouldn’t it be ideal if there were a format that combined the universality of text files with the efficiency and rich information storage capabilities of binary files?
This idea of a universal data format is not new. In fact, for as long as computers have been around, programmers have been trying to find ways to exchange information between different computer programs. An early attempt to combine a universally interchangeable data format with rich information storage capabilities was Standard Generalized Markup Language (SGML). SGML is a text-based language that can be used to mark up data (that is, to add meta data) in a way that is self-describing. (You’ll see in a moment what self-describing means.)
SGML was designed to be a standard way of marking up data for any purpose, and took off mostly in large document management systems. When it comes to huge amounts of complex data, a lot of considerations must be taken into account, so SGML is a very complicated language. However, with that complexity comes power.
A very well-known language based on the SGML work is the HyperText Markup Language (HTML). HTML uses many of SGML’s concepts to provide a universal markup language for the display of information and the linking of different pieces of information. The idea was that any HTML document (or web page) would be presentable in any application that was capable of understanding HTML (termed a web browser). A number of examples are given in Figure 1-3.
Not only would that browser be able to display the document, but if the page contained links (termed hyperlinks) to other documents, the browser would also be able to seamlessly retrieve them.
Furthermore, because HTML is text-based, anyone can create an HTML page using a simple text editor, or any number of web page editors, some of which are shown in Figure 1-4.
Even many word processors, such as WordPerfect and Microsoft Word, allow you to save documents as HTML. Think about the ramifications of Figures 1-3 and 1-4: Any HTML editor, including a simple text editor, can create an HTML file, and that HTML file can then be viewed in any web browser on the Internet!
So What Is XML?
Unfortunately, SGML is such a complicated language that it’s not well suited for data interchange over the web. In addition, although HTML has been incredibly successful, it’s limited in scope: It is only intended for displaying documents in a browser. The tags it makes available do not provide any information about the content they encompass, only instructions about how to display that content. This means that you could create an HTML document that displays information about a person, but that’s about all you could do with the document. You couldn’t write a program to figure out from that document which piece of information relates to the person’s first name, for example, because HTML doesn’t have any facilities to describe this kind of specialized information. In fact, HTML wouldn’t even know that the document was about a person at all. Extensible Markup Language (XML) was created to address these issues.
Note that despite the acronym, it’s spelled “Extensible,” not “eXtensible.” Mixing these up is a common mistake.
XML is a subset of SGML, with the same goals (markup of any type of data), but with as much of the complexity eliminated as possible. XML was designed to be fully compatible with SGML, meaning any document that follows XML’s syntax rules is by definition also following SGML’s syntax rules, and can therefore be read by existing SGML tools. It doesn’t go both ways, however, so an SGML document is not necessarily an XML document.
It is important to realize that XML is not really a “language” at all, but a standard for creating languages that meet the XML criteria (we go into these rules for creating XML documents in Chapter 2). In other words, XML describes a syntax that you use to create your own languages. For example, suppose you have data about a name, and you want to be able to share that information with others as well as use that information in a computer program. Instead of just creating a text file like this:
John Doe
or an HTML file like this:
<p>John Doe</p>
you might create an XML file like the following:
<name>
  <first>John</first>
  <last>Doe</last>
</name>
Even from this simple example, you can see why markup languages such as SGML and XML are called “self-describing.” Looking at the data, you can easily tell that this is information about a <name>, and you can see that there is data called <first> and more data called <last>. You can give the tags any names you like, but if you’re going to use XML, you might as well use it right and give things meaningful names.
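To make “self-describing” concrete, here is a minimal sketch of a program that pulls the first and last name out of the XML version. It assumes Python and its standard `xml.etree.ElementTree` module, neither of which the text itself uses; `NAME_XML` and `read_name` are names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The <name> document described in the text, held in a string for this sketch.
NAME_XML = """<name>
  <first>John</first>
  <last>Doe</last>
</name>"""


def read_name(xml_text):
    """Return (first, last) from a <name> document (hypothetical helper)."""
    root = ET.fromstring(xml_text)   # root is the <name> element
    first = root.findtext("first")   # text inside <first>
    last = root.findtext("last")     # text inside <last>
    return first, last


if __name__ == "__main__":
    first, last = read_name(NAME_XML)
    print("first name:", first)
    print("last name:", last)
```

Running the same kind of lookup against the HTML version would only hand back the whole string inside the paragraph tag; nothing in that markup says which word is the first name.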
You can also see that the XML version of this information is much larger than the plain-text version. Using XML to mark up data adds to its size, sometimes enormously, but achieving small file sizes isn’t one of the goals of XML; it’s only about making it easier to write software that accesses the information, by giving structure to the data.
If bandwidth is a critical issue for your applications, you can always compress your XML documents before sending them across the network; compressing text files yields very good results.
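As a rough illustration of the compression point, the sketch below compresses an XML document before it would be sent and restores it afterward. It assumes Python's standard `zlib` module; the function names and the `<names>` wrapper element are invented for the example. The sample data is highly repetitive, so the ratio it prints is flattering and should not be taken as typical.

```python
import zlib


def build_names_document(count):
    """Build a sample document by repeating one <name> record (illustrative data only)."""
    record = "<name><first>John</first><last>Doe</last></name>"
    return "<names>" + record * count + "</names>"


def compress_xml(xml_text):
    """Encode the XML text as UTF-8 bytes and compress them."""
    return zlib.compress(xml_text.encode("utf-8"))


def decompress_xml(data):
    """Reverse compress_xml: decompress the bytes and decode them back to text."""
    return zlib.decompress(data).decode("utf-8")


if __name__ == "__main__":
    document = build_names_document(100)
    packed = compress_xml(document)
    print("original bytes:  ", len(document.encode("utf-8")))
    print("compressed bytes:", len(packed))
```

The receiving side would call `decompress_xml` and then hand the text to its XML parser as usual.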
If you’re running Internet Explorer 5 or later, you can view the preceding XML in your browser, as shown in the following Try It Out. (You can also use other web browsers, such as Firefox, to display the XML examples in this chapter. All of the screenshots shown, however, are of Internet Explorer 6.)
Try It Out: Opening an XML File in Internet Explorer
1. Open Notepad and type in the following XML:
<name>
  <first>John</first>
  <last>Doe</last>
</name>
2. Save the document to your hard drive as name.xml. If you’re using Windows XP, be sure to change the Save as Type drop-down option to All Files. (Otherwise, Notepad will save the document with a .txt extension, causing your file to be named name.xml.txt.) You might also want to change the Encoding drop-down to Unicode, as shown in Figure 1-5. (Find more information on encodings in Chapter 2.)
3. You can then open the file in Internet Explorer (for example, by double-clicking on the file in Windows Explorer), where it will look something like Figure 1-6.
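If you would rather create the file from a script than from Notepad, the steps above can be sketched as follows. This assumes Python, writes the file as UTF-8 rather than Notepad's Unicode option, and uses a hypothetical helper named `save_and_check`; parsing the file back is a quick way to confirm it is well-formed before opening it in a browser.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Same content as the Notepad step in the Try It Out.
NAME_XML = "<name>\n  <first>John</first>\n  <last>Doe</last>\n</name>\n"


def save_and_check(directory):
    """Write name.xml into `directory`, then parse it back to confirm it is well-formed."""
    path = Path(directory) / "name.xml"
    # Assumption: UTF-8 is used here for simplicity instead of Notepad's Unicode setting.
    path.write_text(NAME_XML, encoding="utf-8")
    root = ET.parse(str(path)).getroot()   # raises ParseError if the file is malformed
    return path, root.tag


if __name__ == "__main__":
    # A temporary directory keeps the sketch from touching your real files.
    with tempfile.TemporaryDirectory() as folder:
        path, tag = save_and_check(folder)
        print("saved", path.name, "with root element", tag)
```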