Bilingual XML Study Series, Part 1: What Is XML? (Section 2)
A Brief History of Markup
You can see that there are advantages to binary file formats (easy for a computer to understand, compact, and able to carry metadata), as well as advantages to text files (universally interchangeable). Wouldn’t it be ideal if there were a format that combined the universality of text files with the efficiency and rich information storage capabilities of binary files?
This idea of a universal data format is not new. In fact, for as long as computers have been around, programmers have been trying to find ways to exchange information between different computer programs. An early attempt to combine a universally interchangeable data format with rich information storage capabilities was Standard Generalized Markup Language (SGML). SGML is a text-based language that can be used to mark up data (that is, add metadata) in a way that is self-describing. (You’ll see in a moment what self-describing means.)
SGML was designed to be a standard way of marking up data for any purpose, and took off mostly in large document management systems. When it comes to huge amounts of complex data, a lot of considerations must be taken into account, so SGML is a very complicated language. However, with that complexity comes power.
A very well-known language based on the SGML work is the HyperText Markup Language (HTML). HTML uses many of SGML’s concepts to provide a universal markup language for the display of information, and the linking of different pieces of information. The idea was that any HTML document (or web page) would be presentable in any application that was capable of understanding HTML (termed a web browser). A number of examples are given in Figure 1-3.
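A brief history of markup:
As you saw above, binary file formats have their advantages (easy for a computer to understand, compact, and convenient for adding metadata), and so do text file formats (easy to exchange universally). Wouldn’t a format that combines the advantages of both be better still?
The idea of a unified data format is actually an old one. Ever since computers became widespread, programmers have looked for ways to exchange data between different programs. An early attempt was Standard Generalized Markup Language (SGML), a text-based language with rich information storage capabilities that can be used to mark up data and is self-describing.
SGML was developed as a standard for marking up all kinds of data, and it caught on mainly in large document management systems. Marking up complex data with it means following many rules, but that complexity brings considerable expressive power.
HTML is a well-known markup language based on SGML. It borrows SGML’s concepts to provide a universal standard for displaying and linking different pieces of information, so that an HTML file (a web page) can be presented in any program able to parse it (see Figure 1-3); such programs are called browsers.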
Not only would that browser be able to display the document, but if the page contained links (termed hyperlinks) to other documents, the browser would also be able to seamlessly retrieve them.
Furthermore, because HTML is text-based, anyone can create an HTML page using a simple text editor, or any number of web page editors, some of which are shown in Figure 1-4.
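Not only can a browser display an HTML file; if the file contains hyperlinks, the browser can also retrieve the documents they point to. And because HTML is text-based, anyone can create a web page with a text editor. Some commonly used editors are shown in Figure 1-4.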
Even many word processors, such as WordPerfect and Word, allow you to save documents as HTML.
Think about the ramifications of Figures 1-3 and 1-4: Any HTML editor, including a simple text editor, can create an HTML file, and that HTML file can then be viewed in any web browser on the Internet!
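Even word processors such as WordPerfect and Microsoft Word can save a document as HTML. Look again at Figures 1-3 and 1-4: any HTML editor, including a simple text editor, can create an HTML file, and any browser on the Internet can view it.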
So What Is XML?
Unfortunately, SGML is such a complicated language that it’s not well suited for data interchange over the web. In addition, although HTML has been incredibly successful, it’s limited in scope: It is only intended for displaying documents in a browser. The tags it makes available do not provide any information about the content they encompass, only instructions about how to display that content. This means that you could create an HTML document that displays information about a person, but that’s about all you could do with the document. You couldn’t write a program to figure out from that document which piece of information relates to the person’s first name, for example, because HTML doesn’t have any facilities to describe this kind of specialized information. In fact, HTML wouldn’t even know that the document was about a person at all. Extensible Markup Language (XML) was created to address these issues.
Note that despite the acronym, it’s spelled “Extensible,” not “eXtensible.” Mixing these up is a common mistake.
XML is a subset of SGML, with the same goals (markup of any type of data), but with as much of the complexity eliminated as possible. XML was designed to be fully compatible with SGML, meaning any document that follows XML’s syntax rules is by definition also following SGML’s syntax rules, and can therefore be read by existing SGML tools. It doesn’t go both ways, however, so an SGML document is not necessarily an XML document.
It is important to realize that XML is not really a “language” at all, but a standard for creating languages that meet the XML criteria (we go into these rules for creating XML documents in Chapter 2). In other words, XML describes a syntax that you use to create your own languages. For example, suppose you have data about a name, and you want to be able to share that information with others as well as use that information in a computer program. Instead of just creating a text file like this:
John Doe
or an HTML file like this:
<html>
<head><title>Name</title></head>
<body>
<p>John Doe</p>
</body>
</html>
you might create an XML file like the following:
<name>
<first>John</first>
<last>Doe</last>
</name>
Even from this simple example, you can see why markup languages such as SGML and XML are called “self-describing.” Looking at the data, you can easily tell that this is information about a <name>, and you can see that there is data called <first> and more data called <last>. You can give the tags any names you like, but if you’re going to use XML, you might as well use it right and give things meaningful names.
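To make the "self-describing" point concrete, here is a minimal sketch of a program that picks out the first and last name from the XML above. It uses Python's standard `xml.etree.ElementTree` module, which is an assumption of this illustration rather than anything the original text prescribes; the names `NAME_XML` and `read_name` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample document from the text, held in a string for illustration.
NAME_XML = """<name>
  <first>John</first>
  <last>Doe</last>
</name>"""


def read_name(xml_text):
    """Return (first, last) from a <name> document (hypothetical helper)."""
    root = ET.fromstring(xml_text)      # root is the <name> element
    first = root.findtext("first")      # text of the <first> child
    last = root.findtext("last")        # text of the <last> child
    return first, last


first, last = read_name(NAME_XML)
print(first, last)  # prints "John Doe"
```

Because the tag names say what each piece of data is, the program can ask for "first" and "last" directly. With the plain-text or HTML versions, it would have to guess which word is which.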
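So what exactly is XML?
As mentioned above, SGML is a very powerful language, but unfortunately it is too complicated to be well suited for exchanging data on the web. HTML, for its part, has been hugely successful but has its own limits, because it is only meant for displaying data in a browser. Its tags say how to display the content they enclose, not what that content is or what it means. You can use an HTML file to present information about a person, but you cannot then write a program that works out from that file which piece is the person’s name; HTML does not even know what kind of information the file holds. XML was created to solve these problems.
XML is a subset of SGML and is likewise meant for marking up any kind of data, but it is far less complex than SGML. It follows SGML’s syntax strictly, so a well-formed XML file can be read by existing SGML tools, whereas an arbitrary SGML document is not necessarily an XML document.
It is important to realize that XML is not really a language in itself; it is a standard for creating languages that conform to XML’s rules. In other words, XML describes a syntax that you use to define the language you need. For example, suppose you have data about a name and want to share it between different computer programs. Instead of creating plain text like this: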
John Doe
or an HTML file like this:
<html>
<head>
<title>Name</title>
</head>
<body>
<p>John Doe</p>
</body>
</html>
you might create an XML file like the following:
<name>
<first>John</first>
<last>Doe</last>
</name>
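Even from this simple example you can see why markup languages such as SGML and XML are called “self-describing.” Seeing <name> tells you what kind of information is being marked up, and seeing <first> and <last> tells you more about the parts of that name. In XML you can give tags any names you like, but those names should be meaningful.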
You can also see that the XML version of this information is much larger than the plain-text version. Using XML to mark up data adds to its size, sometimes enormously, but achieving small file sizes isn’t one of the goals of XML; it’s only about making it easier to write software that accesses the information, by giving structure to the data.
If bandwidth is a critical issue for your applications, you can always compress your XML documents before sending them across the network; compressing text files yields very good results.
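As a rough sketch of the compression idea, the snippet below gzips an XML document before it would be sent and restores it afterwards. The choice of gzip (via Python's standard `gzip` module) is an assumption made for illustration, as are the helper names `compress_xml` and `decompress_xml` and the repeated sample data; the text does not name a particular compression method.

```python
import gzip


def compress_xml(xml_text: str) -> bytes:
    """Encode the document as UTF-8 and gzip it (hypothetical helper)."""
    return gzip.compress(xml_text.encode("utf-8"))


def decompress_xml(data: bytes) -> str:
    """Reverse compress_xml: gunzip and decode back to text."""
    return gzip.decompress(data).decode("utf-8")


# A larger document built for illustration by repeating the sample record.
records = "".join(
    "<name><first>John</first><last>Doe</last></name>" for _ in range(1000)
)
document = "<names>" + records + "</names>"

packed = compress_xml(document)
# Compare the size in bytes before and after compression.
print(len(document.encode("utf-8")), len(packed))
```

Repetitive markup like this shrinks a great deal. A document as tiny as the single name.xml example gains little or nothing, because the compressed format carries its own fixed overhead.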
If you’re running Internet Explorer 5 or later, you can view the preceding XML in your browser, as shown in the following Try It Out. (You can also use other web browsers, such as Firefox, to display the XML examples in this chapter. All of the screenshots shown, however, are of Internet Explorer 6.)
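You can see that the XML version of this information is larger than the plain-text version. Even so, an XML file is easier for software to read, because it gives the data a structure.
If bandwidth is limited, you can compress an XML file before sending it across the network, which makes the transfer more efficient.
If you have Internet Explorer 5 or later installed, you can view XML files in the browser. Let’s work through an example (Firefox and other browsers will also do):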
Try It Out: Opening an XML File in Internet Explorer
1. Open Notepad and type in the following XML:
<name>
<first>John</first>
<last>Doe</last>
</name>
Hands-on exercise 1: opening an XML file in a browser
1. Open Notepad and type in the following XML:
<name>
<first>John</first>
<last>Doe</last>
</name>
2. Save the document to your hard drive as name.xml. If you’re using Windows XP, be sure to change the Save as Type drop-down option to All Files. (Otherwise, Notepad will save the document with a .txt extension, causing your file to be named name.xml.txt.) You might also want to change the Encoding drop-down to Unicode, as shown in Figure 1-5. (Find more information on encodings in Chapter 2.)
2. Save the file to your hard drive as name.xml. If you are using Windows XP, change the file type in the save dialog to All Files, as shown in Figure 1-5.
3. You can then open the file in Internet Explorer (for example, by double-clicking on the file in Windows Explorer), where it will look something like Figure 1-6.
3. Open the file you just saved in the browser, as shown in Figure 1-6.