Bilingual XML Learning Series, Part 1: What Is XML? (Section 1)
What Is XML?
XML (Extensible Markup Language) is a buzzword you will see everywhere on the Internet, but it’s
also a rapidly maturing technology with powerful real-world applications, particularly for the
management, display, and organization of data. Together with its many related technologies,
which are covered in later chapters, XML is an essential technology for anyone working with data,
whether publicly on the web or privately within your own organization. This chapter introduces
you to some XML basics and begins to show you why learning about it is so important.
This chapter covers the following:
❑ The two major categories of computer file types (binary files and text files) and the
advantages and disadvantages of each
❑ The history behind XML, including other markup languages such as SGML and HTML
❑ How XML documents are structured as hierarchies of information
❑ A brief introduction to some of the other technologies surrounding XML, which you will
work with throughout the book
❑ A quick look at some areas where XML is useful
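XML is a buzzword on the Internet, but it is also maturing quickly through powerful real-world applications, particularly in the management, display, and organization of data. Whether data is published on the web or used internally, XML is a valuable technology. Related technologies are covered in later chapters; this chapter covers the basics of XML and why it matters, as follows:
1. The two main file types, binary files and text files, and their advantages and disadvantages
2. The history of XML
3. A brief introduction to XML-related technologies
4. The main areas where XML is used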
Of Data, Files, and Text
XML is a technology concerned with the description and structuring of data, so before you can
really delve into the concepts behind XML, you need to understand how computers store and
access data. For our purposes, computers understand two kinds of data files: binary files and
text files.
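About Data, Files, and Text
XML is a technology for describing and structuring data, so before studying XML you need to know how computers store and read data. Computers work with two kinds of data files: binary files and text files.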
Binary Files
A binary file, at its simplest, is just a stream of bits (1s and 0s). It’s up to the application that created a
binary file to understand what all of the bits mean. That’s why binary files can only be read and produced
by certain computer programs, which have been specifically written to understand them.
For instance, when a document is created with Microsoft Word, the program creates a binary file with an
extension of “doc” in its own proprietary format. The programmers who wrote Word decided to insert
certain binary codes into the document to denote bold text, codes to denote page breaks, and other codes
for all of the information that needs to go into a “doc” file. When you open a document in Word, it interprets
those codes and displays the properly formatted text or prints it to the printer.
The codes inserted into the document are meta data, or information about information. Examples could
be “this word should be in bold,” “that paragraph should be centered,” and so on. This meta data is
really what differentiates one file type from another; the different types of files use different kinds of
meta data. For example, a word processing document has different meta data than a spreadsheet document,
because they are describing different things. Not so obviously, documents from different word
processing applications, such as Microsoft Word and WordPerfect, also have different meta data, because
the applications were written differently (see Figure 1-1).
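To make the idea of binary meta data concrete, here is a minimal Python sketch of a made-up file format. It is not Word's real .doc layout, which is far more complex; the flag values, the record layout, and the `write_run`/`read_run` helpers are all hypothetical. The point is that the bytes only mean "bold" to code that already knows the layout.

```python
import struct

# Hypothetical record layout for a made-up word-processor format (NOT Word's real .doc):
#   1 byte  : style flags (bit 0 = bold, bit 1 = centered)
#   2 bytes : length of the text in bytes (unsigned, little-endian)
#   N bytes : the text itself, UTF-8 encoded
BOLD = 0b01
CENTERED = 0b10


def write_run(text: str, flags: int) -> bytes:
    """Encode one run of text together with its formatting meta data."""
    data = text.encode("utf-8")
    return struct.pack("<BH", flags, len(data)) + data


def read_run(blob: bytes) -> tuple:
    """Decode a run; only code that knows the layout above can do this."""
    flags, length = struct.unpack_from("<BH", blob, 0)
    text = blob[3:3 + length].decode("utf-8")
    return text, flags


blob = write_run("Hello", BOLD)
print(blob)  # b'\x01\x05\x00Hello' (the flag and length bytes mean nothing on their own)
text, flags = read_run(blob)
print(text, bool(flags & BOLD), bool(flags & CENTERED))  # Hello True False
```

A different program with a different layout would read the same first three bytes as something else entirely, which is the "different meta data" problem described above.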
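Binary Files:
A binary file is simply a stream of bits, 1s and 0s. What those bits mean is decided by the application that created the file. That is why only certain applications, written specifically to understand them, can read these bit streams, while other programs cannot.
For example, when we create a document with Microsoft Word, Word creates a binary file with a "doc" extension in its own format. When we make text bold, insert a page break, or apply other formatting, Word inserts the corresponding codes into the binary file, so the next time you open or print the document it is displayed with that formatting.
These formatting codes are the document's meta data, also called information about information, such as "this word is bold" or "this paragraph is centered." Meta data differs from application to application: a word processing document and a spreadsheet, for example, use different codes because they describe different things. Likewise, different word processors, such as Microsoft Word and WordPerfect, use different meta data (see Figure 1-1).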
Figure 1-1
You can’t assume that a document created with one word processor will be readable by another, because
the companies who write word processors all have their own proprietary formats for their data files.
Word documents open in Microsoft Word, and WordPerfect documents open in WordPerfect.
Luckily, most word processors come with translators or import utilities, which can translate documents
from other word processors into formats that can be understood natively. If I have Microsoft Word
installed on my computer and someone gives me a WordPerfect document, I might be able to import it
into Word so that I can read the document. Of course, many of us have seen the garbage that sometimes
occurs as a result of this translation; sometimes applications are not as good as we’d like them to be at
converting the information.
Binary file formats are advantageous because it is easy for computers to understand these binary codes—
meaning that they can be processed much faster than nonbinary formats—and they are very efficient for
storing this meta data. There is also a disadvantage, as you’ve seen, in that binary files are proprietary.
You might not be able to open binary files created by one application in another application, or even in the
same application running on another platform.
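The platform problem can be sketched in a few lines of Python, assuming a hypothetical application that simply dumps integers in its machine's native byte order. The same number written on a little-endian machine and on a big-endian machine produces different bytes, and a reader that assumes one order misreads the other.

```python
import struct

# The same 32-bit integer as written by two hypothetical builds of one program:
# one running on a little-endian machine, one on a big-endian machine.
value = 97
little = struct.pack("<I", value)  # b'a\x00\x00\x00'
big = struct.pack(">I", value)     # b'\x00\x00\x00a'

# A reader that always assumes little-endian only gets the first file right.
print(struct.unpack("<I", little)[0])  # 97
print(struct.unpack("<I", big)[0])     # 1627389952
```

Well-designed binary formats avoid this by fixing one byte order in their specification, but the reader still has to know that specification.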
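Also, don't expect files created by different word processors to be readable by one another, because the bit streams they use are different. Fortunately, most word processors now include translators or other import utilities that convert other formats into one they understand; for example, Microsoft Word can open WordPerfect documents. The conversion is not always as good as we would like, however, and errors sometimes occur during translation.
Binary files have these advantages: they are easy for computers to understand, which means data in a binary format can be processed faster, and they store meta data efficiently. Their major weakness is that they are tied to the application that created them, so we may not be able to open them in other applications, or even in the same application on a different platform.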
Text Files
Like binary files, text files are also streams of bits. However, in a text file these bits are grouped together
in standardized ways, so that they always form numbers. These numbers are then further mapped to
characters. For example, a text file might contain the following bits:
1100001
This group of bits would be translated as the number 97, which could then be further translated into the
letter a.
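The bits-to-number-to-character chain described above can be traced in Python. This sketch assumes the ASCII mapping (the same assumption the example makes), under which the number 97 is the letter a.

```python
bits = "1100001"
number = int(bits, 2)  # interpret the bit string as a base-2 number
char = chr(number)     # map the number to its character (ASCII / Unicode code point)
print(number, char)    # 97 a

# And back again: character -> number -> bits
print(format(ord("a"), "b"))  # 1100001
```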
This example makes a number of assumptions. A better description of how numbers are represented in
text files is given in the “Encoding” section in Chapter 2.
Because of these standards, text files can be read by many applications, and can even be read by
humans, using a simple text editor. If I create a text document, anyone in the world can read it (as long
as they understand English, of course) in any text editor they wish. Some issues still exist, such as the
fact that different operating systems treat line-ending characters differently, but it is much easier to share
information when it’s contained in a text file than when the information is in a binary format.
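A small Python sketch of the line-ending issue, assuming the common conventions of LF on Unix-like systems (Linux, macOS) and CR LF on Windows:

```python
# The same two lines of text saved with different line-ending conventions.
unix_text = "first line\nsecond line\n"        # LF
windows_text = "first line\r\nsecond line\r\n"  # CR LF

# Naively splitting on "\n" leaves a stray "\r" on each Windows line.
print(windows_text.split("\n"))  # ['first line\r', 'second line\r', '']

# str.splitlines() recognizes both conventions, so both texts yield the same lines.
print(unix_text.splitlines() == windows_text.splitlines())  # True
```

The characters themselves are identical in both files; only the invisible line-ending codes differ, which is why this is a nuisance rather than the kind of incompatibility seen with binary formats.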
Figure 1-2 shows some of the applications on my machine that are capable of opening text files. Some of
these programs only allow me to view the text, while others will let me edit it as well.
Figure 1-2
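Text Files:
Like binary files, text files are made up of bit streams, but in a text file the bits are grouped according to standard rules, so they always form numbers, and these numbers map to characters. For example, a text file might contain the following bit sequence:
1100001
This sequence represents the number 97, and 97 corresponds to the character "a".
Because of these standards, text files can be read by many applications, and we can also read them ourselves with a simple text editor. If we create a text file, anyone can read it in whatever text editor they like (provided they understand English). Some problems remain, such as different operating systems representing line endings differently, but compared with binary files, text files make it much easier to share information.
Figure 1-2 shows the applications on my computer that can open text files; some can only display the text, while others can also edit it.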
In its early days, the Internet was almost completely text-based, which enabled people to communicate
with relative ease. This contributed to the explosive rate at which the Internet was adopted, and to the
ubiquity of applications such as e-mail, the World Wide Web, newsgroups, and so on.
The disadvantage of text files is that adding other information—our meta data, in other words—is
more difficult and bulky. For example, most word processors enable you to save documents in text form,
but if you do, you can’t mark a section of text as bold or insert a binary picture file. You will simply get
the words with none of the formatting.
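As a rough illustration of why meta data is bulkier in text, this Python sketch compares three versions of one bold word: a hypothetical binary record that marks it bold with a single flag byte, a plain-text export (formatting lost), and a text version that spells the formatting out with made-up `[bold]` markers. Both the record layout and the marker convention are assumptions for illustration only.

```python
import struct

text = "XML"

# Hypothetical binary record: 1 flag byte (0b01 = bold), a 2-byte length, then the text.
binary_bold = struct.pack("<BH", 0b01, len(text)) + text.encode("ascii")

# Plain-text export: the words survive, but the formatting is simply dropped.
plain = text

# Keeping the formatting in a text file means writing it out as readable markers.
marked_up = "[bold]" + text + "[/bold]"

print(len(binary_bold), len(plain), len(marked_up))  # 6 3 16
```

The marked-up version is readable by anyone with a text editor, but it takes more than twice as many bytes as the binary record to say the same thing.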
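In its early days, the Internet was also text-based, which made it easy for people to communicate. This drove the explosive growth of e-mail, the World Wide Web, newsgroups, and so on.
Text files have their own weakness: adding meta data of the kind binary files contain is difficult and makes the files much larger. For example, many word processors let you save a document as text, but then you cannot keep any part of it in bold or insert a binary picture; all you get is the text without any formatting.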