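Reading and Writing XML Files Quickly with Serialization
InfoPath is a very handy XML editing tool from Microsoft. You can use it to design form templates and data-entry interfaces; other people can then fill in a form from the template and submit the data to, say, an ERP system. This works because the data is saved as XML, and InfoPath lets you configure how the form's XML is validated. I will describe how to create and use a form template in InfoPath in a separate article.
For example, here is an XML file generated by InfoPath: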
<?xml version="1.0" encoding="UTF-8"?>
<?mso-infoPathSolution solutionVersion="1.0.0.15" productVersion="12.0.0" PIVersion="1.0.0.0" href="file:///C:\Documents%20and%20Settings\xyz\My%20Documents\Paper.xsn" name="urn:schemas-microsoft-com:office:infopath:Paper:-myXSD-2005-10-21T21-12-27" ?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.2"?>
<my:Paper xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2005-10-21T21:12:27" xmlns:xd="http://schemas.microsoft.com/office/infopath/2003" xml:lang="zh-cn">
  <my:Candidate>张三</my:Candidate>
  <my:PositionApplies>STE</my:PositionApplies>
  <my:ApplicationDate xsi:nil="true"></my:ApplicationDate>
  <my:FillInOffice>false</my:FillInOffice>
  <my:Questions>
    <my:Question>
      <my:QuestionText>asdfasdf</my:QuestionText>
      <my:Answers my:SelectedAnswer="false">
        <my:AnswerDescription>asfasfdsa</my:AnswerDescription>
      </my:Answers>
      <my:Answers my:SelectedAnswer="true">
        <my:AnswerDescription>afsafd</my:AnswerDescription>
      </my:Answers>
      <my:Answers my:SelectedAnswer="false">
        <my:AnswerDescription>afasfdf</my:AnswerDescription>
      </my:Answers>
      <my:Answers my:SelectedAnswer="false">
        <my:AnswerDescription>asfadsfsafd</my:AnswerDescription>
      </my:Answers>
      <my:Id>1</my:Id>
    </my:Question>
  </my:Questions>
  <my:Skill>C#</my:Skill>
</my:Paper>
The most intuitive way to model the XML file above as classes looks like this:
public class Answers : List<Answer> { }

public class Questions : List<Question> { }

public class Answer
{
    public bool SelectedAnswer { get; set; }
    public string AnswerDescription { get; set; }
}

public class Question
{
    public Question() { Answers = new Answers(); }

    public string QuestionText { get; set; }
    public Answers Answers { get; set; }
    public int Id { get; set; }
}

public class Paper
{
    public Paper() { Questions = new Questions(); }

    public string Candidate { get; set; }
    public string PositionApplies { get; set; }
    public DateTime ApplicationDate { get; set; }
    public bool FillInOffice { get; set; }
    public Questions Questions { get; set; }
    public string Skill { get; set; }
}
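So far everything looks perfect: the types and properties map directly onto the corresponding XML nodes and attributes. Next, we try the following code to serialize our own object into an XML file and check whether the contents of the two files match: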
// The code that instantiates and fills the custom object is omitted.
using (XmlWriter writer = XmlWriter.Create(@"c:\test.xml"))
{
    XmlSerializer xs = new XmlSerializer(typeof(Paper));
    xs.Serialize(writer, paper);
}
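However, the resulting file cannot be opened by InfoPath. Comparing the file we serialized with the one InfoPath generated shows that they differ mainly in how nodes are handled. Some nodes may be empty, yet we declared them as value types in the C# code, and value types cannot be null. In addition, every node and attribute in the InfoPath file lives in a namespace, whereas the file we serialized puts them in the default (empty) namespace. In the comparison below, I mark the differences in the serialized file in light gray and those in the InfoPath file in yellow: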
Serialized file:

<?xml version="1.0" encoding="utf-8"?>
<Paper xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Candidate>张三</Candidate>
  <PositionApplies>STE</PositionApplies>
  <ApplicationDate>2009-03-10T00:59:05.9912457+08:00</ApplicationDate>
  <FillInOffice>false</FillInOffice>
  <Questions>
    <Question>
      <QuestionText>This is a test question</QuestionText>
      <Answers>
        <Answer>
          <SelectedAnswer>false</SelectedAnswer>
          <AnswerDescription>Answer 1</AnswerDescription>
        </Answer>
        <Answer>
          <SelectedAnswer>true</SelectedAnswer>
          <AnswerDescription>Answer 2</AnswerDescription>
        </Answer>
        <Answer>
          <SelectedAnswer>false</SelectedAnswer>
          <AnswerDescription>Answer 3</AnswerDescription>
        </Answer>
        <Answer>
          <SelectedAnswer>false</SelectedAnswer>
          <AnswerDescription>Answer 4</AnswerDescription>
        </Answer>
      </Answers>
      <Id>0</Id>
    </Question>
  </Questions>
  <Skill>C#</Skill>
</Paper>

Original InfoPath file:

<?xml version="1.0" encoding="UTF-8"?>
<?mso-infoPathSolution solutionVersion="1.0.0.15" productVersion="12.0.0" PIVersion="1.0.0.0" href="file:///C:\Documents%20and%20Settings\xyz\My%20Documents\Paper.xsn" name="urn:schemas-microsoft-com:office:infopath:Paper:-myXSD-2005-10-21T21-12-27" ?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.2"?>
<my:Paper xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2005-10-21T21:12:27" xmlns:xd="http://schemas.microsoft.com/office/infopath/2003" xml:lang="zh-cn">
  <my:Candidate>张三</my:Candidate>
  <my:PositionApplies>STE</my:PositionApplies>
  <my:ApplicationDate xsi:nil="true"></my:ApplicationDate>
  <my:FillInOffice>false</my:FillInOffice>
  <my:Questions>
    <my:Question>
      <my:QuestionText>asdfasdf</my:QuestionText>
      <my:Answers my:SelectedAnswer="false">
        <my:AnswerDescription>asfasfdsa</my:AnswerDescription>
      </my:Answers>
      <my:Answers my:SelectedAnswer="true">
        <my:AnswerDescription>afsafd</my:AnswerDescription>
      </my:Answers>
      <my:Answers my:SelectedAnswer="false">
        <my:AnswerDescription>afasfdf</my:AnswerDescription>
      </my:Answers>
      <my:Answers my:SelectedAnswer="false">
        <my:AnswerDescription>asfadsfsafd</my:AnswerDescription>
      </my:Answers>
      <my:Id>1</my:Id>
    </my:Question>
  </my:Questions>
  <my:Skill>C#</my:Skill>
</my:Paper>
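We can see the following differences between the two. For brevity, the InfoPath file is called the I file below, and the serialized file the S file:
1. All nodes in the I file are in the namespace bound to the prefix my.
2. The S file has no <?mso* ?> processing instruction nodes. This is why the XML file we produced by serialization cannot be opened in InfoPath by double-clicking it in Explorer.
3. Some value-typed nodes in the I file are empty (nil), whereas in C# a value type cannot be null.
4. The S file serializes each Answer in the Paper.Questions.Answers array as a separate Answer node inside an Answers wrapper, whereas in the I file each C# Answer object is represented directly by an Answers node.
5. In the I file, SelectedAnswer is an XML attribute, but in the S file it was serialized as an XML node.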
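Differences 1 and 2 can also be checked programmatically rather than by eye. The following is only a minimal sketch: the class name XmlInspector, the method Describe, and the urn:example:paper namespace are made up for illustration. It walks an XML string with XmlReader and reports each processing instruction and each element together with its namespace.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the original article.
public static class XmlInspector
{
    // Returns one line per processing instruction and per element start tag.
    public static List<string> Describe(string xml)
    {
        var lines = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.ProcessingInstruction)
                {
                    // The XML declaration is reported as XmlDeclaration, so it is not listed here.
                    lines.Add("PI " + reader.Name + ": " + reader.Value);
                }
                else if (reader.NodeType == XmlNodeType.Element)
                {
                    lines.Add("Element " + reader.Name + " in namespace " + reader.NamespaceURI);
                }
            }
        }
        return lines;
    }

    public static void Main()
    {
        // A tiny stand-in for an InfoPath file; the namespace URI is an assumption.
        string xml =
            "<?xml version=\"1.0\"?>" +
            "<?mso-application progid=\"InfoPath.Document\"?>" +
            "<my:Paper xmlns:my=\"urn:example:paper\"><my:Skill>C#</my:Skill></my:Paper>";

        foreach (string line in Describe(xml))
        {
            Console.WriteLine(line);
        }
    }
}
```

Running the same helper over the I file and the S file should make the missing processing instructions and the empty namespace of the S file stand out immediately.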
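Although there are this many differences, we are in luck: the XML serialization support in the .NET Framework is powerful enough that we can eliminate them with very little code.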
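1. XmlSerializer.Serialize has an overload that takes an XmlSerializerNamespaces object, which lets us explicitly declare the namespace prefixes we need at serialization time (the XmlSerializer constructor can additionally take a default namespace for the whole type). You can then put an attribute on each member of the type to state which namespace that member should be serialized into, as in the following code: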
public class Question
{
    public Question() { Answers = new Answers(); }

    // Namespace is a named property; a positional string argument would be the element name.
    [XmlElement(Namespace = "http://schemas.microsoft.com/office/infopath/2003/myXSD/2005-10-21T21:12:27")]
    public string QuestionText { get; set; }
    public Answers Answers { get; set; }
    public int Id { get; set; }
}

XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2005-10-21T21:12:27");

XmlSerializer xs = new XmlSerializer(typeof(Paper));
xs.Serialize(writer, paper, ns);
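2. .NET serialization offers no way to write processing instructions, because serialization should do serialization and nothing else; other work belongs in other functions or modules. Why? If that is not obvious, go and read the well-known Code Complete. What we can do is hand XmlSerializer an XmlWriter instance: before serializing, we explicitly write the processing instructions ourselves, and then let XmlSerializer carry on from the XmlWriter's current position.
3. This can be handled with nullable value types (Nullable<T>), available since C# 2.0. If you have never used or heard of them, go and look them up soon; after all, C# 4.0 is about to be released...
4. The namespace that contains XmlSerializer (System.Xml.Serialization) offers many attributes for controlling how nodes are written during serialization. Applied to an array or collection, the XmlElement attribute drops the wrapper element and lets you specify the node name that each item is serialized to in the XML file.
5. The XmlAttribute attribute changes XmlSerializer's default behavior of serializing a member as an XML node, so that the member is serialized as an XML attribute instead. By default no namespace is added to such an XML attribute, so we need to set the Form property of XmlAttribute to XmlSchemaForm.Qualified; XmlSerializer will then qualify the XML attribute with the namespace automatically.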
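To see points 3 to 5 in isolation, here is a small sketch that serializes to a string instead of a file. The types Quiz and Choice, the helper QuizDemo.Serialize, and the namespace urn:example:quiz are all hypothetical and only stand in for the Paper classes. The sketch also registers the xsi prefix explicitly, on the assumption that this keeps the nil marker readable as xsi:nil when a custom XmlSerializerNamespaces is passed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical types for illustration only.
public class Choice
{
    // Point 5: written as a namespace-qualified XML attribute.
    [XmlAttribute(Form = XmlSchemaForm.Qualified)]
    public bool Selected { get; set; }

    public string Text { get; set; }
}

public class Quiz
{
    public Quiz() { Choices = new List<Choice>(); }

    // Point 3: a null value is written as an element marked nil="true".
    public DateTime? TakenOn { get; set; }

    // Point 4: no wrapper element; every item becomes a <Choices> element.
    [XmlElement("Choices")]
    public List<Choice> Choices { get; set; }
}

public static class QuizDemo
{
    // Made-up namespace URI, standing in for the InfoPath "my" namespace.
    public const string MyNamespace = "urn:example:quiz";

    public static string Serialize(Quiz quiz)
    {
        var ns = new XmlSerializerNamespaces();
        ns.Add("my", MyNamespace);
        ns.Add("xsi", "http://www.w3.org/2001/XMLSchema-instance");

        var xs = new XmlSerializer(typeof(Quiz), MyNamespace);
        using (var sw = new StringWriter())
        {
            xs.Serialize(sw, quiz, ns);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        var quiz = new Quiz(); // TakenOn is left null on purpose
        quiz.Choices.Add(new Choice { Selected = true, Text = "A" });
        quiz.Choices.Add(new Choice { Selected = false, Text = "B" });
        Console.WriteLine(Serialize(quiz));
    }
}
```

Printing the result is a quick way to confirm that the shape matches what InfoPath expects before applying the same attributes to the real classes.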
Here is the complete source code for serialization and deserialization:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Xml.Serialization;
using System.Xml;
using System.Xml.Schema;

namespace TestSerialization
{
    public class Answers : List<Answer> { }

    public class Questions : List<Question> { }

    public class Answer
    {
        // Written as a namespace-qualified XML attribute (my:SelectedAnswer).
        [XmlAttribute(Form = XmlSchemaForm.Qualified)]
        public bool SelectedAnswer { get; set; }

        public string AnswerDescription { get; set; }
    }

    public class Question
    {
        public Question() { Answers = new Answers(); }

        public string QuestionText { get; set; }

        // Each Answer is written directly as an <Answers> element, without a wrapper.
        [XmlElement]
        public Answers Answers { get; set; }

        public int Id { get; set; }
    }

    public class Paper
    {
        public Paper() { Questions = new Questions(); }

        public string Candidate { get; set; }

        public string PositionApplies { get; set; }

        // Nullable, so an empty date can be represented.
        public DateTime? ApplicationDate { get; set; }

        public bool FillInOffice { get; set; }

        [XmlArray]
        public Questions Questions { get; set; }

        public string Skill { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Paper paper = new Paper
            {
                ApplicationDate = DateTime.Now,
                Candidate = "Shi Yimin",
                PositionApplies = "STE I"
            };

            Question question = new Question();
            question.QuestionText = "This is a test question";
            question.Answers.Add(new Answer { AnswerDescription = "Answer 1", SelectedAnswer = false });
            question.Answers.Add(new Answer { AnswerDescription = "Answer 2", SelectedAnswer = true });
            question.Answers.Add(new Answer { AnswerDescription = "Answer 3", SelectedAnswer = false });
            question.Answers.Add(new Answer { AnswerDescription = "Answer 4", SelectedAnswer = false });

            paper.Questions.Add(question);
            paper.Skill = "C#";

            XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
            ns.Add("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2005-10-21T21:12:27");
            ns.Add("xd", "http://schemas.microsoft.com/office/infopath/2003");

            // Serialize: write the InfoPath processing instructions first, then the object.
            using (XmlWriter writer = XmlWriter.Create(@"c:\test.xml"))
            {
                writer.WriteProcessingInstruction("mso-infoPathSolution",
                    @"solutionVersion=""1.0.0.15"" productVersion=""12.0.0"" " +
                    @"PIVersion=""1.0.0.0"" href=""file:///C:\Documents%20and%20Settings\v-yishi\My%20Documents\Paper.xsn"" " +
                    @"name=""urn:schemas-microsoft-com:office:infopath:Paper:-myXSD-2005-10-21T21-12-27""");
                writer.WriteProcessingInstruction("mso-application",
                    @"progid=""InfoPath.Document"" versionProgid=""InfoPath.Document.2""");

                XmlSerializer xs = new XmlSerializer(typeof(Paper),
                    "http://schemas.microsoft.com/office/infopath/2003/myXSD/2005-10-21T21:12:27");
                xs.Serialize(writer, paper, ns);
            }

            // Deserialize: read an InfoPath file back into a Paper object.
            using (FileStream fs = new FileStream(@"E:\家庭作业\InterviewForm.xml", FileMode.Open))
            {
                XmlReader reader = XmlReader.Create(fs);
                reader.MoveToElement();
                XmlSerializer xs = new XmlSerializer(typeof(Paper),
                    "http://schemas.microsoft.com/office/infopath/2003/myXSD/2005-10-21T21:12:27");
                Paper paper1 = (Paper)xs.Deserialize(reader);
            }
        }
    }
}
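A round trip in memory is a cheap sanity check that serialization and deserialization agree on the namespace. The sketch below is not the author's code: the Note type, the RoundTrip helper, and the urn:example:note namespace are invented for illustration. It serializes an object to a string, reads it back with an XmlSerializer built with the same default namespace, and compares the values.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type for illustration only.
public class Note
{
    public string Author { get; set; }
    public int? Score { get; set; } // nullable, like Paper.ApplicationDate
}

public static class RoundTrip
{
    // Made-up namespace; both directions must use the same one.
    public const string NoteNamespace = "urn:example:note";

    public static string ToXml(Note note)
    {
        var xs = new XmlSerializer(typeof(Note), NoteNamespace);
        using (var sw = new StringWriter())
        {
            xs.Serialize(sw, note);
            return sw.ToString();
        }
    }

    public static Note FromXml(string xml)
    {
        var xs = new XmlSerializer(typeof(Note), NoteNamespace);
        using (var sr = new StringReader(xml))
        {
            return (Note)xs.Deserialize(sr);
        }
    }

    public static void Main()
    {
        var original = new Note { Author = "Zhang San", Score = null };
        Note copy = FromXml(ToXml(original));

        Console.WriteLine("Author preserved: " + (copy.Author == original.Author));
        Console.WriteLine("Score still null: " + !copy.Score.HasValue);
    }
}
```

If the two XmlSerializer instances were created with different default namespaces, deserialization would be expected to fail on the root element, which is the kind of mismatch this check is meant to catch early.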
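Now try opening our serialized XML file in InfoPath. Oh...
In fact, since the mapping from XML nodes to types is this direct, could there be an automated way to generate the mapped types? Heh, to be continued.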