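C# XML Serialization/Deserialization Reference
.NET ships with a capable XML serializer/deserializer in the System.Xml.Serialization namespace. It is convenient to use, and this post summarizes the common usage patterns for reference.
1. Simple serialization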
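    using System.IO;
    using System.Xml.Serialization;

    // Serializes any XmlSerializer-compatible object to an XML string.
    public static string SerializeXml(object data)
    {
        using (StringWriter sw = new StringWriter())
        {
            XmlSerializer xz = new XmlSerializer(data.GetType());
            xz.Serialize(sw, data);
            return sw.ToString();
        }
    }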
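The code above serializes to a string. If you need to return the result to a client as a stream, or write it to a file, you usually have to pick an encoding. UTF-8 is the common choice, but on some occasions you may be asked to use GB2312. The following example uses GB2312: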
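    using System.IO;
    using System.Text;
    using System.Xml.Serialization;

    // Serializes to a MemoryStream using GB2312.
    // Renamed from SerializeXml because C# cannot overload on return type alone.
    public static MemoryStream SerializeXmlToStream(object data)
    {
        MemoryStream ms = new MemoryStream();
        StreamWriter sw = new StreamWriter(ms, Encoding.GetEncoding("GB2312"));
        XmlSerializer xz = new XmlSerializer(data.GetType());
        xz.Serialize(sw, data);
        sw.Flush();       // make sure buffered text reaches the stream
        ms.Position = 0;  // rewind so the caller can read from the start
        return ms;
    }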
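This serializes the object straight into a MemoryStream in the chosen encoding. You might wonder whether you could instead call the earlier string-returning SerializeXml and then write that string to a stream or byte array in the desired encoding. You can, but it adds an extra step and is less direct. In addition, the XML declaration produced through a plain StringWriter says encoding="utf-16", which would no longer match the bytes you write.
One more thing to watch out for: when serializing to a stream that you return, do not wrap the Stream or the TextWriter in a using block, because disposing them closes the stream before the caller receives it. Since nothing disposes the writer, it is safest to flush it explicitly before returning the stream.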
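As a minimal sketch of an alternative to skipping using altogether, the StreamWriter constructor has a leaveOpen parameter that keeps the underlying stream open after the writer is disposed. The class name XmlStreamHelper and the method name SerializeToStream are made up for this illustration and are not part of the original code.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical helper: the names are assumptions made for this sketch.
public static class XmlStreamHelper
{
    public static MemoryStream SerializeToStream(object data, Encoding encoding)
    {
        var ms = new MemoryStream();
        // leaveOpen: true keeps ms usable after the writer is disposed.
        using (var sw = new StreamWriter(ms, encoding, 1024, leaveOpen: true))
        {
            var xz = new XmlSerializer(data.GetType());
            xz.Serialize(sw, data);
        } // disposing the writer flushes it into ms
        ms.Position = 0; // rewind so the caller reads from the start
        return ms;
    }
}
```

The writer is still disposed deterministically, and the caller becomes responsible for disposing the returned MemoryStream.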
2. Simple deserialization
    // Requires: using System.IO; using System.Text; using System.Xml.Serialization;
    using (FileStream fs = File.Open("file.xml", FileMode.Open))
    using (StreamReader sr = new StreamReader(fs, Encoding.UTF8))
    {
        XmlSerializer xz = new XmlSerializer(typeof(Department));
        Department dept = (Department)xz.Deserialize(sr);
        // blah blah ...
    }
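Here Department is the class you want to deserialize into. Again, pay attention to the encoding: this example specifies UTF-8, but your input may use something else.
Serialization and deserialization are in fact reversible: whatever class and encoding you used to serialize an object to XML, you can use the same class and encoding to deserialize that XML back into an object.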
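The reversibility can be checked with a small round trip. This is only a sketch: the Person class and the RoundTripDemo helper are hypothetical names introduced for illustration, and the trip goes through an in-memory string rather than a file.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical sample type; XmlSerializer needs a public class
// with a public parameterless constructor.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class RoundTripDemo
{
    // Serializes value to an XML string, then reads it back with the same serializer.
    public static T RoundTrip<T>(T value)
    {
        var xz = new XmlSerializer(typeof(T));
        string xml;
        using (var sw = new StringWriter())
        {
            xz.Serialize(sw, value);
            xml = sw.ToString();
        }
        using (var sr = new StringReader(xml))
        {
            return (T)xz.Deserialize(sr);
        }
    }
}
```

Calling RoundTripDemo.RoundTrip(new Person { Name = "张三", Age = 30 }) should give back a new Person with the same Name and Age.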
3. Specifying XML tag names
    [XmlRoot("department")]
    public class Department
    {
        public string DeptName { get; set; }

        [XmlElement("extra")]
        public DeptExtraInfo DeptExtraInfo { get; set; }
    }
This is done with the XmlRoot and XmlElement attributes. XmlRoot names the "root", that is, the outermost tag of the XML document.
4. Turning a member into an XML attribute
    [XmlRoot("department")]
    public class Department
    {
        public string DeptName { get; set; } = "研发部";

        [XmlAttribute("timestamp")]
        public int Timestamp = 10;
    }
With the XmlAttribute attribute applied, Timestamp is written as the timestamp attribute of the department root node instead of as a child element.
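To see the effect, the class can be serialized and the resulting string inspected. The following is a rough sketch: DepartmentDemo mirrors the class above under a different, made-up name, and AttributeDemo is a hypothetical helper.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical copy of the Department class, renamed for this sketch.
[XmlRoot("department")]
public class DepartmentDemo
{
    public string DeptName { get; set; } = "研发部";

    [XmlAttribute("timestamp")]
    public int Timestamp { get; set; } = 10;
}

public static class AttributeDemo
{
    // Returns the XML produced for a default DepartmentDemo instance.
    public static string Serialize()
    {
        var xz = new XmlSerializer(typeof(DepartmentDemo));
        using (var sw = new StringWriter())
        {
            xz.Serialize(sw, new DepartmentDemo());
            return sw.ToString();
        }
    }
}
```

The returned string should contain timestamp="10" on the department tag, while DeptName remains a child element.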
5. Removing unwanted namespaces
By default, the root element carries namespace declarations like these:
    <department xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <!-- blah blah blah -->
    </department>
If you do not need them, adjust the serialization method:
    public static string SerializeXml(object data)
    {
        using (Utf8Writer sw = new Utf8Writer())
        {
            XmlSerializer xz = new XmlSerializer(data.GetType());
            // An empty prefix/namespace pair suppresses the default xsi/xsd declarations.
            XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
            ns.Add("", "");
            xz.Serialize(sw, data, ns);
            return sw.ToString();
        }
    }
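The Utf8Writer used above makes the output declare UTF-8, including the encoding attribute in the XML declaration. Its code is as follows:
    using System.IO;
    using System.Text;

    // A StringWriter that reports UTF-8 instead of the default UTF-16.
    public sealed class Utf8Writer : StringWriter
    {
        public override Encoding Encoding => Encoding.UTF8;
    }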
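Incidentally, UTF-8 is the usual choice, but in special cases (for example, when integrating with legacy systems) GBK is required. GBK is a superset of GB2312, and for this purpose the two can be treated as the same thing. In that case, change the Encoding override to: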
public override Encoding Encoding => Encoding.GetEncoding("GB2312");
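One caveat, sketched here under the assumption that the code targets .NET Core or .NET 5 and later: code-page encodings such as GB2312 are not available there until CodePagesEncodingProvider is registered, and GetEncoding throws otherwise. On .NET Framework the registration is normally unnecessary. The class name GbWriter is hypothetical.

```csharp
using System.IO;
using System.Text;

// Hypothetical GB2312 counterpart of Utf8Writer.
public sealed class GbWriter : StringWriter
{
    static GbWriter()
    {
        // Makes code-page encodings (including GB2312) resolvable on .NET Core / .NET 5+.
        // Registering the same provider more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public override Encoding Encoding => Encoding.GetEncoding("GB2312");
}
```

Doing the registration in a static constructor is just one option; registering once at application startup works equally well.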
6. Serializing a collection without the extra wrapper element
What does that mean? Consider this class:
    [XmlRoot("department")]
    public class Department
    {
        public string DeptName { get; set; }
        public List<Employee> Employees { get; set; }
    }
It serializes to:
    <?xml version="1.0" encoding="utf-8"?>
    <department>
      <DeptName>研发部</DeptName>
      <Employees>
        <Employee>
          <EmpName>张三</EmpName>
          <EmpSalary>10000</EmpSalary>
        </Employee>
        <Employee>
          <EmpName>李四</EmpName>
          <EmpSalary>8000</EmpSalary>
        </Employee>
      </Employees>
    </department>
Note that the Employee tags are wrapped in an extra Employees tag. That may not be what you want. This is probably the result you are after:
    <?xml version="1.0" encoding="utf-8"?>
    <department>
      <DeptName>研发部</DeptName>
      <Employee>
        <EmpName>张三</EmpName>
        <EmpSalary>10000</EmpSalary>
      </Employee>
      <Employee>
        <EmpName>李四</EmpName>
        <EmpSalary>8000</EmpSalary>
      </Employee>
    </department>
How? It is simple: put an XmlElement attribute on the Employees property:
    [XmlRoot("department")]
    public class Department
    {
        public string DeptName { get; set; } = "研发部";

        // XmlElement on a collection writes each item directly under the parent.
        [XmlElement("Employee")]
        public List<Employee> Employees { get; set; } = new List<Employee>();
    }
Alternatively, if you only want to rename the wrapping Employees tag, use this attribute instead: [XmlArray("NewName")].
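As a sketch of the renaming case, XmlArray can be combined with XmlArrayItem, which renames the individual item tags. XmlArrayItem is not mentioned above and is added here only for illustration; the tag names staff and member and the class names StaffEmployee, DepartmentWithStaff and ArrayDemo are all made up.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the Employee class.
public class StaffEmployee
{
    public string EmpName { get; set; }
    public int EmpSalary { get; set; }
}

[XmlRoot("department")]
public class DepartmentWithStaff
{
    [XmlArray("staff")]       // renames the wrapper element
    [XmlArrayItem("member")]  // renames each item element
    public List<StaffEmployee> Employees { get; set; } = new List<StaffEmployee>();
}

public static class ArrayDemo
{
    public static string Serialize(DepartmentWithStaff dept)
    {
        var xz = new XmlSerializer(typeof(DepartmentWithStaff));
        using (var sw = new StringWriter())
        {
            xz.Serialize(sw, dept);
            return sw.ToString();
        }
    }
}
```

With one employee in the list, the output should contain a staff element wrapping a single member element.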
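7. Serializing properties whose value is null
By default, properties whose value is null are not serialized at all. Think about why.
If the serializer emitted <DeptName /> for them, there would be no way to tell whether DeptName was null or an empty string. A practical workaround is therefore to replace null strings with empty strings before serializing. You could write a helper that uses reflection to walk all string properties of an object and set the null ones to an empty string. A real implementation has to cover more cases, such as nested objects and enumerable members, so it would probably need recursion. For reasons of space I will not include that code here.
There is also a more idiomatic approach that does not change the object's values: put [XmlElement(IsNullable = true)] on the property. The downside is that the generated tag then carries an extra xsi:nil="true" attribute.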
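A minimal sketch of the reflection helper described above might look as follows. It is deliberately shallow: it handles only the public string properties of the object itself and does not recurse into nested objects or collections. The names NullStringFiller, FillNullStrings and SampleRecord are hypothetical.

```csharp
using System.Reflection;

// Hypothetical sample type used to exercise the helper.
public class SampleRecord
{
    public string Name { get; set; }
    public string Note { get; set; } = "kept";
    public int Count { get; set; }
}

public static class NullStringFiller
{
    // Sets every readable and writable public string property that is null to "".
    public static void FillNullStrings(object target)
    {
        if (target == null) return;
        var props = target.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance);
        foreach (PropertyInfo p in props)
        {
            if (p.PropertyType != typeof(string)) continue;
            if (!p.CanRead || !p.CanWrite) continue;
            if (p.GetIndexParameters().Length > 0) continue; // skip indexers
            if (p.GetValue(target) == null)
            {
                p.SetValue(target, string.Empty);
            }
        }
    }
}
```

Call it right before serializing, for example NullStringFiller.FillNullStrings(record), so that null strings are emitted as empty tags instead of being omitted.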
8. Manual deserialization
Some inputs are simply too unusual to be handled by a plain Deserialize call. Take this XML, for example:
    <?xml version="1.0" encoding="UTF-8"?>
    <ns0:DeliveryAddressUpdate_S10 xmlns:ns0="urn:ABC:GAIA:CN:LoadSetNoAndChineseDelAddr:ISC0186">
      <Line>
        <ASNNNB>95175154 </ASNNNB>
        <CHDANR>00476</CHDANR>
        <ASCUID>SHD3SHD3</ASCUID>
        <IGAAUC>上海</IGAAUC>
        <IGAAUC>闵行区</IGAAUC>
        <IGAAUC>七莘路8888号</IGAAUC>
        <IGAAUC>XXXX大楼XXXX室</IGAAUC>
      </Line>
      <Line>
        <ASNNNB>124321 </ASNNNB>
        <CHDANR>4321</CHDANR>
        <ASCUID>4312</ASCUID>
        <IGAAUC>上海</IGAAUC>
        <IGAAUC>浦东新区</IGAAUC>
        <IGAAUC>浦东大道9999号</IGAAUC>
        <IGAAUC>YYYY大楼YYYY室</IGAAUC>
      </Line>
    </ns0:DeliveryAddressUpdate_S10>
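First, the root node is unusual (it carries a namespace prefix), and the default deserializer does not recognize it out of the box. Second, IGAAUC is repeated several times, and the intent is that the repeated IGAAUC values are concatenated into a single address. The default deserializer clearly cannot do that, so read the document by hand. Reference code: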
    // Requires: using System.Collections.Generic; using System.IO; using System.Xml;
    // Address is assumed to be a class with string properties Asnnb, Chdanr, Ascuid and Igaauc.
    List<Address> addrList = new List<Address>();
    Address currentAddress = new Address();
    using (XmlTextReader reader = new XmlTextReader(new MemoryStream(File.ReadAllBytes("test.xml"))))
    {
        while (reader.Read())
        {
            if (reader.IsStartElement())
            {
                switch (reader.Name)
                {
                    case "Line":
                        // Each Line starts a new address record.
                        currentAddress = new Address();
                        addrList.Add(currentAddress);
                        break;
                    case "ASNNNB":
                        currentAddress.Asnnb = reader.ReadString();
                        break;
                    case "CHDANR":
                        currentAddress.Chdanr = reader.ReadString();
                        break;
                    case "ASCUID":
                        currentAddress.Ascuid = reader.ReadString();
                        break;
                    case "IGAAUC":
                        // Repeated IGAAUC values are appended, one per line.
                        currentAddress.Igaauc += reader.ReadString().Trim() + "\r\n";
                        break;
                }
            }
        }
    }
    // addrList now holds the result
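For comparison, the same document can also be read with LINQ to XML instead of XmlTextReader. This is a different technique from the one used above, shown only as a sketch. DeliveryAddress and AddressParser are hypothetical names, and unlike the reader-based loop this version joins the IGAAUC parts with "\r\n" between them rather than appending one after every part.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the Address class.
public class DeliveryAddress
{
    public string Asnnb { get; set; }
    public string Chdanr { get; set; }
    public string Ascuid { get; set; }
    public string Igaauc { get; set; }
}

public static class AddressParser
{
    public static List<DeliveryAddress> Parse(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        // Line and its children are in no namespace, so plain names match them.
        return doc.Root.Elements("Line")
            .Select(line => new DeliveryAddress
            {
                Asnnb = (string)line.Element("ASNNNB"),
                Chdanr = (string)line.Element("CHDANR"),
                Ascuid = (string)line.Element("ASCUID"),
                // Join all repeated IGAAUC parts into one multi-line address.
                Igaauc = string.Join("\r\n",
                    line.Elements("IGAAUC").Select(e => e.Value.Trim()))
            })
            .ToList();
    }
}
```

This loads the whole document into memory, which is fine for small files. For very large inputs the streaming reader is the better fit.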
9. Forcing full closing tags
By default, a tag whose content is empty is written as a self-closing tag, for example:
    <?xml version="1.0" encoding="gb2312"?>
    <HCHX_DATA>
      <TRANSMIT>
        <MESSAGE_ID />
        <MESSAGE_TYPE />
        <EMS_NO />
        <ORDER_NO />
        <FUNCTION_CODE />
        <CHK_RESULT />
        <MESSAGE_DATE>0001-01-01T00:00:00</MESSAGE_DATE>
        <SENDER_ID />
        <SEND_ADDRESS />
        <RECEIVER_ID />
        <RECEIVER_ADDRESS />
        <MESSAGE_SIGN />
        <SEND_TYPE />
      </TRANSMIT>
    </HCHX_DATA>
What we want instead is this:
    <?xml version="1.0" encoding="gb2312"?>
    <HCHX_DATA>
      <TRANSMIT>
        <MESSAGE_ID></MESSAGE_ID>
        <MESSAGE_TYPE></MESSAGE_TYPE>
        <EMS_NO></EMS_NO>
        <ORDER_NO></ORDER_NO>
        <FUNCTION_CODE></FUNCTION_CODE>
        <CHK_RESULT></CHK_RESULT>
        <MESSAGE_DATE>0001-01-01T00:00:00</MESSAGE_DATE>
        <SENDER_ID></SENDER_ID>
        <SEND_ADDRESS></SEND_ADDRESS>
        <RECEIVER_ID></RECEIVER_ID>
        <RECEIVER_ADDRESS></RECEIVER_ADDRESS>
        <MESSAGE_SIGN></MESSAGE_SIGN>
        <SEND_TYPE></SEND_TYPE>
      </TRANSMIT>
    </HCHX_DATA>
To get that, you need to customize the XmlWriter and use a new serialization method:
    public static string SerializeXml(object data)
    {
        XmlSerializer s = new XmlSerializer(data.GetType());
        using (Utf8Writer sw = new Utf8Writer())
        {
            XmlWriterSettings settings = new XmlWriterSettings();
            settings.NewLineChars = Environment.NewLine;
            settings.Indent = true;

            XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
            ns.Add("", "");

            using (XmlWriter writer = XmlWriter.Create(sw, settings))
            {
                // Wrap the real writer so that empty elements get a full end tag.
                s.Serialize(new XmlWriterForceFullEnd(writer), data, ns);
            }
            return sw.ToString();
        }
    }
The code above also uses XmlWriterSettings, which controls formatting such as indentation and line breaks. The XmlWriter is wrapped in XmlWriterForceFullEnd, whose code is as follows:
    public class XmlWriterForceFullEnd : XmlWriter
    {
        private readonly XmlWriter _baseWriter;

        public XmlWriterForceFullEnd(XmlWriter w)
        {
            _baseWriter = w;
        }

        // Force WriteEndElement to use WriteFullEndElement
        public override void WriteEndElement()
        {
            _baseWriter.WriteFullEndElement();
        }

        public override void WriteFullEndElement()
        {
            _baseWriter.WriteFullEndElement();
        }

        public override void Close()
        {
            _baseWriter.Close();
        }

        public override void Flush()
        {
            _baseWriter.Flush();
        }

        public override string LookupPrefix(string ns)
        {
            return (_baseWriter.LookupPrefix(ns));
        }

        public override void WriteBase64(byte[] buffer, int index, int count)
        {
            _baseWriter.WriteBase64(buffer, index, count);
        }

        public override void WriteCData(string text)
        {
            _baseWriter.WriteCData(text);
        }

        public override void WriteCharEntity(char ch)
        {
            _baseWriter.WriteCharEntity(ch);
        }

        public override void WriteChars(char[] buffer, int index, int count)
        {
            _baseWriter.WriteChars(buffer, index, count);
        }

        public override void WriteComment(string text)
        {
            _baseWriter.WriteComment(text);
        }

        public override void WriteDocType(string name, string pubid, string sysid, string subset)
        {
            _baseWriter.WriteDocType(name, pubid, sysid, subset);
        }

        public override void WriteEndAttribute()
        {
            _baseWriter.WriteEndAttribute();
        }

        public override void WriteEndDocument()
        {
            _baseWriter.WriteEndDocument();
        }

        public override void WriteEntityRef(string name)
        {
            _baseWriter.WriteEntityRef(name);
        }

        public override void WriteProcessingInstruction(string name, string text)
        {
            _baseWriter.WriteProcessingInstruction(name, text);
        }

        public override void WriteRaw(string data)
        {
            _baseWriter.WriteRaw(data);
        }

        public override void WriteRaw(char[] buffer, int index, int count)
        {
            _baseWriter.WriteRaw(buffer, index, count);
        }

        public override void WriteStartAttribute(string prefix, string localName, string ns)
        {
            _baseWriter.WriteStartAttribute(prefix, localName, ns);
        }

        public override void WriteStartDocument(bool standalone)
        {
            _baseWriter.WriteStartDocument(standalone);
        }

        public override void WriteStartDocument()
        {
            _baseWriter.WriteStartDocument();
        }

        public override void WriteStartElement(string prefix, string localName, string ns)
        {
            _baseWriter.WriteStartElement(prefix, localName, ns);
        }

        public override WriteState WriteState
        {
            get { return _baseWriter.WriteState; }
        }

        public override void WriteString(string text)
        {
            _baseWriter.WriteString(text);
        }

        public override void WriteSurrogateCharEntity(char lowChar, char highChar)
        {
            _baseWriter.WriteSurrogateCharEntity(lowChar, highChar);
        }

        public override void WriteWhitespace(string ws)
        {
            _baseWriter.WriteWhitespace(ws);
        }
    }
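The only part that really matters is the WriteEndElement method, which is a single line of code. However, because XmlWriter is an abstract class, all of its abstract members have to be forwarded to the wrapped writer like this.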