C# XML Serialization/Deserialization Reference

.NET ships with a capable XML serializer/deserializer (in the System.Xml.Serialization namespace) that is very convenient to use. Below is a summary of how to use it, for reference.

1. Simple serialization

public static string SerializeXml(object data) {
    using (StringWriter sw = new StringWriter()) {
        XmlSerializer xz = new XmlSerializer(data.GetType());
        xz.Serialize(sw, data);
        return sw.ToString();
    }
}

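The code above serializes to a string. If you need to return the result to a client as a stream, or write it to a file, you usually have to pick an encoding. UTF-8 is the most common, but in some special cases you may be required to use GB2312. The following example uses GB2312: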

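using System.IO;
using System.Text;
using System.Xml.Serialization;

public static MemoryStream SerializeXml(object data) {
    // On .NET Core / .NET 5+, GB2312 is only available after registering the code-page
    // encodings once at startup: Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    MemoryStream ms = new MemoryStream();
    StreamWriter sw = new StreamWriter(ms, Encoding.GetEncoding("GB2312"));
    XmlSerializer xz = new XmlSerializer(data.GetType());
    xz.Serialize(sw, data);
    sw.Flush();       // push anything still buffered in the writer into the stream
    ms.Position = 0;  // rewind so the caller reads from the beginning
    return ms;
}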

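This serializes the object straight into the MemoryStream in the chosen encoding. You might ask: why not call the string-based SerializeXml from earlier and then encode that string into a stream or byte array? That works too, but it adds an extra step and is less direct. It also has a catch: the XML declaration in that string says encoding="utf-16" (the encoding StringWriter reports), which no longer matches the bytes you produce.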

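One more thing to watch: when serializing into a stream you intend to return, don't wrap the Stream or the TextWriter in a using block. Disposing the StreamWriter also closes the underlying stream, so the caller would receive a stream that is already closed.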
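If you would rather keep a using block for the writer, StreamWriter has an overload with a leaveOpen parameter (.NET Framework 4.5 and later). The sketch below assumes that overload is available; StreamSerializer and the trimmed-down Department are illustrative names, not the original code, and UTF-8 is used so it runs without registering code-page encodings.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Minimal stand-in for the article's Department class (illustrative only).
public class Department {
    public string DeptName { get; set; } = "研发部";
}

// Hypothetical helper name; the point is the leaveOpen overload of StreamWriter.
public static class StreamSerializer {
    public static MemoryStream SerializeXml(object data, Encoding encoding) {
        MemoryStream ms = new MemoryStream();
        // leaveOpen: true means disposing the writer flushes it but does not close ms.
        using (StreamWriter sw = new StreamWriter(ms, encoding, 1024, leaveOpen: true)) {
            XmlSerializer xz = new XmlSerializer(data.GetType());
            xz.Serialize(sw, data);
        }
        ms.Position = 0; // rewind so the caller reads from the start
        return ms;
    }
}
```

The writer is cleaned up deterministically, and the caller still gets an open, rewound stream.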

2. Simple deserialization

FileStream fs = File.Open("file.xml", FileMode.Open);
using (StreamReader sr = new StreamReader(fs, Encoding.UTF8)) {
    XmlSerializer xz = new XmlSerializer(typeof(Department));
    Department dept = (Department)xz.Deserialize(sr);
    //blah blah ...
}

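Here Department is the class you want to deserialize into. Again, pay attention to the encoding: this example uses UTF-8, but the file may well use a different one.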

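Serialization and deserialization are symmetric: whatever class and encoding you used to serialize an object to XML, use the same class and encoding to deserialize that XML back into an object.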
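A minimal round-trip sketch of that symmetry, assuming a trimmed-down Department; XmlRoundTrip, ToBytes and FromBytes are hypothetical helper names. The same type and the same Encoding are passed in both directions.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

public class Department {
    public string DeptName { get; set; }
}

public static class XmlRoundTrip {
    // Serialize an object of type T into bytes using the given encoding.
    public static byte[] ToBytes<T>(T data, Encoding encoding) {
        var serializer = new XmlSerializer(typeof(T));
        using (var ms = new MemoryStream()) {
            using (var sw = new StreamWriter(ms, encoding)) {
                serializer.Serialize(sw, data);
            }
            return ms.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // Deserialize with the same type T and the same encoding.
    public static T FromBytes<T>(byte[] bytes, Encoding encoding) {
        var serializer = new XmlSerializer(typeof(T));
        using (var sr = new StreamReader(new MemoryStream(bytes), encoding)) {
            return (T)serializer.Deserialize(sr);
        }
    }
}
```

Calling FromBytes<Department> on the output of ToBytes with the same Encoding gives back an object with the same DeptName.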

3. Specifying XML element names

[XmlRoot("department")]
public class Department {
    public string DeptName { get; set; }

    [XmlElement("extra")]
    public DeptExtraInfo DeptExtraInfo { get; set; }
}

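Use the XmlRoot and XmlElement attributes. XmlRoot sets the name of the root, i.e. the outermost tag of the XML; XmlElement sets the element name used for a property (here extra instead of DeptExtraInfo).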
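To see what the two attributes produce, a small sketch. The article doesn't show DeptExtraInfo, so its Location property is an assumption, as is the NamingDemo helper.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical shape of DeptExtraInfo, for illustration only.
public class DeptExtraInfo {
    public string Location { get; set; }
}

[XmlRoot("department")]
public class Department {
    public string DeptName { get; set; }

    [XmlElement("extra")]
    public DeptExtraInfo DeptExtraInfo { get; set; }
}

public static class NamingDemo {
    public static string Serialize(Department dept) {
        var serializer = new XmlSerializer(typeof(Department));
        using (var sw = new StringWriter()) {
            serializer.Serialize(sw, dept);
            return sw.ToString();
        }
    }
}
```

Serializing a populated instance yields a root department element with an extra child, and no Department or DeptExtraInfo element names.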

4. Specifying XML attributes

[XmlRoot("department")]
public class Department {
    public string DeptName { get; set; } = "研发部";

    [XmlAttribute("timestamp")]
    public int Timestamp = 10;
}

With the XmlAttribute attribute, Timestamp becomes the timestamp attribute of the department root element instead of a child element.
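A quick sketch to check the result, reusing the class above; AttributeDemo is a hypothetical helper name.

```csharp
using System.IO;
using System.Xml.Serialization;

[XmlRoot("department")]
public class Department {
    public string DeptName { get; set; } = "研发部";

    [XmlAttribute("timestamp")]
    public int Timestamp = 10;
}

public static class AttributeDemo {
    public static string Serialize(Department dept) {
        var serializer = new XmlSerializer(typeof(Department));
        using (var sw = new StringWriter()) {
            serializer.Serialize(sw, dept);
            return sw.ToString();
        }
    }
}
```

The output contains timestamp="10" on the department start tag, and no Timestamp child element.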

5. Removing unneeded namespaces

By default, the root element carries namespace declarations like these:

<department xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<!-- blah blah blah -->
</department>

If you don't need them, change the serialization method like this:

public static string SerializeXml(object data) {
    using (Utf8Writer sw = new Utf8Writer()) {
        XmlSerializer xz = new XmlSerializer(data.GetType());
        XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
        ns.Add("", "");
        xz.Serialize(sw, data, ns);
        return sw.ToString();
    }
}

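The Utf8Writer used above makes the output declare UTF-8, including the encoding attribute in the XML declaration (a plain StringWriter reports UTF-16). It only changes what is declared; the result is still an ordinary .NET string. Its code is: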

    public sealed class Utf8Writer : StringWriter {
        public override Encoding Encoding => Encoding.UTF8;
    }
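A quick way to check the effect, as a sketch (EncodingDeclarationDemo and the one-property Department are hypothetical): serialize the same object through a plain StringWriter and through Utf8Writer, then compare the declarations.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

public sealed class Utf8Writer : StringWriter {
    public override Encoding Encoding => Encoding.UTF8;
}

public class Department {
    public string DeptName { get; set; } = "研发部";
}

public static class EncodingDeclarationDemo {
    // The serializer reads writer.Encoding when it writes the <?xml ... encoding="..."?> line.
    public static string Serialize(object data, StringWriter writer) {
        using (writer) {
            new XmlSerializer(data.GetType()).Serialize(writer, data);
            return writer.ToString();
        }
    }
}
```

With a plain StringWriter the declaration reads encoding="utf-16"; with Utf8Writer it reads encoding="utf-8".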

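By the way, UTF-8 is what you'll normally use, but in special cases (for example when integrating with legacy systems) you may need GBK, which for practical purposes is interchangeable with GB2312 here (in .NET the name "GB2312" maps to code page 936, i.e. GBK). In that case, change Encoding to: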

public override Encoding Encoding => Encoding.GetEncoding("GB2312");


6. Don't wrap collections in an extra element

What does that mean? Take this class:

    [XmlRoot("department")]
    public class Department {
        public string DeptName { get; set; } = "研发部";

        public List<Employee> Employees { get; set; } = new List<Employee>();
    }

It serializes to:

<?xml version="1.0" encoding="utf-8"?>
<department>
  <DeptName>研发部</DeptName>
  <Employees>
    <Employee>
      <EmpName>张三</EmpName>
      <EmpSalary>10000</EmpSalary>
    </Employee>
    <Employee>
      <EmpName>李四</EmpName>
      <EmpSalary>8000</EmpSalary>
    </Employee>
  </Employees>
</department>

Note the extra Employees element wrapped around the Employee elements. That may not be what you want; what you may want instead is this:

<?xml version="1.0" encoding="utf-8"?>
<department>
  <DeptName>研发部</DeptName>
  <Employee>
    <EmpName>张三</EmpName>
    <EmpSalary>10000</EmpSalary>
  </Employee>
  <Employee>
    <EmpName>李四</EmpName>
    <EmpSalary>8000</EmpSalary>
  </Employee>
</department>

How? Simply put an XmlElement attribute on the Employees property:

    [XmlRoot("department")]
    public class Department {
        public string DeptName { get; set; } = "研发部";

        [XmlElement("Employee")]
        public List<Employee> Employees { get; set; } = new List<Employee>();
    }

If you only want to rename the Employees wrapper element, use [XmlArray("NewName")] instead.
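Both variants side by side, as a sketch; FlatDepartment, RenamedDepartment and CollectionDemo are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Employee {
    public string EmpName { get; set; }
    public int EmpSalary { get; set; }
}

[XmlRoot("department")]
public class FlatDepartment {
    [XmlElement("Employee")]   // one <Employee> per item, no wrapper element
    public List<Employee> Employees { get; set; } = new List<Employee>();
}

[XmlRoot("department")]
public class RenamedDepartment {
    [XmlArray("NewName")]      // keeps the wrapper, but renames it
    public List<Employee> Employees { get; set; } = new List<Employee>();
}

public static class CollectionDemo {
    public static string Serialize<T>(T data) {
        var serializer = new XmlSerializer(typeof(T));
        using (var sw = new StringWriter()) {
            serializer.Serialize(sw, data);
            return sw.ToString();
        }
    }
}
```

FlatDepartment produces Employee elements directly under department; RenamedDepartment produces a NewName element that contains the Employee elements.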

7. Serializing properties with null values

By default, properties whose value is null are not serialized at all. Why?

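Because output like <DeptName /> wouldn't tell you whether DeptName was null or an empty string. A reasonable workaround is to replace null strings with empty strings before serializing. You could write a helper that uses reflection to walk all string properties of an object and set the null ones to the empty string. In practice it has to be more thorough: the object may contain nested objects and enumerable collections, so you will probably need recursion. For space reasons I won't post the code here.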
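Since the article leaves the helper out, here is one possible sketch, not the author's code. Assumptions: only public instance properties with a public setter and public non-readonly fields are touched; nested objects and collection elements are visited recursively; a reference-based visited set guards against cycles; null strings stored directly in a collection (e.g. a List<string>) are left alone, since they can't be replaced in place through IEnumerable. NullStringFiller and the sample classes are hypothetical.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

public class Employee { public string EmpName { get; set; } }

public class Department {
    public string DeptName { get; set; }
    public Employee Manager { get; set; }
    public List<Employee> Employees { get; set; } = new List<Employee>();
}

public static class NullStringFiller {
    // Tracks visited objects by reference identity, even if a type overrides Equals.
    private sealed class ReferenceComparer : IEqualityComparer<object> {
        bool IEqualityComparer<object>.Equals(object x, object y) => ReferenceEquals(x, y);
        int IEqualityComparer<object>.GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
    }

    public static void FillNullStrings(object root) {
        Fill(root, new HashSet<object>(new ReferenceComparer()));
    }

    private static void Fill(object obj, HashSet<object> visited) {
        // Nothing to do for null, strings, boxed value types, or objects already seen (cycles).
        if (obj == null || obj is string || obj.GetType().IsValueType || !visited.Add(obj)) return;

        // Collections: visit each element, but not the collection's own properties.
        if (obj is IEnumerable items) {
            foreach (object item in items) Fill(item, visited);
            return;
        }

        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;
        foreach (PropertyInfo prop in obj.GetType().GetProperties(flags)) {
            // Skip indexers and properties without a public getter.
            if (prop.GetIndexParameters().Length > 0 || prop.GetGetMethod() == null) continue;
            object value = prop.GetValue(obj);
            if (prop.PropertyType == typeof(string)) {
                if (value == null && prop.GetSetMethod() != null) prop.SetValue(obj, string.Empty);
            } else {
                Fill(value, visited);
            }
        }
        foreach (FieldInfo field in obj.GetType().GetFields(flags)) {
            object value = field.GetValue(obj);
            if (field.FieldType == typeof(string)) {
                if (value == null && !field.IsInitOnly) field.SetValue(obj, string.Empty);
            } else {
                Fill(value, visited);
            }
        }
    }
}
```

Call NullStringFiller.FillNullStrings(dept) right before serializing; the null strings then come out as empty elements instead of being dropped.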

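There is also a more idiomatic approach that doesn't change the object's values: put [XmlElement(IsNullable = true)] on the property. The downside is that the generated element gets an extra xsi:nil="true" attribute.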
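A sketch of what IsNullable = true produces. The Manager property is a hypothetical addition, left without the attribute for comparison, and NullableDemo is a hypothetical helper.

```csharp
using System.IO;
using System.Xml.Serialization;

[XmlRoot("department")]
public class Department {
    [XmlElement(IsNullable = true)]
    public string DeptName { get; set; }   // written as an element with xsi:nil="true" when null

    public string Manager { get; set; }    // no IsNullable: omitted entirely when null
}

public static class NullableDemo {
    public static string Serialize(Department dept) {
        var serializer = new XmlSerializer(typeof(Department));
        using (var sw = new StringWriter()) {
            serializer.Serialize(sw, dept);
            return sw.ToString();
        }
    }
}
```

With both properties left null, DeptName still appears (carrying xsi:nil="true"), while Manager does not appear at all.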

8. Manual deserialization

Some cases are simply too unusual to handle with a plain Deserialize call, for example this XML:

<?xml version="1.0" encoding="UTF-8"?>
<ns0:DeliveryAddressUpdate_S10 xmlns:ns0="urn:ABC:GAIA:CN:LoadSetNoAndChineseDelAddr:ISC0186">
  <Line>
    <ASNNNB>95175154 </ASNNNB>
    <CHDANR>00476</CHDANR>
    <ASCUID>SHD3SHD3</ASCUID>
    <IGAAUC>上海</IGAAUC>
    <IGAAUC>闵行区</IGAAUC>
    <IGAAUC>七莘路8888号</IGAAUC>
    <IGAAUC>XXXX大楼XXXX室</IGAAUC>
  </Line>
  <Line>
    <ASNNNB>124321 </ASNNNB>
    <CHDANR>4321</CHDANR>
    <ASCUID>4312</ASCUID>
    <IGAAUC>上海</IGAAUC>
    <IGAAUC>浦东新区</IGAAUC>
    <IGAAUC>浦东大道9999号</IGAAUC>
    <IGAAUC>YYYY大楼YYYY室</IGAAUC>
  </Line>
</ns0:DeliveryAddressUpdate_S10>

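First, the root element is unusual (it carries a prefixed namespace), and the deserializer doesn't accept it out of the box. Second, IGAAUC repeats several times: the intent is that the repeated IGAAUC values are joined together into a single address, which default deserialization clearly can't do. So read it by hand. Reference code: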

// Address is your own class with string properties Asnnb, Chdanr, Ascuid and Igaauc.
List<Address> addrList = new List<Address>();
Address currentAddress = new Address();
using (XmlTextReader reader = new XmlTextReader(new MemoryStream(File.ReadAllBytes("test.xml")))) {
    while (reader.Read()) {
        if (reader.IsStartElement()) {
            switch (reader.Name) {
                case "Line":
                    // each <Line> starts a new address record
                    currentAddress = new Address();
                    addrList.Add(currentAddress);
                    break;
                case "ASNNNB":
                    currentAddress.Asnnb = reader.ReadString();
                    break;
                case "CHDANR":
                    currentAddress.Chdanr = reader.ReadString();
                    break;
                case "ASCUID":
                    currentAddress.Ascuid = reader.ReadString();
                    break;
                case "IGAAUC":
                    // repeated IGAAUC elements are appended, one line each
                    currentAddress.Igaauc += reader.ReadString().Trim() + "\r\n";
                    break;
            }
        }
    }
}
// addrList now holds the result
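For comparison, the same extraction can be written with LINQ to XML (System.Xml.Linq). This is a different technique from the streaming reader above: it loads the whole document into memory, which is fine for small files like this one. Address and AddressParser are hypothetical, since the article doesn't show the Address class, and unlike the loop above this version joins the IGAAUC lines without a trailing line break.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical Address type; the original article does not show its definition.
public class Address {
    public string Asnnb { get; set; }
    public string Chdanr { get; set; }
    public string Ascuid { get; set; }
    public string Igaauc { get; set; }
}

public static class AddressParser {
    public static List<Address> Parse(string xml) {
        XDocument doc = XDocument.Parse(xml);
        // Only the root uses the ns0 prefix; the <Line> children have no namespace,
        // so plain names match them.
        return doc.Root.Elements("Line")
            .Select(line => new Address {
                Asnnb = (string)line.Element("ASNNNB"),
                Chdanr = (string)line.Element("CHDANR"),
                Ascuid = (string)line.Element("ASCUID"),
                // Join every repeated IGAAUC into one multi-line address.
                Igaauc = string.Join("\r\n", line.Elements("IGAAUC").Select(e => e.Value.Trim()))
            })
            .ToList();
    }
}
```

To read from a file instead of a string, XDocument.Load("test.xml") can replace XDocument.Parse.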

9. Forcing full closing tags

By default, an element with no content is written as a self-closing tag, for example:

<?xml version="1.0" encoding="gb2312"?>
<HCHX_DATA>
  <TRANSMIT>
    <MESSAGE_ID />
    <MESSAGE_TYPE />
    <EMS_NO />
    <ORDER_NO />
    <FUNCTION_CODE />
    <CHK_RESULT />
    <MESSAGE_DATE>0001-01-01T00:00:00</MESSAGE_DATE>
    <SENDER_ID />
    <SEND_ADDRESS />
    <RECEIVER_ID />
    <RECEIVER_ADDRESS />
    <MESSAGE_SIGN />
    <SEND_TYPE />
  </TRANSMIT>
</HCHX_DATA>

What we want instead is this:

<?xml version="1.0" encoding="gb2312"?>
<HCHX_DATA>
  <TRANSMIT>
    <MESSAGE_ID></MESSAGE_ID>
    <MESSAGE_TYPE></MESSAGE_TYPE>
    <EMS_NO></EMS_NO>
    <ORDER_NO></ORDER_NO>
    <FUNCTION_CODE></FUNCTION_CODE>
    <CHK_RESULT></CHK_RESULT>
    <MESSAGE_DATE>0001-01-01T00:00:00</MESSAGE_DATE>
    <SENDER_ID></SENDER_ID>
    <SEND_ADDRESS></SEND_ADDRESS>
    <RECEIVER_ID></RECEIVER_ID>
    <RECEIVER_ADDRESS></RECEIVER_ADDRESS>
    <MESSAGE_SIGN></MESSAGE_SIGN>
    <SEND_TYPE></SEND_TYPE>
  </TRANSMIT>
</HCHX_DATA>

For that you need to customize the XmlWriter and use a new serialization method:

        public static string SerializeXml(object data) {
            XmlSerializer s = new XmlSerializer(data.GetType());
            using (Utf8Writer sw = new Utf8Writer()) {
                XmlWriterSettings settings = new XmlWriterSettings();
                settings.NewLineChars = Environment.NewLine;
                settings.Indent = true;
                XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
                ns.Add("", "");
                using (XmlWriter writer = XmlWriter.Create(sw, settings)) {
                    s.Serialize(new XmlWriterForceFullEnd(writer), data, ns);
                }
                return sw.ToString();
            }
        }

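The code above also uses XmlWriterSettings, which controls formatting such as indentation and line breaks. The XmlWriter is then wrapped in an XmlWriterForceFullEnd. (With Utf8Writer the declaration reads encoding="utf-8"; to get the gb2312 declaration shown in the sample output, use the GB2312 variant of the writer from section 5.) XmlWriterForceFullEnd looks like this: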

        public class XmlWriterForceFullEnd : XmlWriter {
            private readonly XmlWriter _baseWriter;

            public XmlWriterForceFullEnd(XmlWriter w) {
                _baseWriter = w;
            }

            //Force WriteEndElement to use WriteFullEndElement
            public override void WriteEndElement() { _baseWriter.WriteFullEndElement(); }

            public override void WriteFullEndElement() {
                _baseWriter.WriteFullEndElement();
            }

            public override void Close() {
                _baseWriter.Close();
            }

            public override void Flush() {
                _baseWriter.Flush();
            }

            public override string LookupPrefix(string ns) {
                return (_baseWriter.LookupPrefix(ns));
            }

            public override void WriteBase64(byte[] buffer, int index, int count) {
                _baseWriter.WriteBase64(buffer, index, count);
            }

            public override void WriteCData(string text) {
                _baseWriter.WriteCData(text);
            }

            public override void WriteCharEntity(char ch) {
                _baseWriter.WriteCharEntity(ch);
            }

            public override void WriteChars(char[] buffer, int index, int count) {
                _baseWriter.WriteChars(buffer, index, count);
            }

            public override void WriteComment(string text) {
                _baseWriter.WriteComment(text);
            }

            public override void WriteDocType(string name, string pubid, string sysid, string subset) {
                _baseWriter.WriteDocType(name, pubid, sysid, subset);
            }

            public override void WriteEndAttribute() {
                _baseWriter.WriteEndAttribute();
            }

            public override void WriteEndDocument() {
                _baseWriter.WriteEndDocument();
            }

            public override void WriteEntityRef(string name) {
                _baseWriter.WriteEntityRef(name);
            }

            public override void WriteProcessingInstruction(string name, string text) {
                _baseWriter.WriteProcessingInstruction(name, text);
            }

            public override void WriteRaw(string data) {
                _baseWriter.WriteRaw(data);
            }

            public override void WriteRaw(char[] buffer, int index, int count) {
                _baseWriter.WriteRaw(buffer, index, count);
            }

            public override void WriteStartAttribute(string prefix, string localName, string ns) {
                _baseWriter.WriteStartAttribute(prefix, localName, ns);
            }

            public override void WriteStartDocument(bool standalone) {
                _baseWriter.WriteStartDocument(standalone);
            }

            public override void WriteStartDocument() {
                _baseWriter.WriteStartDocument();
            }

            public override void WriteStartElement(string prefix, string localName, string ns) {
                _baseWriter.WriteStartElement(prefix, localName, ns);
            }

            public override WriteState WriteState {
                get { return _baseWriter.WriteState; }
            }

            public override void WriteString(string text) {
                _baseWriter.WriteString(text);
            }

            public override void WriteSurrogateCharEntity(char lowChar, char highChar) {
                _baseWriter.WriteSurrogateCharEntity(lowChar, highChar);
            }

            public override void WriteWhitespace(string ws) {
                _baseWriter.WriteWhitespace(ws);
            }
        }

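The key part is the one-line WriteEndElement override. But because XmlWriter is an abstract class, all of its abstract members (plus Close) have to be forwarded to the wrapped writer like this.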
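If you don't need XmlWriter.Create and its XmlWriterSettings, a shorter alternative (not the approach above) is to subclass the concrete XmlTextWriter, which already implements every abstract member, and override only WriteEndElement. Note that Microsoft recommends XmlWriter.Create over XmlTextWriter for new code. This sketch leaves the output unindented; if you need indentation, the wrapper above is the safer route. Transmit, FullEndXmlTextWriter and FullEndDemo are hypothetical names.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical message type modeled on the TRANSMIT sample above.
[XmlRoot("TRANSMIT")]
public class Transmit {
    public string MESSAGE_ID { get; set; } = "";
    public string MESSAGE_TYPE { get; set; } = "";
}

// XmlTextWriter is concrete, so only the one method needs overriding.
public class FullEndXmlTextWriter : XmlTextWriter {
    public FullEndXmlTextWriter(TextWriter w) : base(w) { }

    // Always write <X></X> instead of <X />.
    public override void WriteEndElement() => base.WriteFullEndElement();
}

public static class FullEndDemo {
    public static string Serialize(object data) {
        var ns = new XmlSerializerNamespaces();
        ns.Add("", "");
        using (var sw = new StringWriter()) {
            using (var writer = new FullEndXmlTextWriter(sw)) {
                new XmlSerializer(data.GetType()).Serialize(writer, data, ns);
            }
            return sw.ToString(); // StringWriter.ToString still works after it is closed
        }
    }
}
```

Empty string properties then come out as <MESSAGE_ID></MESSAGE_ID> rather than <MESSAGE_ID />.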


posted @ 2017-09-07 13:43  guogangj