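Code notes: automatically reorganizing multilingual resource files with LINQ to XML and CodeDom
There is nothing deep in this post. It is purely a record of code I wrote, so experts passing by can safely skip it~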
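Background: the project's multilingual messages are set up like this:
- An XML file, Messages.xml, stores the language resources
- A C# MessageKeys class holds every message key together with its Message ID in the XML resource file
- Code refers to messages through these MessageKey fields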
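Now the project has a refactoring effort. A higher-level set of Common Messages has been designed so that the messages in each sub-project do not become too sprawling to manage. I had the "bad luck" of being assigned this rather nasty job: go through the nearly 2000 messages of the project I was on at the time, one by one, convert those that could become Common Messages, reorganize every Message ID and Message Code, and have the BA re-review and rewrite all the descriptive text. That is painful manual work. Just thinking about it was unbearable (my eyes would give out), so I came up with a plan: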
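- Export Messages.xml, together with the matching MessageKey names, to Excel so that everyone can analyze it in a spreadsheet
- After the group discusses and trims the list, arrive at a finalized Excel sheet
- Convert the Excel sheet back into Messages.xml and into the C# MessageKeys class (the MessageKey names were exported to Excel earlier precisely so that the class could be generated from it)
- The programmers start refactoring the code while the BA keeps polishing the descriptions, and I simply rerun the tool to regenerate the XML
- Any future change can be made directly in the XML and the class regenerated automatically, which is very convenient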
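So a few small techniques are needed here:
- Build some DTOs (Data Transfer Objects) for the XML
- Export the XML to Excel, using System.Xml.Linq
- Export Excel back to XML, using System.Xml.Serialization, with Excel as a data source (directly through a connection string, no Interop)
- Export Excel to a C# source file, using System.CodeDom
Enough talk, here is the code. This was my first time generating C# with CodeDom, and it was quite fun.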
The Messages.xml file
    <Messages xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <MessagesByType typeCode="1" typeName="Validation Error">
        <Message id="110">
          <EventId>50</EventId>
          <Code>110</Code>
          <ShortText>Invalid value </ShortText>
          <Description>{0} can't be less than {1}.</Description>
          <Variables>
            <Variable formatIndex="0">Field name</Variable>
            <Variable formatIndex="1">Field name</Variable>
          </Variables>
        </Message>
      </MessagesByType>
      <MessagesByType typeCode="2" typeName="Application Error">
        <Message id="410">
          <EventId>50</EventId>
          <Code>410</Code>
          <ShortText>Invalid value </ShortText>
          <Description>{0} can't be less than {1}.</Description>
          <Variables>
            <Variable formatIndex="0">Field name</Variable>
            <Variable formatIndex="1">Field name</Variable>
          </Variables>
        </Message>
      </MessagesByType>
    </Messages>
Analyzing the XML above leads to the following data models. Create a new class file named DataObjXML:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Xml.Serialization;

    namespace RefactorMessages
    {
        [Serializable]
        public class Messages
        {
            // Without [XmlElement("MessagesByType")], an extra <MessagesByTypes>
            // parent node would be generated around the <MessagesByType> elements.
            [XmlElement("MessagesByType")]
            public List<MessagesByType> MessagesByTypes { get; set; }
        }

        [Serializable]
        public class MessagesByType
        {
            [XmlElement("Message")]
            public List<Message> Messages { get; set; }

            [XmlAttribute]
            public string typeCode { get; set; }

            [XmlAttribute]
            public string typeName { get; set; }
        }

        [Serializable]
        public class Variable
        {
            [XmlAttribute]
            public string formatIndex { get; set; }

            [XmlText]
            public string text { get; set; }
        }

        [Serializable]
        public class Message
        {
            // Spreadsheet-only columns: carried on the object but never written to the XML.
            [XmlIgnore]
            [XmlAttribute]
            public string category { get; set; }

            [XmlAttribute]
            public string id { get; set; }

            [XmlIgnore]
            [XmlAttribute]
            public string old_id { get; set; }

            [XmlIgnore]
            public string TypeName { get; set; }

            [XmlIgnore]
            public string TypeCode { get; set; }

            [XmlIgnore]
            public string ClassName { get; set; }

            // Child elements of <Message>.
            public string EventId { get; set; }
            public string Code { get; set; }
            public string ShortText { get; set; }
            public string Description { get; set; }
            public List<Variable> Variables { get; set; }
        }
    }
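A few things to watch when building these XML data entities:
- Before I added [XmlElement("MessagesByType")] to the collection property MessagesByTypes, the serialized XML came out as <MessagesByTypes><MessagesByType>…</MessagesByType></MessagesByTypes>, which was not what I wanted. So I tried this approach, and somewhat to my surprise it worked: the output is now <Messages ..><MessagesByType>..</MessagesByType></Messages> directly, without the extra parent node that stands for the List.
- If a property should not appear in the serialized XML, mark it with [XmlIgnore].
- If a property should be serialized as an XML attribute rather than an element, mark it with [XmlAttribute].
- All the classes are also marked [Serializable]. Strictly speaking XmlSerializer does not need it (it only requires public types with a parameterless constructor); the attribute matters for BinaryFormatter and is harmless here.
These are mostly small basics, but I am recording them here.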
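The effect of [XmlElement] on a list property can be seen in isolation with a small sketch. The Item, Wrapped and Flat types below are hypothetical stand-ins, not part of the tool; they only differ in whether the list carries the attribute.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Item
{
    [XmlAttribute]
    public string id { get; set; }
}

// Default mapping: the list gets its own wrapper element named after the property.
public class Wrapped
{
    public List<Item> Items { get; set; }
}

// [XmlElement] on the list: each entry is written directly under the parent element.
public class Flat
{
    [XmlElement("Item")]
    public List<Item> Items { get; set; }
}

public static class XmlElementDemo
{
    // Serializes any object to an XML string with XmlSerializer.
    public static string ToXml(object value)
    {
        var serializer = new XmlSerializer(value.GetType());
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        var items = new List<Item> { new Item { id = "110" } };
        Console.WriteLine(ToXml(new Wrapped { Items = items })); // <Item> nested inside an <Items> wrapper
        Console.WriteLine(ToXml(new Flat { Items = items }));    // <Item> directly under <Flat>
    }
}
```

Note that none of these sample types is marked [Serializable], and XmlSerializer still handles them.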
With these basic data entities in place, the real work can start.
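First, I needed to export the XML into a form that everyone could sit down with and analyze easily. This uses System.Xml.Linq. Because a Message's typeCode can take 5 possible values, I wanted to sort by typeCode first and then by message id before exporting. The full code is below. Note that it is best to null-check every value. (KeyNameInClass is what I use later when exporting the C# class.)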
    // Requires: using System.Data; using System.IO; using System.Linq;
    //           using System.Text; using System.Xml.Linq;
    // includeDetect is not used in this excerpt.
    private DataTable GenerateDataTableFromXml(bool includeDetect = true)
    {
        DataTable dt = null;
        Encoding ec = new UTF8Encoding(true);
        using (StreamReader sr = new StreamReader(_xmlPath, ec))
        {
            XDocument document = XDocument.Load(sr);
            var query = from msg in document.Descendants(@"MessagesByType").Descendants(@"Message")
                        // sort by typeCode first, then by message id (both assumed to be numeric)
                        orderby (int)msg.Parent.Attribute("typeCode"), (int)msg.Attribute("id")
                        select new
                        {
                            TypeCode = msg.Parent.Attribute("typeCode").Value,
                            TypeName = msg.Parent.Attribute("typeName").Value,
                            MessageId = msg.Attribute("id").Value,
                            // element names are case-sensitive: the file uses <EventId>, not <EventID>
                            EventID = msg.Element("EventId") == null ? string.Empty : msg.Element("EventId").Value,
                            Code = msg.Element("Code") == null ? string.Empty : msg.Element("Code").Value,
                            ShortText = msg.Element("ShortText") == null ? string.Empty : msg.Element("ShortText").Value,
                            Description = msg.Element("Description") == null ? string.Empty : msg.Element("Description").Value,
                            // <Variable> elements are children of <Variables>, not of <Message>
                            Variables = GetMessageVariables(msg.Element("Variables"), msg.Elements("Variables").Elements("Variable").ToList()),
                            // the xml key's field name in MessageKey class
                            KeyNameInClass = string.Join("\r\n", _allXMLKeys.Where(o => o.MessageId == msg.Attribute("id").Value).Select(o => o.ClassName).ToList())
                        };
            var list = query.ToList();

            dt = new DataTable();
            dt.Columns.Add("Type");
            dt.Columns.Add("MessageId");
            dt.Columns.Add("EventID");
            dt.Columns.Add("Code");
            dt.Columns.Add("ShortText");
            dt.Columns.Add("Description");
            dt.Columns.Add("Variables");
            dt.Columns.Add("KeyNameInClass");

            foreach (var o in list)
            {
                DataRow dr = dt.NewRow();
                dr["Type"] = o.TypeName;
                dr["MessageId"] = o.MessageId;
                dr["EventID"] = o.EventID;
                dr["Code"] = o.Code;
                dr["ShortText"] = o.ShortText;
                dr["Description"] = o.Description;
                dr["Variables"] = o.Variables;
                dr["KeyNameInClass"] = o.KeyNameInClass;
                dt.Rows.Add(dr);
            }
        }
        return dt;
    }
The method above builds the DataTable to be exported, which is then written out as CSV.
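As a side note, LINQ to XML offers a shorter way to do the per-value null checks: casting an XElement or XAttribute to string yields null when the node is missing, so `?? string.Empty` can replace the `== null ? … : …` pattern. The sketch below is a minimal, self-contained illustration of that idiom plus the typeCode-then-id ordering. The sample XML and the Flatten helper are made up for the example, and the comma-joined output does no CSV escaping.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class LinqToXmlDemo
{
    // Made-up sample data: types are deliberately out of order and some elements are missing.
    public const string Sample = @"<Messages>
  <MessagesByType typeCode=""2"" typeName=""Application Error"">
    <Message id=""410""><EventId>50</EventId><ShortText>Invalid value</ShortText></Message>
  </MessagesByType>
  <MessagesByType typeCode=""1"" typeName=""Validation Error"">
    <Message id=""111""><ShortText>Too large</ShortText></Message>
    <Message id=""110""><EventId>50</EventId></Message>
  </MessagesByType>
</Messages>";

    // Returns one comma-joined line per message, sorted by typeCode and then id.
    public static string[] Flatten(string xml)
    {
        var rows = from msg in XDocument.Parse(xml).Descendants("Message")
                   orderby (int)msg.Parent.Attribute("typeCode"), (int)msg.Attribute("id")
                   select string.Join(",",
                       (string)msg.Parent.Attribute("typeName"),
                       (string)msg.Attribute("id"),
                       (string)msg.Element("EventId") ?? string.Empty,     // null when <EventId> is absent
                       (string)msg.Element("ShortText") ?? string.Empty);  // null when <ShortText> is absent
        return rows.ToArray();
    }

    public static void Main()
    {
        foreach (var line in Flatten(Sample))
        {
            Console.WriteLine(line);
        }
    }
}
```

The explicit (int) conversions throw if an attribute is missing or not numeric, which is acceptable for a one-off tool but worth knowing about.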
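Next, after a heated and soul-crushing discussion that ran through nearly 2000 rows of analysis and trimming, we ended up with a brand-new spreadsheet, and saved what everyone finally agreed on as a sheet named, let's say, Finalized. I now needed to export this sheet back into XML in exactly the same format as the original Messages.xml. This uses plain ADO.NET with the connection string "Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;" to read the Excel data and build the Messages object defined earlier, which is then serialized to XML.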
    // Requires: using System.Collections.Generic; using System.Data;
    //           using System.Data.OleDb; using System.Linq;
    // (AsEnumerable and Field<T> come from System.Data.DataSetExtensions.)
    private Messages PrepareObjectFromExcel(string excelFilePath, string sheetName)
    {
        var fileName = excelFilePath;
        var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", fileName);
        var adapter = new OleDbDataAdapter("SELECT * FROM [" + sheetName + "$]", connectionString);
        var ds = new DataSet();
        adapter.Fill(ds, sheetName);
        var data = ds.Tables[sheetName].AsEnumerable();

        // One Message per spreadsheet row, grouped by message type.
        var query = data.Select(x => new Message
        {
            TypeName = x.Field<string>("Type Name"),
            TypeCode = x.Field<string>("Type Code"),
            ClassName = x.Field<string>("Key Name In Class"),
            old_id = x.Field<string>("OldMessageId"),
            category = x.Field<string>("Category"),
            id = x.Field<string>("MessageId"),
            EventId = x.Field<string>("EventID"),
            Code = GetCodeFromMessageId(x.Field<string>("MessageId")),
            ShortText = x.Field<string>("ShortText"),
            Description = x.Field<string>("Description"),
            Variables = ReadVariables(x.Field<string>("Variables")),
        }).ToList<Message>().GroupBy(o => new { o.TypeCode, o.TypeName });

        Messages msg = new Messages();
        var msgsByTypes = new List<MessagesByType>();
        foreach (var type in query)
        {
            MessagesByType msgBody = new MessagesByType
            {
                typeCode = type.Key.TypeCode,
                typeName = type.Key.TypeName,
                Messages = type.ToList()
            };
            msgsByTypes.Add(msgBody);
        }
        msg.MessagesByTypes = msgsByTypes;
        return msg; // the original listing was missing this return
    }
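Then a few lines of code serialize it and save it as an XML file (the complete Serializer.cs is below, or see the source download). By the way, if you want a smaller XML file and do not care about formatting, settings.Indent = true can be set to false.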
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;
    using System.IO.Compression;
    using System.Xml.Serialization;

    namespace RefactorMessages
    {
        /// <summary>
        /// from TFS\FOTFP\R2\Common\CBP\HP.FOT.CBP.Shared\Helpers\Serializer.cs
        /// </summary>
        public class Serializer
        {
            const int STRING_MAX_LENGTH = 16 * 1024 + 256; // the length of any string object < 16K

            #region for binaryformatter serialize
            public byte[] SerializeObject(object serializableObject)
            {
                MemoryStream stream = new MemoryStream();
                BinaryFormatter b = new BinaryFormatter();
                b.Serialize(stream, serializableObject);
                // ToArray copies only the bytes written; GetBuffer would also return unused capacity
                byte[] bytes = stream.ToArray();
                stream.Close();
                return bytes;
            }

            /// <summary>
            ///
            /// </summary>
            /// <typeparam name="T">the type of the object need to be serialized</typeparam>
            /// <param name="serializableObject"></param>
            /// <returns></returns>
            public static string SerializeObjectToString<T>(T serializableObject)
            {
                Serializer serializer = new Serializer();
                byte[] bytes = serializer.SerializeObject(serializableObject);
                // SEP this was causing problems with deserialization so I commented it out
                //bytes = Compress(bytes);
                return ToString(bytes);
            }
            #endregion

            #region for binaryformatter deserialize
            public object DeSerializeObject(byte[] someBytes)
            {
                MemoryStream stream = new MemoryStream(someBytes);
                BinaryFormatter b = new BinaryFormatter();
                object retObj = b.Deserialize(stream);
                stream.Close();
                return retObj;
            }

            /// <summary>
            ///
            /// </summary>
            /// <typeparam name="T">the type of the object got by deserialized</typeparam>
            /// <param name="serializedObjectBytesStr"></param>
            /// <returns></returns>
            public static T DeSerializeObject<T>(string serializedObjectBytesStr)
            {
                T retObj;
                try
                {
                    byte[] bytes = ToBytes(serializedObjectBytesStr);
                    // SEP this was causing problems with deserialization so I commented it out
                    //bytes = Decompress(bytes);
                    Serializer serializer = new Serializer();
                    retObj = (T)serializer.DeSerializeObject(bytes);
                }
                catch
                {
                    retObj = default(T);
                }
                return retObj;
            }
            #endregion

            public static string SerializeObject2XML(object serializableObject)
            {
                if (serializableObject == null) return null;
                return SerializeObject2XML(serializableObject, serializableObject.GetType());
            }

            public static string SerializeObject2XML(object serializableObject, Type type)
            {
                if (serializableObject == null) return null;
                if (type == null) type = serializableObject.GetType();

                System.Xml.XmlWriterSettings settings = new System.Xml.XmlWriterSettings();
                settings.OmitXmlDeclaration = true;
                settings.Indent = true; // set to false for a smaller, unformatted file
                System.Text.StringBuilder builder = new System.Text.StringBuilder();
                // disposing the writer flushes everything into the builder
                using (System.Xml.XmlWriter xw = System.Xml.XmlWriter.Create(builder, settings))
                {
                    System.Xml.Serialization.XmlSerializer x = new System.Xml.Serialization.XmlSerializer(serializableObject.GetType());
                    x.Serialize(xw, serializableObject);
                }
                return builder.ToString();
            }

            /// <summary>
            /// Deserializes workflow markup into an T object
            /// </summary>
            /// <param name="xml">string workflow markup to deserialize</param>
            /// <param name="obj">Output T object</param>
            /// <param name="exception">output Exception value if deserialize failed</param>
            /// <returns>true if this XmlSerializer can deserialize the object; otherwise, false</returns>
            public static bool Deserialize<T>(string xml, out T obj, out System.Exception exception)
            {
                exception = null;
                obj = default(T);
                try
                {
                    obj = Deserialize<T>(xml);
                    return true;
                }
                catch (System.Exception ex)
                {
                    exception = ex;
                    return false;
                }
            }

            public static bool Deserialize<T>(string xml, out T obj)
            {
                System.Exception exception = null;
                return Deserialize(xml, out obj, out exception);
            }

            public static T Deserialize<T>(string xml)
            {
                System.IO.StringReader stringReader = null;
                try
                {
                    XmlSerializer serializer = new XmlSerializer(typeof(T));
                    stringReader = new System.IO.StringReader(xml);
                    return ((T)(serializer.Deserialize(System.Xml.XmlReader.Create(stringReader))));
                }
                finally
                {
                    if ((stringReader != null))
                    {
                        stringReader.Dispose();
                    }
                }
            }

            #region private method
            private static string ToString(byte[] ms)
            {
                return Convert.ToBase64String(ms);
            }

            private static byte[] ToBytes(string serializedObj)
            {
                return Convert.FromBase64String(serializedObj);
            }

            private static byte[] Compress(byte[] buffer)
            {
                MemoryStream ms = new MemoryStream();
                DeflateStream stream = new DeflateStream(ms, CompressionMode.Compress, true);
                stream.Write(buffer, 0, buffer.Length);
                stream.Close();
                buffer = ms.ToArray();
                ms.Close();
                return buffer;
            }

            private static byte[] Decompress(byte[] buffer)
            {
                MemoryStream ms = new MemoryStream();
                ms.Write(buffer, 0, buffer.Length);
                ms.Position = 0;
                DeflateStream stream = new DeflateStream(ms, CompressionMode.Decompress);
                stream.Flush();
                byte[] decompressBuffer = new byte[STRING_MAX_LENGTH];
                int nSizeIncept = stream.Read(decompressBuffer, 0, STRING_MAX_LENGTH);
                stream.Close();
                ms.Close();
                byte[] lastResult = new byte[nSizeIncept];
                System.Buffer.BlockCopy(decompressBuffer, 0, lastResult, 0, nSizeIncept);
                return lastResult;
            }
            #endregion
        }
    }
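The remark about settings.Indent can be checked with a small standalone sketch. It assumes a hypothetical Note type and an IndentDemo helper (neither is part of Serializer.cs) and only compares the two outputs.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class Note
{
    [XmlAttribute]
    public string id { get; set; }

    public string Text { get; set; }
}

public static class IndentDemo
{
    // Serializes value to a string, with or without indentation.
    public static string ToXml(object value, bool indent)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Indent = indent };
        var builder = new StringBuilder();
        using (var writer = XmlWriter.Create(builder, settings))
        {
            new XmlSerializer(value.GetType()).Serialize(writer, value);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        var note = new Note { id = "110", Text = "Invalid value" };
        string pretty = ToXml(note, true);
        string compact = ToXml(note, false);
        Console.WriteLine(pretty);
        Console.WriteLine(compact);
        Console.WriteLine("indented: {0} chars, compact: {1} chars", pretty.Length, compact.Length);
    }
}
```

For a file of roughly 2000 messages the saving is only whitespace, so keeping Indent = true is usually the better choice when people still read or diff the XML.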
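Next comes the fun part: generating a C# file from code. One more thing to mention: the programmers before me had written lots of #region blocks in MessageKeys, organized by the business scope of the keys, and I did not want to throw those away. So I added a column named category to the spreadsheet and maintain it by hand. Fortunately the MessageID naming convention is tied to this category, so the manual upkeep is not hard. CodeDom supports directives, which means #region / #endregion can be attached before and after a member. The one really annoying thing is that CodeDom, perhaps out of some safety consideration (I do not know), cannot actually generate a static class. GeneratorSupport.PublicStaticMembers exists (see the code), but provider.Supports(...) only reports whether the provider has that capability, so calling it changes nothing. Frustratingly, there is also no way to add the readonly modifier to a member, so I resorted to the trick of writing the field type as "readonly MessageKey". The only remaining blemish is that the class is not the public static class I wanted, and I have to add static by hand. I will update this part when I think of a fix (none so far).
The file that generates the code with System.CodeDom, GenerateCodeMessageKeyClass.cs, is listed below and is also in the source download.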
    using System;
    using System.Reflection;
    using System.IO;
    using System.CodeDom;
    using System.CodeDom.Compiler;
    using Microsoft.CSharp;
    using System.Diagnostics;
    using System.Linq;

    namespace RefactorMessages
    {
        /// <summary>
        /// Builds a CodeDom graph (CodeCompileUnit) for the MessageKeys class and
        /// generates C# source code for it using the C# CodeDomProvider.
        /// </summary>
        public class GenerateCodeMessageKeyClass
        {
            public enum DirectiveType
            {
                None,
                OnlyStart,
                OnlyEnd,
                Both,
            }

            /// <summary>
            /// Define the compile unit to use for code generation.
            /// </summary>
            CodeCompileUnit targetUnit;

            /// <summary>
            /// The outer MessageKeys class. One nested class per message type is added to it.
            /// </summary>
            CodeTypeDeclaration targetClass;

            /// <summary>
            /// Define the class.
            /// </summary>
            public GenerateCodeMessageKeyClass(Messages messages, string tower)
            {
                targetUnit = new CodeCompileUnit();
                CodeNamespace clsMessageKeys = new CodeNamespace("Util.LoggingSupport");
                clsMessageKeys.Imports.Add(new CodeNamespaceImport("System"));
                clsMessageKeys.Imports.Add(new CodeNamespaceImport("Util.Messaging"));

                targetClass = new CodeTypeDeclaration(tower.ToUpper() + "MessageKeys");
                targetClass.IsClass = true;
                targetClass.TypeAttributes = TypeAttributes.Public;
                clsMessageKeys.Types.Add(targetClass);
                targetUnit.Namespaces.Add(clsMessageKeys);

                foreach (var msg in messages.MessagesByTypes)
                {
                    AddSubClass(msg);
                }
            }

            /// <summary>
            /// Add a nested class for one message type, with one field per message.
            /// </summary>
            public void AddSubClass(MessagesByType messagesByType)
            {
                CodeTypeDeclaration cls = new CodeTypeDeclaration();
                cls.Name = string.Join("", messagesByType.typeName.Split(' ')) + "Messages";
                cls.IsClass = true;
                cls.TypeAttributes = TypeAttributes.Public;
                // Has no effect on the emitted declaration: the class is still generated as non-static.
                cls.Attributes = MemberAttributes.Static;

                string comment;
                switch (messagesByType.typeCode)
                {
                    case "1":
                        comment = "Validation Error 200~399";
                        break;
                    case "2":
                        comment = "Application Error 400~799";
                        break;
                    case "3":
                        comment = "System Error 800~999";
                        break;
                    case "4":
                        comment = "Info Messages 001~099";
                        break;
                    case "5":
                        comment = "Warn Messages 100~199";
                        break;
                    default:
                        comment = string.Empty;
                        break;
                }
                cls.Comments.Add(new CodeCommentStatement(comment));

                // One #region per category: opened on the first field, closed on the last.
                var query = messagesByType.Messages.GroupBy(o => o.category);
                foreach (var grp in query)
                {
                    int index = 0;
                    var list = grp.ToList();
                    foreach (var msg in list)
                    {
                        DirectiveType directiveType = DirectiveType.None;
                        if (!string.IsNullOrEmpty(grp.Key))
                        {
                            if (list.Count == 1)
                            {
                                directiveType = DirectiveType.Both;
                            }
                            else if (index == 0)
                            {
                                directiveType = DirectiveType.OnlyStart;
                            }
                            else if (index == list.Count - 1)
                            {
                                directiveType = DirectiveType.OnlyEnd;
                            }
                        }
                        AddFields(directiveType, grp.Key, cls, msg);
                        index++;
                    }
                }
                targetClass.Members.Add(cls);
            }

            /// <summary>
            /// Add one "public static readonly MessageKey" field for a message.
            /// </summary>
            /// <param name="directiveType">whether this field opens and/or closes a #region</param>
            /// <param name="categoryName">the region text</param>
            /// <param name="cls">the nested class that receives the field</param>
            /// <param name="msg">the message the field refers to</param>
            public void AddFields(DirectiveType directiveType, string categoryName, CodeTypeDeclaration cls, Message msg)
            {
                // Declare the field.
                CodeMemberField field = new CodeMemberField();
                field.Attributes = MemberAttributes.Public | MemberAttributes.Static;
                field.Name = msg.ClassName;
                // Trick: CodeDom has no readonly modifier, so it is smuggled in through the type name.
                field.Type = new CodeTypeReference("readonly MessageKey");

                string messageType;
                switch (msg.TypeCode)
                {
                    case "1":
                        messageType = "MessageType.VALIDATION_ERROR";
                        break;
                    case "2":
                        messageType = "MessageType.APPLICATION_ERROR";
                        break;
                    case "3":
                        messageType = "MessageType.SYSTEM_ERROR";
                        break;
                    case "4":
                        messageType = "MessageType.INFO";
                        break;
                    case "5":
                        messageType = "MessageType.WARN";
                        break;
                    default:
                        messageType = "MessageType.UNASSIGNED";
                        Debug.Assert(false, "need to modify spreadsheet to specify the type");
                        break;
                }

                field.InitExpression = new CodeObjectCreateExpression(
                    new CodeTypeReference("MessageKey"),
                    new CodeTypeReferenceExpression(messageType),
                    new CodePrimitiveExpression(msg.id));

                // Add region
                if (directiveType == DirectiveType.OnlyStart || directiveType == DirectiveType.Both)
                {
                    field.StartDirectives.Add(new CodeRegionDirective { RegionMode = CodeRegionMode.Start, RegionText = categoryName });
                }
                if (directiveType == DirectiveType.OnlyEnd || directiveType == DirectiveType.Both)
                {
                    field.EndDirectives.Add(new CodeRegionDirective { RegionMode = CodeRegionMode.End, RegionText = categoryName });
                }
                cls.Members.Add(field);
            }

            /// <summary>
            /// Generate CSharp source code from the compile unit.
            /// </summary>
            /// <param name="fileName">Output file name</param>
            public void Generate(string fileName)
            {
                CodeDomProvider provider = CodeDomProvider.CreateProvider("CSharp");
                // Only queries a capability; the result is ignored and the output is unaffected.
                provider.Supports(GeneratorSupport.PublicStaticMembers);
                CodeGeneratorOptions options = new CodeGeneratorOptions();
                options.BracingStyle = "Block";
                using (StreamWriter sourceWriter = new StreamWriter(fileName))
                {
                    provider.GenerateCodeFromCompileUnit(targetUnit, sourceWriter, options);
                }
            }
        }
    }
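One possible way around the missing static keyword, which the tool above does not do, is to generate into a string first and patch the class declaration before writing the file. The sketch below is only an illustration under assumptions: the Demo namespace, DemoKeys class and Greeting field are hypothetical, and the patch relies on the C# provider emitting the literal text "public class DemoKeys".

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using System.Reflection;

public static class StaticClassDemo
{
    // Generates a tiny class with CodeDom and rewrites its declaration to "static".
    public static string Generate()
    {
        var unit = new CodeCompileUnit();
        var ns = new CodeNamespace("Demo");
        var cls = new CodeTypeDeclaration("DemoKeys")
        {
            IsClass = true,
            TypeAttributes = TypeAttributes.Public
        };
        var field = new CodeMemberField(typeof(string), "Greeting")
        {
            Attributes = MemberAttributes.Public | MemberAttributes.Static,
            InitExpression = new CodePrimitiveExpression("hello")
        };
        cls.Members.Add(field);
        ns.Types.Add(cls);
        unit.Namespaces.Add(ns);

        using (var provider = CodeDomProvider.CreateProvider("CSharp"))
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromCompileUnit(unit, writer, new CodeGeneratorOptions());
            // Post-processing step: CodeDom has no notion of a static class, so patch the text.
            return writer.ToString().Replace("public class DemoKeys", "public static class DemoKeys");
        }
    }

    public static void Main()
    {
        Console.WriteLine(Generate());
    }
}
```

In the MessageKeys case every generated class holds only static fields, so replacing "public class " with "public static class " across the whole output would probably be enough, but that is a text hack rather than a CodeDom feature.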
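The generated class ends up looking like this. The using directives have moved inside the namespace. Of course there are other ways to start: here the unit starts with the namespace, but the provider also supports GenerateCodeFromExpression and GenerateCodeFromStatement, so one could, somewhat "trickily", write the two using lines first and then the namespace, with no imports needed. In any case there are many more places where CodeDom is useful; when you need it, Google will tell you.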
    namespace LoggingSupport {
        using System;
        using Util.Messaging;

        public class MessageKeys {

            // Validation Error 200~399
            public class ValidationErrorMessages {

                #region AM Common
                public static readonly MessageKey ValCannotBeLessThan = new MessageKey(MessageType.VALIDATION_ERROR, "110");
                public static readonly MessageKey ValCannotBeGreaterThan = new MessageKey(MessageType.VALIDATION_ERROR, "111");
                #endregion
            }

            // Warn Messages 100~199
            public class WarningMessages {
                //...
            }
        }
    }
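If the using directives should sit at the top of the file rather than inside the namespace, one option is to put the imports on a CodeNamespace with an empty name and add it to the compile unit alongside the real namespace; the C# generator writes such global imports before any namespace block. A minimal sketch, with a hypothetical Demo namespace and DemoKeys class:

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;

public static class GlobalImportsDemo
{
    // Generates a unit whose using directive is emitted above the namespace.
    public static string Generate()
    {
        var unit = new CodeCompileUnit();

        // A namespace with an empty name stands for the global scope of the file.
        var globalNs = new CodeNamespace();
        globalNs.Imports.Add(new CodeNamespaceImport("System"));
        unit.Namespaces.Add(globalNs);

        // The real namespace carries no imports of its own.
        var ns = new CodeNamespace("Demo");
        ns.Types.Add(new CodeTypeDeclaration("DemoKeys"));
        unit.Namespaces.Add(ns);

        using (var provider = CodeDomProvider.CreateProvider("CSharp"))
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromCompileUnit(unit, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Generate());
    }
}
```

Applied to the generator above, this would mean moving the two Imports.Add calls from clsMessageKeys to a separate empty-named namespace.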
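Building this little thing was actually quite fun. Unfortunately, not many sub-projects still need to be refactored for this, so it is not hugely useful now, but at least it set me free XD. Had this tool been running in the sub-project teams when the refactor first started, it would have saved everyone a lot of manual copy & paste and eye-straining tedium. Luckily it still helps with maintenance, so it is a worthwhile little tool.
Later I will consider turning this console project into a small WinForms app, because typing in a pile of path args is a hassle. That is all for these notes. Writing this post took half an hour, and it was worth it! From now on I should write down every little thing like this!
BTW, how do you add tags in WLW?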