[Python] Quickly parsing database view XML configuration to extract field descriptions
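In my current project, the database developers gave me XML view files that contain the table information. That information is buried in a large amount of UI configuration and is hard to read. So I wrote a small Python program that parses the XML, converts the needed field information to CSV and then imports it into Excel (about two hours of work). A few technical takeaways: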
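- Python's minidom is built on the expat parser, which cannot decode multi-byte encodings such as GB2312/GBK. An XML file declared in one of those encodings fails to parse. Convert such files to UTF-8 (and update the encoding declaration) before parsing;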
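- To read an element's text through the DOM, use element.firstChild.nodeValue. The element's own nodeValue is always None, because the text is stored in a child Text node;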
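- minidom returns parsed text as Unicode strings rather than byte strings, so the output file must be opened with an explicit encoding:
  - In Python 2, use codecs.open(filename, 'w', 'utf-8'). A plain file raises UnicodeEncodeError on Chinese text.
  - In Python 3, use open(filename, 'w', encoding='utf-8');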
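- While writing the program I gradually moved from a line-by-line script style to an object-oriented (OOP) style. This takes more steps, but it makes debugging easier;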
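The first two points can be illustrated with a short sketch. It uses a hypothetical GBK-encoded snippet, and `to_utf8_xml` and `text_of` are illustrative helper names that are not part of the original program. The sketch does two things:
- It decodes the bytes from GBK, rewrites the encoding declaration and re-encodes the document as UTF-8 before handing it to minidom.
- It shows that an element's own nodeValue is None while its first child holds the text.

```python
import re
import xml.dom.minidom

# Matches the encoding="..." part of the XML declaration at the start of the text.
_DECL_ENCODING = re.compile(r"""^(<\?xml[^>]*?encoding\s*=\s*)(["'])[^"']*\2""")


def to_utf8_xml(raw_bytes, source_encoding="gbk"):
    """Decode XML bytes from source_encoding and re-encode them as UTF-8."""
    text = raw_bytes.decode(source_encoding)
    # The declaration must match the new bytes, otherwise the parser would
    # still try to read them as the old encoding.
    text = _DECL_ENCODING.sub(r"\g<1>\g<2>utf-8\g<2>", text, count=1)
    return text.encode("utf-8")


def text_of(element):
    """Return the text held by an element, or '' if it is empty."""
    # An Element's own nodeValue is always None; the text lives in its
    # first child, which here is a Text node.
    child = element.firstChild
    return child.nodeValue if child is not None else ""


# Hypothetical GBK-encoded snippet, as an export tool might produce it.
gbk_bytes = '<?xml version="1.0" encoding="gbk"?><BizObject><CName>客户视图</CName></BizObject>'.encode("gbk")

doc = xml.dom.minidom.parseString(to_utf8_xml(gbk_bytes))
cname = doc.getElementsByTagName("CName")[0]
print(cname.nodeValue)  # None
print(text_of(cname))   # 客户视图
```

For real files, read them in binary mode (`open(path, "rb").read()`) and pass the result through `to_utf8_xml` before parsing. Files that are already UTF-8 can go straight to `xml.dom.minidom.parse`.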
Reference code:
```python
import codecs
import os
import xml.dom.minidom


class dbviewxmladapter:
    """Extract field definitions from database view XML files into CSV lines."""

    def __init__(self):
        self._version = "0.1"
        self._path = "e:\\Temp\\Work"  # default working directory
        self._files = []
        self._lines = []

    def setPath(self, path):
        self._path = path

    def addFile(self, filename):
        self._files.append(filename)

    def getNodeValue(self, element, tagName):
        # The text of an element lives in its first child (a Text node);
        # return '' for empty elements such as <Description/>.
        node = element.getElementsByTagName(tagName)[0]
        return node.firstChild.nodeValue if node.firstChild is not None else ""

    def getSubNodeValue(self, element, tagName):
        # Column attributes (Type, Length, ...) are nested in <BizObjPropertyDBInfo>.
        subNode = element.getElementsByTagName("BizObjPropertyDBInfo")[0]
        return self.getNodeValue(subNode, tagName)

    def parseXml(self):
        try:
            for file in self._files:
                filename = os.path.join(self._path, file)
                print(filename)
                # Passing the path lets minidom open (and close) the file itself.
                doc = xml.dom.minidom.parse(filename)
                bizObject = doc.getElementsByTagName("BizObject")[0]
                viewEName = self.getNodeValue(bizObject, "EName")
                viewCName = self.getNodeValue(bizObject, "CName")
                # First row: the view itself, with the column slots left blank.
                self._lines.append(viewEName + ", , , , , , " + viewCName)
                for item in doc.getElementsByTagName("BizObjProperty"):
                    EName = self.getNodeValue(item, "EName")
                    CName = self.getNodeValue(item, "CName")
                    Description = self.getNodeValue(item, "Description")
                    Type = self.getSubNodeValue(item, "Type")
                    Length = self.getSubNodeValue(item, "Length")
                    Size = self.getSubNodeValue(item, "Size")
                    IsPK = self.getSubNodeValue(item, "IsPK") == "1"
                    IsNullable = self.getSubNodeValue(item, "IsNullable") == "1"
                    line = ",".join([EName, Type, Length, Size, str(IsPK),
                                     str(IsNullable), CName + ":" + Description])
                    self._lines.append(line)
        finally:
            print("over")

    def printLines(self):
        for line in self._lines:
            print(line)

    def writeToCSVFile(self, outfilename):
        filename = os.path.join(self._path, outfilename)
        # Parsed values are Unicode strings, so encode explicitly on output.
        with codecs.open(filename, "w", "utf-8") as f:
            for line in self._lines:
                f.write(line + "\n")


# Test script: parse 1.xml ... 5.xml and write everything to all.csv
aObject = dbviewxmladapter()
for i in range(5):
    aObject.addFile(str(i + 1) + ".xml")
aObject.parseXml()
# aObject.printLines()
aObject.writeToCSVFile("all.csv")
```
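Because the CSV ends up in Excel, a variant built on the standard csv module may be more robust. It has two advantages:
- It quotes fields that contain commas, for example a Description such as "主键, 自增".
- Writing with the utf-8-sig encoding adds a BOM, which helps Excel recognise Chinese text as UTF-8.

This is a minimal Python 3 sketch. It assumes the same tag layout as the code above (BizObject, BizObjProperty, BizObjPropertyDBInfo and their child tags). The sample XML, the output path and the helper names `first_text`, `view_rows` and `write_csv` are hypothetical.

```python
import csv
import os
import tempfile
import xml.dom.minidom


def first_text(element, tag):
    """Text of the first descendant <tag>, or '' if it is missing or empty."""
    found = element.getElementsByTagName(tag)
    if not found or found[0].firstChild is None:
        return ""
    return found[0].firstChild.nodeValue


def view_rows(xml_bytes):
    """Yield one row for the view itself, then one row per field."""
    doc = xml.dom.minidom.parseString(xml_bytes)
    view = doc.getElementsByTagName("BizObject")[0]
    # Same 7-column layout as above: column slots left blank for the view row.
    yield [first_text(view, "EName"), "", "", "", "", "", first_text(view, "CName")]
    for prop in doc.getElementsByTagName("BizObjProperty"):
        # Assumed: column attributes are nested in <BizObjPropertyDBInfo>.
        db = prop.getElementsByTagName("BizObjPropertyDBInfo")[0]
        yield [
            first_text(prop, "EName"),
            first_text(db, "Type"),
            first_text(db, "Length"),
            first_text(db, "Size"),
            str(first_text(db, "IsPK") == "1"),
            str(first_text(db, "IsNullable") == "1"),
            first_text(prop, "CName") + ":" + first_text(prop, "Description"),
        ]


def write_csv(rows, path):
    # utf-8-sig writes a BOM so Excel recognises the file as UTF-8;
    # newline="" lets the csv module choose the line endings itself.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)


# Hypothetical sample view file with a single field.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<BizObject>
  <EName>V_CUSTOMER</EName>
  <CName>客户视图</CName>
  <BizObjProperty>
    <EName>CUST_ID</EName>
    <CName>客户编号</CName>
    <Description>主键, 自增</Description>
    <BizObjPropertyDBInfo>
      <Type>int</Type><Length>4</Length><Size>10</Size>
      <IsPK>1</IsPK><IsNullable>0</IsNullable>
    </BizObjPropertyDBInfo>
  </BizObjProperty>
</BizObject>""".encode("utf-8")

rows = list(view_rows(SAMPLE))
out_path = os.path.join(tempfile.gettempdir(), "all.csv")
write_csv(rows, out_path)
```

The csv writer wraps the Description field in quotes because it contains a comma. With the hand-built comma-joined lines above, the same value would shift every following column by one.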