[Python] Quickly parsing database view XML configuration files to extract column descriptions

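On my current project, the database developers gave me XML view files that contain table information. That information is buried in a large amount of UI configuration, which makes it hard to read, so I decided to write a small Python program that parses the XML, converts the column information I need to CSV, and then imports it into Excel (about two hours of work). Here are a few technical takeaways: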

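  1. minidom is built on the expat parser, which does not understand multi-byte Chinese encodings such as GB2312/GBK; a file declared that way is rejected with an "unknown encoding" error. Convert such files to UTF-8 (and update the encoding in the XML declaration) before parsing;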
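  2. To read the text of a DOM element, use element.firstChild.nodeValue. element.nodeValue itself is always None for an element node, because the text is stored in a child Text node;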
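  3. The strings minidom returns are Unicode objects, not UTF-8 byte strings, so file output must go through an encoding-aware file object such as codecs.open(filename, 'w', 'utf-8'). In Python 2, writing non-ASCII Unicode to a plain file raises a UnicodeEncodeError;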
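  4. While writing the program I moved step by step from a line-by-line script to an object-oriented (OOP) style. It takes more steps, but it makes debugging easier.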
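A minimal sketch of points 1 and 2, written for Python 3 rather than the Python 2 of the reference code. It assumes the source files are GB2312/GBK-encoded (the post does not say which encoding they were in); the helper `to_utf8_xml` and the sample XML are hypothetical. The bytes are decoded, the encoding named in the XML declaration is rewritten to UTF-8, the result is parsed with minidom, and element text is then read through `firstChild`.

```python
import re
import xml.dom.minidom

def to_utf8_xml(raw, source_encoding="gbk"):
    """Hypothetical helper: re-encode XML bytes as UTF-8 and fix the declaration."""
    text = raw.decode(source_encoding)
    # Rewrite encoding="..." in the XML declaration; otherwise the parser
    # would still be told to expect the original encoding.
    text = re.sub(r'(<\?xml[^>]*encoding=["\'])[^"\']+(["\'])',
                  r"\g<1>utf-8\g<2>", text, count=1)
    return text.encode("utf-8")

# Sample GBK bytes standing in for open(path, "rb").read() on a real view file
raw = ('<?xml version="1.0" encoding="GB2312"?>'
       "<BizObject><EName>V_CUSTOMER</EName><CName>客户视图</CName>"
       "<Description/></BizObject>").encode("gbk")

doc = xml.dom.minidom.parseString(to_utf8_xml(raw))

ename = doc.getElementsByTagName("EName")[0]
print(ename.nodeValue)             # None: an Element node has no value of its own
print(ename.firstChild.nodeValue)  # V_CUSTOMER: the text lives in a child Text node

# The Chinese name survives the conversion intact
cname = doc.getElementsByTagName("CName")[0].firstChild.nodeValue

# An empty tag has no child at all, so firstChild.nodeValue would raise AttributeError
desc = doc.getElementsByTagName("Description")[0]
print(desc.firstChild)             # None
```

For a real file, `open(path, "rb").read()` supplies `raw`, and the converted bytes can be passed straight to `parseString` as here or written back to disk first.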

Reference code:

class dbviewxmladapter:
	"""Extract column definitions from database-view XML files and save them as CSV."""

	def __init__(self):
		self._version = "0.1"
		self._path = "e:\\Temp\\Work"
		self._files = []
		self._lines = []

	def setPath( self, path ):
		self._path = path

	def addFile( self, filename ):
		self._files.append( filename )

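	def getNodeValue( self, element, tagName ):
		# An element's text lives in its first child (a Text node); an empty
		# tag such as <Description/> has no child, so return '' instead of failing
		child = element.getElementsByTagName( tagName )[0].firstChild
		return child.nodeValue if child is not None else ''

	def getSubNodeValue( self, element, tagName ):
		# Column attributes (Type, Length, Size, ...) sit under <BizObjPropertyDBInfo>
		subNode = element.getElementsByTagName( 'BizObjPropertyDBInfo' )[0]
		return self.getNodeValue( subNode, tagName )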

	def parseXml( self ):
		import os
		import xml.dom.minidom
		try:
			for name in self._files:
				filename = os.path.join( self._path, name )
				print( filename )
				# parse() accepts a file name and closes the file itself
				doc = xml.dom.minidom.parse( filename )
				# View header row: English view name in column 1, Chinese view name in column 7
				bizObject = doc.getElementsByTagName( 'BizObject' )[0]
				viewEName = self.getNodeValue( bizObject, 'EName' )
				viewCName = self.getNodeValue( bizObject, 'CName' )
				line = viewEName + ', , , , , , ' + viewCName
				self._lines.append( line )
				items = doc.getElementsByTagName( 'BizObjProperty' )
				for item in items:
					EName = self.getNodeValue( item, 'EName' )
					CName = self.getNodeValue( item, 'CName' )
					Description = self.getNodeValue( item, 'Description' )
					Type = self.getSubNodeValue( item, 'Type' )
					Length = self.getSubNodeValue( item, 'Length' )
					Size = self.getSubNodeValue( item, 'Size')
					IsPK = self.getSubNodeValue( item, 'IsPK' ) == '1'
					IsNullable = self.getSubNodeValue( item, 'IsNullable' ) == '1'
					# One row per column: name, type, length, size, PK flag, nullable flag, "CName:Description"
					fields = [ EName, Type, Length, Size, str(IsPK), str(IsNullable), CName + ':' + Description ]
					line = ','.join( fields )
					self._lines.append( line )
		finally:
			print( "over" )

	def printLines( self ):
		for line in self._lines:
			print( line )
			
	def writeToCSVFile( self, outfilename ):
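		import codecs
		import os
		filename = os.path.join( self._path, outfilename )
		# The parsed strings are Unicode, so write them through a UTF-8 codec writer;
		# the with block flushes and closes the file even if a write fails
		with codecs.open( filename, 'w', 'utf-8' ) as f:
			for line in self._lines:
				f.write( line + '\n' )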

# TestSuite Scripts
aObject = dbviewxmladapter()
for i in range(5):
	filename = str(i+1) + ".xml"
	aObject.addFile( filename )
#aObject.addFile("5.xml")
aObject.parseXml()
#aObject.printLines()
aObject.writeToCSVFile( "all.csv" )
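The reference code builds each CSV line by joining strings with commas, so a Description that itself contains a comma would spill into an extra column. Below is a hedged Python 3 sketch of the output step using the standard csv module, which quotes such fields, together with an explicit encoding as in point 3. The helper `write_rows` and the sample rows are hypothetical; the `utf-8-sig` encoding writes a byte-order mark, which Excel generally relies on to recognise a CSV file as UTF-8 when it is opened directly.

```python
import csv
import os
import tempfile

def write_rows(path, rows, encoding="utf-8-sig"):
    """Hypothetical helper: write rows as CSV, quoting fields that contain commas."""
    # newline="" lets the csv module write its own line endings, as its docs recommend
    with open(path, "w", encoding=encoding, newline="") as f:
        csv.writer(f).writerows(rows)

rows = [
    ["V_CUSTOMER", "", "", "", "", "", "客户视图"],
    ["CUST_ID", "varchar", "20", "0", "True", "False", "客户编号:主键, 非空"],
]

path = os.path.join(tempfile.mkdtemp(), "all.csv")
write_rows(path, rows)

# Reading the file back yields the same 7 columns per row:
# the comma inside the quoted last field did not split it
with open(path, encoding="utf-8-sig", newline="") as f:
    back = list(csv.reader(f))
```

Adopting this in the adapter would mean storing each row in `_lines` as a list of fields rather than a pre-joined string.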
posted @ 2010-05-17 16:13  yankchina