Excel开发学习笔记:读取xml文件及csv文件
遇到一个数据处理自动化的问题,于是打算开发一个基于excel的小工具。在业余时间一边自学一边实践,抽空把一些知识写下来以备今后参考,因为走的是盲人摸象的野路子,幼稚与错误请多包涵。
开发环境基于VSTO,具体配置:visual studio 2010,VB .Net,excel 2007,文档级别的定制程序。
读取xml文件
有好多种读取xml的方式,xmlDOM比较常见,我使用了另外一种,它以数据流的方式打开文件并读取内容
Imports System.Xml
Dim group As New List(Of String)
Using reader As XmlReader = XmlReader.Create(OpenFileDialog2.FileName)
While reader.ReadToFollowing("group")
reader.MoveToAttribute("name")
group.Add(reader.Value)
End While
End Using
Dim group As New List(Of String)
Using reader As XmlReader = XmlReader.Create(OpenFileDialog2.FileName)
While reader.ReadToFollowing("group")
reader.MoveToAttribute("name")
group.Add(reader.Value)
End While
End Using
读取csv文件
常见的csv文件以逗号','分割每行的字段,如果字段中正好也有',',则需要用双引号""括起来,比如下面2行的情况。这给解析字段带来了麻烦,不能简单的调用string的split函数。
"","","","","Case_1,Case_2,Case_3","Case_1,Case_2,Case_3","Case_1,Case_2,Case_3","Case_1,Case_2,Case_3",
project,Comm,factor5%(dB),"[0,Infinity]",-2.0599684918456296 ,-2.0599684918456296 ,-2.0599684918456296 ,-2.0599684918456296 ,
常见的csv文件以逗号','分割每行的字段,如果字段中正好也有',',则需要用双引号""括起来,比如下面2行的情况。这给解析字段带来了麻烦,不能简单的调用string的split函数。
"","","","","Case_1,Case_2,Case_3","Case_1,Case_2,Case_3","Case_1,Case_2,Case_3","Case_1,Case_2,Case_3",
project,Comm,factor5%(dB),"[0,Infinity]",-2.0599684918456296 ,-2.0599684918456296 ,-2.0599684918456296 ,-2.0599684918456296 ,
幸运的是遇到的需求中并不关注带双引号""的字段,所以对字符串进行了预处理,把""中的','替换掉,使之不对用','分割产生干扰。
'dataFileStr是string数组,保存了csv文件的完整内容
Dim dataFileStr As String() = IO.File.ReadAllLines(file)'file是文件路径的字符串
'转换格式
ChangeCsvFormat(dataFileStr)
'分割第一行的字段
Dim titleEle As String() = dataFileStr(0).Split(New Char() {","c})
'修改读取的csv内容,把""中间的逗号替换为~号,方便后续用逗号分词
Private Sub ChangeCsvFormat(ByRef csvLines As String())
For i As Long = 0 To csvLines.Count - 1
Dim begin As Integer = -1, endIdx As Integer = -1
Dim cnt As Integer = 0
Do
cnt += 1
If cnt Mod 2 = 0 Then
endIdx = csvLines(i).IndexOf("""", begin + 1)
If endIdx > 0 Then
Dim orgSub As String = csvLines(i).Substring(begin, endIdx - begin + 1)
Dim newSub As String = orgSub.Replace(",", "~")
csvLines(i) = csvLines(i).Replace(orgSub, newSub)
End If
Else
endIdx += 1
begin = csvLines(i).IndexOf("""", endIdx)
End If
Loop Until begin < 0 Or endIdx < 0
Next
End Sub
Dim dataFileStr As String() = IO.File.ReadAllLines(file)'file是文件路径的字符串
'转换格式
ChangeCsvFormat(dataFileStr)
'分割第一行的字段
Dim titleEle As String() = dataFileStr(0).Split(New Char() {","c})
'修改读取的csv内容,把""中间的逗号替换为~号,方便后续用逗号分词
Private Sub ChangeCsvFormat(ByRef csvLines As String())
For i As Long = 0 To csvLines.Count - 1
Dim begin As Integer = -1, endIdx As Integer = -1
Dim cnt As Integer = 0
Do
cnt += 1
If cnt Mod 2 = 0 Then
endIdx = csvLines(i).IndexOf("""", begin + 1)
If endIdx > 0 Then
Dim orgSub As String = csvLines(i).Substring(begin, endIdx - begin + 1)
Dim newSub As String = orgSub.Replace(",", "~")
csvLines(i) = csvLines(i).Replace(orgSub, newSub)
End If
Else
endIdx += 1
begin = csvLines(i).IndexOf("""", endIdx)
End If
Loop Until begin < 0 Or endIdx < 0
Next
End Sub