VBA比较两个Excel数据的异同
代码背景:
- 由于Excel本身无法简单的比较两个Excel数据的异同,所以用VBA编写代码的方式来实现。
- 这里的比较条件是:数据行为单位,假设对应Sheet中没有重复数据,对应数据行的所有列的数据都相等,即为此行数据相同。
- 这里的两个Sheet的数据行量级别大约为:50000 * 50000,数据列大约:50,对应Cell中的字符串大约100以内,中英文混合。
- 如何在Excel中调出VBA的编写工具,请参考如下链接: https://jingyan.baidu.com/article/63f236281f17650208ab3d97.html
整体来说,需求非常明确,若是不考虑效率的话,代码逻辑比较简单,循环比较即可。
相关代码:

Sub CompareData() Dim i As Long Dim j As Long Dim fullSheetName As String fullSheetName = "Sheet1" Set fullSheet = Sheets(fullSheetName) Dim fullDataRange As Variant fullDataRange = fullSheet.Range("A1", "AT80000").CurrentRegion.Value Dim fullSheetRowMax As Long fullSheetRowMax = fullSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count Dim partialSheetName As String partialSheetName = "Sheet2" Set partialSheet = Sheets(partialSheetName) Dim partialDataRange As Variant partialDataRange = partialSheet.Range("A1", "AT80000").CurrentRegion.Value Dim partialSheetRowMax As Long partialSheetRowMax = partialSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count Dim columnMax As Integer columnMax = 46 Dim columnMark As Integer columnMark = 48 Dim sameRow As Boolean For i = 1 To fullSheetRowMax For j = 1 To partialSheetRowMax sameRow = True For columnIndex = 1 To columnMax If fullDataRange(i, columnIndex) <> partialDataRange(j, columnIndex) Then sameRow = False Exit For End If Next columnIndex If sameRow Then fullSheet.Cells(i, columnMark) = 1 Exit For End If Next j Next i MsgBox "Successfully!" End Sub
上述代码实际运行大约用30分钟完成此数量级的比较,大约1000亿次的比较。
当然了我们需要更快的比较方式,那么就需要对大数据进行结构优化,即:将partial的sheet中的数据进行分组,比如每1000个row放到一个组里,然后用一个标志位标记这个组里1000个row是否都有相同的数据,如有都有的话,那么下次再比较的时候就可以跳过这个组,进行下一组的1000个row的循环遍历。相同数量级,大约2分钟比较完成。
注:实际数据是按照时间进行抽取出来的,所以partial的sheet数据 大致都在full的sheet的前半部分相同,如果数据无规律,非常混乱,那么还要对每一个row的数据进行结构优化,即:用另外一个标记为进行标记此row是否有相同的数据,判断的时候先判断这个标记位】
相关代码如下:
【注:函数中的一些变量都是HardCode的,要根据具体数据进行修改】

Public Type PartialBasedModule IsAllSame As Boolean SheetDataRange As Variant SameCount As Long End Type Sub CompareData() Dim i As Long Dim j As Long Dim k As Long Dim fullSheetName As String fullSheetName = "Sheet1" Set fullSheet = Sheets(fullSheetName) Dim fullDataRange As Variant fullDataRange = fullSheet.Range("A1", "AT80000").CurrentRegion.Value Dim fullSheetRowMax As Long fullSheetRowMax = fullSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count Dim partialSheetName As String partialSheetName = "Sheet2" Set partialSheet = Sheets(partialSheetName) Dim PartialDataRange As Variant PartialDataRange = partialSheet.Range("A1", "AT80000").CurrentRegion.Value Dim partialSheetRowMax As Long partialSheetRowMax = partialSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count Dim partialSheetPages() As PartialBasedModule partialSheetPages = SeparatePartialSheet(PartialDataRange, partialSheetRowMax) Dim columnMax As Integer columnMax = 46 Dim columnMark As Integer columnMark = 48 Dim sameRow As Boolean For i = 1 To fullSheetRowMax For j = 1 To UBound(partialSheetPages) If partialSheetPages(j).SameCount < 1000 Then For k = 1 To UBound(partialSheetPages(j).SheetDataRange) sameRow = True For ColumnIndex = 1 To columnMax If fullDataRange(i, ColumnIndex) <> partialSheetPages(j).SheetDataRange(k, ColumnIndex) Then sameRow = False Exit For End If Next ColumnIndex If sameRow Then fullSheet.Cells(i, columnMark) = 1 partialSheetPages(j).SameCount = partialSheetPages(j).SameCount + 1 Exit For End If Next k Else sameRow = False End If If sameRow Then Exit For End If Next j Next i MsgBox "Successfully!" End Sub Public Function SeparatePartialSheet(ByRef PartialDataRange As Variant, ByVal rowCount As Long) As PartialBasedModule() Dim eachPageCount As Long eachPageCount = 1000 Dim pageCount As Integer pageCount = Int(rowCount / eachPageCount) + 1 Dim pageIndex As Long Dim pageArr() As PartialBasedModule Dim startIndex As Long Dim endIndex As Long For pageIndex = 1 To pageCount Dim seperatedDataRange(1 To 1000, 1 To 46) As Variant Dim seperatedIndex As Long seperatedIndex = 1 Dim colIndex As Integer If pageIndex < pageCount Then endIndex = pageIndex * eachPageCount Else endIndex = rowCount End If For startIndex = (pageIndex - 1) * eachPageCount + 1 To endIndex For colIndex = 1 To 46 seperatedDataRange(seperatedIndex, colIndex) = PartialDataRange(startIndex, colIndex) Next colIndex seperatedIndex = seperatedIndex + 1 Next startIndex Dim pageData As PartialBasedModule pageData.SheetDataRange = seperatedDataRange pageData.SameCount = 0 pageData.IsAllSame = False ReDim Preserve pageArr(pageIndex) pageArr(pageIndex) = pageData Next pageIndex SeparatePartialSheet = pageArr End Function
给每个Row都加上标记的代码如下所示:【相同界别的数据,大约1分钟完成比较】

Public Type RowModule IsSame As Boolean RowData As Variant End Type Public Type PartialBasedModule IsAllSame As Boolean SheetDataRange() As RowModule SameCount As Long End Type Sub CompareData() Dim i As Long Dim j As Long Dim k As Long Dim fullSheetName As String fullSheetName = "Sheet1" Set fullSheet = Sheets(fullSheetName) Dim fullDataRange As Variant fullDataRange = fullSheet.Range("A1", "AT80000").CurrentRegion.Value Dim fullSheetRowMax As Long fullSheetRowMax = fullSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count Dim partialSheetName As String partialSheetName = "Sheet2" Set partialSheet = Sheets(partialSheetName) Dim PartialDataRange As Variant PartialDataRange = partialSheet.Range("A1", "AT80000").CurrentRegion.Value Dim partialSheetRowMax As Long partialSheetRowMax = partialSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count Dim partialSheetPages() As PartialBasedModule partialSheetPages = SeparatePartialSheet(PartialDataRange, partialSheetRowMax) Dim columnMax As Integer columnMax = 46 Dim columnMark As Integer columnMark = 48 Dim sameRow As Boolean For i = 1 To fullSheetRowMax For j = 1 To UBound(partialSheetPages) If partialSheetPages(j).SameCount < 1000 Then For k = 1 To UBound(partialSheetPages(j).SheetDataRange) sameRow = True If partialSheetPages(j).SheetDataRange(k).IsSame Then sameRow = False Else For ColumnIndex = 1 To columnMax If fullDataRange(i, ColumnIndex) <> partialSheetPages(j).SheetDataRange(k).RowData(ColumnIndex) Then sameRow = False Exit For End If Next ColumnIndex If sameRow Then fullSheet.Cells(i, columnMark) = 1 partialSheetPages(j).SheetDataRange(k).IsSame = True partialSheetPages(j).SameCount = partialSheetPages(j).SameCount + 1 Exit For End If End If Next k Else sameRow = False End If If sameRow Then Exit For End If Next j Next i MsgBox "Successfully!" End Sub Public Function SeparatePartialSheet(ByRef PartialDataRange As Variant, ByVal rowCount As Long) As PartialBasedModule() Dim eachPageCount As Long eachPageCount = 1000 Dim pageCount As Integer pageCount = Int(rowCount / eachPageCount) + 1 Dim pageIndex As Long Dim pageArr() As PartialBasedModule Dim startIndex As Long Dim endIndex As Long For pageIndex = 1 To pageCount Dim seperatedDataRange(1 To 1000) As RowModule Dim dataRows(1 To 1000) As Variant Dim seperatedIndex As Long seperatedIndex = 1 Dim colIndex As Integer If pageIndex < pageCount Then endIndex = pageIndex * eachPageCount Else endIndex = rowCount End If For startIndex = (pageIndex - 1) * eachPageCount + 1 To endIndex Dim dataRow(1 To 46) As Variant For colIndex = 1 To 46 dataRow(colIndex) = PartialDataRange(startIndex, colIndex) Next colIndex Dim currentRowModule As RowModule currentRowModule.RowData = dataRow currentRowModule.IsSame = False seperatedDataRange(seperatedIndex) = currentRowModule seperatedIndex = seperatedIndex + 1 Next startIndex Dim pageData As PartialBasedModule pageData.SheetDataRange = seperatedDataRange pageData.SameCount = 0 pageData.IsAllSame = False ReDim Preserve pageArr(pageIndex) pageArr(pageIndex) = pageData Next pageIndex SeparatePartialSheet = pageArr End Function
最终的一个简单的数据结构如下图所示:
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 10年+ .NET Coder 心语,封装的思维:从隐藏、稳定开始理解其本质意义
· .NET Core 中如何实现缓存的预热?
· 从 HTTP 原因短语缺失研究 HTTP/2 和 HTTP/3 的设计差异
· AI与.NET技术实操系列:向量存储与相似性搜索在 .NET 中的实现
· 基于Microsoft.Extensions.AI核心库实现RAG应用
· 10年+ .NET Coder 心语 ── 封装的思维:从隐藏、稳定开始理解其本质意义
· 地球OL攻略 —— 某应届生求职总结
· 提示词工程——AI应用必不可少的技术
· Open-Sora 2.0 重磅开源!
· 周边上新:园子的第一款马克杯温暖上架