Excel 2007 底层实现方式
一、EXCEL的底层实现
能力有限,了解的比较浅,有不足之处望指正,首先看下图:
一、 excel2007是使用xml格式来存储的,把一个excel文件后缀改为.zip,打开之后就直接可以看到一个excel文件对应的xml格式的文件了。
1.1、docProps目录
1.1.1、 core.xml文件
<?xml version="1.0" encoding="utf-8"?> <cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <!-- 创建人 --> <dc:creator>Administrator</dc:creator> <!-- 最后修改人 --> <cp:lastModifiedBy>Administrator</cp:lastModifiedBy> <!-- 创建时间 --> <dcterms:created xsi:type="dcterms:W3CDTF">2008-09-11T17:22:52Z</dcterms:created> <!-- 修改时间 --> <dcterms:modified xsi:type="dcterms:W3CDTF">2017-03-31T15:29:39Z</dcterms:modified> </cp:coreProperties>
1.1.2、 app.xml文件
<?xml version="1.0" encoding="utf-8"?> <Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"> <!-- 文档类型 --> <Application>Microsoft Excel</Application> <!-- 是否加密 --> <DocSecurity>0</DocSecurity> <ScaleCrop>false</ScaleCrop> <HeadingPairs> <vt:vector size="2" baseType="variant"> <vt:variant> <vt:lpstr>工作表</vt:lpstr> </vt:variant> <vt:variant> <vt:i4>3</vt:i4> </vt:variant> </vt:vector> </HeadingPairs> <TitlesOfParts> <vt:vector size="3" baseType="lpstr"> <vt:lpstr>Sheet1</vt:lpstr> <vt:lpstr>Sheet2</vt:lpstr> <vt:lpstr>Sheet3</vt:lpstr> </vt:vector> </TitlesOfParts> <Company/> <LinksUpToDate>false</LinksUpToDate> <!-- 是否共享 --> <SharedDoc>false</SharedDoc> <HyperlinksChanged>false</HyperlinksChanged> <!-- 版本 --> <AppVersion>12.0000</AppVersion> </Properties>
1.2、 xl目录,该目录保存的就是excel中的详细内容了
1.2.1、workbook.xml文件
文件包含一对<sheets>标签,其中的每个<sheet>元素都代表Excel 2007文件中的一个sheet,工作表的名称就是其name属性的值,这里有三个sheet
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"> <fileVersion appName="xl" lastEdited="4" lowestEdited="4" rupBuild="4506"/> <workbookPr defaultThemeVersion="124226"/> <bookViews> <workbookView xWindow="0" yWindow="105" windowWidth="12765" windowHeight="5715"/> </bookViews> <sheets> <sheet name="Sheet1" sheetId="1" r:id="rId1"/> <sheet name="Sheet2" sheetId="2" r:id="rId2"/> <sheet name="Sheet3" sheetId="3" r:id="rId3"/> </sheets> <calcPr calcId="125725"/> </workbook>
1.2.2、 _rels/workbook.xml.rels
定义每个sheetid对应的sheet内容文件sheet1.xml,共享的单元格内容文件sharedstring.xml,样式文件style.xml是当前单元格的样式字体,颜色等样式的xml配置。
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"> <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet3.xml"/> <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet2.xml"/> <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/> <Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings" Target="sharedStrings.xml"/> <Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/> <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/> </Relationships>
1.2.3、 worksheets文件夹,保存的就是excel中的sheet
1.2.4、sheet.xml文件
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"> <!-- 容积,即该sheet中以供占用了多大的地方,哪个单元格到哪个单元格,矩形的左上角单元格和右下角单元格 此属性可以判断列数,如下就是:(M-A+1) --> <dimension ref="A1:M8"/> <!-- sheet视图的配置信息,如打开时默认选中的单元格是哪个 --> <sheetViews> <sheetView tabSelected="1" workbookViewId="0"> <selection activeCell="G12" sqref="G12"/> </sheetView> </sheetViews> <!-- sheet的格式参数 --> <sheetFormatPr defaultRowHeight="14.25"/> <!-- 行中每列的配置参数 --> <cols> <col min="1" max="1" width="7.875" customWidth="1"/> <col min="2" max="2" width="6.75" customWidth="1"/> <col min="3" max="3" width="6" customWidth="1"/> <col min="4" max="4" width="4.5" customWidth="1"/> <col min="5" max="5" width="8.5" customWidth="1"/> <col min="6" max="6" width="9.375" customWidth="1"/> <col min="7" max="7" width="13.5" customWidth="1"/> <col min="8" max="8" width="7" customWidth="1"/> <col min="9" max="9" width="7.625" customWidth="1"/> <col min="10" max="10" width="4.625" customWidth="1"/> <col min="11" max="11" width="9.5" bestFit="1" customWidth="1"/> </cols> <!-- 数据区域,这个里面就是我们的单元格的相关数据信息了,是以行(Row)的方式记录的 --> <sheetData> <!-- excel中的行 --> <row r="1" spans="1:13"> <!-- excel中的列 r:单元格坐标 s:单元格的style,对应style.xml中cellXfs元素下的某个元素xf元素, 此处就是对应索引为12的元素,从0开始计算,即为第13个xf元素。 t:元素类型,s表示是字符串类型,对应到sharedStrings.xml中的sst元素 --> <c r="A1" s="12" t="s"> <!-- 如果是t="s" 则存储的是sharedStrings.xml中sst元素下的某个si元素,此处对应索引为0的元素,即第一个si元素 否则,这里存储的就是单元格的值 --> <v>0</v> </c><c r="B1" s="12" t="s"> <v>10</v> </c><c r="C1" s="12" t="s"> <v>1</v> </c><c r="D1" s="12" t="s"> <v>2</v> </c><c r="E1" s="12" t="s"> <v>3</v> </c><c r="F1" s="12" t="s"> <v>4</v> </c><c r="G1" s="12" t="s"> <v>5</v> </c><c r="H1" s="12" t="s"> <v>6</v> </c><c r="I1" s="12" t="s"> <v>7</v> </c><c r="J1" s="12" t="s"> <v>8</v> </c><c r="K1" s="12" t="s"> <v>9</v> </c> </row> <row r="2" spans="1:13"> <c r="A2" t="s"> <v>11</v> </c><c r="B2" s="10" t="s"> <v>12</v> </c><c r="C2" s="1"> <v>2</v> </c><c r="D2" s="2" t="s"> <v>13</v> </c><c r="E2" s="3" t="s"> <v>14</v> </c><c r="F2" s="4"> <v>89</v> </c><c r="G2" s="5"> <v>10</v> </c><c r="H2" s="6"> <v>12</v> </c><c r="I2" s="7"> <v>0.2</v> </c><c r="J2" s="8"> <v>0.5</v> </c><c r="K2" s="9"> <v>4564645</v> </c><c r="L2" s="10"/> <c r="M2" s="11"/> </row> <row r="3" spans="1:13"> <!-- 此处为什么没有A3? 因为excel中常规列如果没有值,此处就不记录,因此在读取excel是应该特别注意,否则可能导致列错乱 --> <c r="B3" s="10"/> <c r="C3" s="1"/> <c r="D3" s="2"/> <c r="E3" s="3"/> <c r="F3" s="4"/> <c r="G3" s="5"/> <c r="H3" s="6"/> <c r="I3" s="7"/> <c r="J3" s="8"/> <c r="K3" s="9"/> </row> <!-- 该行每列都是常规列,因此没有值的就不记录在这里 --> <row r="4" spans="1:13"> <c r="A4" s="12" t="s"> <v>15</v> </c><c r="E4" t="s"> <v>16</v> </c> </row> <!-- 5,6,7 行都是合并再一起的单元格 --> <row r="5" spans="1:13"> <!-- 合并单元格的值记录在起始单元格里,其余单元格置空 --> <c r="A5" s="14" t="s"> <v>17</v> </c><c r="B5" s="13"/> <c r="C5" s="13"/> <c r="D5" s="13"/> <c r="E5" s="13"/> <c r="F5" s="13"/> <c r="G5" s="13"/> <c r="H5" s="13"/> <c r="I5" s="13"/> <c r="J5" s="13"/> <c r="K5" s="13"/> </row> <row r="6" spans="1:13"> <c r="A6" s="13"/> <c r="B6" s="13"/> <c r="C6" s="13"/> <c r="D6" s="13"/> <c r="E6" s="13"/> <c r="F6" s="13"/> <c r="G6" s="13"/> <c r="H6" s="13"/> <c r="I6" s="13"/> <c r="J6" s="13"/> <c r="K6" s="13"/> </row> <row r="7" spans="1:13"> <c r="A7" s="13"/> <c r="B7" s="13"/> <c r="C7" s="13"/> <c r="D7" s="13"/> <c r="E7" s="13"/> <c r="F7" s="13"/> <c r="G7" s="13"/> <c r="H7" s="13"/> <c r="I7" s="13"/> <c r="J7" s="13"/> <c r="K7" s="13"/> </row> <row r="8" spans="1:13"> <c r="A8" s="13"/> <c r="B8" s="13"/> <c r="C8" s="13"/> <c r="D8" s="13"/> <c r="E8" s="13"/> <c r="F8" s="13"/> <c r="G8" s="13"/> <c r="H8" s="13"/> <c r="I8" s="13"/> <c r="J8" s="13"/> <c r="K8" s="13"/> </row> </sheetData> <!-- 这个是记录哪个到哪个是合并单元格的信息的 --> <mergeCells count="1"> <!-- A5到K8被合并了 --> <mergeCell ref="A5:K8"/> </mergeCells> <!-- --> <phoneticPr fontId="1" type="noConversion"/> <!-- 页的相关属性设置 --> <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/> <!-- 分页设置 --> <pageSetup paperSize="9" orientation="portrait" r:id="rId1"/> <!-- 图形信息,由图可以看出,这里引用了draw的文档了--> <drawing/> </worksheet>
1.2.5、sharedStrings.xml文件
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="18" uniqueCount="18"> <!-- si存储的是字符串的值 --> <!-- 此处存储的是sheet.xml中c(列)中属性带有t="s"的列的值, 带有t="s"字符串列的定位是通过c元素下的v元素中的值来定位的 此处对应的就是v的值为0的单元格的实际值 --> <si> <t>常规</t> <phoneticPr fontId="1" type="noConversion"/> </si> <!-- sheet.xml中c元素中属性t="s"的子元素<v>1</v>的单元格的实际值 --> <si> <t>数值</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>货币</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>会计专用</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>短日期</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>长日期</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>时间</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>百分比</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>分数</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>科学计数</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>文本</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>common</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>String</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>$</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>D1</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>常规A</t> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <r> <rPr> <sz val="11"/> <color theme="1"/> <rFont val="宋体"/> <family val="3"/> <charset val="134"/> </rPr><t>常规</t> </r><r> <rPr> <sz val="11"/> <color theme="1"/> <rFont val="Tahoma"/> <family val="2"/> <charset val="134"/> </rPr><t>D</t> </r> <phoneticPr fontId="1" type="noConversion"/> </si> <si> <t>说明: 为了简单,行数就设置的少点 第一行为标题行(本身是常规行),分别标注了每列的属性。 第三行和第二行一样的列格式 第四行为常规行</t> <phoneticPr fontId="1" type="noConversion"/> </si> </sst>
1.2.6、style.xml文件
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"> <!-- 数字格式化显示的定义 --> <numFmts count="5"> <!-- 此处的numFmtId和cellXfs中的numFmtId对应 --> <!-- 会计专用 --> <!-- 该元素表示cellXfs中xf元素的numFmtId="44"对应的格式化样式 --> <numFmt numFmtId="44" formatCode="_ "¥"* #,##0.00_ ;_ "¥"* \-#,##0.00_ ;_ "¥"* "-"??_ ;_ @_ "/> <!-- 数值格式 --> <numFmt numFmtId="176" formatCode="0.00_);[Red]\(0.00\)"/> <!-- 货币 --> <numFmt numFmtId="177" formatCode=""¥"#,##0.00_);[Red]\("¥"#,##0.00\)"/> <!-- 长日期 --> <numFmt numFmtId="178" formatCode="[$-F800]dddd\,\ mmmm\ dd\,\ yyyy"/> <!-- 时间 --> <numFmt numFmtId="179" formatCode="[$-F400]h:mm:ss\ AM/PM"/> </numFmts> <!-- 字体样式 --> <fonts count="3"> <font> <sz val="11"/> <color theme="1"/> <name val="Tahoma"/> <family val="2"/> <charset val="134"/> </font> <font> <sz val="9"/> <name val="Tahoma"/> <family val="2"/> <charset val="134"/> </font> <font> <sz val="11"/> <color theme="1"/> <name val="宋体"/> <family val="3"/> <charset val="134"/> </font> </fonts> <!-- 填充样式,包括背景色 --> <fills count="2"> <fill> <patternFill patternType="none"/> </fill> <fill> <patternFill patternType="gray125"/> </fill> </fills> <!-- 边框 --> <borders count="1"> <border> <left/> <right/> <top/> <bottom/> <diagonal/> </border> </borders> <!-- 提供给cellXfs来调用 --> <cellStyleXfs count="1"> <!-- 此处的numFmtId和cellXfs中的numFmtId对应 --> <xf numFmtId="0" fontId="0" fillId="0" borderId="0"/> </cellStyleXfs> <!-- 真正的样式定义 --> <cellXfs count="15"> <!-- 此处的numFmtId和cellStyleXfs、numFmts等中的numFmtId对应 --> <!-- sheet.xml中的c(列)元素属性中s的属性值都对应这里一个xf元素,s的属性值存储的就是xf的索引--> <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/> <!-- 该元素表示sheet.xml中c元素中s="1"对应的样式 --> <xf numFmtId="176" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/> <!-- 该元素表示sheet.xml中c元素中s="2"对应的样式 --> <xf numFmtId="177" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/> <xf numFmtId="44" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/> <xf numFmtId="14" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/> <xf numFmtId="178" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/> <xf numFmtId="179" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/> <xf numFmtId="10" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/> <xf numFmtId="12" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/> <xf numFmtId="11" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/> <xf numFmtId="49" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/> <xf numFmtId="13" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/> <xf numFmtId="0" fontId="2" fillId="0" borderId="0" xfId="0" applyFont="1"/> <xf numFmtId="0" fontId="2" fillId="0" borderId="0" xfId="0" applyFont="1" applyAlignment="1"> <alignment horizontal="left" vertical="top"/> </xf> <xf numFmtId="0" fontId="2" fillId="0" borderId="0" xfId="0" applyFont="1" applyAlignment="1"> <alignment horizontal="left" vertical="top" wrapText="1"/> </xf> </cellXfs> <cellStyles count="1"> <cellStyle name="常规" xfId="0" builtinId="0"/> </cellStyles> <dxfs count="0"/> <tableStyles count="0" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleLight16"/> </styleSheet>
二、使用java读取EXCEL
1. 依赖的jar包
<dependency> <groupId>org.apache.poi</groupId> <artifactId>poi</artifactId> <version>3.15</version> </dependency> <dependency> <groupId>org.apache.poi</groupId> <artifactId>poi-ooxml</artifactId> <version>3.15</version> </dependency> <dependency> <groupId>org.apache.poi</groupId> <artifactId>poi-ooxml-schemas</artifactId> <version>3.15</version> </dependency> <dependency> <groupId>org.apache.xerces</groupId> <artifactId>com.springsource.org.apache.xerces</artifactId> <version>2.8.1</version> </dependency> <dependency> <groupId>commons-lang</groupId> <artifactId>commons-lang</artifactId> <version>2.6</version> </dependency>
2. IExcelRowReader.java
package com.wms.xml; import java.util.List; public interface IExcelRowReader { void getRows(int sheetIndex, int curRow, List<String> rowlist); }
3. ExcelReaderUtil.java
package com.wms.xml; import org.apache.poi.openxml4j.exceptions.OpenXML4JException; import org.xml.sax.SAXException; import java.io.IOException; public class ExcelReaderUtil { public static void read2007Excel(IExcelRowReader rowReader, String path) throws OpenXML4JException, SAXException, IOException { Excel2007Reader excel2007Reader = new Excel2007Reader(); excel2007Reader.setRowReader(rowReader); excel2007Reader.load(path); } }
4. Excel2007Reader.java
package com.wms.xml; import org.apache.commons.lang.StringUtils; import org.apache.poi.openxml4j.exceptions.OpenXML4JException; import org.apache.poi.openxml4j.opc.OPCPackage; import org.apache.poi.ss.usermodel.BuiltinFormats; import org.apache.poi.ss.usermodel.DataFormatter; import org.apache.poi.xssf.eventusermodel.XSSFReader; import org.apache.poi.xssf.model.SharedStringsTable; import org.apache.poi.xssf.model.StylesTable; import org.apache.poi.xssf.usermodel.XSSFCellStyle; import org.apache.poi.xssf.usermodel.XSSFRichTextString; import org.xml.sax.Attributes; import org.xml.sax.InputSource; import org.xml.sax.SAXException; import org.xml.sax.XMLReader; import org.xml.sax.helpers.DefaultHandler; import org.xml.sax.helpers.XMLReaderFactory; import java.io.IOException; import java.io.InputStream; import java.util.ArrayList; import java.util.Iterator; import java.util.List; public class Excel2007Reader extends DefaultHandler { //列元素 private static final String C_ELEMENT = "c"; //列中属性r private static final String R_ATTR = "r"; //列中的v元素 private static final String V_ELEMENT = "v"; //列中的t元素 private static final String T_ELEMENT = "t"; //列中属性值 private static final String S_ATTR_VALUE = "s"; //列中属性值 private static final String T_ATTR_VALUE = "t"; //sheet r:Id前缀 private static final String RID_PREFIX = "rId"; //行元素 private static final String ROW_ELEMENT = "row"; //时间格式化字符串 private static final String DATE_FORMAT = "yyyy-MM-dd HH:mm:ss"; //saxParser private static final String CLASS_SAXPARSER = "org.apache.xerces.parsers.SAXParser"; //填充字符串 private static final String CELL_FILL_STR = "@"; //列的最大位数 private static final int MAX_CELL_BIT = 3; //excel 2007 的共享字符串表,对应sharedString.xml private SharedStringsTable sharedStringsTable; private final DataFormatter dataFormatter = new DataFormatter(); //当前行 private int curRow; //当前列 private int curCell; //上一次的内容 private String lastContent; //是否是String类型的 private boolean nextIsString; //单元数据类型 private CellDataType cellDataType; //当前列坐标, 如A1,B5 private String curCoordinate; //前一个列的坐标 private String preCoordinate; //行的最大列坐标 private String maxCellCoordinate; //单元格的格式表,对应style.xml private StylesTable stylesTable; //单元格存储格式的索引,对应style.xml中的numFmts元素的子元素索引 private int numFmtIndex; //单元格存储的格式化字符串,nmtFmt的formateCode属性的值 private String numFmtString; //sheet的索引 private int sheetIndex = -1; //存储每行的列元素 List<String> rowCellList = new ArrayList<String>(); //单元格的数据格式 enum CellDataType { BOOL("b"), ERROR("e"), FORMULA("str"), INLINESTR("inlineStr"), SSTINDEX("s"), NUMBER(""), DATE("m/d/yy"), NULL(""); private String name; CellDataType(String name) { this.name = name; } public String getName() { return name; } public void setName(String name) { this.name = name; } } private IExcelRowReader rowReader; public void setRowReader(IExcelRowReader rowReader) { this.rowReader = rowReader; } /** * 读取excel中的制定索引sheet * * @param path * @param sheetIndex * @throws OpenXML4JException * @throws IOException * @throws SAXException */ public void load(String path, int sheetIndex) throws OpenXML4JException, IOException, SAXException { OPCPackage opcPackage = OPCPackage.open(path); XSSFReader xssfReader = new XSSFReader(opcPackage); //获取styleTable stylesTable = xssfReader.getStylesTable(); //获取共享字符串表 SharedStringsTable sharedStringsTable = xssfReader.getSharedStringsTable(); this.sharedStringsTable = sharedStringsTable; //获取解析器 XMLReader xmlReader = fetchSheetReader(sharedStringsTable); // 根据 rId# 或 rSheet# 查找sheet InputStream sheetInputStream = xssfReader.getSheet(RID_PREFIX + sheetIndex); InputSource sheetInputSource = new InputSource(sheetInputStream); xmlReader.parse(sheetInputSource); //关闭流 closeStream(sheetInputStream); } /** * 遍历所有的sheet * * @param path * @throws IOException * @throws OpenXML4JException * @throws SAXException */ public void load(String path) throws IOException, OpenXML4JException, SAXException { OPCPackage opcPackage = OPCPackage.open(path); XSSFReader xssfReader = new XSSFReader(opcPackage); stylesTable = xssfReader.getStylesTable(); SharedStringsTable sharedStringsTable = xssfReader.getSharedStringsTable(); this.sharedStringsTable = sharedStringsTable; XMLReader xmlReader = fetchSheetReader(sharedStringsTable); Iterator<InputStream> sheetsInputStream = xssfReader.getSheetsData(); while (sheetsInputStream.hasNext()) { curRow = 0; sheetIndex++; InputStream sheetInputStream = sheetsInputStream.next(); InputSource sheetInputSource = new InputSource(sheetInputStream); xmlReader.parse(sheetInputSource); closeStream(sheetInputStream); } } /** * 获取sheet的解析器 * * @param sharedStringsTable * @return * @throws SAXException */ public XMLReader fetchSheetReader(SharedStringsTable sharedStringsTable) throws SAXException { XMLReader xmlReader = XMLReaderFactory.createXMLReader(CLASS_SAXPARSER); xmlReader.setContentHandler(this); return xmlReader; } @Override public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException { //c 表示列 if (C_ELEMENT.equals(qName)) { //获取当前列坐标 String tempCurCoordinate = attributes.getValue(R_ATTR); //前一列为null,则将其设置为"@",A为第一列,ascii码为65,前一列即为@,ascii码64 if (preCoordinate == null) { preCoordinate = CELL_FILL_STR; } else { //存在,则前一列要设置为上一列的坐标 preCoordinate = curCoordinate; } //重置当前列 curCoordinate = tempCurCoordinate; //设置单元格类型 setCellType(attributes); //t 属性值 String type = attributes.getValue(T_ATTR_VALUE); //s 表示该列为字符串 if (S_ATTR_VALUE.equals(type)) { nextIsString = true; } } lastContent = ""; } /** * 设置单元格的类型 * * @param attribute */ private void setCellType(Attributes attribute) { //默认是数字类型 cellDataType = CellDataType.NUMBER; //重置numFmtIndex,numFmtString的值 numFmtIndex = 0; numFmtString = ""; //单元格的格式类型 String cellType = attribute.getValue(T_ATTR_VALUE); //获取单元格的xf索引,对应style.xml中cellXfs的子元素xf String xfIndexStr = attribute.getValue(S_ATTR_VALUE); if (CellDataType.BOOL.getName().equals(cellType)) { cellDataType = CellDataType.BOOL; } else if (CellDataType.ERROR.getName().equals(cellType)) { cellDataType = CellDataType.ERROR; } else if (CellDataType.INLINESTR.getName().equals(cellType)) { cellDataType = CellDataType.INLINESTR; } else if (CellDataType.FORMULA.getName().equals(cellType)) { cellDataType = CellDataType.FORMULA; } else if (CellDataType.SSTINDEX.getName().equals(cellType)) { cellDataType = CellDataType.SSTINDEX; } if (xfIndexStr != null) { int xfIndex = Integer.parseInt(xfIndexStr); XSSFCellStyle xssfCellStyle = stylesTable.getStyleAt(xfIndex); numFmtIndex = xssfCellStyle.getDataFormat(); numFmtString = xssfCellStyle.getDataFormatString(); if (CellDataType.DATE.getName().equals(numFmtString)) { cellDataType = CellDataType.DATE; numFmtString = DATE_FORMAT; } if (numFmtString == null) { cellDataType = CellDataType.NULL; numFmtString = BuiltinFormats.getBuiltinFormat(numFmtIndex); } } } @Override public void endElement(String uri, String localName, String qName) throws SAXException { String value = StringUtils.trim(lastContent); if (T_ELEMENT.equals(qName)) { rowCellList.add(curCell++, value); } else if (C_ELEMENT.equals(qName)) { value = getDataValue(value); //补全单元格之间的空格 fillBlackCell(curCoordinate, preCoordinate, false); rowCellList.add(curCell++, value); } else { //如果是row标签,说明已经到了一行的结尾 if (ROW_ELEMENT.equals(qName)) { //最大列坐标以第一行的为准 if (curRow == 0) { maxCellCoordinate = curCoordinate; } //补全一行尾部可能缺失的单元格 if (maxCellCoordinate != null) { fillBlackCell(maxCellCoordinate, curCoordinate, true); } rowReader.getRows(sheetIndex, curRow, rowCellList); //一行结束 //清空rowCellList, rowCellList.clear(); //行数增加 curRow++; //当前列置0 curCell = 0; //置空当前列坐标和前一列坐标 curCoordinate = null; preCoordinate = null; } } } /** * 填充空白单元格 * * @param curCoordinate * @param preCoordinate */ private void fillBlackCell(String curCoordinate, String preCoordinate, boolean isEnd) { if (!curCoordinate.equals(preCoordinate)) { int len = calNullCellCount(curCoordinate, preCoordinate, isEnd); for (int i = 0; i < len; i++) { rowCellList.add(curCell++, ""); } } } /** * 计算当前单元格和前一个单元格之间的空白单元格数量 * 如果是尾部则不减1 * * @param curCoordinate * @param preCoordinate * @return */ private int calNullCellCount(String curCoordinate, String preCoordinate, boolean isEnd) { // excel2007最大行数是1048576,最大列数是16384,最后一列列名是XFD String curCellCoordinate = curCoordinate.replaceAll("\\d+", ""); String preCellCoordinate = preCoordinate.replaceAll("\\d+", ""); curCellCoordinate = fillChar(curCellCoordinate, MAX_CELL_BIT, CELL_FILL_STR); preCellCoordinate = fillChar(preCellCoordinate, MAX_CELL_BIT, CELL_FILL_STR); char[] cur = curCellCoordinate.toCharArray(); char[] pre = preCellCoordinate.toCharArray(); int len = (cur[0] - pre[0]) * 26 * 26 + (cur[1] - pre[1]) * 26 + (cur[2] - pre[2]); if (!isEnd) { len = len - 1; } return len; } /** * 将不足指定位数的字符串补全,高位补上指定字符串 * * @param cellCoordinate * @param maxLen * @param fillChar * @return */ private String fillChar(String cellCoordinate, int maxLen, String fillChar) { int coordinateLen = cellCoordinate.length(); if (coordinateLen < maxLen) { for (int i = 0; i < (maxLen - coordinateLen); i++) { cellCoordinate = fillChar + cellCoordinate; } } return cellCoordinate; } private String getDataValue(String lastContent) { String value = ""; XSSFRichTextString xssfRichTextString = null; switch (cellDataType) { case BOOL: char first = lastContent.charAt(0); value = first == '0' ? "FALSE" : "TRUE"; break; case ERROR: value = "\"ERROR:" + lastContent + '"'; break; case FORMULA: value = '"' + lastContent + '"'; break; case INLINESTR: xssfRichTextString = new XSSFRichTextString(lastContent); value = xssfRichTextString.getString(); xssfRichTextString = null; break; case SSTINDEX: try { int index = Integer.parseInt(lastContent); xssfRichTextString = new XSSFRichTextString(sharedStringsTable.getEntryAt (index)); value = xssfRichTextString.getString(); xssfRichTextString = null; } catch (NumberFormatException e) { value = lastContent; } break; case NUMBER: if (numFmtString != null) { value = dataFormatter.formatRawCellContents(Double.parseDouble(lastContent), numFmtIndex, numFmtString); } else { value = lastContent; } value = value.replace("_", ""); break; case DATE: value = dataFormatter.formatRawCellContents(Double.parseDouble(lastContent), numFmtIndex, numFmtString); break; default: value = ""; break; } return value; } @Override public void characters(char[] ch, int start, int length) throws SAXException { // 得到单元格内容的值 lastContent += new String(ch, start, length); } /** * 关闭流 * * @param inputStream */ public void closeStream(InputStream inputStream) { if (inputStream != null) { try { inputStream.close(); } catch (IOException e) { e.printStackTrace(); } } } public static void main(String[] args) { // System.out.println("m/d/yy" == CellDataType.DATE.getName()); // System.out.println('"' + "ss" + '"'); // char[] letter = "@@A".toCharArray(); // char[] letter_1 = "@@B".toCharArray(); // int res = (letter[0] - letter_1[0]) * 26 * 26 + (letter[1] - letter_1[1]) * 26 + // (letter[2] - letter_1[2]); // System.out.println(res); // System.out.println(Integer.valueOf('@')); IExcelRowReader rowReader = new ExcelRowReader(); try { // ExcelReaderUtil.readExcel(rowReader, // "E://2016-07-04-011940a.xls"); System.out.println("**********************************************"); ExcelReaderUtil.read2007Excel(rowReader, "F:\\studygit\\test.xlsx"); } catch (Exception e) { e.printStackTrace(); } } }