Excel 2007 底层实现方式

一、EXCEL的底层实现

能力有限,了解的比较浅,有不足之处望指正,首先看下图:

一、 excel2007是使用xml格式来存储的,把一个excel文件后缀改为.zip,打开之后就直接可以看到一个excel文件对应的xml格式的文件了。

1.1、docProps目录

1.1.1、 core.xml文件

<?xml version="1.0" encoding="utf-8"?>
<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" 
  xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- 创建人 -->
  <dc:creator>Administrator</dc:creator>
  <!-- 最后修改人 -->
  <cp:lastModifiedBy>Administrator</cp:lastModifiedBy>
  <!-- 创建时间 -->
  <dcterms:created xsi:type="dcterms:W3CDTF">2008-09-11T17:22:52Z</dcterms:created>
  <!-- 修改时间 -->
  <dcterms:modified xsi:type="dcterms:W3CDTF">2017-03-31T15:29:39Z</dcterms:modified>
</cp:coreProperties>
View Code

1.1.2、 app.xml文件

<?xml version="1.0" encoding="utf-8"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties" 
  xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
    <!-- 文档类型 -->
    <Application>Microsoft Excel</Application>
    <!-- 是否加密 -->
    <DocSecurity>0</DocSecurity>
    <ScaleCrop>false</ScaleCrop>
    <HeadingPairs>
        <vt:vector size="2" baseType="variant">
            <vt:variant>
                <vt:lpstr>工作表</vt:lpstr>
            </vt:variant>
            <vt:variant>
                <vt:i4>3</vt:i4>
            </vt:variant>
        </vt:vector>
    </HeadingPairs>
    <TitlesOfParts>
        <vt:vector size="3" baseType="lpstr">
            <vt:lpstr>Sheet1</vt:lpstr>
            <vt:lpstr>Sheet2</vt:lpstr>
            <vt:lpstr>Sheet3</vt:lpstr>
        </vt:vector>
    </TitlesOfParts>
    <Company/>
    <LinksUpToDate>false</LinksUpToDate>
    <!-- 是否共享 -->
    <SharedDoc>false</SharedDoc>
    <HyperlinksChanged>false</HyperlinksChanged>
    <!-- 版本 -->
    <AppVersion>12.0000</AppVersion>
</Properties>
View Code

 1.2、 xl目录,该目录保存的就是excel中的详细内容了

1.2.1、workbook.xml文件

  文件包含一对<sheets>标签,其中的每个<sheet>元素都代表Excel 2007文件中的一个sheet,工作表的名称就是其name属性的值,这里有三个sheet

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
    <fileVersion appName="xl" lastEdited="4" lowestEdited="4" rupBuild="4506"/>
    <workbookPr defaultThemeVersion="124226"/>
    <bookViews>
        <workbookView xWindow="0" yWindow="105" windowWidth="12765" windowHeight="5715"/>
    </bookViews>
    <sheets>
        <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
        <sheet name="Sheet2" sheetId="2" r:id="rId2"/>
        <sheet name="Sheet3" sheetId="3" r:id="rId3"/>
    </sheets>
    <calcPr calcId="125725"/>
</workbook>
View Code

1.2.2、 _rels/workbook.xml.rels

  定义每个sheetid对应的sheet内容文件sheet1.xml,共享的单元格内容文件sharedstring.xml,样式文件style.xml是当前单元格的样式字体,颜色等样式的xml配置。

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet3.xml"/>
    <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet2.xml"/>
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>
    <Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings" Target="sharedStrings.xml"/>
    <Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
    <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/>
</Relationships>
View Code

1.2.3、 worksheets文件夹,保存的就是excel中的sheet

1.2.4、sheet.xml文件

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
    <!-- 
        容积,即该sheet中以供占用了多大的地方,哪个单元格到哪个单元格,矩形的左上角单元格和右下角单元格
        此属性可以判断列数,如下就是:(M-A+1)
    -->
    <dimension ref="A1:M8"/>
    <!-- sheet视图的配置信息,如打开时默认选中的单元格是哪个 -->
    <sheetViews>
        <sheetView tabSelected="1" workbookViewId="0">
            <selection activeCell="G12" sqref="G12"/>
        </sheetView>
    </sheetViews>
    <!-- sheet的格式参数 -->
    <sheetFormatPr defaultRowHeight="14.25"/>
    <!-- 行中每列的配置参数 -->
    <cols>
        <col min="1" max="1" width="7.875" customWidth="1"/>
        <col min="2" max="2" width="6.75" customWidth="1"/>
        <col min="3" max="3" width="6" customWidth="1"/>
        <col min="4" max="4" width="4.5" customWidth="1"/>
        <col min="5" max="5" width="8.5" customWidth="1"/>
        <col min="6" max="6" width="9.375" customWidth="1"/>
        <col min="7" max="7" width="13.5" customWidth="1"/>
        <col min="8" max="8" width="7" customWidth="1"/>
        <col min="9" max="9" width="7.625" customWidth="1"/>
        <col min="10" max="10" width="4.625" customWidth="1"/>
        <col min="11" max="11" width="9.5" bestFit="1" customWidth="1"/>
    </cols>
    <!-- 数据区域,这个里面就是我们的单元格的相关数据信息了,是以行(Row)的方式记录的 -->
    <sheetData>
        <!-- excel中的行 -->
        <row r="1" spans="1:13">
            <!-- 
                excel中的列
                    r:单元格坐标
                    s:单元格的style,对应style.xml中cellXfs元素下的某个元素xf元素,
                      此处就是对应索引为12的元素,从0开始计算,即为第13个xf元素。
                    t:元素类型,s表示是字符串类型,对应到sharedStrings.xml中的sst元素
            -->
            <c r="A1" s="12" t="s">
                <!-- 
                    如果是t="s" 则存储的是sharedStrings.xml中sst元素下的某个si元素,此处对应索引为0的元素,即第一个si元素
                    否则,这里存储的就是单元格的值 
                -->
                <v>0</v>
            </c><c r="B1" s="12" t="s">
                <v>10</v>
            </c><c r="C1" s="12" t="s">
                <v>1</v>
            </c><c r="D1" s="12" t="s">
                <v>2</v>
            </c><c r="E1" s="12" t="s">
                <v>3</v>
            </c><c r="F1" s="12" t="s">
                <v>4</v>
            </c><c r="G1" s="12" t="s">
                <v>5</v>
            </c><c r="H1" s="12" t="s">
                <v>6</v>
            </c><c r="I1" s="12" t="s">
                <v>7</v>
            </c><c r="J1" s="12" t="s">
                <v>8</v>
            </c><c r="K1" s="12" t="s">
                <v>9</v>
            </c>
        </row>
        <row r="2" spans="1:13">
            <c r="A2" t="s">
                <v>11</v>
            </c><c r="B2" s="10" t="s">
                <v>12</v>
            </c><c r="C2" s="1">
                <v>2</v>
            </c><c r="D2" s="2" t="s">
                <v>13</v>
            </c><c r="E2" s="3" t="s">
                <v>14</v>
            </c><c r="F2" s="4">
                <v>89</v>
            </c><c r="G2" s="5">
                <v>10</v>
            </c><c r="H2" s="6">
                <v>12</v>
            </c><c r="I2" s="7">
                <v>0.2</v>
            </c><c r="J2" s="8">
                <v>0.5</v>
            </c><c r="K2" s="9">
                <v>4564645</v>
            </c><c r="L2" s="10"/>
            <c r="M2" s="11"/>
        </row>
        <row r="3" spans="1:13">
            <!-- 
                此处为什么没有A3?
                因为excel中常规列如果没有值,此处就不记录,因此在读取excel是应该特别注意,否则可能导致列错乱
            -->
            <c r="B3" s="10"/>
            <c r="C3" s="1"/>
            <c r="D3" s="2"/>
            <c r="E3" s="3"/>
            <c r="F3" s="4"/>
            <c r="G3" s="5"/>
            <c r="H3" s="6"/>
            <c r="I3" s="7"/>
            <c r="J3" s="8"/>
            <c r="K3" s="9"/>
        </row>
        
        <!-- 该行每列都是常规列,因此没有值的就不记录在这里 -->
        <row r="4" spans="1:13">
            <c r="A4" s="12" t="s">
                <v>15</v>
            </c><c r="E4" t="s">
                <v>16</v>
            </c>
        </row>
        <!-- 5,6,7 行都是合并再一起的单元格 -->
        <row r="5" spans="1:13">
            <!-- 合并单元格的值记录在起始单元格里,其余单元格置空 -->
            <c r="A5" s="14" t="s">
                <v>17</v>
            </c><c r="B5" s="13"/>
            <c r="C5" s="13"/>
            <c r="D5" s="13"/>
            <c r="E5" s="13"/>
            <c r="F5" s="13"/>
            <c r="G5" s="13"/>
            <c r="H5" s="13"/>
            <c r="I5" s="13"/>
            <c r="J5" s="13"/>
            <c r="K5" s="13"/>
        </row>
        <row r="6" spans="1:13">
            <c r="A6" s="13"/>
            <c r="B6" s="13"/>
            <c r="C6" s="13"/>
            <c r="D6" s="13"/>
            <c r="E6" s="13"/>
            <c r="F6" s="13"/>
            <c r="G6" s="13"/>
            <c r="H6" s="13"/>
            <c r="I6" s="13"/>
            <c r="J6" s="13"/>
            <c r="K6" s="13"/>
        </row>
        <row r="7" spans="1:13">
            <c r="A7" s="13"/>
            <c r="B7" s="13"/>
            <c r="C7" s="13"/>
            <c r="D7" s="13"/>
            <c r="E7" s="13"/>
            <c r="F7" s="13"/>
            <c r="G7" s="13"/>
            <c r="H7" s="13"/>
            <c r="I7" s="13"/>
            <c r="J7" s="13"/>
            <c r="K7" s="13"/>
        </row>
        <row r="8" spans="1:13">
            <c r="A8" s="13"/>
            <c r="B8" s="13"/>
            <c r="C8" s="13"/>
            <c r="D8" s="13"/>
            <c r="E8" s="13"/>
            <c r="F8" s="13"/>
            <c r="G8" s="13"/>
            <c r="H8" s="13"/>
            <c r="I8" s="13"/>
            <c r="J8" s="13"/>
            <c r="K8" s="13"/>
        </row>
    </sheetData>
    <!-- 这个是记录哪个到哪个是合并单元格的信息的 -->
    <mergeCells count="1">
        <!-- A5到K8被合并了 -->
        <mergeCell ref="A5:K8"/>
    </mergeCells>
    <!-- -->
    <phoneticPr fontId="1" type="noConversion"/>
    <!-- 页的相关属性设置 -->
    <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
    <!-- 分页设置 -->
    <pageSetup paperSize="9" orientation="portrait" r:id="rId1"/>
    <!-- 图形信息,由图可以看出,这里引用了draw的文档了-->
    <drawing/>
</worksheet>
View Code

1.2.5、sharedStrings.xml文件

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="18" uniqueCount="18">
    <!-- si存储的是字符串的值 -->
    <!-- 此处存储的是sheet.xml中c(列)中属性带有t="s"的列的值,
        带有t="s"字符串列的定位是通过c元素下的v元素中的值来定位的
        此处对应的就是v的值为0的单元格的实际值
    -->
    <si>
        <t>常规</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <!-- sheet.xml中c元素中属性t="s"的子元素<v>1</v>的单元格的实际值 -->
    <si>
        <t>数值</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>货币</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>会计专用</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>短日期</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>长日期</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>时间</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>百分比</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>分数</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>科学计数</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>文本</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>common</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>String</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>$</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>D1</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>常规A</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <r>
            <rPr>
                <sz val="11"/>
                <color theme="1"/>
                <rFont val="宋体"/>
                <family val="3"/>
                <charset val="134"/>
            </rPr><t>常规</t>
        </r><r>
            <rPr>
                <sz val="11"/>
                <color theme="1"/>
                <rFont val="Tahoma"/>
                <family val="2"/>
                <charset val="134"/>
            </rPr><t>D</t>
        </r>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
    <si>
        <t>说明: 为了简单,行数就设置的少点  
   第一行为标题行(本身是常规行),分别标注了每列的属性。
   第三行和第二行一样的列格式
   第四行为常规行</t>
        <phoneticPr fontId="1" type="noConversion"/>
    </si>
</sst>
View Code

1.2.6、style.xml文件

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <!-- 数字格式化显示的定义 -->
    <numFmts count="5">
        <!-- 此处的numFmtId和cellXfs中的numFmtId对应 -->
        <!-- 会计专用 -->
        <!-- 该元素表示cellXfs中xf元素的numFmtId="44"对应的格式化样式 -->
        <numFmt numFmtId="44" formatCode="_ &quot;¥&quot;* #,##0.00_ ;_ &quot;¥&quot;* \-#,##0.00_ ;_ &quot;¥&quot;* &quot;-&quot;??_ ;_ @_ "/>
        <!-- 数值格式 -->
        <numFmt numFmtId="176" formatCode="0.00_);[Red]\(0.00\)"/>
        <!-- 货币 -->
        <numFmt numFmtId="177" formatCode="&quot;¥&quot;#,##0.00_);[Red]\(&quot;¥&quot;#,##0.00\)"/>
        <!-- 长日期 -->
        <numFmt numFmtId="178" formatCode="[$-F800]dddd\,\ mmmm\ dd\,\ yyyy"/>
        <!-- 时间 -->
        <numFmt numFmtId="179" formatCode="[$-F400]h:mm:ss\ AM/PM"/>
    </numFmts>
    <!-- 字体样式 -->
    <fonts count="3">
        <font>
            <sz val="11"/>
            <color theme="1"/>
            <name val="Tahoma"/>
            <family val="2"/>
            <charset val="134"/>
        </font>
        <font>
            <sz val="9"/>
            <name val="Tahoma"/>
            <family val="2"/>
            <charset val="134"/>
        </font>
        <font>
            <sz val="11"/>
            <color theme="1"/>
            <name val="宋体"/>
            <family val="3"/>
            <charset val="134"/>
        </font>
    </fonts>
    <!-- 填充样式,包括背景色 -->
    <fills count="2">
        <fill>
            <patternFill patternType="none"/>
        </fill>
        <fill>
            <patternFill patternType="gray125"/>
        </fill>
    </fills>
    <!-- 边框 -->
    <borders count="1">
        <border>
            <left/>
            <right/>
            <top/>
            <bottom/>
            <diagonal/>
        </border>
    </borders>
    <!-- 提供给cellXfs来调用 -->
    <cellStyleXfs count="1">
        <!-- 此处的numFmtId和cellXfs中的numFmtId对应 -->
        <xf numFmtId="0" fontId="0" fillId="0" borderId="0"/>
    </cellStyleXfs>
    <!-- 真正的样式定义 -->
    <cellXfs count="15">
        <!-- 此处的numFmtId和cellStyleXfs、numFmts等中的numFmtId对应 -->
        <!-- sheet.xml中的c(列)元素属性中s的属性值都对应这里一个xf元素,s的属性值存储的就是xf的索引-->
        <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>
        <!-- 该元素表示sheet.xml中c元素中s="1"对应的样式 -->
        <xf numFmtId="176" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
        <!-- 该元素表示sheet.xml中c元素中s="2"对应的样式 -->
        <xf numFmtId="177" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
        <xf numFmtId="44" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
        <xf numFmtId="14" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
        <xf numFmtId="178" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
        <xf numFmtId="179" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
        <xf numFmtId="10" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
        <xf numFmtId="12" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
        <xf numFmtId="11" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
        <xf numFmtId="49" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
        <xf numFmtId="13" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
        <xf numFmtId="0" fontId="2" fillId="0" borderId="0" xfId="0" applyFont="1"/>
        <xf numFmtId="0" fontId="2" fillId="0" borderId="0" xfId="0" applyFont="1" applyAlignment="1">
            <alignment horizontal="left" vertical="top"/>
        </xf>
        <xf numFmtId="0" fontId="2" fillId="0" borderId="0" xfId="0" applyFont="1" applyAlignment="1">
            <alignment horizontal="left" vertical="top" wrapText="1"/>
        </xf>
    </cellXfs>
    <cellStyles count="1">
        <cellStyle name="常规" xfId="0" builtinId="0"/>
    </cellStyles>
    <dxfs count="0"/>
    <tableStyles count="0" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleLight16"/>
</styleSheet>
View Code

二、使用java读取EXCEL

 1. 依赖的jar包

<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi</artifactId>
  <version>3.15</version>
</dependency>
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>3.15</version>
</dependency>
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml-schemas</artifactId>
  <version>3.15</version>
</dependency>
<dependency>
  <groupId>org.apache.xerces</groupId>
  <artifactId>com.springsource.org.apache.xerces</artifactId>
  <version>2.8.1</version>
</dependency>
<dependency>
  <groupId>commons-lang</groupId>
  <artifactId>commons-lang</artifactId>
  <version>2.6</version>
</dependency>
View Code

 

2.  IExcelRowReader.java

package com.wms.xml;

import java.util.List;

public interface IExcelRowReader {
    void getRows(int sheetIndex, int curRow, List<String> rowlist);
}
View Code

 

3. ExcelReaderUtil.java

package com.wms.xml;

import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.xml.sax.SAXException;

import java.io.IOException;

public class ExcelReaderUtil {
    public static void read2007Excel(IExcelRowReader rowReader, String path) throws
            OpenXML4JException, SAXException, IOException {
        Excel2007Reader excel2007Reader = new Excel2007Reader();
        excel2007Reader.setRowReader(rowReader);
        excel2007Reader.load(path);
    }

}
View Code

 

4. Excel2007Reader.java

package com.wms.xml;

import org.apache.commons.lang.StringUtils;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.ss.usermodel.BuiltinFormats;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFCellStyle;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class Excel2007Reader extends DefaultHandler {

    //列元素
    private static final String C_ELEMENT = "c";
    //列中属性r
    private static final String R_ATTR = "r";
    //列中的v元素
    private static final String V_ELEMENT = "v";
    //列中的t元素
    private static final String T_ELEMENT = "t";
    //列中属性值
    private static final String S_ATTR_VALUE = "s";
    //列中属性值
    private static final String T_ATTR_VALUE = "t";

    //sheet r:Id前缀
    private static final String RID_PREFIX = "rId";

    //行元素
    private static final String ROW_ELEMENT = "row";

    //时间格式化字符串
    private static final String DATE_FORMAT = "yyyy-MM-dd HH:mm:ss";

    //saxParser
    private static final String CLASS_SAXPARSER = "org.apache.xerces.parsers.SAXParser";

    //填充字符串
    private static final String CELL_FILL_STR = "@";
    //列的最大位数
    private static final int MAX_CELL_BIT = 3;


    //excel 2007 的共享字符串表,对应sharedString.xml
    private SharedStringsTable sharedStringsTable;

    private final DataFormatter dataFormatter = new DataFormatter();

    //当前行
    private int curRow;

    //当前列
    private int curCell;

    //上一次的内容
    private String lastContent;

    //是否是String类型的
    private boolean nextIsString;

    //单元数据类型
    private CellDataType cellDataType;

    //当前列坐标, 如A1,B5
    private String curCoordinate;

    //前一个列的坐标
    private String preCoordinate;

    //行的最大列坐标
    private String maxCellCoordinate;

    //单元格的格式表,对应style.xml
    private StylesTable stylesTable;

    //单元格存储格式的索引,对应style.xml中的numFmts元素的子元素索引
    private int numFmtIndex;
    //单元格存储的格式化字符串,nmtFmt的formateCode属性的值
    private String numFmtString;

    //sheet的索引
    private int sheetIndex = -1;

    //存储每行的列元素
    List<String> rowCellList = new ArrayList<String>();

    //单元格的数据格式
    enum CellDataType {
        BOOL("b"),
        ERROR("e"),
        FORMULA("str"),
        INLINESTR("inlineStr"),
        SSTINDEX("s"),
        NUMBER(""),
        DATE("m/d/yy"),
        NULL("");

        private String name;

        CellDataType(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }
    }

    private IExcelRowReader rowReader;

    public void setRowReader(IExcelRowReader rowReader) {
        this.rowReader = rowReader;
    }

    /**
     * 读取excel中的制定索引sheet
     *
     * @param path
     * @param sheetIndex
     * @throws OpenXML4JException
     * @throws IOException
     * @throws SAXException
     */
    public void load(String path, int sheetIndex) throws OpenXML4JException, IOException,
            SAXException {
        OPCPackage opcPackage = OPCPackage.open(path);
        XSSFReader xssfReader = new XSSFReader(opcPackage);
        //获取styleTable
        stylesTable = xssfReader.getStylesTable();
        //获取共享字符串表
        SharedStringsTable sharedStringsTable = xssfReader.getSharedStringsTable();
        this.sharedStringsTable = sharedStringsTable;
        //获取解析器
        XMLReader xmlReader = fetchSheetReader(sharedStringsTable);
        // 根据 rId# 或 rSheet# 查找sheet
        InputStream sheetInputStream = xssfReader.getSheet(RID_PREFIX + sheetIndex);
        InputSource sheetInputSource = new InputSource(sheetInputStream);
        xmlReader.parse(sheetInputSource);
        //关闭流
        closeStream(sheetInputStream);
    }

    /**
     * 遍历所有的sheet
     *
     * @param path
     * @throws IOException
     * @throws OpenXML4JException
     * @throws SAXException
     */
    public void load(String path) throws IOException, OpenXML4JException, SAXException {
        OPCPackage opcPackage = OPCPackage.open(path);
        XSSFReader xssfReader = new XSSFReader(opcPackage);
        stylesTable = xssfReader.getStylesTable();
        SharedStringsTable sharedStringsTable = xssfReader.getSharedStringsTable();
        this.sharedStringsTable = sharedStringsTable;
        XMLReader xmlReader = fetchSheetReader(sharedStringsTable);
        Iterator<InputStream> sheetsInputStream = xssfReader.getSheetsData();
        while (sheetsInputStream.hasNext()) {
            curRow = 0;
            sheetIndex++;
            InputStream sheetInputStream = sheetsInputStream.next();
            InputSource sheetInputSource = new InputSource(sheetInputStream);
            xmlReader.parse(sheetInputSource);
            closeStream(sheetInputStream);
        }

    }

    /**
     * 获取sheet的解析器
     *
     * @param sharedStringsTable
     * @return
     * @throws SAXException
     */
    public XMLReader fetchSheetReader(SharedStringsTable sharedStringsTable) throws SAXException {
        XMLReader xmlReader = XMLReaderFactory.createXMLReader(CLASS_SAXPARSER);
        xmlReader.setContentHandler(this);
        return xmlReader;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)
            throws SAXException {
        //c 表示列
        if (C_ELEMENT.equals(qName)) {

            //获取当前列坐标
            String tempCurCoordinate = attributes.getValue(R_ATTR);
            //前一列为null,则将其设置为"@",A为第一列,ascii码为65,前一列即为@,ascii码64
            if (preCoordinate == null) {
                preCoordinate = CELL_FILL_STR;
            } else {
                //存在,则前一列要设置为上一列的坐标
                preCoordinate = curCoordinate;
            }
            //重置当前列
            curCoordinate = tempCurCoordinate;

            //设置单元格类型
            setCellType(attributes);
            //t 属性值
            String type = attributes.getValue(T_ATTR_VALUE);
            //s 表示该列为字符串
            if (S_ATTR_VALUE.equals(type)) {
                nextIsString = true;
            }
        }

        lastContent = "";
    }

    /**
     * 设置单元格的类型
     *
     * @param attribute
     */
    private void setCellType(Attributes attribute) {
        //默认是数字类型
        cellDataType = CellDataType.NUMBER;
        //重置numFmtIndex,numFmtString的值
        numFmtIndex = 0;
        numFmtString = "";

        //单元格的格式类型
        String cellType = attribute.getValue(T_ATTR_VALUE);
        //获取单元格的xf索引,对应style.xml中cellXfs的子元素xf
        String xfIndexStr = attribute.getValue(S_ATTR_VALUE);

        if (CellDataType.BOOL.getName().equals(cellType)) {
            cellDataType = CellDataType.BOOL;
        } else if (CellDataType.ERROR.getName().equals(cellType)) {
            cellDataType = CellDataType.ERROR;
        } else if (CellDataType.INLINESTR.getName().equals(cellType)) {
            cellDataType = CellDataType.INLINESTR;
        } else if (CellDataType.FORMULA.getName().equals(cellType)) {
            cellDataType = CellDataType.FORMULA;
        } else if (CellDataType.SSTINDEX.getName().equals(cellType)) {
            cellDataType = CellDataType.SSTINDEX;
        }

        if (xfIndexStr != null) {
            int xfIndex = Integer.parseInt(xfIndexStr);
            XSSFCellStyle xssfCellStyle = stylesTable.getStyleAt(xfIndex);
            numFmtIndex = xssfCellStyle.getDataFormat();
            numFmtString = xssfCellStyle.getDataFormatString();

            if (CellDataType.DATE.getName().equals(numFmtString)) {
                cellDataType = CellDataType.DATE;
                numFmtString = DATE_FORMAT;
            }

            if (numFmtString == null) {
                cellDataType = CellDataType.NULL;
                numFmtString = BuiltinFormats.getBuiltinFormat(numFmtIndex);
            }
        }

    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        String value = StringUtils.trim(lastContent);
        if (T_ELEMENT.equals(qName)) {
            rowCellList.add(curCell++, value);
        } else if (C_ELEMENT.equals(qName)) {
            value = getDataValue(value);
            //补全单元格之间的空格
            fillBlackCell(curCoordinate, preCoordinate, false);
            rowCellList.add(curCell++, value);
        } else {
            //如果是row标签,说明已经到了一行的结尾
            if (ROW_ELEMENT.equals(qName)) {
                //最大列坐标以第一行的为准
                if (curRow == 0) {
                    maxCellCoordinate = curCoordinate;
                }

                //补全一行尾部可能缺失的单元格
                if (maxCellCoordinate != null) {
                    fillBlackCell(maxCellCoordinate, curCoordinate, true);
                }

                rowReader.getRows(sheetIndex, curRow, rowCellList);

                //一行结束
                //清空rowCellList,
                rowCellList.clear();
                //行数增加
                curRow++;
                //当前列置0
                curCell = 0;
                //置空当前列坐标和前一列坐标
                curCoordinate = null;
                preCoordinate = null;
            }
        }
    }

    /**
     * 填充空白单元格
     *
     * @param curCoordinate
     * @param preCoordinate
     */
    private void fillBlackCell(String curCoordinate, String preCoordinate, boolean isEnd) {
        if (!curCoordinate.equals(preCoordinate)) {
            int len = calNullCellCount(curCoordinate, preCoordinate, isEnd);
            for (int i = 0; i < len; i++) {
                rowCellList.add(curCell++, "");
            }
        }
    }

    /**
     * 计算当前单元格和前一个单元格之间的空白单元格数量
     * 如果是尾部则不减1
     *
     * @param curCoordinate
     * @param preCoordinate
     * @return
     */
    private int calNullCellCount(String curCoordinate, String preCoordinate, boolean isEnd) {
        // excel2007最大行数是1048576,最大列数是16384,最后一列列名是XFD
        String curCellCoordinate = curCoordinate.replaceAll("\\d+", "");
        String preCellCoordinate = preCoordinate.replaceAll("\\d+", "");

        curCellCoordinate = fillChar(curCellCoordinate, MAX_CELL_BIT, CELL_FILL_STR);
        preCellCoordinate = fillChar(preCellCoordinate, MAX_CELL_BIT, CELL_FILL_STR);

        char[] cur = curCellCoordinate.toCharArray();
        char[] pre = preCellCoordinate.toCharArray();

        int len = (cur[0] - pre[0]) * 26 * 26 + (cur[1] - pre[1]) * 26 + (cur[2] - pre[2]);
        if (!isEnd) {
            len = len - 1;
        }
        return len;
    }

    /**
     * 将不足指定位数的字符串补全,高位补上指定字符串
     *
     * @param cellCoordinate
     * @param maxLen
     * @param fillChar
     * @return
     */
    private String fillChar(String cellCoordinate, int maxLen, String fillChar) {
        int coordinateLen = cellCoordinate.length();
        if (coordinateLen < maxLen) {
            for (int i = 0; i < (maxLen - coordinateLen); i++) {
                cellCoordinate = fillChar + cellCoordinate;
            }
        }
        return cellCoordinate;
    }

    private String getDataValue(String lastContent) {
        String value = "";
        XSSFRichTextString xssfRichTextString = null;
        switch (cellDataType) {
            case BOOL:
                char first = lastContent.charAt(0);
                value = first == '0' ? "FALSE" : "TRUE";
                break;
            case ERROR:
                value = "\"ERROR:" + lastContent + '"';
                break;
            case FORMULA:
                value = '"' + lastContent + '"';
                break;
            case INLINESTR:
                xssfRichTextString = new XSSFRichTextString(lastContent);
                value = xssfRichTextString.getString();
                xssfRichTextString = null;
                break;
            case SSTINDEX:
                try {
                    int index = Integer.parseInt(lastContent);
                    xssfRichTextString = new XSSFRichTextString(sharedStringsTable.getEntryAt
                            (index));
                    value = xssfRichTextString.getString();
                    xssfRichTextString = null;
                } catch (NumberFormatException e) {
                    value = lastContent;
                }
                break;
            case NUMBER:
                if (numFmtString != null) {
                    value = dataFormatter.formatRawCellContents(Double.parseDouble(lastContent),
                            numFmtIndex, numFmtString);
                } else {
                    value = lastContent;
                }
                value = value.replace("_", "");
                break;
            case DATE:
                value = dataFormatter.formatRawCellContents(Double.parseDouble(lastContent),
                        numFmtIndex, numFmtString);
                break;
            default:
                value = "";
                break;
        }

        return value;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // 得到单元格内容的值
        lastContent += new String(ch, start, length);
    }

    /**
     * 关闭流
     *
     * @param inputStream
     */
    public void closeStream(InputStream inputStream) {
        if (inputStream != null) {
            try {
                inputStream.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
//        System.out.println("m/d/yy" == CellDataType.DATE.getName());
//        System.out.println('"' + "ss" + '"');
//        char[] letter = "@@A".toCharArray();
//        char[] letter_1 = "@@B".toCharArray();
//        int res = (letter[0] - letter_1[0]) * 26 * 26 + (letter[1] - letter_1[1]) * 26 +
//                (letter[2] - letter_1[2]);
//        System.out.println(res);
//        System.out.println(Integer.valueOf('@'));
        IExcelRowReader rowReader = new ExcelRowReader();
        try {
            // ExcelReaderUtil.readExcel(rowReader,
            // "E://2016-07-04-011940a.xls");
            System.out.println("**********************************************");
            ExcelReaderUtil.read2007Excel(rowReader, "F:\\studygit\\test.xlsx");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }


}
View Code

 

posted @ 2017-03-31 23:49  请叫我大表哥  阅读(5342)  评论(5编辑  收藏  举报