An Approach to Reading Very Large Excel Files in PHP

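Scenario and pain points

Background

Today an old classmate got in touch with me. His company currently handles all of its logistics business in Excel. The monthly data volume is large: a single Excel file holds roughly a million rows and is close to 100 MB, so opening and searching it is very slow.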

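This brought a related scenario to mind: importing data from an Excel file that may be very large. Common approaches such as PHPExcel often run into execution-time and memory limits in this case.

Below we use SpreadsheetReader, a library based on stream processing, to read a large Excel file.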
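
To make the memory argument concrete, here is a minimal sketch in plain PHP that does not use any Excel library. It compares building an array of every row, which is roughly what a load-everything reader has to do, with yielding rows one at a time from a generator, which is the pattern a streaming reader follows. The function names (`makeRow`, `loadAllRows`, `streamRows`) and the synthetic row shape are assumptions made purely for illustration.

```php
<?php
// Hypothetical stand-in for one parsed spreadsheet row.
function makeRow(int $i): array
{
    return ['order-' . $i, 'city-' . ($i % 100), (string) ($i * 1.5)];
}

// Load-everything style: all rows are held in memory at once.
function loadAllRows(int $count): array
{
    $rows = [];
    for ($i = 0; $i < $count; $i++) {
        $rows[] = makeRow($i);
    }
    return $rows;
}

// Streaming style: only the current row exists at any moment.
function streamRows(int $count): Generator
{
    for ($i = 0; $i < $count; $i++) {
        yield makeRow($i);
    }
}

$count = 20000;

// Memory held while the full array is alive.
$before = memory_get_usage();
$all = loadAllRows($count);
$heldByArray = memory_get_usage() - $before;
unset($all);

// Highest memory growth observed while iterating the generator.
$before = memory_get_usage();
$maxWhileStreaming = 0;
$seen = 0;
foreach (streamRows($count) as $row) {
    $seen++;
    $maxWhileStreaming = max($maxWhileStreaming, memory_get_usage() - $before);
}

echo "array: {$heldByArray} bytes, streaming: {$maxWhileStreaming} bytes, rows: {$seen}" . PHP_EOL;
```

The exact byte counts depend on the PHP version and build, but the array figure grows with the row count while the streaming figure stays roughly constant.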

Implementation

Notes

The key points are explained in the code comments.

Code


<?php
/**
 * Created by PhpStorm.
 * User: qkl
 * Date: 2018/7/11
 * Time: 15:14
 */

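set_time_limit(0);   // Remove the script execution time limit (0 = no limit)
//ini_set('memory_limit','200M');    // Temporarily raise the memory limit if needed

// Format a byte count as a human-readable string
function convert($size)
{
    $unit = array('b', 'kb', 'mb', 'gb', 'tb', 'pb');
    if ($size <= 0) {
        return '0 b';   // log(0) is undefined, so handle this case explicitly
    }
    $i = (int) floor(log($size, 1024));
    return round($size / pow(1024, $i), 2) . ' ' . $unit[$i];
}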

require '../vendor/autoload.php';

$start = memory_get_usage();
echo convert($start) . PHP_EOL;
//$inputFileName = './11111111.xlsx';
$inputFileName = './example1.xlsx';

// If you need to parse XLS files, include php-excel-reader

$startTime = microtime(true);

$Reader = new SpreadsheetReader($inputFileName);

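// Get all worksheets in the current file
$sheets = $Reader->Sheets();
if (!$sheets) {
    die("No worksheets found");
}

// Switch to the worksheet to process (index 0 is the first sheet)
$Reader->ChangeSheet(0);

// Dump the current row of the current worksheet
var_dump($Reader->current());

// The reader class implements Iterator, so rows can be processed with foreach.
// Note: with a very large file this loop is slow, but it does not cause memory problems.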
//$i = 0;
//foreach ($Reader as $Row)
//{
//    if ($i>=3) {
//        break;
//    }
//
//    echo $i . PHP_EOL;
//    print_r($Row);
//
//    $i++;
//}

$endTime = microtime(true);
$memoryUse = memory_get_usage();

echo "Memory usage: " . convert($memoryUse) . "; elapsed: " . ($endTime - $startTime) . PHP_EOL;
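
The commented-out foreach loop above handles one row at a time. For an import job it is often more practical to pass rows on in fixed-size batches. The sketch below shows one way to do that. `inBatches` is a hypothetical helper, not part of SpreadsheetReader, and it accepts any iterable so it can be tried without the library. An ArrayIterator stands in for the reader here.

```php
<?php
/**
 * Hypothetical helper: group rows from any iterable into arrays of at most $size rows.
 * Only one batch is held in memory at a time.
 */
function inBatches(iterable $rows, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1');
    }
    $batch = [];
    foreach ($rows as $row) {
        $batch[] = $row;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch !== []) {
        yield $batch; // final, possibly shorter batch
    }
}

// Stand-in for the reader: any Iterator is consumed the same way.
$rows = new ArrayIterator([['a', 1], ['b', 2], ['c', 3], ['d', 4], ['e', 5]]);

$batchSizes = [];
foreach (inBatches($rows, 2) as $batch) {
    // A real import would insert $batch into the database here.
    $batchSizes[] = count($batch);
}

echo implode(',', $batchSizes) . PHP_EOL; // prints "2,2,1"
```

Since the reader is iterable, the same helper could presumably be called as `inBatches($Reader, 1000)` in the script above. The batch size of 1000 is an arbitrary example value.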

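Results

Test notes

The example1.xlsx file read above is about 100 MB. Because reading it is slow, the test only reads the current row of the default worksheet.
The data is sensitive, so it has been masked.

Logged memory usage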


147.77 kb
array (size=50)
  0 => string 'xxxxxxxxxxxxxx' (length=25)
  1 => string 'xxxxxxxxxxxxxx' (length=15)
  2 => string 'xxxxxxxxxxxxxx' (length=18)
  3 => string 'xxxxxxxxxxxxxx' (length=12)
  4 => string 'xxxxxxxxxxxxxx' (length=12)
  5 => string 'xxxxxxxxxxxxxx' (length=12)
  6 => string 'xxxxxxxxxxxxxx' (length=24)
  7 => string 'xxxxxxxxxxxxxx' (length=12)
  8 => string 'xxxxxxxxxxxxxx' (length=27)
  9 => string 'xxxxxxxxxxxxxx' (length=12)
  10 => string 'xxxxxxxxxxxxxx' (length=15)
  11 => string 'xxxxxxxxxxxxxx' (length=28)
  12 => string 'xxxxxxxxxxxxxx' (length=9)
  13 => string 'xxxxxxxxxxxxxx' (length=12)
  14 => string 'xxxxxxxxxxxxxx' (length=9)
  15 => string 'xxxxxxxxxxxxxx' (length=6)
  16 => string 'xxxxxxxxxxxxxx' (length=9)
  17 => string 'xxxxxxxxxxxxxx' (length=3)
  18 => string 'xxxxxxxxxxxxxx' (length=6)
  19 => string 'xxxxxxxxxxxxxx' (length=3)
  20 => string 'xxxxxxxxxxxxxx' (length=15)
  21 => string 'xxxxxxxxxxxxxx' (length=15)
  22 => string 'xxxxxxxxxxxxxx' (length=19)
  23 => string 'xxxxxxxxxxxxxx' (length=13)
  24 => string 'xxxxxxxxxxxxxx' (length=19)
  25 => string 'xxxxxxxxxxxxxx' (length=12)
  26 => string 'xxxxxxxxxxxxxx' (length=12)
  27 => string 'xxxxxxxxxxxxxx' (length=12)
  28 => string 'xxxxxxxxxxxxxx' (length=6)
  29 => string 'xxxxxxxxxxxxxx' (length=12)
  30 => string 'xxxxxxxxxxxxxx' (length=6)
  31 => string 'xxxxxxxxxxxxxx' (length=15)
  32 => string 'xxxxxxxxxxxxxx' (length=24)
  33 => string 'xxxxxxxxxxxxxx' (length=18)
  34 => string 'xxxxxxxxxxxxxx' (length=18)
  35 => string 'xxxxxxxxxxxxxx' (length=24)
  36 => string 'xxxxxxxxxxxxxx' (length=12)
  37 => string 'xxxxxxxxxxxxxx' (length=18)
  38 => string 'xxxxxxxxxxxxxx' (length=21)
  39 => string 'xxxxxxxxxxxxxx' (length=9)
  40 => string 'xxxxxxxxxxxxxx' (length=9)
  41 => string 'xxxxxxxxxxxxxx' (length=18)
  42 => string 'xxxxxxxxxxxxxx' (length=21)
  43 => string 'xxxxxxxxxxxxxx' (length=15)
  44 => string 'xxxxxxxxxxxxxx' (length=12)
  45 => string 'xxxxxxxxxxxxxx' (length=6)
  46 => string 'xxxxxxxxxxxxxx' (length=12)
  47 => string 'xxxxxxxxxxxxxx' (length=22)
  48 => string 'xxxxxxxxxxxxxx' (length=22)
  49 => string '' (length=0)

Memory usage: 207.55 kb; elapsed: 9.5835480690002
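
The figures above are absolute readings of memory_get_usage() taken before and after the read. To see what a single step costs, it can help to report the difference and the peak as well. The following is a rough sketch. `measure` is a hypothetical helper name, and the closure passed to it is only a placeholder for the actual reader calls.

```php
<?php
/**
 * Hypothetical helper: run $task and report elapsed time and memory figures.
 */
function measure(callable $task): array
{
    $startMemory = memory_get_usage();
    $startTime = microtime(true);

    $result = $task();

    return [
        'result'       => $result,
        'seconds'      => microtime(true) - $startTime,
        'memory_delta' => memory_get_usage() - $startMemory, // change in bytes between start and end
        'memory_peak'  => memory_get_peak_usage(),           // highest usage seen by the script so far
    ];
}

// Placeholder task; in the article's script this would wrap the reader calls.
$stats = measure(function () {
    return array_sum(range(1, 1000));
});

printf(
    "result=%d, elapsed=%.4fs, delta=%d bytes, peak=%d bytes\n",
    $stats['result'],
    $stats['seconds'],
    $stats['memory_delta'],
    $stats['memory_peak']
);
```

The peak value is useful because memory that is allocated and released inside the task does not show up in a before-and-after comparison.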

Original article: https://segmentfault.com/a/1190000015601758

posted @ 2018-11-18 22:29  sfornt