Parsing large files with XMLReader/DOM/SimpleXML

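DOM and SimpleXML make working with XML flexible and convenient, because the structure they build in memory closely mirrors the structure of the XML file. They share one drawback, though: both load the whole document into memory, so they cannot cope with large files.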
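To make the "mirrors the file" point concrete, here is a minimal sketch of the whole-tree approach with SimpleXML. The inline `$xml` string is a shortened, hypothetical stand-in for the movies file used later in this post.

```php
<?php
// Hypothetical inline sample; a shortened version of the movies file.
$xml = <<<XML
<movies>
 <movie>
  <title>PHP: Behind the Parser</title>
  <rating type="thumbs">7</rating>
  <rating type="stars">5</rating>
 </movie>
</movies>
XML;

// The whole document is parsed into memory before any node can be accessed.
$movies = simplexml_load_string($xml);

// Child elements are reached as properties, attributes with array syntax.
$title = (string) $movies->movie[0]->title;
$stars = null;
foreach ($movies->movie[0]->rating as $rating) {
    if ((string) $rating['type'] === 'stars') {
        $stars = (int) $rating;
    }
}

echo $title, "\n";
echo $stars, "\n";
```

This is pleasant to write, but memory use grows with the size of the document, which is the problem the rest of this post works around.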

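Fortunately there is XMLReader, a stream-based parser. "Stream-based" means that it reads and parses the document piece by piece, moving a cursor forward one node at a time, instead of loading everything into memory in one go. That makes it suitable for large XML files. It has its own drawback, though: it is not as flexible or convenient to use, which is exactly where DOM and SimpleXML are strong.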
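As a rough illustration of what reading as a stream looks like, the sketch below walks a document node by node with plain XMLReader. The inline `$xml` string is hypothetical; for a real large file you would call `$reader->open()` with a path instead.

```php
<?php
// Hypothetical inline sample; a large file would be opened with $reader->open($path).
$xml = '<movies><movie><title>A</title></movie><movie><title>B</title></movie></movies>';

$reader = new XMLReader();
$reader->XML($xml);

$titles = [];
// read() advances the cursor by one node; the document is never built as a full tree.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'title') {
        // readString() returns the text content of the current element.
        $titles[] = $reader->readString();
    }
}
$reader->close();

echo implode(', ', $titles), "\n";
```

Note how much bookkeeping (node types, names, cursor position) falls on the caller compared with the property-style access that SimpleXML offers.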

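So why not combine the two? XMLReader can stream through the file, and each element of interest can be handed to SimpleXML for convenient access. I wrote a simple class that does this; it is admittedly only a small and modest piece of functionality.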

The XML file

<?xml version='1.0' standalone='yes'?>
<movies>
 <movie>
  <title>PHP: Behind the Parser</title>
  <characters>
   <character>
    <name>Ms. Coder</name>
    <actor>Onlivia Actora</actor>
   </character>
   <character>
    <name>Mr. Coder</name>
    <actor>El Act&#211;r</actor>
   </character>
  </characters>
  <plot>
   So, this language. It's like, a programming language. Or is it a
   scripting language? All is revealed in this thrilling horror spoof
   of a documentary.
  </plot>
  <great-lines>
   <line>PHP solves all my web problems</line>
  </great-lines>
  <rating type="thumbs">7</rating>
  <rating type="stars">5</rating>
 </movie>
 <movie>
  <title>PHP: Behind the Parser</title>
  <characters>
   <character>
    <name>Ms. Coder</name>
    <actor>Onlivia Actora</actor>
   </character>
   <character>
    <name>Mr. Coder</name>
    <actor>El Act&#211;r</actor>
   </character>
  </characters>
  <plot>
   So, this language. It's like, a programming language. Or is it a
   scripting language? All is revealed in this thrilling horror spoof
   of a documentary.
  </plot>
  <great-lines>
   <line>PHP solves all my web problems</line>
  </great-lines>
  <rating type="thumbs">7</rating>
  <rating type="stars">5</rating>
 </movie>
</movies>

The implementation class

class SimpleXmlReader extends XMLReader{

    public function __construct($source, $isfile = false){
        if($isfile){
            $this->open($source);
        }else{
            $this->XML($source);
        }
    }

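    // Moves the cursor to the next element named $nodename.
    // A non-zero $depth only accepts elements at exactly that depth.
    public function getElement($nodename, $depth = 0){
        // If the cursor already sits on a match (left there by a previous call),
        // skip its whole subtree with next(). next() may land directly on the
        // following match, so that node is tested before read() is called again.
        $more = $this->isMatch($nodename, $depth) ? $this->next() : $this->read();
        while($more){
            if($this->isMatch($nodename, $depth)){
                return true;
            }
            $more = $this->read();
        }
        return false;
    }

    // True when the cursor is on an element named $nodename
    // (and at depth $depth, if a non-zero depth was given).
    private function isMatch($nodename, $depth){
        return $this->nodeType == self::ELEMENT
            && $this->localName == $nodename
            && (!$depth || $depth == $this->depth);
    }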

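    // Converts the current element and its subtree into a SimpleXMLElement.
    public function expandNodeToSimpleXml(){
        if($this->nodeType == self::ELEMENT){
            // expand() builds a DOM copy of the current element and its subtree only.
            $node = $this->expand();
            if($node === false){
                return false;
            }
            // Import the copy into a document of its own, then wrap it with SimpleXML.
            $dom = new DOMDocument();
            $n = $dom->appendChild($dom->importNode($node, true));
            return simplexml_import_dom($n);
        }
        return false;
    }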
}

Example usage:

$xmlhl = new SimpleXmlReader('test.xml', true);
while($xmlhl->getElement('movie')){
    $sxe = $xmlhl->expandNodeToSimpleXml();
    foreach($sxe->characters[0] as $character){
        echo "\n name -> "  . $character->name;
        echo "\n actor -> " . $character->actor;
    }
}

Output:

 name -> Ms. Coder
 actor -> Onlivia Actora
 name -> Mr. Coder
 actor -> El ActÓr
 name -> Ms. Coder
 actor -> Onlivia Actora
 name -> Mr. Coder
 actor -> El ActÓr
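For comparison, the same read-then-expand combination can be sketched without a wrapper class, using the name argument of `XMLReader::next()` to jump from one element to the next. This is only a sketch under stated assumptions: the inline `$xml` string is a hypothetical, shortened sample, and its two `<movie>` elements are deliberately placed next to each other with no whitespace in between.

```php
<?php
// Hypothetical inline sample with two adjacent <movie> elements.
$xml = '<movies>'
     . '<movie><characters><character><name>Ms. Coder</name></character></characters></movie>'
     . '<movie><characters><character><name>Mr. Coder</name></character></characters></movie>'
     . '</movies>';

$reader = new XMLReader();
$reader->XML($xml);

// Advance to the first <movie> element.
$found = false;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'movie') {
        $found = true;
        break;
    }
}

$names = [];
$doc = new DOMDocument();
while ($found) {
    // Expand only the current <movie> subtree and hand it to SimpleXML.
    $movie = simplexml_import_dom($doc->importNode($reader->expand(), true));
    foreach ($movie->characters->character as $character) {
        $names[] = (string) $character->name;
    }
    // Jump to the next <movie>, skipping the subtree just processed.
    $found = $reader->next('movie');
}
$reader->close();

echo implode("\n", $names), "\n";
```

Only one `<movie>` subtree is expanded at a time, so peak memory depends on the size of the largest single element rather than on the size of the whole file.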

 

posted @ 2013-11-20 12:03  luffy_zhong