

Next we continue with the analysis of the TimedCancelable and QueryTraverser classes.

The source of CancelableBatch, the TimedCancelable implementation, is as follows:

/**
 * A {@link TimedCancelable} for running a {@link Connector} batch using
 * a {@link Traverser}
 */
class CancelableBatch implements TimedCancelable {
  private static final Logger LOGGER =
    Logger.getLogger(CancelableBatch.class.getName());

  final Traverser traverser;
  final String traverserName;
  final BatchResultRecorder batchResultRecorder;
  final BatchTimeout batchTimeout;
  final BatchSize batchSize;

  /**
   * Construct a {@link CancelableBatch}.
   *
   * @param traverser {@link Traverser} for running the batch.
   * @param traverserName traverser name for logging purposes.
   * @param batchResultRecorder {@link BatchResultRecorder} for recording
   *        the result of running the batch.
   * @param batchTimeout {@link BatchTimeout} to notify if the batch runs
   *        past its allotted time.
   * @param batchSize hint and constraints as to the number of documents
   *        to process in the batch.
   */
  public CancelableBatch(Traverser traverser, String traverserName,
      BatchResultRecorder batchResultRecorder, BatchTimeout batchTimeout,
      BatchSize batchSize) {
    this.traverser = traverser;
    this.traverserName = traverserName;
    this.batchResultRecorder = batchResultRecorder;
    this.batchSize = batchSize;
    this.batchTimeout = batchTimeout;
  }

  public void cancel() {
    traverser.cancelBatch();
  }

  public void timeout(TaskHandle taskHandle) {
    batchTimeout.timeout();
  }

  public void run() {
    NDC.push("Traverse " + traverserName);
    try {
      LOGGER.fine("Begin runBatch; traverserName = " + traverserName
          + "  " + batchSize);
      BatchResult batchResult = traverser.runBatch(batchSize);
      LOGGER.fine("Traverser " + traverserName + " batchDone with result = "
          + batchResult);
      batchResultRecorder.recordResult(batchResult);
    } finally {
      NDC.remove();
    }
  }

  @Override
  public String toString() {
    return "CancelableBatch traverser: " + traverser + "  " + batchSize;
  }
}

Apart from exposing the cancellation-related operations for the connector, the heart of this class is the run() method executed by the worker thread (the cancel() method is invoked mainly by the task execution handle, the TaskHandle class).

Inside run(), the thread calls traverser.runBatch(batchSize), where traverser is a QueryTraverser object, an implementation of the Traverser interface.

Once the batch has finished, a BatchResultRecorder object records the result of the run.
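
BatchResultRecorder appears here only as the callback that run() hands the BatchResult to. As a rough, hypothetical sketch (the interface is assumed to declare just recordResult, matching the single call in run() above), a minimal recorder that simply keeps the most recent result might look like this:

// Hypothetical sketch only: a recorder that remembers the latest result.
// The real recorder in the connector manager feeds the result back to the
// scheduler; only recordResult(BatchResult) is taken from the code above.
class LastResultRecorder implements BatchResultRecorder {
  private volatile BatchResult lastResult;

  // Called from CancelableBatch.run() once traverser.runBatch() has finished.
  public void recordResult(BatchResult result) {
    lastResult = result;
  }

  public BatchResult getLastResult() {
    return lastResult;
  }
}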

Next, let's analyze the source of the QueryTraverser class:

/**
 * Traverser for a repository implemented using a TraversalManager
 */
public class QueryTraverser implements Traverser {
  private static final Logger LOGGER =
      Logger.getLogger(QueryTraverser.class.getName());

  private final PusherFactory pusherFactory;
  private final TraversalManager queryTraversalManager;
  private final TraversalStateStore stateStore;
  private final String connectorName;
  private final TraversalContext traversalContext;

  // Synchronize access to cancelWork.
  private final Object cancelLock = new Object();
  private boolean cancelWork = false;

  public QueryTraverser(PusherFactory pusherFactory,
      TraversalManager traversalManager, TraversalStateStore stateStore,
      String connectorName, TraversalContext traversalContext) {
    this.pusherFactory = pusherFactory;
    this.queryTraversalManager = traversalManager;
    this.stateStore = stateStore;
    this.connectorName = connectorName;
    this.traversalContext = traversalContext;
    if (queryTraversalManager instanceof TraversalContextAware) {
      TraversalContextAware contextAware =
          (TraversalContextAware)queryTraversalManager;
      try {
        contextAware.setTraversalContext(traversalContext);
      } catch (Exception e) {
        LOGGER.log(Level.WARNING, "Unable to set TraversalContext", e);
      }
    }
  }

  //@Override
  public void cancelBatch() {
    synchronized(cancelLock) {
      cancelWork = true;
    }
    LOGGER.fine("Cancelling traversal for connector " + connectorName);
  }

  public boolean isCancelled() {
    synchronized(cancelLock) {
      return cancelWork;
    }
  }

  //@Override
  public BatchResult runBatch(BatchSize batchSize) {
    final long startTime = System.currentTimeMillis();
    final long timeoutTime = startTime
      + traversalContext.traversalTimeLimitSeconds() * 1000;

    if (isCancelled()) {
      LOGGER.warning("Attempting to run a cancelled QueryTraverser");
      return new BatchResult(TraversalDelayPolicy.ERROR);
    }
    try {
      queryTraversalManager.setBatchHint(batchSize.getHint());
    } catch (RepositoryException e) {
      LOGGER.log(Level.WARNING, "Unable to set batch hint", e);
    }

    String connectorState;
    try {
      if (stateStore != null) {
        connectorState = stateStore.getTraversalState();
      } else {
        throw new IllegalStateException("null TraversalStateStore");
      }
    } catch (IllegalStateException ise) {
      // We get here if the ConnectorStateStore for connector is disabled.
      // That happens if the connector was deleted while we were asleep.
      // Our connector seems to have been deleted.  Don't process a batch.
      LOGGER.finer("Halting traversal..." + ise.getMessage());
      return new BatchResult(TraversalDelayPolicy.ERROR);
    }

    DocumentList resultSet = null;
    if (connectorState == null) {
      try {
        LOGGER.finer("Starting traversal...");
        resultSet = queryTraversalManager.startTraversal();
      } catch (Exception e) {
        LOGGER.log(Level.WARNING, "startTraversal threw exception: ", e);
        return new BatchResult(TraversalDelayPolicy.ERROR);
      }
    } else {
      try {
        LOGGER.finer("Resuming traversal...");
        resultSet = queryTraversalManager.resumeTraversal(connectorState);
      } catch (Exception e) {
        LOGGER.log(Level.WARNING, "resumeTraversal threw exception: ", e);
        return new BatchResult(TraversalDelayPolicy.ERROR);
      }
    }

    // If the traversal returns null, that means that the repository has
    // no new content to traverse.
    if (resultSet == null) {
      LOGGER.finer("Result set is NULL, no documents returned for traversal.");
      return new BatchResult(TraversalDelayPolicy.POLL, 0);
    }

    Pusher pusher = null;
    BatchResult result = null;
    int counter = 0;
    try {
      // Get a Pusher for feeding the returned Documents.
      pusher = pusherFactory.newPusher(connectorName);

      while (counter < batchSize.getMaximum()) {
        if (Thread.currentThread().isInterrupted() || isCancelled()) {
          LOGGER.fine("Traversal for connector " + connectorName
                      + " has been interrupted...breaking out of batch run.");
          break;
        }
        if (System.currentTimeMillis() >= timeoutTime) {
          LOGGER.fine("Traversal for connector " + connectorName
              + " is completing due to time limit.");
          break;
        }

        Document nextDocument = null;
        String docid = null;
        try {
          LOGGER.finer("Pulling next document from connector " + connectorName);
          nextDocument = resultSet.nextDocument();
          if (nextDocument == null) {
            break;
          } else {
            // Since there are a couple of places below that could throw
            // exceptions but not exit the while loop, the counter should be
            // incremented here to insure it represents documents returned from
            // the list.  Note the call to nextDocument() could also throw a
            // RepositoryDocumentException signaling a skipped document in which
            // case the call will not be counted against the batch maximum.
            counter++;
            // Fetch DocId to use in messages.
            try {
              docid = Value.getSingleValueString(nextDocument,
                                                 SpiConstants.PROPNAME_DOCID);
            } catch (IllegalArgumentException e1) {
                LOGGER.fine("Unable to get document id for document ("
                            + nextDocument + "): " + e1.getMessage());
            } catch (RepositoryException e1) {
                LOGGER.fine("Unable to get document id for document ("
                            + nextDocument + "): " + e1.getMessage());
            }
          }
          LOGGER.finer("Sending document (" + docid + ") from connector "
              + connectorName + " to Pusher");

          if (!pusher.take(nextDocument)) {
            LOGGER.fine("Traversal for connector " + connectorName
                + " is completing at the request of the Pusher.");
            break;
          }

        } catch (SkippedDocumentException e) {
          /* TODO (bmj): This is a temporary solution and should be replaced.
           * It uses Exceptions for non-exceptional cases.
           */
          // Skip this document.  Proceed on to the next one.
          if (LOGGER.isLoggable(Level.FINER)) {
            LOGGER.log(Level.FINER, "Skipping document (" + docid
                + ") from connector " + connectorName + ": " + e.getMessage());
          }
        } catch (RepositoryDocumentException e) {
          // Skip individual documents that fail.  Proceed on to the next one.
          LOGGER.log(Level.WARNING, "Skipping document (" + docid
              + ") from connector " + connectorName, e);
        } catch (RuntimeException e) {
          // Skip individual documents that fail.  Proceed on to the next one.
          LOGGER.log(Level.WARNING, "Skipping document (" + docid
              + ") from connector " + connectorName, e);
        }
      }
      // No more documents. Wrap up any accumulated feed data and send it off.
      if (!isCancelled()) {
        pusher.flush();
      }
    } catch (OutOfMemoryError e) {
      pusher.cancel();
      System.runFinalization();
      System.gc();
      result = new BatchResult(TraversalDelayPolicy.ERROR);
      try {
        LOGGER.severe("Out of JVM Heap Space.  Will retry later.");
        LOGGER.log(Level.FINEST, e.getMessage(), e);
      } catch (Throwable t) {
        // OutOfMemory state may prevent us from logging the error.
        // Don't make matters worse by rethrowing something meaningless.
      }
    } catch (RepositoryException e) {
      // Drop the entire batch on the floor.  Do not call checkpoint
      // (as there is a discrepancy between what the Connector thinks
      // it has fed, and what actually has been pushed).
      LOGGER.log(Level.SEVERE, "Repository Exception during traversal.", e);
      result = new BatchResult(TraversalDelayPolicy.ERROR);
    } catch (PushException e) {
      LOGGER.log(Level.SEVERE, "Push Exception during traversal.", e);
      // Drop the entire batch on the floor.  Do not call checkpoint
      // (as there is a discrepancy between what the Connector thinks
      // it has fed, and what actually has been pushed).
      result = new BatchResult(TraversalDelayPolicy.ERROR);
    } catch (FeedException e) {
      LOGGER.log(Level.SEVERE, "Feed Exception during traversal.", e);
      // Drop the entire batch on the floor.  Do not call checkpoint
      // (as there is a discrepancy between what the Connector thinks
      // it has fed, and what actually has been pushed).
      result = new BatchResult(TraversalDelayPolicy.ERROR);
    } catch (Throwable t) {
      LOGGER.log(Level.SEVERE, "Uncaught Exception during traversal.", t);
      // Drop the entire batch on the floor.  Do not call checkpoint
      // (as there is a discrepancy between what the Connector thinks
      // it has fed, and what actually has been pushed).
      result = new BatchResult(TraversalDelayPolicy.ERROR);
    } finally {
      // If we have cancelled the work, abandon the batch.
      if (isCancelled()) {
        result = new BatchResult(TraversalDelayPolicy.ERROR);
      }

      // Checkpoint completed work as well as skip past troublesome documents
      // (e.g. documents that are too large and will always fail).
      if ((result == null) && (checkpointAndSave(resultSet) == null)) {
        // Unable to get a checkpoint, so wait a while, then retry batch.
        result = new BatchResult(TraversalDelayPolicy.ERROR);
      }
    }
    if (result == null) {
      result = new BatchResult(TraversalDelayPolicy.IMMEDIATE, counter,
                               startTime, System.currentTimeMillis());
    } else if (pusher != null) {
      // We are returning an error from this batch. Cancel any feed that
      // might be in progress.
      pusher.cancel();
    }
    return result;
  }

  private String checkpointAndSave(DocumentList pm) {
    String connectorState = null;
    LOGGER.finest("Checkpointing for connector " + connectorName + " ...");
    try {
      connectorState = pm.checkpoint();
    } catch (RepositoryException re) {
      // If checkpoint() throws RepositoryException, it means there is no
      // new checkpoint.
      return null;
    } catch (Exception e) {
      // If checkpoint() throws some general Exception, it is probably
      // an older connector that doesn't understand the newer empty
      // DocumentList and Exception handling from runBatch() model.
      return null;
    }
    try {
      if (connectorState != null) {
        if (stateStore != null) {
          stateStore.storeTraversalState(connectorState);
        } else {
          throw new IllegalStateException("null TraversalStateStore");
        }
        LOGGER.finest("...checkpoint " + connectorState + " created.");
      }
      return connectorState;
    } catch (IllegalStateException ise) {
      // We get here if the ConnectorStateStore for connector is disabled.
      // That happens if the connector was deleted while we were working.
      // Our connector seems to have been deleted.  Don't save a checkpoint.
      LOGGER.finest("...checkpoint " + connectorState + " discarded.");
    }
    return null;
  }
}

 

The void cancelBatch() method is used to cancel execution. As we saw above, it is reached through the CancelableBatch class: TaskHandle invokes CancelableBatch's cancel() method, and the whole cancellation is ultimately driven by the void resetBatch() method of ConnectorCoordinatorImpl.
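
The TaskHandle and ConnectorCoordinatorImpl classes are not shown in this post, so the following is only a simplified sketch of that delegation chain; the fields and the thread-interrupt step are assumptions, while batch.cancel() and cancelBatch() come from the code above.

import java.util.concurrent.Future;

// Simplified, hypothetical sketch of the cancellation chain:
// ConnectorCoordinatorImpl.resetBatch() -> TaskHandle.cancel()
//   -> CancelableBatch.cancel() -> QueryTraverser.cancelBatch() (cancelWork = true)
class TaskHandleSketch {
  private final CancelableBatch batch;  // the TimedCancelable created for this batch
  private final Future<?> future;       // assumption: handle to the worker thread

  TaskHandleSketch(CancelableBatch batch, Future<?> future) {
    this.batch = batch;
    this.future = future;
  }

  void cancel() {
    batch.cancel();       // delegates to traverser.cancelBatch(), setting the flag
    future.cancel(true);  // assumption: interrupt so the runBatch loop exits early
  }
}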

Let's still focus on the key method, BatchResult runBatch(BatchSize batchSize); this is where the concrete connector's TraversalManager implementation is called to poll the repository for data.

It first checks whether the batch has been cancelled; if it has, the method logs a warning and immediately returns an error result.

Next it loads the stored traversal state and calls queryTraversalManager.startTraversal() or queryTraversalManager.resumeTraversal(connectorState) to obtain the result set, DocumentList resultSet.

It then obtains a Pusher object from the PusherFactory factory and calls the Pusher's take() method to send the documents to the application center.

Finally it checkpoints the result set, stores the new state, and returns the result of this round of data collection:

new BatchResult(TraversalDelayPolicy.IMMEDIATE, counter,startTime, System.currentTimeMillis());
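
QueryTraverser touches the state store only through getTraversalState() (when loading the stored state) and storeTraversalState() (inside checkpointAndSave()). As a rough, hypothetical illustration of that contract, an in-memory TraversalStateStore could be as small as the sketch below; a real store would persist the checkpoint string.

// Hypothetical in-memory TraversalStateStore, for illustration only.
// Only the two methods called by QueryTraverser above are assumed on the interface.
class InMemoryTraversalStateStore implements TraversalStateStore {
  private String state;

  public synchronized String getTraversalState() {
    return state;  // null on the first run => runBatch() calls startTraversal()
  }

  public synchronized void storeTraversalState(String traversalState) {
    state = traversalState;  // written by checkpointAndSave() after each batch
  }
}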

---------------------------------------------------------------------------

This series on connectors for enterprise search engine development is my own original work.

Please credit the source when reposting: 博客园 刺猬的温驯

Link to this post: http://www.cnblogs.com/chenying99/archive/2013/03/19/2968411.html

posted on 2013-03-19 01:48 by 刺猬的温驯