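In Heritrix 3.1.0 the Frontier component manages its URL queues with a BDB database (Berkeley DB Java Edition, "je"), which it uses to store CrawlURI objects. Let's first look at how Heritrix 3.1.0 implements its BDB module.

As we know, creating a BDB database starts with building a database environment. In Heritrix 3.1.0's BDB module, the EnhancedEnvironment class wraps the BDB environment (it extends je's Environment). If you are not familiar with BDB, you may want to google it first.

The source of the EnhancedEnvironment class is as follows: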

/**
 * Version of BDB_JE Environment with additional convenience features, such as
 * a shared, cached StoredClassCatalog. (Additional convenience caching of 
 * Databases and StoredCollections may be added later.)
 * 
 * @author gojomo
 */
public class EnhancedEnvironment extends Environment {
    StoredClassCatalog classCatalog; 
    Database classCatalogDB;
    
    /**
     * Constructor
     * 
     * @param envHome directory in which to open environment
     * @param envConfig config options
     * @throws DatabaseException
     */
    public EnhancedEnvironment(File envHome, EnvironmentConfig envConfig) throws DatabaseException {
        super(envHome, envConfig);
    }

    /**
     * Return a StoredClassCatalog backed by a Database in this environment,
     * either pre-existing or created (and cached) if necessary.
     * 
     * @return the cached class catalog
     */
    public StoredClassCatalog getClassCatalog() {
        if(classCatalog == null) {
            DatabaseConfig dbConfig = new DatabaseConfig();
            dbConfig.setAllowCreate(true);
            dbConfig.setReadOnly(this.getConfig().getReadOnly());
            try {
                classCatalogDB = openDatabase(null, "classCatalog", dbConfig);
                classCatalog = new StoredClassCatalog(classCatalogDB);
            } catch (DatabaseException e) {
                // TODO Auto-generated catch block
                throw new RuntimeException(e);
            }
        }
        return classCatalog;
    }

    @Override
    public synchronized void close() throws DatabaseException {
        if(classCatalogDB!=null) {
            classCatalogDB.close();
        }
        super.close();
    }

    /**
     * Create a temporary test environment in the given directory.
     * @param dir target directory
     * @return EnhancedEnvironment
     */
    public static EnhancedEnvironment getTestEnvironment(File dir) {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        envConfig.setTransactional(false);
        EnhancedEnvironment env;
        try {
            env = new EnhancedEnvironment(dir, envConfig);
        } catch (DatabaseException e) {
            throw new RuntimeException(e);
        } 
        return env;
    }
}

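As the source shows, on top of je's Environment functionality the class adds a StoredClassCatalog getClassCatalog() method, which BDB needs in order to store custom objects. The method also opens the "classCatalog" database (classCatalogDB) that backs the StoredClassCatalog object, and close() closes that database before closing the environment.

So where are BDB databases actually created and operated on? That is the job of the BdbModule class, which we analyze next (BdbModule implements a number of interfaces; I won't go into those for now).

The BdbModule source is fairly long, so I won't paste it in full; I'll only show the relevant code as we go.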

    private static class DatabasePlusConfig implements Serializable {
        private static final long serialVersionUID = 1L;
        public transient Database database;
        public BdbConfig config;
    }
    
    
    /**
     * Configuration object for databases.  Needed because 
     * {@link DatabaseConfig} is not serializable.  Also it prevents invalid
     * configurations.  (All databases opened through this module must be
     * deferred-write, because otherwise they can't sync(), and you can't
     * run a checkpoint without doing sync() first.)
     * 
     * @author pjack
     *
     */
    public static class BdbConfig implements Serializable {
        private static final long serialVersionUID = 1L;

        boolean allowCreate;
        boolean sortedDuplicates;
        boolean transactional;
        boolean deferredWrite = true; 

        public BdbConfig() {
        }


        public boolean isAllowCreate() {
            return allowCreate;
        }


        public void setAllowCreate(boolean allowCreate) {
            this.allowCreate = allowCreate;
        }


        public boolean getSortedDuplicates() {
            return sortedDuplicates;
        }


        public void setSortedDuplicates(boolean sortedDuplicates) {
            this.sortedDuplicates = sortedDuplicates;
        }

        public DatabaseConfig toDatabaseConfig() {
            DatabaseConfig result = new DatabaseConfig();
            result.setDeferredWrite(deferredWrite);
            result.setTransactional(transactional);
            result.setAllowCreate(allowCreate);
            result.setSortedDuplicates(sortedDuplicates);
            return result;
        }


        public boolean isTransactional() {
            return transactional;
        }


        public void setTransactional(boolean transactional) {
            this.transactional = transactional;
        }


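        public void setDeferredWrite(boolean b) {
            // The argument is ignored; per the class Javadoc above, every
            // database opened through this module must be deferred-write.
            this.deferredWrite = true; 
        }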
    }

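The code above shows the static classes DatabasePlusConfig and BdbConfig. The former is private and can only be instantiated inside BdbModule; the latter is public and can be created from outside.

Besides its Database database member (declared transient), the static class DatabasePlusConfig also holds a BdbConfig config member, an instance of the static class BdbConfig.

The static class BdbConfig wraps BDB database configuration. Once its fields are set, its DatabaseConfig toDatabaseConfig() method returns the corresponding BDB DatabaseConfig object: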

        public DatabaseConfig toDatabaseConfig() {
            DatabaseConfig result = new DatabaseConfig();
            result.setDeferredWrite(deferredWrite);
            result.setTransactional(transactional);
            result.setAllowCreate(allowCreate);
            result.setSortedDuplicates(sortedDuplicates);
            return result;
        }
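
The split between a transient Database and a serializable BdbConfig can be tried out without BDB. The following is only a sketch using the JDK's built-in serialization: TransientHandleDemo, Settings and HandlePlusSettings are hypothetical stand-ins that I made up for DatabasePlusConfig and BdbConfig, not Heritrix classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical stand-ins for BdbConfig / DatabasePlusConfig; not Heritrix code. */
class TransientHandleDemo {

    /** Plays the role of BdbConfig: plain serializable settings. */
    static class Settings implements Serializable {
        private static final long serialVersionUID = 1L;
        boolean allowCreate;
        boolean deferredWrite = true;
    }

    /** Plays the role of DatabasePlusConfig: a live handle plus its settings. */
    static class HandlePlusSettings implements Serializable {
        private static final long serialVersionUID = 1L;
        transient Object handle;   // stands in for the non-serializable Database
        Settings settings;
    }

    /** Serialize and deserialize an object entirely in memory. */
    static HandlePlusSettings roundTrip(HandlePlusSettings in)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(in);
        }
        try (ObjectInputStream oin = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (HandlePlusSettings) oin.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HandlePlusSettings original = new HandlePlusSettings();
        original.handle = new Object();
        original.settings = new Settings();
        original.settings.allowCreate = true;

        HandlePlusSettings copy = roundTrip(original);
        System.out.println(copy.handle == null);        // true: transient field is skipped
        System.out.println(copy.settings.allowCreate);  // true: settings survive
    }
}
```

After a round trip only the settings remain, so whoever restores such an object has to reopen the handle from the saved settings.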

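The next part of the BdbModule source defines the BDB environment properties; these parameters are used later by the method that instantiates the BDB environment.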

    protected ConfigPath dir = new ConfigPath("bdbmodule subdirectory","state");
    public ConfigPath getDir() {
        return dir;
    }
    public void setDir(ConfigPath dir) {
        this.dir = dir;
    }
    
    int cachePercent = -1;
    public int getCachePercent() {
        return cachePercent;
    }
    public void setCachePercent(int cachePercent) {
        this.cachePercent = cachePercent;
    }

    boolean useSharedCache = true; 
    public boolean getUseSharedCache() {
        return useSharedCache;
    }
    public void setUseSharedCache(boolean useSharedCache) {
        this.useSharedCache = useSharedCache;
    }
    
    /**
     * Expected number of concurrent threads; used to tune nLockTables
     * according to JE FAQ
     * http://www.oracle.com/technology/products/berkeley-db/faq/je_faq.html#33
     */
    int expectedConcurrency = 64;
    public int getExpectedConcurrency() {
        return expectedConcurrency;
    }
    public void setExpectedConcurrency(int expectedConcurrency) {
        this.expectedConcurrency = expectedConcurrency;
    }
    
    /**
     * Whether to use hard-links to log files to collect/retain
     * the BDB log files needed for a checkpoint. Default is true. 
     * May not work on Windows (especially on pre-NTFS filesystems). 
     * If false, the BDB 'je.cleaner.expunge' value will be set to 
     * 'false', as well, meaning BDB will *not* delete obsolete JDB
     * files, but only rename the '.DEL'. They will have to be 
     * manually deleted to free disk space, but .DEL files referenced
     * in any checkpoint's 'jdbfiles.manifest' should be retained to
     * keep the checkpoint valid. 
     */
    boolean useHardLinkCheckpoints = true;
    public boolean getUseHardLinkCheckpoints() {
        return useHardLinkCheckpoints;
    }
    public void setUseHardLinkCheckpoints(boolean useHardLinkCheckpoints) {
        this.useHardLinkCheckpoints = useHardLinkCheckpoints;
    }
    
    private transient EnhancedEnvironment bdbEnvironment;
        
    private transient StoredClassCatalog classCatalog;

The following two member variables are particularly important:

    @SuppressWarnings("rawtypes")
    private Map<String,ObjectIdentityCache> oiCaches = 
        new ConcurrentHashMap<String,ObjectIdentityCache>();

    private Map<String,DatabasePlusConfig> databases =
        new ConcurrentHashMap<String,DatabasePlusConfig>();

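Both are Map-typed members, i.e. map containers keyed by database name. The former holds the cache-managing objects (the BdbFrontier module uses them to manage the work-queue cache); the latter holds DatabasePlusConfig objects, through which the module hands out BDB database instances.

Now look at its initialization method start() (a method of the Spring framework's Lifecycle interface, which BdbModule implements):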

    public synchronized void start() {
        if (isRunning()) {
            return;
        }
        
        isRunning = true;
        
        try {
            boolean isRecovery = false; 
            if(recoveryCheckpoint!=null) {
                isRecovery = true; 
                doRecover(); 
            }
   
            setup(getDir().getFile(), !isRecovery);
        } catch (DatabaseException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

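doRecover() restores state from a checkpoint; setup(getDir().getFile(), !isRecovery) initializes the EnhancedEnvironment object that wraps the database environment, along with the StoredClassCatalog object. Note that when recovering, create is false, so the environment must already exist.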

    protected void setup(File f, boolean create) 
    throws DatabaseException, IOException {
        EnvironmentConfig config = new EnvironmentConfig();
        config.setAllowCreate(create);
        config.setLockTimeout(75, TimeUnit.MINUTES); // set to max
        if(getCachePercent()>0) {
            config.setCachePercent(getCachePercent());
        }
        config.setSharedCache(getUseSharedCache());
        
        // we take the advice literally from...
        // http://www.oracle.com/technology/products/berkeley-db/faq/je_faq.html#33
        long nLockTables = getExpectedConcurrency()-1;
        while(!BigInteger.valueOf(nLockTables).isProbablePrime(Integer.MAX_VALUE)) {
            nLockTables--;
        }
        config.setConfigParam("je.lock.nLockTables", Long.toString(nLockTables));
        
        // triple this value to 6K because stats show many faults
        config.setConfigParam("je.log.faultReadSize", "6144"); 

        if(!getUseHardLinkCheckpoints()) {
            // to support checkpoints by textual manifest only, 
            // prevent BDB's cleaner from deleting log files
            config.setConfigParam("je.cleaner.expunge", "false");
        } // else leave whatever other setting was already in place

        org.archive.util.FileUtils.ensureWriteableDirectory(f);
        this.bdbEnvironment = new EnhancedEnvironment(f, config);
        this.classCatalog = this.bdbEnvironment.getClassCatalog();
        if(!create) {
            // freeze last log file -- so that originating checkpoint isn't fouled
            DbBackup dbBackup = new DbBackup(bdbEnvironment);
            dbBackup.startBackup();
            dbBackup.endBackup();
        }
    }
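
The nLockTables computation in setup() is plain arithmetic: start at expectedConcurrency - 1 and count down until a prime is found. A minimal sketch of just that step follows. LockTableSizing and largestPrimeBelow are names I made up for illustration, and the certainty value of 100 and the guard against inputs below 3 are my own choices (the Heritrix code passes Integer.MAX_VALUE as the certainty and has no such guard).

```java
import java.math.BigInteger;

/** Illustrative only: mirrors the prime search in setup(), not Heritrix code. */
class LockTableSizing {

    /**
     * Largest prime strictly below expectedConcurrency.
     * Requires expectedConcurrency >= 3 so that a prime (at least 2) exists.
     */
    static long largestPrimeBelow(int expectedConcurrency) {
        if (expectedConcurrency < 3) {
            throw new IllegalArgumentException("need expectedConcurrency >= 3");
        }
        long candidate = expectedConcurrency - 1L;
        // count down until the candidate passes the primality test
        while (!BigInteger.valueOf(candidate).isProbablePrime(100)) {
            candidate--;
        }
        return candidate;
    }

    public static void main(String[] args) {
        // 64 is the default expectedConcurrency shown above
        System.out.println(largestPrimeBelow(64)); // prints 61
    }
}
```

With the default expectedConcurrency of 64, the loop skips 63 and 62 and settles on 61, which is the value that would be written to je.lock.nLockTables.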

Databases are opened with openDatabase(String name, BdbConfig config, boolean usePriorData):

    /**
     * Open a Database inside this BdbModule's environment, and 
     * remember it for automatic close-at-module-stop. 
     * 
     * @param name
     * @param config
     * @param usePriorData
     * @return
     * @throws DatabaseException
     */
    public Database openDatabase(String name, BdbConfig config, boolean usePriorData) 
    throws DatabaseException {
        if (bdbEnvironment == null) {
            // proper initialization hasn't occurred
            throw new IllegalStateException("BdbModule not started");
        }
        if (databases.containsKey(name)) {
            DatabasePlusConfig dpc = databases.get(name);
            if(dpc.config == config) {
                // object-identical configs: OK to share DB
                return dpc.database;
            }
            // unshared config object: might be name collision; error
            throw new IllegalStateException("Database already exists: " +name);
        }
        
        DatabasePlusConfig dpc = new DatabasePlusConfig();
        if (!usePriorData) {
            try {
                bdbEnvironment.truncateDatabase(null, name, false);
            } catch (DatabaseNotFoundException e) {
                // Ignored
            }
        }
        dpc.database = bdbEnvironment.openDatabase(null, name, config.toDatabaseConfig());
        dpc.config = config;
        databases.put(name, dpc);
        return dpc.database;
    }

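When called, the method first checks whether the Map<String,DatabasePlusConfig> databases member already holds an entry for that name. If it does and the caller passed the very same BdbConfig object, the existing Database is returned; if the config object differs, an IllegalStateException is thrown. Only when there is no entry is a new database opened (after truncating any prior data if usePriorData is false) and registered in the map.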
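
The name-plus-identity check can be illustrated without BDB. The sketch below is not Heritrix code: HandleRegistry, Config and Handle are hypothetical stand-ins, and making open() synchronized is my own addition to keep the example simple (the openDatabase method shown above is not declared synchronized).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative registry: share a handle only when the same config object is presented. */
class HandleRegistry {

    /** Hypothetical config type; only its object identity matters here. */
    static class Config { }

    /** Hypothetical handle standing in for a Database. */
    static class Handle {
        final String name;
        Handle(String name) { this.name = name; }
    }

    /** Pairs a handle with the config it was opened with. */
    private static class Entry {
        final Handle handle;
        final Config config;
        Entry(Handle handle, Config config) {
            this.handle = handle;
            this.config = config;
        }
    }

    private final Map<String, Entry> open = new ConcurrentHashMap<>();

    /** Return the shared handle for (name, same config object), or open a new one. */
    synchronized Handle open(String name, Config config) {
        Entry existing = open.get(name);
        if (existing != null) {
            if (existing.config == config) {   // identity comparison, not equals()
                return existing.handle;
            }
            // a different config object under the same name is treated as a collision
            throw new IllegalStateException("Database already exists: " + name);
        }
        Entry created = new Entry(new Handle(name), config);
        open.put(name, created);
        return created.handle;
    }

    public static void main(String[] args) {
        HandleRegistry registry = new HandleRegistry();
        Config shared = new Config();
        Handle first = registry.open("uris", shared);
        System.out.println(first == registry.open("uris", shared)); // true: same handle
        try {
            registry.open("uris", new Config());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Database already exists: uris
        }
    }
}
```

Comparing by identity means two callers can share a database only if they deliberately pass around the same config object; an equal-looking but separately created config counts as a name collision.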

The following method returns a StoredQueue. The queue's element type is given by the Class<K> clazz parameter, and the database configuration comes from StoredQueue.databaseConfig() (StoredQueue's own):

    public <K extends Serializable> StoredQueue<K> getStoredQueue(String dbname, Class<K> clazz, boolean usePriorData) {
        try {
            Database queueDb;
            queueDb = openDatabase(dbname,
                    StoredQueue.databaseConfig(), usePriorData);
            return new StoredQueue<K>(queueDb, clazz, getClassCatalog());
        } catch (DatabaseException e) {
            throw new RuntimeException(e);
        }
        
    }

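When the StoredQueue is instantiated, the StoredClassCatalog passed in is used to create an EntryBinding<E> object (Heritrix also has a KryoBinding<K> type, for example). The binding converts serializable classes to and from BDB's data types; K is a serializable type, <K extends Serializable>.

A short detour is worthwhile here: let's look at the source of StoredQueue. It extends AbstractQueue<E> and implements the queue operations with its elements stored in a BDB database.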

/**
 * Queue backed by a JE Collections StoredSortedMap. 
 * 
 * @author gojomo
 *
 * @param <E>
 */
public class StoredQueue<E extends Serializable> extends AbstractQueue<E>  {
    @SuppressWarnings("unused")
    private static final Logger logger =
        Logger.getLogger(StoredQueue.class.getName());

    transient StoredSortedMap<Long,E> queueMap; // Long -> E
    transient Database queueDb; // Database
    AtomicLong tailIndex; // next spot for insert
    transient volatile E peekItem = null;
    
    /**
     * Create a StoredQueue backed by the given Database. 
     * 
     * The Class of values to be queued may be provided; there is only a 
     * benefit when a primitive type is specified. A StoredClassCatalog
     * must be provided if a primitive type is not supplied. 
     * 
     * @param db
     * @param clsOrNull 
     * @param classCatalog
     */
    public StoredQueue(Database db, Class<E> clsOrNull, StoredClassCatalog classCatalog) {
        hookupDatabase(db, clsOrNull, classCatalog);
        tailIndex = new AtomicLong(queueMap.isEmpty() ? 0L : queueMap.lastKey()+1);
    }

    /**
     * @param db
     * @param clsOrNull
     * @param classCatalog
     */
    public void hookupDatabase(Database db, Class<E> clsOrNull, StoredClassCatalog classCatalog) {
        EntryBinding<E> valueBinding = TupleBinding.getPrimitiveBinding(clsOrNull);
        if(valueBinding == null) {
            valueBinding = new SerialBinding<E>(classCatalog, clsOrNull);
        }
        queueDb = db;
        queueMap = new StoredSortedMap<Long,E>(
                db,
                TupleBinding.getPrimitiveBinding(Long.class),
                valueBinding,
                true);
    }

    @Override
    public Iterator<E> iterator() {
        return queueMap.values().iterator();
    }

    @Override
    public int size() {
        try {
            return Math.max(0, 
                    (int)(tailIndex.get() 
                          - queueMap.firstKey())); 
        } catch (IllegalStateException ise) {
            return 0; 
        } catch (NoSuchElementException nse) {
            return 0;
        } catch (NullPointerException npe) {
            return 0;
        }
    }
    
    @Override
    public boolean isEmpty() {
        if(peekItem!=null) {
            return false;
        }
        try {
            return queueMap.isEmpty();
        } catch (IllegalStateException de) {
            return true;
        }
    }

    public boolean offer(E o) {
        long targetIndex = tailIndex.getAndIncrement();
        queueMap.put(targetIndex, o);
        return true;
    }

    public synchronized E peek() {
        if(peekItem == null) {
            if(queueMap.isEmpty()) {
                return null; 
            }
            peekItem = queueMap.remove(queueMap.firstKey());
        }
        return peekItem; 
    }

    public synchronized E poll() {
        E head = peek();
        peekItem = null;
        return head; 
    }

    /**
     * A suitable DatabaseConfig for the Database backing a StoredQueue. 
     * (However, it is not necessary to use these config options.)
     * 
     * @return DatabaseConfig suitable for queue
     */
    public static BdbModule.BdbConfig databaseConfig() {
        BdbModule.BdbConfig dbConfig = new BdbModule.BdbConfig();
        dbConfig.setTransactional(false);
        dbConfig.setAllowCreate(true);
        return dbConfig;
    }
    
    public void close() {
        try {
            queueDb.sync();
            queueDb.close();
        } catch (DatabaseException e) {
            throw new RuntimeException(e);
        }
    }
}

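je provides the StoredSortedMap<Long,E> class for operating on and managing the data in a BDB database. So we can think of a StoredQueue as a queue whose data lives in a BDB database (wrapped by a StoredSortedMap): offer() stores the element under the next tailIndex key, while peek() removes the entry with the smallest key and holds it in peekItem until poll() hands it out.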
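
To make the queue mechanics concrete, here is an in-memory analogue in which a java.util.TreeMap takes the place of the BDB-backed StoredSortedMap. This is only a sketch under that assumption: SortedMapQueue is a name I made up, nothing is persisted, and size() simply counts entries instead of subtracting keys as StoredQueue does.

```java
import java.util.AbstractQueue;
import java.util.Iterator;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory analogue of StoredQueue; a TreeMap replaces the StoredSortedMap. */
class SortedMapQueue<E> extends AbstractQueue<E> {
    private final TreeMap<Long, E> map = new TreeMap<>();
    private final AtomicLong tailIndex = new AtomicLong(0); // next key for insert
    private E peekItem;                                     // head held after peek()

    @Override
    public synchronized boolean offer(E e) {
        map.put(tailIndex.getAndIncrement(), e);
        return true;
    }

    @Override
    public synchronized E peek() {
        if (peekItem == null) {
            if (map.isEmpty()) {
                return null;
            }
            // like StoredQueue, peek removes the head and caches it
            peekItem = map.remove(map.firstKey());
        }
        return peekItem;
    }

    @Override
    public synchronized E poll() {
        E head = peek();
        peekItem = null;
        return head;
    }

    @Override
    public synchronized boolean isEmpty() {
        return peekItem == null && map.isEmpty();
    }

    @Override
    public synchronized int size() {
        return map.size() + (peekItem == null ? 0 : 1);
    }

    @Override
    public Iterator<E> iterator() {
        // as in StoredQueue, a cached peekItem is not part of the iteration
        return map.values().iterator();
    }

    public static void main(String[] args) {
        SortedMapQueue<String> queue = new SortedMapQueue<>();
        queue.offer("a");
        queue.offer("b");
        System.out.println(queue.peek()); // a
        System.out.println(queue.poll()); // a
        System.out.println(queue.poll()); // b
        System.out.println(queue.poll()); // null
    }
}
```

Because keys only ever grow, the smallest key is always the oldest element, which is what gives the sorted map FIFO behaviour.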

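The next part handles cache management: it manages caches of objects that implement the IdentityCacheable interface. For example, BdbWorkQueue implements that interface indirectly, which is how work-queue objects get cached. The contents of an ObjectIdentityBdbManualCache are themselves stored in a BDB database.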

    /**
     * Get an ObjectIdentityBdbCache, backed by a BDB Database of the 
     * given name, with the given value class type. If 'recycle' is true,
     * reuse values already in the database; otherwise start with an 
     * empty cache. 
     *  
     * @param <V>
     * @param dbName
     * @param recycle
     * @param valueClass
     * @return
     * @throws DatabaseException
     */
    public <V extends IdentityCacheable> ObjectIdentityBdbManualCache<V> getOIBCCache(String dbName, boolean recycle,
            Class<? extends V> valueClass) 
    throws DatabaseException {
        if (!recycle) {
            try {
                bdbEnvironment.truncateDatabase(null, dbName, false);
            } catch (DatabaseNotFoundException e) {
                // ignored
            }
        }
        ObjectIdentityBdbManualCache<V> oic = new ObjectIdentityBdbManualCache<V>();
        oic.initialize(bdbEnvironment, dbName, valueClass, classCatalog);
        oiCaches.put(dbName, oic);
        return oic;
    }
  
    public <V extends IdentityCacheable> ObjectIdentityCache<V> getObjectCache(String dbName, boolean recycle,
            Class<V> valueClass) 
    throws DatabaseException {
        return getObjectCache(dbName, recycle, valueClass, valueClass);
    }
    
    /**
     * Get an ObjectIdentityCache, backed by a BDB Database of the given 
     * name, with objects of the given valueClass type. If 'recycle' is
     * true, reuse values already in the database; otherwise start with 
     * an empty cache. 
     * 
     * @param <V>
     * @param dbName
     * @param recycle
     * @param valueClass
     * @return
     * @throws DatabaseException
     */
    public <V extends IdentityCacheable> ObjectIdentityCache<V> getObjectCache(String dbName, boolean recycle,
            Class<V> declaredClass, Class<? extends V> valueClass) 
    throws DatabaseException {
        @SuppressWarnings("unchecked")
        ObjectIdentityCache<V> oic = oiCaches.get(dbName);
        if(oic!=null) {
            return oic; 
        }
        oic =  getOIBCCache(dbName, recycle, valueClass);
        return oic; 
    }

After that comes checkpointing and recovery from a checkpoint:

    public void doCheckpoint(Checkpoint checkpointInProgress) throws IOException {
        // First sync objectCaches
        for (@SuppressWarnings("rawtypes") ObjectIdentityCache oic : oiCaches.values()) {
            oic.sync();
        }

        try {
            // sync all databases
            for (DatabasePlusConfig dbc: databases.values()) {
                dbc.database.sync();
            }
        
            // Do a force checkpoint.  Thats what a sync does (i.e. doSync).
            CheckpointConfig chkptConfig = new CheckpointConfig();
            chkptConfig.setForce(true);
            
            // Mark Hayes of sleepycat says:
            // "The default for this property is false, which gives the current
            // behavior (allow deltas).  If this property is true, deltas are
            // prohibited -- full versions of internal nodes are always logged
            // during the checkpoint. When a full version of an internal node
            // is logged during a checkpoint, recovery does not need to process
            // it at all.  It is only fetched if needed by the application,
            // during normal DB operations after recovery. When a delta of an
            // internal node is logged during a checkpoint, recovery must
            // process it by fetching the full version of the node from earlier
            // in the log, and then applying the delta to it.  This can be
            // pretty slow, since it is potentially a large amount of
            // random I/O."
            // chkptConfig.setMinimizeRecoveryTime(true);
            bdbEnvironment.checkpoint(chkptConfig);
            LOGGER.fine("Finished bdb checkpoint.");
        
            DbBackup dbBackup = new DbBackup(bdbEnvironment);
            try {
                dbBackup.startBackup();
                
                File envCpDir = new File(dir.getFile(),checkpointInProgress.getName());
                org.archive.util.FileUtils.ensureWriteableDirectory(envCpDir);
                File logfilesList = new File(envCpDir,"jdbfiles.manifest");
                String[] filedata = dbBackup.getLogFilesInBackupSet();
                for (int i=0; i<filedata.length;i++) {
                    File f = new File(dir.getFile(),filedata[i]);
                    filedata[i] += ","+f.length();
                    if(getUseHardLinkCheckpoints()) {
                        File hardLink = new File(envCpDir,filedata[i]);
                        if (!FilesystemLinkMaker.makeHardLink(f.getAbsolutePath(), hardLink.getAbsolutePath())) {
                            LOGGER.log(Level.SEVERE, "unable to create required checkpoint link "+hardLink); 
                        }
                    }
                }
                FileUtils.writeLines(logfilesList,Arrays.asList(filedata));
                LOGGER.fine("Finished processing bdb log files.");
            } finally {
                dbBackup.endBackup();
            }
        } catch (DatabaseException e) {
            throw new IOException(e);
        }
    }
    
    @SuppressWarnings("unchecked")
    protected void doRecover() throws IOException {
        File cpDir = new File(dir.getFile(),recoveryCheckpoint.getName());
        File logfilesList = new File(cpDir,"jdbfiles.manifest");
        List<String> filesAndLengths = FileUtils.readLines(logfilesList);
        HashMap<String,Long> retainLogfiles = new HashMap<String,Long>();
        for(String line : filesAndLengths) {
            String[] fileAndLength = line.split(",");
            long expectedLength = Long.valueOf(fileAndLength[1]);
            retainLogfiles.put(fileAndLength[0],expectedLength);
            
            // check for files in checkpoint directory; relink to environment as necessary
            File cpFile = new File(cpDir, line);
            File destFile = new File(dir.getFile(), fileAndLength[0]);
            if(cpFile.exists()) {
                if(cpFile.length()!=expectedLength) {
                    LOGGER.warning(cpFile.getName()+" expected "+expectedLength+" actual "+cpFile.length());
                    // TODO: is truncation necessary? 
                }
                if(destFile.exists()) {
                    if(!destFile.delete()) {
                        LOGGER.log(Level.SEVERE, "unable to delete obstructing file "+destFile);  
                    }
                }
                int status = CLibrary.INSTANCE.link(cpFile.getAbsolutePath(), destFile.getAbsolutePath());
                if (status!=0) {
                    LOGGER.log(Level.SEVERE, "unable to create required restore link "+destFile); 
                }
            }
            
        }
        
        IOFileFilter filter = FileFilterUtils.orFileFilter(
                FileFilterUtils.suffixFileFilter(".jdb"), 
                FileFilterUtils.suffixFileFilter(".del"));
        filter = FileFilterUtils.makeFileOnly(filter);
        
        // reverify environment directory is as it was at checkpoint time, 
        // deleting any extra files
        for(File f : dir.getFile().listFiles((FileFilter)filter)) {
            if(retainLogfiles.containsKey(f.getName())) {
                // named file still exists under original name
                long expectedLength = retainLogfiles.get(f.getName());
                if(f.length()!=expectedLength) {
                    LOGGER.warning(f.getName()+" expected "+expectedLength+" actual "+f.length());
                    // TODO: truncate? this unexpected length mismatch
                    // probably only happens if there was already a recovery
                    // where the affected file was the last of the set, in 
                    // which case BDB appends a small amount of (harmless?) data
                    // to the previously-undersized file
                }
                retainLogfiles.remove(f.getName()); 
                continue;
            }
            // file as now-named not in restore set; check if un-".DEL" renaming needed
            String undelName = f.getName().replace(".del", ".jdb");
            if(retainLogfiles.containsKey(undelName)) {
                // file if renamed matches desired file name
                long expectedLength = retainLogfiles.get(undelName);
                if(f.length()!=expectedLength) {
                    LOGGER.warning(f.getName()+" expected "+expectedLength+" actual "+f.length());
                    // TODO: truncate to expected size?
                }
                if(!f.renameTo(new File(f.getParentFile(),undelName))) {
                    throw new IOException("Unable to rename " + f + " to " +
                            undelName);
                }
                retainLogfiles.remove(undelName); 
            }
            // file not needed; delete/move-aside
            if(!f.delete()) {
                LOGGER.warning("unable to delete "+f);
                org.archive.util.FileUtils.moveAsideIfExists(f);
            }
            // TODO: log/warn of ruined later checkpoints? 
        }
        if(retainLogfiles.size()>0) {
            // some needed files weren't present
            LOGGER.severe("Checkpoint corrupt, needed log files missing: "+retainLogfiles);
        }
        
    }
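
The link between doCheckpoint and doRecover is the jdbfiles.manifest file, whose lines have the form "name,length". A small sketch of writing and re-reading that format with java.nio follows. ManifestSketch and its method names are hypothetical, the .jdb file here is a dummy created in a temp directory, and the hard-link and cleanup steps of the real methods are left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: the "name,length" manifest format, without BDB. */
class ManifestSketch {

    /** Build one "name,length" line per file, in the spirit of doCheckpoint. */
    static List<String> toManifestLines(List<Path> logFiles) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Path p : logFiles) {
            lines.add(p.getFileName() + "," + Files.size(p));
        }
        return lines;
    }

    /** Parse lines back into name -> expected length, in the spirit of doRecover. */
    static Map<String, Long> parseManifest(List<String> lines) {
        Map<String, Long> expected = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            expected.put(parts[0], Long.parseLong(parts[1]));
        }
        return expected;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("manifest-sketch");
        Path log = dir.resolve("00000000.jdb");
        Files.write(log, "hello".getBytes(StandardCharsets.US_ASCII)); // 5 bytes

        Path manifest = dir.resolve("jdbfiles.manifest");
        Files.write(manifest, toManifestLines(Collections.singletonList(log)),
                StandardCharsets.UTF_8);

        Map<String, Long> expected =
                parseManifest(Files.readAllLines(manifest, StandardCharsets.UTF_8));
        System.out.println(expected); // {00000000.jdb=5}
    }
}
```

Recording the length next to each name is what lets doRecover warn when a log file has grown or shrunk since the checkpoint was taken.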

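Finally there is the getStoredMap(String dbName, Class<K> keyClass, Class<V> valueClass, boolean allowDuplicates, boolean usePriorData) method. It creates a temporary DisposableStoredSortedMap<K,V> object (which extends je's StoredSortedMap; think of it as a temporary map container stored in a BDB database, wrapped by StoredSortedMap). The Class<K> keyClass and Class<V> valueClass parameters give the key and value types. If dbName is null a name is synthesized, and the map's dispose() also removes its entry from the databases map.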

    /**
     * Creates a database-backed TempStoredSortedMap for transient 
     * reporting requirements. Calling the returned map's destroy()
     * method when done discards the associated Database. 
     * 
     * @param <K>
     * @param <V>
     * @param dbName Database name to use; if null a name will be synthesized
     * @param keyClass Class of keys; should be a Java primitive type
     * @param valueClass Class of values; may be any serializable type
     * @param allowDuplicates whether duplicate keys allowed
     * @return
     */
    public <K,V> DisposableStoredSortedMap<K, V> getStoredMap(String dbName, Class<K> keyClass, Class<V> valueClass, boolean allowDuplicates, boolean usePriorData) {
        BdbConfig config = new BdbConfig(); 
        config.setSortedDuplicates(allowDuplicates);
        config.setAllowCreate(!usePriorData); 
        Database mapDb;
        if(dbName==null) {
            dbName = "tempMap-"+System.identityHashCode(this)+"-"+sn;
            sn++;
        }
        final String openName = dbName; 
        try {
            mapDb = openDatabase(openName,config,usePriorData);
        } catch (DatabaseException e) {
            throw new RuntimeException(e); 
        } 
        EntryBinding<V> valueBinding = TupleBinding.getPrimitiveBinding(valueClass);
        if(valueBinding == null) {
            valueBinding = new SerialBinding<V>(classCatalog, valueClass);
        }
        DisposableStoredSortedMap<K,V> storedMap = new DisposableStoredSortedMap<K, V>(
                mapDb,
                TupleBinding.getPrimitiveBinding(keyClass),
                valueBinding,
                true) {
                    @Override
                    public void dispose() {
                        super.dispose();
                        DatabasePlusConfig dpc = BdbModule.this.databases.remove(openName);
                        if (dpc == null) {
                            BdbModule.LOGGER.log(Level.WARNING,"No such database: " + openName);
                        }
                    }
        };
        return storedMap; 
    }
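
The self-deregistering dispose() is a small pattern of its own: the factory returns an anonymous subclass whose dispose() also removes the entry from the owning registry. A BDB-free sketch follows. DisposableSketch, Resource and registeredCount are hypothetical names, and the synchronized create() is my own simplification.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: a resource that removes itself from its owner's registry. */
class DisposableSketch {

    /** Hypothetical resource standing in for DisposableStoredSortedMap. */
    static class Resource {
        boolean disposed;
        void dispose() { disposed = true; }
    }

    private final Map<String, Resource> registry = new ConcurrentHashMap<>();
    private int sn = 0; // serial number for synthesized names

    /** Create and register a resource; a null name gets a synthesized one. */
    synchronized Resource create(String nameOrNull) {
        final String name = (nameOrNull != null)
                ? nameOrNull
                : "tempMap-" + System.identityHashCode(this) + "-" + (sn++);
        Resource resource = new Resource() {
            @Override
            void dispose() {
                super.dispose();
                registry.remove(name); // forget the entry so the name can be reused
            }
        };
        registry.put(name, resource);
        return resource;
    }

    int registeredCount() {
        return registry.size();
    }

    public static void main(String[] args) {
        DisposableSketch owner = new DisposableSketch();
        Resource temp = owner.create(null);
        System.out.println(owner.registeredCount()); // 1
        temp.dispose();
        System.out.println(owner.registeredCount()); // 0
    }
}
```

The name has to be captured in a final local variable because the anonymous subclass reads it later, which is the same reason the Heritrix method copies dbName into openName.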
    

This analysis still leaves plenty of open questions; we'll pick them up in later posts.

---------------------------------------------------------------------------

This Heritrix 3.1.0 source analysis series is my original work.

Please credit the source when reposting: 博客园 (cnblogs), 刺猬的温驯

Link to this post: http://www.cnblogs.com/chenying99/archive/2013/04/14/3019757.html 

posted on 2013-04-19 04:37  刺猬的温驯