POCO C++ Library Study and Analysis -- Cache

1. Cache Overview

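In std::map or std::set, the container has no upper size limit and the number of elements can keep growing. Elements in STL containers also never expire on their own; they stay until they are explicitly removed. Poco's Cache can be seen as an extension of the STL containers in which elements expire (become invalid) automatically. The Cache framework in Poco has two basic expiration strategies: LRU (Least Recently Used) and time-based expiration. On top of these two, it also provides combinations of both.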

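Below are the related classes:
1. LRUCache: least-recently-used cache. It maintains a maximum capacity M internally and never keeps more than M elements. When the (M+1)-th element is inserted, the least recently used element is invalidated.
2. ExpireCache: time-expiration cache. A single expiration time T is managed for the whole cache; an element expires once time T has elapsed since it was inserted.
3. AccessExpireCache: time-expiration cache. Unlike ExpireCache, an element's timeout restarts whenever the element is accessed, rather than being counted only from insertion.
4. UniqueExpireCache: time-expiration cache. Unlike ExpireCache, each element has its own expiration time.
5. UniqueAccessExpireCache: time-expiration cache. Unlike AccessExpireCache, each element has its own expiration time.
6. ExpireLRUCache: a hybrid of time expiration and LRU. An element is invalidated as soon as either the time condition or the LRU condition is triggered.
7. AccessExpireLRUCache: a hybrid of time expiration and LRU. Compared with ExpireLRUCache, an element's timeout restarts whenever the element is accessed, rather than being counted only from insertion.
8. UniqueExpireLRUCache: a hybrid of time expiration and LRU. Compared with ExpireLRUCache, each element has its own expiration time.
9. UniqueAccessExpireLRUCache: a hybrid of time expiration and LRU. Compared with UniqueExpireLRUCache, an element's timeout restarts whenever the element is accessed, rather than being counted only from insertion.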
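
To make the difference between insertion-based and access-based expiration concrete, here is a minimal sketch using only the standard library. It is not Poco's implementation: the class ToyExpireCache, its members, and the demoExpiry function are hypothetical names, and time is passed in explicitly as integer milliseconds so that the behaviour is deterministic.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy cache, NOT Poco's implementation.
// Time is supplied by the caller (milliseconds) to keep the example deterministic.
class ToyExpireCache
{
public:
    ToyExpireCache(long ttlMs, bool refreshOnAccess):
        _ttlMs(ttlMs), _refreshOnAccess(refreshOnAccess) {}

    void add(const std::string& key, int value, long nowMs)
    {
        _data[key] = Entry{value, nowMs};
    }

    // Returns true and fills 'out' if the entry exists and has not expired.
    bool get(const std::string& key, long nowMs, int& out)
    {
        auto it = _data.find(key);
        if (it == _data.end()) return false;
        if (nowMs - it->second.stampMs >= _ttlMs)
        {
            _data.erase(it);                 // expired: dropped lazily, on access
            return false;
        }
        if (_refreshOnAccess) it->second.stampMs = nowMs; // restart the timer
        out = it->second.value;
        return true;
    }

private:
    struct Entry { int value; long stampMs; };
    long _ttlMs;
    bool _refreshOnAccess;
    std::map<std::string, Entry> _data;
};

void demoExpiry()
{
    ToyExpireCache fixed(100, false);   // timer counts from insertion only
    ToyExpireCache sliding(100, true);  // timer restarts on every access
    int v = 0;
    fixed.add("k", 1, 0);
    sliding.add("k", 1, 0);
    assert(fixed.get("k", 60, v));      // both are still fresh at t = 60
    assert(sliding.get("k", 60, v));    // this access restarts the timer
    assert(!fixed.get("k", 120, v));    // 120 ms since insertion: expired
    assert(sliding.get("k", 120, v));   // only 60 ms since last access: alive
}
```

The first instance behaves roughly like the ExpireCache description above and the second like AccessExpireCache; the "Unique" variants would store a separate timeout in each entry instead of one shared _ttlMs.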

2. Internal Structure of the Cache

2.1 Cache Classes

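Below is the class diagram of the Cache classes in Poco:

[Figure: class diagram of the Poco Cache classes]

From the class diagram we can see that every Cache has a corresponding strategy class. In fact, the strategy class is responsible for quickly finding the expired elements in the Cache. Cache and strategy communicate through Poco's synchronous event mechanism (see POCO C++ Library Study and Analysis -- Notifications and Events (4)).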

Let us look at the definition of AbstractCache:

template <class TKey, class TValue, class TStrategy, class TMutex = FastMutex, class TEventMutex = FastMutex> 
class AbstractCache
	/// An AbstractCache is the interface of all caches. 
{
public:
	FIFOEvent<const KeyValueArgs<TKey, TValue >, TEventMutex > Add;
	FIFOEvent<const KeyValueArgs<TKey, TValue >, TEventMutex > Update;
	FIFOEvent<const TKey, TEventMutex>                         Remove;
	FIFOEvent<const TKey, TEventMutex>                         Get;
	FIFOEvent<const EventArgs, TEventMutex>                    Clear;

	typedef std::map<TKey, SharedPtr<TValue > > DataHolder;
	typedef typename DataHolder::iterator       Iterator;
	typedef typename DataHolder::const_iterator ConstIterator;
	typedef std::set<TKey>                      KeySet;

	AbstractCache()
	{
		initialize();
	}

	AbstractCache(const TStrategy& strat): _strategy(strat)
	{
		initialize();
	}

	virtual ~AbstractCache()
	{
		uninitialize();
	}

        // ...........

protected:
	mutable FIFOEvent<ValidArgs<TKey> > IsValid;
	mutable FIFOEvent<KeySet>           Replace;

	void initialize()
		/// Sets up event registration.
	{
		Add		+= Delegate<TStrategy, const KeyValueArgs<TKey, TValue> >(&_strategy, &TStrategy::onAdd);
		Update	+= Delegate<TStrategy, const KeyValueArgs<TKey, TValue> >(&_strategy, &TStrategy::onUpdate);
		Remove	+= Delegate<TStrategy, const TKey>(&_strategy, &TStrategy::onRemove);
		Get		+= Delegate<TStrategy, const TKey>(&_strategy, &TStrategy::onGet);
		Clear	+= Delegate<TStrategy, const EventArgs>(&_strategy, &TStrategy::onClear);
		IsValid	+= Delegate<TStrategy, ValidArgs<TKey> >(&_strategy, &TStrategy::onIsValid);
		Replace	+= Delegate<TStrategy, KeySet>(&_strategy, &TStrategy::onReplace);
	}

	void uninitialize()
		/// Reverts event registration.
	{
		Add		-= Delegate<TStrategy, const KeyValueArgs<TKey, TValue> >(&_strategy, &TStrategy::onAdd );
		Update	-= Delegate<TStrategy, const KeyValueArgs<TKey, TValue> >(&_strategy, &TStrategy::onUpdate);
		Remove	-= Delegate<TStrategy, const TKey>(&_strategy, &TStrategy::onRemove);
		Get		-= Delegate<TStrategy, const TKey>(&_strategy, &TStrategy::onGet);
		Clear	-= Delegate<TStrategy, const EventArgs>(&_strategy, &TStrategy::onClear);
		IsValid	-= Delegate<TStrategy, ValidArgs<TKey> >(&_strategy, &TStrategy::onIsValid);
		Replace	-= Delegate<TStrategy, KeySet>(&_strategy, &TStrategy::onReplace);
	}

	void doAdd(const TKey& key, const TValue& val)
		/// Adds the key value pair to the cache.
		/// If for the key already an entry exists, it will be overwritten.
	{
		Iterator it = _data.find(key);
		doRemove(it);


		KeyValueArgs<TKey, TValue> args(key, val);
		Add.notify(this, args);
		_data.insert(std::make_pair(key, SharedPtr<TValue>(new TValue(val))));
		
		doReplace();
	}

	void doAdd(const TKey& key, SharedPtr<TValue>& val)
		/// Adds the key value pair to the cache.
		/// If for the key already an entry exists, it will be overwritten.
	{
		Iterator it = _data.find(key);
		doRemove(it);


		KeyValueArgs<TKey, TValue> args(key, *val);
		Add.notify(this, args);
		_data.insert(std::make_pair(key, val));
		
		doReplace();
	}

	void doUpdate(const TKey& key, const TValue& val)
		/// Adds the key value pair to the cache.
		/// If for the key already an entry exists, it will be overwritten.
	{
		KeyValueArgs<TKey, TValue> args(key, val);
		Iterator it = _data.find(key);
		if (it == _data.end())
		{
			Add.notify(this, args);
			_data.insert(std::make_pair(key, SharedPtr<TValue>(new TValue(val))));
		}
		else
		{
			Update.notify(this, args);
			it->second = SharedPtr<TValue>(new TValue(val));
		}
		
		doReplace();
	}

	void doUpdate(const TKey& key, SharedPtr<TValue>& val)
		/// Adds the key value pair to the cache.
		/// If for the key already an entry exists, it will be overwritten.
	{
		KeyValueArgs<TKey, TValue> args(key, *val);
		Iterator it = _data.find(key);
		if (it == _data.end())
		{
			Add.notify(this, args);
			_data.insert(std::make_pair(key, val));
		}
		else
		{
			Update.notify(this, args);
			it->second = val;
		}
		
		doReplace();
	}

	void doRemove(Iterator it) 
		/// Removes an entry from the cache. If the entry is not found
		/// the remove is ignored.
	{
		if (it != _data.end())
		{
			Remove.notify(this, it->first);
			_data.erase(it);
		}
	}

	bool doHas(const TKey& key) const
		/// Returns true if the cache contains a value for the key
	{
		// ask the strategy if the key is valid
		ConstIterator it = _data.find(key);
		bool result = false;


		if (it != _data.end())
		{
			ValidArgs<TKey> args(key);
			IsValid.notify(this, args);
			result = args.isValid();
		}

		return result;
	}

	SharedPtr<TValue> doGet(const TKey& key) 
		/// Returns a SharedPtr of the cache entry, returns 0 if for
		/// the key no value was found
	{
		Iterator it = _data.find(key);
		SharedPtr<TValue> result;

		if (it != _data.end())
		{	
			// inform all strategies that a read-access to an element happens
			Get.notify(this, key);
			// ask all strategies if the key is valid
			ValidArgs<TKey> args(key);
			IsValid.notify(this, args);

			if (!args.isValid())
			{
				doRemove(it);
			}
			else
			{
				result = it->second;
			}
		}

		return result;
	}

	void doClear()
	{
		static EventArgs _emptyArgs;
		Clear.notify(this, _emptyArgs);
		_data.clear();
	}

	void doReplace()
	{
		std::set<TKey> delMe;
		Replace.notify(this, delMe);
		// delMe contains the to be removed elements
		typename std::set<TKey>::const_iterator it    = delMe.begin();
		typename std::set<TKey>::const_iterator endIt = delMe.end();

		for (; it != endIt; ++it)
		{
			Iterator itH = _data.find(*it);
			doRemove(itH);
		}
	}

	TStrategy          _strategy;
	mutable DataHolder _data;
	mutable TMutex  _mutex;

private:
	// ....
};

From the definition above we can see that AbstractCache is a container of values that stores its data in a map:

mutable std::map<TKey, SharedPtr<TValue > > _data;

AbstractCache also defines a TStrategy object:

TStrategy          _strategy;

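In its initialize() function, AbstractCache registers the TStrategy object as the delegate for the cache's events, so each cache operation is forwarded to the strategy. The events are:
1. Add: an element is added to the container
2. Update: an element in the container is updated
3. Remove: an element is removed from the container
4. Get: an element is read from the container
5. Clear: all elements are removed from the container
6. IsValid: asks whether a given element in the container is still valid
7. Replace: obtains the expired elements from the strategy according to its policy and removes them from both the Cache and the Strategy. This triggers a series of Remove calls.

Of these operations, the most complex is Add, which comprises Remove, Insert, and Replace: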

void doAdd(const TKey& key, SharedPtr<TValue>& val)
	/// Adds the key value pair to the cache.
	/// If for the key already an entry exists, it will be overwritten.
{
	Iterator it = _data.find(key);
	doRemove(it);


	KeyValueArgs<TKey, TValue> args(key, *val);
	Add.notify(this, args);
	_data.insert(std::make_pair(key, val));
		
	doReplace();
}

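The Replace operation is triggered from Add and Update: doAdd and doUpdate both end with a call to doReplace(). doGet, as shown above, only validates the requested element and removes it if it has expired. This is because Cache is not an active object (see POCO C++ Library Study and Analysis -- Threads (4)): it does not mark elements as expired on its own, so expiry has to be triggered from outside, that is, by the caller.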
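
The division of labour between cache and strategy can be sketched as follows. This is a simplified illustration under stated assumptions, not Poco's code: ToyCache and FifoStrategy are hypothetical names, the strategy is called directly rather than through FIFOEvent and Delegate, and a plain first-in-first-out policy stands in for the real strategies. What it keeps from the original is the order of steps in doAdd and the fact that the strategy only reports keys to evict while the cache performs the removal.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <set>
#include <string>

// Hypothetical strategy: tracks keys only and never touches the values.
struct FifoStrategy
{
    std::size_t capacity;
    std::deque<int> order; // front = oldest key

    void onAdd(int key) { order.push_back(key); }

    void onRemove(int key)
    {
        for (auto it = order.begin(); it != order.end(); ++it)
            if (*it == key) { order.erase(it); break; }
    }

    // Only reports which keys should go; the cache performs the removal.
    void onReplace(std::set<int>& toRemove) const
    {
        std::size_t excess = order.size() > capacity ? order.size() - capacity : 0;
        for (std::size_t i = 0; i < excess; ++i) toRemove.insert(order[i]);
    }
};

template <class Strategy>
class ToyCache
{
public:
    explicit ToyCache(const Strategy& s): _strategy(s) {}

    void add(int key, const std::string& val)
    {
        remove(key);                 // 1. drop any existing entry
        _strategy.onAdd(key);        // 2. tell the strategy about the new key
        _data[key] = val;            // 3. store the value
        std::set<int> delMe;         // 4. ask the strategy what to evict
        _strategy.onReplace(delMe);
        for (int k : delMe) remove(k);
    }

    void remove(int key)
    {
        auto it = _data.find(key);
        if (it == _data.end()) return;
        _strategy.onRemove(key);     // keep the strategy's key list in sync
        _data.erase(it);
    }

    bool has(int key) const { return _data.count(key) != 0; }
    std::size_t size() const { return _data.size(); }

private:
    Strategy _strategy;
    std::map<int, std::string> _data;
};

void demoToyCache()
{
    ToyCache<FifoStrategy> cache(FifoStrategy{2, {}});
    cache.add(1, "a");
    cache.add(2, "b");
    cache.add(3, "c");               // capacity exceeded: key 1 is evicted
    assert(cache.size() == 2);
    assert(!cache.has(1));
}
```

Note that eviction happens only inside add(): nothing runs in the background, which is the "not an active object" point made above.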

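Another point worth noting in the Cache class is that it stores a SharedPtr to TValue. This design is for thread safety: the replace operation may be invoked from multiple threads, so an element can be evicted while a caller is still using it. The cache therefore has to return either a SharedPtr to the TValue or a copy of the TValue. Compared with copying, returning a SharedPtr is cheaper.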
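
The lifetime argument can be illustrated with a short sketch. It assumes std::shared_ptr as a stand-in for Poco::SharedPtr, and Holder, lookup, and demoSharedOwnership are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// std::shared_ptr stands in for Poco::SharedPtr in this sketch.
using Holder = std::map<int, std::shared_ptr<std::string>>;

// Hands out shared ownership of the stored value, or a null pointer if absent.
std::shared_ptr<std::string> lookup(const Holder& data, int key)
{
    auto it = data.find(key);
    if (it == data.end()) return std::shared_ptr<std::string>();
    return it->second;               // copies the pointer, not the string
}

void demoSharedOwnership()
{
    Holder data;
    data[1] = std::make_shared<std::string>("Lousy");

    std::shared_ptr<std::string> p = lookup(data, 1);
    assert(p.use_count() == 2);      // owned by the map and by the caller

    data.erase(1);                   // simulate eviction by a replace
    assert(p && *p == "Lousy");      // the caller's value is still alive
    assert(p.use_count() == 1);      // freed once the caller lets go
}
```

Returning the pointer costs a reference-count update, whereas returning a copy would duplicate the whole value on every get.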

2.2 Strategy Classes

The Strategy classes keep the keys of the <key, value> pairs stored in _data in order. Each Strategy holds a container of keys: std::list<TKey> in LRUStrategy, and std::multimap<Timestamp, TKey> in ExpireStrategy, UniqueAccessExpireStrategy, and UniqueExpireStrategy.

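For the LRU strategy, I can understand this design. Every access moves the key to the front of the list. To make access to the list fast, a std::map<TKey, Iterator> is added: each time a key is inserted into the list, the iterator of the inserted position is saved in the map. Locating a key in the list thus goes from O(n) to O(log(n)), because no traversal is needed. Below is the relevant code: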

void onReplace(const void*, std::set<TKey>& elemsToRemove)
{
	// Note: replace only informs the cache which elements
	// it would like to remove!
	// it does not remove them on its own!
	std::size_t curSize = _keyIndex.size();

	if (curSize < _size)
	{
		return;
	}

	std::size_t diff = curSize - _size;
	Iterator it = --_keys.end(); //--keys can never be invoked on an empty list due to the minSize==1 requirement of LRU
	std::size_t i = 0;

	while (i++ < diff) 
	{
		elemsToRemove.insert(*it);
		if (it != _keys.begin())
		{
			--it;
		}
	}
}

LRUStrategy's replace operation only takes effect when curSize exceeds the configured capacity _size. It marks the last (curSize - _size) elements of the list as expired.
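
The list-plus-map technique described above can be sketched with the standard library alone. This is an illustration, not Poco's LRUStrategy: LruIndex and its members are hypothetical names, keys are plain ints, and unlike onReplace, evict() removes the keys itself instead of only reporting them.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <vector>

// Hypothetical sketch of an LRU key index; NOT Poco's LRUStrategy.
class LruIndex
{
public:
    explicit LruIndex(std::size_t capacity): _capacity(capacity)
    {
        assert(capacity >= 1);       // mirrors the minSize == 1 requirement
    }

    // Marks 'key' as most recently used, inserting it if it is new.
    void touch(int key)
    {
        auto found = _pos.find(key); // O(log n) lookup instead of a list scan
        if (found != _pos.end())
        {
            // splice relinks the node in O(1); the stored iterator stays valid
            _keys.splice(_keys.begin(), _keys, found->second);
        }
        else
        {
            _keys.push_front(key);
            _pos[key] = _keys.begin();
        }
    }

    // Removes least recently used keys until at most 'capacity' remain.
    std::vector<int> evict()
    {
        std::vector<int> removed;
        while (_keys.size() > _capacity)
        {
            removed.push_back(_keys.back());
            _pos.erase(_keys.back());
            _keys.pop_back();
        }
        return removed;
    }

    std::size_t size() const { return _keys.size(); }

private:
    std::size_t _capacity;
    std::list<int> _keys;                          // front = most recently used
    std::map<int, std::list<int>::iterator> _pos;  // key -> position in _keys
};
```

For example, with a capacity of 3, touching keys 1, 2, 3, then 1 again, then 4 leaves key 2 at the back of the list, so evict() returns just that key.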

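For the time-based expiration strategies, however, I do not think the same design is a good fit. The time-based strategy classes contain two containers: a std::map<TKey, IndexIterator> and a std::multimap<Timestamp, TKey>. The insertion code is: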

void onAdd(const void*, const KeyValueArgs <TKey, TValue>& args)
{
	Timestamp now;
	IndexIterator it = _keyIndex.insert(typename TimeIndex::value_type(now, args.key()));
	std::pair<Iterator, bool> stat = _keys.insert(typename Keys::value_type(args.key(), it));
	if (!stat.second)
	{
		_keyIndex.erase(stat.first->second);
		stat.first->second = it;
	}
}

We can see that the map stores iterators to the pairs in the multimap. The replace operation is as follows:

void onReplace(const void*, std::set<TKey>& elemsToRemove)
{
	// Note: replace only informs the cache which elements
	// it would like to remove!
	// it does not remove them on its own!
	IndexIterator it = _keyIndex.begin();
	while (it != _keyIndex.end() && it->first.isElapsed(_expireTime))
	{
		elemsToRemove.insert(it->second);
		++it;
	}
}

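We can see that this traverses the multimap. Because the multimap is ordered by timestamp, the loop stops at the first entry that has not yet expired, so the cost is proportional to the number of expired entries, which is O(n) in the worst case.

If so, I think std::map<TKey, IndexIterator> and std::multimap<Timestamp, TKey> could be merged into a single std::map<TKey, Timestamp>, with replace still implemented as a traversal of O(n). Note, though, that such a map is ordered by key rather than by time, so every replace would have to visit all n entries.
For a time-based strategy, O(n) may not be acceptable. I see two possible solutions. First, turn the Cache into an active object that periodically collects expired elements by itself instead of being triggered from outside. This does not make the replace operation itself faster, but it separates replace from external calls such as add, so those calls become faster. Second, maintain several map containers internally and manage objects with different expiration times in groups.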
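
To compare the two layouts discussed above, the following sketch counts how many entries a replace pass has to visit in each. It is an illustration only: TimeIndex, KeyTimes, scanTimeIndex, and scanKeyTimes are hypothetical names, timestamps are plain integer milliseconds rather than Poco::Timestamp, and both functions assume a single shared timeout ttlMs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

using TimeIndex = std::multimap<long, std::string>; // timestamp (ms) -> key
using KeyTimes  = std::map<std::string, long>;      // key -> timestamp (ms)

// Ordered by time: stops at the first entry that is still fresh.
std::size_t scanTimeIndex(const TimeIndex& index, long nowMs, long ttlMs,
                          std::set<std::string>& expired)
{
    std::size_t visited = 0;
    for (auto it = index.begin();
         it != index.end() && nowMs - it->first >= ttlMs; ++it)
    {
        ++visited;
        expired.insert(it->second);
    }
    return visited;
}

// Ordered by key: expired entries can be anywhere, so every entry is checked.
std::size_t scanKeyTimes(const KeyTimes& times, long nowMs, long ttlMs,
                         std::set<std::string>& expired)
{
    std::size_t visited = 0;
    for (const auto& kv : times)
    {
        ++visited;
        if (nowMs - kv.second >= ttlMs) expired.insert(kv.first);
    }
    return visited;
}

void demoScanCost()
{
    TimeIndex index;
    KeyTimes times;
    const char* keys[] = {"a", "b", "c", "d", "e"};
    for (int i = 0; i < 5; ++i)      // inserted at t = 0, 10, 20, 30, 40
    {
        index.emplace(static_cast<long>(i * 10), keys[i]);
        times[keys[i]] = i * 10;
    }

    std::set<std::string> fromIndex, fromMap;
    assert(scanTimeIndex(index, 40, 25, fromIndex) == 2); // only the expired ones
    assert(scanKeyTimes(times, 40, 25, fromMap) == 5);    // every entry
    assert(fromIndex == fromMap);                         // same result either way
}
```

Both scans find the same expired keys; the difference is that the time-ordered index touches only the expired entries, while the key-ordered map always touches all of them. The single-map layout saves memory and bookkeeping at that price.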

3. Overhead

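The Cache classes in Poco are slower than std::map, and the most expensive operation is add. Caches that use a time-expiration strategy are slower than those that use the LRU strategy. Because the Cache classes introduce SharedPtr and Strategy, they also use more memory than std::map. So when a cache is not really needed, a plain map is the better choice.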

4. Example

Below is an example of using a Cache:

#include "Poco/LRUCache.h"
#include <string>

int main()
{
	Poco::LRUCache<int, std::string> myCache(3); // holds at most 3 entries
	myCache.add(1, "Lousy"); // |-1-| -> first elem is the most popular one
	Poco::SharedPtr<std::string> ptrElem = myCache.get(1); // |-1-|
	myCache.add(2, "Morning"); // |-2-1-|
	myCache.add(3, "USA"); // |-3-2-1-|
	// now get rid of the most unpopular entry: "Lousy"
	myCache.add(4, "Good"); // |-4-3-2-|
	poco_assert (*ptrElem == "Lousy"); // content of ptrElem is still valid
	ptrElem = myCache.get(2); // |-2-4-3-|
	// replace the morning entry with evening
	myCache.add(2, "Evening"); // 2 Events: Remove followed by Add
	return 0;
}

posted @ 2013-03-26 11:16 在天与地之间
