C++ Internals: STL之Map

概述

Map是标准关联式容器（associative container）之一，一个map是一个键值对序列，即（key ,value）对。它提供基于key的快速检索能力，在一个map中key值是唯一的。map提供双向迭代器，即有从前往后的（iterator），也有从后往前的（reverse_iterator）。

map要求能对key进行<操作，且保持按key值递增有序，因此map上的迭代器也是递增有序的。如果对于元素并不需要保持有序，可以使用hash_map。

map中key值是唯一的，如果马匹中已存在一个键值对(昵称,密码):("skynet",407574364)，而我们还想插入一个键值对("skynet",472687789)则会报错（不是报错，准确的说是，返回插入不成功！）。而我们又的确想这样做，即一个键对应多个值，幸运的是multimap可是实现这个功能。

下面我们用实例来深入介绍map、multimap，主要内容如下：

1、例子引入
2、map中的类型定义
3、map中的迭代器和键值对
4、map中的构造函数与析构函数
5、map中的操作方法
6、再议map的插入操作
7、[]不仅插入
8、multimap
9、总结

1、例子引入

有一个服务器manager维护着接入服务器的client信息，包括clinetId、scanRate、socketAddr等等。我们定义一个结构体保存scanRate、socketAddr信息。如下：

1
typedef    int    clientId; 

2
typedef struct{ 

3
    int scanRate; 

4
    string socketAddr; 

5
}clientInfo;

我们用map保存这些信息：clientId为键key，clientInfo为值。这样我们可以通过clientId快速检索到client的相关信息，我们可以这样定义：

`1`	`map<clientId,clientInfo> clientMap;`

这样我们定义了一个clientMap，如果我们要定义多个这样的map，需要多次写map<clientId,clientInfo> 变量名。为了避免这样情况，我们通常为map<clientId,clientInfo>定义个别名，如：

`1`	`typedef` `map<clientId,clientInfo> clientEdp;`

`2`	`clientEdp clientMap;`

之后我们就可以像定义clientMap一样定义map<clientId,clientInfo>对象，这样的好处还有：如果我们需要修改map的定义，只需要在一处修改即可，避免修改不彻底造成的不一致现象。

我们这就完成了需要的map的定义，如果不定义或没有在它上面的操作的话，就像定义类而没有方法一样，意义不大或毫无意义。幸运的是，STL提供了这些常用操作：排序（注：map是不能也不要排序的，因为map本身已经排好序了）、打印、提取子部分、移除元素、添加元素、查找对象，就像数据库的增删改查操作！现在我们详细介绍这些操作，并逐步引入hash_map、multimap。

2、map中的类型定义

关联数组（associative array）是最有用的用户定义类型之一，经常内置在语言中用于文本处理等。一个关联数组通常也称为map，有时也称字典（dictionary），保存一对值。第一个值称为key、第二个称为映射值mapped-value。

标准map是定义在std命名空间中的一个模板，并表示为<map>。它首先定义了一组标准类型名字：

01
template<class Key,class T,class Cmp=less<key>, 

02
    class A=allocator<pair<const Key,T>> 

03
class std::map 

04
{ 

05
public: 

06
    //types 

07
    typedef Key    key_type; 

08
    typedef T    mapped_type; 

09
  
10
    typedef pair<const Key,T>    value_type; 

11
  
12
    typedef    Cmp    key_compare; 

13
    typedef A    allocator_type; 

14
  
15
    typedef    typename    A::reference    reference; 

16
    typedef    typename    A::const_reference    const_reference; 

17
  
18
    typedef    implementation_define1    iterator; 

19
    typedef implementation_define2    const_iterator; 

20
  
21
    typedef    typename    A::size_type    size_type; 

22
    typedef    typename    A::difference_type    difference_type; 

23
  
24
    typedef    std::reverse_iterator<iterator>    reverse_iterator; 

25
    typedef    std::reverse_iterator<const_iterator>    const_reverse_iterator; 

26
    //... 

27
}

注意：map的value_type是一个(key,value)对，映射值的被认为是mapped_type。因此，一个map是一个pair<const Key,mapped_type>元素的序列。从const Key可以看出，map中键key是不可修改的。

不得不提的是map定义中Cmp和A都是可选项。Cmp是定义在元素之间的比较方法，默认是<操作；A即allocator用来分配和释放map总键值对所需使用的内存，没有指定的话即默认使用的是STL提供的，也可以自定义allocator来管理内存的使用。多数情况，我们不指定这两个选项而使用默认值，这样我们定义map就像下面这样：

`1`	`map<int,clientInfo> clientMap;`

Cmp和A都缺省。通常，实际的迭代器是实现定义的，因为map很像使用了树的形式，这些迭代器通常提供树遍历的某种形式。逆向迭代器是使用标准的reverse_iterator模板构造的。

3、map中的迭代器和键值对

map提供惯常的返回迭代器的一组函数，如下所示：

01
template<class Key,class T,class Cmp=less<key>, 

02
    class A=allocator<pair<const Key,T>> 

03
class std::map 

04
{ 

05
public:     

06
    //... 

07
    //iterators 

08
    iterator    begin(); 

09
    const_iterator    begin()    const; 

10
  
11
    iterator    end(); 

12
    const_iterator    end()    const; 

13
  
14
    reverse_iterator    rbegin(); 

15
    const_reverse_iterator    rbegin()    const; 

16
  
17
    reverse_iterator    rend(); 

18
    const_reverse_iterator    rend()    const; 

19
    //... 

20
}

map上的迭代器是pair<const Key,mapped_type>元素序列上简单的迭代。例如，我们可能需要打印出所有的客户端信息，像下面的程序这样。为了实现这个，我们首先向《例子引入》中定义的clientEdp中插入数据，然后打印出来：

01
#include<iostream> 

02
#include<map> 

03
#include<string> 

04
using namespace std; 

05
  
06
typedef    int    clientId; 

07
typedef struct{ 

08
    int scanRate; 

09
    string socketAddr; 

10
}clientInfo; 

11
  
12
int main(int argc,char** argv) 

13
{ 

14
    typedef map<clientId,clientInfo> clientEdp; 

15
    typedef map<clientId,clientInfo>::const_iterator iterator; 

16
  
17
    clientEdp clients; 

18
    clientInfo client[100]; 

19
    char str[10]; 

20
    string strAddr("socket addr client "); 

21
  
22
    for(int i=0;i<100;i++) 

23
    { 

24
        client[i].scanRate=i+1;     

25
        //convert int to char* 

26
        itoa(i+1,str,10); 

27
        //concatenate strAddr and str 

28
        client[i].socketAddr=strAddr+str;  

29
        cout<<client[i].socketAddr<<endl; 

30
        clients.insert( 

31
            make_pair(i+1,client[i]));     

32
    } 

33
    delete str; 

34
    for(iterator i=clients.begin();i!=clients.end();i++) 

35
    { 

36
        cout<<"clientId:"<<i->first<<endl; 

37
        cout<<"scanRate:"<<i->second.scanRate<<endl; 

38
        cout<<"socketAddr:"<<i->second.socketAddr<<endl; 

39
        cout<<endl; 

40
    } 

41
}

一个map迭代器以key升序方式表示元素，因此客户端信息以cliendId升序的方式输出。运行结果可以证明这一点，运行结果如下所示：

图1、程序运行结果

我们以first引用键值对的key，以second引用mapped value，且不用管key和mapped value是什么类型。其实pair在std的模板中是这样定义的：

01
template <class    T1,class T2>struct std::pair{ 

02
    typedef    T1    first_type; 

03
    typedef    T2    second_type; 

04
  
05
    T1    first; 

06
    T2    second; 

07
  
08
    pair():first(T1()),second(T2()){} 

09
    pair(const T1& x,const T2& y):first(x),second(y){} 

10
    template<class U,class V> 

11
    pair(const pair<U,V>& p):first(p.first),second(p.second){} 

12
}

即map中，key是键值对的第一个元素且mapped value是第二个元素。pair的定义可以在<utility>中找到，pair提供了一个方法方便创建键值对：

1
template <class T1,class T2>pair<T1,T2> 

2
    std::make_pair(const T1& t1,const T2& t2) 

3
{ 

4
    return pair<T1,T2>(t1,t2); 

5
}

上面的例子中我们就用到了这个方法来创建(clientId,clientInfo)对，并作为Insert()方法的参数。每个pair默认初始化每个元素的值为对应类型的默认值。

4、map中的构造函数与析构函数

map类惯常提供了构造函数和析构函数，如下所示：

01
template<class Key,class T,class Cmp=less<key>, 

02
    class A=allocator<pair<const Key,T>> 

03
class std::map 

04
{ 

05
    //... 

06
    //construct/copy/destroy 

07
    explicit map(const Cmp&=Cmp(),const A&=A()); 

08
    template<class In>map(In first,In last, 

09
        const Com&=Cmp(),const A&=A()); 

10
    map(const map&); 

11
  
12
    ~map(); 

13
  
14
    map& operator=(const map&); 

15
    //... 

16
}

复制一个容器意味着为它的每个元素分配空间，并拷贝每个元素值。这样做是性能开销是很大的，应该仅当需要的时候才这样做。因此，map传的是引用。

5、map中的操作方法

前面我们已经说过，如果map中仅定义了一些key、mapped value类型的信息而没有操作方法，就如定义个仅有字段的类意义不大甚至毫无意义。由此可见map中定义操作方法非常重要！前面的例子我们就用到了不少方法，如返回迭代器的方法begin()、end()，键值对插入方法insert()。下面我们对map中的操作方法做个全面的介绍：

01
template<class Key,class T,class Cmp=less<key>, 

02
    class A=allocator<pair<const Key,T>> 

03
class std::map 

04
{ 

05
    //... 

06
    //map operations 

07
  
08
    //find element with key k 

09
    iterator find(const key_type& k); 

10
    const_iterator find(const key_type& k) const; 

11
  
12
    //find number of elements with key k 

13
    size_type count() const; 

14
  
15
    //find first element with key k 

16
    iterator lower_bound(const key_type& k); 

17
    const_iterator lower_bound(const key_type& k) const; 

18
  
19
    //find first element with key greater than k 

20
    iterator upper_bound(const key_type& k); 

21
    const_iterator upper_bound(const key_type& k) const; 

22
  
23
    //insert pair(key,value) 

24
    pair<iterator,bool>insert(const value_type& val); 

25
    iterator insert(iterator pos,const value_type& val); 

26
    template<class In>void insert(In first,In last); 

27
  
28
    //erase element 

29
    void erase(iterator pos); 

30
    size_type erase(const key_type& k); 

31
    void erase(iterator first,iterator last); 

32
    void clear(); 

33
  
34
    //number os elements 

35
    size_type size() const; 

36
  
37
    //size of largest possible map 

38
    size_type max_size() const; 

39
  
40
    bool empty() const{return size()==0;} 

41
  
42
    void swap(map&); 

43
    //... 

44
}

上面这些方法基本都能顾名思义（PS.由此可见，命名有多重要，我们平时要养成好的命名习惯，当然注释也必不可少！）。虽然已经非常清楚了了，但我还是想讲解一下以消除不惜要的误解和更好地应用这些方法。

find(k)方法简单地返回键值为k的元素的迭代器；如果没有元素的键值为k，则返回map的end()迭代器。由于map是按键key升序排列，所有查找的复杂度只有O(logN)。因此，我们通常会这样用这个方法：

view source

print ?

01 #include<iostream>

02 #include<map>

03 #include<string>

04 using namespace std;

05

06 typedef int clientId;

07 typedef struct{

08 int scanRate;

09 string socketAddr;

10 }clientInfo;

11

12 int main(int argc,char** argv)

13 {

14 typedef map<clientId,clientInfo> clientEdp;

15 typedef map<clientId,clientInfo>::const_iterator iterator;

16

17 clientEdp clients;

18 clientInfo client[100];

19 char* str=new char[10];

20 string strAddr("socket addr client ");

21

22 for(int i=0;i<100;i++)

23 {

24 client[i].scanRate=i+1;

25 //convert int to char*

26 itoa(i+1,str,10);

27 //concatenate strAddr and str

28 client[i].socketAddr=strAddr+str;

29 clients.insert(

30 make_pair(i+1,client[i]));

31 }

32 delete str;

33  clientId id=10;

34 iterator i=clients.find(id);

35 if(i!=clients.end()){

36 cout<<"clientId: "<<id

37 <<" exists in clients"<<endl;

38 }

39 else{

40 cout<<"clientId: "<<id

41 <<" doesn't exist in clients"<<endl;

42 }

43 }
insert()方法试图将一个（Key,T）键值对加入map。因为键时唯一的，所以仅当map中不存在键值为k的键值对时插入才成功。该方法的返回值为pair<iterator,bool>，如果插入成功bool值为TRUE，iterator指向插入map中后的键值对。如下代码：

view source

print ?

01 #include<iostream>

02 #include<map>

03 #include<string>

04 using namespace std;

05

06 typedef int clientId;

07 typedef struct{

08 int scanRate;

09 string socketAddr;

10 }clientInfo;

11

12 int main(int argc,char** argv)

13 {

14 typedef map<clientId,clientInfo> clientEdp;

15 typedef map<clientId,clientInfo>::const_iterator iterator;

16

17 clientEdp clients;

18

19 clientId id=110;

20 clientInfo cltInfo;

21 cltInfo.scanRate=10;

22 cltInfo.socketAddr="110";

23 pair<clientId,clientInfo> p110(id,cltInfo);

24 pair<iterator,bool> p=clients.insert(p110);

25 if(p.second){

26 cout<<"insert success!"<<endl;

27 }

28 else{

29 cout<<"insert failed!"<<endl;

30 }

31 //i points to clients[110];

32 iterator i=p.first;

33 cout<<i->first<<endl;

34 cout<<i->second.scanRate<<endl;

35 cout<<i->second.socketAddr<<endl;

36 }

上面我们看出，这里我们插入键值对是首先声明一个键值对pair<clientId,clientInfo> p110(id,cltInfo); 然后再插入，这个我们之前make_pair方法不一样，make_pair方法用的比较多。

erase()方法用法比较简单，比如像清除clientId为110的键值对，我们只需要对clients调用erase方法：clients.erase(clients.find(110));或者我们想清除clientId从1到10的键值对，我们可以这样调用erase()方法：clients.erase(clients.finds(1),clients.find(10));简单吧！别得意，你还需要注意，如果find(k)返回的是end()，这样调用erase()方法则是一个严重的错误，会对map造成破坏操作。

6、再议map的插入操作

前面我们介绍了利用map的插入方法insert()，声明键值对pair或make_pair生成键值对然后我们可以轻松的将键值对插入map中。其实map还提供了更方便的插入操作利用下标（subscripting，[]）操作，如下：

1
clientInfo cltInfo; 

2
cltInfo.scanRate=10; 

3
cltInfo.socketAddr="110"; 

4
  
5
<B>clients[110]=cltInfo;</B>

这样我们就可以简单地将键值对插入到map中了。下标操作在map中式这样定义的：

1
template<class Key,class T,class Cmp=less<key>, 

2
    class A=allocator<pair<const Key,T>> 

3
class std::map 

4
{ 

5
    //... 

6
    //access element with key k 

7
    mapped_type& operator[](const key_type& k); 

8
    //... 

9
}

我们来分析一下应用[]操作，插入键值对的过程：检查键k是否已经在map里。如果不，就添加上，以v作为它的对应值。如果k已经在map里，它的关联值被更新成v。这里首先，查找110不在map中则创建一个键为110的键值对，并将映射值设为默认值，这里scanRate为0，socketAddr为空；然后将映射值赋为cltInfo。如果110在map中已经存在的话，则只是更新以110为键的映射值。

从上面的分析可知：如果大量这样插入数据，会严重影响效率！如果你考虑效率问题，请使用insert操作。insert方法，节省了三次函数调用：一个建立临时的默认映射值的对象，一个销毁那个临时的对象和一个对映射值的赋值操作。

Note1：如果k已经存在map中，[]效率反而比insert的效率高，而且更美观！如果能够兼顾这两者那岂不是很美妙！其实我们重写map中的[]操作：首先判断k是否已经在map中，如果没有则调用insert操作，否则调用内置的[]操作。如下列代码：

01
////////////////////////////////////////////// 

02
///@param MapType-map的类型参数 

03
///@param KeyArgType-键的类型参数 

04
///@param ValueArgtype-映射值的类型参数 

05
///@return 迭代器，指向键为k的键值对 

06
////////////////////////////////////////////// 

07
template<typename MapType, 

08
        typename KeyArgType, 

09
        typename ValueArgtype> 

10
typename MapType::iterator 

11
    efficientAddOrUpdate(MapType& m, 

12
            const KeyArgType& k, 

13
            const ValueArgtype& v) 

14
{ 

15
    typename MapType::iterator Ib =    m.lower_bound(k); 

16
    if(Ib != m.end()&&!(m.key_comp()(k,Ib->first))) { 

17
        //key已经存在于map中做更新操作 

18
        Ib->second = v;     

19
        return Ib;             

20
    } 

21
    else{ 

22
        //key不存在map中做插入操作 

23
        typedef typename MapType::value_type MVT; 

24
        return m.insert(Ib, MVT(k, v));     

25
    }                     

26
}

Note2：我们视乎还忽略了一点，如果映射值mapped value的类型没有默认值，怎么办？这种情况请勿使用[]操作插入。

7、[]不仅插入

通过[]操作不仅仅是插入键值对，我们也可以通过键key检索出映射值mapped value。而且我们利用[]操作可以轻松地统计信息，如有这样这样一些键值对（book-name，count）对：

(book1,1)、(book2,2)、(book1,2)、(book3,1)、(book3,5)

我们计算每种book的数量总和。我们可以这样做：将它们读入一个map<string,int>：

01
#include<iostream> 

02
#include<map> 

03
#include<string> 

04
using namespace std; 

05
  
06
int main(int argc,char** argv) 

07
{ 

08
    map<string,int> bookMap; 

09
  
10
    string book; 

11
    int count; 

12
    int total=0; 

13
    while(cin>>book>>count) 

14
        bookMap[book]+=count; 

15
  
16
    map<string,int>::iterator i; 

17
    for(i=bookMap.begin();i!=bookMap.end();i++) 

18
    { 

19
        total+=i->second; 

20
        cout<<i->first<<'\t'<<i->second<<endl; 

21
    } 

22
    cout<<"total count:"<<total<<endl; 

23
}

结果如下所示：（注意按住ctrl+z键结束输入）

图2、程序运行结果

8、multimap

前面介绍了map，可以说已经非常清晰了。如果允许clientId重复的话，map就无能为力了，这时候就得multimap上场了！multimap允许键key重复，即一个键对应多个映射值。其实除此之外，multimap跟map是很像的，我们接下来在map的基础上介绍multimap。

multimap在std中的定义跟map一样只是类名为multimap，multimap几乎有map的所有方法和类型定义。

multimap不支持[]操作；但map支持
multimap的insert方法返回的是一个迭代器iterator，没有bool值；而map值（iterator，bool）的元素对

对应equal_range()、方法：

pair<iterator,iterator> equal_range(const key_type& k);pair<const_iterator,const_iterator>equal_range(const key_type& k) const;//find first element with key kiterator lower_bound(const key_type& k);const_iterator lower_bound(const key_type& k) const;//find first element with key greater than kiterator upper_bound(const key_type& k);const_iterator upper_bound(const key_type& k) const;

虽然在map和multimap都有，显然对multimap有更多的意义！equal_range()方法返回一个键key对应的多个映射值的上界和下界的键值对的迭代器、lower_bound()方法返回键multimap中第一个箭为key的键值对迭代器、upper_bound()方法返回比key大的第一个键值对迭代器。

假设我们想取出键为key的所有映射值，我们可以这样做：

01
#include<iostream> 

02
#include<map> 

03
#include<string> 

04
using namespace std; 

05
  
06
typedef int clientId; 

07
typedef struct{ 

08
    int scanRate; 

09
    string socketAddr; 

10
}clientInfo; 

11
  
12
int main(int argc,char** argv) 

13
{ 

14
    typedef multimap<clientId,clientInfo> clientEdp; 

15
    typedef multimap<clientId,clientInfo>::const_iterator iterator; 

16
      
17
    clientEdp clients; 

18
    clientInfo client[20]; 

19
    char* str=new char[10]; 

20
    string strAddr("socket addr client "); 

21
  
22
    for(int i=0;i<10;i++) 

23
    { 

24
        client[i].scanRate=i+1;     

25
        //convert int to char* 

26
        itoa(i+1,str,10); 

27
        //concatenate strAddr and str 

28
        client[i].socketAddr=strAddr+str; 

29
        clients.insert( 

30
            make_pair(10,client[i]));     

31
    } 

32
    for(int i=10;i<20;i++) 

33
    { 

34
        client[i].scanRate=i+1;     

35
        //convert int to char* 

36
        itoa(i+1,str,10); 

37
        //concatenate strAddr and str 

38
        client[i].socketAddr=strAddr+str; 

39
        clients.insert( 

40
            make_pair(i+1,client[i]));     

41
    } 

42
    delete str,strAddr; 

43
      
44
<B>    //find elements with key 10 

45
    iterator lb=clients.lower_bound(10); 

46
    iterator ub=clients.upper_bound(10);</B> 

47
    for(iterator i=lb;i!=ub;i++) 

48
    { 

49
        cout<<"clientId:"<<i->first<<endl; 

50
        cout<<"scanRate:"<<i->second.scanRate<<endl; 

51
        cout<<"socketAddr:"<<i->second.socketAddr<<endl; 

52
        cout<<endl; 

53
    } 

54
}

（说明：实际上，一般是不允许clientId重复的，这里只是为了举例。）这样是不是感觉很丑呢！事实上，我们可以更简单的这样：

1
//find elements with key 10 

2
<B>pair<iterator,iterator> p=clients.equal_range(10);</B> 

3
for(iterator i=p.first;i!=p.second;i++) 

4
{ 

5
    cout<<"clientId:"<<i->first<<endl; 

6
    cout<<"scanRate:"<<i->second.scanRate<<endl; 

7
    cout<<"socketAddr:"<<i->second.socketAddr<<endl; 

8
    cout<<endl; 

9
}

总结

map是一类关联式容器。它的特点是增加和删除节点对迭代器的影响很小，除了那个操作节点，对其他的节点都没有什么影响。对于迭代器来说，可以修改实值，而不能修改key。

map的功能：

自动建立Key －value的对应。key 和value可以是任意你需要的类型。
根据key值快速查找记录，查找的复杂度基本是Log(N)。
快速插入Key - Value 记录。
快速删除记录
根据Key 修改value记录。
遍历所有记录。

展望：本文不知不觉写了不少字了，但仍未深入涉及到map定义的第3个和第4个参数，使用的都是默认值。

template<class Key,class T,class Cmp=less<key>,
class A=allocator<pair<const Key,T>>

感兴趣者，请查找相关资料or下面留言希望看到单独开篇介绍map第3个和第4个参数。您的支持，我的动力！PS：在此文的原因，在与公司做项目用到了map特此总结出来与大家共享，不过在进行个人总结过程中，难免会有疏漏或不当之处，请不吝指出。

参考文献：

【1】《The C++ Programming Language (Special Edition)》

【2】《Effective STL》

posted @ 2010-07-05 22:01 bjf 阅读(721) 评论(0) 编辑收藏举报

刷新页面返回顶部

阳光