关于利用STL实现哈希的问题

下面是map,set,unordered_map,unordered_set的性能分析。

map,内部红黑树,插入复杂度O(logn),查找复杂度O(logn),用键值对应value;

set,内部红黑树,插入复杂度O(logn),查找复杂度O(logn),只有value不存在键值;

unordered_map,内部哈希表,插入复杂度O(1),查找复杂度O(1),但是如果被卡,会很惨,因为最坏情况下会变成链表,两个复杂度都变成O(n);

unordered_set,内部哈希表,插入复杂度O(1),查找复杂度O(1),目前不知道怎么卡,但是有理由相信也可以被卡成O(n);

下面解释下被卡的原因。我们始终假设每个操作(插入,擦除,访问等)的哈希映射为O(1)。但这取决于一个关键的假设,即每个插入或查询平均只会发生O(1)次碰撞。可是当有人在对抗性地为我们的代码设计输入时(又名hack阶段),特别是,如果他们知道我们的哈希函数,则可以轻松地生成大量不同的输入,这些输入全部发生冲突,从而导致O(n^2)爆炸。

我们可以在GitHub上探究gcc的实现:https : //github.com/gcc-mirror/gcc。经过一番搜索后,我们遇到了unordered_map.h。在文件内部,我们可以快速看到unordered_map使用__detail::_Mod_range_hashing和__detail::_Prime_rehash_policy。由此我们可以猜测,映射首先对输入值进行哈希处理,然后对模数进行质数运算,然后将结果用作哈希表中的适当位置。

进一步的搜索_Prime_rehash_policy使我们找到hashtable_c ++ 0x.cc。在这里,我们可以看到有一个名为__prime_list的数组,并且哈希表有一个在大小过大时会自动调整大小的策略。因此,我们只需要查找此素数列表。

此文件上的一个包含将我们引至hashtable-aux.cc。啊哈,这是我们要寻找的清单。

// std::__detail and std::tr1::__detail definitions -*- C++ -*-

// Copyright (C) 2007-2018 Free Software Foundation, Inc.
//
// This file is part of the GNU ISO C++ Library.  This library is free
// software; you can redistribute it and/or modify it under the
// terms of the GNU General Public License as published by the
// Free Software Foundation; either version 3, or (at your option)
// any later version.

// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.

// Under Section 7 of GPL version 3, you are granted additional
// permissions described in the GCC Runtime Library Exception, version
// 3.1, as published by the Free Software Foundation.

// You should have received a copy of the GNU General Public License and
// a copy of the GCC Runtime Library Exception along with this program;
// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
// <http://www.gnu.org/licenses/>.

namespace __detail
{
  // The sentinel value is kept only for abi backward compatibility.
  extern const unsigned long __prime_list[] = // 256 + 1 or 256 + 48 + 1
  {
    2ul, 3ul, 5ul, 7ul, 11ul, 13ul, 17ul, 19ul, 23ul, 29ul, 31ul,
    37ul, 41ul, 43ul, 47ul, 53ul, 59ul, 61ul, 67ul, 71ul, 73ul, 79ul,
    83ul, 89ul, 97ul, 103ul, 109ul, 113ul, 127ul, 137ul, 139ul, 149ul,
    157ul, 167ul, 179ul, 193ul, 199ul, 211ul, 227ul, 241ul, 257ul,
    277ul, 293ul, 313ul, 337ul, 359ul, 383ul, 409ul, 439ul, 467ul,
    503ul, 541ul, 577ul, 619ul, 661ul, 709ul, 761ul, 823ul, 887ul,
    953ul, 1031ul, 1109ul, 1193ul, 1289ul, 1381ul, 1493ul, 1613ul,
    1741ul, 1879ul, 2029ul, 2179ul, 2357ul, 2549ul, 2753ul, 2971ul,
    3209ul, 3469ul, 3739ul, 4027ul, 4349ul, 4703ul, 5087ul, 5503ul,
    5953ul, 6427ul, 6949ul, 7517ul, 8123ul, 8783ul, 9497ul, 10273ul,
    11113ul, 12011ul, 12983ul, 14033ul, 15173ul, 16411ul, 17749ul,
    19183ul, 20753ul, 22447ul, 24281ul, 26267ul, 28411ul, 30727ul,
    33223ul, 35933ul, 38873ul, 42043ul, 45481ul, 49201ul, 53201ul,
    57557ul, 62233ul, 67307ul, 72817ul, 78779ul, 85229ul, 92203ul,
    99733ul, 107897ul, 116731ul, 126271ul, 136607ul, 147793ul,
    159871ul, 172933ul, 187091ul, 202409ul, 218971ul, 236897ul,
    256279ul, 277261ul, 299951ul, 324503ul, 351061ul, 379787ul,
    410857ul, 444487ul, 480881ul, 520241ul, 562841ul, 608903ul,
    658753ul, 712697ul, 771049ul, 834181ul, 902483ul, 976369ul,
    1056323ul, 1142821ul, 1236397ul, 1337629ul, 1447153ul, 1565659ul,
    1693859ul, 1832561ul, 1982627ul, 2144977ul, 2320627ul, 2510653ul,
    2716249ul, 2938679ul, 3179303ul, 3439651ul, 3721303ul, 4026031ul,
    4355707ul, 4712381ul, 5098259ul, 5515729ul, 5967347ul, 6456007ul,
    6984629ul, 7556579ul, 8175383ul, 8844859ul, 9569143ul, 10352717ul,
    11200489ul, 12117689ul, 13109983ul, 14183539ul, 15345007ul,
    16601593ul, 17961079ul, 19431899ul, 21023161ul, 22744717ul,
    24607243ul, 26622317ul, 28802401ul, 31160981ul, 33712729ul,
    36473443ul, 39460231ul, 42691603ul, 46187573ul, 49969847ul,
    54061849ul, 58488943ul, 63278561ul, 68460391ul, 74066549ul,
    80131819ul, 86693767ul, 93793069ul, 101473717ul, 109783337ul,
    118773397ul, 128499677ul, 139022417ul, 150406843ul, 162723577ul,
    176048909ul, 190465427ul, 206062531ul, 222936881ul, 241193053ul,
    260944219ul, 282312799ul, 305431229ul, 330442829ul, 357502601ul,
    386778277ul, 418451333ul, 452718089ul, 489790921ul, 529899637ul,
    573292817ul, 620239453ul, 671030513ul, 725980837ul, 785430967ul,
    849749479ul, 919334987ul, 994618837ul, 1076067617ul, 1164186217ul,
    1259520799ul, 1362662261ul, 1474249943ul, 1594975441ul, 1725587117ul,
    1866894511ul, 2019773507ul, 2185171673ul, 2364114217ul, 2557710269ul,
    2767159799ul, 2993761039ul, 3238918481ul, 3504151727ul, 3791104843ul,
    4101556399ul, 4294967291ul,
    // Sentinel, so we don't have to test the result of lower_bound,
    // or, on 64-bit machines, rest of the table.
#if __SIZEOF_LONG__ != 8
    4294967291ul
#else
    6442450933ul, 8589934583ul, 12884901857ul, 17179869143ul,
    25769803693ul, 34359738337ul, 51539607367ul, 68719476731ul,
    103079215087ul, 137438953447ul, 206158430123ul, 274877906899ul,
    412316860387ul, 549755813881ul, 824633720731ul, 1099511627689ul,
    1649267441579ul, 2199023255531ul, 3298534883309ul, 4398046511093ul,
    6597069766607ul, 8796093022151ul, 13194139533241ul, 17592186044399ul,
    26388279066581ul, 35184372088777ul, 52776558133177ul, 70368744177643ul,
    105553116266399ul, 140737488355213ul, 211106232532861ul, 281474976710597ul,
    562949953421231ul, 1125899906842597ul, 2251799813685119ul,
    4503599627370449ul, 9007199254740881ul, 18014398509481951ul,
    36028797018963913ul, 72057594037927931ul, 144115188075855859ul,
    288230376151711717ul, 576460752303423433ul,
    1152921504606846883ul, 2305843009213693951ul,
    4611686018427387847ul, 9223372036854775783ul,
    18446744073709551557ul, 18446744073709551557ul
#endif
  };
} // namespace __detail

不相信的话可以试着运行如下代码:

 1 #include <ctime>
 2 #include <iostream>
 3 #include <unordered_map>
 4 using namespace std;
 5 const int N = 6e4+3e3;
 6 void insert_numbers(long long x){
 7     clock_t begin = clock();
 8     unordered_map<long long, int> numbers;
 9 
10     for (int i = 1; i <= N; i++)
11         numbers[i * x] = i;
12 
13     long long sum = 0;
14 
15     for (auto &entry : numbers)
16         sum += (entry.first / x) * entry.second;
17 
18     printf("x = %lld: %.3lf seconds, sum = %lld\n", x, (double) (clock() - begin) / CLOCKS_PER_SEC, sum);
19 }
20 
21 int main() {
22     //insert_numbers(107897);
23     insert_numbers(126271);
24 }

关于这些数的链接,可以参见https://github.com/gcc-mirror/gcc/blob/5bea0e90e58d971cf3e67f784a116d81a20b927a/libstdc%2B%2B-v3/src/shared/hashtable-aux.cc里的prime_list;

凌晨被cf卡B,现在心情复杂。

 

posted @ 2019-10-02 02:15  Lovaer  阅读(595)  评论(0编辑  收藏  举报