jemalloc管理块(arena、bin)
arena是jemalloc的总的管理块,一个进程中可以有多个arena,arena的最大个可以通过静态变量narenas_auto,。
可通过静态数组arenas获取进程中所有arena的指针:
(gdb) p narenas_auto $359 = 2 (gdb) p *je_arenas@2 $360 = {0x7f93e02200, 0x7f93f12280}
可知,目前进程中arena的最大个数是2,它们的指针分别为0x7f93e02200,0x7f93f12280。
arena的声明如下:
typedef struct arena_s arena_t; struct arena_s { /* This arena's index within the arenas array. */ unsigned ind; /* * Number of threads currently assigned to this arena. This field is * synchronized via atomic operations. */ unsigned nthreads; /* * There are three classes of arena operations from a locking * perspective: * 1) Thread assignment (modifies nthreads) is synchronized via atomics. * 2) Bin-related operations are protected by bin locks. * 3) Chunk- and run-related operations are protected by this mutex. */ malloc_mutex_t lock; arena_stats_t stats; /* * List of tcaches for extant threads associated with this arena. * Stats from these are merged incrementally, and at exit if * opt_stats_print is enabled. */ ql_head(tcache_t) tcache_ql; uint64_t prof_accumbytes; /* * PRNG state for cache index randomization of large allocation base * pointers. */ uint64_t offset_state; dss_prec_t dss_prec; /* * In order to avoid rapid chunk allocation/deallocation when an arena * oscillates right on the cusp of needing a new chunk, cache the most * recently freed chunk. The spare is left in the arena's chunk trees * until it is deleted. * * There is one spare chunk per arena, rather than one spare total, in * order to avoid interactions between multiple threads that could make * a single spare inadequate. */ arena_chunk_t *spare; /* Minimum ratio (log base 2) of nactive:ndirty. */ ssize_t lg_dirty_mult; /* True if a thread is currently executing arena_purge_to_limit(). */ bool purging; /* Number of pages in active runs and huge regions. */ size_t nactive; /* * Current count of pages within unused runs that are potentially * dirty, and for which madvise(... MADV_DONTNEED) has not been called. * By tracking this, we can institute a limit on how much dirty unused * memory is mapped for each arena. */ size_t ndirty; /* * Unused dirty memory this arena manages. Dirty memory is conceptually * tracked as an arbitrarily interleaved LRU of dirty runs and cached * chunks, but the list linkage is actually semi-duplicated in order to * avoid extra arena_chunk_map_misc_t space overhead. * * LRU-----------------------------------------------------------MRU * * /-- arena ---\ * | | * | | * |------------| /- chunk -\ * ...->|chunks_cache|<--------------------------->| /----\ |<--... * |------------| | |node| | * | | | | | | * | | /- run -\ /- run -\ | | | | * | | | | | | | | | | * | | | | | | | | | | * |------------| |-------| |-------| | |----| | * ...->|runs_dirty |<-->|rd |<-->|rd |<---->|rd |<----... * |------------| |-------| |-------| | |----| | * | | | | | | | | | | * | | | | | | | \----/ | * | | \-------/ \-------/ | | * | | | | * | | | | * \------------/ \---------/ */ arena_runs_dirty_link_t runs_dirty; extent_node_t chunks_cache; /* * Approximate time in seconds from the creation of a set of unused * dirty pages until an equivalent set of unused dirty pages is purged * and/or reused. */ ssize_t decay_time; /* decay_time / SMOOTHSTEP_NSTEPS. */ nstime_t decay_interval; /* * Time at which the current decay interval logically started. We do * not actually advance to a new epoch until sometime after it starts * because of scheduling and computation delays, and it is even possible * to completely skip epochs. In all cases, during epoch advancement we * merge all relevant activity into the most recently recorded epoch. */ nstime_t decay_epoch; /* decay_deadline randomness generator. */ uint64_t decay_jitter_state; /* * Deadline for current epoch. This is the sum of decay_interval and * per epoch jitter which is a uniform random variable in * [0..decay_interval). Epochs always advance by precise multiples of * decay_interval, but we randomize the deadline to reduce the * likelihood of arenas purging in lockstep. */ nstime_t decay_deadline; /* * Number of dirty pages at beginning of current epoch. During epoch * advancement we use the delta between decay_ndirty and ndirty to * determine how many dirty pages, if any, were generated, and record * the result in decay_backlog. */ size_t decay_ndirty; /* * Memoized result of arena_decay_backlog_npages_limit() corresponding * to the current contents of decay_backlog, i.e. the limit on how many * pages are allowed to exist for the decay epochs. */ size_t decay_backlog_npages_limit; /* * Trailing log of how many unused dirty pages were generated during * each of the past SMOOTHSTEP_NSTEPS decay epochs, where the last * element is the most recent epoch. Corresponding epoch times are * relative to decay_epoch. */ size_t decay_backlog[SMOOTHSTEP_NSTEPS]; /* Extant huge allocations. */ ql_head(extent_node_t) huge; /* Synchronizes all huge allocation/update/deallocation. */ malloc_mutex_t huge_mtx; /* * Trees of chunks that were previously allocated (trees differ only in * node ordering). These are used when allocating chunks, in an attempt * to re-use address space. Depending on function, different tree * orderings are needed, which is why there are two trees with the same * contents. */ extent_tree_t chunks_szad_cached; extent_tree_t chunks_ad_cached; extent_tree_t chunks_szad_retained; extent_tree_t chunks_ad_retained; malloc_mutex_t chunks_mtx; /* Cache of nodes that were allocated via base_alloc(). */ ql_head(extent_node_t) node_cache; malloc_mutex_t node_cache_mtx; /* User-configurable chunk hook functions. */ chunk_hooks_t chunk_hooks; /* bins is used to store trees of free regions. */ arena_bin_t bins[NBINS]; /* * Quantized address-ordered trees of this arena's available runs. The * trees are used for first-best-fit run allocation. */ arena_run_tree_t runs_avail[1]; /* Dynamically sized. */ };
其他成员暂时不关注,这里我们先讨论bins这个arena_bin_t数组,数组大小是36,对应36种大小的region和run。
binind表示bins中的偏移,每一个binind对应一个固定大小的region。其对应关系为:
usize = index2size(binind); size_t index2size(szind_t index) { return (index2size_lookup(index)); } size_t index2size_lookup(szind_t index) { size_t ret = (size_t)index2size_tab[index]; return (ret); }
其中index2size_tab是静态变量,用于表示index和size的对应关系。
通过binid,我们还可以通过静态变量arena_bin_info获取对应bin的其他信息:
(gdb) p je_arena_bin_info[5] $375 = { reg_size = 80, redzone_size = 0, reg_interval = 80, run_size = 20480, nregs = 256, bitmap_info = { nbits = 256, ngroups = 4 }, reg0_offset = 0 }
从bin_info中可以看出这个index为5的bin,它对应的region大小(region_size)为80,run大小(run_size)为20K,region个数(nregs)为256,region需要4个bitmap来表示(ngroups)。
arena_bin_t的声明如下:
typedef struct arena_bin_s arena_bin_t; struct arena_bin_s { /* * All operations on runcur, runs, and stats require that lock be * locked. Run allocation/deallocation are protected by the arena lock, * which may be acquired while holding one or more bin locks, but not * vise versa. */ malloc_mutex_t lock; /* * Current run being used to service allocations of this bin's size * class. */ arena_run_t *runcur; /* * Tree of non-full runs. This tree is used when looking for an * existing run when runcur is no longer usable. We choose the * non-full run that is lowest in memory; this policy tends to keep * objects packed well, and it can also help reduce the number of * almost-empty chunks. */ arena_run_tree_t runs; /* Bin statistics. */ malloc_bin_stats_t stats; };
runcur是当前可用于分配的run,
runs是个红黑树,链接当前arena中,所有可用的相同大小region对应的run,如果runcur已满,就会从runs里找可用的run。
stats是当前bin对应的run和region的状态信息。
我们看一下实际运行过程中,bins中的index为5的一个arena_bin_t:
(gdb) p (*je_arenas[0])->bins[5] $365 = { lock = { lock = { __private = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0} } }, runcur = 0x7f68408ad0, runs = { rbt_root = 0x7f78e06e38 }, stats = { nmalloc = 236529, ndalloc = 229379, nrequests = 1181919, curregs = 7150, nfills = 60225, nflushes = 42510, nruns = 64, reruns = 5402, curruns = 31 } }
其中runcur值为:
(gdb) p /x *(*je_arenas[0])->bins[5].runcur $373 = { binind = 0x5, nfree = 0x87, bitmap = {0x0, 0xeff7200000000000, 0xfffdeffffff7fdf7, 0xfffffffffdeffffd, 0x0, 0x0, 0x0, 0x0} }
其中,nfree表示的是当前run中空闲的region个数。
申请内存时的相关代码如下:
static void * arena_malloc_small(tsd_t *tsd, arena_t *arena, szind_t binind, bool zero) { void *ret; arena_bin_t *bin; size_t usize; arena_run_t *run; assert(binind < NBINS); bin = &arena->bins[binind]; usize = index2size(binind); malloc_mutex_lock(&bin->lock); if ((run = bin->runcur) != NULL && run->nfree > 0) ret = arena_run_reg_alloc(run, &arena_bin_info[binind]); else ret = arena_bin_malloc_hard(arena, bin);
bitmap中的每一bit位表示对应region的空闲状态,0表示已使用,1表示空闲。
由于arena_bin_info[5]中的ngroup的值为4,binind为5的run,它的bitmap数组中的前4个bitmap是有效的。
即上面runcur对应的bitmap为:
0x0, 0xeff7200000000000, 0xfffdeffffff7fdf7, 0xfffffffffdeffffd,
转成二进制:
0000000000000000000000000000000000000000000000000000000000000000 1110111111110111001000000000000000000000000000000000000000000000 1111111111111101111011111111111111111111111101111111110111110111 1111111111111111111111111111111111111101111011111111111111111101
可见,bitmap中一共有135个1,也就是说free的region有135个,即0x87个。
申请内存的相关代码如下:
JEMALLOC_INLINE_C void * arena_run_reg_alloc(arena_run_t *run, arena_bin_info_t *bin_info) { void *ret; size_t regind; arena_chunk_map_misc_t *miscelm; void *rpages; regind = (unsigned)bitmap_sfu(run->bitmap, &bin_info->bitmap_info); miscelm = arena_run_to_miscelm(run); rpages = arena_miscelm_to_rpages(miscelm); ret = (void *)((uintptr_t)rpages + (uintptr_t)bin_info->reg0_offset + (uintptr_t)(bin_info->reg_interval * regind)); run->nfree--; return (ret); }
其中bitmap_sfu()返回bitmap中第一个1的位置,并且将该位置0。
接下来就是通过run找对应的miscelm,再通过miscelm找到run对应的page,它的起始位置rpages。
rpages的值+regind*reg_interval(同reg_size),就能得出这个空闲的region的实际地址了。
最后再将run的nfree减一,整个内存申请过程就结束了。
中间红色部分内容需要了解chunk header相关内容。