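glibc 2.27 Source Code Analysis (version 0.5)

Analysis tool: Understand source code reader
Source: obtained on Ubuntu 18.04 with sudo apt source glibc

Start at line 1060, where the malloc_chunk struct is defined. It describes the layout of a heap chunk.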

struct malloc_chunk {

  INTERNAL_SIZE_T      mchunk_prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      mchunk_size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

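Here is part of what the source comments say about this struct:

The size of a free chunk is stored both at its front and at its end (the foot), which makes consolidating fragmented chunks into bigger ones very fast. The size field also holds a bit that records whether the previous chunk is in use. The pointer returned by malloc points to the chunk's user data, whereas fd and bk point to chunk headers.
Free chunks are kept on circular doubly linked lists: fd points to the next chunk and bk to the previous one.
Three exceptions:
1. The top chunk has no trailing size field (the prev_size of a following chunk). It sits at the highest address of the heap, and the heap grows from low to high addresses, so there is no next chunk after it.
2. The M bit (IS_MMAPPED) marks chunks that were allocated with mmap.
3. Chunks in fastbins are treated as allocated: the P bit that describes them stays 1, so they are not merged with their neighbors. They are consolidated only in bulk, inside malloc_consolidate.

A diagram of the struct: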
![[a.drawio.png]]

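[!NOTE] Note:
On 64-bit, every chunk must be 16-byte aligned, whatever its type.
When debugging with GDB, the size printed by the heap command is the content of the size field: the chunk size (user data plus header overhead, rounded up to 16 bytes) with the flag bits in the low bits. For example, a chunk returned for malloc(0x10) typically shows 0x21, i.e. 0x20 | PREV_INUSE.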
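The note above can be made concrete with a small sketch that splits a raw size field into the chunk size and its three flag bits. The macro names follow malloc.c; the `field_*` helper functions are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Flag bits kept in the low three bits of the size field (names as in malloc.c). */
#define PREV_INUSE     0x1  /* P: the previous chunk is in use */
#define IS_MMAPPED     0x2  /* M: the chunk was obtained with mmap */
#define NON_MAIN_ARENA 0x4  /* A: the chunk belongs to a non-main arena */
#define SIZE_BITS (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

/* Hypothetical helpers: decode a raw size field as a debugger would show it. */
static size_t field_chunksize(size_t field) { return field & ~(size_t) SIZE_BITS; }
static int field_prev_inuse(size_t field)   { return (field & PREV_INUSE) != 0; }
static int field_is_mmapped(size_t field)   { return (field & IS_MMAPPED) != 0; }
static int field_non_main(size_t field)     { return (field & NON_MAIN_ARENA) != 0; }
```

For a size field of 0x21, `field_chunksize` yields 0x20 and only the P flag is set.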

With malloc_chunk covered, let's turn to the concept of an arena. See line 1674 of the source.

struct malloc_state
{
  /* Serialize access.  */
  __libc_lock_define (, mutex);

  /* Flags (formerly in max_fast).  */
  int flags;

  /* Set if the fastbin chunks contain recently inserted free blocks.  */
  /* Note this is a bool but not all targets support atomics on booleans.  */
  int have_fastchunks;

  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];

  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;

  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;

  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];

  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];

  /* Linked list */
  struct malloc_state *next;

  /* Linked list for free arenas.  Access to this field is serialized
     by free_list_lock in arena.c.  */
  struct malloc_state *next_free;

  /* Number of threads attached to this arena.  0 if the arena is on
     the free list.  Access to this field is serialized by
     free_list_lock in arena.c.  */
  INTERNAL_SIZE_T attached_threads;

  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};

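Efficient memory allocation and reclamation is a goal of most language runtimes, glibc included. glibc manages heap memory with ptmalloc2. To support multithreaded programs, ptmalloc2 introduces the arena: a process has exactly one main arena but may have several non-main (thread) arenas. The main arena can use both the heap (brk) region and mmap regions, while a non-main arena can only use mmap regions.
Underneath, glibc malloc obtains memory through the brk and mmap system calls.

main_arena
The main arena's heap is created with sbrk, and its malloc_state struct lives in the data segment of the libc.so the process links against. Its heap is extensible and is located in the process's heap region.
thread_arena
A thread arena is created with mmap, and its bookkeeping is held in two structs, malloc_state and heap_info, stored at the start of the mapped heap. Each such heap has a fixed maximum size and cannot grow past it; when it is used up, a new heap is mapped.

ptmalloc2 manages arenas through malloc_state: every arena is one instance of the struct, and each is responsible for its own chunks and bins.

The most important members of malloc_state are:

fastbinsY[NFASTBINS]: array of head pointers of the fastbin singly linked lists
bins[NBINS * 2 - 2]: array holding the list heads of the unsorted bin, the small bins and the large bins
top: pointer to the top chunk
last_remainder: pointer to the remainder left over from the most recent chunk split

One more struct worth adding is heap_info: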

typedef struct _heap_info
{
  mstate ar_ptr; /* Arena for this heap. */
  struct _heap_info *prev; /* Previous heap. */
  size_t size;   /* Current size in bytes. */
  size_t mprotect_size; /* Size in bytes that has been mprotected
                           PROT_READ|PROT_WRITE.  */
  /* Make sure the following data is properly aligned, particularly
     that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
     MALLOC_ALIGNMENT. */
  char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;

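It is mainly used by non-main arenas: the main arena has only one heap, whereas a thread arena can create several heaps with mmap, and each heap links to the arena's previous heap through prev.

With these members in mind, let's look at the important kinds of bins and their characteristics.

[!NOTE] Tips:
The P (PREV_INUSE) bit that describes a chunk is stored in the size field of the physically next chunk. For chunks in tcache and in fastbins it stays 1, so they look in-use and are never merged; for chunks in the unsorted bin, the small bins and the large bins it is cleared.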

fastbin

The diagram below shows how fastbins are stored:
![[未命名绘图.drawio (2).png]]

#define NFASTBINS  (fastbin_index (request2size (MAX_FAST_SIZE)) + 1)

/* offset 2 to use otherwise unindexable first 2 bins */
#define fastbin_index(sz) \
  ((((unsigned int) (sz)) >> (SIZE_SZ == 8 ? 4 : 3)) - 2)

#define request2size(req)                                         \
  (((req) + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)  ?             \
   MINSIZE :                                                      \
   ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)

/* The maximum fastbin request size we support */
#define MAX_FAST_SIZE     (80 * SIZE_SZ / 4)

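On a 64-bit system SIZE_SZ is 8, so MAX_FAST_SIZE is 160 and MALLOC_ALIGN_MASK is 15. request2size (160) is therefore 176, and fastbin_index (176) + 1 gives NFASTBINS = 10. The fastbinsY array can thus hold at most 10 fastbin lists.

[!NOTE] Note:
MAX_FAST_SIZE = 160 is a request size and does not include the chunk header. fastbin_index subtracts 2 because the smallest chunk size is 0x20 rather than 0: 0x20 >> 4 is 2, and it has to map to index 0.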
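The arithmetic above can be replayed with a short sketch. It hard-codes the 64-bit values (SIZE_SZ = 8, MINSIZE = 32) as an assumption and turns the macros into plain functions; `nfastbins` is a hypothetical helper that stands in for the NFASTBINS macro.

```c
#include <assert.h>
#include <stddef.h>

/* 64-bit values, assumed for this sketch. */
#define SIZE_SZ            8UL
#define MALLOC_ALIGNMENT   (2 * SIZE_SZ)
#define MALLOC_ALIGN_MASK  (MALLOC_ALIGNMENT - 1)
#define MINSIZE            32UL
#define MAX_FAST_SIZE      (80 * SIZE_SZ / 4)

/* Pad the request with SIZE_SZ bytes of overhead and round up to the alignment. */
static unsigned long request2size(unsigned long req) {
    if (req + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)
        return MINSIZE;
    return (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK;
}

/* Chunk size 0x20 maps to index 0, 0x30 to index 1, and so on. */
static unsigned int fastbin_index(unsigned long sz) {
    return ((unsigned int) sz >> 4) - 2;
}

/* Hypothetical helper mirroring the NFASTBINS macro. */
static unsigned int nfastbins(void) {
    return fastbin_index(request2size(MAX_FAST_SIZE)) + 1;
}
```

With these definitions, `request2size(160)` is 176, `fastbin_index(176)` is 9 and `nfastbins()` is 10, matching the numbers in the text.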

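A fastbin is a singly linked list, so apart from the header only the fd pointer of malloc_chunk is used. The P bit describing a chunk in a fastbin always stays 1, so the chunk is never merged with free neighbors. Each list is last-in, first-out, and every list in fastbinsY holds chunks of a single size. Since fastbinsY has at most 10 entries, the array covers chunk sizes 0x20 through 0xb0.
Right after heap initialization, however, fastbins only accept chunk sizes 0x20 through 0x80. The limit comes from the global variable global_max_fast: during initialization, in malloc_init_state, it is set from DEFAULT_MXFAST, which is 128 (0x80).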

  #define DEFAULT_MXFAST     (64 * SIZE_SZ / 4)
  if (av == &main_arena)
    set_max_fast (DEFAULT_MXFAST);

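So initially only sizes 0x20 through 0x80 are available; a freed chunk larger than that goes to the unsorted bin instead. The larger fastbin sizes can be enabled by raising the limit with mallopt (M_MXFAST). When a chunk is freed and its size is within global_max_fast, it goes into a fastbin. With tcache enabled, both paths are only reached once the matching tcache bin is full.

[!NOTE] Note:
The sizes 0x20-0xb0 here include the chunk header, i.e. they are values of the size field.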
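How a requested limit turns into global_max_fast can be sketched as follows. This is a model under assumptions: it supposes that set_max_fast rounds (s + SIZE_SZ) down to the alignment, which is how I read the macro in malloc.c but which is not quoted in this article. `max_fast_for` and `fits_fastbin` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* 64-bit values, assumed for this sketch. */
#define SIZE_SZ            8UL
#define MALLOC_ALIGN_MASK  15UL
#define SMALLBIN_WIDTH     16UL
#define DEFAULT_MXFAST     (64 * SIZE_SZ / 4)   /* 128 */
#define MAX_FAST_SIZE      (80 * SIZE_SZ / 4)   /* 160 */

/* Assumed model of set_max_fast: the value global_max_fast would receive. */
static unsigned long max_fast_for(unsigned long s) {
    return s == 0 ? SMALLBIN_WIDTH : ((s + SIZE_SZ) & ~MALLOC_ALIGN_MASK);
}

/* A chunk qualifies for a fastbin when its size does not exceed the limit. */
static int fits_fastbin(unsigned long chunk_size, unsigned long global_max_fast) {
    return chunk_size <= global_max_fast;
}
```

Under this reading the default limit is 0x80 and the largest limit mallopt can set is 0xa0, so the 0xb0 entry of fastbinsY would describe array capacity rather than a reachable size. That is worth confirming in a debugger before relying on it.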

tcache

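Unlike the other bins, tcache is not stored in malloc_state. It is a separately allocated cache that belongs to each individual thread.
Background:
In glibc the main arena and the thread arenas form a circular linked list. When allocating, each thread uses a private pointer to walk the arenas looking for one that is not locked; if all of them are busy, a new arena is created. This speeds up multithreaded allocation to a degree, but walking the arenas and taking and releasing locks still costs time. glibc 2.26 therefore introduced tcache: every thread has its own cache, so threads do not contend for an arena lock on this path.
Structs:
Because tcache is a separate cache, it is built from its own structs: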

# define TCACHE_MAX_BINS		64

typedef struct tcache_entry
{
  struct tcache_entry *next;
  /* This field exists to detect double frees.  */
  struct tcache_perthread_struct *key;
} tcache_entry;

typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

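tcache is managed by two structs, tcache_entry and tcache_perthread_struct.

tcache_entry has two members:
1. next: the address of the next chunk in the same tcache bin
2. key: marks the chunk as already freed so that a double free can be detected (upstream glibc added this field in 2.29; its presence here suggests that the Ubuntu package carries it as a backport)
tcache_perthread_struct manages the tcache bins, one instance per thread. It has two members:
1. counts[TCACHE_MAX_BINS]: a byte array recording how many chunks each tcache bin holds. The maximum is 7, for reasons explained later
2. entries[TCACHE_MAX_BINS]: an array of pointers, one per tcache bin. Each bin holds chunks of one size, and each element stores the address of a tcache_entry

[!NOTE] Compare tcache to fastbin
Same: for both, the P bit describing the chunk always stays 1, and both lists are last-in, first-out
Different: a tcache_entry's next pointer points at the user data, whereas a fastbin's fd pointer points at the chunk header

The following diagram illustrates tcache: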

![[b.xm.drawio (1).png]]
As the diagram shows, a chunk in a tcache bin is at least 0x20 bytes.
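Which tcache bin a chunk lands in follows from its size. The sketch below rewrites the csize2tidx and tidx2usize macros of malloc.c as functions, with the 64-bit constants hard-coded as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* 64-bit values, assumed for this sketch. */
#define SIZE_SZ           8UL
#define MALLOC_ALIGNMENT  16UL
#define MINSIZE           32UL
#define TCACHE_MAX_BINS   64

/* Chunk size (header included) -> tcache bin index. */
static unsigned long csize2tidx(unsigned long chunk_size) {
    return (chunk_size - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT;
}

/* Bin index -> largest usable request size served by that bin. */
static unsigned long tidx2usize(unsigned long idx) {
    return idx * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ;
}
```

The smallest chunk, 0x20, maps to index 0 and chunk size 0x410 maps to index 63, the last bin. A chunk of size 0x420 gives index 64, which is not below TCACHE_MAX_BINS, so it is not cached.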

bins[NBINS * 2 - 2]

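Before looking at the unsorted bin, the large bins and the small bins, let's see how the bins array in malloc_state stores them.
As the name suggests, bins[NBINS * 2 - 2] is an array holding the list heads (head nodes) of the unsorted bin, the small bins and the large bins. Each arena has 1 unsorted bin, 62 small bins and 63 large bins.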
Bin 1 – Unsorted bin
Bin 2 to Bin 63 – Small bin
Bin 64 to Bin 126 – Large bin
![[Pasted image 20240221182218.png]]
(Disclaimer: I did not draw this figure; the original author is a member of the Kanxue forum.)

#define NBINS             128

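Why is the array length NBINS * 2 - 2, i.e. 254? Consider how the 126 list heads could be stored as malloc_chunk structs. On 64-bit a malloc_chunk takes 6 * 8 bytes, i.e. six mchunkptr-sized words, so 126 head nodes would need 126 * 6 = 756 words. That much space is not needed: a head node holds no real data, and only its fd and bk pointers are meaningful. The other fields can therefore overlap with neighboring entries ("space reuse"). As the figure shows, with this reuse 126 * 2 = 252 mchunkptr slots are enough for the 126 heads. The array itself reserves one fd/bk pair for every index from 1 to NBINS - 1 = 127 (bin 0 does not exist), which gives (128 - 1) * 2 = 128 * 2 - 2 = 254 slots.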
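The space reuse can be checked with a reduced model of the arena. The structs below are cut-down stand-ins for malloc_chunk and malloc_state, and `fd_slot` and `aliases_slots` are hypothetical helpers; only `bin_at` follows the idea of the glibc macro of the same name. Only addresses are compared, nothing is read through the fake headers.

```c
#include <assert.h>
#include <stddef.h>

#define NBINS 128

struct chunk {                 /* reduced malloc_chunk: header plus fd/bk */
    size_t prev_size, size;
    struct chunk *fd, *bk;
};

struct arena {                 /* reduced malloc_state */
    struct chunk *top;
    struct chunk *last_remainder;
    struct chunk *bins[NBINS * 2 - 2];
};

/* Slot of bin i's fd pointer inside bins[]; its bk is the slot after it. */
static int fd_slot(int i) { return (i - 1) * 2; }

/* Step back by the offset of fd so that ->fd lands on bins[fd_slot(i)]. */
static struct chunk *bin_at(struct arena *a, int i) {
    return (struct chunk *) ((char *) &a->bins[fd_slot(i)]
                             - offsetof(struct chunk, fd));
}

/* True when the fake header's fd/bk fields sit on the expected array slots. */
static int aliases_slots(struct arena *a, int i) {
    struct chunk *b = bin_at(a, i);
    return (void *) &b->fd == (void *) &a->bins[fd_slot(i)]
        && (void *) &b->bk == (void *) &a->bins[fd_slot(i) + 1];
}
```

In this model the prev_size field of bin 2's fake header has the same address as bin 1's fd slot, which is exactly the overlap described above, and bin 127 uses the last two slots (252 and 253).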

smallbin

#define NSMALLBINS         64
#define SMALLBIN_WIDTH    MALLOC_ALIGNMENT
#define SMALLBIN_CORRECTION (MALLOC_ALIGNMENT > 2 * SIZE_SZ)
#define MIN_LARGE_SIZE    ((NSMALLBINS - SMALLBIN_CORRECTION) * SMALLBIN_WIDTH)

#define in_smallbin_range(sz)  \
  ((unsigned long) (sz) < (unsigned long) MIN_LARGE_SIZE)

#define smallbin_index(sz) \
  ((SMALLBIN_WIDTH == 16 ? (((unsigned) (sz)) >> 4) : (((unsigned) (sz)) >> 3))\
   + SMALLBIN_CORRECTION)

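A small bin is a circular doubly linked list. It is first-in, first-out: newly freed chunks are inserted at the head and allocations are taken from the tail. MIN_LARGE_SIZE is the smallest large-bin chunk size, 1024 (0x400) on 64-bit, and in_smallbin_range tests whether a size is below it. smallbin_index maps a size to its bin index. According to the source, SMALLBIN_CORRECTION adjusts the index when MALLOC_ALIGNMENT = 16 and SIZE_SZ = 4. MALLOC_ALIGNMENT usually equals 2 * SIZE_SZ, which makes the correction 0; it only matters on targets where the alignment is larger than that. If any reader knows more about this correction, comments are welcome!

[!NOTE] Note:
Like a fastbin or a tcache bin, each small bin holds chunks of a single size. Adjacent small bins in bins differ by 0x10 bytes (64-bit).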
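The small-bin macros can be evaluated for the 64-bit case with a short sketch. The constants SIZE_SZ = 8 and MALLOC_ALIGNMENT = 16 are assumptions of the sketch, and the two macros that take an argument are written as functions.

```c
#include <assert.h>
#include <stddef.h>

/* 64-bit values, assumed for this sketch. */
#define SIZE_SZ             8UL
#define MALLOC_ALIGNMENT    16UL
#define NSMALLBINS          64
#define SMALLBIN_WIDTH      MALLOC_ALIGNMENT
#define SMALLBIN_CORRECTION (MALLOC_ALIGNMENT > 2 * SIZE_SZ)   /* 0 here */
#define MIN_LARGE_SIZE      ((NSMALLBINS - SMALLBIN_CORRECTION) * SMALLBIN_WIDTH)

static int in_smallbin_range(unsigned long sz) {
    return sz < MIN_LARGE_SIZE;
}

static unsigned int smallbin_index(unsigned long sz) {
    return (SMALLBIN_WIDTH == 16 ? (unsigned) sz >> 4 : (unsigned) sz >> 3)
           + SMALLBIN_CORRECTION;
}
```

Here MIN_LARGE_SIZE evaluates to 1024, chunk size 0x20 maps to bin 2 and 0x3f0 to bin 63, which gives the 62 small bins listed earlier.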

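Again, a diagram of how small bins are stored:
![[smallbin.drawio.png]]
A brief explanation of the diagram: the first chunk is the head node of the circular doubly linked list. Because head nodes reuse space as described above, its prev_size and size fields actually overlap the fd and bk fields of the previous bin.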

largebin

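Again, start with the storage diagram:
![[largebin.drawio.png]]
Each small bin corresponds to exactly one chunk size. That strategy suits small chunks, but for large chunks it is not feasible to prepare a separate list for every possible size. A large bin therefore holds chunks within a fixed size range rather than chunks of one fixed size. Given a range, how does malloc find a chunk of the right size? To speed up the search, large-bin chunks use the fd_nextsize and bk_nextsize fields, which form a second circular doubly linked list over the distinct sizes in the bin. Chunks in a bin are kept sorted from largest to smallest: fd_nextsize leads to the next smaller size and bk_nextsize to the next larger size.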
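The size search can be illustrated with a reduced model. It assumes the ordering used by _int_malloc as I understand it: the first chunk of the bin is the largest, and following bk_nextsize from it wraps around to the smallest size and then moves towards larger ones. `struct lchunk`, `link_nextsize` and `best_fit` are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct lchunk {
    unsigned long size;
    struct lchunk *fd_nextsize;  /* next smaller size (smallest wraps to largest) */
    struct lchunk *bk_nextsize;  /* next larger size (largest wraps to smallest) */
};

/* Link an array whose sizes strictly decrease into the circular nextsize list. */
static void link_nextsize(struct lchunk *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        c[i].fd_nextsize = &c[(i + 1) % n];
        c[i].bk_nextsize = &c[(i + n - 1) % n];
    }
}

/* Best fit: start at the smallest size and walk towards larger sizes. */
static struct lchunk *best_fit(struct lchunk *largest, unsigned long nb) {
    if (largest == NULL || largest->size < nb)
        return NULL;                              /* nothing in the bin is big enough */
    struct lchunk *victim = largest->bk_nextsize; /* the smallest size */
    while (victim->size < nb)
        victim = victim->bk_nextsize;
    return victim;
}
```

With sizes 0x480, 0x440 and 0x400 in the bin, a request for 0x410 selects the 0x440 chunk: the smallest one that is still large enough.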

Now for the source:

/*
   Indexing

    Bins for sizes < 512 bytes contain chunks of all the same size, spaced
    8 bytes apart. Larger bins are approximately logarithmically spaced:

    64 bins of size       8  
    32 bins of size      64
    16 bins of size     512
     8 bins of size    4096
     4 bins of size   32768
     2 bins of size  262144
     1 bin  of size what's left

    There is actually a little bit of slop in the numbers in bin_index
    for the sake of speed. This makes no difference elsewhere.

    The bins top out around 1MB because we expect to service large
    requests via mmap.

    Bin 0 does not exist.  Bin 1 is the unordered list; if that would be
    a valid chunk size the small bins are bumped up one.
 */

#define largebin_index_32(sz)                                                \
  (((((unsigned long) (sz)) >> 6) <= 38) ?  56 + (((unsigned long) (sz)) >> 6) :\
   ((((unsigned long) (sz)) >> 9) <= 20) ?  91 + (((unsigned long) (sz)) >> 9) :\
   ((((unsigned long) (sz)) >> 12) <= 10) ? 110 + (((unsigned long) (sz)) >> 12) :\
   ((((unsigned long) (sz)) >> 15) <= 4) ? 119 + (((unsigned long) (sz)) >> 15) :\
   ((((unsigned long) (sz)) >> 18) <= 2) ? 124 + (((unsigned long) (sz)) >> 18) :\
   126)

#define largebin_index_32_big(sz)                                            \
  (((((unsigned long) (sz)) >> 6) <= 45) ?  49 + (((unsigned long) (sz)) >> 6) :\
   ((((unsigned long) (sz)) >> 9) <= 20) ?  91 + (((unsigned long) (sz)) >> 9) :\
   ((((unsigned long) (sz)) >> 12) <= 10) ? 110 + (((unsigned long) (sz)) >> 12) :\
   ((((unsigned long) (sz)) >> 15) <= 4) ? 119 + (((unsigned long) (sz)) >> 15) :\
   ((((unsigned long) (sz)) >> 18) <= 2) ? 124 + (((unsigned long) (sz)) >> 18) :\
   126)

// XXX It remains to be seen whether it is good to keep the widths of
// XXX the buckets the same or whether it should be scaled by a factor
// XXX of two as well.
#define largebin_index_64(sz)                                                \
  (((((unsigned long) (sz)) >> 6) <= 48) ?  48 + (((unsigned long) (sz)) >> 6) :\
   ((((unsigned long) (sz)) >> 9) <= 20) ?  91 + (((unsigned long) (sz)) >> 9) :\
   ((((unsigned long) (sz)) >> 12) <= 10) ? 110 + (((unsigned long) (sz)) >> 12) :\
   ((((unsigned long) (sz)) >> 15) <= 4) ? 119 + (((unsigned long) (sz)) >> 15) :\
   ((((unsigned long) (sz)) >> 18) <= 2) ? 124 + (((unsigned long) (sz)) >> 18) :\
   126)

#define largebin_index(sz) \
  (SIZE_SZ == 8 ? largebin_index_64 (sz)                                     \
   : MALLOC_ALIGNMENT == 16 ? largebin_index_32_big (sz)                     \
   : largebin_index_32 (sz))

#define bin_index(sz) \
  ((in_smallbin_range (sz)) ? smallbin_index (sz) : largebin_index (sz))

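About the comment:
The bins array contains 63 large bins in total. They are divided into 6 groups, and within a group the bins are spaced by a constant step.

Group	Count	Step
0	64	8
1	32	64
2	16	512
3	8	4096
4	4	32768
5	2	262144
6	1	unlimited
(Group 0 is the small bins, of which there are actually 62. The figures are those of the source comment, whose 8-byte small-bin spacing corresponds to the 32-bit layout.)

As for the code, largebin_index_32, largebin_index_32_big and largebin_index_64 use deeply nested conditional operators. Unrolled, they read: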

 /* Each function mirrors its macro: the shifted size is tested first and
    then added to the base index of the group. The shifts are
    parenthesized because + binds tighter than >> in C.  */
 unsigned long largebin_index_32(unsigned long sz) {
     if ((sz >> 6) <= 38) {
         return 56 + (sz >> 6);
     } else if ((sz >> 9) <= 20) {
         return 91 + (sz >> 9);
     } else if ((sz >> 12) <= 10) {
         return 110 + (sz >> 12);
     } else if ((sz >> 15) <= 4) {
         return 119 + (sz >> 15);
     } else if ((sz >> 18) <= 2) {
         return 124 + (sz >> 18);
     } else {
         return 126;
     }
 }

 unsigned long largebin_index_32_big(unsigned long sz) {
     if ((sz >> 6) <= 45) {
         return 49 + (sz >> 6);
     } else if ((sz >> 9) <= 20) {
         return 91 + (sz >> 9);
     } else if ((sz >> 12) <= 10) {
         return 110 + (sz >> 12);
     } else if ((sz >> 15) <= 4) {
         return 119 + (sz >> 15);
     } else if ((sz >> 18) <= 2) {
         return 124 + (sz >> 18);
     } else {
         return 126;
     }
 }

 unsigned long largebin_index_64(unsigned long sz) {
     if ((sz >> 6) <= 48) {
         return 48 + (sz >> 6);
     } else if ((sz >> 9) <= 20) {
         return 91 + (sz >> 9);
     } else if ((sz >> 12) <= 10) {
         return 110 + (sz >> 12);
     } else if ((sz >> 15) <= 4) {
         return 119 + (sz >> 15);
     } else if ((sz >> 18) <= 2) {
         return 124 + (sz >> 18);
     } else {
         return 126;
     }
 }

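The table below lists the relation between chunk size and bin index for reference (table source: GitHub). Its 8-byte small-bin steps correspond to the 32-bit layout.

Start	End	index
0	7	does not exist
8	15	does not exist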
16	23	2
24	31	3
32	39	4
40	47	5
48	55	6
56	63	7
64	71	8
72	79	9
80	87	10
88	95	11
96	103	12
104	111	13
112	119	14
120	127	15
128	135	16
136	143	17
144	151	18
152	159	19
160	167	20
168	175	21
176	183	22
184	191	23
192	199	24
200	207	25
208	215	26
216	223	27
224	231	28
232	239	29
240	247	30
248	255	31
256	263	32
264	271	33
272	279	34
280	287	35
288	295	36
296	303	37
304	311	38
312	319	39
320	327	40
328	335	41
336	343	42
344	351	43
352	359	44
360	367	45
368	375	46
376	383	47
384	391	48
392	399	49
400	407	50
408	415	51
416	423	52
424	431	53
432	439	54
440	447	55
448	455	56
456	463	57
464	471	58
472	479	59
480	487	60
488	495	61
496	503	62
504	511	63
512	575	64
576	639	65
640	703	66
704	767	67
768	831	68
832	895	69
896	959	70
960	1023	71
1024	1087	72
1088	1151	73
1152	1215	74
1216	1279	75
1280	1343	76
1344	1407	77
1408	1471	78
1472	1535	79
1536	1599	80
1600	1663	81
1664	1727	82
1728	1791	83
1792	1855	84
1856	1919	85
1920	1983	86
1984	2047	87
2048	2111	88
2112	2175	89
2176	2239	90
2240	2303	91
2304	2367	92
2368	2431	93
2432	2495	94
2496	2559	95
2560	3071	96
3072	3583	97
3584	4095	98
4096	4607	99
4608	5119	100
5120	5631	101
5632	6143	102
6144	6655	103
6656	7167	104
7168	7679	105
7680	8191	106
8192	8703	107
8704	9215	108
9216	9727	109
9728	10239	110
10240	10751	111
10752	14847	112
14848	18943	113
18944	23039	114
23040	27135	115
27136	31231	116
31232	35327	117
35328	39423	118
39424	43519	119
43520	76287	120
76288	109055	121
109056	141823	122
141824	174591	123
174592	436735	124
436736	698879	125
698880	2^32 or 2^64	126

unsortedbin

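The unsorted bin can be thought of as a cache in front of the small and large bins. It is a single circular doubly linked list, first-in first-out and unsorted. When a chunk is freed (and does not go to tcache or a fastbin), it is not placed in a small or large bin right away: free first checks whether the physically adjacent chunks before and after it are free, merges with them if so, and then inserts the result at the head of the unsorted bin. malloc walks the unsorted bin from the tail. A chunk that is not the right size is moved to its small or large bin; a chunk that fits exactly is used directly.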

/*
   Unsorted chunks

    All remainders from chunk splits, as well as all returned chunks,
    are first placed in the "unsorted" bin. They are then placed
    in regular bins after malloc gives them ONE chance to be used before
    binning. So, basically, the unsorted_chunks list acts as a queue,
    with chunks being placed on it in free (and malloc_consolidate),
    and taken off (to be either used or placed in bins) in malloc.

    The NON_MAIN_ARENA flag is never set for unsorted chunks, so it
    does not have to be taken into account in size comparisons.
 */

/* The otherwise unindexable 1-bin is used to hold unsorted chunks. */
#define unsorted_chunks(M)          (bin_at (M, 1))

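The key points of the comment: all remainders from chunk splits and all returned chunks are first placed in the unsorted bin, and malloc gets one chance to use them before they are sorted into the regular bins. The unsorted bin thus behaves like a queue: chunks are put on it in free (and malloc_consolidate) and taken off it in malloc.
The NON_MAIN_ARENA flag is never set for chunks in the unsorted bin, so it does not need to be taken into account when comparing chunk sizes.

The unsorted bin has the same structure as a small bin; refer to the small-bin diagram above.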
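The queue behaviour shared by the unsorted bin and the small bins can be sketched with a circular doubly linked list that has a sentinel head. `struct node` and the `bin_*` functions are hypothetical names for this model; they are not glibc functions, and the integrity checks that glibc performs are left out.

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *fd, *bk; int id; };

/* An empty bin is a head node whose fd and bk point at itself. */
static void bin_init(struct node *bin) { bin->fd = bin->bk = bin; }

/* Insert right after the head (the fd side), as free does. */
static void bin_insert_head(struct node *bin, struct node *c) {
    c->bk = bin;
    c->fd = bin->fd;
    bin->fd->bk = c;
    bin->fd = c;
}

/* Remove from the tail (the bk side), as malloc does; NULL when empty. */
static struct node *bin_take_tail(struct node *bin) {
    struct node *victim = bin->bk;
    if (victim == bin)
        return NULL;
    struct node *bck = victim->bk;
    bin->bk = bck;
    bck->fd = bin;
    return victim;
}
```

Inserting chunks 1, 2 and 3 at the head and then taking from the tail returns them in the order 1, 2, 3, which is the first-in, first-out behaviour described in the text.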

top chunk

/*
   Top

    The top-most available chunk (i.e., the one bordering the end of
    available memory) is treated specially. It is never included in
    any bin, is used only if no other chunk is available, and is
    released back to the system if it is very large (see
    M_TRIM_THRESHOLD).  Because top initially
    points to its own bin with initial zero size, thus forcing
    extension on the first malloc request, we avoid having any special
    code in malloc to check whether it even exists yet. But we still
    need to do so when getting memory from system, so we make
    initial_top treat the bin as a legal but unusable chunk during the
    interval between initialization and the first call to
    sysmalloc. (This is somewhat delicate, since it relies on
    the 2 preceding words to be zero during this interval as well.)
 */

/* Conveniently, the unsorted bin can be used as dummy top on first call */
#define initial_top(M)              (unsorted_chunks (M))

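Notes on the comment: the top chunk is special. It belongs to no bin and is used only when no other chunk is available. If it becomes very large (see M_TRIM_THRESHOLD), memory is released back to the system. Initially top points at its own bin with size 0, which forces an extension on the first malloc request, so malloc needs no special code to check whether top exists yet. When memory is obtained from the system the check is still needed, so during the interval between initialization and the first call to sysmalloc, initial_top treats the bin as a legal but unusable chunk.

Functions

With the various kinds of bins covered, let's look at the glibc functions related to malloc and free.

__libc_malloc

First comes __libc_malloc. For easier reading, the explanations are written directly in the code as comments.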

void *
__libc_malloc (size_t bytes)
{
  mstate ar_ptr; // pointer to a malloc_state struct
  void *victim;
  // Run the hook before allocating. Note that the malloc hooks were removed in glibc 2.34.
  void *(*hook) (size_t, const void *)
    = atomic_forced_read (__malloc_hook);
  if (__builtin_expect (hook != NULL, 0))
    return (*hook)(bytes, RETURN_ADDRESS (0));
  // Try to serve the request from a tcache bin first. This block does not exist before glibc 2.26.
#if USE_TCACHE
  /* int_free also calls request2size, be careful to not pad twice.  */
  size_t tbytes;// tbytes holds the normalized (padded and aligned) size
  checked_request2size (bytes, tbytes);
  size_t tc_idx = csize2tidx (tbytes);// tcache bin index for the normalized size

  MAYBE_INIT_TCACHE ();// if tcache does not exist yet, initialize it with tcache_init(), which is covered below
/*
# define MAYBE_INIT_TCACHE() \
  if (__glibc_unlikely (tcache == NULL)) \
    tcache_init();
*/

  DIAG_PUSH_NEEDS_COMMENT;
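  /* Once the tcache index is known, check that it is in range, that tcache exists and that entries[tc_idx] is not empty. If all of that holds, take a chunk with tcache_get, which is also explained in detail below. */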
/*
static struct malloc_par mp_ =
{
  .top_pad = DEFAULT_TOP_PAD,
  .n_mmaps_max = DEFAULT_MMAP_MAX,
  .mmap_threshold = DEFAULT_MMAP_THRESHOLD,
  .trim_threshold = DEFAULT_TRIM_THRESHOLD,
#define NARENAS_FROM_NCORES(n) ((n) * (sizeof (long) == 4 ? 2 : 8))
  .arena_test = NARENAS_FROM_NCORES (1)
#if USE_TCACHE
  ,
  .tcache_count = TCACHE_FILL_COUNT,
  .tcache_bins = TCACHE_MAX_BINS,
  .tcache_max_bytes = tidx2usize (TCACHE_MAX_BINS-1),
  .tcache_unsorted_limit = 0 // No limit. 
#endif
};
*/
  if (tc_idx < mp_.tcache_bins
      /*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
      && tcache
      && tcache->entries[tc_idx] != NULL)
    {
      return tcache_get (tc_idx);
    }
  DIAG_POP_NEEDS_COMMENT;
#endif
  // Single-threaded: allocate directly from main_arena with _int_malloc
  if (SINGLE_THREAD_P)
    {
      victim = _int_malloc (&main_arena, bytes);
      assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
	      &main_arena == arena_for_chunk (mem2chunk (victim)));
      return victim;
    }
  // Otherwise call arena_get to obtain (and lock) a usable arena; ar_ptr points to that arena's malloc_state
  arena_get (ar_ptr, bytes);
  // allocate the chunk
  victim = _int_malloc (ar_ptr, bytes);
  /* Retry with another arena only if we were able to find a usable arena
     before.  */
  // If the allocation failed although an arena was found, retry with another arena
  if (!victim && ar_ptr != NULL)
    {
      LIBC_PROBE (memory_malloc_retry, 1, bytes);
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);
    }
  // release the arena lock
  if (ar_ptr != NULL)
    __libc_lock_unlock (ar_ptr->mutex);
  // sanity check
  assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
          ar_ptr == arena_for_chunk (mem2chunk (victim)));
  return victim;
}

Having read __libc_malloc, let's see how the tcache_init and tcache_get functions it uses work.

static void
tcache_init(void)
{
  mstate ar_ptr;// again a pointer to a malloc_state
  void *victim = 0;
  /*
typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;
  */
  // size of tcache_perthread_struct
  const size_t bytes = sizeof (tcache_perthread_struct);
  // do nothing if this thread's tcache is being shut down
  if (tcache_shutting_down)
    return;
  // obtain a usable arena for this size and allocate a chunk, much like __libc_malloc above
  arena_get (ar_ptr, bytes);
  victim = _int_malloc (ar_ptr, bytes);
  if (!victim && ar_ptr != NULL)
    {
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);
    }

  
  if (ar_ptr != NULL)
    __libc_lock_unlock (ar_ptr->mutex);

  /* In a low memory situation, we may not be able to allocate memory
     - in which case, we just keep trying later.  However, we
     typically do this very early, so either there is sufficient
     memory, or there isn't enough memory to do non-trivial
     allocations anyway.  */
  // initialize tcache
  if (victim)
    {
      tcache = (tcache_perthread_struct *) victim;
      memset (tcache, 0, sizeof (tcache_perthread_struct));
    }
}

tcache_get

/* Caller must ensure that we know tc_idx is valid and there's
   available chunks to remove.  */
static __always_inline void *
tcache_get (size_t tc_idx)
{ 
  // fetch the list head for this index and check the preconditions
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > 0);
  // unlink the chunk from the list
  tcache->entries[tc_idx] = e->next;
  --(tcache->counts[tc_idx]);
  e->key = NULL;
  return (void *) e;
}

tcache_put

/* Caller must ensure that we know tc_idx is valid and there's room
   for more chunks.  */
static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
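  // convert the chunk-header pointer into e, a pointer to the user-data area
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
  // check the precondition
  assert (tc_idx < TCACHE_MAX_BINS);

  /* Mark this chunk as "in the tcache" so the test in _int_free will
     detect a double free.  */
  e->key = tcache;
  // Push the chunk onto the tcache list. Note that from glibc 2.32 on, the next pointers of fastbin and tcache chunks are protected with PROTECT_PTR: the pointer is no longer stored as-is but XORed with a shifted copy of the address it is stored at.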
  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}
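The last-in, first-out behaviour and the limit of 7 can be replayed with a reduced model of tcache_put and tcache_get. `mini_tcache`, `mini_put` and `mini_get` are hypothetical names, and the model simplifies in two ways: the key field is left out, and the "bin is full" check is folded into `mini_put`, whereas in glibc the callers compare counts against mp_.tcache_count before calling tcache_put.

```c
#include <assert.h>
#include <stddef.h>

#define TCACHE_MAX_BINS   64
#define TCACHE_FILL_COUNT 7

typedef struct entry { struct entry *next; } entry;

typedef struct {
    char   counts[TCACHE_MAX_BINS];
    entry *entries[TCACHE_MAX_BINS];
} mini_tcache;

/* Push e onto bin idx; returns 1 if cached, 0 if the bin is already full. */
static int mini_put(mini_tcache *tc, entry *e, size_t idx) {
    assert(idx < TCACHE_MAX_BINS);
    if (tc->counts[idx] >= TCACHE_FILL_COUNT)
        return 0;
    e->next = tc->entries[idx];
    tc->entries[idx] = e;
    ++tc->counts[idx];
    return 1;
}

/* Pop the most recently cached entry of bin idx; NULL when the bin is empty. */
static entry *mini_get(mini_tcache *tc, size_t idx) {
    assert(idx < TCACHE_MAX_BINS);
    entry *e = tc->entries[idx];
    if (e == NULL)
        return NULL;
    tc->entries[idx] = e->next;
    --tc->counts[idx];
    return e;
}
```

After seven puts into one bin, an eighth is refused, and the first get returns the entry that was put last.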

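That completes the three tcache functions. Next comes one of the most important functions in glibc.
_int_malloc
The function is very complex, but there is no need to worry: we go through it piece by piece, and taken that way it is fairly simple.

Initialization:

It starts by declaring its variables, then uses checked_request2size to normalize the number of bytes the user requested into a conforming chunk size, which is stored in nb.
If there is no usable arena, it calls sysmalloc to obtain a chunk with mmap.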

static void *
_int_malloc (mstate av, size_t bytes)
{
  INTERNAL_SIZE_T nb;               /* normalized request size */
  unsigned int idx;                 /* associated bin index */
  mbinptr bin;                      /* associated bin */

  mchunkptr victim;                 /* inspected/selected chunk */
  INTERNAL_SIZE_T size;             /* its size */
  int victim_index;                 /* its bin index */

  mchunkptr remainder;              /* remainder from a split */
  unsigned long remainder_size;     /* its size */

  unsigned int block;               /* bit map traverser */
  unsigned int bit;                 /* bit map traverser */
  unsigned int map;                 /* current word of binmap */

  mchunkptr fwd;                    /* misc temp for linking */
  mchunkptr bck;                    /* misc temp for linking */

#if USE_TCACHE
  size_t tcache_unsorted_count;	    /* count of unsorted chunks processed */
#endif

  /*
     Convert request size to internal form by adding SIZE_SZ bytes
     overhead plus possibly more to obtain necessary alignment and/or
     to obtain a size of at least MINSIZE, the smallest allocatable
     size. Also, checked_request2size traps (returning 0) request sizes
     that are so large that they wrap around zero when padded and
     aligned.
   */

  checked_request2size (bytes, nb);

  /* There are no usable arenas.  Fall back to sysmalloc to get a chunk from
     mmap.  */
  if (__glibc_unlikely (av == NULL))
    {
      void *p = sysmalloc (nb, av);
      if (p != NULL)
	alloc_perturb (p, bytes);
      return p;
    }
 ...
}
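The overflow guard mentioned in the comment on checked_request2size can be sketched as a function. This is a 64-bit model under assumptions: `checked_size` and `REQUEST_LIMIT` are hypothetical names, and the limit of 2 * MINSIZE below the top of the address range follows the REQUEST_OUT_OF_RANGE macro as I read it in malloc.c. The real macro also sets errno to ENOMEM, which is omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit values, assumed for this sketch. */
#define SIZE_SZ            8ULL
#define MALLOC_ALIGN_MASK  15ULL
#define MINSIZE            32ULL

/* Requests at or above this value would wrap around once padded and aligned. */
#define REQUEST_LIMIT (UINT64_MAX - 2 * MINSIZE + 1)

/* Returns 1 and stores the normalized size in *nb, or 0 if req is out of range. */
static int checked_size(uint64_t req, uint64_t *nb) {
    if (req >= REQUEST_LIMIT)
        return 0;
    uint64_t padded = req + SIZE_SZ + MALLOC_ALIGN_MASK;
    *nb = padded < MINSIZE ? MINSIZE : (padded & ~MALLOC_ALIGN_MASK);
    return 1;
}
```

A request of 24 bytes normalizes to 32 and one of 25 bytes to 48, while a request of UINT64_MAX is rejected instead of wrapping to a tiny size.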

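fastbin:

If the size is within the fastbin limit, the head of the matching fastbin is checked. If a chunk is found there and the corresponding tcache bin has room, the remaining chunks of that size are moved from the fastbin into the tcache bin until it is full. The aim is to exploit locality and speed up later allocations.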

static void *
_int_malloc (mstate av, size_t bytes)
{
  ...
  /*
     If the size qualifies as a fastbin, first check corresponding bin.
     This code is safe to execute even if av is not yet initialized, so we
     can try it without checking, which saves some time on this fast path.
   */

#define REMOVE_FB(fb, victim, pp)			\
  do							\
    {							\
      victim = pp;					\
      if (victim == NULL)				\
	break;						\
    }							\
  while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim)) \
	 != victim);					
  // check whether nb is within the fastbin limit (0x80 by default on 64-bit)
  if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
    {
      idx = fastbin_index (nb);
      mfastbinptr *fb = &fastbin (av, idx);// head pointer of the list for this size
      mchunkptr pp;
      victim = *fb;// take the first chunk at the head, matching the fastbin's last-in, first-out order

      if (victim != NULL)
	{
	  if (SINGLE_THREAD_P)
	    *fb = victim->fd;// unlink victim from the list
	  else
	    REMOVE_FB (fb, pp, victim);
	  // verify that the chunk really belongs to the fastbin it was taken from
	  if (__glibc_likely (victim != NULL))
	    {
	      size_t victim_idx = fastbin_index (chunksize (victim));
	      if (__builtin_expect (victim_idx != idx, 0))
		malloc_printerr ("malloc(): memory corruption (fast)");
	      check_remalloced_chunk (av, victim, nb);
#if USE_TCACHE
	      /* While we're here, if we see other chunks of the same size,
		 stash them in the tcache.  */
		  // move the other chunks of size nb from the fastbin into the matching tcache bin
	      size_t tc_idx = csize2tidx (nb);
	      if (tcache && tc_idx < mp_.tcache_bins)
		{
		  mchunkptr tc_victim;
          // while the tcache bin is not full and the fastbin still has a chunk
		  /* While bin not empty and tcache not full, copy chunks.  */
		  while (tcache->counts[tc_idx] < mp_.tcache_count
			 && (tc_victim = *fb) != NULL)
		    {
		    // take the chunk off the fastbin
		      if (SINGLE_THREAD_P)
			*fb = tc_victim->fd;
		      else
			{
			  REMOVE_FB (fb, pp, tc_victim);
			  if (__glibc_unlikely (tc_victim == NULL))
			    break;
			}
			// put it into the tcache bin
		      tcache_put (tc_victim, tc_idx);
		    }
		}
#endif
          // return the chunk found in the fastbin
	      void *p = chunk2mem (victim);
	      alloc_perturb (p, bytes);
	      return p;
	    }
	}
    }
  ...
}
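The single-threaded path of this step can be modelled in a few lines. The sketch uses hypothetical names (`struct fchunk`, `struct cache`, `fast_malloc`) and leaves out what the real code also does: the size check on victim, the atomic REMOVE_FB path, and the fact that tcache links point at user data rather than chunk headers.

```c
#include <assert.h>
#include <stddef.h>

#define FILL_COUNT 7   /* stands in for mp_.tcache_count */

struct fchunk { struct fchunk *fd; };

/* One tcache bin: a count and a singly linked list head. */
struct cache { int count; struct fchunk *head; };

/* Pop the fastbin head for the caller, then stash the leftovers in the cache. */
static struct fchunk *fast_malloc(struct fchunk **fb, struct cache *tc) {
    struct fchunk *victim = *fb;
    if (victim == NULL)
        return NULL;
    *fb = victim->fd;                    /* unlink victim from the fastbin */

    struct fchunk *tc_victim;
    while (tc->count < FILL_COUNT && (tc_victim = *fb) != NULL) {
        *fb = tc_victim->fd;             /* take the next chunk off the fastbin */
        tc_victim->fd = tc->head;        /* push it at the head of the cache bin */
        tc->head = tc_victim;
        tc->count++;
    }
    return victim;
}
```

With a fastbin holding a, b, c (a at the head), the call returns a and leaves the cache bin as c, b. The stash therefore reverses the order of the remaining chunks, so the next allocation of this size would receive c.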

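smallbin:

Small bins are handled much like fastbins. If the size is in small-bin range, the last chunk of the bin's circular doubly linked list is taken, and while at it the other chunks of the same size are moved into the tcache bin.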

static void *
_int_malloc (mstate av, size_t bytes)
{
  ...
  /*
     If a small request, check regular bin.  Since these "smallbins"
     hold one size each, no searching within bins is necessary.
     (For a large request, we need to wait until unsorted chunks are
     processed to find best fit. But for small ones, fits are exact
     anyway, so we can check now, which is faster.)
   */
  // on 64-bit this means nb < 0x400, so the largest small-bin chunk size is 0x3f0
  if (in_smallbin_range (nb))
    {
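      // locate the matching small bin and get its list head
      idx = smallbin_index (nb);
      bin = bin_at (av, idx);
      // if the chunk before the list head (i.e. the last chunk) is not the head itself, the list is not empty; take that last chunk as victim, matching the small bin's first-in, first-out order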
      if ((victim = last (bin)) != bin)
        {
          bck = victim->bk;
      // make sure the fd of the second-to-last chunk points at the victim being removed
	  if (__glibc_unlikely (bck->fd != victim))
	    malloc_printerr ("malloc(): smallbin double linked list corrupted");
	      // set the prev_inuse bit of the chunk physically following victim, i.e. mark victim as in use
          set_inuse_bit_at_offset (victim, nb);
          // unlink victim from the list
          bin->bk = bck;
          bck->fd = bin;
        // if this is not main_arena, set the NON_MAIN_ARENA bit
          if (av != &main_arena)
	    set_non_main_arena (victim);
          check_malloced_chunk (av, victim, nb);
#if USE_TCACHE
	  /* While we're here, if we see other chunks of the same size,
	     stash them in the tcache.  */
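	  // As in the fastbin case, and for the same locality reasons, the remaining chunks of this small bin are moved into the matching tcache bin. One difference from the fastbin path: each small-bin chunk has its prev_inuse bit (in the next chunk) set before it enters the tcache bin. Chunks in tcache, like those in fastbins, always keep this bit at 1 to prevent merging.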
	  size_t tc_idx = csize2tidx (nb);
	  if (tcache && tc_idx < mp_.tcache_bins)
	    {
	      mchunkptr tc_victim;

	      /* While bin not empty and tcache not full, copy chunks over.  */
	      while (tcache->counts[tc_idx] < mp_.tcache_count
		     && (tc_victim = last (bin)) != bin)
		{
		  if (tc_victim != 0)
		    {
		      bck = tc_victim->bk;
		      set_inuse_bit_at_offset (tc_victim, nb);
		      if (av != &main_arena)
			set_non_main_arena (tc_victim);
		      bin->bk = bck;
		      bck->fd = bin;

		      tcache_put (tc_victim, tc_idx);
	            }
		}
	    }
#endif    
          // return the chunk that was taken
          void *p = chunk2mem (victim);
          alloc_perturb (p, bytes);
          return p;
        }
    }
  ...
}

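largebin1:

This step is called largebin1 because it is not where a large-bin chunk is actually allocated; it is the preparation done beforehand:

Compute the index of the matching large bin
Call malloc_consolidate to merge the fastbins, in order to reduce fragmentation in the heap
- malloc_consolidate itself is analyzed later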

static void *
_int_malloc (mstate av, size_t bytes)
{
  ...
  /*
     If this is a large request, consolidate fastbins before continuing.
     While it might look excessive to kill all fastbins before
     even seeing if there is space available, this avoids
     fragmentation problems normally associated with fastbins.
     Also, in practice, programs tend to have runs of either small or
     large requests, but less often mixtures, so consolidation is not
     invoked all that often in most programs. And the programs that
     it is called frequently in otherwise tend to fragment.
   */
/* In short: for a large request all fastbins are consolidated up front, which costs some work but avoids the fragmentation that fastbins tend to cause. */
  else
    {
      idx = largebin_index (nb);
      // consolidate the fastbin chunks into the unsorted bin
      if (atomic_load_relaxed (&av->have_fastchunks))
        malloc_consolidate (av);
    }
  ...
}

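unsortedbin:

This step does more than look for a chunk of the requested size: it also sorts the chunks of the unsorted bin into their bins, and it is the only place where unsorted chunks are binned. Note one detail: after a chunk of exactly the right size has been put into the tcache bin, the code does not return but continues, looking for more chunks of the same size to cache (locality again).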

static void *
_int_malloc (mstate av, size_t bytes)
{
  ...
  /*
     Process recently freed or remaindered chunks, taking one only if
     it is exact fit, or, if this a small request, the chunk is remainder from
     the most recent non-exact fit.  Place other traversed chunks in
     bins.  Note that this step is the only place in any routine where
     chunks are placed in bins.

     The outer loop here is needed because we might not realize until
     near the end of malloc that we should have consolidated, so must
     do so and retry. This happens at most once, and only when we would
     otherwise need to expand memory to service a "small" request.
   */
// declare and initialize the tcache-related variables
#if USE_TCACHE
  INTERNAL_SIZE_T tcache_nb = 0;
  size_t tc_idx = csize2tidx (nb);// tcache bin index for size nb
  if (tcache && tc_idx < mp_.tcache_bins)
    tcache_nb = nb;
  int return_cached = 0;// set when the loop below has put a chunk of the right size into the tcache bin

  tcache_unsorted_count = 0;// number of unsorted chunks processed by the loop below
#endif
  // the big loop, i.e. the "outer loop" mentioned in the comment above
  for (;; )
    {
      int iters = 0;
      // while the unsorted bin is not empty
      while ((victim = unsorted_chunks (av)->bk) != unsorted_chunks (av))
        {
          bck = victim->bk;
          // check that victim's size is plausible
          if (__builtin_expect (chunksize_nomask (victim) <= 2 * SIZE_SZ, 0)
              || __builtin_expect (chunksize_nomask (victim)
				   > av->system_mem, 0))
            malloc_printerr ("malloc(): memory corruption");
          size = chunksize (victim);

          /*
             If a small request, try to use last remainder if it is the
             only chunk in unsorted bin.  This helps promote locality for
             runs of consecutive small requests. This is the only
             exception to best-fit, and applies only when there is
             no exact fit for a small chunk.
             In other words: the last remainder is reused for a small request
             only when it is the sole chunk in the unsorted bin.
           */
          // if nb is in small-bin range, the unsorted bin holds only this chunk, the chunk is the last remainder, and what is left after splitting off nb bytes can still form a chunk
          if (in_smallbin_range (nb) &&
              bck == unsorted_chunks (av) &&
              victim == av->last_remainder &&
              (unsigned long) (size) > (unsigned long) (nb + MINSIZE))
            {
              /* split and reattach remainder */
              // Split the chunk, then link the leftover part back into the unsorted bin
              remainder_size = size - nb;// size of the leftover part
              remainder = chunk_at_offset (victim, nb);// start address of the leftover part
              unsorted_chunks (av)->bk = unsorted_chunks (av)->fd = remainder;
              // remainder is now the only chunk in the unsorted bin and becomes the new last remainder
              av->last_remainder = remainder;
              remainder->bk = remainder->fd = unsorted_chunks (av);
              // If the remainder is outside the small-bin range, clear its fd_nextsize and bk_nextsize fields
              if (!in_smallbin_range (remainder_size))
                {
                  remainder->fd_nextsize = NULL;
                  remainder->bk_nextsize = NULL;
                }
              // Set victim's header
              set_head (victim, nb | PREV_INUSE |
                        (av != &main_arena ? NON_MAIN_ARENA : 0));
              // Set the remainder's header and footer
              set_head (remainder, remainder_size | PREV_INUSE);
              set_foot (remainder, remainder_size);
              // Return victim to the caller
              check_malloced_chunk (av, victim, nb);
              void *p = chunk2mem (victim);
              alloc_perturb (p, bytes);
              return p;
            }

          /* remove from unsorted list */
          // Take victim out of the unsorted bin
          unsorted_chunks (av)->bk = bck;
          bck->fd = unsorted_chunks (av);

          /* Take now instead of binning if exact fit */
          // If victim is an exact fit, mark it in use (this sets the PREV_INUSE bit of the chunk that follows it)
          if (size == nb)
            {
              set_inuse_bit_at_offset (victim, size);
              if (av != &main_arena)
		set_non_main_arena (victim);
#if USE_TCACHE
	      /* Fill cache first, return to user only if cache fills.
		 We may return one of these chunks later.  */
              // If the tcache bin is not full, put victim into the tcache first instead of returning it to the caller
	      if (tcache_nb
		  && tcache->counts[tc_idx] < mp_.tcache_count)
		{
		  tcache_put (victim, tc_idx);
		  return_cached = 1;
		  continue;
		}
	      else
		{
#endif        // Return victim to the caller
              check_malloced_chunk (av, victim, nb);
              void *p = chunk2mem (victim);
              alloc_perturb (p, bytes);
              return p;
#if USE_TCACHE
		}
#endif
            }

          /* place chunk in bin */
          // The size does not match nb, so file the chunk into the bin it belongs to
          // If it is in the small-bin range, it goes into a small bin
          if (in_smallbin_range (size))
            {
              victim_index = smallbin_index (size);
              bck = bin_at (av, victim_index);
              fwd = bck->fd;
            }
          // Otherwise it goes into a large bin
          else
            {
              victim_index = largebin_index (size);
              bck = bin_at (av, victim_index);// bck holds the bin header
              fwd = bck->fd;// fwd holds the first (largest) chunk in the bin

              /* maintain large bins in sorted order */
              // If the bin is not empty
              if (fwd != bck)
                {
                  /* Or with inuse bit to speed comparisons */
                  // Set PREV_INUSE in size so the comparisons below can use the raw size field: every chunk in a large bin has its P bit set
                  size |= PREV_INUSE;
                  /* if smaller than smallest, bypass loop below */
                  assert (chunk_main_arena (bck->bk));
                  // If victim is smaller than the smallest chunk in the bin, insert it directly at the tail
                  if ((unsigned long) (size)
		      < (unsigned long) chunksize_nomask (bck->bk))
                    {
                      fwd = bck; // bin header
                      bck = bck->bk; // last chunk in the bin, i.e. the current smallest

                      // victim becomes the new smallest size. The nextsize list is circular and
                      // sorted from largest to smallest, so victim's fd_nextsize wraps around to
                      // the largest chunk, fwd->fd.
                      victim->fd_nextsize = fwd->fd;
                      // The largest chunk's bk_nextsize still points at the previous smallest
                      // size; that chunk becomes victim's bk_nextsize.
                      victim->bk_nextsize = fwd->fd->bk_nextsize;
                      // Point both neighbours' nextsize fields at victim.
                      fwd->fd->bk_nextsize = victim->bk_nextsize->fd_nextsize = victim;
                    }
                  else
                    {
                      assert (chunk_main_arena (fwd));
                      // Walk the nextsize list, from the largest size downwards, until fwd is no larger than victim
                      while ((unsigned long) size < chunksize_nomask (fwd))
                        {
                          fwd = fwd->fd_nextsize;
			  assert (chunk_main_arena (fwd));
                        }

                      if ((unsigned long) size
			  == (unsigned long) chunksize_nomask (fwd))
                        /* Always insert in the second position.  */
                        // Same size already present: insert victim right after the first chunk of that size, so the nextsize list does not change
                        fwd = fwd->fd;
                      else
                        {
                        // New size: victim's fd_nextsize is fwd (the next smaller size) and its bk_nextsize is fwd's old bk_nextsize (the next larger size)
                          victim->fd_nextsize = fwd;
                          victim->bk_nextsize = fwd->bk_nextsize;
                        // Point the two neighbouring sizes' nextsize fields at victim
                          fwd->bk_nextsize = victim;
                          victim->bk_nextsize->fd_nextsize = victim;
                        }
                      bck = fwd->bk;// victim will be linked between bck (not smaller than victim) and fwd (smaller than victim)
                    }
                }
              else// The bin is empty: victim is the only entry in the nextsize list, so both fields point at victim itself
                victim->fd_nextsize = victim->bk_nextsize = victim;
            }
          // mark_bin sets the bit for bin victim_index in the binmap, recording that this bin now holds at least one chunk
          mark_bin (av, victim_index);
          // Common tail for both small and large bins: link victim between bck and fwd
          victim->bk = bck;
          victim->fd = fwd;
          fwd->bk = victim;
          bck->fd = victim;
          
#if USE_TCACHE
      /* If we've processed as many chunks as we're allowed while
	 filling the cache, return one of the cached ones.  */
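      // If at least one fitting chunk has been cached and more than tcache_unsorted_limit unsorted chunks have been processed, stop and return a chunk from the tcache (a limit of 0 disables this early exit)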
      ++tcache_unsorted_count;
      if (return_cached
	  && mp_.tcache_unsorted_limit > 0
	  && tcache_unsorted_count > mp_.tcache_unsorted_limit)
	{
	  return tcache_get (tc_idx);
	}
#endif

#define MAX_ITERS       10000
          // Process at most 10000 chunks per pass
          if (++iters >= MAX_ITERS)
            break;
        }

#if USE_TCACHE
      /* If all the small chunks we found ended up cached, return one now.  */
      // After the loop: if any chunk of the right size was put into the tcache, take one out and return it
      if (return_cached)
	{
	  return tcache_get (tc_idx);
	}
#endif
  ...
}
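
The large-bin insertion above has four cases (empty bin, smaller than the smallest, size already present, new size). The following is a minimal sketch of those cases on a toy structure, not the glibc code itself: `toy_chunk`, `toy_bin_init` and `toy_largebin_insert` are hypothetical names, sizes are plain integers without flag bits, and the integrity asserts of the real code are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for malloc_chunk: only the fields the insertion touches. */
struct toy_chunk {
  size_t size;
  struct toy_chunk *fd, *bk;                   /* every chunk in the bin */
  struct toy_chunk *fd_nextsize, *bk_nextsize; /* first chunk of each size */
};

/* An empty bin is a header whose fd and bk point back at itself. */
void toy_bin_init(struct toy_chunk *bin)
{
  bin->size = 0;
  bin->fd = bin->bk = bin;
  bin->fd_nextsize = bin->bk_nextsize = NULL;
}

/* Insert victim so the bin stays sorted from largest (bin->fd) to
   smallest (bin->bk). */
void toy_largebin_insert(struct toy_chunk *bin, struct toy_chunk *victim)
{
  struct toy_chunk *bck = bin;
  struct toy_chunk *fwd = bin->fd;

  assert(victim->size > 0);
  victim->fd_nextsize = victim->bk_nextsize = NULL;

  if (fwd == bck) {
    /* Case 1: empty bin, victim is the whole nextsize list. */
    victim->fd_nextsize = victim->bk_nextsize = victim;
  } else if (victim->size < bin->bk->size) {
    /* Case 2: smaller than the smallest, append at the tail. */
    fwd = bin;
    bck = bin->bk;
    victim->fd_nextsize = fwd->fd;
    victim->bk_nextsize = fwd->fd->bk_nextsize;
    fwd->fd->bk_nextsize = victim->bk_nextsize->fd_nextsize = victim;
  } else {
    /* Walk the nextsize list until fwd is not larger than victim. */
    while (victim->size < fwd->size)
      fwd = fwd->fd_nextsize;
    if (victim->size == fwd->size) {
      /* Case 3: size already present, take the second position and
         stay out of the nextsize list. */
      fwd = fwd->fd;
    } else {
      /* Case 4: new size, splice into the nextsize list before fwd. */
      victim->fd_nextsize = fwd;
      victim->bk_nextsize = fwd->bk_nextsize;
      fwd->bk_nextsize = victim;
      victim->bk_nextsize->fd_nextsize = victim;
    }
    bck = fwd->bk;
  }

  /* Common tail: link victim between bck and fwd in the full list. */
  victim->bk = bck;
  victim->fd = fwd;
  fwd->bk = victim;
  bck->fd = victim;
}
```

Inserting sizes 0x500, 0x400, 0x500, 0x600, 0x450 in that order exercises all four cases: the second 0x500 chunk lands behind the first and keeps NULL nextsize pointers, while the other four form the circular nextsize list.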

largebin2:



      /*
         If a large request, scan through the chunks of current bin in
         sorted order to find smallest that fits.  Use the skip list for this.
       */

      if (!in_smallbin_range (nb))
        {
          bin = bin_at (av, idx);

          /* skip scan if empty or largest chunk is too small */
          // Proceed only if the large bin is non-empty and its largest chunk is at least nb
          if ((victim = first (bin)) != bin
	      && (unsigned long) chunksize_nomask (victim)
	        >= (unsigned long) (nb))
            {
              victim = victim->bk_nextsize;// victim now points at the smallest chunk
              while (((unsigned long) (size = chunksize (victim)) <
                      (unsigned long) (nb)))
                victim = victim->bk_nextsize;// walk backwards, from smallest to largest, until a chunk of at least nb is found

              /* Avoid removing the first entry for a size so that the skip
                 list does not have to be rerouted.  */
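              // If the chunk found is not the last one in the bin and there is more than one
              // chunk of this size, use the chunk after it instead. As explained earlier, only
              // the first chunk of each size is linked into the nextsize list; the others are
              // simply placed behind it in the full list. Taking one that is not in the
              // nextsize list means less pointer work and a faster unlink.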
              if (victim != last (bin)
		  && chunksize_nomask (victim)
		    == chunksize_nomask (victim->fd))
                victim = victim->fd;
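              // Size left over after carving nb out of victim
              remainder_size = size - nb;
              // Remove victim from the bin. bck and fwd are only scratch variables here: the unlink macro analysed later overwrites them (BK = P->bk, FD = P->fd), so the values passed in do not matter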
              unlink (av, victim, bck, fwd);

              /* Exhaust */
              // If the leftover is smaller than MINSIZE, hand out the whole chunk: just set the in-use bit
              if (remainder_size < MINSIZE)
                {
                  set_inuse_bit_at_offset (victim, size);
                  if (av != &main_arena)
		    set_non_main_arena (victim);
                }
              /* Split */
              // Otherwise split the chunk
              else
                {
                  // remainder points at the chunk left over after the split
                  remainder = chunk_at_offset (victim, nb);
                  /* We cannot assume the unsorted list is empty and therefore
                     have to perform a complete insert here.  */
                  // bck is the unsorted bin header, fwd is the first chunk in the unsorted bin
                  bck = unsorted_chunks (av);
                  fwd = bck->fd;
                  // Integrity check on the unsorted list
		  if (__glibc_unlikely (fwd->bk != bck))
		    malloc_printerr ("malloc(): corrupted unsorted chunks");
                  // Insert the remainder at the head of the unsorted bin
                  remainder->bk = bck;
                  remainder->fd = fwd;
                  bck->fd = remainder;
                  fwd->bk = remainder;
                  // If the remainder is in the large-bin range, clear its nextsize fields
                  if (!in_smallbin_range (remainder_size))
                    {
                      remainder->fd_nextsize = NULL;
                      remainder->bk_nextsize = NULL;
                    }
                  // Set the headers (size and flag bits) of both pieces, plus the remainder's footer
                  set_head (victim, nb | PREV_INUSE |
                            (av != &main_arena ? NON_MAIN_ARENA : 0));
                  set_head (remainder, remainder_size | PREV_INUSE);
                  set_foot (remainder, remainder_size);
                }
              // Return victim to the caller
              check_malloced_chunk (av, victim, nb);
              void *p = chunk2mem (victim);
              alloc_perturb (p, bytes);
              return p;
            }
        }
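
The search and the exhaust-or-split decision above can be condensed into a small sketch. It is an illustration under stated assumptions, not the glibc code: `toy_node`, `toy_pick` and `toy_large_best_fit` are hypothetical names, only the `bk_nextsize` link is modelled, and `TOY_MINSIZE` is set to 0x20, the value of MINSIZE on 64-bit platforms.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MINSIZE 0x20 /* assumption: MINSIZE on a 64-bit target */

/* One entry per distinct size; bk_nextsize leads to the next larger
   size and wraps from the largest back to the smallest. */
struct toy_node {
  size_t size;
  struct toy_node *bk_nextsize;
};

struct toy_pick {
  struct toy_node *victim; /* NULL when the bin cannot serve nb */
  size_t used;             /* size handed to the caller */
  size_t remainder;        /* size sent back to the unsorted bin */
};

struct toy_pick toy_large_best_fit(struct toy_node *largest, size_t nb)
{
  struct toy_pick r = { NULL, 0, 0 };

  /* Skip the scan if the bin is empty or its largest chunk is too small. */
  if (largest == NULL || largest->size < nb)
    return r;

  /* Start at the smallest size and walk upwards to the first fit. */
  struct toy_node *victim = largest->bk_nextsize;
  while (victim->size < nb)
    victim = victim->bk_nextsize;

  r.victim = victim;
  r.remainder = victim->size - nb;
  if (r.remainder < TOY_MINSIZE) {
    /* Exhaust: the leftover could not form a chunk, hand out everything. */
    r.used = victim->size;
    r.remainder = 0;
  } else {
    /* Split: the caller gets nb, the rest becomes a new free chunk. */
    r.used = nb;
  }
  return r;
}
```

With sizes 0x400, 0x500 and 0x600 in the bin, a request of 0x450 is split out of the 0x500 chunk, while a request of 0x5f0 consumes the whole 0x600 chunk because the 0x10 left over is below `TOY_MINSIZE`.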

unlink

/* Take a chunk off a bin list */

#define unlink(AV, P, BK, FD) {                                            \
    /* P's size must match the prev_size stored in the next chunk.  */       \
    if (__builtin_expect (chunksize(P) != prev_size (next_chunk(P)), 0))      \
      malloc_printerr ("corrupted size vs. prev_size");                       \
    /* Fetch P's two neighbours in the bin list.  */                          \
    FD = P->fd;                                                               \
    BK = P->bk;                                                               \
    if (__builtin_expect (FD->bk != P || BK->fd != P, 0))                     \
      malloc_printerr ("corrupted double-linked list");                       \
    else {                                                                    \
        /* Bypass P: FD's bk field now points at BK ...  */                   \
        FD->bk = BK;                                                          \
        /* ... and BK's fd field now points at FD.  */                        \
        BK->fd = FD;                                                          \
        /* Taken when P is outside the small-bin range and                    \
           P->fd_nextsize is not NULL, i.e. P is a large-bin chunk and        \
           the first chunk of its size (so it is linked into the              \
           nextsize list).  __builtin_expect (cond, 0) tells the compiler     \
           that cond is expected to be false.  It does not change the         \
           result; it lets the compiler lay out the likely path as            \
           straight-line code, which helps branch prediction.  */             \
        if (!in_smallbin_range (chunksize_nomask (P))                         \
            && __builtin_expect (P->fd_nextsize != NULL, 0)) {                \
            /* Integrity check on the nextsize list.  */                      \
            if (__builtin_expect (P->fd_nextsize->bk_nextsize != P, 0)        \
                || __builtin_expect (P->bk_nextsize->fd_nextsize != P, 0))    \
              malloc_printerr ("corrupted double-linked list (not small)");   \
            /* FD->fd_nextsize == NULL means FD is not in the nextsize        \
               list, so there is more than one chunk of P's size and FD       \
               must take over P's place in that list.  */                     \
            if (FD->fd_nextsize == NULL) {                                    \
                /* P was the only entry in the nextsize list: FD becomes      \
                   the only entry and points at itself.  */                   \
                if (P->fd_nextsize == P)                                      \
                  FD->fd_nextsize = FD->bk_nextsize = FD;                     \
                /* Otherwise FD inherits P's nextsize pointers and P's        \
                   two nextsize neighbours are redirected to FD.  */          \
                else {                                                        \
                    FD->fd_nextsize = P->fd_nextsize;                         \
                    FD->bk_nextsize = P->bk_nextsize;                         \
                    P->fd_nextsize->bk_nextsize = FD;                         \
                    P->bk_nextsize->fd_nextsize = FD;                         \
                  }                                                           \
              } else {                                                        \
                /* FD is itself in the nextsize list, so P was the only       \
                   chunk of its size: simply remove P from that list.  */     \
                P->fd_nextsize->bk_nextsize = P->bk_nextsize;                 \
                P->bk_nextsize->fd_nextsize = P->fd_nextsize;                 \
              }                                                               \
          }                                                                   \
      }                                                                       \
}
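
The core of the macro is the check that both neighbours still point back at P before any pointer is rewritten. A minimal sketch of that idea on a plain circular doubly linked list follows; `toy_link`, `toy_push_front` and `toy_unlink` are hypothetical names, the size check and the nextsize handling are left out, and an error code stands in for `malloc_printerr`.

```c
#include <assert.h>
#include <stddef.h>

/* Toy list node with the same two links a free chunk uses. */
struct toy_link {
  struct toy_link *fd, *bk;
};

/* Insert p right after the list header. */
void toy_push_front(struct toy_link *head, struct toy_link *p)
{
  p->fd = head->fd;
  p->bk = head;
  head->fd->bk = p;
  head->fd = p;
}

/* Remove p from its list. Returns 0 on success and -1, without
   touching the list, when the neighbours do not point back at p. */
int toy_unlink(struct toy_link *p)
{
  struct toy_link *fd = p->fd;
  struct toy_link *bk = p->bk;

  if (fd->bk != p || bk->fd != p)
    return -1; /* the real macro reports "corrupted double-linked list" */

  fd->bk = bk;
  bk->fd = fd;
  return 0;
}
```

If `p->fd` or `p->bk` has been overwritten with an address whose own links do not lead back to `p`, the function refuses to write anything, which is the property the real check provides.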

block


      /*
         Search for a chunk by scanning bins, starting with next largest
         bin. This search is strictly by best-fit; i.e., the smallest
         (with ties going to approximately the least recently used) chunk
         that fits is selected.

         The bitmap avoids needing to check that most blocks are nonempty.
         The particular case of skipping all bins during warm-up phases
         when no chunks have been returned yet is faster than it might look.
       */

      ++idx;
      bin = bin_at (av, idx);
      block = idx2block (idx);
      map = av->binmap[block];
      bit = idx2bit (idx);

      for (;; )
        {
          /* Skip rest of block if there are no more set bits in this block.  */
          if (bit > map || bit == 0)
            {
              do
                {
                  if (++block >= BINMAPSIZE) /* out of bins */
                    goto use_top;
                }
              while ((map = av->binmap[block]) == 0);

              bin = bin_at (av, (block << BINMAPSHIFT));
              bit = 1;
            }

          /* Advance to bin with set bit. There must be one. */
          while ((bit & map) == 0)
            {
              bin = next_bin (bin);
              bit <<= 1;
              assert (bit != 0);
            }

          /* Inspect the bin. It is likely to be non-empty */
          victim = last (bin);

          /*  If a false alarm (empty bin), clear the bit. */
          if (victim == bin)
            {
              av->binmap[block] = map &= ~bit; /* Write through */
              bin = next_bin (bin);
              bit <<= 1;
            }

          else
            {
              size = chunksize (victim);

              /*  We know the first chunk in this bin is big enough to use. */
              assert ((unsigned long) (size) >= (unsigned long) (nb));

              remainder_size = size - nb;

              /* unlink */
              unlink (av, victim, bck, fwd);

              /* Exhaust */
              if (remainder_size < MINSIZE)
                {
                  set_inuse_bit_at_offset (victim, size);
                  if (av != &main_arena)
		    set_non_main_arena (victim);
                }

              /* Split */
              else
                {
                  remainder = chunk_at_offset (victim, nb);

                  /* We cannot assume the unsorted list is empty and therefore
                     have to perform a complete insert here.  */
                  bck = unsorted_chunks (av);
                  fwd = bck->fd;
		  if (__glibc_unlikely (fwd->bk != bck))
		    malloc_printerr ("malloc(): corrupted unsorted chunks 2");
                  remainder->bk = bck;
                  remainder->fd = fwd;
                  bck->fd = remainder;
                  fwd->bk = remainder;

                  /* advertise as last remainder */
                  if (in_smallbin_range (nb))
                    av->last_remainder = remainder;
                  if (!in_smallbin_range (remainder_size))
                    {
                      remainder->fd_nextsize = NULL;
                      remainder->bk_nextsize = NULL;
                    }
                  set_head (victim, nb | PREV_INUSE |
                            (av != &main_arena ? NON_MAIN_ARENA : 0));
                  set_head (remainder, remainder_size | PREV_INUSE);
                  set_foot (remainder, remainder_size);
                }
              check_malloced_chunk (av, victim, nb);
              void *p = chunk2mem (victim);
              alloc_perturb (p, bytes);
              return p;
            }
        }
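
The bitmap arithmetic in this scan is easier to follow in isolation. The sketch below assumes the glibc 2.27 constants (128 bins, 32 bits per map word, hence a shift of 5 and 4 words); the `toy_` functions are hypothetical stand-ins for `idx2block`, `idx2bit` and `mark_bin`, and `toy_next_marked_bin` mirrors only the bit-skipping part of the loop, not the false-alarm handling for a marked but empty bin.

```c
#include <assert.h>
#include <stdint.h>

enum {
  TOY_NBINS = 128,
  TOY_BINMAPSHIFT = 5,
  TOY_BITSPERMAP = 1 << TOY_BINMAPSHIFT,        /* 32 bins per word */
  TOY_BINMAPSIZE = TOY_NBINS / TOY_BITSPERMAP   /* 4 words */
};

/* Which word of the bitmap holds bin i. */
unsigned toy_idx2block(unsigned i) { return i >> TOY_BINMAPSHIFT; }

/* Which bit inside that word stands for bin i. */
uint32_t toy_idx2bit(unsigned i)
{
  return (uint32_t)1 << (i & (TOY_BITSPERMAP - 1));
}

/* Record that bin i may hold chunks. */
void toy_mark_bin(uint32_t *binmap, unsigned i)
{
  binmap[toy_idx2block(i)] |= toy_idx2bit(i);
}

/* Return the first marked bin with index >= idx, or -1 if none. */
int toy_next_marked_bin(const uint32_t *binmap, unsigned idx)
{
  assert(idx < TOY_NBINS);
  unsigned block = toy_idx2block(idx);
  uint32_t map = binmap[block];
  uint32_t bit = toy_idx2bit(idx);
  unsigned cur = idx;

  /* bit > map means no bit at or above `bit` is set in this word,
     so skip ahead to the next non-zero word. */
  if (bit > map || bit == 0) {
    do {
      if (++block >= TOY_BINMAPSIZE)
        return -1; /* out of bins: the real code jumps to use_top */
    } while ((map = binmap[block]) == 0);
    cur = block << TOY_BINMAPSHIFT;
    bit = 1;
  }

  /* A set bit at or above `bit` exists, so this loop terminates. */
  while ((bit & map) == 0) {
    ++cur;
    bit <<= 1;
  }
  return (int)cur;
}
```

For example, with bins 3 and 70 marked, a search starting at index 4 skips the rest of word 0 and all of word 1, and lands on bin 70 in word 2.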

top chunk:


    use_top:
      /*
         If large enough, split off the chunk bordering the end of memory
         (held in av->top). Note that this is in accord with the best-fit
         search rule.  In effect, av->top is treated as larger (and thus
         less well fitting) than any other available chunk since it can
         be extended to be as large as necessary (up to system
         limitations).

         We require that av->top always exists (i.e., has size >=
         MINSIZE) after initialization, so if it would otherwise be
         exhausted by current request, it is replenished. (The main
         reason for ensuring it exists is that we may need MINSIZE space
         to put in fenceposts in sysmalloc.)
       */

      victim = av->top;
      size = chunksize (victim);

      if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))
        {
          remainder_size = size - nb;
          remainder = chunk_at_offset (victim, nb);
          av->top = remainder;
          set_head (victim, nb | PREV_INUSE |
                    (av != &main_arena ? NON_MAIN_ARENA : 0));
          set_head (remainder, remainder_size | PREV_INUSE);

          check_malloced_chunk (av, victim, nb);
          void *p = chunk2mem (victim);
          alloc_perturb (p, bytes);
          return p;
        }

      /* When we are using atomic ops to free fast chunks we can get
         here for all block sizes.  */
      else if (atomic_load_relaxed (&av->have_fastchunks))
        {
          malloc_consolidate (av);
          /* restore original bin index */
          if (in_smallbin_range (nb))
            idx = smallbin_index (nb);
          else
            idx = largebin_index (nb);
        }

      /*
         Otherwise, relay to handle system-dependent cases
       */
      else
        {
          void *p = sysmalloc (nb, av);
          if (p != NULL)
            alloc_perturb (p, bytes);
          return p;
        }
    }
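
The three branches of `use_top` amount to a small decision function. The sketch below is a simplified model under stated assumptions: `toy_arena`, `toy_action` and `toy_use_top` are hypothetical names, `TOY_MINSIZE` is 0x20 as on 64-bit platforms, and consolidation is modelled only as clearing the flag, without actually merging any chunks.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MINSIZE 0x20 /* assumption: MINSIZE on a 64-bit target */

enum toy_action {
  TOY_SPLIT_TOP,             /* request carved out of the top chunk */
  TOY_CONSOLIDATE_AND_RETRY, /* merge fastbin chunks, run the loop again */
  TOY_SYSMALLOC              /* ask the system for more memory */
};

struct toy_arena {
  size_t top_size;     /* size of the top chunk */
  int have_fastchunks; /* non-zero if fastbins may hold chunks */
};

enum toy_action toy_use_top(struct toy_arena *av, size_t nb)
{
  /* The top chunk must keep at least MINSIZE after the split. */
  if (av->top_size >= nb + TOY_MINSIZE) {
    av->top_size -= nb;
    return TOY_SPLIT_TOP;
  }

  /* Freed fast chunks might merge into something large enough. */
  if (av->have_fastchunks) {
    av->have_fastchunks = 0; /* stands in for malloc_consolidate */
    return TOY_CONSOLIDATE_AND_RETRY;
  }

  return TOY_SYSMALLOC;
}
```

Because the flag is cleared on the consolidation path, a second pass with the same request falls through to the last branch, which matches the "at most once" retry described for the outer loop.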

To be continued...

Author: Esofar

Source: https://www.cnblogs.com/amof/p/18076347

License: This work is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International license.

posted @ 2024-03-15 21:58 Tac1turn
