Python's Memory Management Mechanisms

Python's memory management rests on three mechanisms: reference counting, garbage collection, and memory pools.

I. Reference counting

1. Variables and objects

In sum, variables are created when assigned, can reference any type of object, and must
be assigned before they are referenced. This means that you never need to declare names
used by your script, but you must initialize names before you can update them; counters,
for example, must be initialized to zero before you can add to them.
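  • A variable is created when it is first assigned, and it can point to (reference) an object of any type
    • Everything in Python is an object, and at the core of each one is a C struct: PyObject
  • A variable must be assigned before it is referenced.
    • For example, a counter must be initialized to 0 before it can be incremented.
  • Every object carries two header fields (a type designator and a reference counter)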

The relationship is illustrated below:

 Names and objects after running the assignment a = 3. Variable a becomes a reference to
the object 3. Internally, the variable is really a pointer to the object’s memory space created by running
the literal expression 3.

These links from variables to objects are called references in Python—that is, a reference
is a kind of association, implemented as a pointer in memory. Whenever the variables
are later used (i.e., referenced), Python automatically follows the variable-to-object
links. This is all simpler than the terminology may imply. In concrete terms:
  • Variables are entries in a system table, with spaces for links to objects.
  • Objects are pieces of allocated memory, with enough space to represent the values for which they stand.
  • References are automatically followed pointers from variables to objects.
  • Objects have two header fields, a type designator and a reference counter.

In Python, things work more simply.
Names have no types; as stated earlier, types live with objects, not names. In the preceding
listing, we’ve simply changed a to reference different objects. Because variables
have no type, we haven’t actually changed the type of the variable a; we’ve simply made
the variable reference a different type of object. In fact, again, all we can ever say about
a variable in Python is that it references a particular object at a particular point in time.

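  Variable names have no type; types belong to objects (a variable references an object, so the type travels with the object). In Python, a variable is simply a reference to a particular object at a particular point in time.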
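To make the point concrete, here is a minimal sketch in which one name is rebound to objects of three different types. The helper name describe is hypothetical and exists only for this illustration.

```python
def describe(value):
    """Return the type name of the object that was passed in."""
    return type(value).__name__


a = 3
first = describe(a)    # the name a references an int object

a = "spam"
second = describe(a)   # the same name now references a str object

a = [1, 2, 3]
third = describe(a)    # and now a list object

# The name never had a type of its own; only the objects did.
print(first, second, third)
```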

2. Shared references

>>> a = 3
>>> b = a
>>>
>>> id(a)
1747479616
>>> id(b)
1747479616
>>>
>>> hex(id(a))
'0x68286c40'
>>> hex(id(b))
'0x68286c40'
>>> 

  

This scenario in Python—with multiple names referencing the same object—is usually
called a shared reference (and sometimes just a shared object). Note that the names a
and b are not linked to each other directly when this happens; in fact, there is no way
to ever link a variable to another variable in Python. 
Rather, both variables point to the same object via their references.
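Shared references matter most with mutable objects. The sketch below uses a list to show that a change made through one name is visible through the other, while an explicit copy is independent. The variable names are chosen only for this illustration.

```python
a = [1, 2, 3]
b = a              # b references the same list object as a
b.append(4)        # mutate the object through the name b

c = list(a)        # build a new list object with the same contents
c.append(5)        # this change does not affect a or b

print(a)           # the append through b is visible here
print(a is b)      # both names reference one object
print(a is c)      # the copy is a different object
```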

1. id() is a Python built-in function that returns an object's identity; in CPython this is the object's memory address.

>>> help(id)
Help on built-in function id in module builtins:

id(obj, /)
    Return the identity of an object.
    
    This is guaranteed to be unique among simultaneously existing objects.
    (CPython uses the object's memory address.)

2. Checking what a reference points to

  Use the is operator for this check: is tests whether two references point to the same object.

Integers

>>> a = 256
>>> b = 256
>>> a is b
True
>>> c = 257
>>> d = 257
>>> c is d
False
>>> 

Short strings

>>> e = "Explicit"
>>> f = "Explicit"
>>> e is f
True
>>> 

Long strings

>>> g = "Beautiful is better"
>>> h = "Beautiful is better"
>>> g is h
False
>>> 

Lists

>>> lst1 = [1, 2, 3]
>>> lst2 = [1, 2, 3]
>>> lst1 is lst2
False
>>> 

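The results show that:

  1. Python caches small integers and short strings, so each such object exists only once in memory and the references point to the same object; even an assignment statement only creates a new reference, not a new object;

  2. Python does not cache long strings, lists, or other objects, so several equal objects can coexist, and an assignment statement can create a new object.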
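Because identity depends on caching, is should not be used to compare values. A minimal sketch of the difference between equality (==) and identity (is), using lists so that no caching is involved:

```python
lst1 = [1, 2, 3]
lst2 = [1, 2, 3]
alias = lst1

equal_values = lst1 == lst2   # compares contents
same_object = lst1 is lst2    # compares identity: two separate list objects
alias_same = alias is lst1    # an alias shares the one object

print(equal_values, same_object, alias_same)
print(id(lst1) == id(alias))  # is is equivalent to comparing id() values
```

In practice, is is mostly reserved for singletons such as None, while == is used for values.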

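How it works:

# Two optimization mechanisms: caching within a code block, and the small data pool.

# Code blocks
All code runs as code blocks (think of a principal issuing orders to one class at a time); one file is one code block.
Different files are different code blocks. In the interactive interpreter, each command is its own code block, which is why the session above sees two separate 257 objects.

# Caching within a code block
When Python executes a command in a code block that initializes an object, it first checks whether that value already exists, and reuses it if so.
In other words: while executing one code block, Python stores each initialized variable and its value in a dictionary;
when it meets a new variable, it first looks the value up in that dictionary,
and if a matching record exists it reuses the value already stored there.
So when a file (one code block) is executed, two such variables end up pointing to the same object;
values that qualify for this caching exist only once in memory, that is, their ids are the same.

Note:
# This mechanism applies only within a single code block.
# Data types covered: int str bool.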


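# Small data pool (also called interning, string interning, or string caching)
An optimization that applies across different code blocks.
# Applicable data types: str bool int
int: -5 to 256
str: strings that meet certain conditions are placed in the small data pool.
bool: both values.


# Summary:
Within one code block, the code block's own caching mechanism applies.
Across different code blocks, the small data pool applies.

# Benefits:
1. Saves memory.
2. Improves performance.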
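The two mechanisms can be observed from a script. The sketch below is CPython-specific and relies on implementation details that may differ in other interpreters: exec is used to run a two-line snippet as a single code block, the small-integer values are computed at run time so that no compile-time constant sharing is involved, and sys.intern places a string in the interning table explicitly.

```python
import sys

# 1. Caching within one code block: both literals end up as one object.
namespace = {}
exec("a = 257\nb = 257", namespace)
same_block = namespace["a"] is namespace["b"]

# 2. Small data pool for integers: values from -5 to 256 are shared
#    even when they are produced by separate run-time computations.
n = 128
small_shared = (n + n) is (n * 2)

# 3. Strings built at run time are separate objects until interned.
raw1 = "".join(["Beautiful", " is better"])
raw2 = " ".join(["Beautiful", "is", "better"])
s1 = sys.intern(raw1)
s2 = sys.intern(raw2)
interned_shared = s1 is s2

print(same_block, small_shared, interned_shared)
```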

  The wtfpython project on GitHub has detailed examples.

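3. Inspecting an object's reference count

  In Python, every object keeps a total of the references pointing to it: its reference count.

  To inspect an object's reference count, use sys.getrefcount()

 When a variable is reassigned, what happens to the value it used to reference? In the example below, s is reassigned to the string 'apple'; where does 6 go?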

>>> s = 6
>>> s = 'apple'

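The answer: when a variable is reassigned, the space of the object it previously pointed to may be reclaimed (garbage collected), provided no other variable or object still references it.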

The answer is that in Python, whenever a name is assigned to a new object, the space
held by the prior object is reclaimed if it is not referenced by any other name or object.
This automatic reclamation of objects’ space is known as garbage collection, and makes
life much simpler for programmers of languages like Python that support it.

Ordinary references

>>> import sys
>>> 
>>> a = "simple"
>>> sys.getrefcount(a)
2
>>> b = a
>>> sys.getrefcount(a)
3
>>> sys.getrefcount(b)
3
>>> 

  Note: passing a reference as the argument to getrefcount() itself creates a temporary reference, so the result of getrefcount() is 1 higher than you might expect.
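Since absolute counts vary between CPython versions, it is more robust to look at how the count changes. A minimal sketch, assuming CPython; the class name Token is hypothetical and used only so that the object is not shared with anything else in the interpreter:

```python
import sys


class Token:
    pass


obj = Token()
baseline = sys.getrefcount(obj)    # includes the temporary argument reference

alias = obj                        # one more name
with_alias = sys.getrefcount(obj)

holder = [obj, obj]                # two more references held by a list
with_list = sys.getrefcount(obj)

del alias                          # drop the extra name
holder.clear()                     # drop both list slots
back_to_baseline = sys.getrefcount(obj)

print(with_alias - baseline, with_list - baseline, back_to_baseline - baseline)
```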

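III. Garbage collection

  As a Python program creates more and more objects and they occupy more and more memory, garbage collection starts and clears out the objects that are no longer in use.

1. Principle

  When an object's reference count drops to 0, no reference points to it any more, and the object becomes garbage to be reclaimed.

For example, when a new object is assigned to a reference, its reference count becomes 1. If that reference is deleted, the count becomes 0 and the object can be garbage collected.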

Internally, Python accomplishes this feat by keeping a counter in every object that keeps
track of the number of references currently pointing to that object. As soon as (and
exactly when) this counter drops to zero, the object’s memory space is automatically
reclaimed. In the preceding listing, we’re assuming that each time x is assigned to a new
object, the prior object’s reference counter drops to zero, causing it to be reclaimed.

The most immediately tangible benefit of garbage collection is that it means you can
use objects liberally without ever needing to allocate or free up space in your script.
Python will clean up unused space for you as your program runs. In practice, this
eliminates a substantial amount of bookkeeping code required in lower-level languages
such as C and C++.
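The "as soon as the counter drops to zero" behavior can be observed with a finalizer. This is a minimal sketch, assuming CPython's reference counting; the class Tracked and the events list are hypothetical names for this illustration.

```python
events = []


class Tracked:
    def __init__(self, label):
        self.label = label

    def __del__(self):
        # Runs when the object is reclaimed.
        events.append(self.label)


x = Tracked("first")
x = Tracked("second")        # rebinding x drops the last reference to "first"
after_rebind = list(events)

x = None                     # now "second" loses its last reference too
after_clear = list(events)

print(after_rebind, after_clear)
```

No call to the gc module is needed here: the objects are not part of a cycle, so the reference count alone is enough to reclaim them.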

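2. Understanding del

  del removes a name and decrements the reference count of the object it referred to by 1. If the count reaches 0, the program can no longer reach or use the object in any way. In CPython such an object is reclaimed at that moment; the collector runs described below are needed only for objects that reference cycles keep alive.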
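A small sketch of what del does and does not do, assuming CPython. A weak reference is used as a probe because it can observe the object without keeping it alive; the names Payload and probe are hypothetical.

```python
import weakref


class Payload:
    pass


first = Payload()
second = first                  # a second name for the same object
probe = weakref.ref(first)      # observes the object without owning it

del first                       # removes one name; the object survives
alive_after_first_del = probe() is not None

del second                      # removes the last reference
alive_after_second_del = probe() is not None

print(alive_after_first_del, alive_after_second_del)
```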

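Notes

  1. While garbage collection runs, Python cannot do any other work, so frequent collections greatly reduce Python's efficiency;

  2. Python starts garbage collection automatically only under specific conditions (when there is little garbage, there is no need to collect)

  3. While running, Python records the number of object allocations and object deallocations.

  Garbage collection starts only when the difference between the two exceeds a threshold.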

>>> import gc
>>> 
>>> gc.get_threshold() # gc module function that returns the collection thresholds
(700, 10, 10)
>>> 

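Reading the thresholds:

  700 is the threshold at which garbage collection starts;

  every 10 collections of generation 0 are accompanied by 1 collection of generation 1, and only every 10 collections of generation 1 lead to 1 collection of generation 2;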
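The running counters that are compared against these thresholds can be read with gc.get_count(). The sketch below assumes a standard CPython build and disables automatic collection for a moment so that the generation 0 counter can be seen growing as container objects are allocated; the exact numbers vary from run to run.

```python
import gc

thresholds = gc.get_threshold()      # a tuple of three values
gc.disable()                         # keep automatic collection out of the way
try:
    before = gc.get_count()[0]       # generation 0 counter
    kept = [[] for _ in range(100)]  # allocate 100 tracked container objects
    after = gc.get_count()[0]
finally:
    gc.enable()

print("thresholds:", thresholds)
print("generation 0 counter grew by", after - before)
```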

Garbage collection can of course also be started manually:

>>> gc.collect()       # start a collection manually
52
>>> gc.set_threshold(666, 8, 9) # gc module function that sets the collection thresholds
>>> 

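What is generational collection?

  • Python divides all objects into three generations: 0, 1 and 2;
  • every newly created object belongs to generation 0;
  • an object that survives a collection of its generation is moved to the next generation.
Generational collection is a typical technique for trading space for time, and it is also a key technique in Java. Put simply: the longer an object has existed, the less likely it is to be garbage, so the less often it should be examined.
This idea reduces the extra work caused by mark-and-sweep. The tracked objects are split into several generations, each of which is a linked list (a set), and the interval between mark-and-sweep passes over a generation grows with the age of the objects in it.
As the code above shows, Python has three generations. The first threshold applies to the number of allocations minus deallocations in generation 0; the second and third apply to the number of collections of the generation below. By default, generation 0 is collected once its counter exceeds 700, and generations 1 and 2 once their counters exceed 10.
Collecting a generation also collects every younger generation: a generation 2 collection examines all three generations, a generation 1 collection examines generations 0 and 1, and a generation 0 collection examines only itself.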
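The gc module lets a script request a collection of a specific generation and read per-generation statistics. A minimal sketch using only documented gc functions; the counts printed depend on the interpreter version and on what the program has done so far, so no particular output is claimed.

```python
import gc

stats_before = gc.get_stats()        # one dict per generation

young_only = gc.collect(0)           # generation 0 only
young_and_middle = gc.collect(1)     # generations 0 and 1
everything = gc.collect(2)           # full collection, same as gc.collect()

stats_after = gc.get_stats()

for generation, (before, after) in enumerate(zip(stats_before, stats_after)):
    ran = after["collections"] - before["collections"]
    print(f"generation {generation}: {ran} additional collection(s) recorded")
```

Each call returns the number of unreachable objects it found, which is the same kind of value as the 52 shown in the session above.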

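Mark and sweep

As the name suggests, mark-and-sweep first marks objects (garbage detection) and then sweeps the garbage away (garbage reclamation).
Initially every object is marked white. The root objects (which will never be deleted) are identified and marked black (meaning the object is live).
Objects referenced by live objects are marked gray (meaning the object is reachable but the objects it references have not been checked yet); once everything a gray object references has been checked, the gray object is marked black.
This repeats until no gray nodes remain. In the end, the white nodes are the objects to be cleared.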
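The coloring procedure can be simulated on a small object graph. This is an illustrative sketch of generic tri-color marking, not CPython's actual collector; the function name mark_and_sweep and the sample heap are hypothetical. Roots start gray here and turn black once their referents have been queued, which yields the same final result as marking them black up front.

```python
from collections import deque

WHITE, GRAY, BLACK = "white", "gray", "black"


def mark_and_sweep(references, roots):
    """Return (reachable, garbage) for a graph given as {obj: [referents]}."""
    color = {obj: WHITE for obj in references}
    worklist = deque()
    for root in roots:
        color[root] = GRAY
        worklist.append(root)

    while worklist:
        current = worklist.popleft()
        for referent in references[current]:
            if color[referent] == WHITE:      # first time this object is seen
                color[referent] = GRAY
                worklist.append(referent)
        color[current] = BLACK                # all of its referents are queued

    reachable = {obj for obj, c in color.items() if c == BLACK}
    garbage = {obj for obj, c in color.items() if c == WHITE}
    return reachable, garbage


heap = {
    "root": ["a"],
    "a": ["b"],
    "b": [],
    "x": ["y"],   # x and y reference each other,
    "y": ["x"],   # but nothing reachable references them
}
reachable, garbage = mark_and_sweep(heap, roots=["root"])
print(sorted(reachable), sorted(garbage))
```

Note that x and y each have a nonzero reference count because of their cycle, yet both stay white: reachability, not the count, decides what is garbage.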

How can the memory leaks that circular references may cause be avoided?

More on Python Garbage Collection
Technically speaking, Python’s garbage collection is based mainly upon reference counters, as described here; however, it also has a component that detects and reclaims objects with cyclic references in time. This component can be disabled if you’re sure that your code doesn’t create cycles, but it is enabled by default.
Circular references are a classic issue in reference count garbage collectors. Because references are implemented as pointers, it’s possible for an object to reference itself, or reference another object that does. For example, exercise 3 at the end of Part I and its solution in Appendix D show how to create a cycle easily by embedding a reference to a list within itself (e.g., L.append(L)). The same phenomenon can occur for assignments to attributes of objects created from user-defined classes. Though relatively rare, because the reference counts for such objects never drop to zero, they must be treated specially.
For more details on Python’s cycle detector, see the documentation for the gc module in Python’s library manual. The best news here is that garbage-collection-based memory management is implemented for you in Python, by people highly skilled at the task.
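The cycle detector described in the passage above can be watched at work with the standard library alone. A minimal sketch, assuming CPython: automatic collection is switched off briefly so that the cycle is only reclaimed by the explicit gc.collect() call. The names Node, make_cycle and probe are hypothetical.

```python
import gc
import weakref


class Node:
    pass


def make_cycle():
    """Create two objects that reference each other and return a probe."""
    a, b = Node(), Node()
    a.partner, b.partner = b, a
    return weakref.ref(a)          # a and b go out of scope on return


gc.collect()                       # clear any garbage left from earlier code
gc.disable()
try:
    probe = make_cycle()
    alive_before_collect = probe() is not None   # the cycle keeps it alive
    unreachable_found = gc.collect()             # cycle detector runs here
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_before_collect, unreachable_found, alive_after_collect)
```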

  The answer:

  1. Weak references: use the ref function from the weakref module
  2. Explicitly set one of the references to None
import gc
import objgraph
import sys
import weakref


def quote_demo():
    class Person:
        pass

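    p = Person()  # reference count: 1
    print(sys.getrefcount(p))  # 2: the argument passed to getrefcount adds one

    def log(obj):
        # 4 when this was written: the extra references exist only while
        # the function runs and are released when it returns
        print(sys.getrefcount(obj))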

    log(p)  # 3

    p2 = p  # 2
    print(sys.getrefcount(p))  # 3
    del p2
    print(sys.getrefcount(p))  # 3 - 1 = 2


def circle_quote():
    # Circular reference
    class Dog:
        pass

    class Person:
        pass

    p = Person()
    d = Dog()

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))

    p.pet = d
    d.master = p

    # After deleting p and d, have the underlying objects been released?
    del p
    del d

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))


def solve_circle_quote():
    # Define two classes whose finalizers report when they run
    class Person:
        def __del__(self):
            print("Person object released")

    class Dog:
        def __del__(self):
            print("Dog object released")

    p = Person()
    d = Dog()

    p.pet = d
    d.master = p

    p.pet = None  # break the cycle by explicitly setting one side to None
    del p
    del d

    gc.collect()

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))


def solve_circle_quote_with_weak_ref():
    # Define two classes whose finalizers report when they run
    class Person:
        def __del__(self):
            print("Person object released")

    class Dog:
        def __del__(self):
            print("Dog object released")

    p = Person()
    d = Dog()

    p.pet = d
    d.master = weakref.ref(p)  # weak back-reference: does not keep p alive

    del p
    del d

    gc.collect()

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))


if __name__ == "__main__":
    quote_demo()
    circle_quote()
    solve_circle_quote()
    solve_circle_quote_with_weak_ref()
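The script above depends on the third-party objgraph package. For readers who do not have it installed, the weak-reference approach can also be checked with the standard library only. This is a minimal sketch, assuming CPython; the names Owner, Pet and released are hypothetical. Automatic collection is disabled to show that, with a weak back-reference, reference counting alone releases both objects.

```python
import gc
import weakref

released = []


class Owner:
    def __del__(self):
        released.append("owner")


class Pet:
    def __del__(self):
        released.append("pet")


gc.disable()                           # no cycle detector for this test
try:
    owner, pet = Owner(), Pet()
    owner.pet = pet                    # strong reference: owner -> pet
    pet.owner = weakref.ref(owner)     # weak reference: pet -> owner
    same_owner = pet.owner() is owner  # call the weak reference to dereference it

    del owner, pet                     # no cycle of strong references remains
    released_without_gc = list(released)
finally:
    gc.enable()

print(same_owner, released_without_gc)
```

A weak reference must be called to obtain the object, and it returns None once the object is gone, so code that uses pet.owner() should handle that case.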

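IV. Memory pool mechanism

  Python distinguishes between large and small memory requests (the boundary is 256 bytes; CPython 3.3 and later use 512 bytes):

  1. Large requests are allocated with malloc
  2. Small requests are allocated from the memory pool
  3. Python's memory pool (a pyramid of layers)

  Layer +3: the top layer, where the user operates directly on Python objects

  Layers +1 and +2: the memory pool, implemented by Python's interface function PyMem_Malloc

    • A request for 1 to 256 bytes is handled by the memory pool manager, which calls malloc to obtain memory,
    • but it obtains one large 256 KB block at a time and does not call free to release it after each use; the block stays in the pool for later requests

  Layer 0: large requests  -----> a request for more than 256 bytes is allocated with malloc and released with free.

  Layers -1 and -2: handled by the operating system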
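As a rough illustration of the size boundary, the sketch below classifies a few objects by their size as reported by sys.getsizeof. It does not inspect the allocator itself: the helper allocation_path and the constant SMALL_REQUEST_THRESHOLD are hypothetical names for this example, the value 512 is an assumption that matches CPython 3.3 and later (older versions used 256), and getsizeof only approximates the size of the underlying allocation request.

```python
import sys

# Assumption: small-object threshold in bytes for CPython 3.3 and later.
SMALL_REQUEST_THRESHOLD = 512


def allocation_path(nbytes, threshold=SMALL_REQUEST_THRESHOLD):
    """Say which allocator a request of nbytes would be routed to."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    return "memory pool" if nbytes <= threshold else "malloc"


samples = {
    "int": 1,
    "short str": "abc",
    "empty list": [],
    "1000-byte bytes": bytes(1000),
}

for label, obj in samples.items():
    size = sys.getsizeof(obj)
    print(f"{label}: {size} bytes -> {allocation_path(size)}")
```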


Posted 2019-06-27 22:41 by Eagle_Fly