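Python Study Notes: How to Flatten a List, and How to Combine Multiple Lists into One
Problem
Sometimes we need to "flatten" a list that contains sublists: that is, eliminate the sublists and turn the original list into one that holds only their elements. This is awkward to put into words, so here is an example.
Suppose we have this list: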
list_a = [[1, 2, 3, 4], [5, 6, 7], [8, 9], [10, 11]]
flatten(list_a) = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Different solutions
for-loop:
The most direct solution is a for-loop:
flatten = []
for sublist in list_a:
    # extend() appends every element of sublist in one call
    flatten.extend(sublist)
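For reuse, the loop above can be wrapped in a small function. This is only a minimal sketch: the name `flatten_extend` is an assumption made for illustration, not something defined in the original notes.

```python
def flatten_extend(nested):
    """Flatten one level of nesting by extending a result list."""
    flat = []
    for sublist in nested:
        flat.extend(sublist)  # add all elements of this sublist at once
    return flat


list_a = [[1, 2, 3, 4], [5, 6, 7], [8, 9], [10, 11]]
print(flatten_extend(list_a))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

The function builds a new list and leaves `list_a` itself unchanged.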
List comprehension 1:
The following list comprehension also works:
flatten = [sublist[i] for sublist in list_a for i in range(len(sublist))]
The comprehension above is equivalent to the following loop, but runs faster than it:
flatten = []
for sublist in list_a:
    for i in range(len(sublist)):
        flatten.append(sublist[i])
List comprehension 2:
There is an even more direct list comprehension:
flatten = [item for sublist in list_a for item in sublist]
This comprehension is equivalent to the following loop, but runs faster than it:
flatten = []
for sublist in list_a:
    for item in sublist:
        flatten.append(item)
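As a quick sanity check, the two comprehensions and the nested loop can be run side by side to confirm they produce the same result. The variable names `by_index`, `by_item`, `by_loop`, and `nested` are assumptions chosen for this sketch.

```python
list_a = [[1, 2, 3, 4], [5, 6, 7], [8, 9], [10, 11]]

# List comprehension 1: index into each sublist
by_index = [sublist[i] for sublist in list_a for i in range(len(sublist))]

# List comprehension 2: iterate over the items directly
by_item = [item for sublist in list_a for item in sublist]

# Equivalent nested for-loop
by_loop = []
for sublist in list_a:
    for item in sublist:
        by_loop.append(item)

assert by_index == by_item == by_loop
print(by_item)
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# All of these remove exactly one level of nesting
nested = [[1, [2, 3]], [4]]
print([item for sublist in nested for item in sublist])
# [1, [2, 3], 4]
```

Note that deeper nesting is left in place, as the last line shows; only the outermost level of sublists is removed.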
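Speed comparison
Case 1 first: a list containing a small number of very large sublists. Conclusion: the for-loop (using extend) is the fastest, and list comprehension 1 is the slowest.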
# Define a list of four sublists, each containing 100,000 elements
In [14]: a = list(range(100000))
In [15]: b = list(range(100000))
In [16]: c = list(range(100000))
In [17]: d = list(range(100000))
In [19]: e = [a, b, c, d]
# Test the for-loop form
In [20]: %%timeit
    ...: f = []
    ...: for sublist in e:
    ...:     f.extend(sublist)
    ...:
1.82 ms ± 6.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Test list comprehension 1
In [21]: %timeit [sublist[i] for sublist in e for i in range(len(sublist))]
22.1 ms ± 529 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Test list comprehension 2
In [22]: %timeit [item for sublist in e for item in sublist]
9.7 ms ± 57.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
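Next, the case where the list holds a large number of small sublists. The conclusion is the same: the for-loop (using extend) is the fastest, and list comprehension 1 is the slowest.
Finally, the for-loop equivalent of list comprehension 2 was also tested; it is slower than the comprehension itself.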
# Define a list of 100,000 sublists, each containing 100 elements
In [24]: a = list(range(100))
In [25]: b = [a] * 100000
# Test the for-loop form
In [26]: %%timeit
    ...: flatten = []
    ...: for sublist in b:
    ...:     flatten.extend(sublist)
    ...:
67.9 ms ± 1.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Test list comprehension 1
In [27]: %timeit [sublist[i] for sublist in b for i in range(len(sublist))]
448 ms ± 6.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Test list comprehension 2
In [28]: %timeit [item for sublist in b for item in sublist]
236 ms ± 16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Test the for-loop equivalent of list comprehension 2
In [29]: %%timeit
    ...: flatten = []
    ...: for sublist in b:
    ...:     for item in sublist:
    ...:         flatten.append(item)
    ...:
590 ms ± 4.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
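The timings above were taken with IPython's %timeit magic. Outside IPython, a rough equivalent can be sketched with the standard-library `timeit` module. The function names, the `benchmark` helper, and the scaled-down input size below are all assumptions made for illustration; absolute numbers will vary by machine and Python version.

```python
import timeit


def flatten_extend(nested):
    # for-loop with extend()
    flat = []
    for sublist in nested:
        flat.extend(sublist)
    return flat


def flatten_by_index(nested):
    # list comprehension 1
    return [sublist[i] for sublist in nested for i in range(len(sublist))]


def flatten_by_item(nested):
    # list comprehension 2
    return [item for sublist in nested for item in sublist]


def flatten_append(nested):
    # for-loop equivalent of list comprehension 2
    flat = []
    for sublist in nested:
        for item in sublist:
            flat.append(item)
    return flat


def benchmark(nested, number=5):
    """Return the average seconds per call for each approach."""
    candidates = [flatten_extend, flatten_by_index, flatten_by_item, flatten_append]
    results = {}
    for func in candidates:
        total = timeit.timeit(lambda: func(nested), number=number)
        results[func.__name__] = total / number
    return results


# Many small sublists (scaled down from the 100,000 used above)
many_small = [list(range(100))] * 1_000
for name, seconds in benchmark(many_small).items():
    print(f"{name}: {seconds * 1000:.2f} ms per call")
```

Each approach is wrapped in a function so that all four are timed under the same calling overhead.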