决策树-信息增益
首先计算Ent,这里最好看一下西瓜书的P75页,结合着来学习
我们首先要计算信息熵Ent,然后再计算信息增益Gain
from math import log
# 数据集
dataSet = [[1, 1, 'yes'],
[1, 1, 'yes'],
[1, 0, 'no'],
[0, 1, 'no'],
[0, 1, 'no']]
# 标记
labels = ['no surfacing', 'flippers']
# 定义香农熵计算函数
def calcShannonEnt(dataSet):
dataNum = len(dataSet)
dict1 = {}
for data in dataSet:
label = data[-1]
if label not in dict1:
dict1[label] = 0
dict1[label] += 1
shannonEnt = 0
for key in dict1:
pi = dict1[key] / dataNum
# log的用法
shannonEnt -= pi * log(pi, 2)
return shannonEnt
result = calcShannonEnt(dataSet)
print(result)
0.9709505944546686