Split to Achieve Gain

Machine Learning - Split to Achieve Gain

Calculate Information Gain.

Task
Given a dataset and a split of the dataset, calculate the information gain using the gini impurity.

The first line of the input is a list of the target values in the initial dataset. The second line is the target values of the left split and the third line is the target values of the right split.

Round your result to 5 decimal places. You can use round(x, 5).
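A small illustration of how round(x, 5) behaves when printed in Python. The variable name gain and the value 1/18 are only examples, not part of the task input.

```python
# round() returns a float; print() shows its shortest representation,
# so trailing zeros are not printed.
gain = 0.5
print(round(gain, 5))      # prints 0.5, not 0.50000

# A value with more digits is rounded to at most 5 decimal places.
print(round(1 / 18, 5))    # prints 0.05556
```

This is why the expected output for the sample is 0.5 rather than 0.50000.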

Input Format
Three lines of 1's and 0's separated by spaces

Output Format
Float (rounded to 5 decimal places)

Sample Input
1 0 1 0 1 0
1 1 1
0 0 0

Sample Output
0.5

 

Explanation
The initial set has 3 positive cases and 3 negative cases. Thus the Gini impurity is 2*0.5*0.5=0.5.
The left set has 3 positive cases and 0 negative cases. Thus the Gini impurity is 2*1*0=0.
The right set has 0 positive cases and 3 negative cases. Thus the Gini impurity is 2*0*1=0.
The information gain is the initial impurity minus the impurity of each split weighted by its share of the samples: 0.5-(3/6)*0-(3/6)*0=0.5
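The calculation above can be sketched in Python with exact fractions, which makes the intermediate values easy to check by hand. Each split's impurity is weighted by its share of the samples; that makes no difference in the sample, where both splits are pure, but it does for a mixed split. The function names gini and information_gain and the mixed split at the end are illustrative assumptions, not part of the original task.

```python
from fractions import Fraction

def gini(labels):
    """Gini impurity 2*p*(1-p) of a non-empty list of 0/1 labels, as an exact fraction."""
    p = Fraction(sum(labels), len(labels))
    return 2 * p * (1 - p)

def information_gain(parent, left, right):
    """Parent impurity minus the child impurities, each weighted by its share of samples."""
    n = len(parent)
    # Empty splits are skipped: they carry zero weight.
    weighted = sum(Fraction(len(c), n) * gini(c) for c in (left, right) if c)
    return gini(parent) - weighted

# Sample from the task: both splits are pure.
print(information_gain([1, 0, 1, 0, 1, 0], [1, 1, 1], [0, 0, 0]))   # 1/2

# Hypothetical mixed split (not from the task): each side has impurity 4/9.
mixed = information_gain([1, 0, 1, 0, 1, 0], [1, 1, 0], [0, 0, 1])
print(mixed, round(float(mixed), 5))                                # 1/18 0.05556
```

The mixed split barely separates the classes, so its gain (1/18) is far smaller than the 1/2 achieved by the perfect split in the sample.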

====================================================================================

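def gini(labels):
    # Gini impurity 2*p*(1-p) of a 0/1 label list; an empty list counts as pure.
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

# Target values: full dataset, left split, right split.
S = [int(x) for x in input().split()]
A = [int(x) for x in input().split()]
B = [int(x) for x in input().split()]

# Information gain: parent impurity minus the size-weighted child impurities.
ig = gini(S) - gini(A) * (len(A) / len(S)) - gini(B) * (len(B) / len(S))

print(round(ig, 5))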

 

posted @ 2023-10-30 21:04  白话