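  • If you confess and he denies, you will be released immediately, and he will bear the full blame and be sentenced to 10 years.
  • If you both confess, your crime is proven; but since you both confessed, you will each be sentenced to 4 years.
  • If neither of you confesses, there is no evidence of the robbery, but we will still charge you both with resisting arrest, and you will each be sentenced to 1 year.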
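The three rules can be written as a payoff table. The minimal Python sketch below assumes that the case "he confesses, you deny" mirrors the first rule (the same offer is made to both prisoners) and encodes "released immediately" as 0 years; the names `SENTENCE` and `best_response` are illustrative. Fewer years is better, so a best response minimizes one's own sentence:

```python
# Years in prison: (my choice, his choice) -> (my years, his years).
# The ("deny", "confess") entry is assumed by symmetry with the first rule.
SENTENCE = {
    ("confess", "confess"): (4, 4),
    ("confess", "deny"): (0, 10),
    ("deny", "confess"): (10, 0),
    ("deny", "deny"): (1, 1),
}
CHOICES = ("confess", "deny")


def best_response(his_choice: str) -> str:
    """My choice that minimizes my own sentence, given his choice."""
    return min(CHOICES, key=lambda mine: SENTENCE[(mine, his_choice)][0])


for his in CHOICES:
    print(f"he chooses {his}: my best response is {best_response(his)}")

# Confessing is best whatever he does (a strictly dominant strategy),
# so both confess and serve 8 years in total, versus 2 if both denied.
total_both_confess = sum(SENTENCE[("confess", "confess")])
total_both_deny = sum(SENTENCE[("deny", "deny")])
print(total_both_confess, total_both_deny)  # 8 2
```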







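  • Players: at least two, acting independently
  • Strategies (options): the choices available to each player
  • Payoffs: depend not only on one's own choice but also on the choices of the others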


  • As a player, how can I obtain a larger payoff?
  • As an observer (society), what will the overall outcome be?




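  1. Each player cares only about their own payoff.
  2. Players are self-interested and rational: each tries to make their own payoff as large as possible. Given the other players' strategies, a player will not stick with the current strategy if switching to another one would yield a larger payoff.
  3. Each player fully knows the structure of the game (the payoff matrix), i.e. information is complete.
  4. Each player knows that all other players also know the points above: this is common knowledge.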





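  • One player has a strictly dominant strategy; the other player can infer it and choose the "best response" to it.
  • Neither player has a strictly dominant strategy, but each has a (non-strictly) "dominant strategy". This kind of equilibrium can be found by eliminating "strictly dominated strategies".
  • Neither player has a dominant strategy or a strictly dominated strategy, yet pairs of strategies that are "best responses to each other" still exist. If some external factor brings such a pair about, neither player has an incentive to change strategy, and the outcome is stable.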
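These situations can be checked mechanically on small games. The sketch below assumes two players with finitely many pure strategies; the helper names and the example games for the second and third cases are textbook-style illustrations, not taken from the original. It finds pure-strategy equilibria (mutual best responses) and performs iterated elimination of strictly dominated strategies:

```python
from itertools import product

# payoffs[(row_strategy, col_strategy)] = (row player's payoff, column player's payoff)


def row_best_responses(payoffs, rows, col):
    top = max(payoffs[(r, col)][0] for r in rows)
    return {r for r in rows if payoffs[(r, col)][0] == top}


def col_best_responses(payoffs, cols, row):
    top = max(payoffs[(row, c)][1] for c in cols)
    return {c for c in cols if payoffs[(row, c)][1] == top}


def pure_nash_equilibria(payoffs, rows, cols):
    """Profiles in which each player's strategy is a best response to the other's."""
    return [
        (r, c)
        for r, c in product(rows, cols)
        if r in row_best_responses(payoffs, rows, c)
        and c in col_best_responses(payoffs, cols, r)
    ]


def eliminate_strictly_dominated(payoffs, rows, cols):
    """Repeatedly delete strategies that another remaining strategy beats everywhere."""
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        for r in list(rows):
            if any(all(payoffs[(o, c)][0] > payoffs[(r, c)][0] for c in cols)
                   for o in rows if o != r):
                rows.remove(r)
                changed = True
        for c in list(cols):
            if any(all(payoffs[(r, o)][1] > payoffs[(r, c)][1] for r in rows)
                   for o in cols if o != c):
                cols.remove(c)
                changed = True
    return rows, cols


# Case 1: prisoner's dilemma with the 3/0/5/1 payoffs used in the tournament code.
pd = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
print(pure_nash_equilibria(pd, "CD", "CD"))  # [('D', 'D')]

# Case 2: no strictly dominant strategy at first, but elimination leaves one profile.
g2 = {("U", "L"): (1, 0), ("U", "M"): (1, 2), ("U", "R"): (0, 1),
      ("D", "L"): (0, 3), ("D", "M"): (0, 1), ("D", "R"): (2, 0)}
print(eliminate_strictly_dominated(g2, "UD", "LMR"))  # (['U'], ['M'])

# Case 3: nothing is dominated, yet two profiles are mutual best responses.
g3 = {("A", "A"): (2, 1), ("A", "B"): (0, 0), ("B", "A"): (0, 0), ("B", "B"): (1, 2)}
print(eliminate_strictly_dominated(g3, "AB", "AB"))  # (['A', 'B'], ['A', 'B'])
print(pure_nash_equilibria(g3, "AB", "AB"))  # [('A', 'A'), ('B', 'B')]
```

In the third game an outside signal (the "external factor" above) is what would select one of the two stable profiles.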












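The most typical strategy function is "tit for tat". Each strategy function receives i (the 0-based round number) and me and him (the lists of moves played so far). Tit for tat cooperates in the first round; after that, if the opponent cooperated in the previous round, i.e. if him[i-1]=='C', I cooperate this round ('C'); otherwise I defect ('D').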

    # Tit-for-tat strategy function
    def p7(i, me, him):
        if i == 0:
            return 'C'  # cooperate in the first round
        else:
            if him[i-1] == 'C':
                return 'C'
            else:
                return 'D'


    # Strategy function of the devil: defect first, then do the opposite of the opponent's last move
    def devil(i, me, him):
        if i == 0:
            return 'D'
        else:
            if him[i-1] == 'C':
                return 'D'
            else:
                return 'C'
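To see how these two functions interact, the sketch below plays them against each other. The helper `play` is hypothetical; it only mirrors how the move histories are passed (each function sees its own history first):

```python
def tit_for_tat(i, me, him):
    """Cooperate first, then repeat the opponent's previous move (same logic as p7)."""
    if i == 0:
        return 'C'
    return 'C' if him[i - 1] == 'C' else 'D'


def devil(i, me, him):
    """Defect first, then do the opposite of the opponent's previous move."""
    if i == 0:
        return 'D'
    return 'D' if him[i - 1] == 'C' else 'C'


def play(n, first, second):
    """Hypothetical helper: each function is called as f(round, own history, other history)."""
    a, b = [], []
    for k in range(n):
        move_a = first(k, a, b)
        move_b = second(k, b, a)
        a.append(move_a)
        b.append(move_b)
    return ''.join(a), ''.join(b)


tft_moves, devil_moves = play(8, tit_for_tat, devil)
print(tft_moves)    # CDDCCDDC
print(devil_moves)  # DDCCDDCC
```

The pair falls into a four-round cycle (C, D), (D, D), (D, C), (C, C); with the 3/0/5/1 payoffs used in game_dual, each side earns 0 + 1 + 5 + 3 = 9 points per cycle.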



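    # Strategy function that predicts the opponent's strategy and responds to it
    def p10(i, me, him):
        if i in [0, 1, 2, 3]:
            return 'C'  # cooperate in the first four rounds to draw out his strategy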



        else:
            if 'D' not in him[2:]:
                flag = 'innocent'  # he has never defected from the third round on
            if flag == 'innocent':
                return 'D'


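  • One possible reason is that I cooperated in the previous round. The number of rounds in which I cooperated is stored in cnt1_1, and the number of times the opponent cooperated in the round right after I cooperated is stored in cnt1_2. When cnt1_2 / (cnt1_1 - 1) > 0.9, I conclude that my cooperating in one round triggers the opponent's cooperation in the next. I call this strategy 's-retri' (short-term retaliation).
  • The other possible reason is that the opponent cooperates because I have mostly cooperated in the past. The number of rounds in which the opponent cooperated is stored in cnt2_1; in cnt2_2 of those rounds, the mode of my past moves was also cooperation. When cnt2_2 / cnt2_1 > 0.9, I conclude that my long record of cooperation triggers the opponent's cooperation. I call this strategy 'l-retri' (long-term retaliation).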
            flag = None
            cnt1_1 = 0
            cnt1_2 = 0
            cnt2_1 = 0
            cnt2_2 = 0
            for j in range(i-1):
                if me[j-1] == 'C':
                    cnt1_1 += 1
                    if me[j-1] == him[j]:
                        cnt1_2 += 1
                elif him[j] == 'C':
                    cnt2_1 += 1
                    if j > 2 and max(set(me), key=me.count) == him[j]:
                        cnt2_2 += 1
            if 'D' not in him[2:]:
                flag = 'innocent'
            elif cnt1_2 / (cnt1_1 - 1) > 0.9 and cnt1_2 >= 2:
                flag = 's-retri'
            elif cnt2_1 > 0:
                if cnt2_2 / cnt2_1 > 0.9:
                    flag = 'l-retri'



            if flag is None:
                return 'D'
            elif flag in ['s-retri', 'l-retri']:
                return 'C'
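The two ratios can be illustrated in isolation. This is a simplified sketch, not the author's exact counting (p10 divides by cnt1_1 - 1, skips some rounds through its if/elif chain, and takes the mode over the whole history); `short_term_ratio` and `long_term_ratio` are hypothetical helpers, and here the mode is taken over my moves before each round:

```python
from collections import Counter


def short_term_ratio(me, him):
    """Share of rounds right after my 'C' in which he also played 'C'."""
    rounds = [t for t in range(1, len(him)) if me[t - 1] == 'C']
    if not rounds:
        return 0.0
    return sum(him[t] == 'C' for t in rounds) / len(rounds)


def long_term_ratio(me, him):
    """Share of his 'C' rounds in which my most frequent earlier move was 'C'."""
    hits = total = 0
    for t in range(1, len(him)):
        if him[t] == 'C':
            total += 1
            most_common_move = Counter(me[:t]).most_common(1)[0][0]
            hits += most_common_move == 'C'
    return hits / total if total else 0.0


me = list("CCCCDCCC")
tit_for_tat_reply = list("CCCCCDCC")  # copies my previous move
devil_reply = list("DDDDDCDD")        # does the opposite of my previous move

print(short_term_ratio(me, tit_for_tat_reply), long_term_ratio(me, tit_for_tat_reply))  # 1.0 1.0
print(short_term_ratio(me, devil_reply), long_term_ratio(me, devil_reply))              # 0.0 1.0
```

Against the devil the long-term ratio is 1.0, above the 0.9 threshold: the single round in which the devil cooperates follows a history in which I mostly cooperated, so it looks like 'l-retri'.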



    import strategy  # The original author did not publish all the strategies; I think you could write your own (but I haven't come up with that many yet...)

    strategies = [strategy.p1, strategy.p2, strategy.p3, strategy.p4, strategy.p5, strategy.p6,
                  strategy.p7, strategy.p8, strategy.p9, strategy.p10, strategy.p11, strategy.devil]
    outputF = open('./output.csv', 'w')   # for scores
    outputF2 = open('./output.txt', 'w')  # for analysis


    # Simulate an n-round repeated game between strategies i and j (1-based indices)
    def game_dual(n, i, j):
        me = []
        me_score = 0
        me_str = strategies[i-1]
        him = []
        him_score = 0
        him_str = strategies[j-1]
        for k in range(n):
            # actions
            me_react = me_str(k, me, him)
            him_react = him_str(k, him, me)
            me.append(me_react)
            him.append(him_react)
            # scores
            if me_react == 'D' and him_react == 'D':
                me_score += 1
                him_score += 1
            elif me_react == 'C' and him_react == 'D':
                me_score += 0
                him_score += 5
            elif me_react == 'D' and him_react == 'C':
                me_score += 5
                him_score += 0
            else:
                me_score += 3
                him_score += 3
        return me_score, him_score
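The if/elif chain is a lookup in the payoff matrix (3 each for mutual cooperation, 5 and 0 for a one-sided defection, 1 each for mutual defection). A compact equivalent, sketched with a hypothetical `PAYOFF` dict and `play_match` helper plus two toy strategies for illustration:

```python
# (my move, his move) -> (my points, his points), as in the if/elif chain of game_dual
PAYOFF = {
    ('C', 'C'): (3, 3),
    ('C', 'D'): (0, 5),
    ('D', 'C'): (5, 0),
    ('D', 'D'): (1, 1),
}


def play_match(n, my_strategy, his_strategy):
    """Play n rounds; each strategy is called as f(round, own_history, other_history)."""
    me, him = [], []
    my_score = his_score = 0
    for k in range(n):
        my_move = my_strategy(k, me, him)
        his_move = his_strategy(k, him, me)
        me.append(my_move)
        him.append(his_move)
        gained_me, gained_him = PAYOFF[(my_move, his_move)]
        my_score += gained_me
        his_score += gained_him
    return my_score, his_score


def tit_for_tat(i, me, him):
    return 'C' if i == 0 else him[i - 1]


def always_defect(i, me, him):
    return 'D'


print(play_match(10, tit_for_tat, always_defect))  # (9, 14)
print(play_match(10, tit_for_tat, tit_for_tat))    # (30, 30)
```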



    # Round-robin setup
    def game(n, devil=False):
        score_dict = {}
        print('Game on... (%i rounds in all)' % n)
        if devil == False:
            end = 12
        else:
            end = 13
        scores = [0 for i in range(end)]
        for i in range(1, end):
            for j in range(i+1, end):
                me_score, him_score = game_dual(n, i, j)
                score_dict[i, j] = [me_score, him_score]
                print('%i v.s. %i: %i – %i' % (i, j, me_score, him_score))
                scores[i] += me_score
                scores[j] += him_score
        outputF.write('devil = %s and n = %i\n' % (str(devil), n))
        outputF2.write('*** devil = %s and n = %i ***\n' % (str(devil), n))
        for i in range(1, end):  # total scores of each one
            outputF.write('%i,%i\n' % (i, scores[i]))
            print('%i: %i' % (i, scores[i]))


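        # scores_individuals[i][j]: how many points j got when playing against i
        scores_individuals = {i: {} for i in range(1, end)}
        for key, value in score_dict.items():
            for i in range(1, end):
                if i in key:
                    i_index = key.index(i)
                    j = key[1 - i_index]
                    j_score = value[1 - i_index]
                    scores_individuals[i][j] = j_score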


        best_react_dict = {}
        for key, value in scores_individuals.items():
            best_reacted = [0, 0]  # [j, j_score]: j reacted best against key, with a score of j_score
            for j, j_score in value.items():
                if j_score > best_reacted[1]:
                    best_reacted = [j, j_score]
                elif j_score == best_reacted[1]:
                    best_reacted.append(j)
                    best_reacted.append(j_score)
            if best_reacted[0:2] == [0, 0]:
                del best_reacted[0:2]
            best_react_dict[key] = best_reacted
            print('* Best response(s) to %i: p%s.' % (key, best_reacted[0:-1:2]))
            outputF2.write('* Best response(s) to %i: p%s.\n' % (key, best_reacted[0:-1:2]))
        print()


        # Non-strictly dominant strategies
        dominant = [1] + best_react_dict[1][0:-1:2]
        for i in range(2, end):
            next_range = best_react_dict[i][0:-1:2] + [i]
            dominant = [x for x in dominant if x in next_range]


        # Pairs of strategies that are best responses to each other
        reciprocal = []
        for key, value in best_react_dict.items():
            for j in range(0, len(value), 2):
                if value[j] > key:
                    for k in range(0, len(best_react_dict[value[j]]), 2):
                        if key == best_react_dict[value[j]][k]:
                            reciprocal.append([key, value[j]])


        return score_dict
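The bookkeeping with the flat [j, j_score, j, j_score, ...] list can also be expressed with dictionaries. A sketch (the function names and the toy numbers are hypothetical, not tournament results) that keeps ties and then finds the pairs that are best responses to each other; collecting each strategy's top scorers directly removes the need for the [0, 0] placeholder:

```python
def best_responses(points_against):
    """points_against[i][j] = points j earned against i; returns the top scorers against each i."""
    result = {}
    for i, row in points_against.items():
        top = max(row.values())
        result[i] = sorted(j for j, points in row.items() if points == top)
    return result


def mutual_best_responses(best):
    """Pairs (a, b) with a < b in which each is among the other's best responses."""
    return [(a, b) for a in sorted(best) for b in best[a]
            if a < b and a in best.get(b, [])]


# Toy data for three strategies (made-up numbers).
points_against = {
    1: {2: 30, 3: 30},
    2: {1: 10, 3: 25},
    3: {1: 30, 2: 9},
}
best = best_responses(points_against)
print(best)                         # {1: [2, 3], 2: [3], 3: [1]}
print(mutual_best_responses(best))  # [(1, 3)]
```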


    # Main program
    score_dict_whole = {0: {}, 1: {}}
    for n in [1, 10, 100, 1000]:
        print('Without devil, n = %i' % n)
        score_dict_whole[0][n] = game(n)
        print('With devil, n = %i' % n)
        score_dict_whole[1][n] = game(n, True)
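Each call to game() plays every unordered pair of strategies exactly once (the inner loop starts at i+1, so there is no self-play): with 11 strategies that is 11 × 10 / 2 = 55 matches, and 66 once the devil joins. The same schedule can be sketched with itertools.combinations; `round_robin` is a hypothetical helper and the dummy match only counts games:

```python
from itertools import combinations


def round_robin(players, match):
    """Play every unordered pair once; match(a, b) returns (score_a, score_b)."""
    totals = {p: 0 for p in players}
    pairings = list(combinations(players, 2))
    for a, b in pairings:
        score_a, score_b = match(a, b)
        totals[a] += score_a
        totals[b] += score_b
    return totals, len(pairings)


# Dummy match in which both sides always score 1.
totals_11, matches_11 = round_robin(range(1, 12), lambda a, b: (1, 1))
totals_12, matches_12 = round_robin(range(1, 13), lambda a, b: (1, 1))
print(matches_11, matches_12)  # 55 66
print(totals_11[1])            # 10
```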




When n = 1000 and devil = True, the output is as follows:

    *** devil = True and n = 1000 ***
    * Best response(s) to 1: p[10].
    * Best response(s) to 2: p[1, 3, 4, 5, 6, 7, 8, 9, 11].
    * Best response(s) to 3: p[1, 2, 4, 5, 6, 7, 8, 9, 11].
    * Best response(s) to 4: p[1, 2, 3, 5, 6, 7, 8, 9, 11].
    * Best response(s) to 5: p[1, 2, 3, 4, 6, 7, 8, 9, 10, 11].
    * Best response(s) to 6: p[1, 2, 3, 4, 5, 7, 8, 9, 11].
    * Best response(s) to 7: p[1, 2, 3, 4, 5, 6, 8, 9, 11].
    * Best response(s) to 8: p[1, 2, 3, 4, 5, 6, 7, 9, 11].
    * Best response(s) to 9: p[1, 2, 3, 4, 5, 6, 7, 8, 11].
    * Best response(s) to 10: p[12].
    * Best response(s) to 11: p[1, 2, 3, 4, 5, 6, 7, 8, 9, 10].
    * Best response(s) to 12: p[5].
    (Non-strict) dominant strategies: none.
    Mutual best responses: 2 – 3 | 2 – 4 | 2 – 5 | 2 – 6 | 2 – 7 | 2 – 8 | 2 – 9 | 2 – 11 | 3 – 4 | 3 – 5 | 3 – 6 | 3 – 7 | 3 – 8 | 3 – 9 | 3 – 11 | 4 – 5 | 4 – 6 | 4 – 7 | 4 – 8 | 4 – 9 | 4 – 11 | 5 – 6 | 5 – 7 | 5 – 8 | 5 – 9 | 5 – 11 | 6 – 7 | 6 – 8 | 6 – 9 | 6 – 11 | 7 – 8 | 7 – 9 | 7 – 11 | 8 – 9 | 8 – 11 | 9 – 11 |
Total score of each strategy:

    Strategy         1      2      3      4      5      6      7      8      9     10     11  devil
    Total score  69100  68918  69136  75545  71928  68622  69136  69129  69136  61366  68695  27003


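  • I lost points against p4 because of its strict rule: p4 cooperates with me only if I follow tit for tat in every single round. Although I tend to cooperate with tit-for-tat players, my rule is not that strict; I sometimes even defect tentatively to probe, so p4 refuses to cooperate with me.
  • I lost points against the devil because my code classified the devil as "long-term retaliation". The devil cooperates with me only to provoke me, but I interpreted its cooperation as a response to my having mostly cooperated in the past. The core of my reasoning is that cooperating is the hard choice, not as safe as defecting, so I try to understand why someone decides to cooperate. The devil's reason for cooperating is exactly the opposite: it does not care about losing points, as long as it can provoke its opponent. My prediction algorithm did not account for such an oddball.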




    import random

    # Strategy set: C = cooperate, D = defect
    def p1(i, me, him):
        cooperate = True
        if i > 0 and him[0] == 'D':
            cooperate = False  # If the partner betrays at first, cooperation will not exist.
        elif i > 3:  # I will cooperate in the first 4 games as long as the partner does not betray at first.
            check, count_twenty, count_fifty, count_hundred = True, 0, 0, 0
            for j in range(i - 2):
                count = 0
                for k in range(j, j + 2):
                    if him[k] == 'D':
                        count += 1
                if count > 1:
                    cooperate, check = False, False
                    break  # If the partner betrays two or more times in any three consecutive games, cooperation will no longer continue.
            if i > 20 and check:
                for k in range(i - 20, i):
                    if him[k] == 'D':
                        count_twenty += 1
                if count_twenty > 1:
                    cooperate, check = False, False  # If the partner betrays two or more times in the last 20 games, cooperation will not continue until 20 games later.
            if i > 50 and check:
                for k in range(i - 50, i):
                    if him[k] == 'D':
                        count_fifty += 1
                if count_fifty > 2:
                    cooperate, check = False, False  # If the partner betrays three or more times in the last 50 games, cooperation will not continue until 50 games later.
            if i > 100 and check:
                for k in range(i - 100, i):
                    if him[k] == 'D':
                        count_hundred += 1
                if count_hundred > 4:
                    cooperate = False  # If the partner betrays five or more times in the last 100 games, cooperation will not continue until 100 games later.
        if cooperate:
            return 'C'  # If cooperation, I will cooperate.
        else:
            return him[i - 1]  # If no cooperation, I will do TFT.

    def p2(i, me, him):
        if i == 0:
            return 'C'
        elif him[i - 1] == 'C':
            return 'C'
        elif random.random() < 0.15:
            return 'C'
        else:
            return 'D'

    def p3(i, me, him):
        if i == 0:
            return 'C'
        else:
            if him[i - 1] == 'C':
                return 'C'
            else:
                return 'D'

    def p4(i, me, him):
        if i == 0:
            return 'C'
        else:
            test = 0
            for tt in range(2, i):
                if him[tt - 1] != me[tt - 2]:
                    test = 1  # not a positive repeater (does not always copy my previous move)
            if test == 0:
                return 'C'
            if test == 1:
                for t in range(2, i):
                    if him[t - 1] == me[t - 2]:
                        test = 2  # not a negative repeater
            if test == 1:
                return 'D'
            if test == 2:
                for ttt in range(1, i - 1):
                    if him[ttt] == 'D':  # not an unconditional cooperator
                        test = 3
            if test == 2:
                return 'D'
            if test == 3:
                for tttt in range(1, i - 1):
                    if him[tttt] == 'C':  # not an unconditional defector
                        test = 4
            if test == 3:
                return 'D'
            if test == 4:
                p = 0
                p = random.randint(0, 4)
                if p == 1:
                    return 'C'
                else:
                    return 'D'

    def p5(i, me, him):
        # Probability of cooperating after mutual defection: how often he played 'C'
        # in the rounds where I played 'C' (0.5 until there are more than 30 such rounds)
        def threshold(me, him):
            if len(me) == 0:
                return 0.5
            else:
                ccnt = 0
                dcnt = 0
                if 'C' in me:
                    for i in range(len(me)):
                        if me[i] == 'C':
                            if him[i] == 'C':
                                ccnt += 1
                            else:
                                dcnt += 1
                    if ccnt + dcnt > 30:
                        return ccnt / (ccnt + dcnt)
                return 0.5
        if i == 0:
            return 'C'
        else:
            if him[i - 1] == 'C' and me[i - 1] == 'C':
                return 'C'
            elif him[i - 1] == 'C' and me[i - 1] == 'D':
                return 'D'
            elif him[i - 1] == 'D' and me[i - 1] == 'D':
                a = random.random()
                if a < threshold(me, him):
                    return 'C'
                else:
                    return 'D'
            else:
                return 'D'

    def p6(i, me, him):
        if i == 0 or i == 1:
            return 'C'
        else:
            if him[i - 1] == him[i - 2] == 'D':
                return 'D'
            else:
                return 'C'

    def p7(i, me, him):
        if i == 0:
            return 'C'
        else:
            if him[i - 1] == 'C':
                return 'C'
            else:
                return 'D'

    def p8(i, me, him):
        if i == 0 or i == 1:
            return 'C'
        else:
            return him[i - 1]

    def p9(i, me, him):
        if i == 0:
            return 'C'
        else:
            if him[i - 1] == 'C':
                return 'C'
            else:
                return 'D'

    def p10(i, me, him):
        # guess his ploy
        if i > 3:
            flag = None
            cnt1_1 = 0
            cnt1_2 = 0
            cnt2_1 = 0
            cnt2_2 = 0
            for j in range(i - 1):
                if me[j - 1] == 'C':
                    cnt1_1 += 1
                    if me[j - 1] == him[j]:
                        cnt1_2 += 1
                elif him[j] == 'C':
                    cnt2_1 += 1
                    if j > 2 and max(set(me), key=me.count) == him[j]:
                        cnt2_2 += 1
            if 'D' not in him[2:]:
                flag = 'innocent'
            elif cnt1_2 / (cnt1_1 - 1) > 0.9 and cnt1_2 >= 2:
                flag = 's-retri'
            elif cnt2_1 > 0:
                if cnt2_2 / cnt2_1 > 0.9:
                    flag = 'l-retri'
        # my reaction
        if i in [0, 1, 2, 3]:
            return 'C'  # elicit his ploy
        else:
            if flag is None:
                return 'D'
            elif flag == 'innocent':
                return 'D'
            elif flag in ['s-retri', 'l-retri']:
                return 'C'

    def p11(i, me, him):
        if i <= 2:
            return 'C'  # cooperate in each of the first three rounds
        else:
            kindness = him.count('C') / len(him)  # opponent's overall friendliness so far
            if kindness > 0.5:  # fairly friendly: tit for tat, leaning toward cooperation
                if him[-1] == 'D':  # betrayed last round: defect at random, based on his last three rounds
                    betray_him = him[-3::].count('D') / 3
                    choice = random.random() + 0.1
                    if choice > betray_him:
                        return 'C'
                    else:
                        return 'D'
                else:  # cooperated last round: cooperate
                    return 'C'
            elif kindness > 0.1:  # rather unfriendly: try to cooperate, give up for good if it fails
                if him[-1] == 'D':  # tit for tat; lose trust if he has been harsher than me
                    betray_him = him[-3::].count('D') / 3
                    betray_me = me[-3::].count('D') / 3
                    if betray_me < betray_him:  # I was friendlier over the last three rounds: stop trusting him
                        return 'D'
                    else:  # I was less friendly over the last three rounds: defect at random
                        choice = random.random() + 0.1
                        if choice > betray_him:
                            return 'C'
                        else:
                            return 'D'
                if him[-1] == 'C':  # my overture was answered: cooperate based on the last two rounds
                    if me[-2] == 'C':
                        betray_him = him[-2::].count('D') / 3
                        choice = random.random() + 0.1
                        if choice > betray_him:
                            return 'C'
                        else:
                            return 'D'
                    if me[-2] == 'D':  # he cooperated on his own initiative: keep cooperating
                        return 'C'
            else:  # very unfriendly, he may be defecting blindly as the dominant strategy: defect, probe with small probability
                if me[-3] == 'C' and 'D' not in him[-2::]:  # my probe got a very friendly answer: keep cooperating
                    return 'C'
                else:
                    luck = random.random()
                    if luck > 0.1:
                        if 'C' in him[-3::]:  # some friendliness shown: cooperate with probability 1/3 to probe
                            if luck > 0.7:
                                return 'C'
                            else:
                                return 'D'
                        else:  # very unfriendly: defect
                            return 'D'
                    else:  # with probability 0.1, cooperate at random to probe
                        return 'C'

    def devil(i, me, him):
        if i == 0:
            return 'D'
        else:
            if him[i - 1] == 'C':
                return 'D'
            else:
                return 'C'


    import Peking_University.Game_theory.strategy as strategy

    strategies = [strategy.p1, strategy.p2, strategy.p3, strategy.p4, strategy.p5, strategy.p6,
                  strategy.p7, strategy.p8, strategy.p9, strategy.p10, strategy.p11, strategy.devil]
    outputF = open('./output.csv', 'w')   # for scores
    outputF2 = open('./output.txt', 'w')  # for analysis

    # Simulate an n-round repeated game between strategies i and j (1-based indices)
    def game_dual(n, i, j):
        me = []
        me_score = 0
        me_str = strategies[i-1]
        him = []
        him_score = 0
        him_str = strategies[j-1]
        for k in range(n):
            # actions
            me_react = me_str(k, me, him)
            him_react = him_str(k, him, me)
            me.append(me_react)
            him.append(him_react)
            # scores
            if me_react == 'D' and him_react == 'D':
                me_score += 1
                him_score += 1
            elif me_react == 'C' and him_react == 'D':
                me_score += 0
                him_score += 5
            elif me_react == 'D' and him_react == 'C':
                me_score += 5
                him_score += 0
            else:
                me_score += 3
                him_score += 3
        return me_score, him_score

    # Round-robin setup
    def game(n, devil=False):
        score_dict = {}
        print('Game on... (%i rounds in all)' % n)
        if devil == False:
            end = 12
        else:
            end = 13
        scores = [0 for i in range(end)]
        for i in range(1, end):
            for j in range(i+1, end):
                me_score, him_score = game_dual(n, i, j)
                score_dict[i, j] = [me_score, him_score]
                print('%i v.s. %i: %i – %i' % (i, j, me_score, him_score))
                scores[i] += me_score
                scores[j] += him_score
        outputF.write('devil = %s and n = %i\n' % (str(devil), n))
        outputF2.write('*** devil = %s and n = %i ***\n' % (str(devil), n))
        for i in range(1, end):  # total scores of each one
            outputF.write('%i,%i\n' % (i, scores[i]))
            print('%i: %i' % (i, scores[i]))
        scores_individuals = {}
        for i in range(1, end):
            scores_individuals[i] = {}  # store how many points each one gets from i
        for key, value in score_dict.items():
            for i in range(1, end):
                if i in key:
                    i_index = key.index(i)
                    j = key[1 - i_index]
                    j_score = value[1 - i_index]
                    scores_individuals[i][j] = j_score
        best_react_dict = {}
        for key, value in scores_individuals.items():
            best_reacted = [0, 0]  # [j, j_score]: j reacted best against key, with a score of j_score
            for j, j_score in value.items():
                if j_score > best_reacted[1]:
                    best_reacted = [j, j_score]
                elif j_score == best_reacted[1]:
                    best_reacted.append(j)
                    best_reacted.append(j_score)
            if best_reacted[0:2] == [0, 0]:
                del best_reacted[0:2]
            best_react_dict[key] = best_reacted
            print('* Best response(s) to %i: p%s.' % (key, best_reacted[0:-1:2]))
            outputF2.write('* Best response(s) to %i: p%s.\n' % (key, best_reacted[0:-1:2]))
        print()
        # Non-strictly dominant strategies
        dominant = [1] + best_react_dict[1][0:-1:2]
        for i in range(2, end):
            next_range = best_react_dict[i][0:-1:2] + [i]
            dominant = [x for x in dominant if x in next_range]
        print('(Non-strict) dominant strategies: ', end='')
        print(dominant)
        outputF2.write('(Non-strict) dominant strategies: ')
        if dominant == []:
            outputF2.write('none.\n')
        else:
            for item in dominant:
                outputF2.write('%i, ' % item)
            outputF2.write('\n')
        # Pairs of strategies that are best responses to each other
        reciprocal = []
        for key, value in best_react_dict.items():
            for j in range(0, len(value), 2):
                if value[j] > key:
                    for k in range(0, len(best_react_dict[value[j]]), 2):
                        if key == best_react_dict[value[j]][k]:
                            reciprocal.append([key, value[j]])
        print('Mutual best responses:', end=' ')
        print(reciprocal)
        outputF2.write('Mutual best responses: ')
        for item in reciprocal:
            a = item[0]
            b = item[1]
            outputF2.write('%i – %i | ' % (a, b))
        outputF2.write('\n')
        print('- ' * 20)
        outputF2.write('- ' * 20)
        outputF2.write('\n')
        return score_dict

    # Main program
    score_dict_whole = {0: {}, 1: {}}
    for n in [1, 10, 100, 1000]:
        print('Without devil, n = %i' % n)
        score_dict_whole[0][n] = game(n)
        print('With devil, n = %i' % n)
        score_dict_whole[1][n] = game(n, True)
    outputF.close()
    outputF2.close()
