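Requirements

Assignment requirements:
1. Generate a deck of playing cards (design the card structure yourself; the small joker and the big joker can be represented by 14 and 15 respectively).

2. Three players (you may define the players yourself)
user_list = ["zhangkai","likai","wangkai"]

3. Dealing rules
Each player is first dealt one card by default. J, Q, K, the small joker and the big joker each count as 0.5 points; every other card counts as its face value.
The player then decides whether to ask for another card.
    Yes: deal one more card. (A player may keep asking, but if the total in hand exceeds 11 points, the hand busts and its value becomes 0.)
    No: start dealing to the next player. (A player with no cards has a hand value of 0 by default.)
If the sum of all cards in a player's hand is greater than 11, the hand is bust, that player's score is 0, and dealing automatically moves on to the next player.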
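The scoring rule above can be sketched as a small helper. This is only an illustration: the `(name, points)` card structure, the `BUST_LIMIT` constant and the `hand_score` name are assumptions, not part of the assignment.

```python
from typing import List, Tuple

Card = Tuple[str, float]  # hypothetical structure: (card name, points)
BUST_LIMIT = 11           # a hand above this total is bust


def hand_score(hand: List[Card]) -> float:
    """Return the hand's total, or 0 if the total exceeds BUST_LIMIT."""
    total = sum(points for _, points in hand)
    return 0 if total > BUST_LIMIT else total
```

An empty hand scores 0, and a total of exactly 11 is still valid, because only totals greater than 11 bust.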

4. Finally, compute and collect each player's score, for example:
result = {
    "zhangkai":8,
    "likai":9,
    "wangkai":0
}

Required technique: drawing a card at random
import random

total_poke_list = [("红桃", 1), ("黑桃", 2), ......,("大王", 15), ("小王", 14)]

# Generate a random number to use as the index.
index = random.randint(0, len(total_poke_list) - 1)
# Get the card
print("Card drawn:", total_poke_list[index])
# Remove this card from the deck
total_poke_list.pop(index)

print("Cards remaining after the draw:", total_poke_list)
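A possible way to build the full deck and draw from it, assuming a `(suit, rank)` structure with ranks 1 to 13 plus the jokers as 14 and 15. The names `build_deck`, `card_value` and `draw` are hypothetical helpers for illustration.

```python
import random

SUITS = ("红桃", "黑桃", "梅花", "方块")


def build_deck():
    """Return a list of 54 (suit, rank) tuples; jokers use ranks 14 and 15."""
    deck = [(suit, rank) for suit in SUITS for rank in range(1, 14)]
    deck += [("小王", 14), ("大王", 15)]
    return deck


def card_value(rank):
    """J, Q, K (11 to 13) and the jokers (14, 15) count 0.5; others count face value."""
    return 0.5 if rank >= 11 else rank


def draw(deck, rng=random):
    """Remove and return one card chosen at a random index."""
    return deck.pop(rng.randrange(len(deck)))
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible, which helps when testing.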

# Scoring points

Please complete your code
result = {}    # stores each player's final score
user_list = ["zhangkai","likai","wangkai"]
# add your code here


print(result)

Implementation

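Note: the code below uses the cls clear-screen command, so run it in a Windows terminal. When the script is run with right-click in PyCharm, clearing the screen has no effect.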
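If the script also needs to run outside Windows, the clear command can be chosen from `os.name`. This is a minimal sketch; `clear_command` and `clear_screen` are hypothetical helper names.

```python
import os


def clear_command(os_name: str = os.name) -> str:
    """Return 'cls' on Windows (os.name == 'nt') and 'clear' elsewhere."""
    return 'cls' if os_name == 'nt' else 'clear'


def clear_screen() -> None:
    """Clear the terminal with the command that matches the current platform."""
    os.system(clear_command())
```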

import os

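poke_type = ('红桃', '黑桃', '梅花', '方块')
poke_set = set()
# The jokers and the face cards J, Q, K are worth 0.5 points each
poke_set.update({('小王', 0.5), ('大王', 0.5)})
poke_set.update({(k + i, 0.5) for k in poke_type for i in ['J', 'Q', 'K']})
# Per the rules, every other card is worth its face value: A counts as 1, then 2 to 10
poke_set.update({(k + 'A', 1) for k in poke_type})
poke_set.update({(k + str(i), i) for k in poke_type for i in range(2, 11)})
# print(poke_set)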

user_dict = {
    "zhangkai": {'poke': [poke_set.pop()], 'score': 0, 'msg': ''},
    "likai": {'poke': [poke_set.pop()], 'score': 0, 'msg': ''},
    "wangkai": {'poke': [poke_set.pop()], 'score': 0, 'msg': ''},
}
# print(user_dict)

for i in user_dict:
    while True:
        user_dict[i]['score'] = sum([i[1] for i in user_dict[i]['poke']])
        choice = input('Dear player [{}], your hand is [{}] and your score is [{}]. Another card (y) or stop (n)?\nEnter your choice: '.format(
            i,
            ' '.join([i[0] for i in user_dict[i]['poke']]),
            sum([i[1] for i in user_dict[i]['poke']])
        )).strip()
        if not choice:
            continue
        if choice.upper() == 'N':
            os.system('cls')  # clear-screen command on Windows; use clear on Linux
            break
        elif choice.upper() == "Y":

            user_dict[i]['poke'].insert(0, poke_set.pop())
            user_dict[i]['score'] = sum([i[1] for i in user_dict[i]['poke']])
            if user_dict[i]['score'] > 11:
                print('Dear player [{}], your hand is [{}] and your score is [{}], which is over 11. You are bust and out of this round!'.format(
                    i,
                    ' '.join([i[0] for i in user_dict[i]['poke']]),
                    user_dict[i]['score']
                ))
                user_dict[i]['msg'] = 'actual total [{}] is over 11, bust'.format(sum([i[1] for i in user_dict[i]['poke']]))
                user_dict[i]['score'] = 0
                break
        else:
            print(f'You entered [{choice}], which is not supported. Please try again.')
print('All players are done, calculating scores.....')
for i in user_dict:
    print('Dear player [{}], your final score is [{}]'.format(i, user_dict[i]['score']), user_dict[i]['msg'])

# Work out the winner
winner_user = max(user_dict, key=lambda x: user_dict[x]['score'])
print("The winner is: {}, with a score of: {}".format(winner_user, user_dict[winner_user]['score']))

Corrections are welcome. That's all.

Python basics
Functions
Modules and packages
Exception handling
Object-oriented programming
Others

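Python basics

Briefly describe the variable naming rules
- A variable name consists of letters, digits, and underscores
- A variable name cannot start with a digit
- Python keywords cannot be used as variable names
- Variable names are case-sensitive
- Avoid Chinese characters and pinyin in variable names (Python 3 accepts them, but they hurt readability)
- Variable names should be meaningful
- Recommended styles: camel case (UserName or userName) or underscores (user_name)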
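The syntactic rules in this list can be checked with the standard library. The sketch below covers only what the interpreter enforces (allowed characters, no leading digit, no keywords); the style recommendations are not checked. `is_valid_name` is a hypothetical helper name.

```python
import keyword


def is_valid_name(name: str) -> bool:
    """True if name is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["user_name", "UserName", "2name", "class", "user-name"]:
    print(candidate, is_valid_name(candidate))
```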

For name = input(">>>"), what is the data type of the variable name?

name = input(">>")
print(type(name))   # str

Basic structures of the if statement

if
if/if
if/else
if/elif
if/elif/elif
if/elif/.../else
nested if

Use print to output the following:

文能提笔安天下,
武能上马定乾坤.
心存谋略何人胜,
古今英雄唯是君.
print('''
文能提笔安天下,
武能上马定乾坤.
心存谋略何人胜,
古今英雄唯是君.
''')
print("文能提笔安天下,\n武能上马定乾坤.\n心存谋略何人胜,\n古今英雄唯是君.")

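Use if statements to write a number-guessing game: pick a target number, say 66, and let the user enter a number. If it is greater than 66, report that the guess is too high; if it is less than 66, report that the guess is too low; only when it equals 66, report that the guess is correct:

a = int(input('Enter a number: '))
if a > 66:
    print('Too high')
elif a < 66:
    print('Too low')
else:
    print('Correct')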

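Ask the user for their age and respond as follows. Under 10: a little kid. Over 10 and under 20: a rebellious teenage kid. Over 20 and under 30: a kid starting to settle down and make a way in society. Over 30 and under 40: not so young any more, hurry up and get married, kid. Over 40 and under 50: there is a disobedient little kid at home. Over 50 and under 60: about to become a disobedient old kid yourself. Over 60 and under 70: an old kid who is still living pretty well. Over 70 and under 90: an old kid whose life is nearly over. Over 90: goodbye, world.

s = input('Enter your age: ')
s1 = int(s)
# Each branch only needs an upper bound, so boundary ages such as 10 or 20 are not skipped
if s1 < 10:
    print('A little kid')
elif s1 < 20:
    print('A rebellious teenage kid')
elif s1 < 30:
    print('A kid starting to settle down and make a way in society')
elif s1 < 40:
    print('Not so young any more, hurry up and get married, kid')
elif s1 < 50:
    print('There is a disobedient little kid at home')
elif s1 < 60:
    print('About to become a disobedient old kid yourself')
elif s1 < 70:
    print('An old kid who is still living pretty well')
elif s1 < 90:
    print('An old kid whose life is nearly over')
else:
    print('Goodbye, world')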

Determine the output of the following print statements:

print(1 > 1 or 3 < 4 or 4 > 5 and 2 > 1 and 9 > 8 or 7 < 6)  # True
print(not 2 > 1 and 3 < 4 or 4 > 5 and 2 > 1 and 9 > 8 or 7 < 6)  # False
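Both results follow from operator precedence: comparisons bind tightest, then `not`, then `and`, then `or`. The sketch below writes out the implicit grouping with parentheses so the evaluation order is visible; the variable names are just for illustration.

```python
# First expression: only the "and" chain needs grouping
a = 1 > 1 or 3 < 4 or 4 > 5 and 2 > 1 and 9 > 8 or 7 < 6
a_explicit = (1 > 1) or (3 < 4) or ((4 > 5) and (2 > 1) and (9 > 8)) or (7 < 6)

# Second expression: "not" applies to (2 > 1) before "and" is evaluated
b = not 2 > 1 and 3 < 4 or 4 > 5 and 2 > 1 and 9 > 8 or 7 < 6
b_explicit = ((not (2 > 1)) and (3 < 4)) or ((4 > 5) and (2 > 1) and (9 > 8)) or (7 < 6)

print(a, a_explicit)
print(b, b_explicit)
```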

Print a diamond of stars

"""
     *
    ***
   *****
  *******
 *********
***********
 *********
  *******
   *****
    ***
     *
"""
count = 1
range_num = 11  # width of the widest row, which is also the number of rows
for i in range(1, range_num + 1):
    print(("*" * count).center(range_num, ' '))
    # Add two stars per row until the middle row, then remove two per row
    if i < (range_num + 1) / 2:
        count += 2
    else:
        count -= 2

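Three-level menu

Requirements

Basic requirements (80%):

The user can enter each submenu in turn
The user can go back to the previous level from any level
The user can exit the program from any level
New knowledge required: lists, dictionaries

Advanced requirements (10%):

Use a single while loop and keep the whole program within 15 lines

Coding-style requirements (10%)

Approach:
Look up the dictionary and loop over its keys
A while loop plus if checks
A for loop to print the results

Basic version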


menu = {
    '北京':{
        '海淀':{
            '五道口':{
                'soho':{},
                '网易':{},
                'google':{}
            },
            '中关村':{
                '爱奇艺':{},
                '汽车之家':{},
                'youku':{},
            },
            '上地':{
                '百度':{},
            },
        },
        '昌平':{
            '沙河':{
                '老男孩':{},
                '北航':{},
            },
            '天通苑':{},
            '回龙观':{},
        },
        '朝阳':{},
        '东城':{},
    },
    '上海':{
        '闵行':{
            "人民广场":{
                '炸鸡店':{}
            }
        },
        '闸北':{
            '火车站':{
                '携程':{}
            }
        },
        '浦东':{},
    },
    '山东':{},
}


exit_flag = False
while not exit_flag:
    for key in menu:
        print(key)

    choice = input(">:").strip()
    if len(choice) == 0 : continue
    if choice == 'q':
        exit_flag = True
        continue
    if choice in menu: # the province exists, enter its next level
        while not exit_flag:
            next_layer = menu[choice]
            for key2 in next_layer:
                print(key2)
            choice2 = input(">>:").strip()
            if len(choice2) == 0: continue
            if choice2 == 'b': break
            if choice2 == 'q':
                exit_flag = True
                continue
            if choice2 in next_layer: # enter the next level down
                while not exit_flag:
                    next_layer2 = next_layer[choice2]
                    for key3 in next_layer2:
                        print(key3)
                    choice3 = input(">>>:").strip()
                    if len(choice3) == 0: continue
                    if choice3 == 'b': break
                    if choice3 == 'q':
                        exit_flag = True
                        continue

                    if choice3 in next_layer2:
                        while not exit_flag:
                            next_layer3 = next_layer2[choice3]
                            for key4 in next_layer3:
                                print(key4)

                            choice4 = input(">>>>:").strip()
                            if choice4 == 'b':break
                            if choice4 == 'q':
                                exit_flag = True
                                continue

Advanced version

menu = {
    '北京':{
        '海淀':{
            '五道口':{
                'soho':{},
                '网易':{},
                'google':{}
            },
            '中关村':{
                '爱奇艺':{},
                '汽车之家':{},
                'youku':{},
            },
            '上地':{
                '百度':{},
            },
        },
        '昌平':{
            '沙河':{
                '老男孩':{},
                '北航':{},
            },
            '天通苑':{},
            '回龙观':{},
        },
        '朝阳':{},
        '东城':{},
    },
    '上海':{
        '闵行':{
            "人民广场":{
                '炸鸡店':{}
            }
        },
        '闸北':{
            '火车站':{
                '携程':{}
            }
        },
        '浦东':{},
    },
    '山东':{},
}
last_layers = [menu]  # stack of parent layers

current_layer = menu  # the layer currently displayed

while True:
    for key in current_layer:
        print(key)
    choice = input(">>:").strip()
    if len(choice) == 0: continue
    if choice in current_layer:  # enter the next layer
        last_layers.append(current_layer)  # push the current layer onto the stack
        current_layer = current_layer[choice]  # e.g. 北京
    if choice == "b":
        if last_layers:
            current_layer = last_layers[-1]  # take the parent layer and make it the current layer
            last_layers.pop()
    if choice == 'q':
        break

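Functions

Write an authentication decorator that adds a username and password check to several functions. The user should only have to log in successfully once; after that, the remaining functions are authenticated automatically:

"""
Authentication decorator
"""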

certificate_dict = {}

def cert(func):
    def wrapper(*args, **kwargs):
        if certificate_dict.get('status', False):
            print('[{}] logged in automatically'.format(certificate_dict['user_name']))
            return func(*args, **kwargs)
        else:
            user, pwd = input("user: ").strip(), input("pwd: ").strip()
            if user.lower() == 'zhangkai' and pwd == '123':  # the username and password could be stored in a file instead
                print("[{}] logged in".format(user))
                certificate_dict['user_name'] = user
                certificate_dict['status'] = True
                return func(*args, **kwargs)
            # Without this message a failed login would skip the function silently
            print('login failed, [{}] was not executed'.format(func.__name__))

    return wrapper

@cert
def login():
    print('login function')

@cert
def index():
    print('index page')

@cert
def back():
    print('back page')

if __name__ == '__main__':
    login()
    index()
    back()

Compute the sum of factorials 1! + 2! + 3! + ... + 10!:

def foo1(n):
    num, count = 1, 0
    for i in range(1, n + 1):
        num *= i
        count += num

    return count


def foo2(n):
    count = 0
    for i in range(1, n + 1):
        tmp = 1
        for k in range(1, i + 1):
            tmp *= k
        count += tmp

    return count


if __name__ == '__main__':
    print(foo1(10))  # 4037913
    print(foo2(10))  # 4037913

Use functions to implement a login with three attempts. Usernames and passwords are stored in info.txt, which holds several username and password pairs, one pair per line. After three failed logins the program exits:

"""
# info.txt

zhangkai|123
likai|234
wangkai|345
"""

PATH = r'./info.txt'

def read_file(path):
    with open(path, 'r', encoding='utf-8') as f:
        l = []
        for i in f:
            if not i.strip():  # skip blank lines
                continue
            name, pwd = i.strip().split("|")
            l.append({"user": name, "pwd": pwd})
        return l

def login():
    user_list = read_file(PATH)
    count = 1
    while count <= 3:
        user = input('user: ').strip()
        pwd = input('pwd: ').strip()
        for i in user_list:
            if user == i['user'] and pwd == i['pwd']:
                print('login successful')
                return True  # stop prompting once the login succeeds
        else:
            print('login error')
            count += 1
    return False

if __name__ == '__main__':
    login()

Recursively count the number of files under a given directory:

import os

def foo(path):
    if os.path.isdir(path):
        total = 0
        for line in os.listdir(path):
            tmp_path = os.path.join(path, line)
            if os.path.isdir(tmp_path):
                total += foo(tmp_path)
            else:
                total += 1
        return total
    else:
        exit('给定的路径不是目录')

if __name__ == '__main__':
    print(foo(r"D:\tmp"))

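Use the relevant modules to generate a 6-character verification code made up of random digits, random lowercase letters and random uppercase letters:

import random
import string

# Method 1
def code(n=6):
    end = ''
    for i in range(n):
        num = str(random.randint(0,9))
        alpha_up = chr(random.randint(65,90))
        alpha_low = chr(random.randint(97,122))
        aim = random.choice([num,alpha_up,alpha_low])
        end += aim
    return end
print(code(8)) # generates a code with as many characters as requested

# Method 2
def foo(n):
    # random.choices samples with replacement, so characters may repeat and n may exceed 62
    return ''.join(random.choices(string.digits + string.ascii_lowercase + string.ascii_uppercase, k=n))

print(foo(6))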

Use map to process a list so that every name becomes xx_666, e.g. 张开_666; name = ["张开", "李开", "王开", "赵开"]:

name = ["张开", "李开", "王开", "赵开"]
print(list(map(lambda x: x + '_666', name)))
"""
['张开_666', '李开_666', '王开_666', '赵开_666']
"""

Use map to process a list so that each person's name becomes xx_666, e.g. 张开_666; tmp_list = [{'name': '张开'}, {'name': '李开'}, {'name': '王开'}, {'name': '赵开'}]:

tmp_list = [{'name': '张开'}, {'name': '李开'}, {'name': '王开'}, {'name': '赵开'}]
print(list(map(lambda x: x['name'] + '_666', tmp_list)))
"""
['张开_666', '李开_666', '王开_666', '赵开_666']
"""

Sort the elements of the list below by age in ascending order:

tmp_list = [
    {'name': '张开', 'age': 18}, {'name': '李开', 'age': 8},
    {'name': '王开', 'age': 32}, {'name': '赵开', 'age': 25}
]
tmp_list.sort(key=lambda x: x['age'], reverse=False)
print(tmp_list)
"""
[{'name': '李开', 'age': 8}, {'name': '张开', 'age': 18}, {'name': '赵开', 'age': 25}, {'name': '王开', 'age': 32}]
"""

File handling: use Python to make a copy of an image; for example, given a.jpg, produce the copy b.jpg:

with open('a.jpg', 'rb') as rf:
    with open('b.jpg', 'wb') as wf:
        wf.write(rf.read())

Implement a decorator that measures a function's execution time

import time
import random

def timmer(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        res = func(*args, **kwargs)
        print('{} running: {}'.format(func.__name__, time.time() - start))
        return res

    return wrapper

@timmer
def foo():
    time.sleep(random.random())

foo()

Write a function that takes two numeric arguments and returns the smaller one:

def foo(x, y):
    return y if x > y else x

print(foo(3, 6))
print(foo(6, 3))

Modules and packages

Get the current time as a string (2020-12-24 11:06:14), and the time three days later as a string (2020-12-27 11:06:14):

import time

# Number of seconds in one day
one_day_time = 24 * 60 * 60

# Current timestamp --> struct_time --> formatted string
now_timestamp_time = time.time()
now_struct_time = time.localtime(now_timestamp_time)
now_format_time = time.strftime('%Y-%m-%d %H:%M:%S', now_struct_time)
print(now_format_time)  # 2020-12-24 11:06:14

# Time three days later: current timestamp + three days in seconds --> struct_time --> formatted string
after_three_days_timestamp_time = now_timestamp_time + one_day_time * 3
after_three_days_struct_time = time.localtime(after_three_days_timestamp_time)
after_three_days_format_time = time.strftime('%Y-%m-%d %H:%M:%S', after_three_days_struct_time)
print(after_three_days_format_time)  # 2020-12-27 11:06:14

import datetime

after_three_days_struct_time = datetime.datetime.now() + datetime.timedelta(days=3)
after_three_days_format_time = after_three_days_struct_time.strftime("%Y-%m-%d %H:%M:%S")
print(after_three_days_format_time)

Convert the time '2018-11-11 11:11:11' to a timestamp

import time
'''
Approach: formatted string --> struct_time --> timestamp
'''
t = "2018-11-11 11:11:11"
strpt_time = time.strptime(t, '%Y-%m-%d %H:%M:%S')
print(time.mktime(strpt_time))  # 1541905871.0 when the local time zone is UTC+8

How to get the absolute path of the current script and its parent directory:

import os
print(os.path.abspath(__file__))
print(os.path.dirname(os.path.abspath(__file__)))

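Answer:

# What is an iterable? What is an iterator? What is the difference between them? What is a generator, and how do you get one?
# An object that has an __iter__ method is called an iterable
# The result returned by calling an iterable's __iter__ method is called an iterator
# An iterable only needs an __iter__ method, whereas an iterator has both __iter__ and __next__.
# A function whose body contains the yield keyword is called a generator function, and the result of calling it (the returned generator_obj) is a generator
# https://www.cnblogs.com/Neeo/articles/13200309.html
# https://www.cnblogs.com/Neeo/articles/13200313.html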
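The distinctions above can be checked with the abstract base classes in `collections.abc`. This is a small illustration; `countdown` is a hypothetical generator function.

```python
from collections.abc import Iterable, Iterator

nums = [1, 2, 3]   # a list is iterable but is not itself an iterator
it = iter(nums)    # calling iter() (that is, __iter__) returns an iterator


def countdown(n):
    """Generator function: the yield keyword makes calls return a generator."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)  # a generator is also an iterator

print(isinstance(nums, Iterable), isinstance(nums, Iterator))
print(isinstance(it, Iterator), iter(it) is it)
print(isinstance(gen, Iterator), list(gen))
```

An iterator's `__iter__` returns the iterator itself, which is why `iter(it) is it` holds.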

Write a function that removes duplicates from a list (without using set), tmp_list = [1, 2, 2, 1, 3, 4, 5, 6]:

def foo(l):
    tmp_list = []
    for i in l:
        if i not in tmp_list:
            tmp_list.append(i)
    return tmp_list

print(foo([1, 2, 2, 1, 3, 4, 5, 6]))  # [1, 2, 3, 4, 5, 6]

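Exception handling

Object-oriented programming

Others

Enumerate all possibilities, also known as finding all subsets (the power set) of a set: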

def PowerSetsBinary(items):
    #generate all combination of N items
    N = len(items)
    #enumerate the 2**N possible combinations
    for i in range(2**N):
        combo = []
        for j in range(N):
            #test jth bit of integer i
            if(i >> j ) % 2 == 1:
                combo.append(items[j])
        yield combo
for i in PowerSetsBinary('123'):
    print(i)

'''
[]
['1']
['2']
['1', '2']
['3']
['1', '3']
['2', '3']
['1', '2', '3']
'''
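As an alternative to the bit-mask approach, the same subsets can be produced with `itertools.combinations`, grouped by subset size rather than in binary-counting order. `power_set` is a hypothetical name; this is a sketch, not a replacement for the version above.

```python
from itertools import chain, combinations


def power_set(items):
    """Yield every subset of items as a tuple, from size 0 up to len(items)."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )


for subset in power_set('123'):
    print(list(subset))
```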

Generate a directory tree in Python: implement something similar to the Windows tree command to print a directory's structure:

import os
import os.path

BRANCH = '├─'
LAST_BRANCH = '└─'
TAB = '│  '
EMPTY_TAB = '   '


def get_dir_list(path, placeholder=''):
    folder_list = [folder for folder in os.listdir(path) if os.path.isdir(os.path.join(path, folder))]
    file_list = [file for file in os.listdir(path) if os.path.isfile(os.path.join(path, file))]
    result = ''
    for folder in folder_list[:-1]:
        result += placeholder + BRANCH + folder + '\n'
        result += get_dir_list(os.path.join(path, folder), placeholder + TAB)
    if folder_list:
        result += placeholder + (BRANCH if file_list else LAST_BRANCH) + folder_list[-1] + '\n'
        result += get_dir_list(os.path.join(path, folder_list[-1]), placeholder + (TAB if file_list else EMPTY_TAB))
    for file in file_list[:-1]:
        result += placeholder + BRANCH + file + '\n'
    if file_list:
        result += placeholder + LAST_BRANCH + file_list[-1] + '\n'
    return result


if __name__ == '__main__':
    print(os.path.dirname(os.getcwd()))
    print(get_dir_list(os.path.dirname(os.getcwd())))

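Print the Fibonacci sequence. The Fibonacci sequence is 0, 1, 1, 2, 3, 5, 8, 13, ... Note that item 0 is 0 and item 1 is the first 1; from the third number onward, each item is the sum of the two before it.

Code examples

# 1. Build the sequence with a for loop from user input, e.g. entering 5 gives [0, 1, 1, 2, 3]
num = int(input('>>>:'))
fibonacci = [0, 1]
for i in range(num - 2):
    fibonacci.append(fibonacci[-2] + fibonacci[-1])
print(fibonacci[:num])  # slicing keeps the result correct when num is 0 or 1

# 2. As a function
def fibonacci(num):
    result = [0, 1]
    for i in range(num-2):
        result.append(result[-2] + result[-1])
    return result[:num]
print(fibonacci(11))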

# 3. Recursive version
def r_fibonacci(num):
    """Recursive version"""
    if num < 2:
        return num
    return r_fibonacci(num - 1) + r_fibonacci(num - 2)
 
# print(r_fibonacci(10))
for i in range(11):
    print(r_fibonacci(i))


# 4. lambda version (same indexing as above: item 0 is 0, item 1 is 1)
fibonacci = lambda n: n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)
print(fibonacci(35))

# 5. Iterator class version
class Fib(object):
    def __init__(self):
        self.prev = 0
        self.curr = 1

    def __iter__(self):
        return self

    def __next__(self):
        # Return the current item first so the sequence starts at 0
        value = self.prev
        self.prev, self.curr = self.curr, self.prev + self.curr
        return value

fib = Fib()
for i in range(10):
    print(next(fib))

# 6. Generator (yield) version
def fib():
    prev, curr = 0, 1
    while True:
        yield prev
        prev, curr = curr, prev + curr

f = fib()
for i in range(10):
    print(next(f))


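Requirements
Requirements analysis

Requirements

Python 3.6.8
The online regex tester from 站长之家 (Chinaz)

Basic requirements

Count the total PV and UV of the log file
List the PV and UV for each hour of the day
List the top 10 UV IP addresses, along with the PV (hit) count of each IP
List the 10 most visited pages and the number of visits to each page
List the devices that visits came from and the number of visits per device

Notes

PV: page view, the number of page requests; one request counts as one PV
UV: unique visitor, an independent user; one IP counts as one unique user

Note: log lines without an IP are treated as abnormal here and are excluded from the statistics.

Log file attachment
Baidu Cloud link: https://pan.baidu.com/s/1IFrl1eDjCg8FP8eS86TfOA extraction code: 0ada

Requirements analysis

First, understand what a log line means: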

# Normal log line
27.10.109.31 - - [15/Apr/2019:00:45:23 +0800] "GET /api/v1/enroll/degrees/ HTTP/1.1" 200 270 "https://www.luffycity.com/study/degree" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.3 Safari/605.1.15"
# Client IP: 27.10.109.31
# Access time: 15/Apr/2019:00:45:23 +0800
# Requested URL: /api/v1/enroll/degrees/
# Client device: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.3 Safari/605.1.15

# Abnormal log lines
ee/2/ HTTP/1.1" 200 1055 "https://www.luffycity.com/study/degree" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36"
- - - [15/Apr/2019:00:32:54 +0800] "\x16\x03\x01\x00\x9A\x01\x00\x00\x96\x03\x03g\xAE%\xCF\xCA\xC8v\x191\x90\xAA\xAD9\xBC\xFE\x9AJ]\xFC\xB8\xB4\x83\xF6\xF7\xB3+\xA3<AG\xF3\xAE\x00\x00\x1A\xC0/\xC0+\xC0\x11\xC0\x07\xC0\x13\xC0\x09\xC0\x14\xC0" 400 166 "-" "-"
- - - [15/Apr/2019:01:05:45 +0800] "HEAD / HTTP/1.1" 499 0 "-" "-"

First, exclude the abnormal log lines.
Then implement each requirement in turn.
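One way to separate normal lines from abnormal ones is to parse each line with a single pattern and treat any line that does not match as abnormal. The sketch below assumes the field layout of the sample line shown above; `LINE_RE` and `parse_line` are hypothetical names, and the IP part is deliberately loose (it does not range-check each octet).

```python
import re
from datetime import datetime

# Assumed layout: ip - - [time] "METHOD url protocol" status size "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for a normal line, or None for an abnormal one."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # %z parses the "+0800" offset, giving a timezone-aware datetime
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record
```

Lines that start with `-` or with a truncated request do not match the leading IP group, so `parse_line` returns None for them.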

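Count the total PV and UV of the log file

-- Loop over each normal log line and use a regex to find the IP on the current line
      - PV = the number of IPs, counting repeats
      - UV = the number of distinct IPs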
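In code, the two definitions reduce to the length of the IP list and the length of its set. A minimal sketch, assuming the IPs have already been extracted from the normal lines; `count_pv_uv` is a hypothetical name.

```python
def count_pv_uv(ips):
    """Return (pv, uv): total number of IPs and number of distinct IPs."""
    ips = list(ips)
    return len(ips), len(set(ips))


print(count_pv_uv(["1.1.1.1", "2.2.2.2", "1.1.1.1"]))
```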

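List the PV and UV for each hour of the day

-- Loop over each normal log line and use a regex to find the IP on the current line
-- Split by hour, then count the PV and UV for each hour
      - PV = the number of IPs, counting repeats
      - UV = the number of distinct IPs
-- Difficulties:
      - How to turn the date and time read from the log into a usable time format (a formatted string or a timestamp)
      - How to organize the data structure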
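Both difficulties can be handled by parsing the raw time with `datetime.strptime` and using the formatted hour as a dictionary key. The sketch below assumes the input is an iterable of `(ip, raw_time)` pairs with the time in the log's `15/Apr/2019:00:45:23 +0800` format; `hourly_pv_uv` is a hypothetical name, and hours with no traffic are simply absent from the result.

```python
from collections import defaultdict
from datetime import datetime


def hourly_pv_uv(records):
    """Map 'YYYY-MM-DD HH' to (pv, uv) for an iterable of (ip, raw_time) pairs."""
    pv = defaultdict(int)   # hour key -> number of requests
    uv = defaultdict(set)   # hour key -> set of distinct IPs
    for ip, raw_time in records:
        parsed = datetime.strptime(raw_time, "%d/%b/%Y:%H:%M:%S %z")
        hour_key = parsed.strftime("%Y-%m-%d %H")
        pv[hour_key] += 1
        uv[hour_key].add(ip)
    return {hour: (pv[hour], len(uv[hour])) for hour in sorted(pv)}
```

Keeping a counter for PV and a set for UV avoids storing every IP twice per hour.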

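List the top 10 UV IP addresses, along with the PV (hit) count of each IP

-- Loop over each normal log line and use a regex to find the IP on the current line
      - Read all the IPs in a loop and take each one in turn
            - Record each distinct IP and count how many times it occurs
      - Finally sort in descending order and take the top 10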
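The count, sort and slice steps can also be done with `collections.Counter`, whose `most_common` method returns the highest counts first. A short sketch; `top_n` is a hypothetical name, and the same helper works for pages or devices.

```python
from collections import Counter


def top_n(values, n=10):
    """Return the n most frequent values as (value, count) pairs, highest first."""
    return Counter(values).most_common(n)


print(top_n(["a", "b", "a", "c", "a", "b"], 2))
```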

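List the 10 most visited pages and the number of visits to each page

-- Loop over each normal log line and use a regex to find the page on the current line
      - Read all the pages in a loop and take each one in turn
            - Record each distinct page and count how many times it occurs
      - Finally sort in descending order and take the top 10 pages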

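List the devices that visits came from and the number of visits per device

-- Loop over each normal log line and use a regex to find the device on the current line
      - Record each distinct device and count how many times it occurs
      - Although the requirement is to list all devices, my code only shows the first 20 rows
-- Note that there are many kinds of devices:
      - Mozilla
      - Mozilla/5.0  Mozilla/4.0  Mozilla/3.0
      - Python-urllib/2.7
      - Go-http-client/1.1
      - -                        # yes, just a single -, which I take to be an unknown device; my code does not handle it specially
      - curl/7.19.7
      - Sogou web
      - Xiaomi_MCT1_TD-LTE/V1

Example code: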

import datetime
import re
from prettytable import PrettyTable

LOG_INFO = './网站访问日志.log'

# ----- Regular expressions (raw strings avoid invalid-escape warnings) -----
MATCH_STARTSWITH = MATCH_IP = re.compile(
    r"(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]?\d)\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]?\d)\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]?\d)\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]?\d)")
MATCH_DATETIME = re.compile(
    r"(?P<ip>((25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]?\d)\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]?\d)\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]?\d)\.(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]?\d))).*?\[(?P<dt>.*?)\s\+0800\]")
MATCH_EQUIPMENT = re.compile(r'"\s"(?P<equipment>.*?)"')
MATCH_PAGE = re.compile(r'"[A-Z]{3,}\s(?P<page>.*?)\sHTTP/1\.1')

FILE_LIST = []


def init():
    """ 过滤符合条件的日志，存储在列表中 """

    with open(LOG_INFO, 'r', encoding='UTF-8') as f:
        for line in f:
            res = MATCH_STARTSWITH.match(line)
            if res:
                FILE_LIST.append(line)


def total_pv_uv():
    """ 统计本日志文件的总pv、uv """
    tmp_dict = {"pv": [], "uv": set(), }
    print('waiting.....')
    for i in FILE_LIST:
        res = MATCH_IP.search(i)
        if res:
            res = res.group()
            tmp_dict['pv'].append(res)
            tmp_dict['uv'].add(res)
    table = PrettyTable(['总PV', '总UV'])
    table.add_row([len(tmp_dict['pv']), len(tmp_dict['uv'])])
    print(table)


def total_24hour_pv_uv():
    """ 列出全天每小时的pv、uv数 """
    tmp_dict = {}
    print("waiting.....")
    for i in FILE_LIST:
        res = MATCH_DATETIME.search(i)
        if res:
            ip, dt = res.group("ip"), res.group("dt")
            date_time_str = datetime.datetime.strptime(dt, '%d/%b/%Y:%H:%M:%S')
            date_time_day = date_time_str.strftime("%Y-%m-%d")
            date_time_hour = date_time_str.strftime("%Y-%m-%d %H")
            if not tmp_dict.get(date_time_day, False):
                tmp_dict[date_time_day] = {}
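            # Create the 24 hourly buckets for this day
            for hour in range(0, 24):
                # zfill is needed to left-pad single digits with 0; otherwise the key would look like 2019-04-14 1, which does not match the date_time_hour key 2019-04-14 01 and causes errors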
                tmp_key = "{} ".format(date_time_day) + "{}".format(hour).zfill(2)
                if tmp_key not in tmp_dict[date_time_day]:
                    tmp_dict[date_time_day][tmp_key] = {"pv": [], "uv": set()}
            tmp_dict[date_time_day][date_time_hour]['uv'].add(ip)
            tmp_dict[date_time_day][date_time_hour]['pv'].append(ip)
    # print(tmp_dict)
    for k in tmp_dict:
        print('[{}]日的每小时的pv和uv数统计如下: '.format(k))
        table = PrettyTable(["时间", '每个小时的pv数', '每个小时的uv数'])
        for j, v in tmp_dict[k].items():
            table.add_row([j, len(v['pv']), len(v['uv'])])
        print(table)


def total_top10_uv():
    """ 列出top 10 uv的IP地址，以及每个ip的pv点击数 """
    tmp_dict = {}
    print("waiting.....")
    for i in FILE_LIST:
        res = MATCH_IP.search(i)
        if res:
            res = res.group()
            if res in tmp_dict:
                tmp_dict[res] += 1
            else:
                tmp_dict[res] = 1
    table = PrettyTable(['top10 uv的IP', 'top10 uv的ip的pv'])
    table.align['top10 uv的IP'] = 'l'
    tmp_list = sorted(tmp_dict.items(), key=lambda x: x[1], reverse=True)[
               0:10]  # tmp_dict.items()  --> ("113.89.97.191", 13)
    for n1, n2 in tmp_list:
        table.add_row([n1, n2])
    print(table)


def total_top10_page():
    """ 列出top 10 访问量最多的页面及每个页面的访问量 """
    print("waiting.....")
    tmp_dict = {}
    for i in FILE_LIST:
        res = MATCH_PAGE.search(i)
        if res:
            page = res.group("page")
            if page in tmp_dict:
                tmp_dict[page].append(page)
            else:
                tmp_dict[page] = [page]
    table = PrettyTable(['访问量是top10的页面URL', '访问量'])
    table.align['访问量是top10的页面URL'] = 'l'
    tmp_list = sorted(tmp_dict.items(), key=lambda x: len(x[1]), reverse=True)[0:10]
    for n1, n2 in tmp_list:
        table.add_row([n1, len(n2)])
    print(table)


def total_equipment_list():
    """ 列出访问来源的设备列表及每个设备的访问量 """
    print("waiting.....")
    tmp_dict = {}
    for i in FILE_LIST:
        res = MATCH_EQUIPMENT.search(i)
        if res:
            equipment = res.group("equipment")
            if equipment in tmp_dict:
                tmp_dict[equipment].append(equipment)
            else:
                tmp_dict[equipment] = [equipment]
    table = PrettyTable(['设备来源', '访问量'])
    table.align['设备来源'] = 'l'
    # Only the first 20 rows are shown here; remove the slice to show everything
    # tmp_list = sorted(tmp_dict.items(), key=lambda x: len(x[1]), reverse=True)
    tmp_list = sorted(tmp_dict.items(), key=lambda x: len(x[1]), reverse=True)[0:20]
    for n1, n2 in tmp_list:
        table.add_row([n1, len(n2)])
    print(table)


def q():
    """ 退出 """
    exit('再来呦')


def handler():
    tmp_dict = {
        "1": ["统计本日志文件的总pv、uv", total_pv_uv],
        "2": ["列出全天每小时的pv、uv数", total_24hour_pv_uv],
        "3": ["列出top 10 uv的IP地址，以及每个ip的pv点击数", total_top10_uv],
        "4": ["列出top 10 访问量最多的页面及每个页面的访问量", total_top10_page],
        "5": ["列出访问来源的设备列表及每个设备的访问量", total_equipment_list],
        "6": ["退出", q]
    }
    while True:
        print('欢迎使用网站访问数据分析系统'.center(40, '*'))
        for k, v in tmp_dict.items():
            print(k, v[0])
        cmd = input("输入序号选择对应的操作: ").strip()
        if cmd in tmp_dict:
            tmp_dict[cmd][-1]()
        else:
            print('输入不合法')


if __name__ == "__main__":
    init()
    handler()

Demo output. Note that because of differences in the original log file contents, code logic and regex rules, your results may differ.

*************欢迎使用网站访问数据分析系统*************
1 统计本日志文件的总pv、uv
2 列出全天每小时的pv、uv数
3 列出top 10 uv的IP地址，以及每个ip的pv点击数
4 列出top 10 访问量最多的页面及每个页面的访问量
5 列出访问来源的设备列表及每个设备的访问量
6 退出
输入序号选择对应的操作: 1
waiting.....
+-------+------+
|  总PV | 总UV |
+-------+------+
| 31288 | 1683 |
+-------+------+
*************欢迎使用网站访问数据分析系统*************
1 统计本日志文件的总pv、uv
2 列出全天每小时的pv、uv数
3 列出top 10 uv的IP地址，以及每个ip的pv点击数
4 列出top 10 访问量最多的页面及每个页面的访问量
5 列出访问来源的设备列表及每个设备的访问量
6 退出
输入序号选择对应的操作: 2
waiting.....
[2019-04-15]日的每小时的pv和uv数统计如下: 
+---------------+----------------+----------------+
|      时间     | 每个小时的pv数 | 每个小时的uv数 |
+---------------+----------------+----------------+
| 2019-04-15 00 |      397       |       49       |
| 2019-04-15 01 |      102       |       23       |
| 2019-04-15 02 |       38       |       10       |
| 2019-04-15 03 |       48       |       16       |
| 2019-04-15 04 |       37       |       15       |
| 2019-04-15 05 |       17       |       10       |
| 2019-04-15 06 |      180       |       14       |
| 2019-04-15 07 |      305       |       39       |
| 2019-04-15 08 |      978       |      109       |
| 2019-04-15 09 |      2329      |      170       |
| 2019-04-15 10 |      2317      |      202       |
| 2019-04-15 11 |      2111      |      163       |
| 2019-04-15 12 |      1148      |      122       |
| 2019-04-15 13 |      1585      |      185       |
| 2019-04-15 14 |      2376      |      259       |
| 2019-04-15 15 |      2555      |      215       |
| 2019-04-15 16 |      2047      |      210       |
| 2019-04-15 17 |      2394      |      212       |
| 2019-04-15 18 |      1493      |      138       |
| 2019-04-15 19 |      1593      |      165       |
| 2019-04-15 20 |      2016      |      191       |
| 2019-04-15 21 |      2141      |      205       |
| 2019-04-15 22 |      1888      |      201       |
| 2019-04-15 23 |      1193      |      141       |
+---------------+----------------+----------------+
*************欢迎使用网站访问数据分析系统*************
1 统计本日志文件的总pv、uv
2 列出全天每小时的pv、uv数
3 列出top 10 uv的IP地址，以及每个ip的pv点击数
4 列出top 10 访问量最多的页面及每个页面的访问量
5 列出访问来源的设备列表及每个设备的访问量
6 退出
输入序号选择对应的操作: 3
waiting.....
+-----------------+------------------+
| top10 uv的IP    | top10 uv的ip的pv |
+-----------------+------------------+
| 221.218.214.8   |       4018       |
| 122.71.67.110   |       855        |
| 118.113.14.162  |       357        |
| 47.95.112.89    |       299        |
| 113.246.241.131 |       244        |
| 117.25.109.180  |       219        |
| 106.44.6.54     |       209        |
| 116.18.244.11   |       203        |
| 58.45.45.183    |       198        |
| 60.247.104.68   |       195        |
+-----------------+------------------+
*************欢迎使用网站访问数据分析系统*************
1 统计本日志文件的总pv、uv
2 列出全天每小时的pv、uv数
3 列出top 10 uv的IP地址，以及每个ip的pv点击数
4 列出top 10 访问量最多的页面及每个页面的访问量
5 列出访问来源的设备列表及每个设备的访问量
6 退出
输入序号选择对应的操作: 4
waiting.....
+-------------------------------------------+--------+
| 访问量是top10的页面URL                    | 访问量 |
+-------------------------------------------+--------+
| /api/v1/enroll/degrees/                   |  2047  |
| /api/v1/banners/                          |  1088  |
| /api/v1/course_sub/category/list/         |  1050  |
| /api/v1/courses/?sub_category=0&ordering= |  1028  |
| /api/v1/enroll/info/?degree_id=1          |  651   |
| /api/v1/learndata/?degree_id=1            |  622   |
| /api/v1/account/login/                    |  579   |
| /api/v1/enroll/degree/1/                  |  511   |
| /api/v1/captcha_check/                    |  436   |
| /mentor/                                  |  279   |
+-------------------------------------------+--------+
*************欢迎使用网站访问数据分析系统*************
1 统计本日志文件的总pv、uv
2 列出全天每小时的pv、uv数
3 列出top 10 uv的IP地址，以及每个ip的pv点击数
4 列出top 10 访问量最多的页面及每个页面的访问量
5 列出访问来源的设备列表及每个设备的访问量
6 退出
输入序号选择对应的操作: 5
waiting.....
+-----------------------------------------------------------------------------------------------------------------------------------------+--------+
| 设备来源                                                                                                                                | 访问量 |
+-----------------------------------------------------------------------------------------------------------------------------------------+--------+
| Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36                     |  2934  |
| Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0                                                          |  1268  |
| Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36               |  1085  |
| Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36                          |  1048  |
| Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36                           |  1008  |
| Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36                      |  974   |
| Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36                      |  930   |
| Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36                     |  894   |
| Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134       |  735   |
| Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36                      |  677   |
| Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0                                                           |  434   |
| Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36                          |  425   |
| Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36                           |  414   |
| Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36                           |  406   |
| Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763       |  381   |
| Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36                |  336   |
| Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36                          |  331   |
| Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36                          |  328   |
| Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36               |  327   |
| Mozilla/5.0 (iPhone; CPU iPhone OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1 Mobile/15E148 Safari/604.1 |  326   |
+-----------------------------------------------------------------------------------------------------------------------------------------+--------+
*************欢迎使用网站访问数据分析系统*************
1 统计本日志文件的总pv、uv
2 列出全天每小时的pv、uv数
3 列出top 10 uv的IP地址，以及每个ip的pv点击数
4 列出top 10 访问量最多的页面及每个页面的访问量
5 列出访问来源的设备列表及每个设备的访问量
6 退出
输入序号选择对应的操作: 6
再来呦

that's all

posted @ 2023-12-21 00:54 silencio
