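String operations, file operations, and preprocessing for English word-frequency counting
Assignment source: https://edu.cnblogs.com/campus/gzcc/GZCC-16SE1/homework/2684
1. String operations:
- Parse an ID card number: date of birth, gender, place of birth, and so on.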
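# -*- coding: utf-8 -*-
# Extract the registered region, date of birth, and gender from an ID card number.
ID = input("Please enter your ID card number: ")
while len(ID) != 18:
    print("The ID card number you entered is invalid.")
    ID = input("Please re-enter your ID card number: ")
year = ID[6:10]
month = ID[10:12]
day = ID[12:14]
# The first two digits are the province-level region code.
province = ID[0:2]
area = {
    '11': 'Beijing', '12': 'Tianjin', '13': 'Hebei', '14': 'Shanxi',
    '15': 'Inner Mongolia', '21': 'Liaoning', '22': 'Jilin', '23': 'Heilongjiang',
    '31': 'Shanghai', '32': 'Jiangsu', '33': 'Zhejiang', '34': 'Anhui',
    '35': 'Fujian', '36': 'Jiangxi', '37': 'Shandong', '41': 'Henan',
    '42': 'Hubei', '43': 'Hunan', '44': 'Guangdong', '45': 'Guangxi',
    '46': 'Hainan', '50': 'Chongqing', '51': 'Sichuan', '52': 'Guizhou',
    '53': 'Yunnan', '54': 'Tibet', '61': 'Shaanxi', '62': 'Gansu',
    '63': 'Qinghai', '64': 'Ningxia', '65': 'Xinjiang', '71': 'Taiwan',
    '81': 'Hong Kong', '82': 'Macau',
}
# Fall back to a placeholder so an unknown code does not break the string concatenation.
print("Registered region: " + area.get(province, "unknown"),
      " Date of birth: {}-{}-{}".format(year, month, day))
# The 17th digit encodes gender: even means female, odd means male.
sex = ID[-2]
if int(sex) % 2 == 0:
    print("Gender: female")
else:
    print("Gender: male")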
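The same slicing can be wrapped in a function so that it can be checked without typing at a prompt. The following is only a sketch under stated assumptions: `parse_id_number` is a hypothetical name, the sample number is made up, and the function checks only the length and whether the birth date is a real calendar date.

```python
from datetime import date


def parse_id_number(id_number):
    """Return (region_code, birth_date, gender) for an 18-character ID number.

    Hypothetical helper: it validates the length and the birth date only.
    """
    if len(id_number) != 18:
        raise ValueError("an ID number must have 18 characters")
    region_code = id_number[0:2]
    # date() raises ValueError for impossible dates such as month 13.
    birth_date = date(int(id_number[6:10]), int(id_number[10:12]), int(id_number[12:14]))
    # The 17th character is even for female and odd for male.
    gender = "female" if int(id_number[16]) % 2 == 0 else "male"
    return region_code, birth_date, gender


# Made-up sample number, used for illustration only.
sample = "440101199001011234"
print(parse_id_number(sample))
```

Returning the values instead of printing them makes it easy to look up the region code in a dictionary afterwards, or to reject an impossible birth date before going any further.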
- Caesar cipher encoding and decoding
def encryption():
    str_raw = input("Enter the plaintext: ")
    # Reduce the shift modulo 26 so that large values still wrap correctly.
    k = int(input("Enter the shift value: ")) % 26
    str_list = list(str_raw.lower())
    for i, ch in enumerate(str_list):
        # Shift only the letters a-z; spaces, digits, and punctuation stay unchanged.
        if 'a' <= ch <= 'z':
            str_list[i] = chr((ord(ch) - 97 + k) % 26 + 97)
    print("Encrypted result: " + "".join(str_list))


def decryption():
    str_raw = input("Enter the ciphertext: ")
    k = int(input("Enter the shift value: ")) % 26
    str_list = list(str_raw.lower())
    for i, ch in enumerate(str_list):
        if 'a' <= ch <= 'z':
            str_list[i] = chr((ord(ch) - 97 - k) % 26 + 97)
    print("Decrypted result: " + "".join(str_list))


while True:
    print("1. Encrypt")
    print("2. Decrypt")
    print("3. Exit")
    choice = input("Please choose: ")
    if choice == "1":
        encryption()
    elif choice == "2":
        decryption()
    elif choice == "3":
        break
    else:
        print("Invalid input!")
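Because encoding and decoding differ only in the direction of the shift, both can be expressed with one function that takes a signed shift. This is a minimal sketch, not the assignment's required form; `caesar_shift` is a hypothetical name, and it assumes that only the letters a-z are shifted while everything else passes through.

```python
def caesar_shift(text, k):
    """Shift the letters a-z in text by k positions, wrapping around the alphabet."""
    result = []
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            # Map the letter to 0-25, shift, wrap with % 26, and map back.
            result.append(chr((ord(ch) - ord('a') + k) % 26 + ord('a')))
        else:
            result.append(ch)  # spaces, digits, and punctuation pass through
    return "".join(result)


secret = caesar_shift("hello, world", 3)
print(secret)                    # khoor, zruog
print(caesar_shift(secret, -3))  # hello, world
```

Decoding is simply a call with the negated shift, which works because Python's `%` operator returns a non-negative result for a positive modulus.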
- Observing the URL pattern and generating URLs in bulk
for i in range(3, 8):
    url = 'http://news.gzcc.cn/html/xiaoyuanxinwen/{}.html'.format(i)
    print(url)
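If the generated addresses are needed later, for example to fetch each page, it may be handier to collect them in a list than to print them. A small sketch, where `build_page_urls` and its parameters are hypothetical names and the template is the one used in the loop above:

```python
def build_page_urls(first, last,
                    template='http://news.gzcc.cn/html/xiaoyuanxinwen/{}.html'):
    """Return the URLs for page numbers first..last, both ends included."""
    return [template.format(page) for page in range(first, last + 1)]


urls = build_page_urls(3, 7)
for url in urls:
    print(url)
```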
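2. Preprocessing for English word-frequency counting
- Download the lyrics of an English song, or an English article or novel.
- Convert all uppercase letters to lowercase.
- Replace all other separators (,.?!) with spaces.
- Split the text into individual words.
- Count how many times each word occurs.
The code is as follows: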
# English lyrics:
str1 = '''I will not make the same mistakes that you did
I will not let myself cause my heart so much misery
I will not break the way you did
You fell so hard
I learned the hard way, to never let it get that far
-
Because of you I never stray too far from the sidewalk
Because of you I learned to play on the safe side
So I don't get hurt
Because of you I find it hard to trust
Not only me, but everyone around me
Because of you I am afraid
-
I lose my way
And it's not too long before you point it out
I cannot cry
Because I know that's weakness in your eyes
I'm forced to fake a smile, a laugh
Every day of my life
My heart can't possibly break
When it wasn't even whole to start with
-
Because of you I never stray too far from the sidewalk
Because of you I learned to play on the safe side
So I don't get hurt
Because of you I find it hard to trust
Not only me, but everyone around me
Because of you I am afraid
-
I watched you die
I heard you cry
Every night in your sleep
I was so young
You should have known better than to lean on me
You never thought of anyone else
You just saw your pain
And now I cry
In the middle of the night
Over the same damn thing
-
Because of you I never stray too far from the sidewalk
Because of you I learned to play on the safe side so I don't get hurt
Because of you I tried my hardest just to forget everything
Because of you I don't know how to let anyone else in
Because of you I'm ashamed of my life because it's empty
Because of you I am afraid
-
Because of you'''
# Convert every word to lowercase
s1 = str1.lower()
# Replace the separators (,.?!) with spaces
for sep in ",.?!":
    s1 = s1.replace(sep, " ")
print(s1)
# Split the lyrics into a list of words, dropping the "-" lines that separate stanzas
print("Lyrics split into a list of words:")
strList = [word for word in s1.split() if word != "-"]
print(strList)
# Count how many times each word occurs
print("Word counts:")
strSet = set(strList)
for word in strSet:
    print(word, strList.count(word))
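Calling `list.count` once per distinct word rescans the whole list each time. The standard library's `collections.Counter` does the same tally in a single pass and can also report the most frequent words. The sketch below follows the preprocessing steps listed above on a short stand-in sentence; `count_words` is a hypothetical helper name.

```python
from collections import Counter


def count_words(text, separators=",.?!"):
    """Lowercase the text, turn separators into spaces, split, and tally the words."""
    lowered = text.lower()
    for sep in separators:
        lowered = lowered.replace(sep, " ")
    return Counter(lowered.split())


# Short stand-in text; any lyrics or article can be passed instead.
sample = "Because of you, I am afraid. Because of you!"
counts = count_words(sample)
for word, n in counts.most_common(3):
    print(word, n)
```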
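3. File operations
- Word-frequency counting: download the lyrics of an English song, or an English article or novel, and save it as a UTF-8 file. Read the text from the file and process it.
The code is as follows: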
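print("Word-frequency count")
# The file is saved as UTF-8, so state the encoding explicitly;
# the with block closes the file even if reading fails.
with open("E:\\Shape of you.txt", encoding="utf-8") as file:
    soy = file.read()
# Replace the separators with spaces
s = ",.?!"
for i in s:
    soy = soy.replace(i, " ")
lyric = soy.lower().split()
print(soy)
# Count how many times each word occurs
count = {}
for i in lyric:
    count[i] = count.get(i, 0) + 1
print(count)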
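A raw dictionary printout is hard to read once the text grows, so a natural next step is to sort the counts. The following sketch is an illustration rather than part of the assignment: `word_frequencies` is a hypothetical helper, and a small temporary file stands in for the lyrics file so that the example runs on any machine.

```python
import os
import tempfile


def word_frequencies(path, separators=",.?!"):
    """Read a UTF-8 text file and return (word, count) pairs, most frequent first."""
    with open(path, encoding="utf-8") as f:
        text = f.read().lower()
    for sep in separators:
        text = text.replace(sep, " ")
    count = {}
    for word in text.split():
        count[word] = count.get(word, 0) + 1
    # Sort by descending count, then alphabetically so ties have a fixed order.
    return sorted(count.items(), key=lambda item: (-item[1], item[0]))


# Write a tiny stand-in file so the sketch does not depend on a local path.
with tempfile.NamedTemporaryFile("w", suffix=".txt", encoding="utf-8",
                                 delete=False) as tmp:
    tmp.write("The cat sat. The cat ran!")
    sample_path = tmp.name

ranked = word_frequencies(sample_path)
os.remove(sample_path)
print(ranked)
```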
4. Function definitions
- Encryption function
def get_text():
    plaincode = 'abcd'
    cipher = ''
    for i in plaincode:
        cipher = cipher + chr(ord(i) + 3)
    return cipher

bigstr = get_text()
print(bigstr)
- Decryption function
def get_text():
    plaincode = 'defg'
    cipher = ''
    for i in plaincode:
        cipher = cipher + chr(ord(i) - 3)
    return cipher

bigstr = get_text()
print(bigstr)
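The encryption and decryption functions differ only in the sign of the offset and in the hard-coded input, so they could be merged into one function that takes both as parameters. A minimal sketch, with `shift_text` as a hypothetical name; like the two functions above it does not wrap around the alphabet, so shifting 'z' by 3 yields '}' rather than 'c'.

```python
def shift_text(text, offset):
    """Move every character's code point by offset (no wrap-around)."""
    return "".join(chr(ord(ch) + offset) for ch in text)


cipher = shift_text("abcd", 3)
print(cipher)                  # defg
print(shift_text(cipher, -3))  # abcd
```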
- Text-reading function
def get_text():
    with open('yw.txt', 'r', encoding='utf8', errors='ignore') as f:
        text = f.read()
    return text

bigstr = get_text()
print(bigstr)
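Passing the file name as an argument lets the same reader serve any file. The sketch below also shows what `errors='ignore'` does: bytes that are not valid UTF-8 are silently dropped instead of raising an exception. `read_text` is a hypothetical name, and the temporary file is only there to make the example self-contained.

```python
import os
import tempfile


def read_text(path, encoding="utf8"):
    """Return the file's text, skipping bytes that cannot be decoded."""
    with open(path, "r", encoding=encoding, errors="ignore") as f:
        return f.read()


# Write a file containing one byte (0xff) that is never valid in UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"abc\xffdef")
    demo_path = tmp.name

content = read_text(demo_path)
os.remove(demo_path)
print(content)  # abcdef
```

Dropping bytes silently is convenient for a quick word count, but it hides encoding problems; leaving out `errors` makes such problems raise `UnicodeDecodeError` instead.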