8 pandas in Practice: US Election Data Analysis
import numpy as np
import pandas as pd
# For convenience, define the month mapping, the candidates of interest, and each candidate's party:
months = {'JAN' : 1, 'FEB' : 2, 'MAR' : 3, 'APR' : 4, 'MAY' : 5, 'JUN' : 6,
'JUL' : 7, 'AUG' : 8, 'SEP' : 9, 'OCT': 10, 'NOV': 11, 'DEC' : 12}
of_interest = ['Obama, Barack', 'Romney, Mitt', 'Santorum, Rick',
'Paul, Ron', 'Gingrich, Newt']
parties = {
'Bachmann, Michelle': 'Republican',
'Romney, Mitt': 'Republican',
'Obama, Barack': 'Democrat',
"Roemer, Charles E. 'Buddy' III": 'Reform',
'Pawlenty, Timothy': 'Republican',
'Johnson, Gary Earl': 'Libertarian',
'Paul, Ron': 'Republican',
'Santorum, Rick': 'Republican',
'Cain, Herman': 'Republican',
'Gingrich, Newt': 'Republican',
'McCotter, Thaddeus G': 'Republican',
'Huntsman, Jon': 'Republican',
'Perry, Rick': 'Republican'
}
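Requirements
- Load the data
- Inspect the basic information of the data
- Select a subset of the data: keep the fields below and discard the rest
  - cand_nm: candidate name
  - contbr_nm: contributor name
  - contbr_st: contributor's state
  - contbr_employer: contributor's employer
  - contbr_occupation: contributor's occupation
  - contb_receipt_amt: contribution amount (USD)
  - contb_receipt_dt: contribution date
- Get an overview of the new data and check for missing values
- Summarize the numeric columns with descriptive statistics
- Handle missing values. Some fields are empty, possibly because the contributor forgot to fill them in or chose not to disclose them; fill these with NOT PROVIDE
- Handle outliers: delete the rows whose contribution amount is <= 0
- Add a new column, party, holding each candidate's party
- List the distinct values in the party column
- Count how many times each value occurs in the party column
- Compute the total contributions (contb_receipt_amt) received by each party
- Compute the total contributions (contb_receipt_amt) received by each party on each day
- Convert the dates in the table to the 'yyyy-mm-dd' format
- Find out whom disabled veterans (contributor occupation DISABLED VETERAN) mainly support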
df = pd.read_csv('./data/usa_election.txt')
df
/Users/bobo/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2785: DtypeWarning: Columns (6) have mixed types. Specify dtype option on import or set low_memory=False.
interactivity=interactivity, compiler=compiler, result=result)
| | cmte_id | cand_id | cand_nm | contbr_nm | contbr_city | contbr_st | contbr_zip | contbr_employer | contbr_occupation | contb_receipt_amt | contb_receipt_dt | receipt_desc | memo_cd | memo_text | form_tp | file_num |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | C00410118 | P20002978 | Bachmann, Michelle | HARVEY, WILLIAM | MOBILE | AL | 3.6601e+08 | RETIRED | RETIRED | 250.0 | 20-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
1 | C00410118 | P20002978 | Bachmann, Michelle | HARVEY, WILLIAM | MOBILE | AL | 3.6601e+08 | RETIRED | RETIRED | 50.0 | 23-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
2 | C00410118 | P20002978 | Bachmann, Michelle | SMITH, LANIER | LANETT | AL | 3.68633e+08 | INFORMATION REQUESTED | INFORMATION REQUESTED | 250.0 | 05-JUL-11 | NaN | NaN | NaN | SA17A | 749073 |
3 | C00410118 | P20002978 | Bachmann, Michelle | BLEVINS, DARONDA | PIGGOTT | AR | 7.24548e+08 | NONE | RETIRED | 250.0 | 01-AUG-11 | NaN | NaN | NaN | SA17A | 749073 |
4 | C00410118 | P20002978 | Bachmann, Michelle | WARDENBURG, HAROLD | HOT SPRINGS NATION | AR | 7.19016e+08 | NONE | RETIRED | 300.0 | 20-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
5 | C00410118 | P20002978 | Bachmann, Michelle | BECKMAN, JAMES | SPRINGDALE | AR | 7.27647e+08 | NONE | RETIRED | 500.0 | 23-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
6 | C00410118 | P20002978 | Bachmann, Michelle | BLEVINS, DARONDA | PIGGOTT | AR | 7.24548e+08 | INFORMATION REQUESTED | INFORMATION REQUESTED | 250.0 | 21-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
7 | C00410118 | P20002978 | Bachmann, Michelle | BLEVINS, DARONDA | PIGGOTT | AR | 7.24548e+08 | NONE | RETIRED | 250.0 | 05-JUL-11 | NaN | NaN | NaN | SA17A | 749073 |
8 | C00410118 | P20002978 | Bachmann, Michelle | COLLINS, SARAH | MESA | AZ | 8.52107e+08 | ST. JOSEPH HOSPITAL | RN | 250.0 | 21-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
9 | C00410118 | P20002978 | Bachmann, Michelle | COLEMAN, RONALD | TUCSON | AZ | 8.57499e+08 | RAYTHEON | ELECTRICAL ENGINEER | 250.0 | 20-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
10 | C00410118 | P20002978 | Bachmann, Michelle | ATCHLEY, JR, KEITH | MESA | AZ | 85215 | NONE | RETIRED | 250.0 | 22-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
11 | C00410118 | P20002978 | Bachmann, Michelle | FARNSWORTH, ROSS | MESA | AZ | 8.52062e+08 | FARNSWORTH COMPANIES | LAND DEVELOPER | 500.0 | 22-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
12 | C00410118 | P20002978 | Bachmann, Michelle | PRESTON, CLIFFORD | ORO VALLEY | AZ | 8.57379e+08 | INFORMATION REQUESTED | INFORMATION REQUESTED | 250.0 | 21-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
13 | C00410118 | P20002978 | Bachmann, Michelle | WILSON, RICHARD | FLAGSTAFF | AZ | 8.60011e+08 | NONE | RETIRED | 500.0 | 17-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
14 | C00410118 | P20002978 | Bachmann, Michelle | MILLER, WILLIAM | SCOTTSDALE | AZ | 8.52513e+08 | INFORMATION REQUESTED | INFORMATION REQUESTED | 250.0 | 11-JUL-11 | NaN | NaN | NaN | SA17A | 749073 |
15 | C00410118 | P20002978 | Bachmann, Michelle | DOLAN, WILLIAM | PHOENIX | AZ | 8.50145e+08 | VA MEDICAL CENTER | PHYSICIAN | 300.0 | 08-JUL-11 | NaN | NaN | NaN | SA17A | 749073 |
16 | C00410118 | P20002978 | Bachmann, Michelle | REULING, RICHARD | GREEN VALLEY | AZ | 8.56225e+08 | NONE | RETIRED | 1000.0 | 05-JUL-11 | NaN | NaN | NaN | SA17A | 749073 |
17 | C00410118 | P20002978 | Bachmann, Michelle | PETERSEN, W | GILBERT | AZ | 85295 | INFORMATION REQUESTED | INFORMATION REQUESTED | 250.0 | 27-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
18 | C00410118 | P20002978 | Bachmann, Michelle | ALLUMBAUGH, KATHY | SANTA ANA | CA | 9.27051e+08 | NONE | RETIRED | 250.0 | 13-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
19 | C00410118 | P20002978 | Bachmann, Michelle | DEL POZO, JOSE | DEL MAR | CA | 9.20143e+08 | RETIRES | RETIRED | 300.0 | 16-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
20 | C00410118 | P20002978 | Bachmann, Michelle | KIEFFER, PIERRE | SACRAMENTO | CA | 9.58257e+08 | SELF | OFFICE FURNITURE BROKER | 500.0 | 16-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
21 | C00410118 | P20002978 | Bachmann, Michelle | HANNAH, STEPHEN | SHERMAN OAKS | CA | 9.14132e+08 | RETIRED | RETIRED | 250.0 | 16-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
22 | C00410118 | P20002978 | Bachmann, Michelle | MINNIS, RITA | MILPITAS | CA | 9.50358e+08 | MILPITS MATERIALS | MANAGER | 2500.0 | 17-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
23 | C00410118 | P20002978 | Bachmann, Michelle | MINNIS, RITA | MILPITAS | CA | 9.50358e+08 | MILPITS MATERIALS | MANAGER | 2500.0 | 17-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
24 | C00410118 | P20002978 | Bachmann, Michelle | BOSTON, JOYCE | RANCHO CUCAMONGA | CA | 9.17305e+08 | INFORMATION REQUESTED | INFORMATION REQUESTED | 300.0 | 18-JUL-11 | NaN | NaN | NaN | SA17A | 749073 |
25 | C00410118 | P20002978 | Bachmann, Michelle | MANSFIELD, LORNA | WALNUT CREEK | CA | 9.45952e+08 | NONE | RETIRED | 100.0 | 18-JUL-11 | NaN | NaN | NaN | SA17A | 749073 |
26 | C00410118 | P20002978 | Bachmann, Michelle | EDWARDS, MARK | LA JOLLA | CA | 9.20378e+08 | NONE | RETIRED | 150.0 | 05-AUG-11 | NaN | NaN | NaN | SA17A | 749073 |
27 | C00410118 | P20002978 | Bachmann, Michelle | PECONI, GIANFRANCO | CEDAR GLEN | CA | 9.23211e+08 | INFORMATION REQUESTED | INFORMATION REQUESTED | 250.0 | 01-JUL-11 | NaN | NaN | NaN | SA17A | 749073 |
28 | C00410118 | P20002978 | Bachmann, Michelle | WILSON, CAROL R. | LINCOLN | CA | 9.56488e+08 | NONE | RETIRED | 500.0 | 05-JUL-11 | NaN | NaN | NaN | SA17A | 749073 |
29 | C00410118 | P20002978 | Bachmann, Michelle | CASTAGNOZZI, MARY | PIEDMONT | CA | 9.46114e+08 | PIEDMONT LAUNGUAGE SCHOOL | MANDARIN TEACHER | 250.0 | 05-JUL-11 | NaN | NaN | NaN | SA17A | 749073 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
536011 | C00500587 | P20003281 | Perry, Rick | ZATEZALO, DAVID G. MR. | WHEELING | WV | 260036639 | RHINO RESOURCES | COAL MINER | 2500.0 | 30-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536012 | C00500587 | P20003281 | Perry, Rick | THALMAN, FRANK E. MR. | WHEELING | WV | 260031672 | WARWOOD ARMATURE REPAIR COMPANY | VICE PRESIDENT | 2500.0 | 30-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536013 | C00500587 | P20003281 | Perry, Rick | THALMAN, RAYMOND V. MR. III | WHEELING | WV | 260030401 | WARWOOD ARMATURE REPAIR COMPANY | PRESIDENT | 2500.0 | 30-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536014 | C00500587 | P20003281 | Perry, Rick | TEMPLIN, JOE MR. | WHEELING | WV | 260034922 | THE OHIO VALLEY COAL COMPANY | MANAGER OF INFORMATION SYSTEMS | 250.0 | 30-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536015 | C00500587 | P20003281 | Perry, Rick | WOISNET, JILL MRS. | MORGANTOWN | WV | 265088649 | SWANSON INDUSTRIES INC | VP | 2500.0 | 30-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536016 | C00500587 | P20003281 | Perry, Rick | WOISNET, LEX MR. | MORGANTOWN | WV | 265088649 | SWANSON INDUSTRIES INC. | PURCHASING MANAGER | 2500.0 | 30-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536017 | C00500587 | P20003281 | Perry, Rick | WHITT, RICHARD P. MR. | WHEELING | WV | 260036021 | IVALDET LABORATORIES | EXECUTIVE | 300.0 | 30-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536018 | C00500587 | P20003281 | Perry, Rick | WHITESCARVER, IDALEE MRS. | GRAFTON | WV | 263549310 | QUALITY HYDRAULICS INC. | SECRETARY TO VICE PRESIDENT | 2500.0 | 30-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536019 | C00500587 | P20003281 | Perry, Rick | WHITESCARVER, JOHN E. MR. | GRAFTON | WV | 263549310 | QUALITY HYDRAULICS INC. | PRESIDENT/TREASURER | 2500.0 | 30-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536020 | C00500587 | P20003281 | Perry, Rick | FRIESS, STEPHEN MR. | JACKSON | WY | 830021655 | FRIESS INC. | CONSULTANT | 1000.0 | 29-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536021 | C00500587 | P20003281 | Perry, Rick | HANSEN, STEVE MR. | CHEYENNE | WY | 820013163 | TEXAS ENERGY LLC | EXECUTIVE | 250.0 | 15-NOV-11 | NaN | X | SEE ATTRIBUTION | SA17A | 761750 |
536022 | C00500587 | P20003281 | Perry, Rick | HARDER, ROBERT | POWELL | WY | 824359238 | NaN | RETIRED | 100.0 | 29-OCT-11 | NaN | NaN | NaN | SA17A | 761750 |
536023 | C00500587 | P20003281 | Perry, Rick | HARDER, ROBERT | POWELL | WY | 824359238 | NaN | RETIRED | 100.0 | 12-NOV-11 | NaN | NaN | NaN | SA17A | 761750 |
536024 | C00500587 | P20003281 | Perry, Rick | HARDER, ROBERT | POWELL | WY | 824359238 | NaN | RETIRED | 100.0 | 30-NOV-11 | NaN | NaN | NaN | SA17A | 761750 |
536025 | C00500587 | P20003281 | Perry, Rick | KINSEY, JOAN | LARAMIE | WY | 820709758 | RETIRED | RETIRED | 1000.0 | 17-AUG-11 | NaN | NaN | NaN | SA17A | 751678 |
536026 | C00500587 | P20003281 | Perry, Rick | SUGDEN, SUSAN MRS. | JACKSON | WY | 830010489 | SELF-EMPLOYED | INVESTOR | 2500.0 | 26-AUG-11 | NaN | NaN | NaN | SA17A | 751678 |
536027 | C00500587 | P20003281 | Perry, Rick | LUCAS, WES | TETON VILLAGE | WY | 83025 | SIRVA | CEO | 2500.0 | 04-OCT-11 | NaN | NaN | NaN | SA17A | 761750 |
536028 | C00500587 | P20003281 | Perry, Rick | LUCAS, ELISABET | TETON VILLAGE | WY | 830250824 | LUCAS PROPERTIES | OWNER | 2500.0 | 04-OCT-11 | NaN | NaN | NaN | SA17A | 761750 |
536029 | C00500587 | P20003281 | Perry, Rick | TEXAS ENERGY L.L.C. | CHEYENNE | WY | 820013163 | LLC | LLC | 250.0 | 30-SEP-11 | NaN | X | SEE ATTRIBUTION BELOW | SA17A | 761750 |
536030 | C00500587 | P20003281 | Perry, Rick | SPENCE, RUSSELL | RIVERTON | WY | 825019711 | INFORMATION REQUESTED PER BEST EFFORTS | INFORMATION REQUESTED PER BEST EFFORTS | 250.0 | 15-AUG-11 | NaN | NaN | NaN | SA17A | 751678 |
536031 | C00500587 | P20003281 | Perry, Rick | TEXAS ENERGY L.L.C. | CHEYENNE | WY | 820013163 | NaN | NaN | 250.0 | 30-SEP-11 | NaN | NaN | ATTRIBUTION TO PARTNERS REQUESTED | SA17A | 751678 |
536032 | C00500587 | P20003281 | Perry, Rick | SUGDEN, RICHARD G. MR. | JACKSON | WY | 830010489 | FAMILY PRACTICE ASSOCIATES | DOCTOR | 2500.0 | 26-AUG-11 | NaN | NaN | NaN | SA17A | 751678 |
536033 | C00500587 | P20003281 | Perry, Rick | HARDER, ROBERT | POWELL | WY | 824359238 | NaN | RETIRED | 100.0 | 01-OCT-11 | NaN | NaN | NaN | SA17A | 761750 |
536034 | C00500587 | P20003281 | Perry, Rick | ELWOOD, MIKE MR. | INFO REQUESTED | XX | 99999 | AM COAL | ENGINEER | 2500.0 | 30-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536035 | C00500587 | P20003281 | Perry, Rick | HEFFERNAN, JILL PRINCE MRS. | INFO REQUESTED | XX | 99999 | INFORMATION REQUESTED PER BEST EFFORTS | INFORMATION REQUESTED PER BEST EFFORTS | 500.0 | 30-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536036 | C00500587 | P20003281 | Perry, Rick | ANDERSON, MARILEE MRS. | INFO REQUESTED | XX | 99999 | INFORMATION REQUESTED PER BEST EFFORTS | INFORMATION REQUESTED PER BEST EFFORTS | 2500.0 | 31-AUG-11 | NaN | NaN | NaN | SA17A | 751678 |
536037 | C00500587 | P20003281 | Perry, Rick | TOLBERT, DARYL MR. | INFO REQUESTED | XX | 99999 | T.A.C.C. | LONGWALL MAINTENANCE FOREMAN | 500.0 | 30-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536038 | C00500587 | P20003281 | Perry, Rick | GRANE, BRYAN F. MR. | INFO REQUESTED | XX | 99999 | INFORMATION REQUESTED PER BEST EFFORTS | INFORMATION REQUESTED PER BEST EFFORTS | 500.0 | 29-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536039 | C00500587 | P20003281 | Perry, Rick | DUFFY, DAVID A. MR. | INFO REQUESTED | XX | 99999 | DUFFY EQUIPMENT COMPANY INC. | BUSINESS OWNER | 2500.0 | 30-SEP-11 | NaN | NaN | NaN | SA17A | 751678 |
536040 | C00500587 | P20003281 | Perry, Rick | GORMAN, CHRIS D. MR. | INFO REQUESTED | XX | 99999 | INFORMATION REQUESTED PER BEST EFFORTS | INFORMATION REQUESTED PER BEST EFFORTS | 5000.0 | 29-SEP-11 | REATTRIBUTION / REDESIGNATION REQUESTED (AUTOM... | NaN | REATTRIBUTION / REDESIGNATION REQUESTED (AUTOM... | SA17A | 751678 |
536041 rows × 16 columns
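The requirements ask for only seven fields, yet the frame above still carries all 16 columns, and the `DtypeWarning` points at column 6 (`contbr_zip`), which mixes numbers and strings. A minimal sketch of two `read_csv` options that address this: `usecols` keeps only the listed columns, and `dtype` pins the ZIP code to string if that column is kept. The inline CSV text and the names `sample_text`, `wanted`, `full`, and `subset` are made up for illustration and are not part of the original notebook.

```python
import io
import pandas as pd

# Hypothetical two-row sample that mimics part of the file's header (not real data).
sample_text = (
    "cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,"
    "contbr_employer,contbr_occupation,contb_receipt_amt,contb_receipt_dt\n"
    'C1,P1,"Doe, Jane","ROE, RICHARD",MOBILE,AL,366010290,RETIRED,RETIRED,250.0,20-JUN-11\n'
    'C1,P1,"Doe, Jane","POE, ANN",MESA,AZ,85215,SELF,ENGINEER,50.0,23-JUN-11\n'
)

wanted = ['cand_nm', 'contbr_nm', 'contbr_st', 'contbr_employer',
          'contbr_occupation', 'contb_receipt_amt', 'contb_receipt_dt']

# Option 1: keep every column but read the ZIP code as text, so it is never
# parsed as a number in some chunks and as a string in others.
full = pd.read_csv(io.StringIO(sample_text), dtype={'contbr_zip': str})

# Option 2: read only the columns the requirements list; the rest are
# discarded while parsing.
subset = pd.read_csv(io.StringIO(sample_text), usecols=wanted)

print(full['contbr_zip'].tolist())
print(subset.columns.tolist())
```

On the real file the same keyword arguments would be passed to `pd.read_csv('./data/usa_election.txt', ...)`, assuming its header matches the column names shown in the output above.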
# Get an overview of the data and check for missing values
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 536041 entries, 0 to 536040
Data columns (total 16 columns):
cmte_id 536041 non-null object
cand_id 536041 non-null object
cand_nm 536041 non-null object
contbr_nm 536041 non-null object
contbr_city 536026 non-null object
contbr_st 536040 non-null object
contbr_zip 535973 non-null object
contbr_employer 525088 non-null object
contbr_occupation 530520 non-null object
contb_receipt_amt 536041 non-null float64
contb_receipt_dt 536041 non-null object
receipt_desc 8479 non-null object
memo_cd 49718 non-null object
memo_text 52740 non-null object
form_tp 536041 non-null object
file_num 536041 non-null int64
dtypes: float64(1), int64(1), object(14)
memory usage: 65.4+ MB
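`df.info()` reports non-null counts, so the number of gaps per column has to be worked out by subtraction. A small sketch of a more direct check with `isnull().sum()`, using a made-up frame (`demo` and its values are hypothetical, not taken from the election file):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a few gaps (not real data).
demo = pd.DataFrame({
    'contbr_employer': ['RETIRED', np.nan, 'SELF'],
    'contbr_occupation': ['RETIRED', np.nan, np.nan],
    'contb_receipt_amt': [250.0, 50.0, 100.0],
})

# Count missing values per column, then keep only the columns that have any.
missing = demo.isnull().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```

Applied to `df`, this should list the same columns whose non-null count in the `info()` output is below 536041.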
# Summarize the numeric columns with descriptive statistics
df.describe()
| | contb_receipt_amt | file_num |
---|---|---|
count | 5.360410e+05 | 536041.000000 |
mean | 3.750373e+02 | 761472.107800 |
std | 3.564436e+03 | 5148.893508 |
min | -3.080000e+04 | 723511.000000 |
25% | 5.000000e+01 | 756218.000000 |
50% | 1.000000e+02 | 763233.000000 |
75% | 2.500000e+02 | 763621.000000 |
max | 1.944042e+06 | 767394.000000 |
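# Handle missing values. Some fields are empty, possibly because the contributor forgot to fill them in or chose not to disclose them; fill these with NOT PROVIDE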
df.fillna(value='NOT PROVIDE',inplace=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 536041 entries, 0 to 536040
Data columns (total 16 columns):
cmte_id 536041 non-null object
cand_id 536041 non-null object
cand_nm 536041 non-null object
contbr_nm 536041 non-null object
contbr_city 536041 non-null object
contbr_st 536041 non-null object
contbr_zip 536041 non-null object
contbr_employer 536041 non-null object
contbr_occupation 536041 non-null object
contb_receipt_amt 536041 non-null float64
contb_receipt_dt 536041 non-null object
receipt_desc 536041 non-null object
memo_cd 536041 non-null object
memo_text 536041 non-null object
form_tp 536041 non-null object
file_num 536041 non-null int64
dtypes: float64(1), int64(1), object(14)
memory usage: 65.4+ MB
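The blanket `fillna` above is harmless here because both numeric columns have no missing values. If a numeric column did contain NaN, filling it with a string would turn the whole column into `object`. A cautious sketch that fills only explicitly named text columns (`demo` and `text_cols` are hypothetical names and data):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one text column and one numeric column, each with a gap.
demo = pd.DataFrame({
    'contbr_employer': ['RETIRED', np.nan],
    'contb_receipt_amt': [250.0, np.nan],
})

# Fill only the text columns; the numeric column keeps its NaN and its float dtype.
text_cols = ['contbr_employer']
demo[text_cols] = demo[text_cols].fillna('NOT PROVIDE')

print(demo)
print(demo.dtypes)
```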
# Handle outliers: delete the rows whose contribution amount is <= 0
df['contb_receipt_amt'] <= 0  # Boolean mask: True where the amount is <= 0
df.loc[df['contb_receipt_amt'] <= 0]  # Rows whose contribution amount is <= 0
drop_indexs = df.loc[df['contb_receipt_amt'] <= 0].index
df.drop(labels=drop_indexs,axis=0,inplace=True)
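Dropping by index works, but the same result can usually be had by keeping the rows that pass the opposite test. A minimal sketch on made-up amounts (`demo` and `positive` are hypothetical names):

```python
import pandas as pd

# Hypothetical amounts, including a refund (negative) and a zero entry.
demo = pd.DataFrame({'contb_receipt_amt': [250.0, -30800.0, 0.0, 50.0]})

# Keep only strictly positive contributions; .copy() avoids chained-assignment
# warnings if the result is modified later.
positive = demo[demo['contb_receipt_amt'] > 0].copy()
print(positive)
```

Note that the surviving rows keep their original index labels (0 and 3 here), just as they do after `drop`.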
# Add a new column, party, holding each candidate's party
df['party'] = df['cand_nm'].map(parties)
df.head()
| | cmte_id | cand_id | cand_nm | contbr_nm | contbr_city | contbr_st | contbr_zip | contbr_employer | contbr_occupation | contb_receipt_amt | contb_receipt_dt | receipt_desc | memo_cd | memo_text | form_tp | file_num | party |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | C00410118 | P20002978 | Bachmann, Michelle | HARVEY, WILLIAM | MOBILE | AL | 3.6601e+08 | RETIRED | RETIRED | 250.0 | 20-JUN-11 | NOT PROVIDE | NOT PROVIDE | NOT PROVIDE | SA17A | 736166 | Republican |
1 | C00410118 | P20002978 | Bachmann, Michelle | HARVEY, WILLIAM | MOBILE | AL | 3.6601e+08 | RETIRED | RETIRED | 50.0 | 23-JUN-11 | NOT PROVIDE | NOT PROVIDE | NOT PROVIDE | SA17A | 736166 | Republican |
2 | C00410118 | P20002978 | Bachmann, Michelle | SMITH, LANIER | LANETT | AL | 3.68633e+08 | INFORMATION REQUESTED | INFORMATION REQUESTED | 250.0 | 05-JUL-11 | NOT PROVIDE | NOT PROVIDE | NOT PROVIDE | SA17A | 749073 | Republican |
3 | C00410118 | P20002978 | Bachmann, Michelle | BLEVINS, DARONDA | PIGGOTT | AR | 7.24548e+08 | NONE | RETIRED | 250.0 | 01-AUG-11 | NOT PROVIDE | NOT PROVIDE | NOT PROVIDE | SA17A | 749073 | Republican |
4 | C00410118 | P20002978 | Bachmann, Michelle | WARDENBURG, HAROLD | HOT SPRINGS NATION | AR | 7.19016e+08 | NONE | RETIRED | 300.0 | 20-JUN-11 | NOT PROVIDE | NOT PROVIDE | NOT PROVIDE | SA17A | 736166 | Republican |
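`Series.map` with a dictionary returns NaN for any key that is not in the dictionary, so a candidate missing from `parties` would silently end up without a party. A short sketch of how one might check for that, using a deliberately incomplete mapping (`demo_parties`, `demo`, and the candidate 'Doe, Jane' are made up):

```python
import pandas as pd

# Hypothetical, deliberately incomplete mapping.
demo_parties = {'Obama, Barack': 'Democrat', 'Romney, Mitt': 'Republican'}
demo = pd.DataFrame({'cand_nm': ['Obama, Barack', 'Romney, Mitt', 'Doe, Jane']})

# Names absent from the dictionary map to NaN.
demo['party'] = demo['cand_nm'].map(demo_parties)

# List the candidate names that received no party.
unmapped = demo.loc[demo['party'].isnull(), 'cand_nm'].unique()
print(unmapped)
```

An empty result on the real data would confirm that every candidate in `cand_nm` is covered by `parties`.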
# List the distinct values in the party column
df['party'].unique()
array(['Republican', 'Democrat', 'Reform', 'Libertarian'], dtype=object)
# Count how many times each value occurs in the party column
df['party'].value_counts()
Democrat 289999
Republican 234300
Reform 5313
Libertarian 702
Name: party, dtype: int64
# Compute the total contributions (contb_receipt_amt) received by each party
df.groupby(by='party')['contb_receipt_amt'].sum()
party
Democrat 8.259441e+07
Libertarian 4.132769e+05
Reform 3.429658e+05
Republican 1.251181e+08
Name: contb_receipt_amt, dtype: float64
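The totals come back in alphabetical order and in scientific notation, which makes them hard to compare at a glance. A small sketch that sorts the totals and expresses each as a share of the overall sum; the amounts below are made up and `demo`, `totals`, and `share` are hypothetical names:

```python
import pandas as pd

# Hypothetical contributions (not real data).
demo = pd.DataFrame({
    'party': ['Democrat', 'Republican', 'Republican', 'Reform'],
    'contb_receipt_amt': [100.0, 250.0, 150.0, 50.0],
})

# Total per party, largest first.
totals = (demo.groupby('party')['contb_receipt_amt']
              .sum()
              .sort_values(ascending=False))

# Each party's fraction of all money received.
share = totals / totals.sum()

print(totals)
print(share.round(3))
```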
# Compute the total contributions (contb_receipt_amt) received by each party on each day
df.groupby(by=['contb_receipt_dt','party'])['contb_receipt_amt'].sum()
contb_receipt_dt party
01-APR-11 Reform 50.00
Republican 12635.00
01-AUG-11 Democrat 182198.00
Libertarian 1000.00
Reform 1847.00
Republican 268903.02
01-DEC-11 Democrat 651982.82
Libertarian 725.00
Reform 875.00
Republican 505255.96
01-FEB-11 Republican 250.00
01-JAN-11 Republican 8600.00
01-JAN-12 Democrat 74303.80
Reform 515.00
Republican 76804.72
01-JUL-11 Democrat 175364.00
Libertarian 2000.00
Reform 100.00
Republican 125973.72
01-JUN-11 Democrat 148409.00
Libertarian 500.00
Reform 50.00
Republican 435609.20
01-MAR-11 Republican 1000.00
01-MAY-11 Democrat 82644.00
Reform 480.00
Republican 28663.87
01-NOV-11 Democrat 129309.87
Libertarian 3000.00
Reform 1792.00
...
30-OCT-11 Reform 3910.00
Republican 46413.16
30-SEP-11 Democrat 3409587.24
Libertarian 550.00
Reform 2050.00
Republican 5094824.20
31-AUG-11 Democrat 375487.44
Libertarian 10750.00
Reform 450.00
Republican 1038330.90
31-DEC-11 Democrat 3571793.57
Reform 695.00
Republican 1165777.72
31-JAN-11 Republican 6000.00
31-JAN-12 Democrat 1421887.31
Reform 150.00
Republican 963681.41
31-JUL-11 Democrat 20305.00
Reform 1066.00
Republican 12781.02
31-MAR-11 Reform 200.00
Republican 74575.00
31-MAY-11 Democrat 352005.66
Libertarian 250.00
Reform 100.00
Republican 313839.80
31-OCT-11 Democrat 216971.87
Libertarian 4250.00
Reform 3205.00
Republican 751542.36
Name: contb_receipt_amt, Length: 1183, dtype: float64
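The result above is a Series with a two-level index (date, party), which is long and hard to scan. One way to compare parties side by side is to move the `party` level into columns with `unstack`. A sketch on made-up rows (`demo`, `daily`, and `wide` are hypothetical names, and the amounts are invented):

```python
import pandas as pd

# Hypothetical daily contributions (not real data).
demo = pd.DataFrame({
    'contb_receipt_dt': ['01-APR-11', '01-APR-11', '01-AUG-11', '01-AUG-11'],
    'party': ['Reform', 'Republican', 'Democrat', 'Republican'],
    'contb_receipt_amt': [40.0, 300.0, 100.0, 200.0],
})

# Same grouping as above: one total per (date, party) pair.
daily = demo.groupby(['contb_receipt_dt', 'party'])['contb_receipt_amt'].sum()

# One row per date, one column per party; days with no contributions get 0.
wide = daily.unstack('party', fill_value=0)
print(wide)
```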
# Convert the dates in the table to the 'yyyy-mm-dd' format
def transformDate(d):
    day,month,year = d.split('-')
    # Map the English month abbreviation to its number, zero-padded to two digits
    month = str(months[month]).zfill(2)
    return '20'+year+'-'+month+'-'+day
df['contb_receipt_dt'] = df['contb_receipt_dt'].map(transformDate)
df.head()
| | cmte_id | cand_id | cand_nm | contbr_nm | contbr_city | contbr_st | contbr_zip | contbr_employer | contbr_occupation | contb_receipt_amt | contb_receipt_dt | receipt_desc | memo_cd | memo_text | form_tp | file_num | party |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | C00410118 | P20002978 | Bachmann, Michelle | HARVEY, WILLIAM | MOBILE | AL | 3.6601e+08 | RETIRED | RETIRED | 250.0 | 2011-06-20 | NOT PROVIDE | NOT PROVIDE | NOT PROVIDE | SA17A | 736166 | Republican |
1 | C00410118 | P20002978 | Bachmann, Michelle | HARVEY, WILLIAM | MOBILE | AL | 3.6601e+08 | RETIRED | RETIRED | 50.0 | 2011-06-23 | NOT PROVIDE | NOT PROVIDE | NOT PROVIDE | SA17A | 736166 | Republican |
2 | C00410118 | P20002978 | Bachmann, Michelle | SMITH, LANIER | LANETT | AL | 3.68633e+08 | INFORMATION REQUESTED | INFORMATION REQUESTED | 250.0 | 2011-07-05 | NOT PROVIDE | NOT PROVIDE | NOT PROVIDE | SA17A | 749073 | Republican |
3 | C00410118 | P20002978 | Bachmann, Michelle | BLEVINS, DARONDA | PIGGOTT | AR | 7.24548e+08 | NONE | RETIRED | 250.0 | 2011-08-01 | NOT PROVIDE | NOT PROVIDE | NOT PROVIDE | SA17A | 749073 | Republican |
4 | C00410118 | P20002978 | Bachmann, Michelle | WARDENBURG, HAROLD | HOT SPRINGS NATION | AR | 7.19016e+08 | NONE | RETIRED | 300.0 | 2011-06-20 | NOT PROVIDE | NOT PROVIDE | NOT PROVIDE | SA17A | 736166 | Republican |
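An alternative to splitting the string by hand is to let pandas parse the dates. A minimal sketch, assuming the raw values look like `20-JUN-11` and that the month abbreviations are English (the `%b` directive depends on the locale); `raw`, `parsed`, and `iso` are hypothetical names:

```python
import pandas as pd

# Sample strings in the same day-MON-yy layout as contb_receipt_dt.
raw = pd.Series(['20-JUN-11', '05-JUL-11', '01-JAN-12'])

# %d = day, %b = abbreviated month name, %y = two-digit year.
parsed = pd.to_datetime(raw, format='%d-%b-%y')

# Real datetime values sort chronologically; format them back to text
# only when a 'yyyy-mm-dd' string is actually needed.
iso = parsed.dt.strftime('%Y-%m-%d')
print(iso.tolist())
```

Keeping the column as datetime also means a per-day `groupby` is ordered by date, whereas grouping on the original strings orders the days alphabetically.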
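Find out whom disabled veterans (contributor occupation DISABLED VETERAN) mainly support. The more money a candidate receives from them, the stronger the support.
First, extract the rows whose occupation is DISABLED VETERAN from the source data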
df['contbr_occupation'] == 'DISABLED VETERAN'
df_old = df.loc[df['contbr_occupation'] == 'DISABLED VETERAN']
Group by candidate and sum the contribution amounts
df_old.groupby(by='cand_nm')['contb_receipt_amt'].sum()
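To read off the most supported candidate directly, the grouped sums can be sorted in descending order. A sketch on made-up rows: the candidates "Candidate A" and "Candidate B", the amounts, and the names `demo`, `vets`, and `by_cand` are all hypothetical and say nothing about the real result.

```python
import pandas as pd

# Hypothetical contributions (not real data).
demo = pd.DataFrame({
    'cand_nm': ['Candidate A', 'Candidate B', 'Candidate A', 'Candidate B'],
    'contbr_occupation': ['DISABLED VETERAN', 'DISABLED VETERAN',
                          'DISABLED VETERAN', 'RETIRED'],
    'contb_receipt_amt': [100.0, 50.0, 25.0, 500.0],
})

# Keep only the disabled-veteran rows, then total per candidate, largest first.
vets = demo[demo['contbr_occupation'] == 'DISABLED VETERAN']
by_cand = (vets.groupby('cand_nm')['contb_receipt_amt']
               .sum()
               .sort_values(ascending=False))

print(by_cand)
print(by_cand.index[0])  # candidate with the largest total
```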
Author: 华王
Blog: https://www.cnblogs.com/huahuawang/