8 pandas in practice: analyzing US election data

import numpy as np
import pandas as pd
# For convenience, define the month lookup, the candidates of interest, and each candidate's party:
months = {'JAN' : 1, 'FEB' : 2, 'MAR' : 3, 'APR' : 4, 'MAY' : 5, 'JUN' : 6,
          'JUL' : 7, 'AUG' : 8, 'SEP' : 9, 'OCT': 10, 'NOV': 11, 'DEC' : 12}
of_interest = ['Obama, Barack', 'Romney, Mitt', 'Santorum, Rick', 
               'Paul, Ron', 'Gingrich, Newt']
parties = {
  'Bachmann, Michelle': 'Republican',
  'Romney, Mitt': 'Republican',
  'Obama, Barack': 'Democrat',
  "Roemer, Charles E. 'Buddy' III": 'Reform',
  'Pawlenty, Timothy': 'Republican',
  'Johnson, Gary Earl': 'Libertarian',
  'Paul, Ron': 'Republican',
  'Santorum, Rick': 'Republican',
  'Cain, Herman': 'Republican',
  'Gingrich, Newt': 'Republican',
  'McCotter, Thaddeus G': 'Republican',
  'Huntsman, Jon': 'Republican',
  'Perry, Rick': 'Republican'           
 }

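Requirements

  • Load the data
  • Inspect the basic information of the data
  • Keep only the following fields and discard the rest
    • cand_nm: candidate name
    • contbr_nm: contributor name
    • contbr_st: contributor's state
    • contbr_employer: contributor's employer
    • contbr_occupation: contributor's occupation
    • contb_receipt_amt: contribution amount (US dollars)
    • contb_receipt_dt: contribution date
  • Get an overview of the new data and check for missing values
  • Summarize the numeric columns with descriptive statistics.
  • Handle nulls. Some fields are empty, perhaps because the contributor forgot to fill them in or withheld them; fill these with NOT PROVIDE
  • Handle outliers. Delete rows whose contribution amount is <= 0
  • Add a new column, party, holding each candidate's party
  • List the distinct values in the party column
  • Count how often each value appears in the party column
  • Total the contributions (contb_receipt_amt) received by each party
  • Total the contributions (contb_receipt_amt) received by each party on each day
  • Convert the dates in the table to 'yyyy-mm-dd' format.
  • Find out which candidate the contributors whose occupation is DISABLED VETERAN mainly support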
df = pd.read_csv('./data/usa_election.txt')
df
/Users/bobo/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2785: DtypeWarning: Columns (6) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
cmte_id cand_id cand_nm contbr_nm contbr_city contbr_st contbr_zip contbr_employer contbr_occupation contb_receipt_amt contb_receipt_dt receipt_desc memo_cd memo_text form_tp file_num
0 C00410118 P20002978 Bachmann, Michelle HARVEY, WILLIAM MOBILE AL 3.6601e+08 RETIRED RETIRED 250.0 20-JUN-11 NaN NaN NaN SA17A 736166
1 C00410118 P20002978 Bachmann, Michelle HARVEY, WILLIAM MOBILE AL 3.6601e+08 RETIRED RETIRED 50.0 23-JUN-11 NaN NaN NaN SA17A 736166
2 C00410118 P20002978 Bachmann, Michelle SMITH, LANIER LANETT AL 3.68633e+08 INFORMATION REQUESTED INFORMATION REQUESTED 250.0 05-JUL-11 NaN NaN NaN SA17A 749073
3 C00410118 P20002978 Bachmann, Michelle BLEVINS, DARONDA PIGGOTT AR 7.24548e+08 NONE RETIRED 250.0 01-AUG-11 NaN NaN NaN SA17A 749073
4 C00410118 P20002978 Bachmann, Michelle WARDENBURG, HAROLD HOT SPRINGS NATION AR 7.19016e+08 NONE RETIRED 300.0 20-JUN-11 NaN NaN NaN SA17A 736166
5 C00410118 P20002978 Bachmann, Michelle BECKMAN, JAMES SPRINGDALE AR 7.27647e+08 NONE RETIRED 500.0 23-JUN-11 NaN NaN NaN SA17A 736166
6 C00410118 P20002978 Bachmann, Michelle BLEVINS, DARONDA PIGGOTT AR 7.24548e+08 INFORMATION REQUESTED INFORMATION REQUESTED 250.0 21-JUN-11 NaN NaN NaN SA17A 736166
7 C00410118 P20002978 Bachmann, Michelle BLEVINS, DARONDA PIGGOTT AR 7.24548e+08 NONE RETIRED 250.0 05-JUL-11 NaN NaN NaN SA17A 749073
8 C00410118 P20002978 Bachmann, Michelle COLLINS, SARAH MESA AZ 8.52107e+08 ST. JOSEPH HOSPITAL RN 250.0 21-JUN-11 NaN NaN NaN SA17A 736166
9 C00410118 P20002978 Bachmann, Michelle COLEMAN, RONALD TUCSON AZ 8.57499e+08 RAYTHEON ELECTRICAL ENGINEER 250.0 20-JUN-11 NaN NaN NaN SA17A 736166
10 C00410118 P20002978 Bachmann, Michelle ATCHLEY, JR, KEITH MESA AZ 85215 NONE RETIRED 250.0 22-JUN-11 NaN NaN NaN SA17A 736166
11 C00410118 P20002978 Bachmann, Michelle FARNSWORTH, ROSS MESA AZ 8.52062e+08 FARNSWORTH COMPANIES LAND DEVELOPER 500.0 22-JUN-11 NaN NaN NaN SA17A 736166
12 C00410118 P20002978 Bachmann, Michelle PRESTON, CLIFFORD ORO VALLEY AZ 8.57379e+08 INFORMATION REQUESTED INFORMATION REQUESTED 250.0 21-JUN-11 NaN NaN NaN SA17A 736166
13 C00410118 P20002978 Bachmann, Michelle WILSON, RICHARD FLAGSTAFF AZ 8.60011e+08 NONE RETIRED 500.0 17-JUN-11 NaN NaN NaN SA17A 736166
14 C00410118 P20002978 Bachmann, Michelle MILLER, WILLIAM SCOTTSDALE AZ 8.52513e+08 INFORMATION REQUESTED INFORMATION REQUESTED 250.0 11-JUL-11 NaN NaN NaN SA17A 749073
15 C00410118 P20002978 Bachmann, Michelle DOLAN, WILLIAM PHOENIX AZ 8.50145e+08 VA MEDICAL CENTER PHYSICIAN 300.0 08-JUL-11 NaN NaN NaN SA17A 749073
16 C00410118 P20002978 Bachmann, Michelle REULING, RICHARD GREEN VALLEY AZ 8.56225e+08 NONE RETIRED 1000.0 05-JUL-11 NaN NaN NaN SA17A 749073
17 C00410118 P20002978 Bachmann, Michelle PETERSEN, W GILBERT AZ 85295 INFORMATION REQUESTED INFORMATION REQUESTED 250.0 27-JUN-11 NaN NaN NaN SA17A 736166
18 C00410118 P20002978 Bachmann, Michelle ALLUMBAUGH, KATHY SANTA ANA CA 9.27051e+08 NONE RETIRED 250.0 13-JUN-11 NaN NaN NaN SA17A 736166
19 C00410118 P20002978 Bachmann, Michelle DEL POZO, JOSE DEL MAR CA 9.20143e+08 RETIRES RETIRED 300.0 16-JUN-11 NaN NaN NaN SA17A 736166
20 C00410118 P20002978 Bachmann, Michelle KIEFFER, PIERRE SACRAMENTO CA 9.58257e+08 SELF OFFICE FURNITURE BROKER 500.0 16-JUN-11 NaN NaN NaN SA17A 736166
21 C00410118 P20002978 Bachmann, Michelle HANNAH, STEPHEN SHERMAN OAKS CA 9.14132e+08 RETIRED RETIRED 250.0 16-JUN-11 NaN NaN NaN SA17A 736166
22 C00410118 P20002978 Bachmann, Michelle MINNIS, RITA MILPITAS CA 9.50358e+08 MILPITS MATERIALS MANAGER 2500.0 17-JUN-11 NaN NaN NaN SA17A 736166
23 C00410118 P20002978 Bachmann, Michelle MINNIS, RITA MILPITAS CA 9.50358e+08 MILPITS MATERIALS MANAGER 2500.0 17-JUN-11 NaN NaN NaN SA17A 736166
24 C00410118 P20002978 Bachmann, Michelle BOSTON, JOYCE RANCHO CUCAMONGA CA 9.17305e+08 INFORMATION REQUESTED INFORMATION REQUESTED 300.0 18-JUL-11 NaN NaN NaN SA17A 749073
25 C00410118 P20002978 Bachmann, Michelle MANSFIELD, LORNA WALNUT CREEK CA 9.45952e+08 NONE RETIRED 100.0 18-JUL-11 NaN NaN NaN SA17A 749073
26 C00410118 P20002978 Bachmann, Michelle EDWARDS, MARK LA JOLLA CA 9.20378e+08 NONE RETIRED 150.0 05-AUG-11 NaN NaN NaN SA17A 749073
27 C00410118 P20002978 Bachmann, Michelle PECONI, GIANFRANCO CEDAR GLEN CA 9.23211e+08 INFORMATION REQUESTED INFORMATION REQUESTED 250.0 01-JUL-11 NaN NaN NaN SA17A 749073
28 C00410118 P20002978 Bachmann, Michelle WILSON, CAROL R. LINCOLN CA 9.56488e+08 NONE RETIRED 500.0 05-JUL-11 NaN NaN NaN SA17A 749073
29 C00410118 P20002978 Bachmann, Michelle CASTAGNOZZI, MARY PIEDMONT CA 9.46114e+08 PIEDMONT LAUNGUAGE SCHOOL MANDARIN TEACHER 250.0 05-JUL-11 NaN NaN NaN SA17A 749073
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
536011 C00500587 P20003281 Perry, Rick ZATEZALO, DAVID G. MR. WHEELING WV 260036639 RHINO RESOURCES COAL MINER 2500.0 30-SEP-11 NaN NaN NaN SA17A 751678
536012 C00500587 P20003281 Perry, Rick THALMAN, FRANK E. MR. WHEELING WV 260031672 WARWOOD ARMATURE REPAIR COMPANY VICE PRESIDENT 2500.0 30-SEP-11 NaN NaN NaN SA17A 751678
536013 C00500587 P20003281 Perry, Rick THALMAN, RAYMOND V. MR. III WHEELING WV 260030401 WARWOOD ARMATURE REPAIR COMPANY PRESIDENT 2500.0 30-SEP-11 NaN NaN NaN SA17A 751678
536014 C00500587 P20003281 Perry, Rick TEMPLIN, JOE MR. WHEELING WV 260034922 THE OHIO VALLEY COAL COMPANY MANAGER OF INFORMATION SYSTEMS 250.0 30-SEP-11 NaN NaN NaN SA17A 751678
536015 C00500587 P20003281 Perry, Rick WOISNET, JILL MRS. MORGANTOWN WV 265088649 SWANSON INDUSTRIES INC VP 2500.0 30-SEP-11 NaN NaN NaN SA17A 751678
536016 C00500587 P20003281 Perry, Rick WOISNET, LEX MR. MORGANTOWN WV 265088649 SWANSON INDUSTRIES INC. PURCHASING MANAGER 2500.0 30-SEP-11 NaN NaN NaN SA17A 751678
536017 C00500587 P20003281 Perry, Rick WHITT, RICHARD P. MR. WHEELING WV 260036021 IVALDET LABORATORIES EXECUTIVE 300.0 30-SEP-11 NaN NaN NaN SA17A 751678
536018 C00500587 P20003281 Perry, Rick WHITESCARVER, IDALEE MRS. GRAFTON WV 263549310 QUALITY HYDRAULICS INC. SECRETARY TO VICE PRESIDENT 2500.0 30-SEP-11 NaN NaN NaN SA17A 751678
536019 C00500587 P20003281 Perry, Rick WHITESCARVER, JOHN E. MR. GRAFTON WV 263549310 QUALITY HYDRAULICS INC. PRESIDENT/TREASURER 2500.0 30-SEP-11 NaN NaN NaN SA17A 751678
536020 C00500587 P20003281 Perry, Rick FRIESS, STEPHEN MR. JACKSON WY 830021655 FRIESS INC. CONSULTANT 1000.0 29-SEP-11 NaN NaN NaN SA17A 751678
536021 C00500587 P20003281 Perry, Rick HANSEN, STEVE MR. CHEYENNE WY 820013163 TEXAS ENERGY LLC EXECUTIVE 250.0 15-NOV-11 NaN X SEE ATTRIBUTION SA17A 761750
536022 C00500587 P20003281 Perry, Rick HARDER, ROBERT POWELL WY 824359238 NaN RETIRED 100.0 29-OCT-11 NaN NaN NaN SA17A 761750
536023 C00500587 P20003281 Perry, Rick HARDER, ROBERT POWELL WY 824359238 NaN RETIRED 100.0 12-NOV-11 NaN NaN NaN SA17A 761750
536024 C00500587 P20003281 Perry, Rick HARDER, ROBERT POWELL WY 824359238 NaN RETIRED 100.0 30-NOV-11 NaN NaN NaN SA17A 761750
536025 C00500587 P20003281 Perry, Rick KINSEY, JOAN LARAMIE WY 820709758 RETIRED RETIRED 1000.0 17-AUG-11 NaN NaN NaN SA17A 751678
536026 C00500587 P20003281 Perry, Rick SUGDEN, SUSAN MRS. JACKSON WY 830010489 SELF-EMPLOYED INVESTOR 2500.0 26-AUG-11 NaN NaN NaN SA17A 751678
536027 C00500587 P20003281 Perry, Rick LUCAS, WES TETON VILLAGE WY 83025 SIRVA CEO 2500.0 04-OCT-11 NaN NaN NaN SA17A 761750
536028 C00500587 P20003281 Perry, Rick LUCAS, ELISABET TETON VILLAGE WY 830250824 LUCAS PROPERTIES OWNER 2500.0 04-OCT-11 NaN NaN NaN SA17A 761750
536029 C00500587 P20003281 Perry, Rick TEXAS ENERGY L.L.C. CHEYENNE WY 820013163 LLC LLC 250.0 30-SEP-11 NaN X SEE ATTRIBUTION BELOW SA17A 761750
536030 C00500587 P20003281 Perry, Rick SPENCE, RUSSELL RIVERTON WY 825019711 INFORMATION REQUESTED PER BEST EFFORTS INFORMATION REQUESTED PER BEST EFFORTS 250.0 15-AUG-11 NaN NaN NaN SA17A 751678
536031 C00500587 P20003281 Perry, Rick TEXAS ENERGY L.L.C. CHEYENNE WY 820013163 NaN NaN 250.0 30-SEP-11 NaN NaN ATTRIBUTION TO PARTNERS REQUESTED SA17A 751678
536032 C00500587 P20003281 Perry, Rick SUGDEN, RICHARD G. MR. JACKSON WY 830010489 FAMILY PRACTICE ASSOCIATES DOCTOR 2500.0 26-AUG-11 NaN NaN NaN SA17A 751678
536033 C00500587 P20003281 Perry, Rick HARDER, ROBERT POWELL WY 824359238 NaN RETIRED 100.0 01-OCT-11 NaN NaN NaN SA17A 761750
536034 C00500587 P20003281 Perry, Rick ELWOOD, MIKE MR. INFO REQUESTED XX 99999 AM COAL ENGINEER 2500.0 30-SEP-11 NaN NaN NaN SA17A 751678
536035 C00500587 P20003281 Perry, Rick HEFFERNAN, JILL PRINCE MRS. INFO REQUESTED XX 99999 INFORMATION REQUESTED PER BEST EFFORTS INFORMATION REQUESTED PER BEST EFFORTS 500.0 30-SEP-11 NaN NaN NaN SA17A 751678
536036 C00500587 P20003281 Perry, Rick ANDERSON, MARILEE MRS. INFO REQUESTED XX 99999 INFORMATION REQUESTED PER BEST EFFORTS INFORMATION REQUESTED PER BEST EFFORTS 2500.0 31-AUG-11 NaN NaN NaN SA17A 751678
536037 C00500587 P20003281 Perry, Rick TOLBERT, DARYL MR. INFO REQUESTED XX 99999 T.A.C.C. LONGWALL MAINTENANCE FOREMAN 500.0 30-SEP-11 NaN NaN NaN SA17A 751678
536038 C00500587 P20003281 Perry, Rick GRANE, BRYAN F. MR. INFO REQUESTED XX 99999 INFORMATION REQUESTED PER BEST EFFORTS INFORMATION REQUESTED PER BEST EFFORTS 500.0 29-SEP-11 NaN NaN NaN SA17A 751678
536039 C00500587 P20003281 Perry, Rick DUFFY, DAVID A. MR. INFO REQUESTED XX 99999 DUFFY EQUIPMENT COMPANY INC. BUSINESS OWNER 2500.0 30-SEP-11 NaN NaN NaN SA17A 751678
536040 C00500587 P20003281 Perry, Rick GORMAN, CHRIS D. MR. INFO REQUESTED XX 99999 INFORMATION REQUESTED PER BEST EFFORTS INFORMATION REQUESTED PER BEST EFFORTS 5000.0 29-SEP-11 REATTRIBUTION / REDESIGNATION REQUESTED (AUTOM... NaN REATTRIBUTION / REDESIGNATION REQUESTED (AUTOM... SA17A 751678

536041 rows × 16 columns
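The requirements list asks for only seven fields to be kept, and the DtypeWarning above points at column 6 (contbr_zip), which holds mixed types. A minimal sketch of handling both, assuming a two-row in-memory CSV stands in for ./data/usa_election.txt (the sample rows and the names `sample_csv`, `full`, `keep_cols` and `subset` are illustrative, not part of the original notebook):

```python
import io
import pandas as pd

# Hypothetical sample standing in for ./data/usa_election.txt (values are illustrative)
sample_csv = '''cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,contbr_employer,contbr_occupation,contb_receipt_amt,contb_receipt_dt
C00410118,P20002978,"Bachmann, Michelle","HARVEY, WILLIAM",MOBILE,AL,366010290,RETIRED,RETIRED,250.0,20-JUN-11
C00410118,P20002978,"Bachmann, Michelle","ATCHLEY, JR, KEITH",MESA,AZ,85215,RETIRED,RETIRED,250.0,22-JUN-11
'''

# Reading the ZIP code as text avoids the mixed-type guess behind the DtypeWarning
full = pd.read_csv(io.StringIO(sample_csv), dtype={'contbr_zip': str})

# The seven fields named in the requirements list
keep_cols = ['cand_nm', 'contbr_nm', 'contbr_st', 'contbr_employer',
             'contbr_occupation', 'contb_receipt_amt', 'contb_receipt_dt']
subset = full[keep_cols]
print(subset)
```

The outputs that follow in this walkthrough still show all 16 columns, so the selection is kept in a separate variable here rather than overwriting the loaded frame.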

# Get an overview of the data and check for missing values
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 536041 entries, 0 to 536040
Data columns (total 16 columns):
cmte_id              536041 non-null object
cand_id              536041 non-null object
cand_nm              536041 non-null object
contbr_nm            536041 non-null object
contbr_city          536026 non-null object
contbr_st            536040 non-null object
contbr_zip           535973 non-null object
contbr_employer      525088 non-null object
contbr_occupation    530520 non-null object
contb_receipt_amt    536041 non-null float64
contb_receipt_dt     536041 non-null object
receipt_desc         8479 non-null object
memo_cd              49718 non-null object
memo_text            52740 non-null object
form_tp              536041 non-null object
file_num             536041 non-null int64
dtypes: float64(1), int64(1), object(14)
memory usage: 65.4+ MB
# Summarize the numeric columns with descriptive statistics
df.describe()
       contb_receipt_amt       file_num
count       5.360410e+05  536041.000000
mean        3.750373e+02  761472.107800
std         3.564436e+03    5148.893508
min        -3.080000e+04  723511.000000
25%         5.000000e+01  756218.000000
50%         1.000000e+02  763233.000000
75%         2.500000e+02  763621.000000
max         1.944042e+06  767394.000000
# Handle nulls. Some fields are empty, perhaps because the contributor forgot to fill them in or withheld them; fill these with NOT PROVIDE
df.fillna(value='NOT PROVIDE',inplace=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 536041 entries, 0 to 536040
Data columns (total 16 columns):
cmte_id              536041 non-null object
cand_id              536041 non-null object
cand_nm              536041 non-null object
contbr_nm            536041 non-null object
contbr_city          536041 non-null object
contbr_st            536041 non-null object
contbr_zip           536041 non-null object
contbr_employer      536041 non-null object
contbr_occupation    536041 non-null object
contb_receipt_amt    536041 non-null float64
contb_receipt_dt     536041 non-null object
receipt_desc         536041 non-null object
memo_cd              536041 non-null object
memo_text            536041 non-null object
form_tp              536041 non-null object
file_num             536041 non-null int64
dtypes: float64(1), int64(1), object(14)
memory usage: 65.4+ MB
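Filling the whole frame with a string works here because only text columns contain nulls. As a hedged variation, the sketch below counts the nulls per column first and then fills only the named text columns, so that a numeric column could never receive the placeholder. The frame `toy` and its values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; the values are illustrative only
toy = pd.DataFrame({
    'contbr_employer': ['RAYTHEON', np.nan, 'SELF'],
    'contbr_occupation': ['ENGINEER', 'RETIRED', np.nan],
    'contb_receipt_amt': [250.0, 100.0, 50.0],
})

# Number of missing values in each column before filling
missing_before = toy.isnull().sum()

# Fill only the text columns with the placeholder used in this walkthrough
text_cols = ['contbr_employer', 'contbr_occupation']
toy[text_cols] = toy[text_cols].fillna('NOT PROVIDE')

missing_after = toy.isnull().sum()
print(missing_before, missing_after, sep='\n')
```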
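# Handle outliers. Delete rows whose contribution amount is <= 0
df['contb_receipt_amt'] <= 0  # boolean mask: True where the amount is <= 0
df.loc[df['contb_receipt_amt'] <= 0]  # the rows whose amount is <= 0
drop_indexs = df.loc[df['contb_receipt_amt'] <= 0].index  # index labels of those rows
df.drop(labels=drop_indexs,axis=0,inplace=True)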
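The same rows can also be removed by keeping the complement of the mask instead of collecting index labels and dropping them. A minimal sketch on a made-up frame (the name `toy` and its amounts are illustrative):

```python
import pandas as pd

# Hypothetical amounts, including a refund (negative) and a zero entry
toy = pd.DataFrame({'contb_receipt_amt': [250.0, -30800.0, 0.0, 50.0]})

# Keep only the rows with a strictly positive amount
positive = toy[toy['contb_receipt_amt'] > 0]
print(positive)
```

Both approaches leave the original index labels in place; the rows are filtered, not renumbered.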
# Add a new column, party, holding each candidate's party
df['party'] = df['cand_nm'].map(parties)
df.head()
cmte_id cand_id cand_nm contbr_nm contbr_city contbr_st contbr_zip contbr_employer contbr_occupation contb_receipt_amt contb_receipt_dt receipt_desc memo_cd memo_text form_tp file_num party
0 C00410118 P20002978 Bachmann, Michelle HARVEY, WILLIAM MOBILE AL 3.6601e+08 RETIRED RETIRED 250.0 20-JUN-11 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 736166 Republican
1 C00410118 P20002978 Bachmann, Michelle HARVEY, WILLIAM MOBILE AL 3.6601e+08 RETIRED RETIRED 50.0 23-JUN-11 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 736166 Republican
2 C00410118 P20002978 Bachmann, Michelle SMITH, LANIER LANETT AL 3.68633e+08 INFORMATION REQUESTED INFORMATION REQUESTED 250.0 05-JUL-11 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 749073 Republican
3 C00410118 P20002978 Bachmann, Michelle BLEVINS, DARONDA PIGGOTT AR 7.24548e+08 NONE RETIRED 250.0 01-AUG-11 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 749073 Republican
4 C00410118 P20002978 Bachmann, Michelle WARDENBURG, HAROLD HOT SPRINGS NATION AR 7.19016e+08 NONE RETIRED 300.0 20-JUN-11 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 736166 Republican
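`Series.map` with a dict returns NaN for any key that is missing from the dict, so it can be worth checking that every candidate was matched. A small sketch, assuming a shortened party dict and a made-up candidate name ('Doe, Jane') that is deliberately absent from it:

```python
import pandas as pd

# Shortened, illustrative version of the parties dict defined at the top
parties_subset = {'Obama, Barack': 'Democrat', 'Romney, Mitt': 'Republican'}

# 'Doe, Jane' is a hypothetical name that has no entry in the dict
toy = pd.DataFrame({'cand_nm': ['Obama, Barack', 'Romney, Mitt', 'Doe, Jane']})
toy['party'] = toy['cand_nm'].map(parties_subset)

# Candidates that did not get a party
unmapped = toy.loc[toy['party'].isnull(), 'cand_nm'].unique()
print(unmapped)
```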
# List the distinct values in the party column
df['party'].unique()
array(['Republican', 'Democrat', 'Reform', 'Libertarian'], dtype=object)
# Count how often each value appears in the party column
df['party'].value_counts()
Democrat       289999
Republican     234300
Reform           5313
Libertarian       702
Name: party, dtype: int64
# Total the contributions (contb_receipt_amt) received by each party
df.groupby(by='party')['contb_receipt_amt'].sum()
party
Democrat       8.259441e+07
Libertarian    4.132769e+05
Reform         3.429658e+05
Republican     1.251181e+08
Name: contb_receipt_amt, dtype: float64
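The totals and the counts from value_counts() can be read side by side by aggregating several statistics in one groupby call. A sketch on made-up numbers (the frame `toy` is illustrative and does not reproduce the real totals):

```python
import pandas as pd

# Hypothetical contributions; amounts are illustrative only
toy = pd.DataFrame({
    'party': ['Democrat', 'Republican', 'Democrat', 'Republican', 'Reform'],
    'contb_receipt_amt': [100.0, 2500.0, 200.0, 500.0, 50.0],
})

# Total, number of contributions, and average contribution per party
summary = toy.groupby('party')['contb_receipt_amt'].agg(['sum', 'count', 'mean'])
print(summary)
```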
# Total the contributions (contb_receipt_amt) received by each party on each day
df.groupby(by=['contb_receipt_dt','party'])['contb_receipt_amt'].sum()
contb_receipt_dt  party      
01-APR-11         Reform              50.00
                  Republican       12635.00
01-AUG-11         Democrat        182198.00
                  Libertarian       1000.00
                  Reform            1847.00
                  Republican      268903.02
01-DEC-11         Democrat        651982.82
                  Libertarian        725.00
                  Reform             875.00
                  Republican      505255.96
01-FEB-11         Republican         250.00
01-JAN-11         Republican        8600.00
01-JAN-12         Democrat         74303.80
                  Reform             515.00
                  Republican       76804.72
01-JUL-11         Democrat        175364.00
                  Libertarian       2000.00
                  Reform             100.00
                  Republican      125973.72
01-JUN-11         Democrat        148409.00
                  Libertarian        500.00
                  Reform              50.00
                  Republican      435609.20
01-MAR-11         Republican        1000.00
01-MAY-11         Democrat         82644.00
                  Reform             480.00
                  Republican       28663.87
01-NOV-11         Democrat        129309.87
                  Libertarian       3000.00
                  Reform            1792.00
                                    ...    
30-OCT-11         Reform            3910.00
                  Republican       46413.16
30-SEP-11         Democrat       3409587.24
                  Libertarian        550.00
                  Reform            2050.00
                  Republican     5094824.20
31-AUG-11         Democrat        375487.44
                  Libertarian      10750.00
                  Reform             450.00
                  Republican     1038330.90
31-DEC-11         Democrat       3571793.57
                  Reform             695.00
                  Republican     1165777.72
31-JAN-11         Republican        6000.00
31-JAN-12         Democrat       1421887.31
                  Reform             150.00
                  Republican      963681.41
31-JUL-11         Democrat         20305.00
                  Reform            1066.00
                  Republican       12781.02
31-MAR-11         Reform             200.00
                  Republican       74575.00
31-MAY-11         Democrat        352005.66
                  Libertarian        250.00
                  Reform             100.00
                  Republican      313839.80
31-OCT-11         Democrat        216971.87
                  Libertarian       4250.00
                  Reform            3205.00
                  Republican      751542.36
Name: contb_receipt_amt, Length: 1183, dtype: float64
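Because contb_receipt_dt is still text at this point, the groups above are sorted alphabetically (01-APR-11, 01-AUG-11, 01-DEC-11, ...) rather than chronologically. One possible remedy, sketched under the assumption that every date has the form dd-MON-yy with a year in the 2000s, is to build real datetimes from the three parts before grouping. The frame `toy` and its amounts are made up:

```python
import pandas as pd

months = {'JAN': 1, 'FEB': 2, 'MAR': 3, 'APR': 4, 'MAY': 5, 'JUN': 6,
          'JUL': 7, 'AUG': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DEC': 12}

# Hypothetical rows in the same dd-MON-yy format as the dataset
toy = pd.DataFrame({
    'contb_receipt_dt': ['01-AUG-11', '01-APR-11', '01-AUG-11', '31-JAN-12'],
    'party': ['Democrat', 'Republican', 'Republican', 'Democrat'],
    'contb_receipt_amt': [100.0, 50.0, 25.0, 10.0],
})

# Split into day / month / year and assemble proper timestamps
parts = toy['contb_receipt_dt'].str.split('-', expand=True)
parts.columns = ['day', 'month', 'year']
toy['date'] = pd.to_datetime(pd.DataFrame({
    'year': 2000 + parts['year'].astype(int),   # assumes two-digit years mean 20xx
    'month': parts['month'].map(months),
    'day': parts['day'].astype(int),
}))

# Grouping on the datetime column sorts the days chronologically
daily = toy.groupby(['date', 'party'])['contb_receipt_amt'].sum()
print(daily)
```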
# Convert the dates in the table to 'yyyy-mm-dd' format
def transform_date(d):
    day, month, year = d.split('-')
    month = months[month]  # map the English month abbreviation to its number
    return f'20{year}-{month:02d}-{day}'  # zero-pad the month so it always has two digits
df['contb_receipt_dt'] = df['contb_receipt_dt'].map(transform_date)
df.head()
cmte_id cand_id cand_nm contbr_nm contbr_city contbr_st contbr_zip contbr_employer contbr_occupation contb_receipt_amt contb_receipt_dt receipt_desc memo_cd memo_text form_tp file_num party
0 C00410118 P20002978 Bachmann, Michelle HARVEY, WILLIAM MOBILE AL 3.6601e+08 RETIRED RETIRED 250.0 2011-06-20 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 736166 Republican
1 C00410118 P20002978 Bachmann, Michelle HARVEY, WILLIAM MOBILE AL 3.6601e+08 RETIRED RETIRED 50.0 2011-06-23 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 736166 Republican
2 C00410118 P20002978 Bachmann, Michelle SMITH, LANIER LANETT AL 3.68633e+08 INFORMATION REQUESTED INFORMATION REQUESTED 250.0 2011-07-05 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 749073 Republican
3 C00410118 P20002978 Bachmann, Michelle BLEVINS, DARONDA PIGGOTT AR 7.24548e+08 NONE RETIRED 250.0 2011-08-01 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 749073 Republican
4 C00410118 P20002978 Bachmann, Michelle WARDENBURG, HAROLD HOT SPRINGS NATION AR 7.19016e+08 NONE RETIRED 300.0 2011-06-20 NOT PROVIDE NOT PROVIDE NOT PROVIDE SA17A 736166 Republican

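Find out which candidate the contributors whose occupation is DISABLED VETERAN mainly support. The more money they gave to a candidate, the stronger the support.

First, extract the rows of the source data whose occupation is DISABLED VETERAN.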

df['contbr_occupation'] == 'DISABLED VETERAN'
df_old = df.loc[df['contbr_occupation'] == 'DISABLED VETERAN']

Grouping: group by candidate and sum the contribution amounts.

df_old.groupby(by='cand_nm')['contb_receipt_amt'].sum()
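Sorting the grouped totals in descending order puts the most supported candidate first. A minimal sketch on made-up rows (the frame `toy` and its amounts are illustrative and say nothing about the real result):

```python
import pandas as pd

# Hypothetical contributions; names come from the dataset, amounts are invented
toy = pd.DataFrame({
    'cand_nm': ['Obama, Barack', 'Romney, Mitt', 'Obama, Barack', 'Paul, Ron'],
    'contbr_occupation': ['DISABLED VETERAN', 'DISABLED VETERAN',
                          'DISABLED VETERAN', 'RETIRED'],
    'contb_receipt_amt': [100.0, 50.0, 25.0, 500.0],
})

# Keep only contributors whose occupation is DISABLED VETERAN
veterans = toy[toy['contbr_occupation'] == 'DISABLED VETERAN']

# Total per candidate, largest first
totals = (veterans.groupby('cand_nm')['contb_receipt_amt']
          .sum()
          .sort_values(ascending=False))
top_candidate = totals.index[0]
print(totals)
```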


posted @ 2021-06-16 13:34  风hua