Botnet detection with the CTU-13 dataset
Before we begin, here is a sample of the CTU-13 dataset:
StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,sTos,dTos,TotPkts,TotBytes,SrcBytes,Label
2011/08/10 09:46:59.607825,1.026539,tcp,94.44.127.113,1577, ->,147.32.84.59,6881,S_RA,0,0,4,276,156,flow=Background-Established-cmpgw-CVUT
2011/08/10 09:47:00.634364,1.009595,tcp,94.44.127.113,1577, ->,147.32.84.59,6881,S_RA,0,0,4,276,156,flow=Background-Established-cmpgw-CVUT
2011/08/10 09:47:48.185538,3.056586,tcp,147.32.86.89,4768, ->,77.75.73.33,80,SR_A,0,0,3,182,122,flow=Background-TCP-Attempt
2011/08/10 09:47:48.230897,3.111769,tcp,147.32.86.89,4788, ->,77.75.73.33,80,SR_A,0,0,3,182,122,flow=Background-TCP-Attempt
2011/08/10 09:47:48.963351,3.083411,tcp,147.32.86.89,4850, ->,77.75.73.33,80,SR_A,0,0,3,182,122,flow=Background-TCP-Attempt
2011/08/10 09:47:58.806814,3.097288,tcp,147.32.86.89,4866, ->,77.75.73.33,80,SR_A,0,0,3,182,122,flow=Background-TCP-Attempt
2011/08/10 09:51:34.450457,1.048908,tcp,213.200.244.217,47908, ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
2011/08/10 09:54:55.231320,4.373526,tcp,75.105.28.60,1419, ->,147.32.84.59,6881,S_RA,0,0,4,252,132,flow=Background-Established-cmpgw-CVUT
2011/08/10 09:57:13.352114,4.827912,tcp,75.105.28.60,1491, ->,147.32.84.59,6881,S_RA,0,0,4,252,132,flow=Background-Established-cmpgw-CVUT
2011/08/10 09:58:43.301515,0.049697,tcp,178.111.79.115,41752, ->,147.32.84.229,13363,SR_SA,0,0,5,352,208,flow=Background-TCP-Established
2011/08/10 09:54:09.710772,328.361664,tcp,147.32.84.59,49185, ->,147.32.80.7,80,SRPA_SPA,0,0,7,760,520,flow=Background-Established-cmpgw-CVUT
2011/08/10 10:00:34.864769,5.242459,tcp,75.105.28.60,1586, ->,147.32.84.59,6881,S_RA,0,0,4,252,132,flow=Background-Established-cmpgw-CVUT
2011/08/10 10:01:16.344485,0.972390,tcp,89.31.40.106,28451, ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
2011/08/10 10:06:19.661695,0.923098,tcp,89.31.40.106,13717, ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
2011/08/10 10:07:41.514293,1.009763,tcp,188.112.70.72,1817, ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
2011/08/10 10:08:18.464075,0.969967,tcp,85.248.56.40,42480, ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
2011/08/10 10:09:36.758829,2.853907,tcp,41.188.145.202,2285, ->,147.32.84.229,13363,SR_SA,0,0,3,184,122,flow=Background-TCP-Established
2011/08/10 10:09:38.376337,2.948305,tcp,41.188.145.202,2288, ->,147.32.84.229,443,SR_SA,0,0,3,184,122,flow=Background-TCP-Established
2011/08/10 10:09:39.984069,2.956415,tcp,41.188.145.202,2291, ->,147.32.84.229,80,SR_SA,0,0,3,184,122,flow=Background-TCP-Established
2011/08/10 10:09:44.359242,0.920374,tcp,89.31.40.106,39927, ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
2011/08/10 10:09:39.612736,6.031078,tcp,41.188.145.202,2285, ->,147.32.84.229,13363,SR_SA,0,0,3,184,122,flow=Background-TCP-Established
2011/08/10 10:09:41.324642,6.035653,tcp,41.188.145.202,2288, ->,147.32.84.229,443,SR_SA,0,0,3,184,122,flow=Background-TCP-Established
2011/08/10 10:09:42.940484,6.028175,tcp,41.188.145.202,2291, ->,147.32.84.229,80,SR_SA,0,0,3,184,122,flow=Background-TCP-Established
2011/08/10 10:09:51.394331,0.000000,tcp,147.32.84.59,6881, ?>,188.112.70.72,1904,RA_,0,0,1,60,60,flow=Background-Attempt-cmpgw-CVUT
2011/08/10 10:09:57.619871,1.383731,tcp,213.24.237.172,15007, ->,147.32.84.109,4899,S_RA,0,0,4,244,124,flow=Background-TCP-Attempt
2011/08/10 10:10:53.596635,0.939583,tcp,85.248.56.40,42572, ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
2011/08/10 10:13:28.886935,3.015055,tcp,69.231.198.54,4144, ->,147.32.86.179,80,SR_SA,0,0,3,184,122,flow=Background-TCP-Established
2011/08/10 10:13:31.657243,0.987838,tcp,188.112.70.72,2057, ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
2011/08/10 10:13:42.930890,1.419550,tcp,95.153.177.13,35049, ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVU
Next comes the botnet traffic, a little over 40,000 records in total. A sample follows:
2011/08/10 15:48:36.603724,0.059293,udp,147.32.84.165,2079, <->,94.127.67.112,53,CON,0,0,2,226,71,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.607360,0.103091,udp,147.32.84.165,2079, <->,192.41.162.30,53,CON,0,0,2,288,76,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.611327,0.140512,udp,147.32.84.165,2079, <->,192.55.83.30,53,CON,0,0,2,323,76,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.615374,0.071502,udp,147.32.84.165,2079, <->,213.171.60.92,53,CON,0,0,2,240,77,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.619348,0.146849,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,332,78,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.623326,0.069379,udp,147.32.84.165,2079, <->,193.232.128.6,53,CON,0,0,2,266,73,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.627332,0.013470,udp,147.32.84.165,2079, <->,178.248.240.75,53,CON,0,0,2,249,76,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.631326,0.035150,udp,147.32.84.165,2079, <->,192.12.94.30,53,CON,0,0,2,332,98,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.636594,0.029000,udp,147.32.84.165,2079, <->,192.93.0.4,53,CON,0,0,2,240,80,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.640670,0.000576,udp,147.32.84.165,2079, <->,193.232.142.17,53,CON,0,0,2,240,80,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.644677,0.121641,udp,147.32.84.165,2079, <->,192.31.80.30,53,CON,0,0,2,322,93,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.649388,0.103050,udp,147.32.84.165,2079, <->,192.41.162.30,53,CON,0,0,2,332,98,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.654584,0.075283,udp,147.32.84.165,2079, <->,217.10.35.5,53,CON,0,0,2,256,80,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.658527,0.071333,udp,147.32.84.165,2079, <->,213.171.60.92,53,CON,0,0,2,240,77,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.666330,0.075548,udp,147.32.84.165,2079, <->,217.10.35.5,53,CON,0,0,2,256,80,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.667170,0.020246,udp,147.32.84.165,2079, <->,216.239.32.10,53,CON,0,0,2,246,98,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.671732,0.071367,udp,147.32.84.165,2079, <->,213.171.60.92,53,CON,0,0,2,240,77,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.693473,0.000523,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,386,73,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.693817,0.021204,udp,147.32.84.165,2079, <->,192.33.4.12,53,CON,0,0,2,434,76,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.711265,0.096246,udp,147.32.84.165,2079, <->,216.239.38.10,53,CON,0,0,2,168,76,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.725172,0.058786,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,373,72,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.732670,0.069816,udp,147.32.84.165,2079, <->,193.232.128.6,53,CON,0,0,2,218,75,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.739366,0.132555,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,388,73,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.739638,0.203287,udp,147.32.84.165,2079, <->,199.19.54.1,53,CON,0,0,2,310,72,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.753076,0.178658,udp,147.32.84.165,2079, <->,76.74.236.21,53,CON,0,0,2,259,76,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.767083,0.096140,udp,147.32.84.165,2079, <->,216.239.38.10,53,CON,0,0,2,236,93,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.767456,0.008897,udp,147.32.84.165,2079, <->,216.239.36.10,53,CON,0,0,2,246,98,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.797369,0.103273,udp,147.32.84.165,2079, <->,192.41.162.30,53,CON,0,0,2,260,81,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.803168,0.072956,udp,147.32.84.165,2079, <->,83.242.140.21,53,CON,0,0,2,234,75,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:36.900945,0.138787,udp,147.32.84.165,2079, ->,207.182.130.90,53,INT,0,,1,81,81,flow=From-Botnet-V42-UDP-Attempt-DNS
2011/08/10 15:48:36.972182,0.065794,udp,147.32.84.165,2077, <->,79.137.236.5,53,CON,0,0,2,252,72,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:37.943562,0.000510,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,138,69,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:37.943832,0.056818,udp,147.32.84.165,2079, <->,89.208.17.22,53,CON,0,0,2,234,70,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:37.944078,0.065793,udp,147.32.84.165,2079, <->,95.163.69.51,53,CON,0,0,2,162,73,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:37.944471,0.060810,udp,147.32.84.165,2079, <->,194.226.96.8,53,CON,0,0,2,544,76,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:37.944743,0.070236,udp,147.32.84.165,2079, <->,79.174.74.74,53,CON,0,0,2,213,71,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:37.945022,0.055323,udp,147.32.84.165,2079, <->,217.16.20.30,53,CON,0,0,2,204,71,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:37.993311,0.065569,udp,147.32.84.165,2079, <->,95.163.69.51,53,CON,0,0,2,193,68,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:38.005950,0.060812,udp,147.32.84.165,2079, <->,194.226.96.8,53,CON,0,0,2,544,76,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:38.067526,0.062003,udp,147.32.84.165,2079, <->,194.85.61.20,53,CON,0,0,2,226,76,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:38.130317,0.000519,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,138,69,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:39.034911,0.062867,udp,147.32.84.165,2077, <->,213.180.204.213,53,CON,0,0,2,148,74,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:39.035175,0.099364,udp,147.32.84.165,2077, <->,82.146.55.155,53,CON,0,0,2,142,71,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:39.035453,0.000000,udp,147.32.84.165,2077, ->,89.149.254.87,53,INT,0,,1,72,72,flow=From-Botnet-V42-UDP-Attempt-DNS
2011/08/10 15:48:39.124911,17.884367,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,150,75,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:39.525887,239.334076,tcp,147.32.84.165,1394, ->,212.117.171.138,65500,SA_SPA,0,0,10,604,122,flow=From-Botnet-V42-TCP-Not-Encrypted-SMTP-Private-Proxy-1
2011/08/10 15:48:39.526019,9.012423,tcp,147.32.84.165,4944, ->,188.138.90.25,25,S_,0,,3,186,186,flow=From-Botnet-V42-TCP-Attempt-SPAM
2011/08/10 15:48:39.526087,9.012417,tcp,147.32.84.165,1400, ->,74.125.93.27,25,S_,0,,3,186,186,flow=From-Botnet-V42-TCP-Attempt-SPAM
2011/08/10 15:48:40.036997,0.064627,udp,147.32.84.165,2077, <->,217.107.217.16,53,CON,0,0,2,217,83,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:40.127212,0.068327,udp,147.32.84.165,2079, <->,81.177.1.85,53,CON,0,0,2,193,73,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:40.127509,0.000000,udp,147.32.84.165,2079, ->,173.212.197.124,53,INT,0,,1,77,77,flow=From-Botnet-V42-UDP-Attempt-DNS
2011/08/10 15:48:41.038227,0.027763,udp,147.32.84.165,2077, ->,188.40.106.4,53,INT,0,,1,72,72,flow=From-Botnet-V42-UDP-Attempt-DNS
2011/08/10 15:48:41.038380,0.056562,udp,147.32.84.165,2077, <->,188.65.208.29,53,CON,0,0,2,144,72,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:41.134482,22.922115,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,136,68,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:41.134689,0.000289,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,140,70,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:41.134822,0.000535,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,138,69,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:42.129326,0.000542,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,138,69,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:43.041020,0.056619,udp,147.32.84.165,2077, <->,82.146.43.2,53,CON,0,0,2,142,71,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:43.143158,0.025064,udp,147.32.84.165,2079, <->,85.25.126.145,53,CON,0,0,2,270,67,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:43.143529,0.000000,udp,147.32.84.165,2079, ->,77.82.34.183,53,INT,0,,1,72,72,flow=From-Botnet-V42-UDP-Attempt-DNS
2011/08/10 15:48:43.143828,0.125798,udp,147.32.84.165,2079, <->,209.190.16.83,53,CON,0,0,2,356,81,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:43.144038,1.299856,udp,147.32.84.165,2079, <->,78.46.90.36,53,CON,0,0,2,262,74,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:43.212624,0.055366,udp,147.32.84.165,2079, <->,217.16.22.30,53,CON,0,0,2,204,71,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:43.631387,3.004279,tcp,147.32.84.165,1389, ->,199.49.1.56,25,S_,0,,2,124,124,flow=From-Botnet-V42-TCP-Attempt-SPAM
2011/08/10 15:48:44.262798,0.024404,udp,147.32.84.165,2079, ->,78.159.114.121,53,INT,0,,1,70,70,flow=From-Botnet-V42-UDP-Attempt-DNS
2011/08/10 15:48:44.275499,0.055468,udp,147.32.84.165,2079, <->,90.156.144.47,53,CON,0,0,2,193,73,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:45.043735,0.061122,udp,147.32.84.165,2077, <->,217.107.217.16,53,CON,0,0,2,217,83,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:45.333775,3.004314,tcp,147.32.84.165,1305, ->,188.138.90.25,25,S_,0,,2,124,124,flow=From-Botnet-V42-TCP-Attempt-SPAM
2011/08/10 15:48:45.333905,3.004330,tcp,147.32.84.165,1232, ->,64.12.139.193,25,S_,0,,2,124,124,flow=From-Botnet-V42-TCP-Attempt-SPAM
2011/08/10 15:48:45.444050,18.612602,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,136,68,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:45.444255,0.000280,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,140,70,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:46.045303,0.062259,udp,147.32.84.165,2077, <->,93.158.134.213,53,CON,0,0,2,148,74,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:46.135442,3.003875,tcp,147.32.84.165,1401, ->,207.115.21.22,25,S_,0,,2,124,124,flow=From-Botnet-V42-TCP-Attempt-SPAM
2011/08/10 15:48:46.445639,0.050659,udp,147.32.84.165,2079, <->,178.218.208.130,53,CON,0,0,2,193,68,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:46.445764,0.000000,udp,147.32.84.165,2079, ->,173.212.197.124,53,INT,0,,1,77,77,flow=From-Botnet-V42-UDP-Attempt-DNS
2011/08/10 15:48:46.445905,0.000373,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,138,69,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:47.056449,0.027269,udp,147.32.84.165,2077, ->,188.40.106.4,53,INT,0,,1,72,72,flow=From-Botnet-V42-UDP-Attempt-DNS
2011/08/10 15:48:47.056523,0.099087,udp,147.32.84.165,2077, <->,82.146.55.155,53,CON,0,0,2,142,71,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:47.447119,0.000349,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,138,69,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:48.058272,0.000000,udp,147.32.84.165,2077, ->,89.149.254.87,53,INT,0,,1,72,72,flow=From-Botnet-V42-UDP-Attempt-DNS
2011/08/10 15:48:48.338346,0.000000,tcp,147.32.84.165,1081, ->,202.59.166.29,25,S_,0,,1,62,62,flow=From-Botnet-V42-TCP-Attempt-SPAM
2011/08/10 15:48:48.448462,0.000405,udp,147.32.84.165,2079, <->,147.32.80.9,53,CON,0,0,2,138,69,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:49.059334,0.056870,udp,147.32.84.165,2077, <->,188.65.208.29,53,CON,0,0,2,144,72,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:49.449944,0.055294,udp,147.32.84.165,2079, <->,217.16.16.30,53,CON,0,0,2,204,71,flow=From-Botnet-V42-UDP-DN
2011/08/10 11:08:59.574857,0.299675,tcp,147.32.84.165,1282, ->,94.63.150.20,80,FSPA_FSPA,0,0,10,1310,767,flow=From-Botnet-V42-TCP-WEB-Established
2011/08/10 11:08:59.970514,15.650887,tcp,147.32.84.165,1283, ->,195.88.191.59,80,FSPA_FSPA,0,0,164,104023,4266,flow=From-Botnet-V42-TCP-Established-HTTP-Binary-Download-3
2011/08/10 11:09:07.692938,2.924329,tcp,147.32.84.165,1284, ->,123.194.145.64,6667,S_,0,,2,124,124,flow=From-Botnet-V42-TCP-Attempt
2011/08/10 11:09:15.281026,0.299838,tcp,147.32.84.165,1285, ->,94.63.150.20,80,FSPA_FSPA,0,0,10,1301,758,flow=From-Botnet-V42-TCP-WEB-Established
2011/08/10 11:09:15.704792,0.330985,tcp,147.32.84.165,1286, ->,94.63.150.20,80,FSPA_FSPA,0,0,10,1310,767,flow=From-Botnet-V42-TCP-WEB-Established
2011/08/10 11:09:17.696956,1.861224,tcp,147.32.84.165,1287, ->,58.143.125.236,6667,S_RA,0,0,6,366,186,flow=From-Botnet-V42-TCP-Attempt
2011/08/10 11:09:22.564057,1.227437,tcp,147.32.84.165,1288, ->,200.171.4.222,6667,FSPA_FSPA,0,0,10,1148,590,flow=From-Botnet-V42-TCP-CC1-HTTP-Not-Encrypted
2011/08/10 11:09:23.545442,87.400864,tcp,147.32.84.165,1289, ->,212.117.171.138,65500,FSPA_FSPA,0,0,31,2627,1607,flow=From-Botnet-V42-TCP-Not-Encrypted-SMTP-Private-Proxy-1
2011/08/10 11:09:26.500381,2.943700,tcp,147.32.84.165,1290, ->,187.106.81.34,6667,S_,0,,2,124,124,flow=From-Botnet-V42-TCP-Attempt
2011/08/10 11:09:26.571864,1.517806,udp,147.32.84.165,1025, <->,147.32.80.9,53,CON,0,0,3,370,158,flow=From-Botnet-V42-UDP-DNS
2011/08/10 11:09:27.022695,0.201243,udp,147.32.84.165,1291, <->,147.32.80.9,53,CON,0,0,2,553,78,flow=From-Botnet-V42-UDP-DNS
2011/08/10 11:09:27.225336,500.001862,tcp,147.32.84.165,1292, ->,195.113.232.98,80,SPA_FSPA,0,0,14,5498,531,flow=From-Botnet-V42-TCP-Established-HTTP-Ad-40
2011/08/10 11:09:28.090806,0.023474,udp,147.32.84.165,1293, <->,94.100.28.114,9381,CON,0,0,2,120,60,flow=From-Botnet-V42-UDP-Established
2011/08/10 11:09:28.122236,3545.593750,udp,147.32.84.165,1293, <->,95.211.58.97,8399,CON,0,0,124,7836,5847,flow=From-Botnet-V42-UDP-Established
2011/08/10 11:09:30.204903,17.036390,tcp,147.32.84.165,1294, ->,60.199.114.56,10298,SPA_FSPA,0,0,56,46078,1531,flow=From-Botnet-V42-TCP-Established
2011/08/10 11:09:30.205052,15.332468,tcp,147.32.84.165,1295, ->,77.79.4.96,41422,SPA_FSPA,0,0,58,46712,1647,flow=From-Botnet-V42-TCP-Established
2011/08/10 11:09:30.205209,15.348065,tcp,147.32.84.165,1296, ->,77.79.4.96,41422,SPA_FSPA,0,0,58,46200,1647,flow=From-Botnet-V42-TCP-Established
2011/08/10 11:09:30.205413,16.994499,tcp,147.32.84.165,1297, ->,60.199.114.56,10298,SPA_FSPA,0,0,56,46589,1530,flow=From-Botnet-V42-TCP-Established
As you can see, the botnet labels include ads, spam, and so on. The data above is processed as follows:
import socket, struct, sys
import numpy as np
import pickle

def loaddata(fileName):
    file = open(fileName, 'r')
    xdata = []
    ydata = []
    xdataT = []
    ydataT = []
    flag=0
    count1=0
    count2=0
    count3=0
    count4=0
    #dicts to convert protocols and state to integers
    protoDict = {'arp': 5, 'unas': 13, 'udp': 1, 'rtcp': 7, 'pim': 3, 'udt': 11, 'esp': 12, 'tcp' : 0, 'rarp': 14, 'ipv6-icmp': 9, 'rtp': 2, 'ipv6': 10, 'ipx/spx': 6, 'icmp': 4, 'igmp' : 8}
    stateDict = {'': 1, 'FSR_SA': 30, '_FSA': 296, 'FSRPA_FSA': 77, 'SPA_SA': 31, 'FSA_SRA': 1181, 'FPA_R': 46, 'SPAC_SPA': 37, 'FPAC_FPA': 2, '_R': 1,
        'FPA_FPA': 784, 'FPA_FA': 66, '_FSRPA': 1, 'URFIL': 431, 'FRPA_PA': 5, '_RA': 2, 'SA_A': 2, 'SA_RA': 125, 'FA_FPA': 17, 'FA_RA': 14,
        'PA_FPA': 48, 'URHPRO': 380, 'FSRPA_SRA': 8, 'R_':541, 'DCE': 5, 'SA_R': 1674, 'SA_': 4295, 'RPA_FSPA': 4, 'FA_A': 17, 'FSPA_FSPAC': 7,
        'RA_': 2230, 'FSRPA_SA': 255, 'NNS': 47, 'SRPA_FSPAC': 1, 'RPA_FPA': 42, 'FRA_R': 10, 'FSPAC_FSPA': 86, 'RPA_R': 3, '_FPA': 5, 'SREC_SA': 1,
        'URN': 339, 'URO': 6, 'URH': 3593, 'MRQ': 4, 'SR_FSA': 1, 'SPA_SRPAC': 1, 'URP': 23598, 'RPA_A': 1, 'FRA_': 351, 'FSPA_SRA': 91,
        'FSA_FSA': 26138, 'PA_': 149, 'FSRA_FSPA': 798, 'FSPAC_FSA': 11, 'SRPA_SRPA': 176, 'SA_SA': 33, 'FSPAC_SPA': 1, 'SRA_RA': 78, 'RPAC_PA': 1, 'FRPA_R': 1,
        'SPA_SPA': 2989, 'PA_RA': 3, 'SPA_SRPA': 4185, 'RA_FA': 8, 'FSPAC_SRPA': 1, 'SPA_FSA': 1, 'FPA_FSRPA': 3, 'SRPA_FSA': 379, 'FPA_FRA': 7, 'S_SRA': 81,
        'FSA_SA': 6, 'State': 1, 'SRA_SRA': 38, 'S_FA': 2, 'FSRPAC_SPA': 7, 'SRPA_FSPA': 35460, 'FPA_A': 1, 'FSA_FPA': 3, 'FRPA_RA': 1, 'FSAU_SA': 1,
        'FSPA_FSRPA': 10560, 'SA_FSA': 358, 'FA_FRA': 8, 'FSRPA_SPA': 2807, 'FSRPA_FSRA': 32, 'FRA_FPA': 6, 'FSRA_FSRA': 3, 'SPAC_FSRPA': 1, 'FS_': 40, 'FSPA_FSRA': 798,
        'FSAU_FSA': 13, 'A_R': 36, 'FSRPAE_FSPA': 1, 'SA_FSRA': 4, 'PA_PAC': 3, 'FSA_FSRA': 279, 'A_A': 68, 'REQ': 892, 'FA_R': 124, 'FSRPA_SRPA': 97,
        'FSPAC_FSRA':20, 'FRPA_RPA': 7, 'FSRA_SPA': 8, 'INT': 85813, 'FRPA_FRPA': 6, 'SRPAC_FSPA': 4, 'SPA_SRA': 808, 'SA_SRPA': 1,
        'SPA_FSPA': 2118, 'FSRAU_FSA': 2, 'RPA_PA': 171,'_SPA': 268, 'A_PA': 47, 'SPA_FSRA': 416, 'FSPA_FSRPAC': 2, 'PAC_PA': 5, 'SRPA_SPA': 9646, 'SRPA_FSRA': 13,
        'FPA_FRPA': 49, 'SRA_SPA': 10, 'SA_SRA': 838, 'PA_PA': 5979, 'FPA_RPA': 27, 'SR_RA': 10, 'RED': 4579, 'CON': 2190507, 'FSRPA_FSPA':13547, 'FSPA_FPA': 4,
        'FAU_R': 2, 'ECO': 2877, 'FRPA_FPA': 72, 'FSAU_SRA': 1, 'FRA_FA': 8, 'FSPA_FSPA': 216341, 'SEC_RA': 19, 'ECR': 3316, 'SPAC_FSPA': 12, 'SR_A': 34,
        'SEC_': 5, 'FSAU_FSRA': 3, 'FSRA_FSRPA': 11, 'SRC': 13, 'A_RPA': 1, 'FRA_PA': 3, 'A_RPE': 1, 'RPA_FRPA': 20, '_SRA': 74, 'SRA_FSPA': 293,
        'FPA_': 118, 'FSRPAC_FSRPA': 2, '_FA': 1, 'DNP': 1, 'FSRPA_FSRPA': 379, 'FSRA_SRA': 14, '_FRPA': 1, 'SR_': 59, 'FSPA_SPA': 517, 'FRPA_FSPA': 1,
        'PA_A': 159, 'PA_SRA': 1, 'FPA_RA': 5, 'S_': 68710, 'SA_FSRPA': 4, 'FSA_FSRPA': 1, 'SA_SPA': 4, 'RA_A': 5, '_SRPA': 9, 'S_FRA': 156,
        'FA_FRPA': 1, 'PA_R': 72, 'FSRPAEC_FSPA': 1, '_PA': 7, 'RA_S': 1, 'SA_FR': 2, 'RA_FPA': 6, 'RPA_': 5, '_FSPA': 2395, 'FSA_FSPA': 230,
        'UNK': 2, 'A_RA': 9, 'FRPA_': 6, 'URF': 10, 'FS_SA': 97, 'SPAC_SRPA': 8, 'S_RPA': 32, 'SRPA_SRA': 69, 'SA_RPA': 30, 'PA_FRA': 4,
        'FSRA_SA': 49, 'FSRA_FSA': 206, 'PAC_RPA': 1, 'SRA_': 18, 'FA_': 451, 'S_SA': 6917, 'FSPA_SRPA': 427, 'TXD': 542,'SRA_SA': 1514, 'FSPA_FA': 1,
        'FPA_FSPA': 10, 'RA_PA': 3, 'SRA_FSA': 709, 'SRPA_SPAC': 3, 'FSPAC_FSRPA': 10, 'A_': 191, 'URNPRO': 2, 'PA_RPA': 81, 'FSPAC_SRA':1, 'SRPA_FSRPA': 3054,
        'SPA_': 1, 'FA_FA': 259, 'FSPA_SA': 75, 'SR_SRA': 1, 'FSA_': 2, 'SRPA_SA': 406, 'SR_SA': 3119, 'FRPA_FA': 1, 'PA_FRPA': 13, 'S_R': 34,
        'FSPAEC_FSPAE': 3, 'S_RA': 61105, 'FSPA_FSA': 5326, '_SA': 20, 'SA_FSPA': 15, 'SRPAC_SPA': 8, 'FPA_PA': 19, 'FSRPAE_FSA': 1, 'S_A': 1, 'RPA_RPA': 3,
        'NRS': 6, 'RSP': 115, 'SPA_FSRPA': 1144, 'FSRPAC_FSPA': 139}
    file.readline()   # skip the CSV header row
    for line in file:
        sd = line[:-1].split(',')
        dur, proto, Sport, Dport, Sip, Dip, totP, totB, label, state = sd[1], sd[2], sd[4], sd[7], sd[3], sd[6], sd[-4], sd[-3], sd[-1], sd[8]
        # convert dotted-quad IPv4 addresses to 32-bit integers; skip rows that fail
        try:
            Sip = socket.inet_aton(Sip)
            Sip = struct.unpack("!L", Sip)[0]
        except:
            continue
        try:
            Dip = socket.inet_aton(Dip)
            Dip = struct.unpack("!L", Dip)[0]
        except:
            continue
        if Sport=='': continue
        if Dport=='': continue
        #back, nor, bot
        try:
            if "Background" in label:
                label=0
            elif "Normal" in label:
                label = 0
            elif "Botnet" in label:
                # So this is only a binary 0/1 classification; the specific botnet type is not identified
                label = 1
            if flag==0:
                #Training Dataset
                if label==0 and count1<20001:
                    xdata.append([float(dur), protoDict[proto], int(Sport), int(Dport), Sip, Dip, int(totP), int(totB), stateDict[state]])
                    ydata.append(label)
                    count1+=1
                elif label==1 and count2<20001:
                    xdata.append([float(dur), protoDict[proto], int(Sport), int(Dport), Sip, Dip, int(totP), int(totB), stateDict[state]])
                    ydata.append(label)
                    count2+=1
                elif count1>19999 and count2>19999:
                    #print("HI")
                    flag=1
            else:
                #Test dataset
                if label==0 and count3<5001:
                    #print("H")
                    xdataT.append([float(dur), protoDict[proto], int(Sport), int(Dport), Sip, Dip, int(totP), int(totB), stateDict[state]])
                    ydataT.append(label)
                    count3+=1
                elif label==1 and count4<5001:
                    xdataT.append([float(dur), protoDict[proto], int(Sport), int(Dport), Sip, Dip, int(totP), int(totB), stateDict[state]])
                    ydataT.append(label)
                    count4 += 1
                elif count3>4999 and count4>4999:
                    break
        except:
            continue
    #pickle the dataset for fast loading
    file = open('flowdata.pickle', 'wb')
    pickle.dump([np.array(xdata), np.array(ydata), np.array(xdataT), np.array(ydataT)], file)
    #return the training and the test dataset
    return np.array(xdata), np.array(ydata), np.array(xdataT), np.array(ydataT)

if __name__ == "__main__":
    loaddata('flowdata.binetflow')
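To make the encoding step in the script above easier to follow, here is a minimal sketch that turns a single record into the same nine-element feature vector. The names `ip_to_int`, `encode_flow`, `PROTO_CODES`, and `STATE_CODES` are made up for this illustration, and the two lookup tables are abbreviated to the entries the sample row needs (the values are copied from the dictionaries above).

```python
import socket
import struct

# Hypothetical, abbreviated lookup tables; the script above uses much larger ones.
PROTO_CODES = {"tcp": 0, "udp": 1}
STATE_CODES = {"S_RA": 61105, "CON": 2190507}


def ip_to_int(address):
    """Convert a dotted-quad IPv4 string to an unsigned 32-bit integer."""
    return struct.unpack("!L", socket.inet_aton(address))[0]


def encode_flow(line):
    """Turn one .binetflow CSV line into (features, label)."""
    sd = line.strip().split(",")
    features = [
        float(sd[1]),          # Dur
        PROTO_CODES[sd[2]],    # Proto
        int(sd[4]),            # Sport
        int(sd[7]),            # Dport
        ip_to_int(sd[3]),      # SrcAddr
        ip_to_int(sd[6]),      # DstAddr
        int(sd[-4]),           # TotPkts
        int(sd[-3]),           # TotBytes
        STATE_CODES[sd[8]],    # State
    ]
    label = 1 if "Botnet" in sd[-1] else 0   # Background and Normal both map to 0
    return features, label


# First background row from the sample at the top of this post.
sample = ("2011/08/10 09:46:59.607825,1.026539,tcp,94.44.127.113,1577, ->,"
          "147.32.84.59,6881,S_RA,0,0,4,276,156,"
          "flow=Background-Established-cmpgw-CVUT")
features, label = encode_flow(sample)
print(features, label)
```

Encoding IP addresses and states as plain integers gives the classifiers a numeric input, but the numbers carry no natural ordering, which is worth keeping in mind when reading the scores later on.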
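The feature standardization based on mean and std used below is also not very robust: traffic on a real network may not follow this distribution.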
Build botnet detectors using machine learning algorithms in Python [Tutorial]
A botnet is a network of connected devices, usually compromised, that an attacker controls remotely to perform repetitive tasks such as sending spam or launching distributed attacks. Connected devices play an important role in modern life. From smart home appliances, computers, coffee machines, and cameras to connected cars, they have changed our lifestyles and made our lives easier. Unfortunately, these exposed devices can easily be targeted by attackers and cybercriminals, who can later use them to enable larger-scale attacks. Security vendors provide many solutions and products to defend against botnets, but in this tutorial, we are going to learn how to build botnet detection systems with Python and machine learning techniques.
You will find all the code discussed, in addition to some other useful scripts, in the following repository: https://github.com/PacktPublishing/Mastering-Machine-Learning-for-Penetration-Testing/tree/master/Chapter05
This article is an excerpt from a book written by Chiheb Chebbi titled Mastering Machine Learning for Penetration Testing
We are going to learn how to build different botnet detection systems with several machine learning algorithms. For our first practical lab, let's build a machine learning-based botnet detector using different classifiers. By now, I hope you have a clear understanding of the major steps in building machine learning systems, so you already know that the first step is to look for a dataset.
Many educational institutions and organizations publish datasets collected in their internal laboratories. One of the best-known botnet datasets is the CTU-13 dataset. It is a labeled dataset with botnet, normal, and background traffic, provided by CTU University, Czech Republic. In their work, the authors captured real botnet traffic mixed with normal traffic and background traffic. To download the dataset and find more information about it, you can visit the following link: https://mcfp.weebly.com/the-ctu-13-dataset-a-labeled-dataset-with-botnet-normal-and-background-traffic.html.
The dataset consists of bidirectional NetFlow files. But what are bidirectional NetFlow files? NetFlow is a network protocol developed by Cisco. Its goal is to collect IP traffic information and monitor network traffic in order to give a clearer view of the traffic flowing through the network. The main components of a NetFlow architecture are a NetFlow exporter, a NetFlow collector, and flow storage. The following diagram illustrates the different components of a NetFlow infrastructure:
In unidirectional NetFlow, when host A sends data to host B and host B sends a reply back to host A, the request and the reply are recorded as two separate flows. In bidirectional NetFlow, the traffic in both directions between host A and host B is recorded as one flow. Let’s download the dataset by using the following command:
$ wget --no-check-certificate https://mcfp.felk.cvut.cz/publicDatasets/CTU-13-Dataset/CTU-13-Dataset.tar.bz2
Extract the downloaded tar.bz2 file by using the following command:
$ tar xvjf CTU-13-Dataset.tar.bz2
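To make the difference between unidirectional and bidirectional flows concrete, the following toy sketch merges the two directions of a conversation into one record. It illustrates the idea only and is not how any particular NetFlow collector works; the record layout, the addresses, and the function name `to_bidirectional` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical unidirectional records:
# (src, sport, dst, dport, proto, packets, bytes)
uni_flows = [
    ("10.0.0.1", 40000, "10.0.0.2", 80, "tcp", 3, 180),    # A -> B request
    ("10.0.0.2", 80, "10.0.0.1", 40000, "tcp", 2, 1200),   # B -> A reply
    ("10.0.0.3", 5353, "10.0.0.2", 53, "udp", 1, 70),      # unanswered query
]


def to_bidirectional(flows):
    """Merge the two directions of each conversation into one record."""
    merged = defaultdict(lambda: {"packets": 0, "bytes": 0, "directions": set()})
    for src, sport, dst, dport, proto, packets, nbytes in flows:
        # Sorting the two endpoints gives both directions the same key.
        key = (proto,) + tuple(sorted([(src, sport), (dst, dport)]))
        merged[key]["packets"] += packets
        merged[key]["bytes"] += nbytes
        merged[key]["directions"].add((src, dst))
    return dict(merged)


bi_flows = to_bidirectional(uni_flows)
for key, stats in bi_flows.items():
    # "<->" when traffic was seen both ways, "->" when only one side spoke.
    arrow = "<->" if len(stats["directions"]) == 2 else "->"
    print(key, arrow, stats["packets"], "packets,", stats["bytes"], "bytes")
```

The `<->` and `->` arrows printed here play the same role as the Dir column in the sample records at the top of this post.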
The file contains all the datasets, with the different scenarios. For the demonstration, we are going to use dataset 8 (scenario 8). You can select any scenario or you can use your own collected data, or any other .binetflow files delivered by other institutions:
Load the data using pandas as usual:
>>> import pandas as pd
>>> data = pd.read_csv("capture20110816-3.binetflow")
>>> data['Label'] = data.Label.str.contains("Botnet")
Exploring the data is essential in any data-centric project. For example, you can start by checking the names of the features or the columns:
>>> data.columns
The command results in the columns of the dataset: StartTime, Dur, Proto, SrcAddr, Sport, Dir, DstAddr, Dport, State, sTos, dTos, TotPkts, TotBytes, SrcBytes, and Label. The columns represent the features used in the dataset; for example, Dur represents duration, Sport represents the source port, and so on. You can find the full list of features in the chapter’s GitHub repository.
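Beyond the column names, it usually helps to look at how the labels and protocols are distributed and whether any fields are empty. The sketch below does this with only the standard library on three rows copied from the samples at the top of this post, so it runs without the full capture file. The variable names are chosen for this example, and on the real file the same questions can be asked of the pandas DataFrame.

```python
import csv
import io
from collections import Counter

# Three rows copied from the samples at the top of this post.
SAMPLE = """StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,sTos,dTos,TotPkts,TotBytes,SrcBytes,Label
2011/08/10 09:46:59.607825,1.026539,tcp,94.44.127.113,1577, ->,147.32.84.59,6881,S_RA,0,0,4,276,156,flow=Background-Established-cmpgw-CVUT
2011/08/10 15:48:36.603724,0.059293,udp,147.32.84.165,2079, <->,94.127.67.112,53,CON,0,0,2,226,71,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:39.526019,9.012423,tcp,147.32.84.165,4944, ->,188.138.90.25,25,S_,0,,3,186,186,flow=From-Botnet-V42-TCP-Attempt-SPAM
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
columns = list(rows[0].keys())

# Same rule as data.Label.str.contains("Botnet") in the pandas snippet.
is_botnet = ["Botnet" in row["Label"] for row in rows]
label_counts = Counter(is_botnet)
proto_counts = Counter(row["Proto"] for row in rows)

# Some flows have an empty dTos field, as the SPAM attempt row shows.
missing_dtos = sum(1 for row in rows if row["dTos"] == "")

print(columns)
print(label_counts, proto_counts, missing_dtos)
```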
Before training the model, we need to build some scripts to prepare the data. This time, we are going to build a separate Python script to prepare data, and later we can just import it into the main script.
I will call the first script DataPreparation.py. Many approaches have been proposed for extracting features and preparing data to build botnet detectors with machine learning. In our case, I customized two new scripts inspired by the data loading scripts built by NagabhushanS:
from __future__ import division
import os, sys
import threading
import numpy as np
After importing the required Python packages, we created a class called Prepare to select training and testing data:
class Prepare(threading.Thread):
    def __init__(self, X, Y, XT, YT, accLabel=None):
        threading.Thread.__init__(self)
        self.X = X
        self.Y = Y
        self.XT=XT
        self.YT=YT
        self.accLabel= accLabel

    def run(self):
        # work on copies so the original arrays are left untouched
        X = np.zeros(self.X.shape)
        Y = np.zeros(self.Y.shape)
        XT = np.zeros(self.XT.shape)
        YT = np.zeros(self.YT.shape)
        np.copyto(X, self.X)
        np.copyto(Y, self.Y)
        np.copyto(XT, self.XT)
        np.copyto(YT, self.YT)
        # standardize each of the 9 feature columns to zero mean and unit std
        for i in range(9):
            X[:, i] = (X[:, i] - X[:, i].mean()) / (X[:, i].std())
        for i in range(9):
            XT[:, i] = (XT[:, i] - XT[:, i].mean()) / (XT[:, i].std())
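The class above standardizes the training and test sets separately, each with its own mean and standard deviation. A common alternative, sketched here as an assumption rather than as the tutorial's method, is to compute the statistics on the training set only and reuse them for the test set, so that both sets go through exactly the same transformation. The function name `standardize` and the toy arrays are made up for this example.

```python
import numpy as np


def standardize(train, test):
    """Z-score both sets using the mean and std of the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0          # constant columns: avoid division by zero
    return (train - mean) / std, (test - mean) / std


# Hypothetical toy data: 4 training rows and 2 test rows with 3 features.
X = np.array([[1.0, 10.0, 5.0],
              [2.0, 20.0, 5.0],
              [3.0, 30.0, 5.0],
              [4.0, 40.0, 5.0]])
XT = np.array([[2.5, 25.0, 5.0],
               [10.0, 100.0, 6.0]])

X_scaled, XT_scaled = standardize(X, XT)
print(X_scaled.mean(axis=0))   # close to zero for every column
print(XT_scaled)               # test rows expressed in training-set units
```

The guard for a zero standard deviation matters for flow data, where a column such as sTos can be constant within a sample; without it the division produces NaN values.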
The second script is called LoadData.py. You can find it on GitHub and use it directly in your projects to load data from .binetflow files and generate a pickle file.
Let’s use what we developed previously to train the models. After building the data loader and preparing the machine learning algorithms that we are going to use, it is time to train and test the models.
First, load the data from the pickle file, which is why we need to import the pickle Python library. Don’t forget to import the previous scripts using:
import LoadData
import DataPreparation
import pickle
file = open('flowdata.pickle', 'rb')
data = pickle.load(file)
Select the data sections:
Xdata = data[0]
Ydata = data[1]
XdataT = data[2]
YdataT = data[3]
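The save-and-load round trip can be tried in isolation with a small stand-in for the real arrays. This sketch writes a four-element list to a temporary file and unpacks it again; the toy values and the temporary path are assumptions made for the example, and the `with` blocks close the file handles automatically.

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for the four arrays written by loaddata():
# training features, training labels, test features, test labels.
dataset = [[[0.5, 0, 80]], [1], [[1.5, 1, 53]], [0]]

path = os.path.join(tempfile.mkdtemp(), "flowdata.pickle")
with open(path, "wb") as handle:
    pickle.dump(dataset, handle)

with open(path, "rb") as handle:
    # Unpacking in one step replaces the four data[i] assignments.
    Xdata, Ydata, XdataT, YdataT = pickle.load(handle)

print(len(Xdata), len(Ydata), len(XdataT), len(YdataT))
```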
As machine learning classifiers, we are going to try many different algorithms so later we can select the best algorithm for our model. Import the required modules to use four machine learning algorithms from sklearn:
from sklearn.linear_model import *
from sklearn.tree import *
from sklearn.naive_bayes import *
from sklearn.neighbors import *
Prepare the data by using the module we built previously. Don’t forget to import DataPreparation by typing import DataPreparation:
>>> DataPreparation.Prepare(Xdata,Ydata,XdataT,YdataT)
Now we can train the models. We are going to train several kinds of model so that we can later select the most suitable machine learning technique for our project. The steps are the same as in previous projects: after preparing the data and selecting the features, define the machine learning algorithm, fit the model, and print out its score.
As machine learning classifiers, we are going to test many of them. Let’s start with a decision tree:
- Decision tree model:
>>> clf = DecisionTreeClassifier()
>>> clf.fit(Xdata,Ydata)
>>> Prediction = clf.predict(XdataT)
>>> Score = clf.score(XdataT,YdataT)
>>> print ("The Score of the Decision Tree Classifier is", Score * 100)
The score of the decision tree classifier is 99%
- Logistic regression model:
>>> clf = LogisticRegression(C=10000)
>>> clf.fit(Xdata,Ydata)
>>> Prediction = clf.predict(XdataT)
>>> Score = clf.score(XdataT,YdataT)
>>> print ("The Score of the Logistic Regression Classifier is", Score * 100)
The score of the logistic regression classifier is 96%
- Gaussian Naive Bayes model:
>>> clf = GaussianNB()
>>> clf.fit(Xdata,Ydata)
>>> Prediction = clf.predict(XdataT)
>>> Score = clf.score(XdataT,YdataT)
>>> print("The Score of the Gaussian Naive Bayes classifier is", Score * 100)
The score of the Gaussian Naive Bayes classifier is 72%
- k-Nearest Neighbors model:
>>> clf = KNeighborsClassifier()
>>> clf.fit(Xdata,Ydata)
>>> Prediction = clf.predict(XdataT)
>>> Score = clf.score(XdataT,YdataT)
>>> print("The Score of the K-Nearest Neighbours classifier is", Score * 100)
The score of the k-Nearest Neighbors classifier is 96%
- Neural network model:
To build a neural network model, use the following code:
>>> from keras.models import *
>>> from keras.layers import Dense, Activation
>>> from keras.optimizers import *
>>> model = Sequential()
>>> model.add(Dense(10, input_dim=9, activation="sigmoid"))
>>> model.add(Dense(10, activation='sigmoid'))
>>> model.add(Dense(1))
>>> sgd = SGD(lr=0.01, decay=0.000001, momentum=0.9, nesterov=True)
>>> model.compile(optimizer=sgd, loss='mse')
>>> model.fit(Xdata, Ydata, epochs=200, batch_size=100)
>>> Score = model.evaluate(XdataT, YdataT, verbose=0)
>>> print("The Score of the Neural Network is", Score * 100)
With this code, we imported the required Keras modules, we built the layers, we compiled the model with an SGD optimizer, we fit the model, and we printed out the score of the model.
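One detail is worth keeping in mind when comparing these numbers. The `score` method of the scikit-learn classifiers returns mean accuracy, while `evaluate` on a Keras model compiled with only a loss returns that loss, here the mean squared error. The two are not on the same scale. The following sketch computes both quantities by hand on made-up labels and outputs to show the difference; the helper functions and the 0.5 threshold are assumptions made for this illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_out):
    """Average squared difference between targets and raw outputs."""
    return sum((t - o) ** 2 for t, o in zip(y_true, y_out)) / len(y_true)


# Hypothetical labels and raw network outputs for five test flows.
y_true = [0, 0, 1, 1, 1]
y_out = [0.1, 0.6, 0.8, 0.4, 0.9]

# Threshold the raw outputs at 0.5 to obtain class predictions.
y_pred = [1 if o >= 0.5 else 0 for o in y_out]

print("accuracy:", accuracy(y_true, y_pred))
print("mse:", mean_squared_error(y_true, y_out))
```

For accuracy, higher is better; for the loss, lower is better. A small neural network "score" from the code above therefore indicates a good fit rather than a poor one.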