Notebook with 2 features: https://github.com/sangwonH/DBSCAN/blob/master/DBSCAN_Feature_02.ipynb
Notebook with 3 features: https://github.com/sangwonH/DBSCAN/blob/master/DBSCAN_Feature_03.ipynb
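The data comes from http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html and has 494,021 rows.
There are 42 fields in total; see http://kdd.ics.uci.edu/databases/kddcup99/kddcup.names for their names.
The last field indicates whether the record is normal or an attack. The names file gives no name for that field, so I defined it as attack_label.
Because the data is large, I picked a few rows per attack_label to build a 26-row sample file (68 is the word count that wc reports, not the row count), then ran DBSCAN on it and printed the predict values.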
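Picking a few rows per attack_label can be sketched with pandas as below. This is only an illustration: the helper name `sample_per_label`, the cap `n`, and the tiny stand-in frame are assumptions, and the sample file in this post was not necessarily built this way (it holds a different number of rows for each label).

```python
import pandas as pd


def sample_per_label(df, label_col="attack_label", n=5):
    """Keep at most the first n rows of each label, preserving file order."""
    return df.groupby(label_col, sort=False).head(n)


# Tiny stand-in frame; the real input would be the 494,021-row file.
demo = pd.DataFrame({
    "src_bytes": [181, 239, 235, 1567, 1559, 28, 28, 28],
    "attack_label": ["normal."] * 3 + ["buffer_overflow."] * 2 + ["teardrop."] * 3,
})

subset = sample_per_label(demo, n=2)
print(subset["attack_label"].value_counts())
```

Taking the first rows of each group keeps the result reproducible; a random draw per label would be an equally reasonable choice.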
Defining the field names
import pandas as pd

data = pd.read_csv('/Users/inmobi/Downloads/DBSCAN/kddcup.data_10_percent_test02.csv')
# 41 feature names from kddcup.names, plus attack_label for the unnamed last field
data.columns = [
    'duration: continuous', 'protocol_type: symbolic', 'service: symbolic', 'flag: symbolic',
    'src_bytes: continuous', 'dst_bytes: continuous', 'land: symbolic', 'wrong_fragment: continuous',
    'urgent: continuous', 'hot: continuous', 'num_failed_logins: continuous', 'logged_in: symbolic',
    'num_compromised: continuous', 'root_shell: continuous', 'su_attempted: continuous',
    'num_root: continuous', 'num_file_creations: continuous', 'num_shells: continuous',
    'num_access_files: continuous', 'num_outbound_cmds: continuous', 'is_host_login: symbolic',
    'is_guest_login: symbolic', 'count: continuous', 'srv_count: continuous',
    'serror_rate: continuous', 'srv_serror_rate: continuous', 'rerror_rate: continuous',
    'srv_rerror_rate: continuous', 'same_srv_rate: continuous', 'diff_srv_rate: continuous',
    'srv_diff_host_rate: continuous', 'dst_host_count: continuous', 'dst_host_srv_count: continuous',
    'dst_host_same_srv_rate: continuous', 'dst_host_diff_srv_rate: continuous',
    'dst_host_same_src_port_rate: continuous', 'dst_host_srv_diff_host_rate: continuous',
    'dst_host_serror_rate: continuous', 'dst_host_srv_serror_rate: continuous',
    'dst_host_rerror_rate: continuous', 'dst_host_srv_rerror_rate: continuous',
    'attack_label',
]
File sizes and the actual file contents
inmobis-MacBook-Pro:DBSCAN inmobi$ wc kddcup.data_10_percent.csv
494021 494021 74889749 kddcup.data_10_percent.csv
inmobis-MacBook-Pro:DBSCAN inmobi$ wc kddcup.data_10_percent_test02.csv
26 68 3960 kddcup.data_10_percent_test02.csv
inmobis-MacBook-Pro:DBSCAN inmobi$ cat kddcup.data_10_percent_test02.csv
duration: continuous,protocol_type: symbolic,service: symbolic,flag: symbolic,src_bytes: continuous,dst_bytes: continuous,land: symbolic,wrong_fragment: continuous,urgent: continuous,hot: continuous,num_failed_logins: continuous,logged_in: symbolic,num_compromised: continuous,root_shell: continuous,su_attempted: continuous,num_root: continuous,num_file_creations: continuous,num_shells: continuous,num_access_files: continuous,num_outbound_cmds: continuous,is_host_login: symbolic,is_guest_login: symbolic,count: continuous,srv_count: continuous,serror_rate: continuous,srv_serror_rate: continuous,rerror_rate: continuous,srv_rerror_rate: continuous,same_srv_rate: continuous,diff_srv_rate: continuous,srv_diff_host_rate: continuous,dst_host_count: continuous,dst_host_srv_count: continuous,dst_host_same_srv_rate: continuous,dst_host_diff_srv_rate: continuous,dst_host_same_src_port_rate: continuous,dst_host_srv_diff_host_rate: continuous,dst_host_serror_rate: continuous,dst_host_srv_serror_rate: continuous,dst_host_rerror_rate: continuous,dst_host_srv_rerror_rate: continuous,attack_label
0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0,0,0,0,1,0,0,9,9,1,0,0.11,0,0,0,0,0,normal.
0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0,0,0,0,1,0,0,19,19,1,0,0.05,0,0,0,0,0,normal.
0,tcp,http,SF,235,1337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0,0,0,0,1,0,0,29,29,1,0,0.03,0,0,0,0,0,normal.
0,tcp,http,SF,219,1337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,6,6,0,0,0,0,1,0,0,39,39,1,0,0.03,0,0,0,0,0,normal.
0,tcp,http,SF,217,2032,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,6,6,0,0,0,0,1,0,0,49,49,1,0,0.02,0,0,0,0,0,normal.
0,tcp,http,SF,217,2032,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,6,6,0,0,0,0,1,0,0,59,59,1,0,0.02,0,0,0,0,0,normal.
0,tcp,http,SF,212,1940,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,2,0,0,0,0,1,0,1,1,69,1,0,1,0.04,0,0,0,0,normal.
0,tcp,http,SF,159,4087,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,5,5,0,0,0,0,1,0,0,11,79,1,0,0.09,0.04,0,0,0,0,normal.
0,tcp,http,SF,210,151,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0,0,0,0,1,0,0,8,89,1,0,0.12,0.04,0,0,0,0,normal.
0,tcp,http,SF,212,786,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0,0,0,0,1,0,0,8,99,1,0,0.12,0.05,0,0,0,0,normal.
169,tcp,telnet,SF,1567,2857,0,0,0,3,0,1,4,1,0,0,1,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,1,0,1,0,0,0,0,0,buffer_overflow.
179,tcp,telnet,SF,1559,2855,0,0,0,3,0,1,4,1,0,0,1,0,0,0,0,0,1,1,0,0,0,0,1,0,0,2,2,1,0,0.5,0,0,0,0,0,buffer_overflow.
49,tcp,telnet,SF,2402,3939,0,0,0,4,0,1,2,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,2,1,0,1,1,0,0,0,0,buffer_overflow.
290,tcp,telnet,SF,415,70529,0,0,0,3,0,1,4,0,0,4,4,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,1,0,1,0,0,0,0,0,buffer_overflow.
31,tcp,telnet,SF,137,1351,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,1,0,0,2,2,1,0,0.5,0,0,0,0,0,buffer_overflow.
0,tcp,ftp_data,SF,0,5696,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,81,1,0,1,0.02,0,0,0,0,buffer_overflow.
0,udp,private,SF,28,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,29,29,0,0,0,0,1,0,0,255,96,0.38,0.01,0.38,0,0,0,0,0,teardrop.
0,udp,private,SF,28,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30,30,0,0,0,0,1,0,0,255,97,0.38,0.01,0.38,0,0,0,0,0,teardrop.
0,udp,private,SF,28,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,31,31,0,0,0,0,1,0,0,255,98,0.38,0.01,0.38,0,0,0,0,0,teardrop.
0,udp,private,SF,28,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,32,32,0,0,0,0,1,0,0,255,99,0.39,0.01,0.39,0,0,0,0,0,teardrop.
0,udp,private,SF,28,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,33,33,0,0,0,0,1,0,0,255,100,0.39,0.01,0.39,0,0,0,0,0,teardrop.
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,11,1,0,1,1,0,0,0,0,ipsweep.
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,21,1,0,1,1,0,0,0,0,ipsweep.
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,31,1,0,1,1,0,0,0,0,ipsweep.
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,41,1,0,1,1,0,0,0,0,ipsweep.
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,51,1,0,1,1,0,0,0,0,ipsweep.
The predict values show how well the clustering result agrees with the labels (accuracy / match):
| row | attack_label | predict |
|---|---|---|
| 1 | n | 0 |
| 2 | n | 0 |
| 3 | n | 0 |
| 4 | n | 0 |
| 5 | n | 0 |
| 6 | n | 0 |
| 7 | n | 0 |
| 8 | n | 0 |
| 9 | n | 0 |
| 10 | n | -1 |
| 11 | b_o | -1 |
| 12 | b_o | -1 |
| 13 | b_o | -1 |
| 14 | b_o | 0 |
| 15 | b_o | 1 |
| 16 | b_o | 1 |
| 17 | t_d | 1 |
| 18 | t_d | 1 |
| 19 | t_d | 1 |
| 20 | t_d | 1 |
| 21 | t_d | 1 |
| 22 | ipS | 1 |
| 23 | ipS | 1 |
| 24 | ipS | 1 |
| 25 | ipS | 1 |
| 26 | ipS | 1 |
n: normal, b_o: buffer_overflow, t_d: teardrop, ipS: ipsweep
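The agreement between attack_label and predict can be tallied with a cross-tabulation. The sketch below simply re-enters the 26 label/predict pairs listed above and counts them with `pd.crosstab`; the variable names are mine, and this is not code taken from the notebooks.

```python
import pandas as pd

# The 26 (attack_label, predict) pairs, in row order.
attack_label = ["n"] * 10 + ["b_o"] * 6 + ["t_d"] * 5 + ["ipS"] * 5
predict = [0] * 9 + [-1] * 4 + [0] + [1] * 12

# Rows are true labels, columns are DBSCAN cluster ids (-1 means noise).
table = pd.crosstab(
    pd.Series(attack_label, name="attack_label"),
    pd.Series(predict, name="predict"),
)
print(table)
```

Read row by row, the counts show that nine of the ten normal rows share cluster 0, while all teardrop and ipsweep rows end up together in cluster 1.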
The predictions come out as shown above. With three features the result is different again, so both the choice and the number of features matter.
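To see how the feature set changes the clustering, the sketch below runs scikit-learn's DBSCAN on the 26 sample rows twice, once with two columns and once with three. Everything specific here is an assumption for illustration: the columns (src_bytes and dst_bytes, then duration added), the standardization step, and the eps and min_samples values are not taken from the notebooks, so the labels it prints need not match the table above.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# (duration, src_bytes, dst_bytes) for the 26 sampled rows, in file order.
rows = np.array(
    [[0, 181, 5450], [0, 239, 486], [0, 235, 1337], [0, 219, 1337], [0, 217, 2032],
     [0, 217, 2032], [0, 212, 1940], [0, 159, 4087], [0, 210, 151], [0, 212, 786],  # normal
     [169, 1567, 2857], [179, 1559, 2855], [49, 2402, 3939],
     [290, 415, 70529], [31, 137, 1351], [0, 0, 5696]]  # buffer_overflow
    + [[0, 28, 0]] * 5   # teardrop
    + [[0, 18, 0]] * 5,  # ipsweep
    dtype=float,
)


def cluster(features, eps=0.5, min_samples=3):
    """Standardize the columns, then return DBSCAN labels (-1 marks noise)."""
    scaled = StandardScaler().fit_transform(features)
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(scaled)


pred2 = cluster(rows[:, 1:3])  # src_bytes, dst_bytes
pred3 = cluster(rows)          # duration, src_bytes, dst_bytes

print("2 features:", pred2.tolist())
print("3 features:", pred3.tolist())
```

Standardizing first keeps the large dst_bytes values from dominating the distance; without it, eps would have to be tuned to the raw byte scale.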