Two features: https://github.com/sangwonH/DBSCAN/blob/master/DBSCAN_Feature_02.ipynb

Three features: https://github.com/sangwonH/DBSCAN/blob/master/DBSCAN_Feature_03.ipynb


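The data comes from http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html; the 10 percent file used here has 494021 rows.

Each row has 42 fields; see http://kdd.ics.uci.edu/databases/kddcup99/kddcup.names for their names.

The last field says whether the record is normal traffic or an attack. There was no name given for that field, so it is called attack_label here.

Because the data is large, a few rows were picked for each attack_label value to build a small 26-row sample, and DBSCAN was run on it to print the predict values.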
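One way to build such a sample with pandas is to keep the first few rows of every attack_label group. This is only a sketch, not the code from the notebooks: the helper name sample_per_label, the value n=2 and the toy rows are assumptions made for illustration.

```python
import pandas as pd


def sample_per_label(df, label_col="attack_label", n=5):
    """Keep at most n rows for each distinct value of label_col (hypothetical helper)."""
    # groupby(...).head(n) returns the first n rows of every group,
    # in their original order.
    return df.groupby(label_col, sort=False).head(n).reset_index(drop=True)


# Toy stand-in for the full file; only two columns are shown.
toy = pd.DataFrame({
    "src_bytes": [181, 239, 235, 1567, 1559, 28, 28, 18],
    "attack_label": [
        "normal.", "normal.", "normal.",
        "buffer_overflow.", "buffer_overflow.",
        "teardrop.", "teardrop.",
        "ipsweep.",
    ],
})

small = sample_per_label(toy, n=2)
print(small["attack_label"].value_counts().to_dict())
```

With the real file, the same helper could be applied to the DataFrame returned by pd.read_csv and the result written out with to_csv.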


Field definitions:

import pandas as pd

# Load the sample file and name the 42 columns after kddcup.names;
# the last column (the label) is called attack_label.
data = pd.read_csv('/Users/inmobi/Downloads/DBSCAN/kddcup.data_10_percent_test02.csv')
data.columns = [
    'duration: continuous', 'protocol_type: symbolic', 'service: symbolic', 'flag: symbolic',
    'src_bytes: continuous', 'dst_bytes: continuous', 'land: symbolic', 'wrong_fragment: continuous',
    'urgent: continuous', 'hot: continuous', 'num_failed_logins: continuous', 'logged_in: symbolic',
    'num_compromised: continuous', 'root_shell: continuous', 'su_attempted: continuous', 'num_root: continuous',
    'num_file_creations: continuous', 'num_shells: continuous', 'num_access_files: continuous', 'num_outbound_cmds: continuous',
    'is_host_login: symbolic', 'is_guest_login: symbolic', 'count: continuous', 'srv_count: continuous',
    'serror_rate: continuous', 'srv_serror_rate: continuous', 'rerror_rate: continuous', 'srv_rerror_rate: continuous',
    'same_srv_rate: continuous', 'diff_srv_rate: continuous', 'srv_diff_host_rate: continuous', 'dst_host_count: continuous',
    'dst_host_srv_count: continuous', 'dst_host_same_srv_rate: continuous', 'dst_host_diff_srv_rate: continuous', 'dst_host_same_src_port_rate: continuous',
    'dst_host_srv_diff_host_rate: continuous', 'dst_host_serror_rate: continuous', 'dst_host_srv_serror_rate: continuous', 'dst_host_rerror_rate: continuous',
    'dst_host_srv_rerror_rate: continuous', 'attack_label',
]
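Each column name above carries its type after the colon. DBSCAN needs a distance between rows, so string-valued symbolic fields such as protocol_type cannot be fed to it directly. A small sketch of how the type suffix could be used to list the continuous fields (split_field and the shortened specs list are illustrative, not part of the notebooks):

```python
def split_field(spec):
    """Split 'name: type' into (name, type); the type is '' when absent."""
    name, _, kind = spec.partition(":")
    return name.strip(), kind.strip()


# A shortened copy of the column list, for illustration only.
specs = [
    "duration: continuous",
    "protocol_type: symbolic",
    "service: symbolic",
    "src_bytes: continuous",
    "attack_label",
]

fields = [split_field(s) for s in specs]
continuous = [name for name, kind in fields if kind == "continuous"]
print(continuous)
```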


File sizes and the actual contents of the sample file:

inmobis-MacBook-Pro:DBSCAN inmobi$ wc kddcup.data_10_percent.csv

  494021  494021 74889749 kddcup.data_10_percent.csv

inmobis-MacBook-Pro:DBSCAN inmobi$ wc kddcup.data_10_percent_test02.csv 

      26      68    3960 kddcup.data_10_percent_test02.csv

inmobis-MacBook-Pro:DBSCAN inmobi$ cat kddcup.data_10_percent_test02.csv 

duration: continuous,protocol_type: symbolic,service: symbolic,flag: symbolic,src_bytes: continuous,dst_bytes: continuous,land: symbolic,wrong_fragment: continuous,urgent: continuous,hot: continuous,num_failed_logins: continuous,logged_in: symbolic,num_compromised: continuous,root_shell: continuous,su_attempted: continuous,num_root: continuous,num_file_creations: continuous,num_shells: continuous,num_access_files: continuous,num_outbound_cmds: continuous,is_host_login: symbolic,is_guest_login: symbolic,count: continuous,srv_count: continuous,serror_rate: continuous,srv_serror_rate: continuous,rerror_rate: continuous,srv_rerror_rate: continuous,same_srv_rate: continuous,diff_srv_rate: continuous,srv_diff_host_rate: continuous,dst_host_count: continuous,dst_host_srv_count: continuous,dst_host_same_srv_rate: continuous,dst_host_diff_srv_rate: continuous,dst_host_same_src_port_rate: continuous,dst_host_srv_diff_host_rate: continuous,dst_host_serror_rate: continuous,dst_host_srv_serror_rate: continuous,dst_host_rerror_rate: continuous,dst_host_srv_rerror_rate: continuous,attack_label

0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0,0,0,0,1,0,0,9,9,1,0,0.11,0,0,0,0,0,normal.

0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0,0,0,0,1,0,0,19,19,1,0,0.05,0,0,0,0,0,normal.

0,tcp,http,SF,235,1337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0,0,0,0,1,0,0,29,29,1,0,0.03,0,0,0,0,0,normal.

0,tcp,http,SF,219,1337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,6,6,0,0,0,0,1,0,0,39,39,1,0,0.03,0,0,0,0,0,normal.

0,tcp,http,SF,217,2032,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,6,6,0,0,0,0,1,0,0,49,49,1,0,0.02,0,0,0,0,0,normal.

0,tcp,http,SF,217,2032,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,6,6,0,0,0,0,1,0,0,59,59,1,0,0.02,0,0,0,0,0,normal.

0,tcp,http,SF,212,1940,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,2,0,0,0,0,1,0,1,1,69,1,0,1,0.04,0,0,0,0,normal.

0,tcp,http,SF,159,4087,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,5,5,0,0,0,0,1,0,0,11,79,1,0,0.09,0.04,0,0,0,0,normal.

0,tcp,http,SF,210,151,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0,0,0,0,1,0,0,8,89,1,0,0.12,0.04,0,0,0,0,normal.

0,tcp,http,SF,212,786,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0,0,0,0,1,0,0,8,99,1,0,0.12,0.05,0,0,0,0,normal.

169,tcp,telnet,SF,1567,2857,0,0,0,3,0,1,4,1,0,0,1,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,1,0,1,0,0,0,0,0,buffer_overflow.

179,tcp,telnet,SF,1559,2855,0,0,0,3,0,1,4,1,0,0,1,0,0,0,0,0,1,1,0,0,0,0,1,0,0,2,2,1,0,0.5,0,0,0,0,0,buffer_overflow.

49,tcp,telnet,SF,2402,3939,0,0,0,4,0,1,2,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,2,1,0,1,1,0,0,0,0,buffer_overflow.

290,tcp,telnet,SF,415,70529,0,0,0,3,0,1,4,0,0,4,4,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,1,0,1,0,0,0,0,0,buffer_overflow.

31,tcp,telnet,SF,137,1351,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,1,0,0,2,2,1,0,0.5,0,0,0,0,0,buffer_overflow.

0,tcp,ftp_data,SF,0,5696,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,81,1,0,1,0.02,0,0,0,0,buffer_overflow.

0,udp,private,SF,28,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,29,29,0,0,0,0,1,0,0,255,96,0.38,0.01,0.38,0,0,0,0,0,teardrop.

0,udp,private,SF,28,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30,30,0,0,0,0,1,0,0,255,97,0.38,0.01,0.38,0,0,0,0,0,teardrop.

0,udp,private,SF,28,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,31,31,0,0,0,0,1,0,0,255,98,0.38,0.01,0.38,0,0,0,0,0,teardrop.

0,udp,private,SF,28,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,32,32,0,0,0,0,1,0,0,255,99,0.39,0.01,0.39,0,0,0,0,0,teardrop.

0,udp,private,SF,28,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,33,33,0,0,0,0,1,0,0,255,100,0.39,0.01,0.39,0,0,0,0,0,teardrop.

0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,11,1,0,1,1,0,0,0,0,ipsweep.

0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,21,1,0,1,1,0,0,0,0,ipsweep.

0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,31,1,0,1,1,0,0,0,0,ipsweep.

0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,41,1,0,1,1,0,0,0,0,ipsweep.

0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,51,1,0,1,1,0,0,0,0,ipsweep.


Comparing the predict values with attack_label shows how well the clustering matches the labels:

 

row          : 1 3 4 5 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
attack_label : n n n n n n n n n n b_o b_o b_o b_o b_o b_o t_d t_d t_d t_d t_d ipS ipS ipS ipS ipS
predict      : 0 0 0 0 0 0 -1 -1 -1 -1 1 1 1 1 1 1 1 1 1 1 1

n: normal, b_o: buffer_overflow, t_d: teardrop, ipS: ipsweep
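To put a number on how well cluster ids agree with labels, one option is to count (label, cluster) pairs and compute a purity score. The sketch below uses only the standard library; the two short lists are made up for illustration and are not the values from the table, and treating the noise id -1 as a group of its own is a choice made here, not something the notebooks do.

```python
from collections import Counter


def contingency(true_labels, predicted):
    """Count how often each (label, cluster id) pair occurs."""
    if len(true_labels) != len(predicted):
        raise ValueError("both lists must have the same length")
    return Counter(zip(true_labels, predicted))


def purity(true_labels, predicted):
    """Share of points that carry the majority label of their cluster.

    The DBSCAN noise id (-1) is treated like any other cluster id.
    """
    best = {}
    for (_label, cluster), count in contingency(true_labels, predicted).items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(true_labels)


# Made-up example, using the same abbreviations as the legend above.
labels = ["n", "n", "n", "b_o", "b_o", "t_d", "t_d", "ipS"]
clusters = [0, 0, 0, -1, -1, 1, 1, 1]

for pair, count in sorted(contingency(labels, clusters).items()):
    print(pair, count)
print(purity(labels, clusters))
```

A purity of 1.0 would mean every cluster contains a single label; merged attack types pull the score down.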


The predictions come out as shown above. With three features the result is different again, so which features are chosen, and how many, matters.
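A rough sketch of that comparison with scikit-learn follows. The post does not say which features or parameters were used, so everything here is an assumption: the columns src_bytes, dst_bytes and count, the standardization step, and eps=0.5 with min_samples=2. The eight rows are copied from the file listing above.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Columns: src_bytes, dst_bytes, count (an assumed feature choice).
rows = np.array([
    [181, 5450, 8],    # normal
    [239, 486, 8],     # normal
    [1567, 2857, 1],   # buffer_overflow
    [1559, 2855, 1],   # buffer_overflow
    [28, 0, 29],       # teardrop
    [28, 0, 30],       # teardrop
    [18, 0, 1],        # ipsweep
    [18, 0, 1],        # ipsweep
], dtype=float)


def cluster(features, eps=0.5, min_samples=2):
    """Standardize the features, then return DBSCAN cluster ids (-1 = noise)."""
    scaled = StandardScaler().fit_transform(features)
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(scaled)


labels_2 = cluster(rows[:, :2])  # src_bytes, dst_bytes
labels_3 = cluster(rows)         # src_bytes, dst_bytes, count

print("2 features:", labels_2.tolist())
print("3 features:", labels_3.tolist())
```

Rows that look identical on two features can be pulled apart once a third one is added, which is one way the cluster ids can change between the two runs.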

