Source Code Author Identification Method Combining Semantics and Statistical Features

Author(s): Xu Sun, Yutong Sun, Leilei Kong, Yong Han, Hui Ning

2021, pp. 1-13
Author(s): Anmin Zhou, Tianyi Huang, Cheng Huang, Dunhan Li, Chuangchuang Song

Python is a concise language that can be used to build lightweight tools or dynamic object-oriented applications. These attributes have made it attractive to numerous malware authors, who often embed malicious shell commands in Python scripts to carry out illegal operations. However, traditional static analysis methods are poorly suited to detecting this kind of attack because they focus on common features and fail to find those malicious commands. Dynamic analysis, on the other hand, is not optimal in this case because it is time-consuming and inefficient. In this paper, we propose PyComm, a machine-learning model for detecting malicious commands in Python scripts using multidimensional features: it considers both 12 statistical features and the string sequences of Python source code. Three comparison experiments are designed to evaluate the validity of the proposed method. Experimental results show that the presented model, based on those practical features and the random forest (RF) algorithm, achieves an accuracy of 0.955 with a recall of 0.943.
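As a rough illustration of what combining statistical features with string sequences could look like, the following minimal sketch parses a script with Python's standard `ast` module, collects its string literals, and derives a few counts from them. It is not the PyComm implementation: the abstract does not list the 12 statistical features, so the keyword list, the set of call names, and the four features shown here are hypothetical choices made for this example only.

```python
import ast

# Hypothetical keyword and call-name lists, chosen for illustration only;
# they are NOT the features used by PyComm.
SHELL_KEYWORDS = ("rm ", "wget ", "curl ", "chmod ", "/bin/sh", "bash ")
EXEC_CALLS = {"system", "popen", "Popen", "call", "run", "check_output", "exec", "eval"}


def extract_features(source: str) -> dict:
    """Return a few illustrative statistical features plus the string literals."""
    tree = ast.parse(source)
    strings = []
    exec_calls = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            # Every string literal in the script (docstrings included).
            strings.append(node.value)
        elif isinstance(node, ast.Call):
            func = node.func
            # os.system(...) is an Attribute; eval(...) is a plain Name.
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
            if name in EXEC_CALLS:
                exec_calls += 1
    keyword_hits = sum(s.count(k) for s in strings for k in SHELL_KEYWORDS)
    return {
        "num_strings": len(strings),
        "max_string_len": max((len(s) for s in strings), default=0),
        "exec_call_count": exec_calls,
        "shell_keyword_hits": keyword_hits,
        "strings": strings,
    }
```

In a pipeline of the kind the abstract describes, the numeric values could form one part of a feature vector and the collected strings could be vectorized separately before both are passed to a classifier such as scikit-learn's `RandomForestClassifier`; that wiring is likewise an assumption, not something the abstract specifies.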


PeerJ, 2021, Vol 9, pp. e10813
Author(s): Qianfei Huang, Wenyang Zhou, Fei Guo, Lei Xu, Lichao Zhang

With the accumulation of data on 6mA modification sites, an increasing number of scholars have begun to focus on the identification of 6mA sites. Despite the recognized importance of 6mA sites, methods for their identification remain lacking, and most existing methods are aimed at individual species. In the present study, we aimed to develop an identification method suitable for multiple species. Based on previous research, we propose a method for 6mA site recognition. Our experiments show that the proposed 6mA-Pred method is effective for identifying 6mA sites in genes from taxa such as rice, Mus musculus, and human. A series of experimental results indicates that 6mA-Pred performs well. We provide the source code used in the study, which can be obtained from http://39.100.246.211:5004/6mA_Pred/.

