Detecting Malicious Applications on Android Based on Static Analysis Using Machine Learning Algorithms

Attacks on users through mobile devices in general, and devices running the Android operating system in particular, have caused many serious consequences. Research [1] lists the vulnerabilities found in the Android operating system that make it a preferred target for cyberattackers. Report [2] gives statistics on the number of cyberattacks carried out via mobile devices, particularly those running Android, and points out the risk of information leakage from applications that users download from Android app stores. To prevent attacks and the distribution of malware through Android apps, it is therefore necessary to study methods for detecting malicious code at the moment users download applications to their devices. Recent approaches typically rely on static and dynamic analysis to look for unusual behavior in applications. In this paper, we propose using static analysis techniques to build a behavioral profile of malicious code in an application, and machine learning algorithms to detect malicious behavior.
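
As an illustration of this kind of pipeline, the minimal sketch below classifies apps from permission-based behavior vectors. It assumes the permission lists have already been extracted from the APKs by an upstream static analysis step (for example with a tool such as aapt or androguard); the sample data, labels, and variable names are hypothetical.

```python
# Minimal sketch: classify apps as malicious/benign from statically extracted permissions.
# Assumes `apps` (permission lists) and `labels` (0 = benign, 1 = malicious) come from
# an upstream APK static analysis step; the values below are toy placeholders.
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

apps = [
    ["android.permission.INTERNET", "android.permission.SEND_SMS"],
    ["android.permission.INTERNET"],
    ["android.permission.READ_CONTACTS", "android.permission.SEND_SMS"],
    ["android.permission.CAMERA"],
]
labels = [1, 0, 1, 0]

# One-hot encode the permission sets into a binary behavior matrix.
mlb = MultiLabelBinarizer()
X = mlb.fit_transform(apps)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, random_state=0, stratify=labels)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), zero_division=0))
```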

Author(s):  
Abdul Karim ◽  
Azhari Azhari ◽  
Samir Brahim Belhaouri ◽  
Ali Adil Qureshi

It is quite apparent that almost everybody around the world uses Android apps; half of the population of this planet engages with messaging, social media, gaming, and browsers. This online marketplace provides free and paid access to users, and on the Google Play store users are encouraged to download countless applications belonging to predefined categories. In this research paper, we scraped thousands of user reviews and app ratings: reviews of 148 apps from 14 categories, amounting to 506,259 reviews collected from the Google Play store. We then analyzed the semantics of the reviews to determine whether each review is positive, negative, or neutral. We evaluated the results using different machine learning algorithms, namely Naïve Bayes, Random Forest, and Logistic Regression. We computed Term Frequency (TF) and Inverse Document Frequency (IDF) features and compared the statistical results of these algorithms on accuracy, precision, recall, and F1 score, visualizing the results as bar charts. The analysis of each algorithm is performed in turn and the results are compared. We found that Logistic Regression is the best algorithm for review analysis of the Google Play store, achieving the highest precision, accuracy, recall, and F1 on this dataset both after data collection and after preprocessing.
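
A minimal sketch of this kind of TF-IDF plus Logistic Regression setup with scikit-learn is shown below; the example reviews and sentiment labels are placeholders for the scraped Google Play data.

```python
# Minimal sketch: TF-IDF features + Logistic Regression for review sentiment.
# `reviews` and `sentiments` stand in for the scraped Google Play dataset.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

reviews = ["great app, works perfectly", "keeps crashing after update",
           "average, nothing special", "love the new interface",
           "ads everywhere, uninstalled", "does what it says"]
sentiments = ["positive", "negative", "neutral", "positive", "negative", "neutral"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),  # TF and IDF weighting
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    reviews, sentiments, test_size=0.5, random_state=42, stratify=sentiments)
pipeline.fit(X_train, y_train)
print(classification_report(y_test, pipeline.predict(X_test), zero_division=0))
```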


Author(s):  
Christopher Kelley ◽  
Janelle Mason ◽  
Albert Esterline ◽  
Kaushik Roy

Movement data can be collected and used to add new security features and functionality to users’ mobile devices. Measuring a user’s movement with mobile devices enables behavioral biometrics. This could introduce a shift in how we secure mobile devices: instead of physical attributes such as fingerprints or faces, we use behavioral attributes such as the way we walk or perform some personal activity. In this paper, an empirical evaluation of different classification techniques is conducted on user movement data. The datasets used in this evaluation contain accelerometer data collected during various experiments from several mobile devices, including smartphones, smartwatches, and other accelerometer sensors. We aggregated the user movement data and provided it as input to five traditional machine learning algorithms. Their classification performance was compared with a deep learning technique, the Long Short-Term Memory Recurrent Neural Network (LSTM-RNN). The LSTM-RNN achieved a highest accuracy of 89%, compared with 97% from a traditional machine learning algorithm, the k-Nearest Neighbor (k-NN) algorithm, on wrist-worn accelerometer data, showing the LSTM to be the less viable option in this setting.
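
The sketch below illustrates one common way to feed accelerometer streams to a k-NN classifier: window the raw signal, summarize each window with simple per-axis statistics, and classify the user. The data, window size, and labels are synthetic placeholders, not the datasets used in the study.

```python
# Minimal sketch: window raw accelerometer streams into statistical features,
# then identify the user with k-NN. Data, labels and window size are hypothetical.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def window_features(samples, window=50):
    """Split an (N, 3) accelerometer stream into windows of per-axis mean/std/min/max."""
    feats = []
    for start in range(0, len(samples) - window + 1, window):
        w = samples[start:start + window]
        feats.append(np.concatenate([w.mean(0), w.std(0), w.min(0), w.max(0)]))
    return np.array(feats)

rng = np.random.default_rng(0)
# Two simulated users with slightly different movement statistics (stand-ins for real data).
user_a = window_features(rng.normal(0.0, 1.0, size=(1000, 3)))
user_b = window_features(rng.normal(0.3, 1.2, size=(1000, 3)))
X = np.vstack([user_a, user_b])
y = np.array([0] * len(user_a) + [1] * len(user_b))

knn = KNeighborsClassifier(n_neighbors=5)
print("CV accuracy:", cross_val_score(knn, X, y, cv=5).mean())
```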


2020 ◽  
pp. 1-11
Author(s):  
Jie Liu ◽  
Lin Lin ◽  
Xiufang Liang

The online English teaching system places certain requirements on its intelligent scoring component, and the most difficult stage of intelligent scoring in an English test is scoring the English composition with an intelligent model. To improve the intelligence of English composition scoring, this study builds on machine learning algorithms and combines them with intelligent image recognition technology, proposing an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, to verify whether the proposed algorithm model meets the requirements, that is, to verify its feasibility, the performance of the model is analyzed through designed experiments, and the basic conditions for composition scoring are input into the model as constraints. The results show that the proposed algorithm has practical value and can be applied to English assessment and online homework evaluation systems.
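
A minimal sketch of MSER-based character candidate extraction with OpenCV is shown below; the image path and geometric thresholds are hypothetical, and the CNN stage that filters pseudo-character regions is only stubbed in a comment.

```python
# Minimal sketch of MSER-based character candidate extraction with OpenCV.
# The image path and thresholds are hypothetical; the CNN filter is only stubbed.
import cv2

img = cv2.imread("composition_scan.png", cv2.IMREAD_GRAYSCALE)  # hypothetical scanned page

mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(img)

candidates = []
for (x, y, w, h) in bboxes:
    aspect = w / float(h)
    # Simple geometric filtering of obvious non-character regions.
    if 0.1 < aspect < 10 and 8 < h < 200:
        candidates.append(img[y:y + h, x:x + w])

# In the full pipeline, each candidate crop would be resized and passed to a trained
# CNN that rejects pseudo-character regions, e.g. (hypothetical model and helper):
#   keep = [c for c in candidates if cnn_model.predict(preprocess(c)) > 0.5]
print(f"{len(candidates)} candidate character regions retained")
```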


Symmetry ◽  
2020 ◽  
Vol 13 (1) ◽  
pp. 35
Author(s):  
Sungjoong Kim ◽  
Seongkyu Yeom ◽  
Haengrok Oh ◽  
Dongil Shin ◽  
Dongkyoo Shin

The development of information and communication technology (ICT) is making daily life more convenient by allowing access to information anytime and anywhere and by improving the efficiency of organizations. Unfortunately, malicious code is also proliferating and becoming increasingly complex and sophisticated. Even novices can now easily create it using hacking tools, which is causing it to increase and spread exponentially, and it has become difficult for humans to respond to such a surge. As a result, many studies have pursued methods to automatically analyze and classify malicious code. There are currently two analysis approaches: dynamic analysis, which executes the program directly and examines the execution result, and static analysis, which analyzes the program without executing it. This paper proposes a static analysis automation technique for malicious code that uses machine learning. The classification system is built around methods for classifying malicious code using the portable executable (PE) structure and achieves 98.77% accuracy when classifying normal and malicious files. The proposed system can be used to classify various types of malware from PE files to shell code.
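
As a rough illustration of PE-structure-based static analysis, the sketch below extracts a few header fields with the pefile library and feeds them to a classifier. The file paths, labels, and feature choice are illustrative; the published system's exact feature set is not reproduced here.

```python
# Minimal sketch: extract simple PE-structure features with pefile and classify them.
# Paths and labels are hypothetical; the feature set is illustrative only.
import pefile
from sklearn.ensemble import RandomForestClassifier

def pe_features(path):
    pe = pefile.PE(path, fast_load=True)
    return [
        pe.FILE_HEADER.NumberOfSections,
        pe.OPTIONAL_HEADER.SizeOfImage,
        pe.OPTIONAL_HEADER.AddressOfEntryPoint,
        pe.OPTIONAL_HEADER.DllCharacteristics,
    ]

# Hypothetical labelled sample lists.
benign_paths = ["samples/benign/a.exe", "samples/benign/b.exe"]
malicious_paths = ["samples/malware/c.exe", "samples/malware/d.exe"]

X = [pe_features(p) for p in benign_paths + malicious_paths]
y = [0] * len(benign_paths) + [1] * len(malicious_paths)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print("Predicted label for a new file:", clf.predict([pe_features("unknown.exe")])[0])
```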


2021 ◽  
pp. 1-17
Author(s):  
Ahmed Al-Tarawneh ◽  
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms for sharing and posting ideas. Hackers and anonymous attackers use such platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers’ tweets with machine learning techniques. Previous approaches for detecting infected tweets rely on human effort or plain text analysis and are therefore limited in capturing what is hidden between the lines of a tweet. The main aim of this research is to improve hacker detection on the Twitter platform by combining complex network techniques with adapted machine learning algorithms. This work presents a methodology that collects a list of users, together with their followers, who share posts with similar interests within a hacker community on Twitter. The list is built from a set of suggested keywords that are the terms commonly used by hackers in their tweets. A complex network is then generated over all users to find relations among them in terms of network centrality, closeness, and betweenness. From these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in this dataset are gathered, classified into positive and negative classes, and fed into a machine learning process applying different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hacker community. Correctly classified instances were measured using the average accuracy of the k-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, which reached about 90% for cross-validation and 88% for a percentage split. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a risk to institutions and individuals, providing early warning of possible attacks.
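
The influential-user extraction step can be illustrated with a small follower graph and the standard networkx centrality measures; the edges below are hypothetical stand-ins for keyword-filtered Twitter data, and the combined influence score is a simple unweighted sum chosen for illustration.

```python
# Minimal sketch: build a follower graph and rank users by centrality measures.
# Edges are hypothetical placeholders for keyword-filtered Twitter follower data.
import networkx as nx

# Directed edges: (follower, followed).
edges = [("u1", "u2"), ("u3", "u2"), ("u2", "u4"), ("u4", "u1"), ("u3", "u4")]
G = nx.DiGraph(edges)

degree = nx.degree_centrality(G)
closeness = nx.closeness_centrality(G)
betweenness = nx.betweenness_centrality(G)

# Combine the three measures into a single influence score (simple unweighted sum).
influence = {u: degree[u] + closeness[u] + betweenness[u] for u in G.nodes}
top_users = sorted(influence, key=influence.get, reverse=True)
print("Most influential users:", top_users[:3])
```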


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 656
Author(s):  
Xavier Larriva-Novo ◽  
Víctor A. Villagrá ◽  
Mario Vega-Barbas ◽  
Diego Rivera ◽  
Mario Sanz Rodrigo

Security in IoT networks is currently mandatory due to the large amount of data that has to be handled. These systems are vulnerable to many cybersecurity attacks, which are increasing in number and sophistication. For this reason, new intrusion detection techniques have to be developed that are as accurate as possible for these scenarios. Intrusion detection systems based on machine learning algorithms have already shown high accuracy. This research proposes the study and evaluation of several preprocessing techniques, based on traffic categorization, for a machine learning neural network algorithm. The evaluation uses two benchmark datasets, UGR16 and UNSW-NB15, as well as one of the most widely used datasets, KDD99. The preprocessing techniques were evaluated with scaling and normalization functions. All of these preprocessing models were applied to different sets of characteristics based on a categorization composed of four groups of features: basic connection features, content characteristics, statistical characteristics and, finally, a group composed of traffic-based features and connection direction-based traffic characteristics. The objective of this research is to evaluate this categorization by using various data preprocessing techniques to obtain the most accurate model. Our proposal shows that, by applying the categorization of network traffic and several preprocessing techniques, accuracy can be enhanced by up to 45%. Preprocessing a specific group of characteristics yields greater accuracy, allowing the machine learning algorithm to correctly classify these parameters related to possible attacks.
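
A minimal sketch of this kind of comparison is shown below: the same feature group is scaled with two different preprocessing functions before training a neural-network classifier. The feature columns, labels, and dataset are synthetic placeholders rather than the UGR16 / UNSW-NB15 / KDD99 data.

```python
# Minimal sketch: compare scaling/normalization choices on one feature group
# before training a neural-network classifier. Data and labels are synthetic.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))             # stand-in for "basic connection" features
y = (X[:, 0] + X[:, 3] > 0).astype(int)   # synthetic attack/normal labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for name, scaler in [("standard", StandardScaler()), ("minmax", MinMaxScaler())]:
    Xtr = scaler.fit_transform(X_train)
    Xte = scaler.transform(X_test)
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(Xtr, y_train)
    print(name, "accuracy:", round(accuracy_score(y_test, clf.predict(Xte)), 3))
```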


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 617
Author(s):  
Umer Saeed ◽  
Young-Doo Lee ◽  
Sana Ullah Jan ◽  
Insoo Koo

Sensors are a key component of cyber-physical systems, which makes them susceptible to failures due to complex environments, low-quality production, and aging. When defective, sensors either stop communicating or convey incorrect information. These unsteady situations threaten the safety, economy, and reliability of a system. The objective of this study is to construct a lightweight machine learning-based fault detection and diagnostic system within the limited energy resources, memory, and computation of a Wireless Sensor Network (WSN). In this paper, a Context-Aware Fault Diagnostic (CAFD) scheme is proposed based on an ensemble learning algorithm, Extra-Trees. To evaluate the performance of the proposed scheme, a realistic WSN scenario composed of humidity and temperature sensor observations is replicated with extremely low-intensity faults. Six commonly occurring types of sensor fault are considered: drift, hard-over/bias, spike, erratic/precision degradation, stuck, and data-loss. The proposed CAFD scheme is shown to accurately detect and diagnose low-intensity sensor faults in a timely manner. Moreover, the efficiency of the Extra-Trees algorithm in terms of diagnostic accuracy, F1-score, ROC-AUC, and training time is demonstrated by comparison with cutting-edge machine learning algorithms: a Support Vector Machine and a Neural Network.
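
The sketch below shows the basic shape of an Extra-Trees fault-type classifier in scikit-learn; the sensor readings are synthetic, with each fault type mimicked by a different shift of the signal, and the labels are placeholders for the study's fault injection scenario.

```python
# Minimal sketch: Extra-Trees classification of sensor fault types from windowed
# humidity/temperature readings. The synthetic data and fault labels are placeholders.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
fault_types = ["normal", "drift", "bias", "spike", "stuck", "data-loss"]

X, y = [], []
for offset, label in enumerate(fault_types):
    # Each fault type is mimicked by a different shift of the sensor signal.
    X.append(rng.normal(loc=offset * 0.5, scale=1.0, size=(200, 6)))
    y += [label] * 200

X = np.vstack(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

clf = ExtraTreesClassifier(n_estimators=200, random_state=1).fit(X_train, y_train)
print("Macro F1:", round(f1_score(y_test, clf.predict(X_test), average="macro"), 3))
```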


Metabolites ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 363
Author(s):  
Louise Cottle ◽  
Ian Gilroy ◽  
Kylie Deng ◽  
Thomas Loudovaris ◽  
Helen E. Thomas ◽  
...  

Pancreatic β cells secrete the hormone insulin into the bloodstream and are critical in the control of blood glucose concentrations. β cells are clustered in the micro-organs of the islets of Langerhans, which have a rich capillary network. Recent work has highlighted the intimate spatial connections between β cells and these capillaries, which lead to the targeting of insulin secretion to the region where the β cells contact the capillary basement membrane. In addition, β cells orientate with respect to the capillary contact point and many proteins are differentially distributed at the capillary interface compared with the rest of the cell. Here, we set out to develop an automated image analysis approach to identify individual β cells within intact islets and to determine if the distribution of insulin across the cells was polarised. Our results show that a U-Net machine learning algorithm correctly identified β cells and their orientation with respect to the capillaries. Using this information, we then quantified insulin distribution across the β cells to show enrichment at the capillary interface. We conclude that machine learning is a useful analytical tool to interrogate large image datasets and analyse sub-cellular organisation.
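
For readers unfamiliar with the segmentation component, the sketch below builds a very small U-Net-style encoder-decoder in Keras. The layer widths, input shape, and training data are illustrative placeholders, not the published model.

```python
# Minimal sketch of a small U-Net-style segmentation network in Keras.
# Layer widths, input shape and training data are illustrative, not the published model.
from tensorflow.keras import layers, Model

def tiny_unet(input_shape=(128, 128, 1)):
    inp = layers.Input(shape=input_shape)
    c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D()(c2)
    b = layers.Conv2D(64, 3, padding="same", activation="relu")(p2)
    u2 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(b)
    c3 = layers.Conv2D(32, 3, padding="same", activation="relu")(layers.concatenate([u2, c2]))
    u1 = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(c3)
    c4 = layers.Conv2D(16, 3, padding="same", activation="relu")(layers.concatenate([u1, c1]))
    out = layers.Conv2D(1, 1, activation="sigmoid")(c4)  # per-pixel cell mask probability
    return Model(inp, out)

model = tiny_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(islet_images, cell_masks, ...)  # hypothetical labelled islet image data
```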


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Joël L. Lavanchy ◽  
Joel Zindel ◽  
Kadir Kirtac ◽  
Isabell Twick ◽  
Enes Hosgor ◽  
...  

Surgical skills are associated with clinical outcomes. To improve surgical skills and thereby reduce adverse outcomes, continuous surgical training and feedback are required. Currently, assessment of surgical skills is a manual and time-consuming process that is prone to subjective interpretation. This study aims to automate surgical skill assessment in laparoscopic cholecystectomy videos using machine learning algorithms. To address this, a three-stage machine learning method is proposed: first, a Convolutional Neural Network was trained to identify and localize surgical instruments. Second, motion features were extracted from the detected instrument localizations over time. Third, a linear regression model was trained on the extracted motion features to predict surgical skill. This three-stage modeling approach achieved an accuracy of 87 ± 0.2% in distinguishing good from poor surgical skill. While the technique cannot yet reliably quantify the degree of surgical skill, it represents an important advance towards automated surgical skill assessment.
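
The second and third stages can be sketched as follows: per-frame instrument positions (as a detector would produce) are summarized into motion features, and a linear model is fitted to good/poor labels. The simulated tracks, features, and threshold below are assumptions for illustration, not the study's exact feature set.

```python
# Minimal sketch: turn per-frame instrument positions (from a detector) into motion
# features and fit a linear model for good (1) vs poor (0) skill. Data are placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression

def motion_features(track):
    """track: (frames, 2) array of instrument centre coordinates over time."""
    steps = np.diff(track, axis=0)
    speed = np.linalg.norm(steps, axis=1)
    return [speed.sum(),                       # total path length
            speed.mean(),                      # average speed
            speed.std(),                       # smoothness proxy
            np.abs(np.diff(speed)).mean()]     # mean change in speed

rng = np.random.default_rng(0)
# Simulated tracks: "good" surgeons move less and more smoothly than "poor" ones.
tracks = [rng.normal(0, s, size=(300, 2)).cumsum(axis=0) for s in [1, 1, 1, 3, 3, 3]]
labels = [1, 1, 1, 0, 0, 0]

X = np.array([motion_features(t) for t in tracks])
model = LinearRegression().fit(X, labels)
pred = (model.predict(X) > 0.5).astype(int)   # threshold the regression output
print("Training accuracy:", (pred == np.array(labels)).mean())
```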


Water ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 1217
Author(s):  
Nicolò Bellin ◽  
Erica Racchetti ◽  
Catia Maurone ◽  
Marco Bartoli ◽  
Valeria Rossi

Machine Learning (ML) is an increasingly accessible discipline within computer science that develops dynamic algorithms capable of data-driven decisions, and its use in ecology is growing. Fuzzy sets are suitable descriptors of ecological communities compared with other standard algorithms, because they allow decisions that include elements of uncertainty and vagueness to be described. However, fuzzy sets are rarely applied in ecology. In this work, an unsupervised machine learning algorithm, fuzzy c-means, and association rule mining were applied to assess the factors influencing the assemblage composition and distribution patterns of 12 zooplankton taxa in 24 shallow ponds in northern Italy. The fuzzy c-means algorithm was implemented to classify the ponds in terms of the taxa they support and to identify the influence of chemical and physical environmental features on the assemblage patterns. Data retrieved during 2014 and 2015 were compared, taking into account that late spring and summer air temperatures in 2014 were much lower than historical records, whereas 2015 mean monthly air temperatures were much warmer than historical averages. In both years, fuzzy c-means shows a strong clustering of ponds into two groups, contrasting sites characterized by different physico-chemical and biological features. Climatic anomalies affecting the temperature regime, together with the main water supply to shallow ponds (e.g., surface runoff vs. groundwater), represent disturbance factors producing large interannual differences in the chemistry, biology and short-term dynamics of small aquatic ecosystems. Unsupervised machine learning algorithms and fuzzy sets may help in capturing such apparently erratic differences.
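
The sketch below writes out the fuzzy c-means iteration directly so the membership and centroid updates are visible; the pond feature matrix is synthetic and the cluster count of two simply mirrors the grouping reported above.

```python
# Minimal sketch of fuzzy c-means clustering, implemented directly so the membership
# update is visible. The pond feature matrix here is synthetic.
import numpy as np

def fuzzy_c_means(X, n_clusters=2, m=2.0, max_iter=100, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)             # memberships of each point sum to 1
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U

rng = np.random.default_rng(42)
# 24 "ponds" described by 4 environmental variables (synthetic stand-in data).
ponds = np.vstack([rng.normal(0, 1, (12, 4)), rng.normal(3, 1, (12, 4))])
centers, memberships = fuzzy_c_means(ponds, n_clusters=2)
print(np.round(memberships[:5], 2))  # fuzzy membership of the first five ponds
```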

