Hybrid Representation to Locate Vulnerable Lines of Code

2022 ◽  
Vol 10 (1) ◽  
pp. 0-0

Locating vulnerable lines of code in large software systems requires substantial effort from human experts, which explains the high budget and time costs of correcting vulnerabilities. To minimize these costs, automatic vulnerability prediction solutions have been proposed. Existing machine learning (ML)-based solutions face difficulties in predicting vulnerabilities at a coarse granularity and in defining suitable code features, which limits their effectiveness. To address these limitations, in the present work the authors propose an improved ML-based approach that uses slice-based code representation and the TF-IDF technique to automatically extract effective features. The results show that combining these two techniques with ML allows building effective vulnerability prediction models (VPMs) that locate vulnerabilities at a finer granularity and with excellent performance (high precision (>98%), low FNR (<2%), and low FPR (<3%)), outperforming software-metrics-based approaches and matching the best-performing recent deep learning-based approaches.
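The feature-extraction step described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: toy code "slices" are flattened to token strings, vectorized with TF-IDF, and fed to an off-the-shelf classifier; the slices and labels are invented for demonstration.

```python
# Hypothetical sketch: TF-IDF features over program slices feeding a
# classic ML classifier, in the spirit of the approach described above.
# The toy slices and labels below are illustrative, not real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each sample is a code slice flattened to a whitespace-separated token string.
slices = [
    "strcpy ( buf , input )",       # unbounded copy -> labeled vulnerable
    "strncpy ( buf , input , n )",  # bounded copy   -> labeled safe
    "gets ( line )",                # unbounded read -> labeled vulnerable
    "fgets ( line , n , stdin )",   # bounded read   -> labeled safe
]
labels = [1, 0, 1, 0]  # 1 = vulnerable line, 0 = safe

model = make_pipeline(
    TfidfVectorizer(token_pattern=r"\S+"),  # treat code tokens as "words"
    LogisticRegression(),
)
model.fit(slices, labels)
print(model.predict(["strcpy ( dst , src )"]))
```

In a realistic pipeline the slices would come from a program slicer over real codebases, and the classifier choice and TF-IDF settings would be tuned on held-out data.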

Author(s):  
Feidu Akmel ◽  
Ermiyas Birihanu ◽  
Bahir Siraj

Software systems are software products or applications that support business domains such as manufacturing, aviation, health care, insurance, and so on. Software quality is a means of measuring how software is designed and how well it conforms to that design. Some of the variables examined for software quality are correctness, product quality, scalability, completeness, and absence of bugs. However, the quality standard used by one organization differs from that of another, so it is better to apply software metrics to measure software quality. Attributes gathered from source code through software metrics can serve as input for a software defect predictor. Software defects are errors introduced by software developers and stakeholders. Finally, this study surveys the application of machine learning to software defect data gathered from previous research works.
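The metric attributes mentioned above can be illustrated with a minimal sketch. The function below is hypothetical (not from the study): it derives a few simple line-count-style metrics from source text; real metric suites add measures such as cyclomatic complexity and coupling.

```python
# Hypothetical sketch: deriving simple source-code metrics (the kind of
# attributes mentioned above) that could feed a defect predictor.
def basic_metrics(source: str) -> dict:
    lines = source.splitlines()
    code = [l for l in lines if l.strip() and not l.strip().startswith("#")]
    branches = sum(l.strip().startswith(("if ", "elif ", "for ", "while "))
                   for l in code)
    return {
        "loc": len(lines),         # total lines
        "sloc": len(code),         # non-blank, non-comment lines
        "branch_count": branches,  # crude complexity proxy
    }

sample = """# demo module
def f(x):
    if x > 0:
        return x
    return -x
"""
print(basic_metrics(sample))  # {'loc': 5, 'sloc': 4, 'branch_count': 1}
```

A defect predictor would compute such vectors per module and pair them with historical defect labels for supervised training.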


2021 ◽  
Author(s):  
R. Priyadarshini ◽  
K. Anuratha ◽  
N. Rajendran ◽  
S. Sujeetha

An anomaly is an uncommon case and represents an outlier, i.e., a nonconforming case. According to the Oxford Dictionary of Mathematics, an anomaly is an unusual and erroneous observation that does not follow the general pattern of the drawn population. Anomaly detection is a data-mining process that aims at finding data points or patterns that do not conform to the overall pattern of the data. The behavior and impact of anomalies have been studied in areas such as network security, finance, healthcare, and earth sciences. The proper detection and prediction of anomalies are of great importance, as these rare observations may carry significant information. In today's financial world, enterprise data is digitized and stored in the cloud, so there is a significant need to detect anomalies in financial data, which will help enterprises cope with the huge volume of auditing. Corporations and enterprises conduct audits on large numbers of ledgers and journal entries, and the monitoring of those audits is mostly performed manually. Proper anomaly detection is needed for the high-dimensional data published in ledger format for auditing purposes. This work aims at analyzing and predicting unusual fraudulent financial transactions by employing several machine learning and deep learning methods. If an anomaly such as manipulation or tampering of data is detected, such anomalies and errors can be identified and marked with proper proof with the help of machine-learning-based algorithms. The accuracy of prediction is increased by 7% by implementing the proposed prediction models.
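As an illustration of the kind of unsupervised anomaly detection described above (not the paper's exact method), the sketch below flags unusual journal-entry amounts with an Isolation Forest on a synthetic ledger; all amounts are invented.

```python
# Illustrative sketch: flagging unusual journal-entry amounts with an
# Isolation Forest, a common unsupervised detector for audit data.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic ledger: 200 routine entries around $500, plus 3 gross outliers.
routine = rng.normal(loc=500.0, scale=50.0, size=(200, 1))
suspicious = np.array([[5_000.0], [-1_200.0], [9_999.0]])
entries = np.vstack([routine, suspicious])

detector = IsolationForest(contamination=0.02, random_state=0)
flags = detector.fit_predict(entries)  # -1 = anomaly, 1 = normal
print(np.where(flags == -1)[0])        # indices flagged for manual audit
```

In practice each entry would be a high-dimensional vector (amount, account, timestamp features, counterparty), and flagged entries would be routed to auditors with supporting evidence.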


2021 ◽  
Author(s):  
Aditya Nagori ◽  
Anushtha Kalia ◽  
Arjun Sharma ◽  
Pradeep Singh ◽  
Harsh Bandhey ◽  
...  

Shock is a major killer in the ICU, and machine learning based early predictions can potentially save lives. Generalization across age and geographical context is an unaddressed challenge. In this retrospective observational study, we built real-time shock prediction models generalized across age groups and continents. More than 1.5 million patient-hours of novel data from a pediatric ICU in New Delhi and 5 million patient-hours from the adult ICU MIMIC database were used to build models. We achieved model generalization through a novel fractal deep-learning approach and predicted shock up to 12 hours in advance. Our deep learning models showed a drop in area under the receiver operating characteristic curve (AUROC) from 78% (95% CI, 73-83) on MIMIC data to 66% (95% CI, 54-78) on New Delhi data, outperforming standard machine learning by nearly a 10% gap. Therefore, better representations and deep learning can partly address the generalizability gap of ICU prediction models trained across geographies. Our data and algorithms are publicly available as a pre-configured Docker environment at https://github.com/SAFE-ICU/ShoQPred.


Author(s):  
Shradha Verma ◽  
Anuradha Chug ◽  
Amit Prakash Singh ◽  
Shubham Sharma ◽  
Puranjay Rajvanshi

With the increasing computational power, areas such as machine learning, image processing, deep learning, etc. have been extensively applied in agriculture. This chapter investigates the applications of the said areas and various prediction models in plant pathology for accurate classification, identification, and quantification of plant diseases. The authors aim to automate the plant disease identification process. To accomplish this objective, CNNs have been utilized for image classification. Research shows that deep learning architectures outperform other machine learning tools significantly. To this effect, the authors have implemented and trained five CNN models, namely Inception ResNet v2, VGG16, VGG19, ResNet50, and Xception, on the PlantVillage dataset of tomato leaf images. The authors analyzed 18,160 tomato leaf images spread across 10 class labels. After comparing their performance measures, ResNet50 proved to be the most accurate prediction tool. It was employed to create a mobile application to classify and identify tomato plant diseases successfully.
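The model-selection workflow above (train several candidates, compare a held-out score, keep the best) can be sketched generically. This is a stand-in illustration, not the chapter's code: classic scikit-learn classifiers on the built-in digits data replace the five CNN architectures and the tomato leaf images.

```python
# Hypothetical model-selection sketch mirroring the workflow described
# above: fit several candidate models on the same data, compare held-out
# accuracy, and keep the best-scoring one.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=2000),
    "knn": KNeighborsClassifier(),
    "tree": DecisionTreeClassifier(random_state=0),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
          for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

For the image task in the chapter, the candidates would be pretrained CNN backbones fine-tuned on the leaf images, compared on the same held-out performance measures.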


2021 ◽  
Author(s):  
KOUSHIK DEB

Character Computing consists of not only personality trait recognition, but also correlation among these traits. Tons of research has been conducted in this area. Various factors like demographics, sentiment, gender, LIWC, and others have been taken into account in order to understand human personality. In this paper, we have concentrated on the factors that could be obtained from available data using Natural Language Processing. It has been observed that the most successful personality trait prediction models are highly dependent on NLP techniques. Researchers across the globe have used different kinds of machine learning and deep learning techniques to automate this process. Different combinations of factors lead the research in different directions. We have presented a comparative study among those experiments and tried to derive a direction for future development.


Author(s):  
Rowland W. Pettit ◽  
Robert Fullem ◽  
Chao Cheng ◽  
Christopher I. Amos

AI is a broad concept, grouping initiatives that use a computer to perform tasks that would usually require a human to complete. AI methods are well suited to predict clinical outcomes. In practice, AI methods can be thought of as functions that learn the outcomes accompanying standardized input data to produce accurate outcome predictions when trialed with new data. Current methods for cleaning, creating, accessing, extracting, augmenting, and representing data for training AI clinical prediction models are well defined. The use of AI to predict clinical outcomes is a dynamic and rapidly evolving arena, with new methods and applications emerging. Extraction or accession of electronic health care records and combining these with patient genetic data is an area of present attention, with tremendous potential for future growth. Machine learning approaches, including the decision-tree methods Random Forest and XGBoost, and deep learning techniques, including deep multi-layer and recurrent neural networks, afford unique capabilities to accurately create predictions from high-dimensional, multimodal data. Furthermore, AI methods are increasing our ability to accurately predict clinical outcomes that previously were difficult to model, including time-dependent and multi-class outcomes. Barriers to robust AI-based clinical outcome model deployment include changing AI product development interfaces, the specificity of regulation requirements, and limitations in ensuring model interpretability, generalizability, and adaptability over time.
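A minimal sketch of the tree-ensemble clinical prediction described above (Random Forest shown; XGBoost is used analogously). The features, outcome, and their relationship are entirely synthetic stand-ins for real patient data.

```python
# Sketch of a decision-tree-ensemble clinical outcome predictor of the
# kind described above, evaluated with cross-validated AUROC.
# All data here is synthetic, not real patient data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 500
age = rng.uniform(20, 90, n)
biomarker = rng.normal(1.0, 0.3, n)
# Synthetic outcome: risk rises with age and biomarker level.
risk = 1 / (1 + np.exp(-(0.05 * (age - 55) + 2.0 * (biomarker - 1.0))))
outcome = (rng.uniform(size=n) < risk).astype(int)

X = np.column_stack([age, biomarker])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
auc = cross_val_score(clf, X, outcome, cv=5, scoring="roc_auc").mean()
print(f"cross-validated AUROC = {auc:.2f}")
```

Real clinical models would use many more covariates (labs, vitals, genetics), careful handling of missingness and censoring, and external validation before deployment.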


2019 ◽  
Author(s):  
Abdul Karim ◽  
Vahid Riahi ◽  
Avinash Mishra ◽  
Abdollah Dehzangi ◽  
M. A. Hakim Newton ◽  
...  

Representing molecules in the form of only one type of features and using those features to predict their activities is one of the most important approaches for machine-learning-based chemical activity prediction. For molecular activities like quantitative toxicity prediction, performance depends on the type of features extracted and the machine learning approach used. In such cases, using one type of features and one machine learning model restricts prediction performance to the specific representation and model used. In this paper, we study quantitative toxicity prediction and propose a machine learning model for it. Our model uses an ensemble of heterogeneous predictors instead of the typical homogeneous predictors. The predictors we use vary either in the type of features used or in the deep learning architecture employed. Each of these predictors presumably has its own strengths and weaknesses in terms of toxicity prediction. Our motivation is to build a combined model that utilizes different types of features and architectures to obtain better collective performance than each individual predictor. We use six predictors in our model and test it on four standard quantitative toxicity benchmark datasets. Experimental results show that our model outperforms the state-of-the-art toxicity prediction models in 8 out of 12 accuracy measures. Our experiments show that ensembling heterogeneous predictors improves performance over single predictors and over homogeneous ensembles of single predictors. The results show that each data representation or deep learning based predictor has its own strengths and weaknesses; thus a model ensembling multiple heterogeneous predictors can go beyond the individual performance of each data representation or predictor type.
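The core idea, averaging structurally different base predictors rather than copies of one model, can be sketched as below. This is an illustrative simplification of heterogeneous ensembling, not the paper's six-predictor system; the "molecular descriptors" and targets are synthetic.

```python
# Illustrative sketch of heterogeneous ensembling for a quantitative
# prediction task: average the outputs of structurally different base
# predictors. Data is synthetic, standing in for molecular descriptors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 8))  # stand-in descriptor matrix
y = X[:, 0] - 2 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Three heterogeneous bases: linear, tree ensemble, instance-based.
bases = [Ridge(), RandomForestRegressor(random_state=0),
         KNeighborsRegressor()]
preds = np.mean([m.fit(X_tr, y_tr).predict(X_te) for m in bases], axis=0)
print("ensemble RMSE:", mean_squared_error(y_te, preds) ** 0.5)
```

A simple average is shown; weighted or stacked combination (learning the combiner on validation predictions) is a common refinement when base predictors differ greatly in quality.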


2020 ◽  
Author(s):  
Joon Lee

In contrast with medical imaging diagnostics powered by artificial intelligence (AI), in which deep learning has led to breakthroughs in recent years, patient outcome prediction poses an inherently challenging problem because it focuses on events that have not yet occurred. Interestingly, the performance of machine learning–based patient outcome prediction models has rarely been compared with that of human clinicians in the literature. Human intuition and insight may be sources of underused predictive information that AI will not be able to identify in electronic data. Both human and AI predictions should be investigated together with the aim of achieving a human-AI symbiosis that synergistically and complementarily combines AI with the predictive abilities of clinicians.


Author(s):  
Rudolf Ramler ◽  
Johannes Himmelbauer ◽  
Thomas Natschläger

The information about which modules of a future version of a software system will be defect-prone is a valuable planning aid for quality managers and testers. Defect prediction promises to indicate these defect-prone modules. In this chapter, building a defect prediction model from data is characterized as an instance of a data-mining task, and key questions and consequences arising when establishing defect prediction in a large software development project are discussed. Special emphasis is put on discussions on how to choose a learning algorithm, select features from different data sources, deal with noise and data quality issues, as well as model evaluation for evolving systems. These discussions are accompanied by insights and experiences gained by projects on data mining and defect prediction in the context of large software systems conducted by the authors over the last couple of years. One of these projects has been selected to serve as an illustrative use case throughout the chapter.


2021 ◽  
Vol 13 (2) ◽  
pp. 744
Author(s):  
Elsa Chaerun Nisa ◽  
Yean-Der Kuan

Over the last few decades, total energy consumption has increased while energy resources remain limited. Energy demand management is crucial for this reason. To address this problem, predicting and forecasting water-cooled chiller power consumption using machine learning and deep learning are presented. The prediction models adopted are a thermodynamic model and a multi-layer perceptron (MLP), while the time-series forecasting models adopted are the MLP, a one-dimensional convolutional neural network (1D-CNN), and long short-term memory (LSTM). Each group of models is compared, and the best model in each group is selected for implementation. The data were collected every minute from an academic building at a university in Taiwan. The experimental results demonstrate that the best prediction model is the MLP, with a coefficient of determination (R2) of 0.971, a mean absolute error (MAE) of 0.743 kW, and a root mean square error (RMSE) of 1.157 kW. The time-series forecasting models were trained every day for three consecutive days with new data to forecast the next minute of power consumption. The best time-series forecasting model is the LSTM, with an R2 of 0.994, an MAE of 0.233 kW, and an RMSE of 1.415 kW. The selected MLP and LSTM models produced predicted and forecast values very close to the actual values.
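The next-minute forecasting setup above rests on a sliding-window transformation: past readings become the features, the next reading the target. The sketch below shows that setup with scikit-learn's MLP (one of the model families compared above); the chiller data is replaced by a synthetic daily load curve, so the numbers are illustrative only.

```python
# Sketch of the sliding-window setup behind next-minute power
# forecasting, using an MLP on a synthetic daily load curve.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
t = np.arange(2000)
# Synthetic power trace in kW: daily cycle (1440 min) plus noise.
power = 50 + 10 * np.sin(2 * np.pi * t / 1440) + rng.normal(0, 1, t.size)

window = 30  # use the previous 30 minutes to predict the next minute
X = np.array([power[i:i + window] for i in range(t.size - window)])
y = power[window:]
X_tr, y_tr, X_te, y_te = X[:-200], y[:-200], X[-200:], y[-200:]

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
)
model.fit(X_tr, y_tr)
mae = np.mean(np.abs(model.predict(X_te) - y_te))
print(f"test MAE = {mae:.2f} kW")
```

An LSTM consumes the same windows as ordered sequences rather than flat feature vectors; the daily retraining described above corresponds to refitting on the most recent data before each forecasting day.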

