A New Prediction Approach for Preventing Default Customers from Applying Personal Loans Using Machine Learning

Author(s):  
Mohamed H. Khedr ◽  
Nesrine A. Azim ◽  
Ammar M. Ammar

In the Egyptian banking industry, loan officers rely purely on judgment to make personal loan approval decisions. In this paper, we develop a new machine learning method for predicting which customers will default on their loans. The method uses the available personal data and historical credit data to evaluate the creditworthiness of customers applying for loans. We used the ABE dataset for training and testing, with 10 features drawn from the application form and the i-score report class; these can help credit officers make the right decision and avoid selecting customers by random techniques. The collected dataset was analysed with various machine learning classifiers, based on the most important selected features, to obtain high accuracy. We compared the performance of several machine learning classifiers before and after feature selection. We found that the most important features for accuracy are activity, income, and loan, and that the decision tree classifier outperformed every other machine learning classifier, with a prediction accuracy of almost 94.85%.

2019 ◽  
Vol 9 (11) ◽  
pp. 2375 ◽  
Author(s):  
Riaz Ullah Khan ◽  
Xiaosong Zhang ◽  
Rajesh Kumar ◽  
Abubakar Sharif ◽  
Noorbakhsh Amiri Golilarz ◽  
...  

In recent years, botnets have been the most common threats to network security, since they exploit multiple kinds of malicious code such as worms, Trojans, and rootkits. Botnets have been used to carry phishing links, to perform attacks, and to provide malicious services on the internet. Peer-to-peer (P2P) botnets are harder to identify than Internet Relay Chat (IRC), Hypertext Transfer Protocol (HTTP), and other types of botnets because P2P traffic has typical features of both centralization and distribution. To resolve the issues of P2P botnet identification, we propose an effective multi-layer traffic classification method that applies machine learning classifiers to features of network traffic. Our work presents a framework based on decision trees which effectively detects P2P botnets. A decision tree algorithm is applied for feature selection to extract the most relevant features and ignore the irrelevant ones. At the first layer, we filter non-P2P packets to reduce the amount of network traffic through well-known ports, Domain Name System (DNS) queries, and flow counting. The second layer further characterizes the captured network traffic into non-P2P and P2P. At the third layer of our model, we reduce the features that may only marginally affect the classification. At the final layer, we successfully detect P2P botnets using a decision tree classifier by extracting network communication features. Furthermore, our experimental evaluations show the significance of the proposed method in P2P botnet detection and demonstrate an average accuracy of 98.7%.
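
The first-layer filtering step described above can be illustrated with a minimal sketch. It covers only the well-known-port check; the DNS-query and flow-counting checks are omitted because the abstract does not specify them. The `Packet` record, its field names, and the function names are hypothetical and are not taken from the paper. The only external fact used is that IANA reserves ports 0 to 1023 as well-known (system) ports.

```python
from dataclasses import dataclass

# IANA "well-known" (system) ports occupy the range 0-1023.
WELL_KNOWN_PORT_MAX = 1023


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal packet record; field names are illustrative."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def uses_well_known_port(pkt: Packet) -> bool:
    """True if either endpoint of the packet uses a well-known port."""
    return min(pkt.src_port, pkt.dst_port) <= WELL_KNOWN_PORT_MAX


def first_layer_filter(packets):
    """Drop packets on well-known ports; the rest remain P2P candidates."""
    return [p for p in packets if not uses_well_known_port(p)]


if __name__ == "__main__":
    # Toy traffic, invented for illustration only.
    traffic = [
        Packet("10.0.0.1", "10.0.0.2", 51000, 80),    # HTTP: filtered out
        Packet("10.0.0.1", "10.0.0.3", 51001, 6881),  # high ports: kept
        Packet("10.0.0.4", "10.0.0.1", 53, 40000),    # DNS reply: filtered out
    ]
    for pkt in first_layer_filter(traffic):
        print(pkt)
```

Filtering on ports alone is a coarse heuristic, which is presumably why the described framework follows it with further classification layers.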


2020 ◽  
Author(s):  
Kadi L. Saar ◽  
Alexey S. Morgunov ◽  
Runzhang Qi ◽  
William E. Arter ◽  
Georg Krainer ◽  
...  

Abstract: Intracellular phase separation of proteins into biomolecular condensates is increasingly recognised as an important phenomenon for cellular compartmentalisation and regulation of biological function. Different hypotheses about the parameters that determine the tendency of proteins to form condensates have been proposed with some of them probed experimentally through the use of constructs generated by sequence alterations. To broaden the scope of these observations, here, we established an in silico strategy for understanding on a global level the associations between protein sequence and condensate formation, and used this information to construct machine learning classifiers for predicting liquid–liquid phase separation (LLPS) from protein sequence. Our analysis highlighted that LLPS–prone sequences are more disordered, hydrophobic and of lower Shannon entropy than sequences in the Protein Data Bank or the Swiss-Prot database, and have their disordered regions enriched in polar, aromatic and charged residues. Using these determining features together with neural network based word2vec sequence embeddings, we developed machine learning classifiers for predicting protein condensate formation. Our model, trained to distinguish LLPS-prone sequences from structured proteins, achieved high accuracy (93%; 25-fold cross-validation) and identified condensate forming sequences from external independent test data at 97% sensitivity. Moreover, in combination with a classifier that had developed a nuanced insight into the features governing protein phase behaviour by learning to distinguish between sequences of varying LLPS propensity, the sensitivity was supplemented with high specificity (approximated ROC–AUC of 0.85). These results provide a platform rooted in molecular principles for understanding protein phase behaviour.
The predictor is accessible from https://deephase.ch.cam.ac.uk/.

Significance Statement: The tendency of many cellular proteins to form protein-rich biomolecular condensates underlies the formation of subcellular compartments and has been linked to various physiological functions. Understanding the molecular basis of this fundamental process and predicting protein phase behaviour have therefore become important objectives. To develop a global understanding of how protein sequence determines its phase behaviour, here, we constructed bespoke datasets of proteins of varying phase separation propensity and identified explicit biophysical and sequence-specific features common to phase separating proteins. Moreover, by combining this insight with neural network based sequence embeddings, we trained machine learning classifiers that identified phase separating sequences with high accuracy, including from independent external test data. The predictor is available from https://deephase.ch.cam.ac.uk/.
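
One of the sequence features named in the abstract, Shannon entropy, can be illustrated with a short sketch. It assumes the entropy is taken over the residue frequencies of a single sequence and is measured in bits; the abstract does not give the authors' exact definition. The function name and the toy sequences are hypothetical and do not come from the study's datasets.

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy (bits per residue) of a sequence's residue distribution."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    counts = Counter(sequence)
    total = len(sequence)
    # H = -sum(p * log2(p)) over the observed residue frequencies p.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    low_complexity = "GGSGGSGGSGGS"
    mixed = "MKTAYIAKQRQISFVKSHFSRQ"
    print(round(shannon_entropy(low_complexity), 3))
    print(round(shannon_entropy(mixed), 3))
```

A repetitive, low-complexity sequence uses few distinct residues and so scores lower than a compositionally diverse one, which is the direction of the effect the abstract reports for LLPS-prone sequences.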


2021 ◽  
Vol 5 (4 (113)) ◽  
pp. 55-63
Author(s):  
Beimbet Daribayev ◽  
Aksultan Mukhanbet ◽  
Yedil Nurakhov ◽  
Timur Imankulov

The problem of oil displacement was solved using neural networks and machine learning classifiers. The Buckley-Leverett model, which describes the process of oil displacement by water, was selected. It consists of the continuity equations for the oil and water phases and Darcy’s law. The challenge is to optimize the solution of the oil displacement problem. Optimization is performed at three levels: vectorization of calculations, implementation of classical algorithms, and implementation of the algorithm using neural networks. A distinguishing feature of the proposed method is that it identifies the approach with the highest accuracy and the smallest errors by comparing the results of machine learning classifiers and several types of neural networks. This is also one of the first papers to compare machine learning classifiers with neural networks and recurrent neural networks. Classification was carried out with three algorithms: decision tree, support vector machine (SVM), and gradient boosting. In the study, the gradient boosting classifier and the neural network showed high accuracy, 99.99% and 97.4%, respectively. The recurrent neural network trained faster than the others. The SVM classifier had the lowest accuracy score. For this purpose, a dataset was created containing over 67,000 data points for class 10. These data are important for problems of oil displacement in porous media. The proposed methodology provides a simple and elegant way to instill oil knowledge into machine learning algorithms. This removes two of the most significant drawbacks of machine learning algorithms: the need for large datasets and the lack of robustness in extrapolation. The presented principles can be generalized in countless ways in the future and should lead to a new class of algorithms for solving both forward and inverse oil problems.
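
For reference, the Buckley-Leverett model named above is commonly written in one dimension, neglecting gravity and capillary pressure, as shown below. This is the standard textbook form, not a formulation quoted from the paper. The symbols are assumptions of this sketch: S_w is water saturation, u the total Darcy velocity, \phi the porosity, f_w the fractional flow of water, k_{rw} and k_{ro} the relative permeabilities, and \mu_w and \mu_o the viscosities of water and oil.

```latex
\frac{\partial S_w}{\partial t}
  + \frac{u}{\phi}\,\frac{\partial f_w(S_w)}{\partial x} = 0,
\qquad
f_w(S_w) = \frac{1}{1 + \dfrac{k_{ro}(S_w)\,\mu_w}{k_{rw}(S_w)\,\mu_o}}
```

The first equation comes from the water-phase continuity equation, and the fractional flow function follows from applying Darcy's law to each phase.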


2019 ◽  
Vol 24 (3) ◽  
pp. 224-233 ◽  
Author(s):  
Scott J. Warchal ◽  
John C. Dawson ◽  
Neil O. Carragher

Multiparametric high-content imaging assays have become established to classify cell phenotypes from functional genomic and small-molecule library screening assays. Several groups have implemented machine learning classifiers to predict the mechanism of action of phenotypic hit compounds by comparing the similarity of their high-content phenotypic profiles with a reference library of well-annotated compounds. However, the majority of such examples are restricted to a single cell type often selected because of its suitability for simple image analysis and intuitive segmentation of morphological features. The aim of the current study was to evaluate and compare the performance of a classic ensemble-based tree classifier trained on extracted morphological features and a deep learning classifier using convolutional neural networks (CNNs) trained directly on images from the same dataset to predict compound mechanism of action across a morphologically and genetically distinct cell panel. Our results demonstrate that application of a CNN classifier delivers equivalent accuracy compared with an ensemble-based tree classifier at compound mechanism of action prediction within cell lines. However, our CNN analysis performs worse than an ensemble-based tree classifier when trained on multiple cell lines at predicting compound mechanism of action on an unseen cell line.


2021 ◽  
Vol 13 (8) ◽  
pp. 1433
Author(s):  
Shobitha Shetty ◽  
Prasun Kumar Gupta ◽  
Mariana Belgiu ◽  
S. K. Srivastav

Machine learning classifiers are increasingly used for Land Use and Land Cover (LULC) mapping from remote sensing images. However, choosing the right classifier requires understanding the main factors that influence their performance. The present study first investigated the effect of training sampling design on the classification results obtained by the Random Forest (RF) classifier and, second, compared its performance with other machine learning classifiers for LULC mapping using multi-temporal satellite remote sensing data and the Google Earth Engine (GEE) platform. We evaluated the impact of three sampling methods, namely Stratified Equal Random Sampling (SRS(Eq)), Stratified Proportional Random Sampling (SRS(Prop)), and Stratified Systematic Sampling (SSS), on the classification results obtained by the RF-trained LULC model. Our results showed that the SRS(Prop) method favors major classes while achieving good overall accuracy. The SRS(Eq) method provides good class-level accuracies, even for minority classes, whereas the SSS method performs well for areas with large intra-class variability. In the comparison of machine learning classifiers, RF outperformed Classification and Regression Trees (CART), Support Vector Machine (SVM), and Relevance Vector Machine (RVM) with a >95% confidence level. The performance of the CART and SVM classifiers was found to be similar. RVM achieved good classification results with a limited number of training samples.
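
The difference between the equal and proportional stratified sampling designs compared above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's GEE workflow: the function name, the `mode` parameter, the class names, and the class sizes are all hypothetical, and systematic sampling (SSS) is not covered.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, n_total, mode="equal", seed=0):
    """Return indices sampled per class (stratum).

    mode="equal": the same number of samples from each class.
    mode="proportional": samples per class proportional to class frequency.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for label, indices in sorted(by_class.items()):
        if mode == "equal":
            k = n_total // len(by_class)
        elif mode == "proportional":
            k = round(n_total * len(indices) / len(labels))
        else:
            raise ValueError(f"unknown mode: {mode}")
        k = min(k, len(indices))  # cannot draw more than the class holds
        chosen.extend(rng.sample(indices, k))
    return chosen


if __name__ == "__main__":
    # Toy imbalanced label set, invented for illustration only.
    labels = ["forest"] * 700 + ["cropland"] * 250 + ["water"] * 50
    for mode in ("equal", "proportional"):
        picked = stratified_sample(labels, n_total=120, mode=mode)
        print(mode, dict(Counter(labels[i] for i in picked)))
```

With an imbalanced label set, the proportional design draws most samples from the dominant class, while the equal design gives minority classes the same representation. This is consistent with the abstract's observation that SRS(Prop) favors major classes and SRS(Eq) gives good class-level accuracies for minority classes.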

