Machine Learning (Neuronal Net, Random Forest, and C5.0 single decision tree) based on pXRF data as a tool to date sediment layers of the Nile Delta

Author(s):  
Martin Seeliger ◽  
Marina Altmeyer ◽  
Andreas Ginau ◽  
Robert Schiestl ◽  
Jürgen Wunderlich

<p>This paper presents the application of machine-learning techniques to pXRF data to establish a chronology for sediment cores around Tell Buto (Tell el-Fara´in) in the northwestern Nile Delta. Modern laboratories for dating techniques such as OSL or <sup>14</sup>C are rare in Egypt, and sample export is restricted. As a result, there are few opportunities to build a robust chronology, which is indispensable in modern geoarchaeology.</p><p>We therefore present a new approach that transfers archaeological age information gained at the Buto excavation to corings in the wider Buto area. Sediments from archaeological outcrops and pits of known age are measured with pXRF to create a geochemical “fingerprint” for several historic eras. These “fingerprints” are then transferred to corings in the surrounding area using machine-learning algorithms.</p><p>This paper presents 1) the application of three different machine-learning approaches (Neuronal Net, Random Forest, and C5.0 decision tree) to test whether archaeological age information can be transferred, using pXRF data, to sediments far from the settlement mounds, 2) a comparison of all approaches and an evaluation of whether the easily interpretable decision tree and Random Forest yield results similar to those of the “black-box” Neuronal Net, and finally, 3) a case study that provides the results of Altmeyer et al. (in review) for Kom el-Gir, another settlement mound a little north of Buto, with a chronostratigraphic framework based on this approach.</p><p>Reference:</p><p>Altmeyer, M., Seeliger, M., Ginau, A., Schiestl, R. & J. Wunderlich (in review): Reconstruction of former channel systems in the northwestern Nile Delta (Egypt) based on corings and electrical resistivity tomography (ERT). (Submitted to E & G Quaternary Science Journal).</p>
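A deliberately simple stand-in can illustrate the fingerprint-transfer idea. The sketch below is not the authors' Neuronal Net, Random Forest or C5.0 pipeline. It assumes that each dated sample is a list of element concentrations in a fixed order, averages those samples per era into a centroid "fingerprint", and assigns an undated core sample to the era with the nearest centroid. The era labels, elements and values are all hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def era_fingerprints(samples):
    """Average the element concentrations of dated samples into one centroid per era.

    samples: list of (era_label, [concentration, ...]) with a fixed element order.
    """
    sums, counts = {}, {}
    for era, values in samples:
        acc = sums.setdefault(era, [0.0] * len(values))
        for i, value in enumerate(values):
            acc[i] += value
        counts[era] = counts.get(era, 0) + 1
    return {era: [s / counts[era] for s in acc] for era, acc in sums.items()}


def assign_era(fingerprints, core_sample):
    """Return the era whose centroid fingerprint is closest to an undated core sample."""
    return min(fingerprints, key=lambda era: dist(fingerprints[era], core_sample))


# Hypothetical dated samples: (era, [element_1, element_2, element_3]) concentrations.
dated = [
    ("era_A", [3.1, 0.8, 12.0]),
    ("era_A", [2.9, 0.9, 11.5]),
    ("era_B", [5.0, 1.6, 8.0]),
    ("era_B", [5.4, 1.4, 8.4]),
]
fingerprints = era_fingerprints(dated)
print(assign_era(fingerprints, [5.1, 1.5, 8.1]))  # era_B
```

Element concentrations often differ by orders of magnitude, so features would usually be standardized before distances are computed. The classifiers named in the abstract replace the nearest-centroid rule with learned decision boundaries.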

Author(s):  
Zulqarnain Khokhar ◽  
Murtaza Ahmed Siddiqi ◽  

Wi-Fi based indoor positioning using access points and smart devices has become an integral part of locating a device or a person. Wi-Fi based indoor localization has been among the most attractive fields for researchers for a number of years. In this paper, we present Wi-Fi based indoor localization using three different machine-learning techniques. The three machine learning algorithms implemented and compared are Decision Tree, Random Forest and Gradient Boosting classifier. After a fingerprint of the floor was built from Wi-Fi signals, these algorithms were used to identify the device location at thirty different positions on the floor. Random Forest and Gradient Boosting classifier identified the location of the device with accuracy higher than 90%, while Decision Tree achieved accuracy slightly higher than 80%.
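Fingerprint-based positioning compares a live Wi-Fi reading against a stored radio map of the floor. The minimal sketch below uses k-nearest-neighbour voting, a common fingerprinting baseline. It is not one of the three classifiers compared in the abstract. The access points, RSSI values (in dBm) and position labels are hypothetical.

```python
from collections import Counter
from math import dist


def knn_locate(radio_map, observed, k=3):
    """Predict a position label by majority vote among the k closest fingerprints.

    radio_map: list of (position_label, [rssi_ap1, rssi_ap2, ...]) reference readings,
    with access points in the same fixed order as `observed`.
    """
    nearest = sorted(radio_map, key=lambda entry: dist(entry[1], observed))[:k]
    votes = Counter(label for label, _ in nearest)
    # Ties go to the label that appears first among the nearest entries.
    return votes.most_common(1)[0][0]


# Hypothetical radio map: RSSI from three access points at three reference positions.
radio_map = [
    ("P1", [-40, -70, -80]),
    ("P1", [-42, -68, -79]),
    ("P2", [-70, -45, -75]),
    ("P2", [-68, -47, -77]),
    ("P3", [-80, -75, -42]),
]
print(knn_locate(radio_map, [-69, -46, -76]))  # P2
```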


Recent advancements in remote sensing platforms, from satellites to close-range Remotely Piloted Aircraft Systems (RPAS), have driven a growing demand for innovative image processing and classification tools. Machine learning approaches are a prevalent group of data-driven inference tools that offer broad scope when applied to remotely sensed data. In this paper, we apply different machine learning approaches to remote sensing images using open-source packages in R, in order to determine which algorithm achieves the best accuracy. We carried out a rigorous comparison of four machine learning algorithms: Support vector machine, Random forest, Classification and regression tree, and Naive Bayes. These algorithms are evaluated using classification accuracy, the Kappa index and the area under the curve as accuracy metrics. Ten runs are performed to obtain the variance of the results on the training set, and validation is carried out using k-fold cross-validation. This study identifies the Random forest approach as the best method based on the accuracy measures under different conditions. Random forest is efficient to train, highly stable with respect to variations in the parameter values of the classification model, and significantly more accurate than the other machine learning approaches trialed.
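The Kappa index mentioned above corrects overall accuracy for the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). The minimal sketch below computes Cohen's kappa from a confusion matrix of hypothetical counts.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of samples whose reference class is i and predicted class is j.
    Undefined (division by zero) when the expected chance agreement equals 1.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n  # p_o: raw accuracy
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)  # p_e
    return (observed - expected) / (1 - expected)


# Hypothetical two-class validation: 85% raw accuracy, 50% agreement expected by chance.
print(round(cohens_kappa([[45, 5], [10, 40]]), 3))  # 0.7
```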


2020 ◽  
Vol 12 (1) ◽  
pp. 20-38
Author(s):  
Winfred Yaokumah ◽  
Isaac Wiafe

Determining which machine learning (ML) technique performs best on new datasets is an important factor in the design of effective anomaly-based intrusion detection systems. This study therefore evaluated four machine learning algorithms (naïve Bayes, k-nearest neighbors, decision tree, and random forest) on the UNSW-NB15 dataset for intrusion detection. The experimental results showed that random forest and decision tree classifiers are effective for detecting intrusions. Random forest had the highest weighted average accuracy of 89.66% and a mean absolute error (MAE) of 0.0252, whereas decision tree recorded 89.20% and 0.0242, respectively. The naïve Bayes classifier had the worst results on the dataset, with 56.43% accuracy and an MAE of 0.0867. However, contrary to existing knowledge, naïve Bayes was observed to be potent in classifying backdoor attacks. Notably, naïve Bayes performed relatively well in classes where the tree-based classifiers performed abysmally.
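The finding that naïve Bayes does well on classes where the tree-based models fail is a per-class effect, and an aggregate score can hide it. The minimal sketch below uses hypothetical labels. Per-class recall exposes the difference. The support-weighted mean of per-class recall is, by construction, the same as overall accuracy.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that the classifier labelled correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


def support_weighted_mean(scores, y_true):
    """Average per-class scores, weighting each class by its number of true samples."""
    totals = Counter(y_true)
    return sum(scores[cls] * totals[cls] for cls in totals) / len(y_true)


# Hypothetical test labels: 8 normal records and 2 backdoor attacks.
y_true = ["normal"] * 8 + ["backdoor"] * 2
tree_pred = ["normal"] * 10                     # misses every backdoor
bayes_pred = ["normal"] * 6 + ["backdoor"] * 4  # catches backdoors, two false alarms

for name, pred in [("tree", tree_pred), ("naive Bayes", bayes_pred)]:
    recall = per_class_recall(y_true, pred)
    print(name, recall, round(support_weighted_mean(recall, y_true), 2))
```

Both hypothetical models score 0.8 overall, yet one has zero recall on backdoor attacks and the other has full recall on them.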


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Faizan Ullah ◽  
Qaisar Javaid ◽  
Abdu Salam ◽  
Masood Ahmad ◽  
Nadeem Sarwar ◽  
...  

Ransomware (RW) is a distinctive variety of malware that encrypts the user's files or locks the user's system, holding the files hostage, which leads to huge financial losses for users. In this article, we propose a new model that extracts novel features from the RW dataset and classifies files as RW or benign. The proposed model can detect a large number of RW samples from various families at runtime, monitoring network, registry, and file-system activity throughout execution. API-call sequences were reused to represent the behavior-based features of RW. The technique extracts a fourteen-feature vector at runtime and analyzes it with online machine learning algorithms to predict RW. To validate effectiveness and scalability, we tested 78,550 recent malicious and benign samples and compared the model with random forest and AdaBoost; the testing accuracy reached 99.56%.
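Online learning updates the model one observation at a time as new runtime behaviour arrives, instead of retraining it on the full dataset. The abstract does not name the online algorithms used. The sketch below therefore assumes online logistic regression trained by stochastic gradient descent on a fourteen-value feature vector. The feature values and labels (1 = ransomware, 0 = benign) are hypothetical.

```python
from math import exp


class OnlineLogisticRegression:
    """Logistic regression updated one sample at a time with stochastic gradient descent."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x):
        """Probability that feature vector x belongs to the positive (ransomware) class."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        if z >= 0:  # numerically stable sigmoid
            return 1.0 / (1.0 + exp(-z))
        ez = exp(z)
        return ez / (1.0 + ez)

    def update(self, x, y):
        """One gradient step on the log loss for a single labelled sample (y in {0, 1})."""
        error = self.predict_proba(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error


N_FEATURES = 14  # the abstract's fourteen-feature vector; what each feature means is assumed
model = OnlineLogisticRegression(N_FEATURES)

# Hypothetical stream of normalised feature vectors arriving at runtime.
stream = [([1.0] * N_FEATURES, 1), ([0.0] * N_FEATURES, 0)] * 100
for features, label in stream:
    model.update(features, label)

print(model.predict_proba([1.0] * N_FEATURES) > 0.5)  # True
```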


2019 ◽  
Author(s):  
Cheng-Sheng Yu ◽  
Yu-Jiun Lin ◽  
Chang-Hsien Lin ◽  
Sen-Te Wang ◽  
Shiyng-Yu Lin ◽  
...  

BACKGROUND Metabolic syndrome is a cluster of disorders that significantly influence the development and deterioration of numerous diseases. FibroScan is an ultrasound device that was recently shown to predict metabolic syndrome with moderate accuracy. However, previous research regarding prediction of metabolic syndrome in subjects examined with FibroScan has been mainly based on conventional statistical models. Alternatively, machine learning, whereby a computer algorithm learns from prior experience, has better predictive performance than conventional statistical modeling. OBJECTIVE We aimed to evaluate the accuracy of different decision tree machine learning algorithms in predicting the state of metabolic syndrome in self-paid health examination subjects who were examined with FibroScan. METHODS Multivariate logistic regression was conducted for every known risk factor of metabolic syndrome. Principal components analysis was used to visualize the distribution of metabolic syndrome patients. We further applied various statistical machine learning techniques to visualize and investigate the patterns and relationships between metabolic syndrome and several risk variables. RESULTS Obesity, serum glutamic-oxaloacetic transaminase, serum glutamic pyruvic transaminase, controlled attenuation parameter score, and glycated hemoglobin emerged as significant risk factors in multivariate logistic regression. The area under the receiver operating characteristic curve values for classification and regression trees and for the random forest were 0.831 and 0.904, respectively. CONCLUSIONS Machine learning technology facilitates the identification of metabolic syndrome in self-paid health examination subjects with high accuracy.
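An area under the ROC curve such as 0.831 or 0.904 can be read as the probability that a randomly chosen subject with metabolic syndrome gets a higher model score than a randomly chosen subject without it. The minimal sketch below implements that pairwise definition, with ties counted as one half. The scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores; label 1 = metabolic syndrome, 0 = no metabolic syndrome.
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
print(round(roc_auc(scores, labels), 3))  # 0.833
```

This pairwise loop costs O(P·N). It is meant to show the definition, and a rank-based computation would be preferable for large cohorts.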


Author(s):  
S. Prasanthi ◽  
S.Durga Bhavani ◽  
T. Sobha Rani ◽  
Raju S. Bapi

The vast majority of successful drugs or inhibitors achieve their activity by binding to, and modifying the activity of, a protein, which leads to the concept of druggability. A target protein is druggable if it has the potential to bind drug-like molecules. Hence, kinase inhibitors need to be studied to understand the specificity of a kinase inhibitor for a particular kinase target. In this paper, we focus on human kinase drug target sequences, since kinases are known to be potential drug targets. We also carry out a preliminary analysis of kinase inhibitors in order to study the problem in the protein-ligand space in future. The identification of druggable kinases is treated as a classification problem in which druggable kinases form the positive data set and non-druggable kinases form the negative data set. The classification problem is addressed with machine learning techniques such as support vector machine (SVM) and decision tree (DT), using sequence-specific features. One of the challenges of this classification problem is the imbalanced data: only 48 druggable kinases are available, against 509 non-druggable kinases in UniProt. The accuracy obtained with the decision tree classifier is 57.65%, which is not satisfactory. A two-tier architecture of decision trees is therefore carefully designed so that recognition of the non-druggable dataset also improves. The overall model is shown to achieve a final accuracy of 88.37%. To the best of our knowledge, kinase druggability prediction using machine learning approaches has not been reported in the literature.
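With only 48 druggable and 509 non-druggable kinases, plain accuracy needs care. A trivial model that labels every kinase non-druggable already reaches about 91% accuracy while finding no druggable kinase at all. The sketch below compares accuracy with balanced accuracy (the mean of per-class recall) for that hypothetical trivial model. The abstract does not say which metric its 57.65 and 88.37 figures use.

```python
def accuracy(tp, fn, tn, fp):
    """Share of all kinases classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of recall on druggable (positive) and non-druggable (negative) kinases."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Class sizes from the abstract; the "always non-druggable" model is hypothetical.
counts = dict(tp=0, fn=48, tn=509, fp=0)
print(round(accuracy(**counts), 3))           # 0.914
print(round(balanced_accuracy(**counts), 3))  # 0.5
```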


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Peter Appiahene ◽  
Yaw Marfo Missah ◽  
Ussiph Najim

The financial crisis that hit Ghana from 2015 to 2018 raised various issues regarding the efficiency of banks and the safety of depositors in the banking industry. As part of measures to improve the banking sector and restore customers' confidence, efficiency and performance analysis in the banking industry has become a hot issue, because stakeholders need to detect the underlying causes of inefficiencies within the industry. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as a good measure of banks' efficiency and performance. Machine learning algorithms have also been viewed as a good tool for estimating various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate bank efficiency and performance, using 444 Ghanaian bank branches as Decision Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from DEA. Finally, the prediction accuracies of the three machine learning models were compared. The results suggested that the decision tree (DT), using its C5.0 algorithm, provided the best predictive model. It achieved 100% accuracy in predicting the 134-branch holdout sample (30% of the bank branches), with a P value of 0.00. The DT was followed closely by the random forest algorithm, with a predictive accuracy of 98.5% and a P value of 0.00, and finally by the neural network, with 86.6% accuracy and a P value of 0.66. The study concluded that banks in Ghana can use its results to predict their respective efficiencies. All experiments were performed within a simulation environment and conducted in RStudio using R code.
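The holdout size is consistent with a 70/30 split of the 444 branches. Thirty percent of 444 is 133.2, which gives 134 test branches if the test size is rounded up. The minimal sketch below performs such a split. The rounding rule, the unstratified random shuffle, the seed and the branch IDs are assumptions, not details taken from the paper.

```python
import math
import random


def holdout_split(items, test_fraction=0.3, seed=42):
    """Shuffle a copy of `items` and split it into (train, test), rounding the test size up."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


branches = [f"branch_{i}" for i in range(444)]  # hypothetical DMU identifiers
train, test = holdout_split(branches)
print(len(train), len(test))  # 310 134
```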


Author(s):  
Joy Iong-Zong Chen ◽  
Kong-Long Lai

Designing an analogue IC layout is a time-consuming, manual process. Despite several studies in the field, geometric constraints have hampered automated analogue IC layout design. As a result, automated analogue layouts lag behind manual designs in performance, which prevents the deployment of a wide range of automated tools. With recent technical developments, this challenge can be addressed using machine learning techniques. This study investigates performance-driven placement in the VLSI IC design process, as well as analogue IC performance prediction, using various machine learning approaches. Several amplifier designs are simulated, and the simulation results show that the proposed approach yields improved performance compared with manual layout.


10.2196/17110 ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. e17110 ◽  
Author(s):  
Cheng-Sheng Yu ◽  
Yu-Jiun Lin ◽  
Chang-Hsien Lin ◽  
Sen-Te Wang ◽  
Shiyng-Yu Lin ◽  
...  

Background Metabolic syndrome is a cluster of disorders that significantly influence the development and deterioration of numerous diseases. FibroScan is an ultrasound device that was recently shown to predict metabolic syndrome with moderate accuracy. However, previous research regarding prediction of metabolic syndrome in subjects examined with FibroScan has been mainly based on conventional statistical models. Alternatively, machine learning, whereby a computer algorithm learns from prior experience, has better predictive performance than conventional statistical modeling. Objective We aimed to evaluate the accuracy of different decision tree machine learning algorithms in predicting the state of metabolic syndrome in self-paid health examination subjects who were examined with FibroScan. Methods Multivariate logistic regression was conducted for every known risk factor of metabolic syndrome. Principal components analysis was used to visualize the distribution of metabolic syndrome patients. We further applied various statistical machine learning techniques to visualize and investigate the patterns and relationships between metabolic syndrome and several risk variables. Results Obesity, serum glutamic-oxaloacetic transaminase, serum glutamic pyruvic transaminase, controlled attenuation parameter score, and glycated hemoglobin emerged as significant risk factors in multivariate logistic regression. The area under the receiver operating characteristic curve values for classification and regression trees and for the random forest were 0.831 and 0.904, respectively. Conclusions Machine learning technology facilitates the identification of metabolic syndrome in self-paid health examination subjects with high accuracy.


2019 ◽  
Vol 16 (4) ◽  
pp. 155-169
Author(s):  
N. A. Azeez ◽  
A. A. Ajayi

Since the advent of Information and Communication Technology (ICT), there has been a great shift away from the erstwhile traditional approach of handling information across the globe and toward the use of this innovation. Its application cuts across almost all areas of human endeavour. ICT is widely used in the education and production sectors as well as in various financial institutions. Many people use it genuinely to carry out their day-to-day activities, while others use it to perform nefarious activities to the detriment of other cyber users. According to several reports discussed in the introduction of this work, millions of people have become victims of fake Uniform Resource Locators (URLs) sent to their mailboxes by spammers. Financial institutions have also recorded monumental losses through this illicit act over the years. Despite the several approaches currently in place, none can confidently be said to provide the best and most reliable solution. Several research findings reported in the literature demonstrate how machine learning algorithms can be employed to verify and confirm compromised and fake URLs in cyberspace. However, inconsistencies have been noticed in these findings, and the corresponding results are not dependable given the values obtained and the conclusions drawn from them. Against this backdrop, the authors carried out a comparative analysis of three learning algorithms (Naïve Bayes, Decision Tree and Logistic Regression) for the verification of compromised, suspicious and fake URLs. They then determined which algorithm performs best according to the evaluation metrics used (F-Measure, Precision and Recall). Based on confusion-matrix measurements, the results show that the Decision Tree (ID3) algorithm achieves the highest recall, precision and F-measure.
It unarguably provides an efficient and credible means of maximizing the detection of compromised and malicious URLs. Finally, for future work, the authors suggest that two or more supervised learning algorithms could be hybridized to form a single, more effective and efficient algorithm for fake URL verification.
Keywords: Learning-algorithms, Forged-URL, Phoney-URL, performance-comparison
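Precision, recall and F-measure all come from the confusion-matrix counts for the malicious-URL class. The minimal sketch below computes them from hypothetical counts.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for the positive (malicious URL) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Hypothetical counts: 90 malicious URLs caught, 10 benign URLs flagged, 30 malicious URLs missed.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.9 0.75 0.818
```

The F-measure is a harmonic mean, so it sits closer to the weaker of the two scores. A detector cannot hide a poor recall behind a high precision.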

