Comparison of Machine Learning Algorithms and Datasets to Classify Natural Coverage in the Cajas National Park (Ecuador) Based on GEOBIA Approach

Proceedings ◽  
2019 ◽  
Vol 19 (1) ◽  
pp. 20
Author(s):  
Diego Pacheco Prado ◽  
Luis Ángel Ruiz

GEOBIA is an alternative approach for creating and updating land cover maps. In this work we assessed combinations of geographic datasets of the Cajas National Park (Ecuador) to determine the most appropriate dataset-algorithm combination for classification tasks in the Ecuadorian Andean region. The datasets included high-resolution data such as a photogrammetric orthomosaic, a DEM and the derived slope. These data were compared with free Sentinel imagery to classify natural land covers. We evaluated two aspects of the classification problem: the appropriate algorithm and the dataset combination. We evaluated the SMO, C4.5 and Random Forest algorithms for the selection of attributes and the classification of objects. The best kappa values in the comparison of classification algorithms were obtained with SMO (0.8182) and Random Forest (0.8117). In the evaluation of datasets, the photogrammetric orthomosaic and the combination of Sentinel-1 and Sentinel-2 yielded similar kappa values using the C4.5 algorithm.
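
A minimal sketch (not the authors' code) of this kind of comparison in scikit-learn, using Cohen's kappa as the metric: SVC with a linear kernel stands in for WEKA's SMO, DecisionTreeClassifier for C4.5, and synthetic object features replace the real segment attributes.

```python
# Compare three classifiers on object-based features by Cohen's kappa.
# Stand-ins: linear SVC for WEKA's SMO, DecisionTreeClassifier for C4.5,
# and synthetic features in place of real GEOBIA segment attributes.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic "image objects": spectral/terrain attributes per segment.
X, y = make_classification(n_samples=600, n_features=12, n_informative=8,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0, stratify=y)

models = {
    "SMO-like SVM": SVC(kernel="linear", C=1.0),
    "C4.5-like tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    kappa = cohen_kappa_score(y_test, model.fit(X_train, y_train).predict(X_test))
    print(f"{name}: kappa = {kappa:.4f}")
```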

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sakthi Kumar Arul Prakash ◽  
Conrad Tucker

Abstract This work investigates the ability to classify misinformation in online social media networks in a manner that avoids the need for ground truth labels. Rather than approach the classification problem as a task for humans or machine learning algorithms, this work leverages user–user and user–media (i.e., media likes) interactions to infer the type of information (fake vs. authentic) being spread, without needing to know the actual details of the information itself. To study the inception and evolution of user–user and user–media interactions over time, we create an experimental platform that mimics the functionality of real-world social media networks. We develop a graphical model that considers the evolution of this network topology to model the uncertainty (entropy) propagation when fake and authentic media disseminate across the network. The creation of a real-world social media network enables a wide range of hypotheses to be tested pertaining to users, their interactions with other users, and with media content. The discovery that the entropy of user–user and user–media interactions approximates fake and authentic media likes enables us to classify fake media in an unsupervised manner.
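
The following is an illustrative sketch only, not the authors' graphical model: it computes the Shannon entropy of per-media "like" distributions on synthetic interaction counts and splits media into two unlabeled groups. The Poisson simulation and the 2-cluster split are assumptions made purely for illustration.

```python
# Compute Shannon entropy of user-media "like" interactions per media item
# and split items into two groups without labels (toy stand-in for the
# paper's entropy-based, unsupervised fake/authentic separation).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_media, n_users = 50, 200

# Synthetic interaction counts: rows = media items, columns = users.
likes = rng.poisson(lam=rng.uniform(0.1, 2.0, size=(n_media, 1)),
                    size=(n_media, n_users))

# Per-media distribution of likes across users (small epsilon avoids log(0)).
probs = (likes + 1e-9) / (likes.sum(axis=1, keepdims=True) + n_users * 1e-9)
media_entropy = -(probs * np.log(probs)).sum(axis=1)

# Unsupervised split into two groups by entropy.
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    media_entropy.reshape(-1, 1))
print(media_entropy[:5], groups[:5])
```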


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Faizan Ullah ◽  
Qaisar Javaid ◽  
Abdu Salam ◽  
Masood Ahmad ◽  
Nadeem Sarwar ◽  
...  

Ransomware (RW) is a distinctive variety of malware that encrypts files or locks the user’s system by taking their files hostage, which leads to huge financial losses for users. In this article, we propose a new model that extracts novel features from an RW dataset and classifies RW and benign files. The proposed model can detect a large number of RW samples from various families at runtime, scanning the network, registry activities, and file system throughout execution. API-call sequences are used to represent the behavior-based features of RW. The technique extracts a fourteen-feature vector at runtime and analyzes it with online machine learning algorithms to predict RW. To validate effectiveness and scalability, we test 78,550 recent malicious and benign samples and compare the proposed approach with random forest and AdaBoost, achieving a testing accuracy of 99.56%.
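
A hedged sketch of the general idea rather than the authors' pipeline: an online learner updated with partial_fit on 14-dimensional behavioural feature vectors, compared against a batch random forest. The SGDClassifier, the mini-batch size, and the synthetic data are all assumptions.

```python
# Online learning on 14-dimensional behavioural feature vectors vs. a batch
# Random Forest baseline; synthetic data replace real API-call features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=14, n_informative=10,
                           weights=[0.6, 0.4], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Online model: consume the training data as a stream of mini-batches.
online = SGDClassifier(random_state=1)
classes = np.unique(y_tr)
for start in range(0, len(X_tr), 500):
    online.partial_fit(X_tr[start:start + 500], y_tr[start:start + 500],
                       classes=classes)

batch = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)
print("online accuracy:", online.score(X_te, y_te))
print("random forest accuracy:", batch.score(X_te, y_te))
```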


Author(s):  
Shweta Dabetwar ◽  
Stephen Ekwaro-Osire ◽  
João Paulo Dias

Abstract Composite materials have tremendous and ever-increasing applications in complex engineering systems; thus, it is important to develop non-destructive and efficient condition monitoring methods to improve damage prediction, thereby avoiding catastrophic failures and reducing standby time. Non-destructive condition monitoring techniques, when combined with machine learning applications, can contribute towards these improvements. Thus, the research question considered in this paper is: “Can machine learning techniques provide efficient damage classification of composite materials to improve condition monitoring using features extracted from acousto-ultrasonic measurements?” To answer this question, acousto-ultrasonic signals in carbon fiber reinforced polymer (CFRP) composites at distinct damage levels were taken from the NASA Ames prognostics data repository. Statistical condition indicators of the signals were used as features to train and test four traditional machine learning algorithms, namely k-nearest neighbors, support vector machine, decision tree and random forest, and their performance was compared and discussed. Results showed higher accuracy for random forest, with a strong dependency on the feature extraction/selection techniques employed. By combining data analysis from acousto-ultrasonic measurements in composite materials with machine learning tools, this work contributes to the development of intelligent damage classification algorithms that can be applied to advanced online diagnostics and health management strategies for composite materials operating under more complex working conditions.
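
An illustrative sketch, not the study's code: simple statistical condition indicators are extracted from waveform segments and fed to the four classifiers named in the abstract. The toy signal generator and feature choices are assumptions; synthetic signals stand in for the NASA CFRP data.

```python
# Extract statistical condition indicators from synthetic waveforms and
# compare KNN, SVM, decision tree and random forest by cross-validation.
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

def make_signal(damage_level, n=1024):
    # Toy assumption: higher "damage" adds higher-amplitude noise.
    t = np.linspace(0, 1, n)
    return np.sin(2 * np.pi * 50 * t) + damage_level * rng.normal(0, 0.5, n)

signals, labels = [], []
for level in range(4):                      # four damage levels
    for _ in range(50):
        s = make_signal(level)
        signals.append([s.mean(), s.var(), skew(s), kurtosis(s), np.abs(s).max()])
        labels.append(level)

X, y = np.array(signals), np.array(labels)
for name, clf in [("KNN", KNeighborsClassifier()), ("SVM", SVC()),
                  ("Decision Tree", DecisionTreeClassifier(random_state=0)),
                  ("Random Forest", RandomForestClassifier(random_state=0))]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```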


Author(s):  
Sanjiban Sekhar Roy ◽  
Pulkit Kulshrestha ◽  
Pijush Samui

Drought is a condition in which the land faces a severe shortage of groundwater. This condition affects the survival of plants and animals, and drought can severely impact ecosystem and agricultural productivity; hence, the economy is also affected. This paper proposes the Deep Belief Network (DBN), a state-of-the-art machine learning technique, for the classification of drought and non-drought images. In addition, k-nearest neighbour (kNN) and random forest methods are applied to the same images, and the performance of the DBN is compared with both. The dataset is split into training and test sets at ratios of 80:20, 70:30 and 60:40. Finally, the effectiveness of the three proposed models is measured by various performance metrics.
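
A rough sketch under stated assumptions: scikit-learn has no full Deep Belief Network, so a BernoulliRBM plus logistic regression pipeline stands in for the DBN, and synthetic feature vectors stand in for the drought imagery. The three split ratios mirror those in the abstract.

```python
# Compare a DBN-like pipeline, kNN and random forest at three train/test ratios.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=1000, n_features=64, n_informative=20,
                           random_state=7)

models = {
    "DBN-like (RBM + LR)": Pipeline([
        ("scale", MinMaxScaler()),
        ("rbm", BernoulliRBM(n_components=32, random_state=7)),
        ("lr", LogisticRegression(max_iter=1000)),
    ]),
    "kNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=7),
}
for label, test_size in [("80:20", 0.2), ("70:30", 0.3), ("60:40", 0.4)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size,
                                              random_state=7)
    for name, model in models.items():
        print(label, name, round(model.fit(X_tr, y_tr).score(X_te, y_te), 3))
```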


2019 ◽  
Vol 3 (1) ◽  
pp. 58
Author(s):  
Yefta Christian

The growth of online stores nowadays is very rapid, supported by faster and better internet infrastructure. This growth makes competition in this business field more difficult, so it is necessary for online stores to have a website or application that can measure and classify consumers’ purchasing intentions, encouraging consumers who browse the site or application to eventually make purchases. Classification of online shoppers’ intentions can be performed with several algorithms, such as Naïve Bayes, Multi-Layer Perceptron, Support Vector Machine, Random Forest and J48 Decision Trees. In this study, the algorithms are compared with two tools, WEKA and Scikit-Learn, using the values of F1-score, accuracy, Kappa statistic and mean absolute error. There is a difference between the test results of WEKA and Scikit-Learn for the Support Vector Machine algorithm. Based on this research, the Random Forest algorithm is the most appropriate algorithm for classifying online shoppers’ intentions.
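
A sketch of the scikit-learn side of such a comparison only (the WEKA side is omitted): DecisionTreeClassifier stands in for WEKA's J48, and synthetic data replace the online-shoppers dataset, whose loading is environment-specific.

```python
# Compare the five classifiers named in the abstract by F1, accuracy and kappa.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=17, weights=[0.85, 0.15],
                           random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=3)

classifiers = {
    "Naive Bayes": GaussianNB(),
    "MLP": MLPClassifier(max_iter=500, random_state=3),
    "SVM": SVC(random_state=3),
    "Random Forest": RandomForestClassifier(random_state=3),
    "J48-like tree": DecisionTreeClassifier(random_state=3),
}
for name, clf in classifiers.items():
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: F1={f1_score(y_te, pred):.3f} "
          f"acc={accuracy_score(y_te, pred):.3f} "
          f"kappa={cohen_kappa_score(y_te, pred):.3f}")
```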


2009 ◽  
Vol 50 ◽  
Author(s):  
Gintautas Jakimauskas

Let us have a sample from a d-dimensional Gaussian mixture model, where d is assumed to be large. The problem of classification of the sample is considered. Because of the large dimension, it is natural to project the sample to k-dimensional (k = 1, 2, . . .) linear subspaces using the projection pursuit method, which gives the best selection of these subspaces. Having an estimate of the discriminant subspace, we can perform classification using the projected sample, thus avoiding the 'curse of dimensionality'. An essential step in this method is testing the goodness-of-fit of the estimated d-dimensional model, assuming that the distribution on the complement space is standard Gaussian. We present a simple, data-driven and computationally efficient procedure for testing goodness-of-fit. The procedure is based on the well-known interpretation of goodness-of-fit testing as a classification problem, a special sequential data partition procedure, randomization and resampling, and elements of sequential testing. Monte Carlo simulations are used to assess the performance of the procedure.
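
A toy sketch of the general idea only: a high-dimensional Gaussian mixture sample is projected to a low-dimensional subspace and classified there. PCA is used as a crude stand-in for the paper's projection pursuit estimate of the discriminant subspace; the data generation is an assumption.

```python
# Project a d-dimensional mixture sample to k dimensions, then classify
# (cluster) the projected sample with a low-dimensional Gaussian mixture.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
d, n = 50, 1000

# Two well-separated Gaussian components in d dimensions.
mean_shift = np.zeros(d)
mean_shift[:2] = 4.0
X = np.vstack([rng.normal(0, 1, size=(n // 2, d)),
               rng.normal(0, 1, size=(n // 2, d)) + mean_shift])

# Estimate a k-dimensional subspace, then cluster the projected sample.
k = 2
Z = PCA(n_components=k, random_state=0).fit_transform(X)
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(Z)
print(np.bincount(labels))
```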


2021 ◽  
Vol 7 (2) ◽  
pp. 863-866
Author(s):  
Yedukondala Rao Veeranki ◽  
Nagarajan Ganapathy ◽  
Ramakrishnan Swaminathan

Abstract In this work, the feasibility of time-frequency methods, namely the short-time Fourier transform, the Choi-Williams distribution, and the smoothed pseudo-Wigner-Ville distribution, in the classification of happy and sad emotional states from electrodermal activity signals has been explored. For this, annotated happy and sad signals are obtained from an online public database and decomposed into phasic components. Time-frequency analysis has been performed on the phasic components using the three methods. Four statistical features, namely mean, variance, kurtosis, and skewness, are extracted from each method. Four classifiers, namely logistic regression, Naive Bayes, random forest, and support vector machine, have been used for the classification. The combination of the smoothed pseudo-Wigner-Ville distribution and random forest yields the highest F-measure of 68.74% for classifying happy and sad emotional states. Thus, it appears that the suggested technique could be helpful in the diagnosis of clinical conditions linked to happy and sad emotional states.
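
A hedged sketch showing only the short-time Fourier transform branch (Choi-Williams and smoothed pseudo-Wigner-Ville distributions are not available in SciPy). The sampling rate, the toy signal model, and the synthetic "happy"/"sad" dynamics are assumptions standing in for the phasic electrodermal-activity components.

```python
# STFT-based statistical features from synthetic signals, classified with the
# four classifiers named in the abstract.
import numpy as np
from scipy.signal import stft
from scipy.stats import kurtosis, skew
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(1)
fs = 16  # Hz, an assumed sampling rate

def features(signal):
    _, _, Z = stft(signal, fs=fs, nperseg=64)
    mag = np.abs(Z).ravel()
    return [mag.mean(), mag.var(), kurtosis(mag), skew(mag)]

X, y = [], []
for label, drift in [(0, 0.02), (1, 0.08)]:      # toy stand-ins for two states
    for _ in range(60):
        t = np.arange(0, 60, 1 / fs)
        sig = drift * t + rng.normal(0, 0.1, t.size)
        X.append(features(sig))
        y.append(label)

for name, clf in [("LogReg", LogisticRegression(max_iter=1000)),
                  ("Naive Bayes", GaussianNB()),
                  ("Random Forest", RandomForestClassifier(random_state=1)),
                  ("SVM", SVC())]:
    print(name, cross_val_score(clf, np.array(X), np.array(y), cv=5).mean())
```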


2019 ◽  
Vol 20 (S19) ◽  
Author(s):  
Sean Chun-Chang Chen ◽  
Chung-Ming Lo ◽  
Shih-Hua Wang ◽  
Emily Chia-Yu Su

Abstract Background Accurate classification of diffuse gliomas, the most common tumors of the central nervous system in adults, is important for appropriate treatment. However, detection of isocitrate dehydrogenase (IDH) mutation and chromosome 1p/19q codeletion, biomarkers used to classify gliomas, is time- and cost-intensive, and diagnostic discordance remains an issue. Adenosine-to-inosine (A-to-I) RNA editing has emerged as a novel cancer prognostic marker, but its value for glioma classification remains largely unexplored. We aim to (1) unravel the relationship between RNA editing and IDH mutation and 1p/19q codeletion and (2) predict IDH mutation and 1p/19q codeletion status using machine learning algorithms. Results By characterizing genome-wide A-to-I RNA editing signatures of 638 gliomas, we found that tumors without IDH mutation exhibited a higher total editing level compared with those carrying it (Kolmogorov-Smirnov test, p < 0.0001). When tumor grade was considered, however, only grade IV tumors without IDH mutation exhibited a higher total editing level. According to 10-fold cross-validation, support vector machines (SVM) outperformed random forest and AdaBoost (DeLong test, p < 0.05). The areas under the receiver operating characteristic curve (AUC) of SVM in predicting IDH mutation and 1p/19q codeletion were 0.989 and 0.990, respectively. After performing feature selection, the AUCs of SVM and AdaBoost in predicting IDH mutation were higher than that of random forest (0.985 and 0.983 vs. 0.977; DeLong test, p < 0.05), but the AUCs of the three algorithms in predicting 1p/19q codeletion were similar (0.976–0.982). Furthermore, 67% of the six consistently misclassified samples by our 1p/19q codeletion prediction models were misclassifications in the original labelling after inspection of 1p/19q status and/or pathology report, highlighting the accuracy and clinical utility of our models. Conclusions The study represents the first genome-wide analysis of the glioma editome and identifies RNA editing as a novel prognostic biomarker for glioma. Our prediction models provide standardized, accurate, reproducible and objective classification of gliomas. Our models are not only useful in clinical decision-making, but also able to identify editing events that have the potential to serve as biomarkers and therapeutic targets in glioma management and treatment.
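
A sketch of the model-comparison step only, assuming synthetic features in place of the genome-wide RNA-editing signatures; the DeLong test and feature selection are not shown.

```python
# Compare SVM, random forest and AdaBoost by mean AUC under 10-fold CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=638, n_features=200, n_informative=30,
                           weights=[0.55, 0.45], random_state=5)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=5)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=5)),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=5),
    "AdaBoost": AdaBoostClassifier(random_state=5),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```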


2016 ◽  
Vol 12 (S325) ◽  
pp. 173-179 ◽  
Author(s):  
Qi Feng ◽  
Tony T. Y. Lin ◽  

Abstract Imaging atmospheric Cherenkov telescopes (IACTs) are sensitive to rare gamma-ray photons buried in the background of charged cosmic-ray (CR) particles, the flux of which is several orders of magnitude greater. The ability to separate gamma rays from CR particles is important, as it is directly related to the sensitivity of the instrument. This gamma-ray/CR-particle classification problem in IACT data analysis can be treated with rapidly advancing machine learning algorithms, which have the potential to outperform traditional box-cut methods on image parameters. We present preliminary results of a precise classification of a small set of muon events using a convolutional neural network model with raw images as input features. We also show the possibility of using the convolutional neural network model for regression problems, such as the measurement of the radius and brightness of muon events, which can be used to calibrate the throughput efficiency of IACTs.
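
A minimal sketch, not the authors' network: a small convolutional model with a classification head and a regression head, run on synthetic camera-like images. PyTorch, the layer sizes, and the 48x48 image shape are all assumptions for illustration.

```python
# Small CNN with two heads: event classification and a ring-radius-like
# regression target, trained here for a single illustrative step.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.classify = nn.Linear(32 * 12 * 12, 2)   # muon vs. non-muon (toy)
        self.regress = nn.Linear(32 * 12 * 12, 1)    # e.g. ring radius (toy)

    def forward(self, x):
        h = self.features(x)
        return self.classify(h), self.regress(h)

# Synthetic 48x48 "camera images" and one combined training step.
model = SmallCNN()
images = torch.randn(8, 1, 48, 48)
labels = torch.randint(0, 2, (8,))
radii = torch.rand(8, 1)

logits, radius_pred = model(images)
loss = nn.CrossEntropyLoss()(logits, labels) + nn.MSELoss()(radius_pred, radii)
loss.backward()
print(float(loss))
```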

