Transformer Oil Quality Assessment Using Random Forest with Feature Engineering

Machine learning is widely applied in engineering, including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. A simple and effective machine-learning approach was therefore proposed to monitor the condition of transformer oils based on several aging indicators. The approach was used to compare the performance of two machine-learning classifiers: the J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: oils that can be maintained in service, oils that should be reconditioned or filtered, oils that should be reclaimed, and oils that must be discarded. Of the two algorithms, random forest performed better and achieved high accuracy with only a small amount of data. The good performance resulted not only from the choice of algorithm but also from the data preprocessing. Before being fed to the classification model, the available data were transformed using the simple k-means method. The transformed data were then filtered through correlation-based feature selection (CFsSubset). The resulting features were transformed again by principal component analysis and passed through the CFsSubset filter a second time. This transformation and filtering improved the classification performance of both algorithms, especially random forest. Another advantage of the proposed method is that it reduces the amount of data required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.
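The correlation-based feature selection step in this abstract scores a feature subset by how strongly its features correlate with the class and how weakly they correlate with each other. The sketch below illustrates that merit score under stated assumptions: it uses Pearson correlation on numerically encoded class labels, and the function names and toy data are hypothetical rather than taken from the paper. Production implementations may use other correlation measures.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant sequence has no defined correlation; treat it as 0.
    return cov / (sx * sy) if sx and sy else 0.0

def cfs_merit(features, target):
    """CFS merit of a subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean absolute feature-class correlation and
    r_ff is the mean absolute feature-feature correlation.
    """
    k = len(features)
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# Hypothetical data: four oil classes encoded 0..3, one informative
# feature and one feature that carries no class information.
target = [0, 0, 1, 1, 2, 2, 3, 3]
informative = [0.1, 0.2, 1.1, 0.9, 2.0, 2.2, 3.1, 2.9]
noise = [1, -1, 1, -1, 1, -1, 1, -1]

print(cfs_merit([informative], target))
print(cfs_merit([informative, noise], target))
```

In this toy case the subset containing the uninformative feature scores lower than the informative feature alone, which is the behaviour a subset filter relies on when discarding features.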

A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data

PLoS ONE ◽

10.1371/journal.pone.0258326 ◽

2021 ◽

Vol 16 (10) ◽

pp. e0258326

Author(s):

Wen Bo Liu ◽

Sheng Nan Liang ◽

Xi Wen Qin

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Random Forest ◽

Dimension Reduction ◽

Principal Component ◽

Classification Performance ◽

Kernel Functions ◽

Reduction Algorithm ◽

Expression Data ◽

Weighted Kernel

Gene expression data has the characteristics of high dimensionality and a small sample size and contains a large number of redundant genes unrelated to a disease. The direct application of machine learning to classify this type of data will not only incur a great time cost but will also sometimes fail to improve classification performance. To counter this problem, this paper proposes a dimension-reduction algorithm based on weighted kernel principal component analysis (WKPCA), constructs kernel function weights according to kernel matrix eigenvalues, and combines multiple kernel functions to reduce the feature dimensions. To further improve the dimensional reduction efficiency of WKPCA, t-class kernel functions are constructed, and corresponding theoretical proofs are given. Moreover, the cumulative optimal performance rate is constructed to measure the overall performance of WKPCA combined with machine learning algorithms. Naive Bayes, K-nearest neighbour, random forest, iterative random forest and support vector machine approaches are used as classifiers to analyse six real gene expression datasets. Compared with the all-variable model, linear principal component dimension reduction and single kernel function dimension reduction, the results show that the classification performance of the five machine learning methods mentioned above can be improved effectively by WKPCA dimension reduction.

Reliable Identification of Oolong Tea Species: Nondestructive Testing Classification Based on Fluorescence Hyperspectral Technology and Machine Learning

Agriculture ◽

10.3390/agriculture11111106 ◽

2021 ◽

Vol 11 (11) ◽

pp. 1106

Author(s):

Yan Hu ◽

Lijia Xu ◽

Peng Huang ◽

Xiong Luo ◽

Peng Wang ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Principal Component ◽

Classification Model ◽

Recursive Feature Elimination ◽

Support Vector ◽

K Nearest Neighbor ◽

Oolong Tea ◽

The Impact ◽

T Distribution

A rapid and nondestructive tea classification method is of great significance in today’s research. This study uses fluorescence hyperspectral technology and machine learning to distinguish Oolong tea by analyzing the spectral features of tea in the wavelength range from 475 to 1100 nm. The spectral data are preprocessed by multiplicative scatter correction (MSC) and standard normal variate (SNV), which can effectively reduce the impact of baseline drift and tilt. Then principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) are adopted for feature dimensionality reduction and visual display. Random Forest-Recursive Feature Elimination (RF-RFE) is used for feature selection. Decision Tree (DT), Random Forest Classification (RFC), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) are used to establish the classification model. The results show that MSC-RF-RFE-SVM is the best model for the classification of Oolong tea, with an accuracy of 100% on the training set and 98.73% on the test set. It can be concluded that fluorescence hyperspectral technology and machine learning are feasible for classifying Oolong tea.
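The SNV preprocessing step named in this abstract can be illustrated with a short sketch. It assumes the common definition, in which each spectrum is centred on its own mean and divided by its own standard deviation; the function name and the two toy spectra are hypothetical and are not data from the study.

```python
import statistics

def snv(spectrum):
    """Centre and scale one spectrum by its own mean and standard deviation.

    Assumes the spectrum is not constant (a zero standard deviation
    would cause a division by zero).
    """
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # sample standard deviation (n - 1)
    return [(value - mean) / std for value in spectrum]

# Two hypothetical spectra with the same shape but a different
# baseline offset and gain, as scatter effects might produce.
spectrum_a = [1.0, 2.0, 4.0, 3.0]
spectrum_b = [10.0 + 5.0 * value for value in spectrum_a]

corrected_a = snv(spectrum_a)
corrected_b = snv(spectrum_b)
print(corrected_a)
print(corrected_b)
```

After correction the two spectra coincide up to floating-point error, which shows why a per-spectrum transform of this kind suppresses additive and multiplicative baseline differences before feature selection.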

Machine Learning-Based Hourly Frost-Prediction System Optimized for Orchards Using Automatic Weather Station and Digital Camera Image Data

Atmosphere ◽

10.3390/atmos12070846 ◽

2021 ◽

Vol 12 (7) ◽

pp. 846

Author(s):

Ilseok Noh ◽

Hae-Won Doh ◽

Soo-Ock Kim ◽

Su-Hyun Kim ◽

Seoleun Shin ◽

...

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Digital Camera ◽

Image Data ◽

Classification Performance ◽

Classification Model ◽

Support Vector ◽

Observation Data ◽

Freezing Resistance

Spring frosts damage crops that have weakened freezing resistance after germination. We developed a machine learning (ML)-based frost-classification model and optimized it for orchard farming environments. First, logistic regression, decision tree, random forest, and support vector machine models were trained using balanced Korea Meteorological Administration (KMA) Automated Synoptic Observing System (ASOS) frost observation data for March from the last 10 years (2008–2017). Random forest and support vector machine models showed good classification performance and were selected as the main techniques, which were optimized for orchard fields based on initial frost occurrence times. The training period was then extended to March–April for 20 years (2000–2019). Finally, the model was applied to the KMA ASOS frost observation data from March to April 2020, which were not used in the previous steps, and RGB data were extracted by digital cameras installed in an orchard in Gyeonggi-do. The developed model successfully classified 117 of 139 frost observation cases from the domestic ASOS data and 35 of 37 orchard camera observations. The assumption of the initial frost occurrence time for training helped the most in improving the frost-classification model. These results clearly indicate that the frost-classification model using ML has applicable accuracy in orchard farming.
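The two result counts reported in this abstract translate into hit rates as follows. This is only the arithmetic on the published counts; the helper name is hypothetical, and the abstract does not state which other metrics the authors computed.

```python
def hit_rate(correct, total):
    """Fraction of observation cases that were classified correctly."""
    return correct / total

asos_rate = hit_rate(117, 139)    # KMA ASOS frost observation cases
orchard_rate = hit_rate(35, 37)   # orchard camera observations

print(f"ASOS: {asos_rate:.1%}, orchard: {orchard_rate:.1%}")
# prints "ASOS: 84.2%, orchard: 94.6%"
```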

Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in the form of polarity classification is important in everyday life: knowing whether a given document carries positive or negative sentiment helps people choose and make decisions. Sentiment analysis is usually done manually, so an automatic sentiment classification process is needed. However, studies that discuss which feature extraction methods and learning models suit unstructured sentiment analysis, with Amazon food reviews as the case, are rare. This research explores feature extraction methods such as bag-of-words, TF-IDF, Word2Vector, and a combination of TF-IDF and Word2Vector, together with several machine learning models such as Random Forest, SVM, KNN and Naïve Bayes, to find a combination of feature extraction and learning model that can help add variety to the analysis of polarity sentiments. With document preparation that handles HTML tags, punctuation and special characters and applies Snowball stemming, TF-IDF combined with SVM proved suitable for polarity classification in unstructured sentiment analysis for the Amazon food review case, with a performance of 87.3 percent.
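The TF-IDF weighting used as a feature extractor in this abstract can be sketched in a few lines. The version below assumes the plain textbook variant (term frequency normalised by document length, inverse document frequency as the natural log of N over document frequency); libraries typically apply smoothed variants, and the abstract does not say which one the authors used. The function name and the three toy reviews are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, already cleaned and tokenised reviews.
reviews = [
    "great taste great price".split(),
    "bad taste".split(),
    "great coffee".split(),
]
vectors = tf_idf(reviews)
print(vectors[1])
```

In the second review, "bad" receives a higher weight than "taste" because it occurs in only one document. Rare, class-specific terms standing out in this way is what makes the weights useful as classifier input.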

Machine learning, transcriptome, and genotyping chip analyses provide insights into SNP markers identifying flower color in Platycodon grandiflorus

Scientific Reports ◽

10.1038/s41598-021-87281-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Go-Eun Yu ◽

Younhee Shin ◽

Sathiyamoorthy Subramaniyam ◽

Sang-Ho Kang ◽

Si-Myung Lee ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Feature Selection Method ◽

Flower Color ◽

Classification Performance ◽

Snp Markers ◽

Rna Seq ◽

Color Classification ◽

Dna Pcr ◽

Selection Of

Bellflower is an edible ornamental gardening plant in Asia. To predict flower color in bellflower plants, a transcriptome-wide approach based on machine learning, transcriptome, and genotyping chip analyses was used to identify SNP markers. Six machine learning methods were deployed to explore the classification potential of the selected SNPs as features in two datasets, namely training (60 RNA-Seq samples) and validation (480 Fluidigm chip samples). SNP selection was performed sequentially. First, 96 SNPs were selected from the transcriptome-wide SNPs using principal component analysis (PCA). Then, 9 of the 96 SNPs were identified using the random forest-based feature selection method from the Fluidigm chip dataset. Among the six methods, the random forest (RF) model produced higher classification performance than the other models. The 9 SNP marker candidates selected for flower color classification were verified using genomic DNA PCR with Sanger sequencing. Our results suggest that this methodology could be used for future selection of breeding traits even though the plant accessions are highly heterogeneous.

Phybrata Sensors and Machine Learning for Enhanced Neurophysiological Diagnosis and Treatment

Sensors ◽

10.3390/s21217417 ◽

2021 ◽

Vol 21 (21) ◽

pp. 7417

Author(s):

Alex J. Hope ◽

Utkarsh Vashisth ◽

Matthew J. Parker ◽

Andreas B. Ralston ◽

Joshua M. Roper ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Random Forest ◽

Binary Classification ◽

Classification Performance ◽

Support Vector ◽

Use Case ◽

Signal Features ◽

Test Population

Concussion injuries remain a significant public health challenge. A significant unmet clinical need remains for tools that allow related physiological impairments and longer-term health risks to be identified earlier, better quantified, and more easily monitored over time. We address this challenge by combining a head-mounted wearable inertial motion unit (IMU)-based physiological vibration acceleration (“phybrata”) sensor and several candidate machine learning (ML) models. The performance of this solution is assessed for both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments. Results are compared with previously reported approaches to ML-based concussion diagnostics. Using phybrata data from a previously reported concussion study population, four different machine learning models (Support Vector Machine, Random Forest Classifier, Extreme Gradient Boost, and Convolutional Neural Network) are first investigated for binary classification of the test population as healthy vs. concussion (Use Case 1). Results are compared for two different data preprocessing pipelines, Time-Series Averaging (TSA) and Non-Time-Series Feature Extraction (NTS). Next, the three best-performing NTS models are compared in terms of their multiclass prediction performance for specific concussion-related impairments: vestibular, neurological, both (Use Case 2). For Use Case 1, the NTS model approach outperformed the TSA approach, with the two best algorithms achieving an F1 score of 0.94. For Use Case 2, the NTS Random Forest model achieved the best performance in the testing set, with an F1 score of 0.90, and identified a wider range of relevant phybrata signal features that contributed to impairment classification compared with manual feature inspection and statistical data analysis. 
The overall classification performance achieved in the present work exceeds previously reported approaches to ML-based concussion diagnostics using other data sources and ML models. This study also demonstrates the first combination of a wearable IMU-based sensor and ML model that enables both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments.
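The F1 scores quoted in this abstract combine precision and recall into a single number. The sketch below shows the standard binary definition and one possible multiclass extension; macro averaging is an assumption here, since the abstract does not say how the multiclass F1 was aggregated. The function names and the confusion counts are hypothetical.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_class_counts):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in per_class_counts]
    return sum(scores) / len(scores)

# Hypothetical (tp, fp, fn) counts for two impairment classes.
counts = [(8, 2, 2), (5, 0, 5)]
print(f1_score(8, 2, 2))
print(macro_f1(counts))
```

Macro averaging weights every class equally regardless of size, which matters when some classes are much rarer than others.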

The prototype device for non-invasive diagnosis of arteriovenous fistula condition using machine learning methods

Scientific Reports ◽

10.1038/s41598-020-72336-5 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Marcin Grochowina ◽

Lucyna Leniowska ◽

Agnieszka Gala-Błądzińska

Keyword(s):

Machine Learning ◽

Arteriovenous Fistula ◽

Low Cost ◽

Principal Component ◽

Special Kind ◽

Classification Model ◽

Supervised Machine Learning ◽

Signal Acquisition ◽

Non Invasive ◽

Prototype Device

Pattern recognition and automatic decision support methods provide significant advantages in the area of health protection. The aim of this work is to develop a low-cost tool for monitoring arteriovenous fistula (AVF) using the phono-angiography method. This article presents a diagnostic device, developed by the authors, that implements classification algorithms to identify patients at risk of vascular access stenosis in a group of 38 patients with end-stage renal disease who are chronically hemodialysed using an AVF. We report on the design, fabrication, and preliminary testing of a prototype device for non-invasive diagnosis, which is very important for hemodialysed patients. The system includes three sub-modules: AVF signal acquisition, information processing and classification, and a unit for presenting results. This is a non-invasive and inexpensive procedure for evaluating the sound pattern of the bruit produced by an AVF. A sound signal from the fistula was recorded with a special kind of head that has greater sensitivity than a conventional stethoscope. Signal acquisition was performed by dedicated software written specifically for our study. From the obtained phono-angiogram, 23 features were isolated for the vectors used in the decision-making algorithm, including 6 features based on the time-domain waveform and 17 features based on the frequency spectrum. The final composition of the feature vector was obtained using several selection methods: feature-class correlation, forward search, Principal Component Analysis, and the Joined-Pairs method. A supervised machine learning technique was then applied to develop the best classification model.

Distinguishing Focal Cortical Dysplasia From Glioneuronal Tumors in Patients With Epilepsy by Machine Learning

Frontiers in Neurology ◽

10.3389/fneur.2020.548305 ◽

2020 ◽

Vol 11 ◽

Author(s):

Yi Guo ◽

Yushan Liu ◽

Wenjie Ming ◽

Zhongjin Wang ◽

Junming Zhu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Focal Cortical Dysplasia ◽

Cortical Dysplasia ◽

Machine Learning Algorithms ◽

Classification Model ◽

Supervised Machine Learning ◽

Seizure Onset ◽

Glioneuronal Tumors ◽

Patients With Epilepsy

Purpose: We aim to build a supervised machine learning-based classifier to preoperatively distinguish focal cortical dysplasia (FCD) from glioneuronal tumors (GNTs) in patients with epilepsy. Methods: This retrospective study comprised 96 patients who underwent epilepsy surgery with a final neuropathologic diagnosis of either FCD or GNTs. Seven classical machine learning algorithms (i.e., Random Forest, SVM, Decision Tree, Logistic Regression, XGBoost, LightGBM, and CatBoost) were trained on our dataset to obtain the classification model. Ten features [i.e., Gender, Past history, Age at seizure onset, Course of disease, Seizure type, Seizure frequency, Scalp EEG biomarkers, MRI features, Lesion location, Number of antiepileptic drugs (AEDs)] were analyzed in our study. Results: We enrolled 56 patients with FCD and 40 patients with GNTs, which included 29 with gangliogliomas (GGs) and 11 with dysembryoplastic neuroepithelial tumors (DNTs). Our study demonstrated that the Random Forest-based machine learning model offered the best predictive performance in distinguishing FCD from GNTs, with an F1-score of 0.9180 and an AUC value of 0.9340. Furthermore, the most discriminative factor between FCD and GNTs was the feature “age at seizure onset”, with a Chi-square value of 1,213.0, suggesting that patients who had a younger age at seizure onset were more likely to be diagnosed with FCD. Conclusion: The Random Forest-based machine learning classifier can accurately differentiate FCD from GNTs in patients with epilepsy before surgery. This might lead to improved clinician confidence in appropriate surgical planning and treatment outcomes.

CPT Data Interpretation Employing Different Machine Learning Techniques

Geosciences ◽

10.3390/geosciences11070265 ◽

2021 ◽

Vol 11 (7) ◽

pp. 265

Author(s):

Stefan Rauter ◽

Franz Tschuchnigg

Keyword(s):

Machine Learning ◽

Grain Size ◽

Random Forest ◽

Classification Model ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Models ◽

Cone Penetration ◽

Tip Resistance ◽

Machine Learning Models

The classification of soils into categories with a similar range of properties is a fundamental geotechnical engineering procedure. At present, this classification is based on various types of cost- and time-intensive laboratory and/or in situ tests. These soil investigations are essential for each individual construction site and have to be performed prior to the design of a project. Since Machine Learning could play a key role in reducing the costs and time needed for a suitable site investigation program, the basic ability of Machine Learning models to classify soils from Cone Penetration Tests (CPT) is evaluated. To find an appropriate classification model, 24 different Machine Learning models, based on three different algorithms, are built and trained on a dataset consisting of 1339 CPT. The applied algorithms are a Support Vector Machine, an Artificial Neural Network and a Random Forest. As input features, different combinations of direct cone penetration test data (tip resistance qc, sleeve friction fs, friction ratio Rf, depth d), combined with “defined”, thus, not directly measured data (total vertical stresses σv, effective vertical stresses σ’v and hydrostatic pore pressure u0), are used. Standard soil classes based on grain size distributions and soil classes based on soil behavior types according to Robertson are applied as targets. The different models are compared with respect to their prediction performance and the required learning time. The best results for all targets were obtained with models using a Random Forest classifier. For the soil classes based on grain size distribution, an accuracy of about 75%, and for soil classes according to Robertson, an accuracy of about 97–99%, was reached.
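Some of the input features listed in this abstract are derived from the direct measurements rather than recorded. The sketch below shows the usual textbook relations for three of them, as an assumption about how such features are commonly computed: the abstract does not give the authors' formulas. The function names, the water table depth, and the numeric values are hypothetical.

```python
def friction_ratio(q_c, f_s):
    """Friction ratio R_f in percent; q_c and f_s must share the same unit."""
    return 100.0 * f_s / q_c

def hydrostatic_pore_pressure(depth, water_table_depth, gamma_w=9.81):
    """u0 in kPa for depths in m; zero above the water table.

    gamma_w is the unit weight of water in kN/m^3.
    """
    return gamma_w * max(depth - water_table_depth, 0.0)

def effective_vertical_stress(sigma_v, u0):
    """Effective stress as total vertical stress minus pore pressure (kPa)."""
    return sigma_v - u0

# Hypothetical reading at 5 m depth with the water table at 2 m.
r_f = friction_ratio(q_c=5000.0, f_s=50.0)       # both in kPa
u0 = hydrostatic_pore_pressure(5.0, 2.0)
sigma_v_eff = effective_vertical_stress(90.0, u0)
print(r_f, u0, sigma_v_eff)
```

Features of this kind add depth and groundwater context to the raw cone readings, which is one plausible reason for testing different feature combinations.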

METHOD FOR EXPERIMENTAL DETERMINATION OF THE LIQUID DIELECTRIC RESOURCE AND MEASURES FOR ITS RESTORATION

Kontrol Diagnostika ◽

10.14489/td.2019.08.pp.036-041 ◽

2019 ◽

pp. 36-41

Author(s):

S. P. Vysogorets ◽

A. N. Nazarychev ◽

A. A. Pugechov

Keyword(s):

Experimental Determination ◽

Oil Quality ◽

Technical Problem ◽

Transformer Oil ◽

Quality Stability ◽

Transformer Oils ◽

Resource Characteristics ◽

Theoretical Foundations

The theoretical foundations of how transformer oil quality characteristics change with the degree of aging are presented. The introduction of a new quality indicator for in-service transformer oils, “stability against oxidation”, is substantiated as a way of solving the scientific and technical problem of assessing the resource characteristics of a transformer insulating system. To select the best measures for maintaining the quality of power transformer insulating oils, a newly developed “Method for the Experimental Determination of the Liquid Dielectric Resource and Measures for its Restoration” is presented.
