Decomposition Methods for Machine Learning with Small, Incomplete or Noisy Datasets

2020 ◽  
Vol 10 (23) ◽  
pp. 8481
Author(s):  
Cesar Federico Caiafa ◽  
Jordi Solé-Casals ◽  
Pere Marti-Puig ◽  
Sun Zhe ◽  
Toshihisa Tanaka

In many machine learning applications, measurements are incomplete or noisy, resulting in missing features. In other cases, and for different reasons, the available datasets are small, so more data samples are required to derive useful supervised or unsupervised classification methods. Correct handling of incomplete, noisy or small datasets is a fundamental and classic challenge in machine learning. In this article, we provide a unified review of recently proposed methods based on signal decomposition for missing-feature imputation (data completion), classification of noisy samples and artificial generation of new data samples (data augmentation). We illustrate the application of these signal decomposition methods in selected practical machine learning examples, including brain-computer interfaces, classification of epileptic intracranial electroencephalogram signals, face recognition/verification and water network data analysis. We show that a signal decomposition approach can provide valuable tools to improve machine learning performance with low-quality datasets.
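To make the idea of decomposition-based data completion concrete, the following is a minimal sketch that fills missing entries with an iteratively refitted truncated SVD. It assumes a simple low-rank model and synthetic data, and it is not necessarily one of the methods reviewed in the article; the function name `impute_low_rank` and all parameter values are hypothetical.

```python
import numpy as np

def impute_low_rank(X, mask, rank=2, n_iter=200):
    """Fill entries of X where mask is False using a rank-`rank` SVD model."""
    X = np.asarray(X, dtype=float)
    # Initialise the missing entries with the mean of the observed ones.
    filled = np.where(mask, X, X[mask].mean())
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed entries untouched; update only the missing ones.
        filled = np.where(mask, X, low_rank)
    return filled

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))  # synthetic exact rank-2 data
mask = rng.random(A.shape) > 0.2                         # True = observed (about 80%)
A_hat = impute_low_rank(A, mask, rank=2)

# Relative reconstruction error measured on the missing entries only.
rel_err = np.linalg.norm((A_hat - A)[~mask]) / np.linalg.norm(A[~mask])
print(f"relative error on missing entries: {rel_err:.3e}")
```

The sketch relies on the data being well approximated by a low-rank matrix; real measurements would typically also need a rank selection step and some regularisation.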

Author(s):  
Ivan Herreros

This chapter discusses basic concepts from control theory and machine learning to facilitate a formal understanding of animal learning and motor control. It first distinguishes between feedback and feed-forward control strategies, and then introduces the classification of machine learning applications into supervised, unsupervised, and reinforcement learning problems. Next, it links these concepts with their counterparts in the psychology of animal learning, highlighting the analogies between supervised learning and classical conditioning, between reinforcement learning and operant conditioning, and between unsupervised and perceptual learning. Additionally, it interprets innate and acquired actions from the standpoint of feedback versus anticipatory and adaptive control. Finally, it argues that this framework of translating knowledge between formal and biological disciplines can serve not only to structure and advance our understanding of brain function but also to enrich engineering solutions in robot learning and control with insights from biology.


Author(s):  
Sherif Kamel ◽  
Rehab Al-harbi

The rapid growth in the number of autism cases among toddlers calls for the development of easily implemented and effective screening methods. The causes of Autism Spectrum Disorder (ASD) are not yet known; however, the diagnosis and detection of ASD are based on behaviours and symptoms. This paper aims to improve ASD prediction accuracy among toddlers by applying the Logistic Regression model from Machine Learning to a collected health care dataset, and by using an algorithm for rapid classification of behaviours to check whether children have autism according to the information in the dataset. Machine Learning thereby decreases the time needed to detect the disorder, so that the necessary health services can be provided early to affected toddlers to improve their quality of life. In healthcare, most machine learning applications are in the research stage, and to take advantage of emerging software tools that incorporate artificial intelligence, healthcare organizations first need to overcome a variety of challenges.
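As a rough illustration of how a logistic regression classifier can be trained on binary screening answers, the sketch below fits the model by plain gradient descent. Everything here is an assumption for illustration: the data are synthetic, the labelling rule is invented, and the helper names `fit_logistic` and `predict` are hypothetical; it does not reproduce the paper's dataset or pipeline.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Fit logistic regression weights (bias first) by batch gradient descent."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        # Gradient of the mean log-loss with respect to the weights.
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad
    return w

def predict(w, X):
    """Return 0/1 predictions using a 0.5 probability threshold."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (sigmoid(Xb @ w) >= 0.5).astype(int)

rng = np.random.default_rng(0)
# Synthetic stand-in: 400 children, 10 yes/no screening answers each.
X = rng.integers(0, 2, size=(400, 10)).astype(float)
# Invented labelling rule: flag a child when more than 5 answers are "yes".
y = (X.sum(axis=1) > 5).astype(int)

w = fit_logistic(X, y)
accuracy = (predict(w, X) == y).mean()
print(f"training accuracy: {accuracy:.3f}")
```

In practice the accuracy would be measured on held-out data rather than on the training set, which is used here only to keep the sketch short.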


2019 ◽  
Vol 43 (4) ◽  
pp. 677-691
Author(s):  
A.A. Sirota ◽  
A.O. Donskikh ◽  
A.V. Akimov ◽  
D.A. Minakov

The problem of non-parametric multivariate density estimation for machine learning and data augmentation is considered. A new mixed density estimation method is proposed, based on the convolution of independently obtained kernel density estimates for the unknown distributions of informative features with a known (or independently estimated) density of the non-informative interference that occurs during measurement. The properties of the mixed density estimates obtained with this method are analyzed. The method is compared with the conventional Parzen-Rosenblatt window method applied directly to the training data. The equivalence of the mixed kernel density estimator and a data augmentation procedure based on the known (or estimated) statistical model of interference is proven theoretically and experimentally. The applicability of the mixed density estimators to training machine learning algorithms for the classification of biological objects (elements of grain mixtures) based on spectral measurements in the visible and near-infrared regions is evaluated.
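The stated equivalence between a mixed kernel density estimate and noise-based data augmentation can be checked numerically in a simplified setting. The sketch below assumes one dimension, a Gaussian kernel and Gaussian interference, in which case convolving the kernel estimate with the interference density simply widens the bandwidth from h to sqrt(h^2 + sigma^2). The function `parzen_kde` and all numeric values are hypothetical and are not taken from the paper.

```python
import numpy as np

def parzen_kde(points, grid, h):
    """Parzen-Rosenblatt estimate on `grid` with a Gaussian kernel of bandwidth h."""
    z = (grid[:, None] - points[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(points) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
features = rng.normal(0.0, 1.0, size=100)  # synthetic informative feature
h, sigma = 0.3, 0.5                        # kernel bandwidth, interference std (assumed)
grid = np.linspace(-6.0, 6.0, 201)

# Mixed estimate: Gaussian KDE convolved with N(0, sigma^2) is again a
# Gaussian KDE, with bandwidth sqrt(h^2 + sigma^2).
mixed = parzen_kde(features, grid, np.sqrt(h**2 + sigma**2))

# Augmentation: replicate every sample and add simulated interference,
# then apply the ordinary KDE with the original bandwidth h.
copies = 200
augmented = np.repeat(features, copies) + rng.normal(0.0, sigma, size=features.size * copies)
aug_kde = parzen_kde(augmented, grid, h)

max_gap = np.abs(mixed - aug_kde).max()
print(f"largest pointwise difference: {max_gap:.4f}")
```

The remaining gap is Monte Carlo error from the finite number of noisy copies; it shrinks as `copies` grows.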


Author(s):  
Sridharan Naveen Venkatesh ◽  
Vaithiyanathan Sugumaran

Fault diagnosis plays a significant role in enhancing the useful lifetime, power output, and reliability of photovoltaic modules (PVM). Visual faults such as burn marks, delamination, discoloration, glass breakage, and snail trails make detection of faults difficult under harsh environmental conditions. Researchers have made several attempts to identify visual faults in a PVM. However, most previous studies centered on the identification and analysis of a limited number of faults. This article presents the use of a deep convolutional neural network (CNN) to extract image features, followed by classification of faults with machine learning (ML) algorithms. In contrast to present-day work, five different fault conditions were considered in the study. The proposed solution consists of three phases to analyze various PVM defects. First, the module images are acquired using unmanned aerial vehicles (UAVs) and data augmentation is performed to generate a uniform dataset. Afterward, a pre-trained deep CNN is adopted for image feature extraction. Finally, the extracted image features are classified with various ML classifiers. The final results show the effectiveness of the pre-trained deep CNN and the accurate performance of the ML classifiers. The best-in-class ML classifier for multiple fault classification is suggested based on the performance comparison.


2021 ◽  
Author(s):  
Arif Jahangir

Traumatic Brain Injury is the primary cause of death and disability all over the world. Monitoring the intracranial pressure (ICP) and classifying it for hypertension signals is of crucial importance. This thesis explores the possibility of a better classification of the ICP signal and of detecting hypertensive signals prior to the actual occurrence of hypertensive episodes. This study differs from other approaches in that the time series is converted into images by Gramian angular field and Markov transition matrix, and the data are augmented. Because the data are unbalanced, the effect of the SMOTE extended nearest neighbour algorithm for balancing the data is examined. We use various machine learning algorithms to classify the ICP signals. The results obtained show that AdaBoost performs best among the compared algorithms. The F1 score of AdaBoost is 0.95 on the original dataset and 0.9967 on the balanced and augmented dataset. The Quadratic Discriminant Analysis F1 score is 1 when the data are augmented and balanced.
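The conversion of a time series into an image by a Gramian angular field can be sketched as follows. This assumes the summation variant, in which the series is rescaled to [-1, 1], mapped to angles with arccos, and entry (i, j) of the image is cos(phi_i + phi_j). The abstract does not say which variant the thesis uses, and the sine wave below is only a synthetic stand-in for an ICP segment; the function name is hypothetical.

```python
import numpy as np

def gramian_angular_field(series):
    """Return the Gramian angular summation field of a 1-D, non-constant series."""
    x = np.asarray(series, dtype=float)
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    x = np.clip(x, -1.0, 1.0)  # guard against rounding just outside the interval
    phi = np.arccos(x)
    # Entry (i, j) is cos(phi_i + phi_j).
    return np.cos(phi[:, None] + phi[None, :])

t = np.linspace(0.0, 1.0, 64)
signal = 10.0 + 5.0 * np.sin(2 * np.pi * 3 * t)  # synthetic stand-in for an ICP segment
image = gramian_angular_field(signal)
print(image.shape)
```

The resulting square array can be passed to an image classifier like any single-channel picture; a constant series would need special handling because the rescaling divides by its range.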


Author(s):  
Ozlem Karabiber Cura ◽  
Gulce Cosku Yilmaz ◽  
Hatice Sabiha Ture ◽  
Aydin Akan

2018 ◽  
Vol 77 (16) ◽  
pp. 21305-21327 ◽  
Author(s):  
Eltaf Abdalsalam Mohamed ◽  
Mohd Zuki Yusoff ◽  
Aamir Saeed Malik ◽  
Mohammad Rida Bahloul ◽  
Dalia Mahmoud Adam ◽  
...  

2020 ◽  
Vol 15 ◽  
pp. 1-9
Author(s):  
Paulo Cesar Ossani ◽  
Diogo Francisco Rossoni ◽  
Marcelo Ângelo Cirillo ◽  
Flávio Meira Borém
