Fujitsu, MIT research leads to AI models that recognize unseen data

Since bearing deterioration patterns are difficult to collect from real, long-lifetime scenarios, data-driven research has been directed towards recovering them through accelerated life tests. Consequently, features that are insufficiently recovered because of rapid damage propagation are more likely to lead to poorly generalized learning machines. Knowledge-driven learning offers a solution by providing prior assumptions through transfer learning. Likewise, the absence of true labels can create inconsistencies between samples, and teacher-given label behaviors lead to more ill-posed predictors. Therefore, in an attempt to overcome the drawbacks of incomplete, unlabeled data, a new autoencoder has been designed as an additional source that correlates inputs and labels by exploiting label information in a completely unsupervised learning scheme. Additionally, its stacked denoising version appears to recover them more robustly for new unseen data. Due to the non-stationary and sequentially driven nature of the samples, the recovered representations have been fed into a transfer learning, convolutional, long–short-term memory neural network to learn further meaningful representations. The assessment procedures were benchmarked against recent methods under different training datasets. The obtained results showed improved efficiency, confirming the strength of the new learning path.

Learning to Validate the Predictions of Black Box Machine Learning Models on Unseen Data

Proceedings of the Workshop on Human-In-the-Loop Data Analytics - HILDA'19 ◽

10.1145/3328519.3329126 ◽

2019 ◽

Author(s):

Sergey Redyuk ◽

Sebastian Schelter ◽

Tammo Rukat ◽

Volker Markl ◽

Felix Biessmann

Keyword(s):

Machine Learning ◽

Black Box ◽

Learning Models ◽

Unseen Data ◽

Machine Learning Models

Predicting Sooting Propensity of Oxygenated Fuels Using Artificial Neural Networks

Processes ◽

10.3390/pr9061070 ◽

2021 ◽

Vol 9 (6) ◽

pp. 1070

Author(s):

Abdul Gani Abdul Jameel

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Functional Groups ◽

Average Error ◽

ANN Model ◽

OH Groups ◽

Unseen Data ◽

Learning Capabilities ◽

Artificial Neural ◽

Oxygenated Fuels

The self-learning capabilities of artificial neural networks (ANNs) from large datasets have led to their deployment in the prediction of various physical and chemical phenomena. In the present work, an ANN model was developed to predict the yield sooting index (YSI) of oxygenated fuels using the functional group approach. A total of 265 pure compounds comprising six chemical classes, namely paraffins (n and iso), olefins, naphthenes, aromatics, alcohols, and ethers, were disassembled into eight constituent functional groups, namely paraffinic CH3 groups, paraffinic CH2 groups, paraffinic CH groups, olefinic –CH=CH2 groups, naphthenic CH-CH2 groups, aromatic C-CH groups, alcoholic OH groups, and ether O groups. These functional groups, in addition to molecular weight and branching index, were used as inputs to develop the ANN model. A neural network with two hidden layers was trained using the Levenberg–Marquardt (LM) training algorithm. The developed model was tested on a randomly selected 15% of the data points that were unseen during training. A regression coefficient (R²) of 0.99 was obtained when the experimental values were compared with the predicted YSI values from the test set. An average error of 3.4% was obtained, which is less than the experimental uncertainty associated with most reported YSI measurements. The developed model can be used for YSI prediction of hydrocarbon fuels containing alcohol and ether-based oxygenates as additives with a high degree of accuracy.
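
The evaluation described in this abstract (a random 15% hold-out scored by R²) can be illustrated with a minimal sketch. It is not the authors' code: the function names `hold_out_split` and `r_squared` are hypothetical, and it assumes that the reported R² is the usual coefficient of determination, which the abstract does not state explicitly.

```python
import random


def hold_out_split(rows, test_fraction=0.15, seed=0):
    """Shuffle the rows and hold out a fraction as unseen test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed meaning of R²)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Example: 265 compounds, 15% held out for testing.
train_rows, test_rows = hold_out_split(range(265))
```

With 265 compounds, this split holds out 40 of them and leaves 225 for training.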

AFibNet: an implementation of atrial fibrillation detection with convolutional neural network

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01571-1 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Bambang Tutuko ◽

Siti Nurmaini ◽

Alexander Edo Tondas ◽

Muhammad Naufal Rachmatullah ◽

Annisa Darmawahyuni ◽

...

Keyword(s):

Neural Network ◽

Atrial Fibrillation ◽

Convolutional Neural Network ◽

Learning System ◽

Single Frequency ◽

Feature Maps ◽

Normal Sinus ◽

Unseen Data ◽

Specific Device ◽

Model Formation

Abstract. Background: The generalization capacity of deep learning (DL) models for atrial fibrillation (AF) detection remains lacking. Previous research shows that DL model formation has used only a single sampling frequency from a specific device. Moreover, each electrocardiogram (ECG) acquisition dataset has a different length and sampling frequency to ensure sufficient precision of the R–R intervals for determining the heart rate variability (HRV). An accurate HRV is the gold standard for predicting the AF condition; therefore, a current challenge is to determine whether a DL approach can be used to analyze raw ECG data from a broad range of devices. This paper demonstrates powerful results for an end-to-end implementation of AF detection based on a convolutional neural network (AFibNet). The method uses a single learning system without considering the variety of signal lengths and sampling frequencies. For implementation, AFibNet is processed with a computational cloud-based DL approach. This study utilized a one-dimensional convolutional neural network (1D-CNN) model for 11,842 subjects. It was trained and validated with 8232 records based on three datasets and tested with 3610 records based on eight datasets. The predicted results, when compared with the diagnosis results indicated by human practitioners, showed 99.80% accuracy, sensitivity, and specificity. Results: When tested using unseen data, AF detection reaches 98.94% accuracy, 98.97% sensitivity, and 98.97% specificity at a sample period of 0.02 seconds using the DL Cloud System. To improve confidence in the AFibNet model, it was also validated with 18 arrhythmia conditions defined as the Non-AF class. Thus, the data increased from 11,842 to 26,349 instances for the three-class case, i.e., Normal sinus (N), AF and Non-AF. The result was 96.36% accuracy, 93.65% sensitivity, and 96.92% specificity.
Conclusion: These findings demonstrate that the proposed approach can use unknown data to derive feature maps and reliably detect the AF periods. We have found that our cloud-DL system is suitable for practical deployment.
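
The three metrics reported in this abstract follow directly from the counts of true and false positives and negatives. The sketch below shows the standard definitions for the binary case (AF versus not AF). The function name `binary_metrics` and the toy labels are illustrative assumptions, not data or code from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true AF records that are detected
        "specificity": tn / (tn + fp),  # share of non-AF records correctly rejected
    }


# Toy example with 1 = AF and 0 = not AF (made-up labels).
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
```

For the three-class setting mentioned in the abstract, one common convention is to compute these values per class in a one-vs-rest fashion; the abstract does not say which averaging the authors used.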

Detection of Anomalous Diffusion with Deep Residual Networks

Entropy ◽

10.3390/e23060649 ◽

2021 ◽

Vol 23 (6) ◽

pp. 649

Author(s):

Miłosz Gajowczyk ◽

Janusz Szwabiński

Keyword(s):

Image Classification ◽

Anomalous Diffusion ◽

Numerical Experiments ◽

Driving Forces ◽

Living Cells ◽

Diffusion Type ◽

Training Time ◽

Initial Network ◽

Unseen Data ◽

Insight Into

Identification of the diffusion type of molecules in living cells is crucial to deduce their driving forces and hence to gain insight into the characteristics of the cells. In this paper, deep residual networks have been used to classify the trajectories of molecules. We started from the well-known ResNet architecture, developed for image classification, and carried out a series of numerical experiments to adapt it to the detection of diffusion modes. We managed to find a model that has better accuracy than the initial network but contains only a small fraction of its parameters. The reduced size significantly shortened the training time of the model. Moreover, the resulting network is less prone to overfitting and generalizes better to unseen data.
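
The defining feature of a residual network is the skip connection, in which a block outputs its input plus a learned correction. The sketch below illustrates that idea with a small fully connected block in plain Python. It is a simplification made for illustration: ResNet uses convolutional layers, and the names `linear`, `relu` and `residual_block` are hypothetical rather than taken from the paper.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def linear(x, weights, bias):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """Compute relu(x + F(x)), where F is two dense layers with a ReLU in between."""
    hidden = relu(linear(x, w1, b1))
    correction = linear(hidden, w2, b2)
    # Skip connection: add the input back before the final activation.
    return relu([xi + ci for xi, ci in zip(x, correction)])


# With all-zero weights the correction vanishes and the block passes relu(x) through.
zeros = [[0.0, 0.0], [0.0, 0.0]]
out = residual_block([1.0, -2.0], zeros, [0.0, 0.0], zeros, [0.0, 0.0])
```

Because the block only has to learn a correction to its input, an untrained or heavily pruned block behaves close to the identity, which is one reason residual networks remain trainable when layers are added or removed.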

A review: preprocessing techniques and data augmentation for sentiment analysis

Computational Social Networks ◽

10.1186/s40649-020-00080-x ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Huu-Thanh Duong ◽

Tram-Anh Nguyen-Thi

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Supervised Learning ◽

Data Augmentation ◽

Original Data ◽

Training Data ◽

Unseen Data ◽

Augmentation Techniques ◽

User Intervention

Abstract: In the literature, machine learning-based studies of sentiment analysis usually rely on supervised learning, which requires pre-labeled datasets that are large enough for the given domain. Such datasets are tedious, expensive and time-consuming to build, and supervised models struggle to handle unseen data. This paper takes a semi-supervised learning approach to Vietnamese sentiment analysis, for which datasets are limited. We summarize many preprocessing techniques that were used to clean and normalize data and to handle negation and intensification in order to improve performance. Moreover, data augmentation techniques, which generate new data from the original data to enrich the training data without user intervention, are also presented. In experiments, we evaluated various aspects and obtained competitive results, which may motivate future proposals.

Generation of a Complete Profile for Porosity Log While Drilling Complex Lithology by Employing the Artificial Intelligence

10.2118/208642-ms ◽

2021 ◽

Author(s):

Ahmed Al-Sabaa ◽

Hany Gamal ◽

Salaheldin Elkatatny

Keyword(s):

Artificial Intelligence ◽

Prediction Model ◽

Real Time ◽

Storage Capacity ◽

Data Set ◽

Drilling Parameters ◽

Unseen Data ◽

Rock Porosity ◽

Data Points ◽

Logging Tool

Abstract: The porosity of drilled rock is an important parameter that determines the formation storage capacity. The common industrial technique for acquiring rock porosity is the downhole logging tool. Logging while drilling or wireline porosity logging usually provides a complete porosity log for the section of interest; however, operational constraints on the logging tool, in addition to the job cost, might preclude the logging job. The objective of this study is to provide an intelligent model that predicts porosity from the drilling parameters. An artificial neural network (ANN), a tool of artificial intelligence (AI), was employed in this study to build the porosity prediction model based on drilling parameters, namely weight on bit (WOB), drill string rotating speed (RS), drilling torque (T), stand-pipe pressure (SPP), and mud pumping rate (Q). The novel contribution of this study is a rock porosity model for complex lithology formations that uses drilling parameters in real time. The model was built using 2,700 data points from Well (A) with a 74:26 training-to-testing ratio. Many sensitivity analyses were performed to optimize the ANN model. The model was validated using an unseen data set (1,000 data points) from Well (B), which is located in the same field and was drilled across the same complex lithology. The results showed high model performance for the training, testing, and validation processes. The overall accuracy of the model was determined in terms of the correlation coefficient (R) and the average absolute percentage error (AAPE). Overall, R was higher than 0.91 and AAPE was less than 6.1% for model building and validation. Predicting rock porosity while drilling in real time will save the logging cost and will also provide a guide for formation storage capacity and interpretation analysis.
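
The two accuracy measures named in this abstract can be written down in a few lines. The sketch below assumes that R is the Pearson correlation coefficient and that AAPE is the mean of the absolute relative errors expressed as a percentage. Both are common readings, but the abstract does not give the formulas, and the function names `pearson_r` and `aape` are hypothetical.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def aape(y_true, y_pred):
    """Average absolute percentage error; measured values must be non-zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up porosity values, for illustration only.
measured = [10.0, 20.0]
predicted = [11.0, 18.0]
error_percent = aape(measured, predicted)
```

In a workflow like the one described, both functions would be applied once to the training and testing points from Well (A) and once more to the unseen points from Well (B).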

Performance Evaluation of Convolutional Neural Network Using Synthetic Medical Data Augmentation Generated by GAN

International Journal of Image and Graphics ◽

10.1142/s021946782350002x ◽

2021 ◽

Author(s):

Ramesh Adhikari ◽

Suresh Pokharel

Keyword(s):

Neural Network ◽

Neural Networks ◽

Convolutional Neural Network ◽

Data Augmentation ◽

Medical Diagnostics ◽

Generative Adversarial Networks ◽

Generalization Capability ◽

X-Ray ◽

Original Dataset ◽

Unseen Data

Data augmentation is widely used in image processing and pattern recognition problems to increase the diversity of the available data. It is commonly used to improve image classification accuracy when the available datasets are limited. Deep learning approaches have demonstrated an immense breakthrough in medical diagnostics over the last decade. A large amount of data is needed to train deep neural networks effectively. The appropriate use of data augmentation techniques prevents the model from over-fitting and thus increases the generalization capability of the network when it is later tested on unseen data. However, it remains a huge challenge to obtain such a large dataset for rare diseases in the medical field. This study presents a synthetic data augmentation technique using Generative Adversarial Networks to evaluate the generalization capability of neural networks while using existing data more effectively. In this research, a convolutional neural network (CNN) model is used to classify chest X-ray images as normal or pneumonia; then, synthetic X-ray images are generated from the available dataset by using a deep convolutional generative adversarial network (DCGAN) model. Finally, the CNN model is trained again with the original dataset and the augmented data generated by the DCGAN model. The classification performance of the CNN model improved by 3.2% when the augmented data were used along with the originally available dataset.

Prediction of neuro-degenerative disorders using sunflower optimisation algorithm and Kernel extreme learning machine: A case-study with Parkinson’s and Alzheimer’s disease

Proceedings of the Institution of Mechanical Engineers Part H Journal of Engineering in Medicine ◽

10.1177/09544119211060989 ◽

2021 ◽

pp. 095441192110609

Author(s):

Kishore Balasubramanian ◽

Ananthamoorthy NP ◽

Ramya K

Keyword(s):

Alzheimer’s Disease ◽

Extreme Learning Machine ◽

Feature Subset ◽

Optimisation Algorithm ◽

Unseen Data ◽

Feature Extraction And Selection ◽

Kernel Extreme Learning Machine ◽

Learning Machine ◽

Optimal Feature

Parkinson’s and Alzheimer’s Disease are believed to be most prevalent in older people. Several data-mining approaches are employed on neuro-degenerative data to predict the disease. A novel method has been developed to diagnose Alzheimer’s (AD) and Parkinson’s (PD) in early stages, which includes image acquisition, pre-processing, feature extraction and selection, followed by classification. The challenge lies in selecting the optimal feature subset for classification. In this work, the Sunflower Optimisation Algorithm (SFO) is employed to select the optimal feature set, which is then fed to the Kernel Extreme Learning Machine (KELM) for classification. The method is tested on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset and a local dataset for AD, and on the University of California, Irvine (UCI) machine learning repository and the Istanbul dataset for PD. Experimental outcomes have demonstrated a high accuracy level in both AD and PD diagnosis. For AD diagnosis, the highest classification rate is obtained for the AD versus NC classification using the ADNI dataset (99.32%) and the local dataset (98.65%). For PD diagnosis, the highest accuracies of 99.52% and 99.45% are achieved on the UCI and Istanbul datasets, respectively. To show its robustness, the method is compared with other similar methods of feature selection and classification using 10-fold cross-validation (CV) and unseen data. The proposed method has excellent prospects, bringing greater convenience to clinicians in making sounder decisions in the clinical diagnosis of neuro-degenerative diseases.
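
The 10-fold cross-validation mentioned in this abstract partitions the samples into ten folds and uses each fold once as the test set while the other nine are used for training. The sketch below generates such index splits. The function name `k_fold_indices` is hypothetical, and the sketch does not shuffle or stratify the samples, which a real study would normally do.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Example: 25 samples split into 10 folds.
folds = list(k_fold_indices(25, k=10))
```

Averaging the accuracy over the ten test folds gives the cross-validated score. A separate set that is never used in any fold plays the role of the unseen data.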

Dynamic versus static neural network model for rainfall forecasting at Klang River Basin, Malaysia

Hydrology and Earth System Sciences ◽

10.5194/hess-16-1151-2012 ◽

2012 ◽

Vol 16 (4) ◽

pp. 1151-1169 ◽

Cited By ~ 34

Author(s):

A. El-Shafie ◽

A. Noureldin ◽

M. Taha ◽

A. Hussain ◽

M. Mukhlisin

Keyword(s):

Neural Network ◽

Neural Networks ◽

Network Model ◽

Neural Network Model ◽

Network Architecture ◽

Rainfall Time Series ◽

Hydrological Process ◽

Multi Layer Perceptron ◽

Rainfall Forecasting ◽

Unseen Data

Abstract. Rainfall is considered one of the major components of the hydrological process; it plays a significant part in evaluating drought and flooding events. Therefore, it is important to have an accurate model for rainfall forecasting. Recently, several data-driven modeling approaches, such as multi-layer perceptron neural networks (MLP-NN), have been investigated for such forecasting tasks. Rainfall time series modeling involves an important temporal dimension. However, the classical MLP-NN is static and has a memoryless network architecture that is effective for complex nonlinear static mapping. This research focuses on investigating the potential of introducing a neural network that could address the temporal relationships of the rainfall series. Two different static neural networks and one dynamic neural network, namely the multi-layer perceptron neural network (MLP-NN), radial basis function neural network (RBFNN) and input delay neural network (IDNN), respectively, have been examined in this study. These models were developed for two time horizons, monthly and weekly rainfall forecasting, at Klang River, Malaysia. Data collected over 12 yr (1997–2008) on a weekly basis and 22 yr (1987–2008) on a monthly basis were used to develop and examine the performance of the proposed models. Comprehensive comparison analyses were carried out to evaluate the performance of the proposed static and dynamic neural networks. Results showed that the MLP-NN model is able to follow the trends of the actual rainfall, although not very accurately. The RBFNN model achieved better accuracy than the MLP-NN model. Moreover, the forecasting accuracy of the IDNN model was better than that of the static networks during both training and testing stages, which demonstrates a consistent level of accuracy with seen and unseen data.
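
An input delay network gives a static network access to temporal context by presenting the most recent past values of the series as its inputs. The sketch below shows this tapped-delay-line preparation step only, not the network itself. The function name `make_lagged_samples`, the number of delays and the toy series are assumptions made for illustration; the abstract does not state how many delays the authors used.

```python
def make_lagged_samples(series, n_delays):
    """Turn a 1-D series into (past_window, next_value) training pairs."""
    samples = []
    for t in range(n_delays, len(series)):
        window = list(series[t - n_delays:t])  # the n_delays most recent values
        samples.append((window, series[t]))    # target is the value that follows
    return samples


# Toy series standing in for weekly or monthly rainfall totals.
pairs = make_lagged_samples([1, 2, 3, 4, 5], n_delays=2)
# pairs == [([1, 2], 3), ([2, 3], 4), ([3, 4], 5)]
```

For forecasting, the pairs are usually split chronologically so that the test period lies entirely after the training period and stays unseen during model development.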