unseen data
Recently Published Documents


Total documents: 317 (five years: 237)
H-index: 14 (five years: 6)

Diagnostics ◽  
2022 ◽  
Vol 12 (1) ◽  
pp. 188
Author(s):  
Manohar Karki ◽  
Karthik Kantipudi ◽  
Feng Yang ◽  
Hang Yu ◽  
Yi Xiang J. Wang ◽  
...  

Classification of drug-resistant tuberculosis (DR-TB) and drug-sensitive tuberculosis (DS-TB) from chest radiographs remains an open problem. In our previous work, cross-validation on publicly available chest X-ray (CXR) data, combined with image augmentation and the addition of synthetically generated and publicly available images, achieved 85% AUC with a deep convolutional neural network (CNN). However, when we evaluated the CNN model trained to classify DR-TB and DS-TB on unseen data, we observed significant performance degradation (65% AUC). Hence, in this paper, we investigate the generalizability of our models on images from a held-out country’s dataset. We explore the extent of the problem and the possible reasons behind the lack of good generalization. A comparison of radiologist-annotated lesion locations in the lung with the areas of interest localized by the trained model, using GradCAM, did not show much overlap. Using the same network architecture, a multi-country classifier was able to identify the country of origin of the X-ray with high accuracy (86%), suggesting that image acquisition differences and the distribution of non-pathological and non-anatomical aspects of the images also affect the generalization and localization of the drug resistance classification model. When CXR images were severely corrupted, performance on the validation set was still better than 60% AUC. The model overfitted to the data from countries in the cross-validation set but did not generalize to the held-out country. Finally, we applied a multi-task approach that uses prior TB lesion location information to guide the classifier network's attention, improving generalization performance on the held-out set from another country to 68% AUC.
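
The gap between cross-validation AUC and held-out-country AUC described above can be made concrete by scoring each country separately. The following is a minimal sketch, not the authors' evaluation code: the function names, the label coding (1 for DR-TB, 0 for DS-TB), the country codes "A" and "B", and the toy scores are all hypothetical.

```python
from collections import defaultdict

def auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_by_country(records):
    """records: iterable of (country, label, score) tuples."""
    groups = defaultdict(lambda: ([], []))
    for country, label, score in records:
        groups[country][0].append(label)
        groups[country][1].append(score)
    return {c: auc(ys, ss) for c, (ys, ss) in groups.items()}

# Made-up scores: well ranked on country "A", chance-level on held-out "B".
records = [
    ("A", 1, 0.9), ("A", 1, 0.8), ("A", 0, 0.3), ("A", 0, 0.1),
    ("B", 1, 0.6), ("B", 1, 0.2), ("B", 0, 0.5), ("B", 0, 0.4),
]
result = auc_by_country(records)
```

Reporting the metric per country, rather than pooled, is what exposes a model that ranks well only on the countries it was trained on.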


Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 515
Author(s):  
Alireza Salimy ◽  
Imene Mitiche ◽  
Philip Boreham ◽  
Alan Nesbitt ◽  
Gordon Morison

Fault signals in high-voltage (HV) power plant assets are captured using the electromagnetic interference (EMI) technique. The EMI signals are acquired under different conditions, which introduces varying noise levels. This work addresses those varying noise levels using a deep-residual-shrinkage-network (DRSN), which applies shrinkage methods with learned thresholds to de-noise signals for classification, together with a time-frequency signal decomposition method for feature engineering of the raw time-series signals. Several alternative DRSN architectures are trained and validated on expertly labeled EMI fault signals and then tested on previously unseen data. The signals are first de-noised, and a controlled amount of noise is then added at various levels. The DRSN architectures are assessed on their testing accuracy at each controlled noise level. Results show that DRSN architectures using the newly proposed residual-shrinkage-building-unit-2 (RSBU-2) outperform the residual-shrinkage-building-unit-1 (RSBU-1) architectures at low signal-to-noise ratios. The findings show that thresholding methods perform well in noisy environments and work well with real-world EMI fault signals, making them suitable for real-world EMI fault classification and condition monitoring.
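
Two ingredients of the abstract above can be sketched in a few lines: shrinkage by thresholding, and adding noise at a controlled level. This sketch assumes the shrinkage is the standard soft-thresholding function and that the added noise is white Gaussian; the abstract specifies neither. In a DRSN the threshold is learned by the network, whereas here `tau` is a fixed, hypothetical value, and the function names are illustrative only.

```python
import math
import random

def soft_threshold(x, tau):
    """Shrink each value toward zero by tau; values in [-tau, tau] become 0."""
    return [max(v - tau, 0.0) + min(v + tau, 0.0) for v in x]

def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise scaled so the expected SNR equals snr_db."""
    power = sum(v * v for v in signal) / len(signal)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [v + rng.gauss(0.0, noise_std) for v in signal]

# Small coefficients are zeroed, large ones are pulled toward zero.
shrunk = soft_threshold([3.0, -0.5, -2.0, 0.2], tau=1.0)

# A clean test tone with noise added at a target SNR of 5 dB.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(10000)]
noisy = add_noise_at_snr(clean, snr_db=5.0, rng=rng)
```

Sweeping `snr_db` over a range of values gives the kind of controlled noise levels at which competing architectures can be compared.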


2022 ◽  
Vol 11 (1) ◽  
pp. 43
Author(s):  
Calimanut-Ionut Cira ◽  
Martin Kada ◽  
Miguel-Ángel Manso-Callejo ◽  
Ramón Alcarria ◽  
Borja Bordel Bordel Sanchez

The road surface area extraction task is generally carried out via semantic segmentation over remotely-sensed imagery. However, this supervised learning task is often costly, as it requires remote sensing images labelled at the pixel level, and the results are not always satisfactory (presence of discontinuities, overlooked connection points, or isolated road segments). On the other hand, unsupervised learning does not require labelled data and can be employed for post-processing the geometries of geospatial objects extracted via semantic segmentation. In this work, we implement a conditional Generative Adversarial Network to reconstruct road geometries via deep inpainting procedures on a new dataset containing unlabelled road samples from challenging areas present in official cartographic support from Spain. The goal is to improve the initial road representations obtained with semantic segmentation models via generative learning. The performance of the model was evaluated on unseen data through a metrical comparison, in which a maximum Intersection over Union (IoU) score improvement of 1.3% was observed relative to the initial semantic segmentation result. Next, we assessed the appropriateness of unsupervised generative learning through a qualitative perceptual validation, to identify the strengths and weaknesses of the proposed method in very complex scenarios and to gain a better intuition of the model’s behaviour when performing large-scale post-processing with generative learning and deep inpainting procedures. We observed important improvements in the generated data.
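
For reference, the Intersection over Union metric used in the comparison above can be computed on binary masks as follows. This is a minimal sketch; the masks are made-up toy data (a road with a gap, then a post-processed version that closes the gap but adds one spurious pixel), not results from the paper.

```python
def iou(pred, target):
    """Intersection over Union of two same-shaped binary masks (lists of rows)."""
    inter = union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0

target = [[0, 0, 0, 0],
          [1, 1, 1, 1],
          [0, 0, 0, 0]]
segmented = [[0, 0, 0, 0],
             [1, 0, 1, 1],   # discontinuity in the road
             [0, 0, 0, 0]]
inpainted = [[0, 0, 0, 0],
             [1, 1, 1, 1],   # gap closed
             [0, 1, 0, 0]]   # one false-positive pixel

iou_before = iou(segmented, target)  # 3 / 4
iou_after = iou(inpainted, target)   # 4 / 5
```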


2021 ◽  
Author(s):  
René Groh ◽  
Zhengdong Lei ◽  
Lisa Martignetti ◽  
Nicole YK Li-Jessen ◽  
Andreas M Kist

Mobile health wearables are often embedded with small processors for signal acquisition and analysis. These embedded wearable systems are, however, limited by low available memory and computational power. Advances in machine learning, especially deep neural networks (DNNs), have been adopted for efficient and intelligent applications to overcome constrained computational environments. In this study, evolutionary optimized DNNs were analyzed to classify three common airway-related symptoms, namely coughs, throat clears and dry swallows. As opposed to typical microphone-acoustic signals, mechano-acoustic data signals, which did not contain identifiable speech information for better privacy protection, were acquired from laboratory-generated and publicly available datasets. The optimized DNNs had a low footprint of less than 150 kB and predicted airway symptoms of interest with 83.7% accuracy on unseen data. Using explainable AI techniques, namely occlusion experiments and class activation maps, mel-frequency bands up to 8,000 Hz were found to be the most important features for the classification. We further found that DNN decisions consistently relied on these specific features, fostering trust in and transparency of the proposed DNNs. Our proposed efficient and explainable DNN is expected to support edge computing on mechano-acoustic sensing wearables for remote, long-term monitoring of airway symptoms.


Author(s):  
Ramesh Adhikari ◽  
Suresh Pokharel

Data augmentation is widely used in image processing and pattern recognition problems to increase the richness and diversity of available data. It is commonly used to improve the classification accuracy of images when the available datasets are limited. Deep learning approaches have demonstrated an immense breakthrough in medical diagnostics over the last decade. Large datasets are needed for the effective training of deep neural networks. The appropriate use of data augmentation techniques prevents the model from over-fitting and thus increases the generalization capability of the network when it is later tested on unseen data. However, it remains a huge challenge to obtain such large datasets for rare diseases in the medical field. This study presents a synthetic data augmentation technique using Generative Adversarial Networks to evaluate the generalization capability of neural networks using existing data more effectively. In this research, a convolutional neural network (CNN) model is used to classify X-ray images of the human chest as normal or pneumonia; synthetic X-ray images are then generated from the available dataset using a deep convolutional generative adversarial network (DCGAN) model. Finally, the CNN model is trained again with the original dataset together with the augmented data generated by the DCGAN model. The classification performance of the CNN model improved by 3.2% when the augmented data were used along with the originally available dataset.


Cells ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 85
Author(s):  
Julie Sparholt Walbech ◽  
Savvas Kinalis ◽  
Ole Winther ◽  
Finn Cilius Nielsen ◽  
Frederik Otzen Bagger

Autoencoders have been used to model single-cell mRNA-sequencing data with the purpose of denoising, visualization, data simulation, and dimensionality reduction. We, and others, have shown that autoencoders can be explainable models and interpreted in terms of biology. Here, we show that such autoencoders can generalize to the extent that they can transfer directly without additional training. In practice, we can extract biological modules, denoise, and classify data correctly from an autoencoder that was trained on a different dataset and with different cells (a foreign model). We deconvoluted the biological signal encoded in the bottleneck layer of scRNA-models using saliency maps and mapped salient features to biological pathways. Biological concepts could be associated with specific nodes and interpreted in relation to biological pathways. Even in this unsupervised framework, with no prior information about cell types or labels, the specific biological pathways deduced from the model were in line with findings in previous research. It was hypothesized that autoencoders could learn and represent meaningful biology; here, we show with a systematic experiment that this is true and even transcends the training data. This means that carefully trained autoencoders can be used to assist the interpretation of new unseen data.


Metals ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 50
Author(s):  
Nithin Konda ◽  
Raviraj Verma ◽  
Rengaswamy Jayaganthan

The present work focusses on machine learning assisted predictions of the fatigue crack growth rate (FCGR) of Ti6Al4V (Ti64) processed through laser powder bed fusion (L-PBF) and post processing. Various machine learning techniques have provided a flexible approach for explaining the complex mathematical interrelationship among processing, structure and properties of materials. In the present work, four machine learning (ML) algorithms, namely K-Nearest Neighbor (KNN), Decision Trees (DT), Random Forests (RF), and Extreme Gradient Boosting (XGB), are implemented to analyze the FCGR of Ti64 alloy. After tuning the hyperparameters of these algorithms, the trained models were found to predict the unseen data as well as the training data. The four ML models are compared with each other over the training and testing phases, based on their mean squared error and R² scores. Extreme Gradient Boosting performed best for the FCGR predictions, providing the lowest mean squared errors and higher R² scores than the other models.
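
The two comparison metrics named above, mean squared error and R², are straightforward to compute. The sketch below uses the standard definitions; the function names and the toy values are illustrative and are not FCGR data from the paper.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up values: three exact predictions and one that is off by 1.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]

test_mse = mse(y_true, y_pred)  # 1 / 4
test_r2 = r2(y_true, y_pred)    # 1 - 1 / 5
```

Computing both metrics on the training set and on the test set, and comparing the pairs, is one way to check a claim that a model predicts unseen data as well as the data it was trained on.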


Author(s):  
Kishore Balasubramanian ◽  
Ananthamoorthy NP ◽  
Ramya K

Parkinson’s and Alzheimer’s Disease are believed to be most prevalent in older people. Several data-mining approaches are employed on neuro-degenerative data to predict the disease. A novel method has been developed to diagnose Alzheimer’s (AD) and Parkinson’s (PD) in early stages, which includes image acquisition, pre-processing, feature extraction and selection, followed by classification. The challenge lies in selecting the optimal feature subset for classification. In this work, the Sunflower Optimisation Algorithm (SFO) is employed to select the optimal feature set, which is then fed to the Kernel Extreme Learning Machine (KELM) for classification. The method is tested on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset and a local dataset for AD, and on the University of California, Irvine (UCI) machine learning repository and the Istanbul dataset for PD. Experimental outcomes have demonstrated a high accuracy level in both AD and PD diagnosis. For AD diagnosis, the highest classification rate is obtained for the AD versus NC classification using the ADNI dataset (99.32%) and the local dataset (98.65%). For PD diagnosis, the highest accuracies of 99.52% and 99.45% are achieved on the UCI and Istanbul datasets, respectively. To show its robustness, the method is compared with other similar methods of feature selection and classification using 10-fold cross-validation (CV) and unseen data. The proposed method shows strong promise for helping clinicians make more reliable decisions in the clinical diagnosis of neuro-degenerative diseases.
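
The 10-fold cross-validation mentioned above partitions the samples into ten folds and tests on each fold once while training on the other nine. The following is a minimal sketch of the index bookkeeping only, assuming a plain shuffled split without stratification; the function name, the seed, and the sample count of 25 are hypothetical.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(n_samples=25, k=10))
```

Evaluation on unseen data is a separate step: those samples are kept out of every fold, so they influence neither feature selection nor training.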


2021 ◽  
Author(s):  
Mattia Martinelli ◽  
Ivo Colombo ◽  
Eliana Rosa Russo

The aim of this work is the development of a fast and reliable method for evaluating geomechanical parameters while drilling, using surface logging data. Geomechanical parameters are usually evaluated from cores or sonic logs, which are typically expensive and sometimes difficult to obtain. A novel approach is proposed here, in which machine learning algorithms are used to calculate the Young's modulus from drilling parameters and the gamma ray log. The proposed method combines typical mud logging drilling data (ROP, RPM, Torque, Flow measurements, WOB and SPP), XRF data and well log data (Sonic logs, Bulk Density, Gamma Ray) with several machine learning techniques. The models were trained and tested on data from three wells drilled in the same basin in Kuwait, in the same geological units but in different reservoirs. Sonic logs and bulk density are used to evaluate the geomechanical parameters (e.g. Young's modulus) and to train the model. The training phase and the hyperparameter tuning were performed using data from a single well. The model was then tested against previously unseen data from the other two wells. The trained model is able to predict the Young's modulus in the test wells with a root mean squared error of around 12 GPa. The example provided here demonstrates that a model trained with drilling parameters and gamma ray data from one well is able to predict the Young's modulus of different wells in the same basin. These outcomes highlight the potential of this procedure and point to several implications for reservoir characterization. Indeed, once the model has been trained, it is possible to predict the Young's modulus in different wells of the same basin using only surface logging data.

