Select to Better Learn: Fast and Accurate Deep Learning Using Data Selection From Nonlinear Manifolds

DOI: 10.36227/techrxiv.12084027.v1, 2020

Author(s): Mohsen Joneidi, Saeed Vahidian, Ashkan Esmaeili, Weijia Wang, Nazanin Rahnavard, ...

Keyword(s): Deep Learning, Small Subset, Original Dataset, Wide Range, Spectral Components, Column Subset Selection, Important Open Problem, Data Points, Using Data, Nonlinear Manifolds

Finding a small subset of data points whose linear combinations span the remaining data, known as the column subset selection problem (CSSP), is an important open problem in computer science with many applications in computer vision and deep learning. Existing studies solve CSSP with time complexity that is polynomial in the size of the original dataset. A simple and efficient selection algorithm with linear complexity, referred to as spectrum pursuit (SP), is proposed; it pursues the spectral components of the dataset using the available sample points. The proposed non-greedy algorithm iteratively seeks K data samples whose span is close to that of the first K spectral components of the entire dataset. SP has no parameters to fine-tune, and this property makes it problem-independent. The simplicity of SP makes it possible to extend the underlying linear model to more complex models such as nonlinear manifolds and graph-based models. The nonlinear extension of SP is introduced as kernel-SP (KSP). The superiority of the proposed algorithms is demonstrated in a wide range of applications.
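
To make the selection objective concrete, the sketch below measures how well the span of K chosen columns reconstructs the whole data matrix, and uses a plain greedy search as a baseline. It is not the SP algorithm from the abstract (SP is described as non-greedy); the helper names `projection_error` and `greedy_select` and the toy matrix are assumptions made purely for illustration.

```python
import math

def column(A, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in A]

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            dot = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - dot * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis

def projection_error(A, selected):
    """Squared Frobenius norm of A minus its projection onto span(selected columns)."""
    basis = orthonormal_basis([column(A, j) for j in selected])
    error = 0.0
    for j in range(len(A[0])):
        r = column(A, j)
        for q in basis:
            dot = sum(ri * qi for ri, qi in zip(r, q))
            r = [ri - dot * qi for ri, qi in zip(r, q)]
        error += sum(ri * ri for ri in r)
    return error

def greedy_select(A, k):
    """Greedy baseline (not SP): add the column that reduces the error most."""
    selected = []
    for _ in range(k):
        candidates = [j for j in range(len(A[0])) if j not in selected]
        best = min(candidates, key=lambda j: projection_error(A, selected + [j]))
        selected.append(best)
    return selected

# Toy data: columns 0 and 1 are independent; columns 2 and 3 are combinations of them.
A = [
    [1.0, 0.0, 1.0, 2.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
chosen = greedy_select(A, 2)
print(chosen, projection_error(A, chosen))
```

A subset whose columns span every other column drives the error to zero, whereas a redundant pair such as columns 0 and 3 (one is a multiple of the other) leaves a residual.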

Feasibility of Continual Deep Learning-Based Segmentation for Personalized Adaptive Radiation Therapy in Head and Neck Area

Cancers, DOI: 10.3390/cancers13040702, 2021, Vol 13 (4), pp. 702

Author(s): Nalee Kim, Jaehee Chun, Jee Suk Chang, Chang Geol Lee, Ki Chang Keum, ...

Keyword(s): Deep Learning, Head And Neck, Organs At Risk, Subjective Assessment, Turing Test, Similar Rate, Dice Similarity Coefficient, Adaptive Planning, Test Set, Using Data

This study investigated the feasibility of deep learning-based segmentation (DLS) and continual training for adaptive radiotherapy (RT) of head and neck (H&N) cancer. One hundred patients treated with definitive RT were included. Based on 23 organs-at-risk (OARs) manually segmented in the initial planning computed tomography (CT), a modified FC-DenseNet was trained for DLS: (i) using data obtained from 60 patients, with 20 matched patients in the test set (DLSm); (ii) using data obtained from 60 identical patients, with 20 unmatched patients in the test set (DLSu). Manually contoured OARs in the adaptive planning CT of 20 independent patients were provided as test sets. Deformable image registration (DIR) was also performed. All 23 OARs were compared using quantitative measurements, and nine OARs were also evaluated via subjective assessment by 26 observers using the Turing test. DLSm outperformed both DLSu and DIR (mean Dice similarity coefficient: 0.83 vs. 0.80 vs. 0.70), mainly for glandular structures, whose volume decreased significantly during RT. In the subjective assessment, DLS contours were often perceived as human-drawn (49.2%). Furthermore, DLSm was preferred over DLSu (67.2%) and DIR (96.7%), with a rate of required revision similar to that of manual segmentation (28.0% vs. 29.7%). In conclusion, DLS was effective and preferred over DIR. Additionally, continual DLS training is required for effective optimization and robustness in personalized adaptive RT.
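
For readers unfamiliar with the Dice similarity coefficient reported above, it is defined as 2|A ∩ B| / (|A| + |B|) for two segmentations A and B. A minimal sketch follows, assuming each segmentation is a flattened binary mask; the function name and the toy masks are illustrative and do not come from the study.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient for two binary masks given as flat 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * intersection / total

# Hypothetical 8-voxel example: 4 voxels in each mask, 3 of them shared.
manual = [0, 1, 1, 1, 0, 0, 1, 0]
auto = [0, 1, 1, 0, 0, 1, 1, 0]
print(dice_coefficient(manual, auto))  # 2 * 3 / (4 + 4) = 0.75
```

A value of 1 means the two contours coincide exactly and 0 means they do not overlap at all.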

Deep learning for cephalometric landmark detection: systematic review and meta-analysis

Clinical Oral Investigations, DOI: 10.1007/s00784-021-03990-w, 2021

Author(s): Falk Schwendicke, Akhilanand Chaurasia, Lubaina Arsiwala, Jae-Hong Lee, Karim Elhennawy, ...

Keyword(s): Systematic Review, Deep Learning, Meta Analysis, High Accuracy, Risk Of Bias, Automated Detection, Reference Test, Landmark Detection, Future Studies, Using Data

Objectives: Deep learning (DL) has been increasingly employed for automated landmark detection, e.g., for cephalometric purposes. We performed a systematic review and meta-analysis to assess the accuracy and underlying evidence for DL for cephalometric landmark detection on 2-D and 3-D radiographs. Methods: Diagnostic accuracy studies published in 2015–2020 in Medline/Embase/IEEE/arXiv and employing DL for cephalometric landmark detection were identified and extracted by two independent reviewers. Random-effects meta-analysis, subgroup analysis, and meta-regression were performed, and study quality was assessed using QUADAS-2. The review was registered (PROSPERO no. 227498). Data: From 321 identified records, 19 studies (published 2017–2020) were included, all employing convolutional neural networks, mainly on 2-D lateral radiographs (n=15), using data from publicly available datasets (n=12), and testing the detection of a mean of 30 (SD: 25; range: 7–93) landmarks. The reference test was established by two experts (n=11), one expert (n=4), three experts (n=3), or a set of annotators (n=1). Risk of bias was high, and applicability concerns were detected for most studies, mainly regarding the data selection and reference test conduct. Landmark prediction error centered around a 2-mm error threshold (mean: –0.581 mm; 95% CI: –1.264 to 0.102 mm). The proportion of landmarks detected within this 2-mm threshold was 0.799 (0.770 to 0.824). Conclusions: DL shows relatively high accuracy for detecting landmarks on cephalometric imagery. The overall body of evidence is consistent but suffers from high risk of bias. Demonstrating robustness and generalizability of DL for landmark detection is needed. Clinical significance: Existing DL models show consistent and largely high accuracy for automated detection of cephalometric landmarks. The majority of studies so far focused on 2-D imagery; data on 3-D imagery are sparse, but promising. Future studies should focus on demonstrating generalizability, robustness, and clinical usefulness of DL for this objective.
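
The "proportion of landmarks detected within a 2-mm threshold" can be illustrated with a short calculation. This is a minimal sketch, assuming landmark coordinates are already expressed in millimetres and that "within" means a radial error of at most 2 mm; the function names and sample coordinates are hypothetical and are not taken from the reviewed studies.

```python
import math

def radial_errors(predicted, reference):
    """Euclidean distance (mm) between each predicted and reference landmark."""
    return [math.dist(p, r) for p, r in zip(predicted, reference)]

def proportion_within_threshold(errors_mm, threshold_mm=2.0):
    """Share of landmarks whose error does not exceed the threshold."""
    return sum(1 for e in errors_mm if e <= threshold_mm) / len(errors_mm)

# Hypothetical 2-D landmark positions in millimetres.
predicted = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0), (40.0, 40.0)]
reference = [(10.0, 11.0), (23.0, 24.0), (30.0, 30.0), (41.5, 40.0)]

errors = radial_errors(predicted, reference)  # 1.0, 5.0, 0.0, 1.5 mm
print(proportion_within_threshold(errors))    # 3 of 4 landmarks -> 0.75
```

The same functions work for 3-D landmarks, since `math.dist` accepts points of any matching dimension.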

Data-driven method for training data selection for deep learning

DOI: 10.3997/2214-4609.202112817, 2021

Author(s): C. Lacombe, I. Hammoud, J. Messud, H. Peng, T. Lesieur, ...

Keyword(s): Deep Learning, Training Data, Data Selection, Data Driven, Selection For, Training Data Selection

Reliable Deep Learning–Based Detection of Misplaced Chest Electrodes During Electrocardiogram Recording: Algorithm Development and Validation

JMIR Medical Informatics, DOI: 10.2196/25347, 2021, Vol 9 (4), pp. e25347

Author(s): Khaled Rjoob, Raymond Bond, Dewar Finlay, Victoria McGilligan, Stephen J Leslie, ...

Keyword(s): Deep Learning, Ventricular Hypertrophy, Left Ventricular, Intercostal Space, The Past, Body Surface Potential Maps, Body Surface Potential, Using Data, Development And Validation, Electrocardiogram ECG

Background: A 12-lead electrocardiogram (ECG) is the most commonly used method to diagnose patients with cardiovascular diseases. However, the ECG can be misinterpreted for several reasons, such as the misplacement of chest electrodes. Objective: The aim of this study is to build advanced algorithms to detect precordial (chest) electrode misplacement. Methods: In this study, we used traditional machine learning (ML) and deep learning (DL) to autodetect the misplacement of electrodes V1 and V2 using features from the resultant ECG. The algorithms were trained using data extracted from high-resolution body surface potential maps of patients who were diagnosed with myocardial infarction, were diagnosed with left ventricular hypertrophy, or had a normal ECG. Results: DL achieved the highest accuracy in this study for detecting V1 and V2 electrode misplacement, with an accuracy of 93.0% (95% CI 91.46-94.53) for misplacement in the second intercostal space. The performance of DL in the second intercostal space was benchmarked against physicians (n=11; age 47.3 years, SD 15.5) who were experienced in reading ECGs (mean number of ECGs read in the past year 436.54, SD 397.9). Physicians were poor at recognizing chest electrode misplacement on the ECG and achieved a mean accuracy of 60% (95% CI 56.09-63.90), which was significantly poorer than that of DL (P<.001). Conclusions: DL detected chest electrode misplacement better than experienced physicians did. DL and ML could be used to flag ECGs that have been incorrectly recorded and whose data may be flawed, which could reduce the number of erroneous diagnoses.

SIPEC: the deep-learning Swiss knife for behavioral data analysis

DOI: 10.1101/2020.10.26.355115, 2020

Author(s): Markus Marks, Jin Qiuhan, Oliver Sturman, Lukas von Ziegler, Sepp Kollmorgen, ...

Keyword(s): Deep Learning, Pose Estimation, Animal Behavior, Home Cage, Complex Environments, Or Groups, Multiple Behaviors, Freely Moving, Using Data

Analysing the behavior of individuals or groups of animals in complex environments is an important yet difficult computer vision task. Here we present a novel deep learning architecture for classifying animal behavior and demonstrate how this end-to-end approach can significantly outperform pose estimation-based approaches, whilst requiring no intervention after minimal training. Our behavioral classifier is embedded in a first-of-its-kind pipeline (SIPEC) which performs segmentation, identification, pose estimation, and classification of behavior, all automatically. SIPEC successfully recognizes multiple behaviors of freely moving mice as well as socially interacting nonhuman primates in 3D, using data only from simple mono-vision cameras in home-cage setups.

Application of Machine Learning Algorithms to Depression Screening and Attempt at Pattern Extraction of Patient-Reported Outcomes that Negatively Affect Classification Accuracy (Preprint)

DOI: 10.2196/preprints.8618, 2017

Author(s): Junetae Kim, Byungtae Lee, Sae Byul Lee, Il Yong Chung, Sei Hyun Ahn, ...

Keyword(s): Deep Learning, Missing Data, Patient Reported Outcomes, Machine Learning Algorithms, Depression Screening, Smartphone Applications, Patient Reported, Negative Effect, Using Data, Screening Accuracy

BACKGROUND: Smartphone applications have recently been used as a breakthrough technology for monitoring mental health conditions in cancer outpatient settings. However, the use of electronic patient-reported outcomes (ePROs) on mental conditions through smartphone applications raises new concerns, including the accuracy of depression screening. Thus, research is essential for improving depression-screening performance. OBJECTIVE: This study aims to (1) test whether deep-learning-based algorithms can overcome the limitations of traditional statistical methods in terms of depression screening accuracy, and (2) explore ePRO patterns that adversely affect depression screening accuracy. METHODS: As a deep-learning-based algorithm, a feedforward neural network was used. As a traditional statistical method, a random intercept logistic regression was employed. To explore the ePRO patterns that negatively impact model accuracy, mental fluctuations, missing data, and compounding effects between mental fluctuations and missing data were tested. The performances of the algorithms and the effects of the ePRO patterns were measured through the receiver operating characteristic comparison test. RESULTS: The performance of the deep-learning-based models was superior to that of the traditional statistical approach. Mental fluctuations significantly reduced the accuracy of depression-screening models. A weak association between ePRO omissions and screening accuracy was found. Moreover, the compounding effects that negatively affected depression screening accuracy were statistically significant. CONCLUSIONS: Although well-trained deep-learning-based models exhibit excellent performance, they still have some limitations. Thus, it is very important to focus on data quality to predict health outcomes when using data that is difficult to quantify, such as mental conditions.

Deep Learning and Machine Learning Techniques for Analyzing Travelers' Online Reviews

DOI: 10.4018/978-1-7998-8306-7.ch002, 2022, pp. 20-39

Author(s): Elliot Mbunge, Benhildah Muchemwa

Keyword(s): Machine Learning, Social Media, Deep Learning, Hospitality Industry, Learning Models, Online Data, Social Media Platforms, Using Data, Tourism And Hospitality Industry, Tourism And Hospitality

Social media platforms play a tremendous role in the tourism and hospitality industry and are increasingly becoming a source of information. The complexity and increasing size of tourists' online data make it difficult to extract meaningful insights using traditional models. Therefore, this comprehensive scoping review aimed to analyze machine learning and deep learning models applied to tourism data. The study revealed that deep learning and machine learning models are used for forecasting and predicting tourism demand using search query data, Google Trends, and social media platforms. The study also revealed that data-driven models can assist managers and policymakers in mapping and segmenting tourism hotspots and attractions, predicting the revenue likely to be generated, exploring targeted marketing, and segmenting tourists based on their spending patterns, lifestyle, and age group. However, hybrid deep learning models such as InceptionV3, MobileNetV3, and YOLOv4 have not yet been explored in the tourism and hospitality industry.

Image editing-based data augmentation for illumination-insensitive background subtraction

Journal of Enterprise Information Management, DOI: 10.1108/jeim-02-2020-0042, 2020, Vol ahead-of-print (ahead-of-print)

Author(s): Dimitrios Sakkos, Edmond S. L. Ho, Hubert P. H. Shum, Garry Elvin

Keyword(s): Deep Learning, Pilot Study, Background Subtraction, Data Augmentation, Point Of View, Content Type, Illumination Changes, Image Appearance, Using Data, Limited Training Samples

Purpose: A core challenge in background subtraction (BGS) is handling videos with sudden illumination changes in consecutive frames. In our pilot study published in SKIMA 2019, we tackled the problem from a data point of view using data augmentation. Our method performs data augmentation that not only creates endless data on the fly but also features semantic transformations of illumination which enhance the generalisation of the model. Design/methodology/approach: In our pilot study published in SKIMA 2019, the proposed framework successfully simulated flashes and shadows by applying the Euclidean distance transform over a randomly generated binary mask. In this paper, we further enhance the data augmentation framework by proposing new variations in image appearance both locally and globally. Findings: Experimental results demonstrate the contribution of the synthetic data to the ability of the models to perform BGS even when significant illumination changes take place. Originality/value: Such data augmentation allows us to effectively train an illumination-invariant deep learning model for BGS. We further propose a post-processing method that removes noise from the output binary map of segmentation, resulting in a cleaner, more accurate segmentation map that can generalise to multiple scenes of different conditions. We show that it is possible to train deep learning models even with very limited training samples. The source code of the project is made publicly available at https://github.com/dksakkos/illumination_augmentation
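
The idea of simulating a flash or shadow with a Euclidean distance transform over a random binary mask can be sketched roughly as follows. This is an illustration only and not the authors' implementation (their code is at the URL above): it assumes a grayscale image stored as a list of rows, a mask with a single random seed pixel, and a linear intensity falloff, and the names `distance_transform` and `add_illumination` are hypothetical.

```python
import math
import random

def distance_transform(mask):
    """Brute-force Euclidean distance from every pixel to the nearest 1-pixel in mask."""
    seeds = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    return [[min(math.hypot(y - sy, x - sx) for sy, sx in seeds)
             for x in range(len(mask[0]))]
            for y in range(len(mask))]

def add_illumination(image, strength, rng):
    """Brighten (strength > 0, flash) or darken (strength < 0, shadow) around a random seed."""
    h, w = len(image), len(image[0])
    mask = [[0] * w for _ in range(h)]
    mask[rng.randrange(h)][rng.randrange(w)] = 1   # random binary mask (one seed pixel)
    dist = distance_transform(mask)
    max_dist = max(max(row) for row in dist) or 1.0
    # Linear falloff (an assumption): full effect at the seed, none at the farthest pixel.
    return [[min(255, max(0, round(image[y][x] + strength * (1 - dist[y][x] / max_dist))))
             for x in range(w)]
            for y in range(h)]

rng = random.Random(0)
image = [[100] * 5 for _ in range(5)]          # uniform gray test image
flash = add_illumination(image, 80, rng)       # local brightening
shadow = add_illumination(image, -80, rng)     # local darkening
for row in flash:
    print(row)
```

Because the seed position is drawn anew on each call, every training frame can receive a different synthetic illumination change, which is what allows such augmentation to generate data on the fly.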
