Select to better learn: Fast and accurate deep learning using data selection from nonlinear manifolds

2020
Author(s): Mohsen Joneidi, Saeed Vahidian, Ashkan Esmaeili, Weijia Wang, Nazanin Rahnavard, et al.

Finding a small subset of data whose linear combinations span the remaining data points, also known as the column subset selection problem (CSSP), is an important open problem in computer science with many applications in computer vision and deep learning. Several studies solve CSSP with polynomial time complexity w.r.t. the size of the original dataset. We propose a simple and efficient selection algorithm with linear complexity, referred to as spectrum pursuit (SP), which pursues the spectral components of the dataset using the available sample points. The proposed non-greedy algorithm iteratively finds K data samples whose span is close to that of the first K spectral components of the entire data. SP has no parameters to be fine-tuned, and this desirable property makes it problem-independent. The simplicity of SP enables us to extend the underlying linear model to more complex models, such as nonlinear manifolds and graph-based models. The nonlinear extension of SP is introduced as kernel SP (KSP). The superiority of the proposed algorithms is demonstrated in a wide range of applications.
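As a concrete reading of the iterative idea, here is a minimal NumPy sketch of an SP-style selection loop. The function name, the random initialization, and the matching rule are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def spectrum_pursuit(A, K, n_iter=5, seed=0):
    """Select K column indices of A (d x n) whose span approximates the
    span of A's first K spectral (singular) components. A sketch only;
    duplicate selections are possible in this toy version."""
    rng = np.random.default_rng(seed)
    d, n = A.shape
    S = list(rng.choice(n, size=K, replace=False))  # random initial subset
    for _ in range(n_iter):
        for j in range(K):
            rest = [s for i, s in enumerate(S) if i != j]
            if rest:
                Q, _ = np.linalg.qr(A[:, rest])  # span of the other picks
                R = A - Q @ (Q.T @ A)            # residual of all samples
            else:
                R = A
            # Leading spectral direction of the residual.
            u = np.linalg.svd(R, full_matrices=False)[0][:, 0]
            # Replace the j-th pick with the sample best aligned with it.
            scores = np.abs(u @ A) / (np.linalg.norm(A, axis=0) + 1e-12)
            S[j] = int(np.argmax(scores))
    return S
```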


2020
Author(s): Mohsen Joneidi, Saeed Vahidian, Ashkan Esmaeili, Siavash Khodadadeh

We propose a novel technique for finding representatives of a large, unsupervised dataset. The approach is based on the concept of self-rank, defined as the minimum number of samples needed to reconstruct all samples with an accuracy proportional to that of the rank-$K$ approximation. The proposed algorithm enjoys linear complexity w.r.t. the size of the original dataset while simultaneously providing an adaptive upper bound on the approximation ratio. These favorable characteristics fill a historical gap between practical and theoretical methods for finding representatives.
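The definition suggests a simple empirical check. The sketch below increases K until reconstruction from the selected samples comes within a factor c of the rank-K SVD error; the factor c, the stopping rule, and the selector argument (e.g. the spectrum_pursuit sketch above) are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def empirical_self_rank(A, selector, c=1.1):
    """Smallest K such that reconstructing all columns of A from K
    selected samples is within a factor c of the rank-K SVD error."""
    d, n = A.shape
    s = np.linalg.svd(A, compute_uv=False)
    for K in range(1, min(d, n)):
        idx = selector(A, K)                        # e.g. spectrum_pursuit
        Q, _ = np.linalg.qr(A[:, idx])
        err_sel = np.linalg.norm(A - Q @ (Q.T @ A))  # Frobenius error
        err_svd = np.sqrt((s[K:] ** 2).sum())        # best rank-K error
        if err_sel <= c * err_svd:
            return K
    return min(d, n)
```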



2019
Author(s): Joseph Tassone, Peizhi Yan, Mackenzie Simpson, Chetan Mendhe, Vijay Mago, et al.

BACKGROUND: The collection and examination of social media have become a useful mechanism for studying the mental activity and behavioral tendencies of users.
OBJECTIVE: Through the analysis of a collected set of Twitter data, develop a model for predicting positively referenced, drug-related tweets, from which trends and correlations can be determined.
METHODS: Twitter tweets and attribute data were collected and processed using topic-pertaining keywords, such as drug slang and use conditions (methods of drug consumption). Candidate tweets were preprocessed, resulting in a dataset of 3,696,150 rows. The predictive power of multiple classification methods was compared, including logistic regression, decision trees, and CNN-based classifiers. For the latter, a deep learning approach was implemented to screen and analyze the semantic meaning of the tweets.
RESULTS: The logistic regression and decision tree models used 12,142 data points for training and 1,041 for testing. The logistic regression models displayed accuracies of 54.56% and 57.44%, with an AUC of 0.58. The decision tree improved on this with an accuracy of 63.40% and an AUC of 0.68, but all of these values implied low predictive capability with little to no discrimination. Conversely, the two CNN-based classifiers tested showed a marked improvement. The first was trained with 2,661 manually labeled samples, while the other added synthetically generated tweets for a total of 12,142 samples. Their accuracy scores were 76.35% and 82.31%, with AUCs of 0.90 and 0.91, respectively. Association rule mining in conjunction with the CNN-based classifier showed a high likelihood of keywords such as "smoke", "cocaine", and "marijuana" triggering a drug-positive classification.
CONCLUSIONS: Predictive analysis without a CNN is limited and possibly fruitless. Attribute-based models presented little predictive capability and were not suitable for analyzing this type of data. The semantic meaning of the tweets needed to be utilized, giving the CNN-based classifier an advantage over the other solutions. Additionally, commonly mentioned drugs corresponded with frequently used illicit substances, demonstrating the practical usefulness of this system. Lastly, the synthetically augmented set yielded higher scores, improving predictive capability.
CLINICAL TRIAL: None.
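For illustration, a CNN-based text classifier of the general kind the abstract describes might look like the following Keras sketch; the architecture, vocabulary size, and sequence length are assumptions, not the authors' configuration.

```python
import tensorflow as tf

VOCAB_SIZE, MAX_LEN = 20_000, 50   # assumed tokenizer settings

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,), dtype="int32"),     # token id sequence
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),          # token embeddings
    tf.keras.layers.Conv1D(128, 5, activation="relu"),   # n-gram detectors
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),      # P(drug-positive)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
```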


Sensors, 2021, Vol 21 (4), pp. 1031
Author(s): Joseba Gorospe, Rubén Mulero, Olatz Arbelaitz, Javier Muguerza, Miguel Ángel Antón

Deep learning techniques are being used increasingly in the scientific community as a consequence of the high computational capacity of current systems and the growing amount of data available from the digitalisation of society in general and the industrial world in particular. In addition, the emergence of edge computing, which focuses on integrating artificial intelligence as close as possible to the client, makes it possible to implement systems that act in real time without transferring all of the data to centralised servers. The combination of these two concepts can lead to systems with the capacity to make correct decisions and act on them immediately and in situ. Despite this, the low capacity of embedded systems greatly hinders this integration, so the ability to target a wide range of microcontrollers would be a great advantage. This paper contributes an environment based on Mbed OS and TensorFlow Lite that can be embedded in any general-purpose embedded system, allowing the introduction of deep learning architectures. The experiments herein show that the proposed system is competitive with other commercial systems.
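A minimal sketch of the off-device step behind this kind of pipeline: converting a trained Keras model into a TensorFlow Lite flatbuffer that firmware running TensorFlow Lite Micro on Mbed OS can embed. The toy model is a placeholder; the paper's models and conversion settings may differ.

```python
import tensorflow as tf

# Placeholder model standing in for whatever network is actually trained.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantization helps MCUs
with open("model.tflite", "wb") as f:
    f.write(converter.convert())
# On-device, the flatbuffer is typically compiled in as a C array,
# e.g. via `xxd -i model.tflite > model_data.h`.
```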


2020, Vol 6 (1)
Author(s): Malte Seemann, Lennart Bargsten, Alexander Schlaefer

Deep learning methods produce promising results when applied to a wide range of medical imaging tasks, including segmentation of the artery lumen in computed tomography angiography (CTA) data. However, to perform sufficiently well, neural networks have to be trained on large amounts of high-quality annotated data. In the realm of medical imaging, annotations are not only scarce but also often not entirely reliable. To tackle both challenges, we developed a two-step approach for generating realistic synthetic CTA data for the purpose of data augmentation. In the first step, moderately realistic images are generated in a purely numerical fashion. In the second step, these images are improved by applying neural domain adaptation. We evaluated the impact of the synthetic data on lumen segmentation via convolutional neural networks (CNNs) by comparing the resulting performances. Improvements of up to 5% in terms of the Dice coefficient and 20% for the Hausdorff distance represent a proof of concept that the proposed augmentation procedure can be used to enhance deep learning-based segmentation of the artery lumen in CTA images.
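As a reference point for the reported numbers, the Dice coefficient is the standard overlap metric for segmentation masks; a minimal NumPy implementation (not the authors' code) is:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```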


Computers, 2021, Vol 10 (6), pp. 82
Author(s): Ahmad O. Aseeri

Deep learning-based methods have emerged as some of the most effective and practical solutions to a wide range of medical problems, including the diagnosis of cardiac arrhythmias. A critical step toward early diagnosis of many heart dysfunction diseases is the accurate detection and classification of cardiac arrhythmias, which can be achieved via electrocardiograms (ECGs). Motivated by the desire to enhance conventional clinical methods for diagnosing cardiac arrhythmias, we introduce an uncertainty-aware, deep learning-based predictive model for accurate large-scale classification of cardiac arrhythmias, successfully trained and evaluated on three benchmark medical datasets. In addition, considering that the quantification of uncertainty estimates is vital for clinical decision-making, our method incorporates a probabilistic approach to capture the model's uncertainty using a Bayesian approximation method, without introducing additional parameters or significant changes to the network's architecture. Although many arrhythmia classification solutions with various ECG feature engineering techniques have been reported in the literature, the AI-based probabilistic method introduced in this paper outperforms existing methods, with multiclass F1 scores of 98.62% and 96.73% on the MIT-BIH dataset (20 annotations), 99.23% and 96.94% on the INCART dataset (eight annotations), and 97.25% and 96.73% on the BIDMC dataset (six annotations), for the deep ensemble and probabilistic modes, respectively. We also demonstrate the method's high performance and statistical reliability in numerical experiments on language modeling using the gating mechanism of recurrent neural networks.
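The abstract does not name the approximation, but Monte Carlo dropout is one standard Bayesian approximation matching the description (no extra parameters, no architectural changes): keep dropout active at test time and read uncertainty off the spread of repeated stochastic forward passes. A minimal sketch, assuming `model` is a Keras model containing dropout layers:

```python
import numpy as np

def mc_dropout_predict(model, x, n_samples=50):
    """Run n_samples stochastic forward passes with dropout kept active
    (training=True); return the predictive mean and its standard
    deviation, the latter serving as a per-class uncertainty estimate."""
    preds = np.stack([model(x, training=True).numpy()
                      for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)
```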


BMJ Open, 2021, Vol 11 (6), pp. e047007
Author(s): Mari Terada, Hiroshi Ohtsu, Sho Saito, Kayoko Hayakawa, Shinya Tsuzuki, et al.

Objectives: To investigate the risk factors contributing to severity on admission. Additionally, risk factors for worst severity and fatality were studied. Moreover, factors were compared based on three points: early severity, worst severity and fatality.
Design: An observational cohort study using data entered in a Japan nationwide COVID-19 inpatient registry, COVIREGI-JP.
Setting: As of 28 September 2020, 10 480 cases from 802 facilities had been registered. Participating facilities cover a wide range of hospitals where patients with COVID-19 are admitted in Japan.
Participants: Participants who had a positive test result on any applicable SARS-CoV-2 diagnostic test and were admitted to participating healthcare facilities. A total of 3829 cases were identified from 16 January to 31 May 2020, of which 3376 cases were included in this study.
Primary and secondary outcome measures: The primary outcome was severe or non-severe status on admission, determined by the requirement for mechanical ventilation or oxygen therapy, SpO2, or respiratory rate. The secondary outcome was the worst severity during hospitalisation, judged by the requirement for oxygen and/or invasive mechanical ventilation/extracorporeal membrane oxygenation.
Results: Risk factors for severity on admission were older age, male sex, cardiovascular disease, chronic respiratory disease, diabetes, obesity and hypertension. Cerebrovascular disease, liver disease, renal disease or dialysis, solid tumour and hyperlipidaemia did not influence severity on admission, but did influence worst severity. Fatality rates for obesity, hypertension and hyperlipidaemia were relatively low.
Conclusions: This study segregated the comorbidities influencing severity and death. It is possible that risk factors for severity on admission, worst severity and fatality are not consistent and may be driven by different factors. Specifically, while hypertension, hyperlipidaemia and obesity had a major effect on worst severity, their impact on fatality was mild in the Japanese population. Some studies contradict our results; therefore, detailed analyses, considering in-hospital treatments, are needed for validation.
Trial registration number: UMIN000039873. https://upload.umin.ac.jp/cgi-open-bin/ctr_e/ctr_view.cgi?recptno=R000045453


2020, Vol 8 (1), pp. 45-69
Author(s): Eckhard Liebscher, Wolf-Dieter Richter

We prove and describe in great detail a general method for constructing a wide range of multivariate probability density functions. We introduce probabilistic models for a large variety of clouds of multivariate data points. In the present paper, the focus is on star-shaped distributions of arbitrary dimension, where, in the case of spherical distributions, dependence is modeled by a non-Gaussian density generating function.
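As a sketch of the general form such constructions take (the notation here is assumed, not necessarily the paper's): a density generating function g is composed with the Minkowski functional h_K of a star body K.

```latex
% Sketch of the generic construction; C(g, K) is a normalising constant.
f(x) = C(g, K)\, g\bigl(h_K(x)\bigr), \qquad x \in \mathbb{R}^d,
% which reduces to a spherical distribution when h_K(x) = \lVert x \rVert_2,
% and to the Gaussian case when additionally g(t) = e^{-t^2/2}.
```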

