On Consensus-Optimality Trade-offs in Collaborative Deep Learning

Frontiers in Artificial Intelligence ◽

10.3389/frai.2021.573731 ◽

2021 ◽

Vol 4 ◽

Author(s):

Zhanhong Jiang ◽

Aditya Balu ◽

Chinmay Hegde ◽

Soumik Sarkar

Keyword(s):

Deep Learning ◽

Stochastic Gradient Descent ◽

Model Parameters ◽

Data Sets ◽

Full Spectrum ◽

Strongly Convex ◽

Trade Offs ◽

Private Data ◽

Convex Case ◽

Fundamental Tension

In distributed machine learning, where agents collaboratively learn from diverse private data sets, there is a fundamental tension between consensus and optimality. In this paper, we build on recent algorithmic progress in distributed deep learning to explore various consensus-optimality trade-offs over a fixed communication topology. First, we propose the incremental consensus-based distributed stochastic gradient descent (i-CDSGD) algorithm, which involves multiple consensus steps (where each agent communicates information with its neighbors) within each SGD iteration. Second, we propose the generalized consensus-based distributed SGD (g-CDSGD) algorithm that enables us to navigate the full spectrum from complete consensus (all agents agree) to complete disagreement (each agent converges to individual model parameters). We analytically establish convergence of the proposed algorithms for strongly convex and nonconvex objective functions; we also analyze the momentum variants of the algorithms for the strongly convex case. We support our algorithms via numerical experiments and demonstrate significant improvements over existing methods for collaborative deep learning.
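The idea of running several consensus steps inside each gradient iteration can be illustrated with a toy sketch. It is not the authors' i-CDSGD implementation: the scalar models, the quadratic private objectives, the exact (non-stochastic) gradients, the lazy ring mixing matrix, and the names `consensus_sgd`, `ring`, and `targets` are all assumptions chosen for illustration.

```python
def consensus_sgd(targets, mixing, tau=1, step=0.1, iters=200):
    """Toy consensus-based distributed gradient descent on scalar models.

    Assumption: agent j holds the private objective
    f_j(x) = 0.5 * (x - targets[j]) ** 2, so its gradient is x - targets[j].
    `mixing` is a row-stochastic matrix whose zero entries mark agents that
    are not neighbors. `tau` is the number of neighbor-averaging (consensus)
    rounds performed in every iteration.
    """
    n = len(targets)
    x = [0.0] * n
    for _ in range(iters):
        # Each agent evaluates the gradient of its own private objective.
        grads = [x[j] - targets[j] for j in range(n)]
        # tau rounds of averaging with neighbors over the fixed topology.
        mixed = x[:]
        for _ in range(tau):
            mixed = [sum(mixing[j][l] * mixed[l] for l in range(n))
                     for j in range(n)]
        # Local gradient step applied to the averaged model.
        x = [mixed[j] - step * grads[j] for j in range(n)]
    return x


# Hypothetical 4-agent ring: each agent keeps half of its own weight and
# splits the rest between its two neighbors.
ring = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
targets = [0.0, 1.0, 2.0, 3.0]

for tau in (1, 5):
    models = consensus_sgd(targets, ring, tau=tau)
    print(tau, max(models) - min(models))  # disagreement between agents
```

In this toy setting the agents' models average to the minimizer of the summed objective, and the remaining disagreement between agents is smaller with `tau=5` than with `tau=1`, which is the consensus side of the trade-off the abstract describes.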


Privacy-Preserving Deep Learning for the Detection of Protected Health Information in Real-World Data: Comparative Evaluation (Preprint)

10.2196/preprints.14064 ◽

2019 ◽

Author(s):

Sven Festag ◽

Cord Spreckelsen

Keyword(s):

Deep Learning ◽

Health Information ◽

Clinical Data ◽

Real World ◽

Privacy Preserving ◽

Stochastic Gradient Descent ◽

Data Sets ◽

Learning Approaches ◽

Protected Health Information ◽

Private Data

BACKGROUND Collaborative privacy-preserving training methods allow for the integration of locally stored private data sets into machine learning approaches while ensuring confidentiality and nondisclosure. OBJECTIVE In this work we assess the performance of a state-of-the-art neural network approach for the detection of protected health information in texts trained in a collaborative privacy-preserving way. METHODS The training adopts distributed selective stochastic gradient descent (ie, it works by exchanging local learning results achieved on private data sets). Five networks were trained on separated real-world clinical data sets by using the privacy-protecting protocol. In total, the data sets contain 1304 real longitudinal patient records for 296 patients. RESULTS These networks reached a mean F1 value of 0.955. The gold standard centralized training that is based on the union of all sets and does not take data security into consideration reaches a final value of 0.962. CONCLUSIONS Using real-world clinical data, our study shows that detection of protected health information can be secured by collaborative privacy-preserving training. In general, the approach shows the feasibility of deep learning on distributed and confidential clinical data while ensuring data protection.


Privacy-Preserving Deep Learning for the Detection of Protected Health Information in Real-World Data: Comparative Evaluation

JMIR Formative Research ◽

10.2196/14064 ◽

2020 ◽

Vol 4 (5) ◽

pp. e14064 ◽

Cited By ~ 2

Author(s):

Sven Festag ◽

Cord Spreckelsen

Keyword(s):

Deep Learning ◽

Health Information ◽

Clinical Data ◽

Real World ◽

Privacy Preserving ◽

Stochastic Gradient Descent ◽

Data Sets ◽

Learning Approaches ◽

Protected Health Information ◽

Private Data

Background Collaborative privacy-preserving training methods allow for the integration of locally stored private data sets into machine learning approaches while ensuring confidentiality and nondisclosure. Objective In this work we assess the performance of a state-of-the-art neural network approach for the detection of protected health information in texts trained in a collaborative privacy-preserving way. Methods The training adopts distributed selective stochastic gradient descent (ie, it works by exchanging local learning results achieved on private data sets). Five networks were trained on separated real-world clinical data sets by using the privacy-protecting protocol. In total, the data sets contain 1304 real longitudinal patient records for 296 patients. Results These networks reached a mean F1 value of 0.955. The gold standard centralized training that is based on the union of all sets and does not take data security into consideration reaches a final value of 0.962. Conclusions Using real-world clinical data, our study shows that detection of protected health information can be secured by collaborative privacy-preserving training. In general, the approach shows the feasibility of deep learning on distributed and confidential clinical data while ensuring data protection.


3D 3-C full-wavefield elastic inversion for estimating anisotropic parameters: A feasibility study with synthetic data

Geophysics ◽

10.1190/1.3204766 ◽

2009 ◽

Vol 74 (6) ◽

pp. WCC159-WCC175 ◽

Cited By ~ 22

Author(s):

Hui Chang ◽

George McMechan

Keyword(s):

Synthetic Data ◽

Transversely Isotropic ◽

Model Parameters ◽

Data Sets ◽

Multiple Sources ◽

Trade Offs ◽

Transversely Isotropic Media ◽

Free Data ◽

Isotropic Media ◽

Anisotropy Parameters

Traveltime-based inversions cannot solve for all of the anisotropy parameters for orthorhombic media. Vertical velocities cannot be recovered simultaneously with the dimensionless anisotropy parameters. Also, the density cannot be solved because it does not affect the normal moveout of P and S reflections. These limitations can be overcome using full-wavefield inversion for anisotropy parameters for orthorhombic media and for transversely isotropic media with vertical and horizontal symmetry axes. Tsvankin’s parameters and the orientation of the local (anisotropic) coordinates are inverted from three-component, wide-azimuth data sets containing P reflected and PS converted waves. The inversions are performed in two steps. The first step uses only reflections from the top of an anisotropic layer, which does not constrain the trade-offs between the vertical velocities, the anisotropies, and density, as shown by parameter correlation analysis. The results from the first step are refined by using them as the starting model for the second step, which fits reflections from the top and bottom of the layer. The properties of the target layer influence the amplitudes of top and bottom reflections as well as the traveltime of the bottom reflections; when all these data are used, the inversion is highly overdetermined and all model parameters are estimated accurately. When Gaussian noise is added, the inversion results are very similar to those for the noise-free data because only the coherent signal is fitted. The residual at convergence for the noisy data corresponds to the noise level. Concurrent inversion of data from multiple sources increases the azimuthal illumination of a target.


Human Activity Recognition using Fourier Transform Inspired Deep Learning Combination Model

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327908666180727123657 ◽

2019 ◽

Vol 9 (1) ◽

pp. 16-31

Author(s):

Kyungkoo Jun

Keyword(s):

Fourier Transform ◽

Deep Learning ◽

Short Term Memory ◽

Window Size ◽

Sensor Data ◽

Data Sets ◽

Data Set ◽

Proposed Model ◽

Testing Data ◽

Labeling Scheme

Background & Objective: This paper proposes a Fourier transform inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing a 1D input signal into 2D patterns, which is motivated by the Fourier conversion. The decomposition is aided by Long Short-Term Memory (LSTM), which captures the temporal dependency in the signal and then produces encoded sequences. The sequences, once arranged into a 2D array, can represent the fingerprints of the signals. The benefit of such a transformation is that we can exploit recent advances in deep learning models for image classification, such as the Convolutional Neural Network (CNN). Results: The proposed model is, as a result, a combination of LSTM and CNN. We evaluate the model over two data sets. For the first data set, which is more standardized than the other, our model outperforms, or at least equals, previous work. In the case of the second data set, we devise schemes to generate training and testing data by changing the parameters of the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% in some cases. We also analyze the effect of the parameters on the performance.
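The three parameters named in the abstract (window size, sliding size, labeling scheme) can be made concrete with a small sketch of how a labeled sensor stream might be cut into training examples. The function name `sliding_windows` and the majority-vote labeling are assumptions for illustration; the abstract does not say which labeling schemes the paper compares.

```python
from collections import Counter


def sliding_windows(samples, labels, window_size, slide_size):
    """Cut a labeled time series into fixed-length, possibly overlapping windows.

    Assumption: each window is labeled with the most frequent per-sample
    label inside it (a majority-vote labeling scheme).
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    windows = []
    # Stop as soon as a full window no longer fits in the series.
    for start in range(0, len(samples) - window_size + 1, slide_size):
        end = start + window_size
        majority = Counter(labels[start:end]).most_common(1)[0][0]
        windows.append((samples[start:end], majority))
    return windows


# Ten hypothetical sensor readings: five while walking, five while sitting.
readings = list(range(10))
activity = ["walk"] * 5 + ["sit"] * 5

for window, label in sliding_windows(readings, activity, window_size=4, slide_size=2):
    print(window, label)
```

A smaller sliding size yields more, and more heavily overlapping, windows from the same recording, while the labeling scheme decides how windows that straddle an activity change are treated.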


Share Market Data Prediction Strategies using Deep Learning Algorithm

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666191209093139 ◽

2019 ◽

Vol 13 ◽

Author(s):

A. John ◽

D. Praveen Dominic ◽

M. Adimoolam ◽

N. M. Balamurugan

Keyword(s):

Neural Network ◽

Deep Learning ◽

Stock Market ◽

Predictive Analytics ◽

Learning Algorithm ◽

Market Price ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Mining Machine ◽

Gradient Descent Algorithm

Background: Predictive analytics comprises a multiplicity of statistical schemes from predictive modelling, data mining, and machine learning. It examines current and historical data to make predictions about future or otherwise unknown events. Most predictive models are used in business analytics to reduce losses and increase profits. Predictive analytics is used to exploit patterns in historical data. Objective: Investors follow strategies for predicting stock values in order to invest in more profitable stocks, and these strategies for tracking stock market prices are incorporated into intelligent methods and tools. Such strategies increase investors' profits and also minimize their risks. Prediction therefore plays a vital role in stock market gains and is also a very intricate and challenging process. Method: The proposed strategy is a deep neural network with stochastic gradient descent for stock prediction. The neural network is trained using the back-propagation algorithm and the stochastic gradient descent algorithm. Results: The experiment on stock market price prediction is conducted using the Python language with the visual package. In this experiment, the RELIANCE.NS, TATAMOTORS.NS, and TATAGLOBAL.NS data sets are taken as input; they were downloaded from the National Stock Exchange site. The artificial neural network component, including the deep learning model, is most effective when more than 100,000 data points are available to train the model. The proposed model is developed on daily stock market prices to understand how to build a model with better performance than the existing national exchange method.


Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media

Journal Of Big Data ◽

10.1186/s40537-021-00488-w ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Yahya Albalawi ◽

Jim Buckley ◽

Nikola S. Nikolov

Keyword(s):

Social Media ◽

Deep Learning ◽

Comprehensive Evaluation ◽

Classification Problem ◽

Data Sets ◽

Word Embeddings ◽

Data Set ◽

Lower Accuracy ◽

Health Related ◽

The Impact

Abstract This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processings applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another for testing the generality of our models only. Our results point to the conclusion that only four out of the 26 pre-processings improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier with F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model with F1 score of 75.2% and accuracy of 90.7% compared to F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best of the traditional classifiers we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.


Theory and Applications of the Unit Gamma/Gompertz Distribution

Mathematics ◽

10.3390/math9161850 ◽

2021 ◽

Vol 9 (16) ◽

pp. 1850

Author(s):

Rashad A. R. Bantan ◽

Farrukh Jamal ◽

Christophe Chesneau ◽

Mohammed Elgarhy

Keyword(s):

Stochastic Ordering ◽

Real Data ◽

Rate Function ◽

The Other ◽

Likelihood Method ◽

Model Parameters ◽

Data Sets ◽

Gompertz Distribution ◽

Probability And Statistics ◽

Analytical Behavior

Unit distributions are commonly used in probability and statistics to describe useful quantities with values between 0 and 1, such as proportions, probabilities, and percentages. Some unit distributions are defined in a natural analytical manner, and the others are derived through the transformation of an existing distribution defined in a greater domain. In this article, we introduce the unit gamma/Gompertz distribution, founded on the inverse-exponential scheme and the gamma/Gompertz distribution. The gamma/Gompertz distribution is known to be a very flexible three-parameter lifetime distribution, and we aim to transpose this flexibility to the unit interval. First, we check this aspect with the analytical behavior of the primary functions. It is shown that the probability density function can be increasing, decreasing, “increasing-decreasing” and “decreasing-increasing”, with pliant asymmetric properties. On the other hand, the hazard rate function has monotonically increasing, decreasing, or constant shapes. We complete the theoretical part with some propositions on stochastic ordering, moments, quantiles, and the reliability coefficient. Practically, to estimate the model parameters from unit data, the maximum likelihood method is used. We present some simulation results to evaluate this method. Two applications using real data sets, one on trade shares and the other on flood levels, demonstrate the importance of the new model when compared to other unit models.
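How a lifetime distribution on the positive half-line is carried over to the unit interval can be written out in two lines. The derivation below assumes that the "inverse-exponential scheme" means the transformation Y = exp(-X); the abstract does not spell this out, so it should be read as an illustration of the general technique rather than as the paper's exact definition.

```latex
% Assumption: Y = exp(-X), where X > 0 is a continuous lifetime variable
% (for instance gamma/Gompertz distributed) with CDF F_X and PDF f_X.
\begin{align*}
F_Y(y) &= P\left(e^{-X} \le y\right) = P\left(X \ge -\ln y\right)
        = 1 - F_X(-\ln y), \qquad 0 < y < 1, \\
f_Y(y) &= \frac{d}{dy} F_Y(y) = \frac{1}{y}\, f_X(-\ln y).
\end{align*}
```

Under this assumption every parameter of the lifetime distribution carries over unchanged to the unit distribution, which is one way to read the stated aim of transposing its flexibility to the unit interval.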


Fundamental resource trade-offs for encoded distributed optimization

Information and Inference A Journal of the IMA ◽

10.1093/imaiai/iaaa026 ◽

2020 ◽

Author(s):

A Salman Avestimehr ◽

Seyed Mohammadreza Mousavi Kalan ◽

Mahdi Soltanolkotabi

Keyword(s):

Computational Time ◽

Massive Data ◽

Data Sets ◽

Massive Data Sets ◽

Computational Framework ◽

Data Set ◽

Trade Offs ◽

Major Bottleneck ◽

Computing Environments ◽

Analyze Data

Abstract Dealing with the sheer size and complexity of today’s massive data sets requires computational platforms that can analyze data in a parallelized and distributed fashion. A major bottleneck that arises in such modern distributed computing environments is that some of the worker nodes may run slow. These nodes, a.k.a. stragglers, can significantly slow down computation, as the slowest node may dictate the overall computational time. A recent computational framework, called encoded optimization, creates redundancy in the data to mitigate the effect of stragglers. In this paper, we develop novel mathematical understanding for this framework, demonstrating its effectiveness in much broader settings than was previously understood. We also analyze the convergence behavior of iterative encoded optimization algorithms, allowing us to characterize fundamental trade-offs between convergence rate, size of data set, accuracy, computational load (or data redundancy) and straggler toleration in this framework.
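Why a single straggler dictates the running time, and why redundancy helps, can be seen in a small sketch. It idealizes the setting by assuming that redundancy lets the computation proceed once any `needed` of the workers have reported; the function name `completion_time` and the timings are hypothetical, and the sketch does not reproduce the encoding scheme analyzed in the paper.

```python
def completion_time(worker_times, needed):
    """Time at which `needed` workers have finished, given each worker's finish time."""
    if not 1 <= needed <= len(worker_times):
        raise ValueError("needed must be between 1 and the number of workers")
    # The needed-th fastest worker determines when the step can proceed.
    return sorted(worker_times)[needed - 1]


# Hypothetical finish times in seconds; the last worker is a straggler.
times = [1.0, 1.2, 0.9, 7.5]

print(completion_time(times, needed=4))  # no redundancy: wait for every worker -> 7.5
print(completion_time(times, needed=3))  # redundancy tolerates one straggler -> 1.2
```

The price of waiting for fewer workers is the extra computational load that the redundant data places on every node, which is one of the trade-offs the abstract lists.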


Deep Learning Based Cardiac MRI Segmentation: Do We Need Experts?

Algorithms ◽

10.3390/a14070212 ◽

2021 ◽

Vol 14 (7) ◽

pp. 212

Author(s):

Youssef Skandarani ◽

Pierre-Marc Jodoin ◽

Alain Lalande

Keyword(s):

Deep Learning ◽

Cardiac Mri ◽

Expert Knowledge ◽

Medical Image Analysis ◽

Ground Truth ◽

Cine Mri ◽

Data Sets ◽

Mri Segmentation ◽

Segmentation Evaluation ◽

Ground Truth Data

Deep learning methods are the de facto solutions to a multitude of medical image analysis tasks. Cardiac MRI segmentation is one such application, which, like many others, requires a large amount of annotated data so that a trained network can generalize well. Unfortunately, having a large number of images manually curated by medical experts is both slow and extremely expensive. In this paper, we set out to explore whether expert knowledge is a strict requirement for the creation of annotated data sets on which machine learning can successfully be trained. To do so, we gauged the performance of three segmentation models, namely U-Net, Attention U-Net, and ENet, trained with different loss functions on expert and non-expert ground truth for cardiac cine-MRI segmentation. Evaluation was done with classic segmentation metrics (Dice index and Hausdorff distance) as well as clinical measurements, such as the ventricular ejection fractions and the myocardial mass. The results reveal that the generalization performance of a segmentation neural network trained on non-expert ground truth data is, for all practical purposes, as good as that of one trained on expert ground truth data, particularly when the non-expert receives a decent level of training, highlighting an opportunity for the efficient and cost-effective creation of annotations for cardiac data sets.
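Of the metrics named in the abstract, the Dice index is simple enough to show in full. The sketch below uses the standard definition, twice the overlap divided by the total size of the two masks; the function name `dice_index`, the nested-list mask representation, and the convention of returning 1.0 for two empty masks are assumptions for illustration.

```python
def dice_index(pred, truth):
    """Dice index between two binary masks given as 2D lists of 0/1 values."""
    pred_flat = [v for row in pred for v in row]
    truth_flat = [v for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels marked as foreground in both masks.
    overlap = sum(p * t for p, t in zip(pred_flat, truth_flat))
    total = sum(pred_flat) + sum(truth_flat)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Hypothetical 2x3 masks: predicted segmentation versus ground truth.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]

print(dice_index(predicted, reference))  # 2 * 2 / (3 + 3), about 0.667
```

The index ranges from 0 (no overlap) to 1 (identical masks), so comparing networks trained on expert and non-expert annotations amounts to comparing such scores against the same reference segmentations.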


Bayesian Inference of Species Trees using Diffusion Models

Systematic Biology ◽

10.1093/sysbio/syaa051 ◽

2020 ◽

Vol 70 (1) ◽

pp. 145-161 ◽

Cited By ~ 1

Author(s):

Marnus Stoltz ◽

Boris Baeumer ◽

Remco Bouckaert ◽

Colin Fox ◽

Gordon Hiscott ◽

...

Keyword(s):

Bayesian Inference ◽

Numerical Algorithms ◽

Diffusion Models ◽

Model Parameters ◽

Data Sets ◽

Species Trees ◽

Computationally Efficient ◽

Data Set ◽

Snp Data ◽

Binary Markers

Abstract We describe a new and computationally efficient Bayesian methodology for inferring species trees and demographics from unlinked binary markers. Likelihood calculations are carried out using diffusion models of allele frequency dynamics combined with novel numerical algorithms. The diffusion approach allows for analysis of data sets containing hundreds or thousands of individuals. The method, which we call Snapper, has been implemented as part of the BEAST2 package. We conducted simulation experiments to assess numerical error, computational requirements, and accuracy recovering known model parameters. A reanalysis of soybean SNP data demonstrates that the models implemented in Snapp and Snapper can be difficult to distinguish in practice, a characteristic which we tested with further simulations. We demonstrate the scale of analysis possible using a SNP data set sampled from 399 fresh water turtles in 41 populations. [Bayesian inference; diffusion models; multi-species coalescent; SNP data; species trees; spectral methods.]
