Probabilistic Predictions with Federated Learning

Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 41
Author(s):  
Adam Thor Thorgeirsson ◽  
Frank Gauterin

Probabilistic predictions with machine learning are important in many applications. These are commonly produced with Bayesian learning algorithms. However, Bayesian learning methods are computationally expensive in comparison with non-Bayesian methods. Furthermore, the data used to train these algorithms are often distributed over a large group of end devices. Federated learning can be applied in this setting in a communication-efficient and privacy-preserving manner, but it does not represent predictive uncertainty. To represent predictive uncertainty in federated learning, we suggest introducing uncertainty in the aggregation step of the algorithm by treating the set of local weights as a posterior distribution for the weights of the global model. We compare our approach to state-of-the-art Bayesian and non-Bayesian probabilistic learning algorithms. By applying proper scoring rules to evaluate the predictive distributions, we show that our approach can achieve performance similar to what the benchmark would achieve in a non-distributed setting.
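The aggregation idea above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions:

- Each client fits a one-dimensional linear model y = w*x + b.
- The server keeps the clients' (w, b) pairs as equally weighted posterior samples instead of averaging them into a single model.
- The observation noise variance `noise_var` is known.

The predictive distribution at a point is then a mixture, summarised here by its mean and variance. The mixture is scored with the Gaussian negative log-likelihood of a moment-matched approximation, which is one example of a proper scoring rule. All function names are hypothetical.

```python
import math
import random
from statistics import mean, pvariance

def fit_local(xs, ys):
    """Ordinary least squares for y = w*x + b on one client's data."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx

def predictive(local_weights, x, noise_var):
    """Treat the local (w, b) pairs as posterior samples; return mixture mean and variance at x."""
    preds = [w * x + b for w, b in local_weights]
    # Law of total variance: shared observation noise + spread across local models.
    return mean(preds), noise_var + pvariance(preds)

def gaussian_nll(y, mu, var):
    """Negative log density of y under N(mu, var), used here as a proper scoring rule."""
    return 0.5 * math.log(2 * math.pi * var) + (y - mu) ** 2 / (2 * var)

# Simulated clients: each sees its own noisy sample of y = 2x + 0.5 (toy data).
random.seed(0)
clients = []
for _ in range(5):
    xs = [random.uniform(-1, 1) for _ in range(50)]
    ys = [2.0 * x + 0.5 + random.gauss(0, 0.1) for x in xs]
    clients.append(fit_local(xs, ys))

mu, var = predictive(clients, x=0.3, noise_var=0.01)
print(f"mean={mu:.3f} var={var:.5f} nll={gaussian_nll(1.1, mu, var):.3f}")
```

Because the server stores the local weights instead of collapsing them, the disagreement between clients shows up directly as extra predictive variance.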

2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Nalindren Naicker ◽  
Timothy Adeliyi ◽  
Jeanette Wing

Educational Data Mining (EDM) is a rich research field in computer science. Tools and techniques in EDM are useful for predicting student performance, which gives practitioners useful insights to develop appropriate intervention strategies to improve pass rates and increase retention. The performance of state-of-the-art machine learning classifiers depends strongly on the task at hand. Support vector machines have been used extensively in classification problems; however, the extant literature shows a gap in the application of linear support vector machines as predictors of student performance. The aim of this study was to compare the performance of linear support vector machines with that of state-of-the-art classical machine learning algorithms, in order to determine which algorithm best improves the prediction of student performance. In this quantitative study, an experimental research design was used. Experiments were set up using feature selection on a publicly available dataset of 1000 alphanumeric student records. When benchmarked against ten categorical machine learning algorithms, linear support vector machines showed superior performance in predicting student performance. The results showed that features such as race, gender, and lunch influence performance in mathematics, whilst access to lunch was the primary factor influencing reading and writing performance.
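To make "linear support vector machine" concrete, here is a minimal pure-Python sketch of training one. It uses the Pegasos algorithm, which is stochastic subgradient descent on the L2-regularised hinge loss. Pegasos is swapped in as one standard training method; the study does not say which solver it used.

The following are illustrative assumptions:

- A constant feature is appended so the bias is learned with the weights.
- The toy data are not the student dataset.
- Feature selection is not shown.
- The names `train_linear_svm` and `predict` are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularised hinge loss.

    X is a list of feature tuples, y a list of labels in {-1, +1}.
    A constant 1.0 feature is appended so the bias is learned (and regularised) with w.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                       # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]    # regularisation shrink
            if margin < 1.0:                            # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Toy, linearly separable data (illustrative only).
X = [(2, 2), (3, 1), (2, 3), (3, 3), (-2, -2), (-3, -1), (-2, -3), (-1, -3)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
accuracy = sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)
print(f"training accuracy = {accuracy:.2f}")
```

In a benchmark like the one described, the same split and selected features would be fed to each competing classifier, so that only the learning algorithm varies.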


Information ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 98 ◽  
Author(s):  
Tariq Ahmad ◽  
Allan Ramsay ◽  
Hanady Ahmed

Assigning sentiment labels to documents is, at first sight, a standard multi-label classification task. Many approaches have been used for this task, but the current state-of-the-art solutions use deep neural networks (DNNs). It would therefore seem likely that standard machine learning algorithms such as these provide an effective approach. We describe an alternative approach, which uses probabilities to construct a weighted lexicon of sentiment terms, then modifies the lexicon and calculates optimal thresholds for each class. We show that this approach outperforms DNNs and other standard algorithms. We believe that DNNs are not a universal panacea, and that paying attention to the nature of the data you are trying to learn from can be more important than trying out ever more powerful general-purpose machine learning algorithms.
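The lexicon-plus-threshold idea can be sketched as follows. The abstract does not give the authors' weighting formula or their lexicon-modification step, so this sketch makes its own choices, all of which are assumptions:

- Each word's weight for a class is the fraction of documents containing that word that carry the class label.
- A document's class score is the average weight of its known words.
- Each class gets the threshold, among the observed scores, that maximises F1 on the training data.

Function names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

def build_lexicon(docs, labels, classes):
    """Weight of word w for class c = P(c | document contains w), estimated by counting."""
    df = Counter()                  # number of documents containing each word
    df_c = defaultdict(Counter)     # same, restricted to documents labelled c
    for tokens, labs in zip(docs, labels):
        for w in set(tokens):
            df[w] += 1
            for c in labs:
                df_c[c][w] += 1
    return {c: {w: df_c[c][w] / df[w] for w in df} for c in classes}

def score(lexicon_c, tokens):
    """Average class weight of the known words in a document (0.0 if none are known)."""
    weights = [lexicon_c[w] for w in tokens if w in lexicon_c]
    return sum(weights) / len(weights) if weights else 0.0

def f1(preds, gold):
    tp = sum(p and g for p, g in zip(preds, gold))
    fp = sum(p and not g for p, g in zip(preds, gold))
    fn = sum(g and not p for p, g in zip(preds, gold))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(scores, gold):
    """Pick the cut-off, among the observed scores, that maximises F1 for one class."""
    return max(sorted(set(scores)), key=lambda th: f1([s >= th for s in scores], gold))

# Toy multi-label corpus (illustrative only).
docs = [["great", "fun"], ["sad", "boring"], ["great", "sad"], ["fun", "happy"]]
labels = [{"joy"}, {"sadness"}, {"joy", "sadness"}, {"joy"}]
classes = ["joy", "sadness"]

lex = build_lexicon(docs, labels, classes)
thresholds = {}
for c in classes:
    s = [score(lex[c], d) for d in docs]
    thresholds[c] = best_threshold(s, [c in labs for labs in labels])
print(thresholds)
```

Each class gets its own threshold, so a document can receive several labels or none. That is the behaviour the multi-label framing calls for.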


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Zhixun Zhao ◽  
Xiaocai Zhang ◽  
Fang Chen ◽  
Liang Fang ◽  
Jinyan Li

Abstract Background DNA N4-methylcytosine (4mC) is a critical epigenetic modification and has various roles in the restriction-modification system. Due to the high cost of experimental laboratory detection, computational methods using sequence characteristics and machine learning algorithms have been explored to identify 4mC sites from DNA sequences. However, state-of-the-art methods have limited performance because of the lack of effective sequence features and the ad hoc choice of learning algorithms. This paper aims to propose a new sequence feature space and a machine learning algorithm with a feature selection scheme to address the problem. Results The feature importance score distributions in datasets of six species are first reported and analyzed. Then the impact of feature selection on model performance is evaluated by independent testing on benchmark datasets, where ACC and MCC after feature selection increase by 2.3% to 9.7% and 0.05 to 0.19, respectively. The proposed method is compared with three state-of-the-art predictors using independent tests and 10-fold cross-validation, and it outperforms them on all datasets, improving ACC by 3.02% to 7.89% and MCC by 0.06 to 0.15 in the independent test. Two detailed case studies confirmed the strong overall performance of the proposed method: it correctly identified 24 of 26 4mC sites from the C. elegans gene and 126 of 137 4mC sites from the D. melanogaster gene. Conclusions The results show that the proposed feature space and learning algorithm with feature selection can improve the performance of DNA 4mC prediction on the benchmark datasets. The two case studies demonstrate the effectiveness of the method in practical situations.
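The two reported metrics can be computed directly from a binary confusion matrix:

- ACC is (TP + TN) / (TP + TN + FP + FN).
- MCC (Matthews correlation coefficient) is (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)).

MCC ranges from −1 to 1, which is why its gains are reported as absolute differences rather than percentages. This short sketch uses hypothetical counts, not numbers from the paper. Returning 0.0 when the denominator is zero is a common convention, assumed here.

```python
import math

def acc_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient from a binary confusion matrix."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): MCC = 0 when any row or column of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc

acc, mcc = acc_mcc(tp=90, tn=80, fp=20, fn=10)   # hypothetical counts
print(f"ACC = {acc:.3f}, MCC = {mcc:.3f}")
```

The case-study counts in the abstract correspond to a sensitivity of 24/26 ≈ 0.923 and 126/137 ≈ 0.920 on the known 4mC sites.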


2019 ◽  
Author(s):  
Shufen Pan ◽  
Naiqing Pan ◽  
Hanqin Tian ◽  
Pierre Friedlingstein ◽  
Stephen Sitch ◽  
...  

Abstract. Evapotranspiration (ET) is a critical component of the global water cycle and links terrestrial water, carbon and energy cycles. Accurate estimates of terrestrial ET are important for hydrological, meteorological, and agricultural research and applications, such as quantifying surface energy and water budgets, weather forecasting, and scheduling of irrigation. However, direct measurement of global terrestrial ET is not feasible. Here, we first gave a retrospective introduction to the basic theory and recent developments of state-of-the-art approaches for estimating global terrestrial ET, including remote sensing-based physical models, machine learning algorithms and land surface models (LSMs). Then, we used six remote sensing-based models (four physical models and two machine learning algorithms) and fourteen LSMs to analyze the spatial and temporal variations in global terrestrial ET. The results showed that the mean annual global terrestrial ET ranged from 50.7 × 10³ km³ yr⁻¹ (454 mm yr⁻¹) to 75.7 × 10³ km³ yr⁻¹ (677 mm yr⁻¹), with an average of 65.5 × 10³ km³ yr⁻¹ (588 mm yr⁻¹), during 1982–2011. LSMs had significant uncertainty in ET magnitude in tropical regions, especially the Amazon Basin, while remote sensing-based ET products showed a larger inter-model range than LSMs in arid and semi-arid regions. LSMs and remote sensing-based physical models presented much larger inter-annual variability (IAV) of ET than machine learning algorithms in the southwestern U.S. and the Southern Hemisphere, particularly in Australia. LSMs suggested stronger control of precipitation on ET IAV than remote sensing-based models did. The ensemble remote sensing-based physical models and machine-learning algorithm suggested significant increasing trends in global terrestrial ET at a rate of 0.62 mm yr⁻² (p < 0.05). However, the ensemble mean of the LSMs showed no consistent trend, even though most of the individual LSMs reproduced the increasing trend.
Moreover, all models suggested a positive effect of vegetation greening on ET intensification. Spatially, all methods showed that ET increased significantly in western and southern Africa, western India and northeastern Australia, but decreased severely in the southwestern U.S., southern South America and Mongolia. Discrepancies in ET trends mainly appeared in tropical regions such as the Amazon Basin. The ensemble means of the three ET categories showed generally good consistency; however, considerable uncertainties still exist in both the temporal and spatial variations in global ET estimates. These uncertainties were induced by multiple factors, including parameterization of land processes, meteorological forcing, lack of in situ measurements, remote sensing acquisition and scaling effects. Improvements in the representation of water stress and canopy dynamics are needed to reduce uncertainty in LSM-simulated ET. Use of the latest satellite sensors and deep learning methods, theoretical advances in nonequilibrium thermodynamics, and integrated methods that fuse different ET estimates or relevant key biophysical variables will improve the accuracy of remote sensing-based models.
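The abstract reports each global ET estimate both as a volume flux (km³ yr⁻¹) and as an equivalent water depth (mm yr⁻¹). The two are linked by dividing the volume by the land area over which ET is integrated, using 1 mm = 10⁻⁶ km.

The land mask is not stated here, so this sketch makes one assumption: it infers the area from the reported mean (65.5 × 10³ km³ yr⁻¹ paired with 588 mm yr⁻¹), about 1.11 × 10⁸ km². It then uses that area to convert the two range endpoints, which gives a quick consistency check on the reported depths. The function names are hypothetical.

```python
def implied_area_km2(volume_km3, depth_mm):
    """Land area implied by pairing a volume flux (km^3/yr) with a depth flux (mm/yr)."""
    return volume_km3 / (depth_mm * 1e-6)      # 1 mm = 1e-6 km

def depth_mm(volume_km3, area_km2):
    """Convert a volume flux (km^3/yr) into an equivalent water depth (mm/yr)."""
    return volume_km3 / area_km2 * 1e6

# Assumption: the area is inferred from the reported mean, not taken from the paper.
area = implied_area_km2(65.5e3, 588)
for v in (50.7e3, 65.5e3, 75.7e3):
    print(f"{v:.0f} km^3/yr -> {depth_mm(v, area):.0f} mm/yr")
print(f"implied area: {area:.3g} km^2")
```

With this area, the lower and upper volume estimates convert to roughly 455 and 680 mm yr⁻¹. The small differences from the reported depths are consistent with rounding in the published figures.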


Author(s):  
Ben Bright Benuwa ◽  
Yong Zhao Zhan ◽  
Benjamin Ghansah ◽  
Dickson Keddy Wornyo ◽  
Frank Banaseka Kataka

The rapid increase in information and its accessibility in recent years has triggered a paradigm shift in algorithm design for artificial intelligence. Recently, deep learning (a subfield of machine learning) has won several contests in pattern recognition and machine learning. This review comprehensively summarises relevant studies, drawing largely on prior state-of-the-art techniques. The paper also discusses the motivations and principles behind learning algorithms for deep architectures.


2021 ◽  
pp. 1-15
Author(s):  
Mohammed Ayub ◽  
El-Sayed M. El-Alfy

Web technology has become an indispensable part of human life in almost all activities. At the same time, cyberattacks are on the rise in today's Web-driven world. Effective countermeasures for the analysis and detection of malicious websites are therefore crucial to combat the rising threats to cyber security. In this paper, we systematically reviewed the state-of-the-art techniques and identified about 230 features of malicious websites, which are classified as internal and external features. We also developed a toolkit for the analysis and modeling of malicious websites. The toolkit implements several types of feature extraction methods and machine learning algorithms, which can be used to analyze and compare different approaches to detecting malicious URLs. In addition, the toolkit incorporates other options such as feature selection and imbalanced learning, and it can be extended with more functionality and generalization capabilities. Finally, some use cases are demonstrated on different datasets.
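Feature extraction for malicious-URL detection often starts with lexical features computed from the URL string alone. This sketch assumes such features correspond to what the paper calls internal features, as opposed to external ones that need third-party lookups such as WHOIS or DNS. That mapping is an assumption about the paper's taxonomy.

The specific features and the keyword list below are hypothetical illustrations, not the toolkit's feature set. Only standard-library calls are used: `urllib.parse.urlparse` and `ipaddress.ip_address`.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical keyword list, for illustration only.
SUSPICIOUS_WORDS = ("login", "verify", "account", "update", "secure")

def url_features(url):
    """A handful of lexical features computed from the URL string alone."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)      # raw IP instead of a domain name
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens": url.count("-"),
        "has_at_symbol": int("@" in url),
        "host_is_ip": host_is_ip,
        "uses_https": int(parsed.scheme == "https"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "suspicious_words": sum(w in url.lower() for w in SUSPICIOUS_WORDS),
    }

print(url_features("http://192.168.0.1/secure-login@bank"))
print(url_features("https://www.example.com/"))
```

Each URL becomes a fixed-length numeric vector. Vectors like these are what the feature selection and classification stages of such a toolkit would consume.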


Author(s):  
Auri Marcelo Rizzo Vincenzi ◽  
Elisa Yumi Nakagawa ◽  
José Carlos Maldonado ◽  
Márcio Eduardo Delamaro ◽  
Roseli Aparecida Francelin Romero

Mutation testing (mutation analysis), although powerful in revealing faults, is considered a computationally expensive criterion, due to the high number of mutants created and the effort needed to identify equivalent mutants. Mutation-based alternative testing criteria can reduce the number of mutants, but the equivalent ones must still be identified. In this paper, Bayesian learning (one of the artificial intelligence techniques used in machine learning) is investigated to define the Bayesian Learning-Based Equivalent Detection Technique (BaLBEDeT), which provides guidelines to help the tester analyze the live mutants and determine which of them are equivalent.
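As a generic illustration of applying Bayesian learning to this problem, this sketch trains a Bernoulli naive Bayes classifier with Laplace smoothing. The classifier estimates the probability that a live mutant is equivalent, so mutants can be ranked for manual review. It is not BaLBEDeT itself, whose guidelines the abstract does not spell out.

The binary mutant features, the training labels and the function names are all hypothetical. In practice, features might record properties such as the mutation operator applied.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Bernoulli naive Bayes with Laplace smoothing.
    X: lists of 0/1 mutant features; y: 1 = equivalent, 0 = not equivalent."""
    n_feat = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + alpha) / (len(X) + 2 * alpha)
        likelihood = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                      for j in range(n_feat)]      # P(feature j = 1 | class c)
        model[c] = (prior, likelihood)
    return model

def prob_equivalent(model, x):
    """Posterior P(equivalent | x) via Bayes' rule, computed in log space."""
    logp = {}
    for c, (prior, lik) in model.items():
        logp[c] = math.log(prior) + sum(math.log(p if xj else 1 - p) for p, xj in zip(lik, x))
    m = max(logp.values())
    z = sum(math.exp(v - m) for v in logp.values())
    return math.exp(logp[1] - m) / z

# Hypothetical labelled mutants from earlier manual analysis.
X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]
model = train_bernoulli_nb(X, y)

live = {"m1": [1, 1, 0], "m2": [0, 0, 1]}       # hypothetical live mutants
ranking = sorted(live, key=lambda m: prob_equivalent(model, live[m]), reverse=True)
for m in ranking:
    print(m, round(prob_equivalent(model, live[m]), 3))
```

Ranking by posterior probability would let a tester inspect the mutants most likely to be equivalent first. This reflects the kind of guidance the technique is meant to provide.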

