logarithmic loss
Recently Published Documents

TOTAL DOCUMENTS: 34 (five years: 15)
H-INDEX: 6 (five years: 2)

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Bianka Kovács ◽  
Gergely Palla

Abstract: Several observations indicate the existence of a latent hyperbolic space behind real networks that makes their structure very intuitive, in the sense that the probability of a connection decreases with the hyperbolic distance between the nodes. A remarkable network model generating random graphs along this line is the popularity-similarity optimisation (PSO) model, offering a scale-free degree distribution, high clustering and the small-world property at the same time. These results provide a strong motivation for the development of hyperbolic embedding algorithms that tackle the problem of finding the optimal hyperbolic coordinates of the nodes based on the network structure. A very promising recent approach to hyperbolic embedding is the noncentered minimum curvilinear embedding (ncMCE) method, belonging to the family of coalescent embedding algorithms, which offers a high-quality embedding at a low running time. In the present work we propose a further optimisation of the angular coordinates in this framework that seems to reduce the logarithmic loss and increase the greedy routing score of the embedding compared to the original version, thereby adding an extra improvement to the quality of the inferred hyperbolic coordinates.
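As a point of reference for how such embeddings are typically scored, the sketch below computes a logarithmic loss for a set of hyperbolic node coordinates under the Fermi–Dirac connection probability used in PSO-type models. The cutoff radius R, the temperature T, and the exact objective optimised by the authors are assumptions here, not details taken from the abstract.

```python
import numpy as np

def hyperbolic_distance(r1, theta1, r2, theta2):
    # Distance in the native (polar) representation of the hyperbolic plane.
    dtheta = np.pi - abs(np.pi - abs(theta1 - theta2))  # angular gap in [0, pi]
    arg = np.cosh(r1) * np.cosh(r2) - np.sinh(r1) * np.sinh(r2) * np.cos(dtheta)
    return np.arccosh(max(arg, 1.0))  # clamp for numerical safety

def logarithmic_loss(edges, non_edges, coords, R, T):
    # LL = -sum_{links} ln p(x) - sum_{non-links} ln(1 - p(x)), with the
    # Fermi-Dirac connection probability p(x) = 1 / (1 + exp((x - R) / (2 T))).
    def p(i, j):
        x = hyperbolic_distance(*coords[i], *coords[j])
        return 1.0 / (1.0 + np.exp((x - R) / (2.0 * T)))
    return (-sum(np.log(p(i, j)) for i, j in edges)
            - sum(np.log(1.0 - p(i, j)) for i, j in non_edges))

# Toy usage: three nodes as (radial, angular) pairs, one triangle edge missing.
coords = {0: (0.5, 0.1), 1: (1.0, 0.2), 2: (1.5, 3.0)}
print(logarithmic_loss([(0, 1)], [(0, 2), (1, 2)], coords, R=2.0, T=0.5))
```

A lower loss means the inferred coordinates explain the observed links better, which is why it serves as the embedding-quality criterion alongside the greedy routing score.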


2021 ◽  
Vol 8 (3) ◽  
pp. 201525
Author(s):  
Geoffrey R. Hosack ◽  
Adrien Ickowicz ◽  
Keith R. Hayes

The relative risk of disease transmission caused by the potential release of transgenic vectors, such as through sterile insect technique or gene drive systems, is assessed by comparison with wild-type vectors. The probabilistic risk framework is demonstrated with an assessment of the relative risk of lymphatic filariasis, malaria and o'nyong'nyong arbovirus transmission by mosquito vectors to human hosts given a released transgenic strain of Anopheles coluzzii carrying a dominant sterile male gene construct. Harm is quantified by a logarithmic loss function that depends on the causal risk ratio, a quotient of basic reproduction numbers derived from mathematical models of disease transmission. The basic reproduction numbers are predicted to depend on the number of generations in an insectary colony and the number of backcrosses between the transgenic and wild-type lineages. Analogous causal risk ratios for short-term exposure to a single cohort release are also derived. These causal risk ratios were parametrized by probabilistic elicitations and updated with experimental data for adult vector mortality. For the wild-type, high numbers of insectary generations were predicted to reduce the number of infectious human cases compared with the uncolonized wild-type. Transgenic strains were predicted to produce fewer infectious cases than the uncolonized wild-type.
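Schematically, a harm measure of this kind can be written as below; this is an assumed minimal reconstruction, as the paper's exact functional form may include additional weighting or truncation.

```latex
\mathrm{RR} = \frac{R_{0,\text{transgenic}}}{R_{0,\text{wild-type}}},
\qquad
L(\mathrm{RR}) = \log \mathrm{RR}
```

Under this form the loss is positive when the transgenic release is predicted to increase transmission relative to the wild-type baseline, zero when the two strains transmit equally, and negative when the release reduces transmission.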


Author(s):  
Chaitanya Kaul ◽  
Nick Pears ◽  
Hang Dai ◽  
Roderick Murray-Smith ◽  
Suresh Manandhar
Keyword(s):  

2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Yuhan Su ◽  
Hongxin Xiang ◽  
Haotian Xie ◽  
Yong Yu ◽  
Shiyan Dong ◽  
...  

The identification of profiled cancer-related genes plays an essential role in cancer diagnosis and treatment. Despite extensive literature research, the classification of genetic mutations is still performed manually today. Manual classification of genetic mutations is pathologist-dependent, subjective, and time-consuming. To improve the accuracy of clinical interpretation, scientists have proposed computational approaches for the automatic analysis of mutations with the advent of next-generation sequencing technologies. Nevertheless, challenges such as the large number of classes, the complexity of the texts, redundant descriptions, and inconsistent interpretation have limited the development of such algorithms. To overcome these difficulties, we adapted a deep learning method, Bidirectional Encoder Representations from Transformers (BERT), to classify genetic mutations based on text evidence from an annotated database. During training, three challenging properties of the data were addressed: the extreme length of the texts, biased data presentation, and high repeatability. Finally, the BERT+abstract configuration demonstrates satisfactory results, with a logarithmic loss of 0.80, a recall of 0.6837, and an F-measure of 0.705. It is feasible for BERT to classify genomic mutation text within literature-based datasets; consequently, BERT is a practical tool for facilitating and significantly speeding up cancer research on tumor progression, diagnosis, and the design of more precise and effective treatments.
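For concreteness, the reported metrics can be computed as in the sketch below. The nine-class label set and the predicted probabilities are placeholders (the paper's dataset and label definitions are not reproduced here), and the macro averaging is an assumption.

```python
import numpy as np
from sklearn.metrics import log_loss, recall_score, f1_score

rng = np.random.default_rng(0)
n_classes = 9  # illustrative class count, not taken from the paper

y_true = rng.integers(0, n_classes, size=100)         # gold class indices
y_prob = rng.dirichlet(np.ones(n_classes), size=100)  # predicted class probabilities

# Multi-class logarithmic loss over the predicted probability rows.
print("log loss:", log_loss(y_true, y_prob, labels=list(range(n_classes))))

# Hard predictions for recall / F-measure.
y_pred = y_prob.argmax(axis=1)
print("macro recall:", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("macro F1:", f1_score(y_true, y_pred, average="macro", zero_division=0))
```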


2020 ◽  
Vol 66 (7) ◽  
pp. 4183-4202 ◽  
Author(s):  
Yigit Ugur ◽  
Inaki Estella Aguerri ◽  
Abdellatif Zaidi
Keyword(s):  

2020 ◽  
Author(s):  
Hang Qiu ◽  
Lin Luo ◽  
Ziqi Su ◽  
Li Zhou ◽  
Liya Wang ◽  
...  

Abstract
Background: Accumulating evidence has linked environmental exposures, such as ambient air pollution and meteorological factors, to the development and severity of cardiovascular diseases (CVDs), resulting in increased healthcare demand. Effective prediction of demand for healthcare services, particularly those associated with peak events of CVDs, can be useful in optimizing the allocation of medical resources. However, few studies have attempted to adopt machine learning approaches with excellent predictive abilities to forecast the healthcare demand for CVDs. This study aims to develop and compare several machine learning models for predicting the peak demand days of CVD admissions using hospital admissions data, air quality data and meteorological data in Chengdu, China from 2015 to 2017.
Methods: Six machine learning algorithms, including logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM), were applied to build the predictive models with a unique feature set. The area under the receiver operating characteristic curve (AUC), logarithmic loss function, accuracy, sensitivity, specificity, precision, and F1 score were used to evaluate the predictive performance of the six models.
Results: The LightGBM model exhibited the highest AUC (0.940, 95% CI: 0.900-0.980), which was significantly higher than that of LR (0.842, 95% CI: 0.783-0.901), SVM (0.834, 95% CI: 0.774-0.894) and ANN (0.890, 95% CI: 0.836-0.944), but did not differ significantly from that of RF (0.926, 95% CI: 0.879-0.974) and XGBoost (0.930, 95% CI: 0.878-0.982). In addition, LightGBM achieved the best logarithmic loss (0.218), accuracy (91.3%), specificity (94.1%), precision (0.695), and F1 score (0.725). Feature importance analysis indicated that meteorological conditions and air pollutants contributed 32% and 43% to the prediction, respectively.
Conclusion: This study suggests that ensemble learning models, especially the LightGBM model, can effectively predict peak events of CVD admissions, and could therefore be a very useful decision-making tool for medical resource management.
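A minimal sketch of this kind of model comparison follows, assuming the lightgbm package is installed. The features and the peak-day labels are synthetic stand-ins for the paper's air-quality and meteorological variables; only the evaluation pattern (AUC plus logarithmic loss on a held-out split) mirrors the study.

```python
import numpy as np
from lightgbm import LGBMClassifier  # assumes lightgbm is installed
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, log_loss
from sklearn.model_selection import train_test_split

# Placeholder features standing in for pollutant and weather variables;
# the paper's actual feature engineering is not reproduced here.
rng = np.random.default_rng(42)
X = rng.random((1000, 10))
y = (X[:, 0] + 0.3 * rng.standard_normal(1000) > 0.5).astype(int)  # fake peak-day label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

for name, model in [("LR", LogisticRegression(max_iter=1000)),
                    ("LightGBM", LGBMClassifier(n_estimators=200))]:
    prob = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    print(f"{name}: AUC={roc_auc_score(y_te, prob):.3f}, "
          f"log loss={log_loss(y_te, prob):.3f}")
```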


2020 ◽  
Vol 34 (04) ◽  
pp. 5511-5518
Author(s):  
Ashkan Rezaei ◽  
Rizal Fathony ◽  
Omid Memarrast ◽  
Brian Ziebart

Developing classification methods with high accuracy that also avoid unfair treatment of different groups has become increasingly important for data-driven decision making in social applications. Many existing methods enforce fairness constraints on a selected classifier (e.g., logistic regression) by directly forming constrained optimizations. We instead re-derive a new classifier from the first principles of distributional robustness that incorporates fairness criteria into a worst-case logarithmic loss minimization. This construction takes the form of a minimax game and produces a parametric exponential family conditional distribution that resembles truncated logistic regression. We present the theoretical benefits of our approach in terms of its convexity and asymptotic convergence. We then demonstrate the practical advantages of our approach on three benchmark fairness datasets.
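The worst-case objective described here has the following schematic shape (the notation is assumed, not copied from the paper): the predictor minimises logarithmic loss against an adversarial distribution constrained to be consistent with the training data and the chosen fairness criteria.

```latex
\min_{\hat{P}(\cdot \mid x)} \;
\max_{\check{P} \,\in\, \Xi \cap \mathcal{F}} \;
\mathbb{E}_{\check{P}}\!\left[ -\log \hat{P}(Y \mid X) \right]
```

Here $\Xi$ collects distributions matching the empirical feature statistics and $\mathcal{F}$ encodes the fairness constraints; solving this minimax game is what yields the parametric exponential family solution resembling truncated logistic regression mentioned in the abstract.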


Entropy ◽  
2020 ◽  
Vol 22 (2) ◽  
pp. 151 ◽  
Author(s):  
Abdellatif Zaidi ◽  
Iñaki Estella-Aguerri ◽  
Shlomo Shamai (Shitz)

This tutorial paper focuses on variants of the bottleneck problem from an information-theoretic perspective and discusses practical methods to solve it, as well as its connections to coding and learning. The intimate connections of this setting to remote source coding under the logarithmic loss distortion measure, information combining, common reconstruction, the Wyner–Ahlswede–Körner problem, the efficiency of investment information, as well as generalization, variational inference, representation learning, autoencoders, and others are highlighted. We discuss its extension to the distributed information bottleneck problem, with emphasis on the Gaussian model, and highlight the basic connections to uplink Cloud Radio Access Networks (CRAN) with oblivious processing. For this model, the optimal trade-offs between relevance (i.e., information) and complexity (i.e., rates) in the discrete and vector Gaussian frameworks are determined. In the concluding outlook, some interesting open problems are mentioned, such as the characterization of the optimal input ("feature") distributions under power limitations that maximize the "relevance" for the Gaussian information bottleneck under "complexity" constraints.
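The logarithmic-loss distortion measure referred to here is the standard one in which the reproduction $\hat{x}$ is itself a probability distribution over the source alphabet:

```latex
d(x, \hat{x}) = \log \frac{1}{\hat{x}(x)}
```

Choosing the posterior $\hat{x}(\cdot) = p_{X \mid U}(\cdot \mid u)$ minimises the expected distortion, which then equals the conditional entropy $H(X \mid U)$; this identity is what ties the bottleneck trade-off between relevance and complexity to remote source coding.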

