Measuring directed triadic closure with closure coefficients

2020 ◽  
Vol 8 (4) ◽  
pp. 551-573 ◽  
Author(s):  
Hao Yin ◽  
Austin R. Benson ◽  
Johan Ugander

Abstract Recent work studying triadic closure in undirected graphs has drawn attention to the distinction between measures that focus on the “center” node of a wedge (i.e., length-2 path) versus measures that focus on the “initiator,” a distinction with considerable consequences. Existing measures in directed graphs, meanwhile, have all been center-focused. In this work, we propose a family of eight directed closure coefficients that measure the frequency of triadic closure in directed graphs from the perspective of the node initiating closure. The eight coefficients correspond to different labeled wedges, where the initiator and center nodes are labeled, and we observe dramatic empirical variation in these coefficients on real-world networks, even when the induced directed triangles are isomorphic. To understand this phenomenon, we examine the theoretical behavior of our closure coefficients under a directed configuration model. Our analysis illustrates an underlying connection between the closure coefficients and moments of the joint in- and out-degree distributions of the network, offering an explanation of the observed asymmetries. We also use our directed closure coefficients as predictors in two machine learning tasks. We find interpretable models with AUC scores above 0.92 in class-balanced binary prediction, substantially outperforming models that use traditional center-focused measures.
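As a rough illustration of what an initiator-focused directed coefficient measures, the Python sketch below computes eight wedge-closure fractions for a single initiator node u from an edge list. It assumes that the eight labeled wedges arise from the direction of the initiator-center edge (u→v or v→u), the direction of the center-target edge (v→w or w→v), and the direction of the closing edge (u→w or w→u). The function name `directed_closure_coefficients` and the "out"/"in" labels are hypothetical, and the paper's exact definitions and normalization may differ.

```python
from collections import defaultdict

DIRECTIONS = ("out", "in")


def directed_closure_coefficients(edges, u):
    """Initiator-focused closure fractions for node u (illustrative sketch).

    Assumed labeling (hypothetical, may differ from the paper):
      uv_dir:    'out' if u->v, 'in' if v->u   (initiator-center edge)
      vw_dir:    'out' if v->w, 'in' if w->v   (center-target edge)
      close_dir: 'out' if u->w, 'in' if w->u   (closing edge)
    Returns {(uv_dir, vw_dir, close_dir): closed wedges / wedges}.
    Reciprocated edges are counted once in each direction.
    """
    out_nbrs = defaultdict(set)
    in_nbrs = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            out_nbrs[a].add(b)
            in_nbrs[b].add(a)

    first_hop = {"out": out_nbrs[u], "in": in_nbrs[u]}
    wedges = defaultdict(int)   # (uv_dir, vw_dir) -> number of wedges at u
    closed = defaultdict(int)   # (uv_dir, vw_dir, close_dir) -> closed wedges
    for uv_dir in DIRECTIONS:
        for v in first_hop[uv_dir]:
            second_hop = {"out": out_nbrs[v], "in": in_nbrs[v]}
            for vw_dir in DIRECTIONS:
                for w in second_hop[vw_dir]:
                    if w == u:
                        continue  # a wedge needs three distinct nodes
                    wedges[(uv_dir, vw_dir)] += 1
                    if w in out_nbrs[u]:
                        closed[(uv_dir, vw_dir, "out")] += 1
                    if w in in_nbrs[u]:
                        closed[(uv_dir, vw_dir, "in")] += 1

    # Fraction of each wedge type closed in each direction (0.0 if no wedges).
    return {
        (a, b, c): (closed[(a, b, c)] / wedges[(a, b)] if wedges[(a, b)] else 0.0)
        for a in DIRECTIONS
        for b in DIRECTIONS
        for c in DIRECTIONS
    }
```

For the single triangle 1→2, 2→3, 1→3, node 1 scores 1.0 on ("out", "out", "out") because its wedge 1→2→3 is closed by 1→3, whereas node 3, whose edges all point inward, scores 1.0 on ("in", "out", "in") and ("in", "in", "in") instead. The same triangle therefore feeds different coefficients depending on which node is treated as the initiator.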

Author(s):  
Gaël Aglin ◽  
Siegfried Nijssen ◽  
Pierre Schaus

Decision Trees (DTs) are widely used Machine Learning (ML) models with a broad range of applications. Interest in these models has increased even further in the context of Explainable AI (XAI), as decision trees of limited depth are highly interpretable. However, traditional algorithms for learning DTs are heuristic in nature; they may produce trees of suboptimal quality under depth constraints. We introduce PyDL8.5, a Python library to infer depth-constrained Optimal Decision Trees (ODTs). PyDL8.5 provides an interface for DL8.5, an efficient algorithm for inferring depth-constrained ODTs. The library provides an easy-to-use scikit-learn compatible interface. It can be used not only for classification tasks but also for regression, clustering, and other tasks. We introduce an interface that allows users to easily implement these other learning tasks. We provide a number of examples of how to use this library.
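To make "optimal under a depth constraint" concrete, the sketch below computes, by naive exhaustive recursion, the smallest number of training errors achievable by any binary decision tree of depth at most `max_depth` on 0/1 features. It is not DL8.5 and does not use the PyDL8.5 API; DL8.5 targets this kind of optimum with a far more efficient search. The function name `optimal_tree_error` is hypothetical.

```python
from functools import lru_cache


def optimal_tree_error(X, y, max_depth):
    """Fewest training errors of any binary decision tree with depth <= max_depth.

    X: non-empty list of tuples of 0/1 features; y: list of class labels.
    Naive exhaustive search memoized on (row subset, depth); this is NOT
    the DL8.5 algorithm, only the quantity such solvers optimize.
    """
    n_features = len(X[0])

    @lru_cache(maxsize=None)
    def best(rows, depth):
        labels = [y[i] for i in rows]
        # A leaf predicts the majority label; every other row is an error.
        leaf_errors = len(labels) - max(labels.count(c) for c in set(labels))
        if depth == 0 or leaf_errors == 0:
            return leaf_errors
        result = leaf_errors
        for f in range(n_features):
            left = tuple(i for i in rows if X[i][f] == 0)
            right = tuple(i for i in rows if X[i][f] == 1)
            if left and right:  # skip splits that separate nothing
                result = min(result, best(left, depth - 1) + best(right, depth - 1))
        return result

    return best(tuple(range(len(X))), max_depth)
```

On XOR data, `X = [(0, 0), (0, 1), (1, 0), (1, 1)]` with `y = [0, 1, 1, 0]`, no depth-1 tree does better than two errors, while the best depth-2 tree makes none. A learner that scores root splits only by their immediate error reduction sees no gain from either feature here, which illustrates why one-level-at-a-time heuristics can miss the optimum.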


Author(s):  
Angela Fan ◽  
Claire Gardent ◽  
Chloé Braud ◽  
Antoine Bordes

Various machine learning tasks can benefit from access to external information of different modalities, such as text and images. Recent work has focused on learning architectures with large memories capable of storing this knowledge. We propose augmenting generative Transformer neural networks with KNN-based Information Fetching (KIF) modules. Each KIF module learns a read operation to access fixed external knowledge. We apply these modules to generative dialog modeling, a challenging task where information must be flexibly retrieved and incorporated to maintain the topic and flow of conversation. We demonstrate the effectiveness of our approach by identifying relevant knowledge required for knowledgeable but engaging dialog from Wikipedia, images, and human-written dialog utterances, and show that leveraging this retrieved information improves model performance, measured by automatic and human evaluation.
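The core retrieval step of a KNN-based fetch can be sketched in a few lines: score every entry of a fixed memory against a query vector, keep the k best, and fuse their values with softmax weights. This is an illustrative assumption, not the paper's KIF module. In particular, the learned read operation that maps the Transformer's hidden state into the memory's embedding space is omitted, so the query is assumed to already live in that space. The function name `knn_fetch`, dot-product scoring, and softmax fusion are choices made for this sketch only.

```python
import math


def knn_fetch(query, memory_keys, memory_values, k=2):
    """Retrieve the k best-matching memory entries and fuse their values (sketch)."""
    # Dot-product relevance of every memory key to the query.
    scores = [sum(q * m for q, m in zip(query, key)) for key in memory_keys]
    # Indices of the k highest scores; sorted() is stable, so ties keep memory order.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the retained scores, shifted by the max for numerical stability.
    best = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - best) for i in top]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the retrieved value vectors.
    dim = len(memory_values[0])
    fused = [sum(w * memory_values[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return top, fused
```

The linear scan is only for clarity; a system fetching from a large source such as Wikipedia would typically replace it with an approximate nearest-neighbor index.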


2021 ◽  
Vol 52 (1) ◽  
pp. 9-11
Author(s):  
Sarvagya Upadhyay

Over the past two decades, machine learning has seen tremendous development in practice. Technological advancement and increased computational resources have made several learning algorithms highly useful in practice. Although many families of learning algorithms are heuristic in nature, their usefulness cannot be overstated. Empirical observations, coupled with an abundance of new datasets, have led to the development of novel algorithmic techniques that aim to accomplish a variety of learning tasks efficiently on real-world problems. But what makes these algorithms work on such problems? Producing correct solutions is clearly one aspect; efficiency is the other. Although many of these algorithms solve hard problems and cannot be theoretically efficient (under plausible complexity-theoretic assumptions), they nonetheless seem to work on real-world problems. This raises the question: are there conditions under which these algorithms become tractable? Answering this fundamental question sheds light on the power and limitations of these algorithmic techniques. This book focuses on different learning models and problems, and sets out to capture the assumptions that make certain algorithms tractable. The emphasis is on models and algorithmic techniques that make learning an efficient endeavor.


2021 ◽  
Vol 186 (Supplement_1) ◽  
pp. 445-451
Author(s):  
Yifei Sun ◽  
Navid Rashedi ◽  
Vikrant Vaze ◽  
Parikshit Shah ◽  
Ryan Halter ◽  
...  

ABSTRACT Introduction Early prediction of acute hypotensive episodes (AHEs) in critically ill patients has the potential to improve outcomes. In this study, we apply different machine learning algorithms to the MIMIC III Physionet dataset, which contains more than 60,000 real-world intensive care unit records, to test commonly used machine learning technologies and compare their performance. Materials and Methods Five classification methods, namely K-nearest neighbor, logistic regression, support vector machine, random forest, and a deep learning method called long short-term memory, are applied to predict an AHE 30 minutes in advance. We also compare model performance when including versus excluding invasive features. To further study the pattern of the underlying mean arterial pressure (MAP), we use linear regression to predict continuous MAP values over the next 60 minutes. Results The support vector machine yields the best performance in terms of recall (84%). Including the invasive features in the classification improves performance significantly, with both recall and precision increasing by more than 20 percentage points. We were able to predict the MAP 60 minutes in the future with a root mean square error (a frequently used measure of the differences between predicted and observed values) of 10 mmHg. After converting the continuous MAP predictions into binary AHE predictions, we achieve 91% recall and 68% precision. In addition to predicting AHE, the MAP predictions provide clinically useful information regarding the timing and severity of AHE occurrence. Conclusion We were able to predict AHE 30 minutes in advance with precision and recall above 80% on the large real-world dataset. The predictions of the regression model can provide a more fine-grained, interpretable signal to practitioners.
Model performance in predicting AHE improves when invasive features are included, compared with predicting AHE using only the restricted set of currently available noninvasive technologies. This demonstrates the importance of exploring more noninvasive technologies for AHE prediction.
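The step of converting continuous MAP forecasts into binary AHE predictions, and then scoring them, can be sketched as follows. The abstract does not state which AHE definition the study used; the sketch assumes a common convention in which a window counts as an AHE when at least 90% of its MAP values are at or below 60 mmHg. The function names are hypothetical. RMSE is included because the abstract reports the regression error in that metric.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def window_is_ahe(map_window, threshold=60.0, min_fraction=0.9):
    """Assumed AHE rule: at least `min_fraction` of MAP values (mmHg) are <= threshold."""
    low = sum(1 for m in map_window if m <= threshold)
    return low / len(map_window) >= min_fraction


def precision_recall(predicted, actual):
    """Precision and recall for parallel lists of booleans (True = AHE)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Thresholding a forecast rather than training a separate classifier is what lets the same regression output also convey how far below the threshold, and for how long, MAP is expected to fall.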


2021 ◽  
Vol 51 (3) ◽  
pp. 9-16
Author(s):  
José Suárez-Varela ◽  
Miquel Ferriol-Galmés ◽  
Albert López ◽  
Paul Almasan ◽  
Guillermo Bernárdez ◽  
...  

During the last decade, Machine Learning (ML) has become an increasingly hot topic in the field of Computer Networks and is expected to be gradually adopted for a plethora of control, monitoring and management tasks in real-world deployments. This creates a need for new generations of students, researchers and practitioners with a solid background in ML applied to networks. In 2020, the International Telecommunication Union (ITU) organized the "ITU AI/ML in 5G challenge", an open global competition that introduced a broad audience to some of the main current challenges in ML for networks. This large-scale initiative gathered 23 different challenges proposed by network operators, equipment manufacturers and academia, and attracted a total of 1300+ participants from 60+ countries. This paper narrates our experience organizing one of the proposed challenges: the "Graph Neural Networking Challenge 2020". We describe the problem presented to participants, the tools and resources provided, some organizational aspects and participation statistics, an outline of the top-3 awarded solutions, and a summary of lessons learned along the way. As a result, the challenge leaves a curated set of educational resources openly available to anyone interested in the topic.


Animals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1549
Author(s):  
Robert D. Chambers ◽  
Nathanael C. Yoder ◽  
Aletha B. Carson ◽  
Christian Junge ◽  
David E. Allen ◽  
...  

Collar-mounted canine activity monitors can use accelerometer data to estimate dog activity levels, step counts, and distance traveled. With recent advances in machine learning and embedded computing, much more nuanced and accurate behavior classification has become possible, giving these affordable consumer devices the potential to improve the efficiency and effectiveness of pet healthcare. Here, we describe a novel deep learning algorithm that classifies dog behavior at sub-second resolution using commercial pet activity monitors. We built machine learning training databases from more than 5000 videos of more than 2500 dogs and ran the algorithms in production on more than 11 million days of device data. We then surveyed project participants representing 10,550 dogs, which provided 163,110 event responses to validate real-world detection of eating and drinking behavior. The resulting algorithm achieved a sensitivity and specificity of 0.949 and 0.999, respectively, for detecting drinking behavior, and of 0.988 and 0.983 for eating behavior. We also demonstrated detection of licking (0.772, 0.990), petting (0.305, 0.991), rubbing (0.729, 0.996), scratching (0.870, 0.997), and sniffing (0.610, 0.968), with values given as (sensitivity, specificity). We show that the devices’ position on the collar had no measurable impact on performance. In production, users reported a true positive rate of 95.3% for eating (among 1514 users) and of 94.9% for drinking (among 1491 users). The study demonstrates the accurate detection of important health-related canine behaviors using a collar-mounted accelerometer. We trained and validated our algorithms on a large and realistic training dataset, and we assessed and confirmed accuracy in production via user validation.
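Each (sensitivity, specificity) pair above comes straight from confusion counts. The helper below is a minimal sketch; its name and the example counts are made up for illustration and are not taken from the study. With imbalanced data, for instance many non-event frames and few true events, specificity can sit close to 1.0 even when false alarms are common relative to true detections, so the two numbers are best read together.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Hypothetical counts: 90 of 100 true events detected,
# 20 false alarms among 10,000 non-event frames.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=9980, fp=20)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
# prints "sensitivity=0.900, specificity=0.998"
```

In this made-up example, specificity is 0.998 even though 20 of the 110 alarms (about 18%) are false, which is why event-level checks such as the user-reported true positive rates complement frame-level specificity.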


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Alan Brnabic ◽  
Lisa M. Hess

Abstract Background Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated into applications that inform patient-provider decision making. Methods This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented, and studies meeting the eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods, and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. Results A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. Diverse methods, statistical packages and approaches were used across the identified studies. The most common methods were decision tree and random forest approaches. Most studies applied internal validation, but only two conducted external validation. Most studies used a single algorithm; only eight applied multiple machine learning algorithms to the data. Seven items on the Luo checklist each went unmet in more than 50% of published studies. Conclusions A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in applying machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, that the model selection strategy is clearly defined, and that both internal and external validation are performed, so that decisions for patient care are made with the highest-quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.
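The recommendation to combine multiple algorithms can be illustrated with the simplest ensemble, a majority vote over the per-sample predictions of several already-trained models. This is only one ensemble scheme among many (weighted averaging, stacking, boosting), and the review does not prescribe it; the function name `majority_vote` is hypothetical. Ties go to the label that appears first among the models' votes, because `collections.Counter.most_common` orders equal counts by first insertion.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine equal-length prediction lists (one per model) by majority vote.

    Ties are broken in favor of the label voted first, following the
    insertion-order behavior of Counter.most_common for equal counts.
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined
```

For example, if three models predict `[0, 0]`, `[1, 0]` and `[1, 1]` for two patients, the ensemble returns `[1, 0]`. Any such ensemble still needs the internal and, ideally, external validation the review calls for, since combining models does not by itself guarantee better generalization.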

