ADGraph: Accurate, Distributed Training on Large Graphs

2021
Author(s): Lizhi Zhang, Zhiquan Lai, Feng Liu, Zhejiang Ran

Graph neural networks (GNNs) have emerged as powerful learning tools for recommendation systems, social networks and knowledge graphs. In these domains, graphs are so large that distributed training is required to train GNNs efficiently. Graph partition-based methods are widely adopted to scale graph training. However, most previous works focus on scalability rather than accuracy and have not been thoroughly evaluated on large-scale graphs. In this paper, we introduce ADGraph (accurate and distributed training on large graphs), exploring how to improve accuracy while preserving the scalability of large-scale graph training. Firstly, to keep complete neighbourhood information for the training nodes after graph partitioning, we assign the l-hop neighbours of the training nodes to the same partition. We also analyse the accuracy and runtime performance of graph training under different l-hop settings. Secondly, multi-layer neighbourhood sampling is performed on each partition, so that the generated mini-batches can accurately train the target nodes. We study the relationship between convergence accuracy and the number of sampled layers. We also find that partial neighbourhood sampling can achieve better performance than full neighbourhood sampling. Thirdly, to further overcome the generalization error caused by large-batch training, we reduce the batch size after graph partitioning and apply the linear scaling rule in distributed optimization. We evaluate ADGraph using GraphSage and GAT models on the ogbn-products and Reddit datasets on 32 GPUs. Experimental results show that ADGraph exceeds the benchmark accuracy of GraphSage and GAT while achieving a 24 to 29 times speedup on 32 GPUs.
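A minimal Python sketch of two ideas from the abstract: expanding each partition's training nodes with their l-hop neighbours (a multi-source breadth-first search), and the linear scaling rule for the learning rate. The adjacency-dict representation, the function names, and the numbers in the usage note are illustrative assumptions, not ADGraph's actual implementation; the partitioner that produces the initial training-node groups is not shown.

```python
from collections import deque


def l_hop_neighbourhood(adj, seeds, l):
    """Return every node within l hops of any seed node (seeds included).

    adj: dict mapping node id -> iterable of neighbour ids (assumed format).
    All seeds start at depth 0, so BFS visits each node at its shortest
    distance from the seed set.
    """
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == l:
            continue  # do not expand beyond l hops
        for nbr in adj.get(node, ()):
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, depth + 1))
    return visited


def build_partitions(adj, train_parts, l):
    """Expand each partition's training nodes with their l-hop neighbours.

    train_parts: list of sets of training node ids, one per partition
    (hypothetical input, e.g. from a METIS-style partitioner).
    Boundary nodes may end up replicated in several partitions.
    """
    return [l_hop_neighbourhood(adj, part, l) for part in train_parts]


def linear_scaled_lr(base_lr, base_batch, per_worker_batch, num_workers):
    """Linear scaling rule: scale the learning rate with the global batch size."""
    global_batch = per_worker_batch * num_workers
    return base_lr * global_batch / base_batch
```

For the path graph 0-1-2-3-4 with training nodes {0} and {4} in two partitions and l = 2, the partitions become {0, 1, 2} and {2, 3, 4}; node 2 is replicated, which is the memory cost of keeping full l-hop neighbourhoods. With a hypothetical base learning rate of 0.1 tuned for a batch of 256, 32 workers with 64 samples each give a global batch of 2048 and a scaled learning rate of 0.8.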

2020, Vol 34 (05), pp. 9596-9603
Author(s): Xuanyu Zhang

Question answering on complex tables is a challenging task for machines. In Spider, a large-scale complex table dataset, the relationships between tables and columns can easily be modeled as a graph. However, most graph neural networks (GNNs) ignore the relationships between sibling nodes and use summation as the aggregation function to model parent-child relationships. This may cause low-degree nodes, such as column nodes in a schema graph, to obtain little information. Context information is also important for natural language. To make fuller use of contextual information flow, we propose novel cross flow graph neural networks in this paper. The information flows of parent-child and sibling nodes cross with history states between different layers. In addition, we use a hierarchical encoding layer to obtain contextualized representations of tables. Experiments on Spider show that our approach achieves substantial performance improvements compared with previous GNN models and their variants.
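To make "parent-child" versus "sibling" relationships concrete, the sketch below builds both edge types for a toy database schema, treating tables as parents of their columns and columns of the same table as siblings. The schema, the `table.column` node naming, and the function name are assumptions for illustration; the cross flow network itself is not reproduced here.

```python
from itertools import combinations


def schema_graph_edges(schema):
    """Build parent-child and sibling edges for a database schema.

    schema: dict mapping table name -> list of column names (assumed format).
    Column nodes are named "table.column" so that identically named
    columns in different tables stay distinct.
    """
    parent_child = []
    sibling = []
    for table, columns in schema.items():
        col_nodes = [f"{table}.{c}" for c in columns]
        # Each table is the parent of each of its columns.
        parent_child.extend((table, col) for col in col_nodes)
        # Columns that share a parent table are siblings of one another.
        sibling.extend(combinations(col_nodes, 2))
    return parent_child, sibling
```

Under sum aggregation over parent-child edges alone, a column node receives a message only from its single parent table; adding sibling edges gives it one extra message per other column in the same table, which is one plausible reading of why sibling relationships add context for low-degree nodes.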


Author(s): Nesreen Ahmed, Sasikanth Avancha, Evangelos Georganas, Alexander Heinecke, Dhiraj D. Kalamkar, ...

2013, Vol 22 (04), pp. 1350020
Author(s): ANTONIO R. ANAYA, JESÚS G. BOTICARIO

Collaborative learning environments require intensive, regular and frequent analysis of the increasing amount of interaction data generated by students to assess whether collaborative learning is taking place. To support timely assessments that may benefit students and teachers, the method of analysis must provide meaningful evaluations while the interactions take place. This research proposes machine learning-based techniques to infer the relationship between student collaboration and several quantitative, domain-independent statistical indicators derived from a large-scale evaluation of student interactions. This paper (i) compares a set of metrics to identify those most suitable for assessing student collaboration, (ii) reports on students' evaluations of the metacognitive tools that display collaboration assessments from a new collaborative learning experience and (iii) extends previous findings to clarify modeling and usage issues. The advantages of the approach are: (1) it is based on domain-independent and generally observable features, (2) it provides regular and frequent data mining analysis with minimal teacher or student intervention, thereby supporting metacognition for the learners and corrective actions for the teachers, and (3) it can easily be transferred to other e-learning environments and includes transferability features intended to facilitate its use in other collaborative and social learning tools.


VASA, 2020, pp. 1-6
Author(s): Hanji Zhang, Dexin Yin, Yue Zhao, Yezhou Li, Dejiang Yao, ...

Summary: Our meta-analysis examined the relationship between homocysteine (Hcy) level and the incidence of aneurysms, as well as the relationship of smoking and hypertension with aneurysms. A systematic literature search of the PubMed, Web of Science, and Embase databases (up to March 31, 2020) identified 19 studies, including 2,629 aneurysm patients and 6,497 healthy participants. Pooled analysis of the included studies showed that the rates of smoking, hypertension and hyperhomocysteinemia (HHcy) were higher in aneurysm patients than in the control groups, and that total plasma Hcy levels were also higher in aneurysm patients. These findings suggest that smoking, hypertension and HHcy may be risk factors for the development and progression of aneurysms. Although heterogeneity across studies was significant, subgroup analysis suggested that it may stem from differences in race and aneurysm type. Large-scale randomized controlled studies restricted to a single race and a single aneurysm type are needed in the future to confirm the accuracy of these results.
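The abstract does not state which pooling model was used. As a hedged illustration of how heterogeneity is usually quantified, the sketch below computes a fixed-effect inverse-variance pooled estimate together with Cochran's Q and the I² statistic, I² = max(0, (Q - df)/Q). When heterogeneity is significant, as reported here, a random-effects model is commonly preferred; that refinement is not shown. Inputs are assumed to be on an additive scale such as log odds ratios, and the function name is hypothetical.

```python
import math


def inverse_variance_pool(effects, std_errors):
    """Fixed-effect inverse-variance pooled estimate with Cochran's Q and I^2.

    effects: per-study effect sizes on an additive scale (e.g. log odds ratios).
    std_errors: their standard errors.
    Returns (pooled estimate, its standard error, Q, I^2).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2
```

For two hypothetical studies with log odds ratios 0.0 and 2.0 and standard errors of 1.0, the pooled estimate is 1.0, Q = 2 on 1 degree of freedom, and I² = 0.5, meaning roughly half of the observed variation is attributed to between-study heterogeneity.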


2020, pp. 27-34
Author(s): Vladimir Batiuk

In this article, the "Cold War" is understood as a situation in which relations between the leading states are determined by ideological confrontation while, at the same time, the presence of nuclear weapons prevents this confrontation from developing into a large-scale armed conflict. Such a situation existed from 1945 to 1989, during the first Cold War. We see that something similar is recurring in our time, with all the new nuances in the ideological struggle and in the nuclear arms race.


2020
Author(s): Amir Karami, Brandon Bookstaver, Melissa Nolan

BACKGROUND The COVID-19 pandemic has affected nearly all aspects of life and has posed significant threats to international health and the economy. Given the rapidly unfolding nature of the current pandemic, there is an urgent need to streamline the synthesis of the growing body of scientific research in order to identify targeted solutions. While traditional systematic literature reviews provide valuable insights, they have limitations: they analyze a limited number of papers, are subject to various biases, are time-consuming and labor-intensive, focus on a few topics, cannot analyze trends, and lack data-driven tools. OBJECTIVE This study addresses these limitations by applying text mining methods to a corpus of COVID-19 research papers to analyze two biomedical concepts, clinical manifestations of disease and therapeutic chemical compounds, and to find associations between them. METHODS This research collected COVID-19 preprints and peer-reviewed research papers published in 2020. We used frequency analysis to find the most frequent manifestations and therapeutic chemicals, as an indicator of the importance of the two biomedical concepts. This study also applied topic modeling to find relationships between the two biomedical concepts. RESULTS We analyzed 9,298 research papers published through May 5, 2020 and found 3,645 disease-related and 2,434 chemical-related articles. The most frequent clinical manifestation terms included COVID-19, SARS, cancer, pneumonia, fever, and cough. The most frequent chemical-related terms included Lopinavir, Ritonavir, Oxygen, Chloroquine, Remdesivir, and water. Topic modeling yielded 25 categories showing relationships between our two overarching categories.
These categories represent statistically significant associations between multiple aspects of each category, some of which were novel and had not previously been identified by the scientific community. CONCLUSIONS Appreciating this context is vital given the lack of a systematic large-scale literature review survey and the importance of fast literature review for developing treatments during the current COVID-19 pandemic. This study can help researchers obtain a macro-level picture of the literature, educators understand the scope of the literature, journals explore the most discussed disease symptoms and pharmaceutical targets, and policymakers and funding agencies create scientific strategic plans regarding COVID-19.
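The frequency analysis described under METHODS could be approximated as below: given a hypothetical list of disease or chemical terms, count how many papers mention each one. The term list, the case-insensitive whole-word matching, and counting documents rather than raw occurrences are assumptions; the study's actual extraction pipeline and its topic model are not described in enough detail to reproduce.

```python
import re
from collections import Counter


def term_document_frequency(documents, terms):
    """Count how many documents mention each term.

    Matching is case-insensitive and on whole words; each document counts
    at most once per term (document frequency, not raw occurrences).
    """
    patterns = {
        t: re.compile(r"\b" + re.escape(t.lower()) + r"\b") for t in terms
    }
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        for term, pattern in patterns.items():
            if pattern.search(text):
                counts[term] += 1
    return counts
```

Calling `term_document_frequency(papers, ["fever", "cough", "Remdesivir"]).most_common(3)` on a list of paper texts would rank those terms by the number of papers that mention them.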


2019, Vol 22 (3), pp. 365-380
Author(s): Matthias Olthaar, Wilfred Dolfsma, Clemens Lutz, Florian Noseleit

In a competitive business environment at the Bottom of the Pyramid, smallholders supplying global value chains may be thought to be at the whim of downstream large-scale players and local market forces, leaving no room for strategic entrepreneurial behavior. In this context, we test the relationship between the use of strategic resources and firm performance. We adopt the resource-based theory and show that seemingly homogeneous smallholders deploy resources differently and, consequently, some do outperform others. We argue that the resource-based theory yields a more fine-grained understanding of smallholder performance than the approaches generally applied in agricultural economics. We develop a mixed-method approach that allows one to pinpoint relevant, industry-specific resources and to empirically identify the relative contribution of each resource to competitive advantage. The results show that the proper use of quality labor and storage facilities, the timing of sales, and the availability of animals are key capabilities.


Author(s): Richard Culliford, Alex J. Cornish, Philip J. Law, Susan M. Farrington, Kimmo Palin, ...

Background Epidemiological studies of the relationship of gallstone disease and circulating bilirubin levels with the risk of developing colorectal cancer (CRC) have been inconsistent. To address possible confounding and reverse causation, we examine the relationship between these potential risk factors and CRC using Mendelian randomisation (MR). Methods We used two-sample MR to examine the relationship of genetic liability to gallstone disease and of circulating bilirubin levels with CRC in 26,397 patients and 41,481 controls. We calculated the odds ratio for CRC per genetically predicted SD unit increase in log bilirubin levels (ORSD) and tested for a non-zero causal effect of gallstones on CRC. Sensitivity analyses were applied to identify violations of estimator assumptions. Results No association with CRC was found for either gallstone disease (P value = 0.60) or circulating levels of bilirubin (ORSD = 1.00, 95% confidence interval (CI) = 0.96–1.03, P value = 0.90). Conclusions Despite the large scale of this study, we found no evidence for a causal relationship between either circulating bilirubin levels or gallstone disease and the risk of developing CRC. While the magnitude of effect suggested by some observational studies can confidently be excluded, we cannot exclude the possibility of smaller effect sizes and non-linear relationships.
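The abstract reports an odds ratio per genetically predicted SD increase but does not name the estimator. A common primary estimator in two-sample MR is the fixed-effect inverse-variance weighted (IVW) estimate, sketched below under that assumption: beta_IVW = sum(bx * by / se^2) / sum(bx^2 / se^2), with standard error sqrt(1 / sum(bx^2 / se^2)), exponentiated to give an odds ratio with a normal-approximation 95% CI. The function name and variant inputs are hypothetical, and sensitivity estimators (e.g. MR-Egger, weighted median) are not shown.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) two-sample MR estimate.

    beta_exposure: per-variant effects on the exposure (e.g. SD of log bilirubin).
    beta_outcome: per-variant log odds ratios for the outcome (e.g. CRC).
    se_outcome: standard errors of beta_outcome.
    Returns (odds ratio per unit exposure, (lower 95% CI, upper 95% CI)).
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    beta = num / den
    se_beta = math.sqrt(1.0 / den)
    odds_ratio = math.exp(beta)
    ci = (math.exp(beta - 1.96 * se_beta), math.exp(beta + 1.96 * se_beta))
    return odds_ratio, ci
```

With a single variant (bx = 0.5, by = 0.1), the IVW estimate reduces to the Wald ratio by/bx = 0.2, i.e. an odds ratio of exp(0.2), about 1.22.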


2021, Vol 6 (1)
Author(s): Siddharth Arora, Alexandra Brintrup

The relationship between a firm and its supply chain has been well studied; however, the association between the position of firms in complex supply chain networks and their performance has not been adequately investigated. This is primarily due to the limited availability of empirical data on large-scale networks. To address this gap in the literature, we investigate the relationship between the embeddedness patterns of individual firms in a supply network and their performance, using empirical data from the automotive industry. In this study, we devise three measures that characterize the embeddedness of individual firms in a supply network, namely centrality, tier position, and triads. Our findings caution that centrality affects individual performance with diminishing returns. The second measure, tier position, allows us to investigate the concept of tiers in supply networks, as we find that as networks emerge, the boundaries between tiers become unclear. The performance of suppliers degrades as they move away from the focal firm (i.e., Toyota). The final measure, triads, investigates the effect of buying from and selling to firms that supply the same customer, capturing the level of competition and cooperation in a supplier's network. We find that increased coopetition (i.e., cooperative competition) enhances performance; however, the excessive complexity resulting from involvement in both upstream and downstream coopetition diminishes performance. These original insights help explain the drivers of firm performance from a network perspective and provide a basis for further research.
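Two of the three embeddedness measures can be sketched directly from a list of supplier-to-customer edges. Tier position is taken here as the shortest supply-path distance to the focal firm, and a coopetition triad as a partner the firm buys from or sells to that also supplies one of the firm's own customers. Both operationalizations, the function names, and the firms A, B, and C are assumptions for illustration; the paper's exact definitions may differ.

```python
from collections import deque


def tier_positions(edges, focal):
    """Tier of each firm = length of its shortest supply path to the focal firm.

    edges: iterable of (supplier, customer) pairs (assumed format).
    Tier 1 firms supply the focal firm directly, tier 2 firms supply a
    tier 1 firm, and so on. Firms with no path to the focal firm are omitted.
    """
    suppliers_of = {}
    for supplier, customer in edges:
        suppliers_of.setdefault(customer, set()).add(supplier)
    tiers = {focal: 0}
    queue = deque([focal])
    while queue:  # BFS upstream from the focal firm
        firm = queue.popleft()
        for supplier in suppliers_of.get(firm, ()):
            if supplier not in tiers:
                tiers[supplier] = tiers[firm] + 1
                queue.append(supplier)
    return tiers


def coopetition_triads(edges, firm):
    """Count (partner, customer) pairs where `firm` buys from or sells to
    `partner`, and both `firm` and `partner` supply `customer`."""
    edge_set = set(edges)
    customers = {c for s, c in edge_set if s == firm}
    suppliers = {s for s, c in edge_set if c == firm}
    partners = customers | suppliers
    count = 0
    for partner in partners:
        for customer in customers:
            if customer != partner and (partner, customer) in edge_set:
                count += 1
    return count
```

For the edges A to Toyota, B to Toyota, B to A, and C to B, firms A and B are tier 1 and C is tier 2; B has one coopetition triad, because it sells to A and both A and B supply Toyota.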

