ADGraph: Accurate, Distributed Training on Large Graphs

Graph neural networks (GNNs) have emerged as powerful learning tools for recommendation systems, social networks and knowledge graphs. In these domains the scale of graph data is immense, so distributed graph learning is required for efficient GNN training. Graph partition-based methods are widely adopted to scale graph training. However, most previous work focuses on scalability rather than accuracy and is not thoroughly evaluated on large-scale graphs. In this paper, we introduce ADGraph (accurate and distributed training on large graphs), exploring how to improve accuracy while keeping large-scale graph training scalable. First, to maintain complete neighbourhood information for the training nodes after graph partitioning, we assign the l-hop neighbours of the training nodes to the same partition. We also analyse the accuracy and runtime performance of graph training under different l-hop settings. Second, multi-layer neighbourhood sampling is performed on each partition, so that the generated mini-batches can accurately train the target nodes. We study the relationship between convergence accuracy and the number of sampled layers. We also find that partial neighbourhood sampling can achieve better performance than full neighbourhood sampling. Third, to further overcome the generalization error caused by large-batch training, we reduce the batch size after graph partitioning and apply the linear scaling rule in distributed optimization. We evaluate ADGraph using GraphSage and GAT models with the ogbn-products and Reddit datasets on 32 GPUs. Experimental results show that ADGraph achieves better accuracy than the GraphSage and GAT benchmarks while achieving a 24-29 times speedup on 32 GPUs.
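Two of the ideas named in this abstract can be illustrated with a minimal Python sketch. It is not the ADGraph implementation: the function names, the adjacency-dictionary graph format and the numeric values are all assumptions made for illustration. The first function collects every node within l hops of a set of training nodes, which is the set a partition would need to hold to keep those neighbourhoods complete. The second applies the linear scaling rule in its common form, where the learning rate grows in proportion to the global batch size.

```python
from collections import deque


def l_hop_closure(adj, seeds, hops):
    """Return the seeds plus every node within `hops` edges of any seed.

    `adj` is a hypothetical adjacency dict: node -> iterable of neighbours.
    """
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the requested radius
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


def linear_scaled_lr(base_lr, base_batch, per_worker_batch, num_workers):
    """Linear scaling rule: scale the learning rate with the global batch size."""
    global_batch = per_worker_batch * num_workers
    return base_lr * global_batch / base_batch


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 - 3 (illustrative data only).
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(sorted(l_hop_closure(path, [0], hops=2)))
    # Assumed values: base rate 0.01 tuned for batch 256, 32 workers x 64 samples.
    print(linear_scaled_lr(0.01, 256, 64, 32))
```

A larger `hops` value keeps more of each training node's neighbourhood inside the partition at the cost of more duplicated nodes across partitions, which is the accuracy and runtime trade-off the abstract says the authors analyse.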

CFGNN: Cross Flow Graph Neural Networks for Question Answering on Complex Tables

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6506 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9596-9603

Author(s):

Xuanyu Zhang

Keyword(s):

Neural Networks ◽

Large Scale ◽

Question Answering ◽

Cross Flow ◽

Context Information ◽

Flow Graph ◽

Graph Neural Networks ◽

Relationship Of ◽

The Relationship ◽

Parent Child

Question answering on complex tables is a challenging task for machines. In Spider, a large-scale complex-table dataset, relationships between tables and columns can easily be modeled as a graph. But most graph neural networks (GNNs) ignore the relationships between sibling nodes and use summation as the aggregation function to model the relationships between parent and child nodes. This may cause nodes with low degree, such as column nodes in the schema graph, to obtain little information. Context information is also important for natural language. To make fuller use of context information flow, we propose novel cross flow graph neural networks in this paper. The information flows of parent-child and sibling nodes cross with history states between different layers. In addition, we use a hierarchical encoding layer to obtain contextualized representations in tables. Experiments on Spider show that our approach achieves a substantial performance improvement over previous GNN models and their variants.

Implementation of the article "DistGNN: scalable distributed training for large-scale graph neural networks"

Artifact Digital Object Group ◽

10.1145/3476483 ◽

2021 ◽

Author(s):

Nesreen Ahmed ◽

Sasikanth Avancha ◽

Evangelos Georganas ◽

Alexander Heinecke ◽

Dhiraj D. Kalamkar ◽

...

Keyword(s):

Neural Networks ◽

Large Scale ◽

Distributed Training ◽

Graph Neural Networks

A DOMAIN-INDEPENDENT, TRANSFERABLE AND TIMELY ANALYSIS APPROACH TO ASSESS STUDENT COLLABORATION

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213013500206 ◽

2013 ◽

Vol 22 (04) ◽

pp. 1350020 ◽

Cited By ~ 1

Author(s):

ANTONIO R. ANAYA ◽

JESÚS G. BOTICARIO

Keyword(s):

Collaborative Learning ◽

Learning Environments ◽

Large Scale ◽

Learning Experience ◽

Learning Tools ◽

Student Interactions ◽

Student Collaboration ◽

E Learning ◽

The Relationship ◽

Domain Independent

Collaborative learning environments require intensive, regular and frequent analysis of the increasing amount of interaction data generated by students to assess whether collaborative learning takes place. To support timely assessments that may benefit students and teachers, the method of analysis must provide meaningful evaluations while the interactions take place. This research proposes machine learning-based techniques to infer the relationship between student collaboration and some quantitative domain-independent statistical indicators derived from large-scale evaluation analysis of student interactions. This paper (i) compares a set of metrics to identify the most suitable to assess student collaboration, (ii) reports on student evaluations of the metacognitive tools that display collaboration assessments from a new collaborative learning experience and (iii) extends previous findings to clarify modeling and usage issues. The advantages of the approach are: (1) it is based on domain-independent and generally observable features, (2) it provides regular and frequent data mining analysis with minimal teacher or student intervention, thereby supporting metacognition for the learners and corrective actions for the teachers, and (3) it can be easily transferred to other e-learning environments and includes transferability features that are intended to facilitate its usage in other collaborative and social learning tools.

Relationship between total plasma homocysteine and the risk of aneurysms – a meta-analysis

VASA ◽

10.1024/0301-1526/a000891 ◽

2020 ◽

pp. 1-6

Author(s):

Hanji Zhang ◽

Dexin Yin ◽

Yue Zhao ◽

Yezhou Li ◽

Dejiang Yao ◽

...

Keyword(s):

Large Scale ◽

Meta Analysis ◽

Single Species ◽

Total Plasma ◽

Control Groups ◽

Randomized Controlled ◽

Randomized Controlled Studies ◽

The Difference ◽

The Relationship ◽

Healthy Participants

Summary: Our meta-analysis focused on the relationship between homocysteine (Hcy) level and the incidence of aneurysms and also examined the relationship between smoking, hypertension and aneurysms. A systematic literature search of the Pubmed, Web of Science, and Embase databases (up to March 31, 2020) identified 19 studies, including 2,629 aneurysm patients and 6,497 healthy participants. Combined analysis of the included studies showed that smoking, hypertension and hyperhomocysteinemia (HHcy) were more common in aneurysm patients than in the control groups, and that the total plasma Hcy level in aneurysm patients was also higher. These findings suggest that smoking, hypertension and HHcy may be risk factors for the development and progression of aneurysms. Although the heterogeneity of the meta-analysis was significant, subgroup analysis indicated that it might come from differences in race and disease species. Large-scale randomized controlled studies of single species and single disease species are needed in the future to improve the accuracy of the results.

New "Cold War"

Diplomatic Service ◽

10.33920/vne-01-2001-04 ◽

2020 ◽

pp. 27-34

Author(s):

Vladimir Batiuk

Keyword(s):

Cold War ◽

Nuclear Weapons ◽

Armed Conflict ◽

Large Scale ◽

Arms Race ◽

Ideological Struggle ◽

Nuclear Arms Race ◽

The Cold War ◽

The Relationship ◽

Nuclear Arms

In this article, the "Cold War" is understood as a situation in which the relationship between the leading states is determined by ideological confrontation while, at the same time, the presence of nuclear weapons precludes this confrontation from developing into a large-scale armed conflict. Such a situation developed in 1945–1989, during the first Cold War. We see something similar repeated in our time, with all the new nuances in the ideological struggle and in the nuclear arms race.

Exploring the Relationship Between Distributed Training, Integrated Learning Environments, and Immersive Training Environments

10.21236/ada463181 ◽

2007 ◽

Author(s):

Adrienne Y. Lee

Keyword(s):

Learning Environments ◽

Integrated Learning ◽

Distributed Training ◽

The Relationship

Investigating Diseases and Chemicals in COVID-19 Literature with Text Mining (Preprint)

10.2196/preprints.21503 ◽

2020 ◽

Author(s):

Amir Karami ◽

Brandon Bookstaver ◽

Melissa Nolan

Keyword(s):

Text Mining ◽

Literature Review ◽

Topic Modeling ◽

Large Scale ◽

Clinical Manifestations ◽

International Health ◽

Research Papers ◽

Strategic Plans ◽

Funding Agencies ◽

The Relationship

BACKGROUND: The COVID-19 pandemic has impacted nearly all aspects of life and has posed significant threats to international health and the economy. Given the rapidly unfolding nature of the current pandemic, there is an urgent need to streamline literature synthesis of the growing scientific research to elucidate targeted solutions. While traditional systematic literature review studies provide valuable insights, these studies have restrictions, including analyzing a limited number of papers, having various biases, being time-consuming and labor-intensive, focusing on a few topics, being incapable of trend analysis, and lacking data-driven tools. OBJECTIVE: This study addresses these restrictions in the literature and practice by analyzing two biomedical concepts, clinical manifestations of disease and therapeutic chemical compounds, with text mining methods in a corpus containing COVID-19 research papers, and by finding associations between the two biomedical concepts. METHODS: This research collected papers representing COVID-19 pre-prints and peer-reviewed research published in 2020. We used frequency analysis to find highly frequent manifestations and therapeutic chemicals, representing the importance of the two biomedical concepts. This study also applied topic modeling to find the relationship between the two biomedical concepts. RESULTS: We analyzed 9,298 research papers published through May 5, 2020 and found 3,645 disease-related and 2,434 chemical-related articles. The most frequent clinical manifestations of disease terminology included COVID-19, SARS, cancer, pneumonia, fever, and cough. The most frequent chemical-related terminology included Lopinavir, Ritonavir, Oxygen, Chloroquine, Remdesivir, and water. Topic modeling provided 25 categories showing relationships between our two overarching categories. These categories represent statistically significant associations between multiple aspects of each category, some connections of which were novel and not previously identified by the scientific community. CONCLUSIONS: Appreciation of this context is vital due to the lack of a systematic large-scale literature review survey and the importance of fast literature review during the current COVID-19 pandemic for developing treatments. This study is beneficial to researchers for obtaining a macro-level picture of the literature, to educators for knowing the scope of the literature, to journals for exploring the most discussed disease symptoms and pharmaceutical targets, and to policymakers and funding agencies for creating scientific strategic plans regarding COVID-19.

Strategic resources and smallholder performance at the bottom of the pyramid

International Food and Agribusiness Management Review ◽

10.22434/ifamr2018.0111 ◽

2019 ◽

Vol 22 (3) ◽

pp. 365-380 ◽

Cited By ~ 1

Author(s):

Matthias Olthaar ◽

Wilfred Dolfsma ◽

Clemens Lutz ◽

Florian Noseleit

Keyword(s):

Large Scale ◽

Business Environment ◽

Agricultural Economics ◽

Local Market ◽

Bottom Of The Pyramid ◽

Fine Grained ◽

Strategic Resources ◽

Relative Contribution ◽

The Relationship ◽

Resource Based Theory

In a competitive business environment at the Bottom of the Pyramid, smallholders supplying global value chains may be thought to be at the whims of downstream large-scale players and local market forces, leaving no room for strategic entrepreneurial behavior. In such a context we test the relationship between the use of strategic resources and firm performance. We adopt the Resource Based Theory and show that seemingly homogeneous smallholders deploy resources differently and, consequently, that some outperform others. We argue that the ‘resource-based theory’ results in a more fine-grained understanding of smallholder performance than approaches generally applied in agricultural economics. We develop a mixed-method approach that allows one to pinpoint relevant, industry-specific resources and to identify empirically the relative contribution of each resource to competitive advantage. The results show that proper use of quality labor, storage facilities, time of selling, and availability of animals are key capabilities.

Lack of an association between gallstone disease and bilirubin levels with risk of colorectal cancer: a Mendelian randomisation analysis

British Journal of Cancer ◽

10.1038/s41416-020-01211-x ◽

2021 ◽

Author(s):

Richard Culliford ◽

Alex J. Cornish ◽

Philip J. Law ◽

Susan M. Farrington ◽

Kimmo Palin ◽

...

Keyword(s):

Colorectal Cancer ◽

Large Scale ◽

Causal Effect ◽

Gallstone Disease ◽

Epidemiological Studies ◽

Mendelian Randomisation ◽

Linear Relationships ◽

Potential Risk Factors ◽

The Relationship ◽

Circulating Levels

Background: Epidemiological studies of the relationship between gallstone disease and circulating levels of bilirubin with risk of developing colorectal cancer (CRC) have been inconsistent. To address possible confounding and reverse causation, we examine the relationship between these potential risk factors and CRC using Mendelian randomisation (MR). Methods: We used two-sample MR to examine the relationship between genetic liability to gallstone disease and circulating levels of bilirubin with CRC in 26,397 patients and 41,481 controls. We calculated the odds ratio per genetically predicted SD unit increase in log bilirubin levels (OR_SD) for CRC and tested for a non-zero causal effect of gallstones on CRC. Sensitivity analysis was applied to identify violations of estimator assumptions. Results: No association between either gallstone disease (P value = 0.60) or circulating levels of bilirubin (OR_SD = 1.00, 95% confidence interval (CI) = 0.96–1.03, P value = 0.90) with CRC was shown. Conclusions: Despite the large scale of this study, we found no evidence for a causal relationship between either circulating levels of bilirubin or gallstone disease with risk of developing CRC. While the magnitude of effect suggested by some observational studies can confidently be excluded, we cannot exclude the possibility of smaller effect sizes and non-linear relationships.

How does the position of firms in the supply chain affect their performance? An empirical study

Applied Network Science ◽

10.1007/s41109-021-00364-9 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Siddharth Arora ◽

Alexandra Brintrup

Keyword(s):

Supply Chain ◽

Empirical Data ◽

Large Scale ◽

Individual Performance ◽

Supply Network ◽

Supply Chain Networks ◽

Focal Firm ◽

Cooperative Competition ◽

Large Scale Networks ◽

The Relationship

The relationship between a firm and its supply chain has been well studied; however, the association between the position of firms in complex supply chain networks and their performance has not been adequately investigated. This is primarily due to insufficient availability of empirical data on large-scale networks. To address this gap in the literature, we investigate the relationship between embeddedness patterns of individual firms in a supply network and their performance using empirical data from the automotive industry. In this study, we devise three measures that characterize the embeddedness of individual firms in a supply network: centrality, tier position, and triads. Our findings caution that centrality impacts individual performance through a diminishing returns relationship. The second measure, tier position, allows us to investigate the concept of tiers in supply networks, because we find that as networks emerge, the boundaries between tiers become unclear. The performance of suppliers degrades as they move away from the focal firm (i.e., Toyota). The final measure, triads, investigates the effect of buying from and selling to firms that supply the same customer, portraying the level of competition and cooperation in a supplier’s network. We find that increased coopetition (i.e., cooperative competition) is a performance enhancer; however, excessive complexity resulting from being involved in both upstream and downstream coopetition results in diminishing performance. These original insights help explain the drivers of firm performance from a network perspective and provide a basis for further research.
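To make two of these measures concrete, the following Python sketch computes them on a toy supply network. The abstract does not give the authors' exact definitions, so this assumes the simplest readings: tier position as the fewest supply links separating a firm from the focal firm, and centrality as normalized degree. The function names, the dictionary format and the firm labels are hypothetical.

```python
from collections import deque


def tier_positions(suppliers_of, focal):
    """Breadth-first search upstream from the focal firm.

    `suppliers_of` maps a buyer to its direct suppliers (assumed format).
    A firm's tier is the fewest supply links between it and the focal firm.
    """
    tier = {focal: 0}
    queue = deque([focal])
    while queue:
        buyer = queue.popleft()
        for supplier in suppliers_of.get(buyer, ()):
            if supplier not in tier:
                tier[supplier] = tier[buyer] + 1
                queue.append(supplier)
    return tier


def degree_centrality(suppliers_of):
    """Distinct direct partners (buyers or suppliers) divided by n - 1."""
    partners = {}
    for buyer, suppliers in suppliers_of.items():
        for supplier in suppliers:
            partners.setdefault(buyer, set()).add(supplier)
            partners.setdefault(supplier, set()).add(buyer)
    n = len(partners)
    if n < 2:
        return {}
    return {firm: len(p) / (n - 1) for firm, p in partners.items()}


if __name__ == "__main__":
    # Illustrative network: the focal firm buys from A and B; A and B share supplier C.
    toy_network = {"Focal": ["A", "B"], "A": ["C"], "B": ["C", "D"]}
    print(tier_positions(toy_network, "Focal"))
    print(degree_centrality(toy_network))
```

In the toy network, supplier C serves both A and B, so it sits at tier 2 by either route. In a larger network a firm reachable through routes of different lengths has no single obvious tier, which is one way to read the abstract's remark that tier boundaries become unclear.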
