Large Scale Graph Mining with MapReduce

Author(s):  
Charalampos E. Tsourakakis

In this chapter, the authors present state-of-the-art work on large-scale graph mining using MapReduce. They survey research on an important graph mining problem: estimating the diameter of a graph and the eccentricities/radii of its vertices. The algorithm they present makes it possible to mine graphs with billions of edges and thus extract surprising patterns. The source code is publicly available at http://www.cs.cmu.edu/~pegasus/.
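The quantities the chapter estimates can be illustrated with a minimal exact computation. The sketch below computes every vertex's eccentricity (the largest shortest-path distance from that vertex) by plain BFS, from which the diameter (maximum eccentricity) and radius (minimum eccentricity) follow; this is for intuition only, since the chapter's MapReduce algorithm necessarily approximates these values to scale to billions of edges.

```python
from collections import deque

def eccentricities(adj):
    """Exact eccentricity of every vertex via BFS.

    Illustrative only: the chapter's MapReduce algorithm approximates
    these quantities at billion-edge scale, where all-pairs BFS is infeasible.
    """
    ecc = {}
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        ecc[src] = max(dist.values())
    return ecc

# A 4-cycle: every vertex sees its farthest vertex at distance 2.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
ecc = eccentricities(cycle)
diameter = max(ecc.values())  # 2
radius = min(ecc.values())    # 2
```

On the 4-cycle every eccentricity is 2, so diameter and radius coincide; on real graphs the gap between the two is one of the patterns such mining can surface.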

Author(s):  
Charalampos E. Tsourakakis

In this chapter, we present state-of-the-art work on large-scale graph mining using MapReduce. We survey research on an important graph mining problem: counting the number of triangles in large real-world networks. We present the most important applications of triangle counting and two families of algorithms, one spectral and one combinatorial, that solve the problem efficiently.
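Both algorithmic families the abstract mentions can be sketched in a few lines. The combinatorial route counts connected neighbor pairs at each vertex (each triangle is seen three times); the spectral route uses the identity that the number of triangles equals trace(A³)/6, i.e. the sum of cubed adjacency eigenvalues divided by six. This is a small illustrative sketch, not the chapter's MapReduce implementation.

```python
import numpy as np

def triangles_combinatorial(adj):
    """Node-iterator count: for each vertex, count neighbor pairs that are
    themselves connected; every triangle is counted once per vertex."""
    count = 0
    for u, nbrs in adj.items():
        nbrs = sorted(nbrs)
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                if nbrs[j] in adj[nbrs[i]]:
                    count += 1
    return count // 3

def triangles_spectral(A):
    """Spectral count: #triangles = trace(A^3) / 6 = sum_i lambda_i^3 / 6."""
    eigvals = np.linalg.eigvalsh(A)
    return round(float(np.sum(eigvals ** 3)) / 6)

# K4 (complete graph on 4 vertices) has C(4,3) = 4 triangles.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)
```

The spectral version is what makes large-scale approximation attractive: only the few largest-magnitude eigenvalues contribute meaningfully to the cube sum in many real networks.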


Polymers ◽  
2020 ◽  
Vol 12 (9) ◽  
pp. 2115
Author(s):  
Meghan E. Lamm ◽  
Lu Wang ◽  
Vidya Kishore ◽  
Halil Tekinalp ◽  
Vlastimil Kunc ◽  
...  

Wood and lignocellulosic-based material components are explored in this review as functional additives and reinforcements in composites for extrusion-based additive manufacturing (AM), or 3D printing. The motivation for using these sustainable alternatives in 3D printing includes enhancing the material properties of the resulting printed parts while providing a green alternative to carbon- or glass-filled polymer matrices, all at reduced material cost. Previous review articles on this topic have focused only on introducing the use of natural fillers in material extrusion AM and discussing their resulting material properties. This review not only discusses the present state of material extrusion AM using natural filler-based composites but also fills the knowledge gap regarding state-of-the-art applications of these materials. Emphasis is also placed on the challenges associated with 3D printing using these materials, including their use in large-scale manufacturing, while providing insight for overcoming these issues in the future.


Author(s):  
Xiaodong Gu ◽  
Hongyu Zhang ◽  
Dongmei Zhang ◽  
Sunghun Kim

Computer programs written in one language often need to be ported to other languages to support multiple devices and environments. When programs use language-specific APIs (Application Programming Interfaces), it is very challenging to migrate these APIs to the corresponding APIs in other languages. Existing approaches mine API mappings from projects that have corresponding versions in two languages. They rely on the sparse availability of bilingual projects and thus produce a limited number of API mappings. In this paper, we propose an intelligent system called DeepAM for automatically mining API mappings from a large-scale code corpus without bilingual projects. The key component of DeepAM is based on a multi-modal sequence-to-sequence learning architecture that learns joint semantic representations of bilingual API sequences from big source code data. Experimental results indicate that DeepAM significantly increases both the accuracy and the number of API mappings compared with state-of-the-art approaches.
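Once API sequences from two languages live in a joint semantic space, mapping reduces to nearest-neighbor search. The toy sketch below shows only that final matching step with hand-made vectors; in DeepAM the embeddings would be learned by the sequence-to-sequence model, and the API names here are merely illustrative examples.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest_api(query_vec, candidates):
    """Return the candidate API whose embedding is closest to the query."""
    return max(candidates, key=lambda name: cosine(query_vec, candidates[name]))

# Hypothetical joint embeddings; a real system learns these from code corpora.
java_api = {"BufferedReader.readLine": [0.9, 0.1, 0.0]}
csharp_apis = {
    "StreamReader.ReadLine": [0.88, 0.12, 0.05],
    "File.Delete":           [0.05, 0.90, 0.20],
}
match = nearest_api(java_api["BufferedReader.readLine"], csharp_apis)
```

Because semantically similar API sequences land near each other in the joint space, no bilingual project is needed at matching time, which is the point of the approach.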


Author(s):  
Khadija Slimani ◽  
Mohamed Kas ◽  
Youssef El Merabet ◽  
Yassine Ruichek ◽  
Rochdi Messoussi

Notwithstanding recent technological advancements, the identification of facial and emotional expressions is still one of the greatest challenges scientists have ever faced. Generally, the human face is identified as a composition of textures arranged in micro-patterns. There has been a tremendous increase in the use of local binary pattern (LBP)-based texture algorithms, which have proved essential for a variety of tasks and for extracting essential attributes from an image. Over the years, many LBP variants have been reviewed in the literature. What is still missing, however, is a thorough and comprehensive analysis of their individual performance. This research work aims to fill this gap by performing a large-scale performance evaluation of 46 recent state-of-the-art LBP variants for facial expression recognition. Extensive experimental results on the well-known and challenging benchmark KDEF, JAFFE, CK and MUG databases, taken under different facial expression conditions, indicate that a number of the evaluated state-of-the-art LBP-like methods achieve promising results that are better than or competitive with several recent state-of-the-art facial recognition systems. Recognition rates of 100%, 98.57%, 95.92% and 100% have been reached for the CK, JAFFE, KDEF and MUG databases, respectively.
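The common core of all LBP variants is the same basic operator: threshold each pixel's eight neighbors against the center and read the resulting bits as an 8-bit code, whose histogram over the image becomes the texture descriptor. A minimal sketch of that operator on one 3×3 patch, assuming the classic clockwise-from-top-left bit ordering (variants differ exactly in such choices):

```python
def lbp_code(patch):
    """Basic 3x3 LBP: threshold the 8 neighbors against the center pixel
    and read the resulting bits clockwise from the top-left as one byte."""
    center = patch[1][1]
    # Clockwise neighbor coordinates, starting at the top-left corner.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (i, j) in enumerate(order):
        if patch[i][j] >= center:  # neighbor >= center -> bit set
            code |= 1 << (7 - bit)
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch)  # bits 1 0 0 0 1 1 1 1 -> 143
```

The 46 variants the paper evaluates keep this thresholding idea but vary the neighborhood geometry, the bit encoding, or the pooling of codes into histograms.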


2021 ◽  
Vol 9 ◽  
pp. 176-194
Author(s):  
Xiaozhi Wang ◽  
Tianyu Gao ◽  
Zhaocheng Zhu ◽  
Zhengyan Zhang ◽  
Zhiyuan Liu ◽  
...  

Abstract Pre-trained language representation models (PLMs) cannot capture factual knowledge from text well. In contrast, knowledge embedding (KE) methods can effectively represent the relational facts in knowledge graphs (KGs) with informative entity embeddings, but conventional KE models cannot take full advantage of the abundant textual information. In this paper, we propose a unified model for Knowledge Embedding and Pre-trained LanguagE Representation (KEPLER), which can not only better integrate factual knowledge into PLMs but also produce effective text-enhanced KE with strong PLMs. In KEPLER, we encode textual entity descriptions with a PLM as their embeddings and then jointly optimize the KE and language modeling objectives. Experimental results show that KEPLER achieves state-of-the-art performance on various NLP tasks and also works remarkably well as an inductive KE model on KG link prediction. Furthermore, for pre-training and evaluating KEPLER, we construct Wikidata5M, a large-scale KG dataset with aligned entity descriptions, and benchmark state-of-the-art KE methods on it. It shall serve as a new KE benchmark and facilitate research on large KGs, inductive KE, and KGs with text. The source code can be obtained from https://github.com/THU-KEG/KEPLER.
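The KE side of such a joint objective can be illustrated with a translation-style scoring function: a fact (h, r, t) is plausible when the head embedding translated by the relation embedding lands near the tail. The sketch below shows that score with toy vectors; in KEPLER the entity embeddings h and t would come from a PLM encoding of the entities' textual descriptions (which is what makes the model inductive) rather than from a lookup table, and the exact scoring function is an assumption here.

```python
import math

def translation_score(h, r, t):
    """TransE-style plausibility: (h, r, t) is plausible when h + r is close
    to t, i.e. when the L2 distance ||h + r - t|| is small (0 is a perfect fit)."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Toy 2-d embeddings standing in for PLM-encoded description vectors.
h = [1.0, 0.0]        # head entity
r = [0.0, 1.0]        # relation
true_t = [1.0, 1.0]   # correct tail: h + r exactly
wrong_t = [5.0, -3.0] # corrupted tail used as a negative sample

good = translation_score(h, r, true_t)
bad = translation_score(h, r, wrong_t)
```

Training pushes `good` below `bad` by a margin while the same encoder is simultaneously optimized for the language-modeling objective, which is how factual and textual signals end up in one representation.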


Author(s):  
Xing Hu ◽  
Ge Li ◽  
Xin Xia ◽  
David Lo ◽  
Shuai Lu ◽  
...  

Code summarization, which aims to generate succinct natural language descriptions of source code, is extremely useful for code search and code comprehension, and it has played an important role in software maintenance and evolution. Previous approaches generate summaries by retrieving summaries from similar code snippets. However, these approaches rely heavily on whether similar code snippets can be retrieved and on how similar the snippets are, and they fail to capture the API knowledge in the source code, which carries vital information about its functionality. In this paper, we propose a novel approach, named TL-CodeSum, which successfully transfers API knowledge learned in a different but related task to code summarization. Experiments on large-scale real-world industrial Java projects indicate that our approach is effective and outperforms the state-of-the-art in code summarization.
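A prerequisite for exploiting API knowledge is extracting an API-call sequence from the source code in the first place. The sketch below does this crudely with a regular expression over Java text; it is a toy approximation for illustration only (real pipelines parse the code and resolve types), and the helper name and regex are assumptions, not the paper's implementation.

```python
import re

def api_sequence(java_src):
    """Extract a rough API-call sequence (receiver.method pairs) from Java
    source with a regex. Toy approximation: a real pipeline would parse the
    AST and resolve receiver types instead of matching raw text."""
    return re.findall(r"(\w+)\s*\.\s*(\w+)\s*\(", java_src)

code = """
BufferedReader br = new BufferedReader(new FileReader(path));
String line = br.readLine();
Collections.sort(items);
"""
seq = [f"{recv}.{method}" for recv, method in api_sequence(code)]
```

Sequences like these, paired with their documentation, form the related task whose learned representations TL-CodeSum transfers into summary generation.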


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5073
Author(s):  
Khalil Khan ◽  
Waleed Albattah ◽  
Rehan Ullah Khan ◽  
Ali Mustafa Qamar ◽  
Durre Nayab

Real-time crowd analysis is an active area of research within the computer vision community in general and scene analysis in particular. Over the last 10 years, various methods for crowd management in real-time scenarios have received immense attention due to large-scale applications in people counting, public event management, disaster management, safety monitoring, and so on. Although many sophisticated algorithms have been developed to address the task, crowd management in real-time conditions is still a challenging problem that is far from completely solved, particularly in wild and unconstrained conditions. In this paper, we present a detailed review of crowd analysis and management, focusing on state-of-the-art methods for both controlled and unconstrained conditions. The paper illustrates both the advantages and disadvantages of state-of-the-art methods. The methods presented range from the seminal research works on crowd management and monitoring to the state-of-the-art methods based on recently introduced deep learning. A comparison of previous methods is presented, with a detailed discussion of directions for future research. We believe this review article will contribute to various application domains and will also augment the knowledge of crowd analysis within the research community.


Author(s):  
Chunlei Li ◽  
Chris McMahon ◽  
Linda Newnes

In many engineering fields, a great deal of development is based on information processing, in particular the storing, retrieving, interpretation, and re-use of existing data. To be more competitive, large-scale enterprises widely deploy fast-developing Product Lifecycle Management (PLM) systems. To improve the efficiency of data management and communication, annotation technology is considered a promising approach for aiding collaboration between design teams in concurrent design and for serving various needs across the entire product lifecycle. In this paper, a classification of approaches to annotation, based on an investigation of the state of the art, is presented. Cases are used to illustrate how these approaches aid different phases of the product lifecycle. Finally, future challenges in the use of annotation in engineering are discussed. Through this research, the contribution of annotation is demonstrated, and further research work is proposed based on the findings.


1984 ◽  
Vol 29 (4) ◽  
pp. 344-346
Author(s):  
Peter A. Magaro