Combining Apache Spark & OrientDb to Find the Influence of a Scientific Paper in a Citation Network

In our study, we examine the impact of citation network structures on the ability to discern valuable research topics in Computer Science literature. We use the bibliographic information available in the DBLP database to extract candidate phrases from scientific paper abstracts. We then construct citation networks based on direct citation, co-citation and bibliographic coupling relationships between the papers. The candidate research topics, in the form of keyphrases and n-grams, are subsequently ranked and filtered by a graph-text ranking algorithm. The highest-ranked candidate topics are further evaluated by domain experts and against the Wikipedia knowledge base. The results obtained from these citation networks are complementary: some pairs of networks return valid but non-overlapping output phrases. In particular, bibliographic coupling appears to capture more unique information than either direct citation or co-citation. These findings point towards the possible added value of combining bibliographic coupling analysis with other structures. At the same time, the value of combining direct citation and co-citation is called into question. We expect our findings to be utilised in method design for research topic identification.
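
The three network types named in the abstract can all be derived from one list of direct citations. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `derive_networks`, the paper identifiers and the toy edge list are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def derive_networks(citations):
    """Derive co-citation and bibliographic coupling counts.

    citations: iterable of (citing, cited) pairs, i.e. direct citation edges.
    Returns two dicts mapping a sorted pair of papers to a link weight.
    """
    refs = defaultdict(set)      # paper -> set of papers it cites
    cited_by = defaultdict(set)  # paper -> set of papers that cite it
    for citing, cited in citations:
        refs[citing].add(cited)
        cited_by[cited].add(citing)

    # Co-citation: two papers are linked each time one paper cites both.
    co_citation = defaultdict(int)
    for cited_set in refs.values():
        for a, b in combinations(sorted(cited_set), 2):
            co_citation[(a, b)] += 1

    # Bibliographic coupling: two papers are linked each time they
    # share a reference.
    coupling = defaultdict(int)
    for citing_set in cited_by.values():
        for a, b in combinations(sorted(citing_set), 2):
            coupling[(a, b)] += 1

    return dict(co_citation), dict(coupling)


# Hypothetical toy data: A and B both cite C and D; E cites only C.
edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("E", "C")]
co, bc = derive_networks(edges)
print(co)
print(bc)
```

In this toy example C and D are co-cited twice, while A and B are coupled through two shared references, which shows how the two derived networks connect different pairs of papers from the same citation data.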

Influence Visualization of Scientific Paper through Flow-Based Citation Network Summarization

2015 IEEE International Conference on Data Mining Workshop (ICDMW) ◽

10.1109/icdmw.2015.105 ◽

2015 ◽

Author(s):

Yue Su ◽

Sibai Sun ◽

Yuan Xuan ◽

Lei Shi

Keyword(s):

Citation Network ◽

Scientific Paper ◽

Through Flow

Fusion of text and graph information for machine learning problems on networks

PeerJ Computer Science ◽

10.7717/peerj-cs.526 ◽

2021 ◽

Vol 7 ◽

pp. e526

Author(s):

Ilya Makarov ◽

Mikhail Makarov ◽

Dmitrii Kiselev

Keyword(s):

Machine Learning ◽

Link Prediction ◽

Citation Network ◽

Representation Learning ◽

Scientific Paper ◽

Learning Problems ◽

Network Embedding ◽

Network Properties ◽

Text Information ◽

Low Dimensional

Network representation learning, a technique that maps the nodes of a network to vectors in a low-dimensional embedding space, is attracting increasing attention. A network embedding constructed this way aims to preserve node similarity and other specific network properties. Embedding vectors can later be used for downstream machine learning problems, such as node classification, link prediction and network visualization. Naturally, some networks have text information associated with them. For instance, in a citation network, each node is a scientific paper associated with its abstract or title; in a social network, users may be viewed as nodes and each user's posts as textual attributes. In this work, we explore how combining existing methods of text and network embeddings can increase accuracy for downstream tasks and propose modifications to popular architectures to better capture textual information in network embedding and fusion frameworks.
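
One simple way to combine text and network embeddings is to normalize each vector and concatenate them per node. The sketch below illustrates only that baseline idea; it is not the architecture proposed in the paper, and the helper names (`l2_normalize`, `fuse`, `cosine`) and the toy vectors are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(graph_emb, text_emb):
    """Concatenate the normalized graph and text vectors of every node."""
    return {
        node: l2_normalize(graph_emb[node]) + l2_normalize(text_emb[node])
        for node in graph_emb
    }


def cosine(u, v):
    """Cosine similarity, usable as a naive link prediction score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings for two papers in a citation network.
graph_emb = {"p1": [3.0, 4.0], "p2": [0.0, 2.0]}
text_emb = {"p1": [1.0, 0.0, 0.0], "p2": [0.0, 5.0, 0.0]}

fused = fuse(graph_emb, text_emb)
score = cosine(fused["p1"], fused["p2"])
print(len(fused["p1"]), score)
```

Normalizing before concatenation keeps either modality from dominating the similarity score purely because of its scale.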

Supplemental Material for The Psychology of Men and Masculinities: Using Citation Network Analysis to Understand Research Domains, Collaborations, and Grant Competitiveness

Psychology of Men & Masculinity ◽

10.1037/men0000139.supp ◽

2017 ◽

Keyword(s):

Network Analysis ◽

Citation Network ◽

Citation Network Analysis ◽

Psychology Of Men ◽

Men And Masculinities ◽

Research Domains

Exploring key transformism properties of patent citation network: the base of hybrid rice

Advances in Industrial Engineering, Information and Water Resources ◽

10.2495/aie120261 ◽

2012 ◽

Author(s):

Tianhua Song ◽

Guang Yu ◽

Chunsheng Shi ◽

Hui Zhou ◽

Bo Zou

Keyword(s):

Hybrid Rice ◽

Citation Network ◽

Patent Citation ◽

Patent Citation Network

Analysis of Retail Data using Apache Spark

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i5.11621165 ◽

2019 ◽

Vol 7 (5) ◽

pp. 1162-1165

Author(s):

Himani Agnihotri ◽

Bharti Nagpal

Keyword(s):

Apache Spark

DISTRIBUTED PROCESSING OF LARGE VOLUMES OF TRANSACTIONAL DATA

Naukovyi visnyk Donetskoho natsionalnoho tekhnichnoho universytetu ◽

10.31474/2415-7902-2020-1(4)-2(5)-27-36 ◽

2020 ◽

pp. 27-36

Author(s):

O. Dmytriieva ◽

D. Nikulin

Keyword(s):

Distributed Processing ◽

Apache Spark ◽

Hadoop Mapreduce ◽

Transactional Data

This work addresses distributed transaction processing in the analysis of large volumes of data to find association rules. Based on AIS and Apriori, two well-known data mining algorithms for finding frequent itemsets, possible parallelization options were identified that avoid iterative scanning of the database and high memory consumption. The possibility of moving the computations to different platforms that support parallel data processing was investigated. The chosen computing platforms were MapReduce, a powerful framework for processing large, distributed data sets on a Hadoop cluster, and Apache Spark, a software tool for processing extremely large amounts of data. A comparative analysis of the performance of the considered methods was carried out, recommendations for the effective use of parallel computing platforms were obtained, and modifications of association rule mining algorithms were proposed. The main tasks carried out in the work were: studying modern tools for distributed processing of structured and unstructured data; deploying a test cluster in a cloud service; developing scripts to automate cluster deployment; modifying distributed algorithms to adapt them to the required distributed computing frameworks; measuring data processing performance in sequential and distributed modes using Hadoop MapReduce and Apache Spark; comparing the results of the performance test measurements; obtaining and justifying the relationship between the amount of data processed and the processing time; optimizing distributed association rule mining algorithms for processing large volumes of transactional data; and measuring the performance of distributed processing with existing software tools. Keywords: distributed processing, transactional data, association rules, computing cluster, Hadoop, MapReduce, Apache Spark
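
For orientation, the classic sequential Apriori search for frequent itemsets that this abstract takes as its starting point can be sketched as follows. This is a plain single-machine baseline and does not reproduce the paper's parallel modifications for Hadoop MapReduce or Apache Spark; the function name `apriori`, the `min_support` parameter and the toy baskets are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k:
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in current
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Count step: one scan of all transactions per level. This repeated
        # scanning is the cost that parallel variants try to avoid.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=2)
for itemset, support in result.items():
    print(sorted(itemset), support)
```

Each pass of the `while` loop rescans every transaction, which is why the level-wise scheme becomes expensive on large transactional data sets.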

Who Talks and Who Listens? A Bibliometric Analysis of Economics Journals Co-Citation Network

SSRN Electronic Journal ◽

10.2139/ssrn.2885126 ◽

2016 ◽

Author(s):

Tianhao Wu

Keyword(s):

Bibliometric Analysis ◽

Citation Network ◽

Economics Journals

A Review of Minimum Quantity Lubrication (MQL) based on Bibliometry

Current Materials Science ◽

10.2174/2666145413999201222104811 ◽

2020 ◽

Vol 13 ◽

Author(s):

Gaurav Gaurav ◽

Abhay Sharma ◽

G S Dangayach ◽

M L Meena

Keyword(s):

Descriptive Analysis ◽

Minimum Quantity Lubrication ◽

Citation Network ◽

Machining Process ◽

Cutting Fluid ◽

Future Research ◽

Research Directions ◽

Minimum Quantity ◽

Research Areas ◽

Future Research Directions

Background: Minimum quantity lubrication (MQL) is one of the most promising machining techniques; it can reduce cutting fluid consumption by more than 90% while maintaining surface quality and tool life. The significance of MQL in machining makes it imperative to consolidate and analyse the current direction and status of MQL research. Objective: This study aims to assess global research publication trends and hot topics in the field of MQL among machining processes. Bibliometric and descriptive analyses are used to analyse the related literature collected from the Scopus database. Methods: Various performance parameters are extracted, such as document types and languages of publication, annual scientific production, total documents, total citations, and citations per article. The top 20 most relevant and productive sources, authors, affiliations and countries, together with the word cloud and word dynamics, are assessed. The bibliometric data are visualised as bibliographic coupling, citation, and co-citation networks. Results: The investigation reveals that the International Journal of Machine Tools and Manufacture (2611 citations, h-index of 31) is the most productive journal that publishes on MQL. The most productive institution is the University of Michigan (32 publications), the most cited country is Germany (1879 citations), and the most productive country in MQL is China (124 publications). The study shows that ‘Cryogenic Machining’, ‘Sustainable Machining’, ‘Sustainability’, ‘Nanofluid’ and ‘Titanium alloy’ are the most recent keywords and indicate the hot topics and future research directions in the MQL field. Conclusion: The analysis finds that MQL research is growing in publications and is emerging with issues that are strongly associated with the research. This study is expected to help researchers find the most current research areas through the authors' keywords and future research directions in MQL, and thereby expand their research interests.

Increase the Performance of K-Means Clustering Algorithm Using Apache Spark

The International Journal of Internet of Things and its Applications ◽

10.21742/ijiota.2017.1.1.02 ◽

2017 ◽

Vol 1 (1) ◽

pp. 13-28 ◽

Cited By ~ 1

Author(s):

Chang Xie

Keyword(s):

Clustering Algorithm ◽

Apache Spark
