Scalable Recommendation Using Large Scale Graph Partitioning With Pregel and Giraph

Social Big Data is generated by interactions of connected users on social network. Sharing of opinions and contents amongst users, reviews of users for products, result in social Big Data. If any user intends to select products such as movies, books, etc., from e-commerce sites or view any topic or opinion on social networking sites, there are a lot of options and these options result in information overload. Social recommendation systems assist users to make better selection as per their likings. Recent research works have improved recommendation systems by using matrix factorization, social regularization or social trust inference. Furthermore, these improved systems are able to alleviate cold start and sparsity, but not efficient for scalability. The main focus of this article is to improve scalability in terms of locality and throughput and provides better recommendations to users with large-scale data in less response time. In this article, the social big graph is partitioned and distributed on different nodes based on Pregel and Giraph. In the proposed approach ScaleRec, partitioning is based on direct as well as indirect trust between users and comparison with state-of-the-art approaches proves that statistically better partitioning quality is achieved using proposed approach. In ScaleRec, hyperedge and transitive closure are used to enhance social trust amongst users. Experiment analysis on standard datasets such as Epinions and LiveJournal proves that better locality and recommendation accuracy is achieved by using ScaleRec.

Download Full-text

The Recommended Online Social Networks (OSN) and Big Data System Using Large Scale Graph Partitioning with Mahout and PowerGraph

10.21203/rs.3.rs-45888/v1 ◽

2020 ◽

Author(s):

Ford Lumban Gaol ◽

Tokuro Matsuo

Keyword(s):

Big Data ◽

Online Social Networks ◽

Social Networking Sites ◽

Large Scale ◽

Information Overload ◽

Social Trust ◽

Recommendation Systems ◽

Large Scale Data ◽

Social Big Data ◽

Recommendation Accuracy

Abstract Introduction : Social Big data is generated by interactions of connected users on social network. Sharing of opinions and contents amongst users, reviews of users for products, result in Social Big data. If any user intends to select product such as movies, books etc. from e-commerce sites or view any topic or opinion on social networking sites, there are a lot of options and these options result in information overload. Case Description : Social recommendation systems assist users to make better selection as per their likings. Recent research works have improved recommendation systems by using matrix factorization, social regularization or social trust inference. Furthermore, these improved systems are able to alleviate cold start and sparsity but not efficient for scalability. Discussion and Evaluation: The main focus of this paper is to improve scalability and provide better recommendations to users with large-scale data in less response time. We have partitioned social big graph and distributed it on different nodes based on Mahout and PowerGraph like system. Conclusion : In our approach, partitioning is based on direct as well as indirect trust between users and comparison with state-of-the-art approaches proves that statistically better partitioning quality is achieved using our approach. In our proposed approach ScaleRec, hyperedge and transitive closure are used to enhance social trust amongst users. Experiment analysis on standard datasets proves that better locality and recommendation accuracy is achieved by using our proposed approach.

Download Full-text

Microcomputers in Political Science

News for Teachers of Political Science ◽

10.1017/s0197901900005079 ◽

1983 ◽

Vol 38 ◽

pp. 1-9

Author(s):

Herbert F. Weisberg

Keyword(s):

Data Analysis ◽

Political Science ◽

Large Scale ◽

Turnaround Time ◽

General Purpose ◽

Batch Mode ◽

New Era ◽

Large Scale Data ◽

The Social ◽

Frequency Counts

We are now entering a new era of computing in political science. The first era was marked by punched-card technology. Initially, the most sophisticated analyses possible were frequency counts and tables produced on a counter-sorter, a machine that specialized in chewing up data cards. By the early 1960s, batch processing on large mainframe computers became the predominant mode of data analysis, with turnaround time of up to a week. By the late 1960s, turnaround time was cut down to a matter of a few minutes and OSIRIS and then SPSS (and more recently SAS) were developed as general-purpose data analysis packages for the social sciences. Even today, use of these packages in batch mode remains one of the most efficient means of processing large-scale data analysis.

Download Full-text

The Perceived Impact of Social Networking Sites and Apps on the Social Capital of Saudi Postgraduate Students: A Case Study

Future Internet ◽

10.3390/fi13010020 ◽

2021 ◽

Vol 13 (1) ◽

pp. 20

Author(s):

Abdulelah A. Alghamdi ◽

Margaret Plunkett

Keyword(s):

Social Capital ◽

Life Satisfaction ◽

Social Networking ◽

Social Networking Sites ◽

Civic Participation ◽

Social Trust ◽

Political Engagement ◽

Postgraduate Students ◽

The Social ◽

The Impact

With the increased use of Social Networking Sites and Apps (SNSAs) in Saudi Arabia, it is important to consider the impact of this on the social lives of tertiary students, who are heavy users of such technology. A mixed methods study exploring the effect of SNSAs use on the social capital of Saudi postgraduate students was conducted using a multidimensional construct of social capital, which included the components of life satisfaction, social trust, civic participation, and political engagement. Data were collected through surveys and interviews involving 313 male and 293 female postgraduate students from Umm Al-Qura University (UQU) in Makkah. Findings show that male and female participants perceived SNSAs use impacting all components of social capital at a moderate and mainly positive level. Correlational analysis demonstrated medium to large positive correlations among components of social capital. Gender differences were not evident in the life satisfaction and social trust components; however, females reported more involvement with SNSAs for the purposes of political engagement while males reported more use for civic participation, which is an interesting finding, in light of the norms and traditional culture of Saudi society.

Download Full-text

Support Vector Machines in Big Data Classification: A Systematic Literature Review

10.21203/rs.3.rs-663359/v1 ◽

2021 ◽

Author(s):

Mohammad Hassan Almaspoor ◽

Ali Safaei ◽

Afshin Salajegheh ◽

Behrouz Minaei-Bidgoli

Keyword(s):

Machine Learning ◽

Big Data ◽

Large Scale ◽

Support Vector ◽

Research Areas ◽

Large Scale Data ◽

Training Samples ◽

Big Data Classification ◽

Scale Data

Abstract Classification is one of the most important and widely used issues in machine learning, the purpose of which is to create a rule for grouping data to sets of pre-existing categories is based on a set of training sets. Employed successfully in many scientific and engineering areas, the Support Vector Machine (SVM) is among the most promising methods of classification in machine learning. With the advent of big data, many of the machine learning methods have been challenged by big data characteristics. The standard SVM has been proposed for batch learning in which all data are available at the same time. The SVM has a high time complexity, i.e., increasing the number of training samples will intensify the need for computational resources and memory. Hence, many attempts have been made at SVM compatibility with online learning conditions and use of large-scale data. This paper focuses on the analysis, identification, and classification of existing methods for SVM compatibility with online conditions and large-scale data. These methods might be employed to classify big data and propose research areas for future studies. Considering its advantages, the SVM can be among the first options for compatibility with big data and classification of big data. For this purpose, appropriate techniques should be developed for data preprocessing in order to covert data into an appropriate form for learning. The existing frameworks should also be employed for parallel and distributed processes so that SVMs can be made scalable and properly online to be able to handle big data.

Download Full-text

Data Lake Ecosystem Workflow

10.21079/11681/40203 ◽

2021 ◽

Author(s):

R. Salter ◽

Quyen Dong ◽

Cody Coleman ◽

Maria Seale ◽

Alicia Ruvinsky ◽

...

Keyword(s):

Big Data ◽

Language Processing ◽

Data Analytics ◽

Large Scale ◽

Big Data Analytics ◽

Lake Ecosystem ◽

Data Governance ◽

Government Organizations ◽

Large Scale Data ◽

Scale Data

The Engineer Research and Development Center, Information Technology Laboratory’s (ERDC-ITL’s) Big Data Analytics team specializes in the analysis of large-scale datasets with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers, highlighting that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals to efficiently execute a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. Researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the created Data Lake Ecosystem Workflow by focusing on the six phases required to efficiently transfer large datasets to supercomputing resources located at ERDC-ITL.

Download Full-text

A Systematic Analysis of Big Image Data Methodologies in Various Applications

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.e2307.039520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 483-487

Keyword(s):

Big Data ◽

Deep Learning ◽

Large Scale ◽

Image Data ◽

Computational Time ◽

Process Data ◽

Systematic Analysis ◽

Large Scale Data ◽

Learning Techniques ◽

Effective Performance

Big data is large-scale data collected for knowledge discovery, it has been widely used in various applications. Big data often has image data from the various applications and requires effective technique to process data. In this paper, survey has been done in the big image data researches to analysis the effective performance of the methods. Deep learning techniques provides the effective performance compared to other methods included wavelet based methods. The deep learning techniques has the problem of requiring more computational time, and this can be overcome by lightweight methods.

Download Full-text

Affordances of Data Science in Agriculture, Manufacturing, and Education

Web Services ◽

10.4018/978-1-5225-7501-6.ch052 ◽

2019 ◽

pp. 953-978

Author(s):

Krishnan Umachandran ◽

Debra Sharon Ferdinand-James

Keyword(s):

Big Data ◽

Large Scale ◽

Data Science ◽

Data Generation ◽

Large Scale Data ◽

Big Data Applications ◽

Effective Decision ◽

Effective Decision Making ◽

Text Images ◽

Scale Data

Continued technological advancements of the 21st Century afford massive data generation in sectors of our economy to include the domains of agriculture, manufacturing, and education. However, harnessing such large-scale data, using modern technologies for effective decision-making appears to be an evolving science that requires knowledge of Big Data management and analytics. Big data in agriculture, manufacturing, and education are varied such as voluminous text, images, and graphs. Applying Big data science techniques (e.g., functional algorithms) for extracting intelligence data affords decision markers quick response to productivity, market resilience, and student enrollment challenges in today's unpredictable markets. This chapter serves to employ data science for potential solutions to Big Data applications in the sectors of agriculture, manufacturing and education to a lesser extent, using modern technological tools such as Hadoop, Hive, Sqoop, and MongoDB.

Download Full-text

New Frontiers for E-Learning in Education

Optimizing Student Engagement in Online Learning Environments - Advances in Educational Technologies and Instructional Design ◽

10.4018/978-1-5225-3634-5.ch010 ◽

2018 ◽

pp. 220-240

Author(s):

Mohammad Zubair Khan ◽

Yasser M. Alginahi

Keyword(s):

Big Data ◽

Large Scale ◽

Data Repositories ◽

Useful Knowledge ◽

Leading Role ◽

Large Scale Data ◽

E Learning ◽

Base Management ◽

Wide Group ◽

Scale Data

Big Data research is playing a leading role in investigating a wide group of issues fundamentally emerging concerning Database, Data Warehousing, and Data Mining research. Analytics research is intended to develop complex procedures running over large-scale data repositories with the objective of extracting useful knowledge hidden in such repositories. A standout amongst the most noteworthy application situations where Big Data emerge is, without uncertainty, logical figuring. Here, researchers and analysts create immense measures of information everyday by means of investigations (e.g., disciplines like high vitality material science, space science, bioinformatics, etc.). Nevertheless, separating helpful learning for basic leadership purposes from these enormous, vast scale data repositories are practically inconceivable for genuine Data Base Management Systems (DBMS), is inspired investigation tools.

Download Full-text

Affordances of Data Science in Agriculture, Manufacturing, and Education

Privacy and Security Policies in Big Data - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-2486-1.ch002 ◽

2017 ◽

pp. 14-40 ◽

Cited By ~ 2

Author(s):

Krishnan Umachandran ◽

Debra Sharon Ferdinand-James

Keyword(s):

Big Data ◽

Large Scale ◽

Data Science ◽

Data Generation ◽

Large Scale Data ◽

Big Data Applications ◽

Effective Decision ◽

Effective Decision Making ◽

Text Images ◽

Scale Data

Download Full-text

Neural Network for Big Data Sets

10.4018/978-1-6684-2408-7.ch003 ◽

2022 ◽

pp. 41-67

Author(s):

Vo Ngoc Phu ◽

Vo Thi Ngoc Tran

Keyword(s):

Neural Network ◽

Big Data ◽

Computer Science ◽

Large Scale ◽

Massive Data ◽

Data Sets ◽

Massive Data Sets ◽

Large Scale Data ◽

Commercial Applications ◽

Novel Model

Machine learning (ML), neural network (NN), evolutionary algorithm (EA), fuzzy systems (FSs), as well as computer science have been very famous and very significant for many years. They have been applied to many different areas. They have contributed much to developments of many large-scale corporations, massive organizations, etc. Lots of information and massive data sets (MDSs) have been generated from these big corporations, organizations, etc. These big data sets (BDSs) have been the challenges of many commercial applications, researches, etc. Therefore, there have been many algorithms of the ML, the NN, the EA, the FSs, as well as computer science which have been developed to handle these massive data sets successfully. To support for this process, the authors have displayed all the possible algorithms of the NN for the large-scale data sets (LSDSs) successfully in this chapter. Finally, they have presented a novel model of the NN for the BDS in a sequential environment (SE) and a distributed network environment (DNE).

Download Full-text