Large-Scale Data Processing
Recently Published Documents


TOTAL DOCUMENTS

95
(FIVE YEARS 26)

H-INDEX

8
(FIVE YEARS 2)

Author(s):  
Surabhi Kumari

Abstract: Multi-party computation (MPC) is a broad cryptographic concept that can be used to perform computations while keeping the data private. MPC allows a group of parties to jointly compute a function without revealing any party's plaintext input or output. Privacy-preserving voting, arithmetic computation, and large-scale data processing are just a few of the applications of MPC. From a system perspective, each MPC party can run on a single computing node. The computing nodes of the different parties may be homogeneous or heterogeneous; nevertheless, the distributed workloads of MPC protocols are always homogeneous (symmetric). In this paper, we investigate the system performance of a representative MPC framework and a collection of MPC applications. We describe the complete online computation workflow of a state-of-the-art MPC protocol on homogeneous and heterogeneous computing nodes and examine the root causes of its stall time and performance limitations. Keywords: Cloud Computing, IoT, MPC, Amazon Service, Virtualization.
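
The abstract does not spell out the protocol it studies; as a minimal, self-contained sketch of the core idea, the following Python snippet implements additive secret sharing, a standard building block behind many MPC protocols. The party names, the modulus, and the secure-sum use case are illustrative assumptions, not details from the paper.

```python
# A minimal sketch of additive secret sharing: each party splits its private
# input into random shares, and the group can compute the sum of all inputs
# without any party ever seeing another party's plaintext value.
import random

MODULUS = 2**61 - 1  # arithmetic is done modulo a large prime

def share(secret: int, n_parties: int) -> list[int]:
    """Split a secret into n additive shares that sum to it mod MODULUS."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % MODULUS

# Three hypothetical parties jointly compute the sum of their private inputs.
inputs = {"alice": 42, "bob": 17, "carol": 99}
all_shares = {name: share(x, 3) for name, x in inputs.items()}

# Party i locally adds the i-th share of every input; a single share is
# uniformly random and reveals nothing about the underlying value.
partial_sums = [sum(all_shares[name][i] for name in inputs) % MODULUS
                for i in range(3)]
assert reconstruct(partial_sums) == sum(inputs.values())
```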


2021 ◽  
pp. 1-9
Author(s):  
Andrew Cormack

Europe’s General Data Protection Regulation (GDPR) has a fearsome reputation as “the law that can fine you €20 million.” But behind that scary slogan lies a text that can be a very helpful guide to designing data processing systems. This paper explores that side of the GDPR: how understanding it can produce more effective, and more trustworthy, systems. Three popular myths often take designers down the wrong track: that the GDPR is about stopping processing, that it is about users, and that it is about consent. Instead we consider, from a design perspective, the GDPR’s source material, its Principles, and its Lawful Bases for processing. Three examples, drawn from the field of education but widely applicable, show how “thinking with GDPR” has improved both the effectiveness and the safety of large-scale data processing systems.


GigaScience ◽  
2021 ◽  
Vol 10 (9) ◽  
Author(s):  
Jaclyn Smith ◽  
Yao Shi ◽  
Michael Benedikt ◽  
Milos Nikolic

Abstract
Background: Targeted diagnosis and treatment options depend on insights drawn from multi-modal analysis of large-scale biomedical datasets. Advances in genomics sequencing, image processing, and medical data management have supported data collection and management within medical institutions. These efforts have produced large-scale datasets and have enabled integrative analyses that provide a more thorough view of the impact of a disease on the underlying system. The integration of large-scale biomedical data commonly involves several complex data transformation steps, such as combining datasets to build feature vectors for learning analysis. Thus, scalable data integration solutions play a key role in the future of targeted medicine.
Solution: Though large-scale data processing frameworks have shown promising performance for many domains, they fail to support scalable processing of complex datatypes. To address these issues and achieve scalable processing of multi-modal biomedical data, we present TraNCE, a framework that automates the difficult aspects of designing distributed analyses with complex biomedical data types.
Performance: We outline research and clinical applications for the platform, including data integration support for building feature sets for classification. We show that the system outperforms the common alternative, based on “flattening” complex data structures, and runs efficiently where alternative approaches are unable to perform at all.
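
TraNCE's query compiler is not reproduced here, but the contrast the abstract draws can be sketched in a few lines of Python. The records and field names below are hypothetical; the point is that a "flattening" pipeline must explode nested collections into flat tuples and regroup them, which is exactly the extra work that nested-aware processing avoids at scale.

```python
# Illustrative sketch only: hypothetical nested biomedical records, each
# sample carrying a nested list of variants.
samples = [
    {"id": "s1", "variants": [{"gene": "TP53", "impact": 0.9},
                              {"gene": "BRCA1", "impact": 0.4}]},
    {"id": "s2", "variants": [{"gene": "TP53", "impact": 0.2}]},
]

# Nested-style analysis: build a per-sample gene-to-impact feature map
# without ever leaving the nested representation.
features = {
    s["id"]: {v["gene"]: v["impact"] for v in s["variants"]}
    for s in samples
}

# The "flattening" alternative: explode the nested lists into flat tuples,
# then regroup. The explode/regroup round trip is the overhead that
# nested-aware frameworks try to eliminate on distributed engines.
flat = [(s["id"], v["gene"], v["impact"])
        for s in samples for v in s["variants"]]
regrouped: dict[str, dict[str, float]] = {}
for sid, gene, impact in flat:
    regrouped.setdefault(sid, {})[gene] = impact
assert regrouped == features
```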


Author(s):  
Imad Sassi ◽  
Samir Anter ◽  
Abdelkrim Bekkhoucha

Hidden Markov models (HMMs) are among the machine learning algorithms that have been widely used and have demonstrated their efficiency in many conventional applications. This paper proposes a modified posterior decoding algorithm for the HMM decoding problem, based on the MapReduce paradigm and Spark's resilient distributed dataset (RDD) concept, for large-scale data processing. The objective of this work is to improve the performance of HMMs in meeting big data challenges. The proposed algorithm greatly reduces time complexity and achieves good results in terms of running time, speedup, and parallelization efficiency for large amounts of data, i.e., large numbers of states and of sequences.
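
As a rough sketch of how posterior decoding parallelizes over Spark RDDs (not the authors' modified algorithm, whose details the abstract does not give), the following Python code decodes each observation sequence independently using the scaled forward-backward recursions. The toy model parameters and sequences are assumptions for illustration.

```python
# Minimal sketch: posterior decoding of many sequences in parallel with
# Spark RDDs. Each sequence is decoded independently, so the RDD map
# distributes the work with no cross-partition communication.
import numpy as np
from pyspark.sql import SparkSession

pi = np.array([0.6, 0.4])                  # initial state distribution
A = np.array([[0.7, 0.3], [0.4, 0.6]])     # state transition matrix
B = np.array([[0.5, 0.5], [0.1, 0.9]])     # emission probabilities

def posterior_decode(obs):
    """Scaled forward-backward: most probable state at each position."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K)); beta = np.ones((T, K))
    alpha[0] = pi * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()             # per-step scaling for stability
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta                   # scaling cancels in the argmax
    return list(np.argmax(gamma, axis=1))

spark = SparkSession.builder.appName("hmm-posterior").getOrCreate()
sequences = [[0, 1, 1, 0], [1, 1, 0]]      # toy observation sequences
decoded = spark.sparkContext.parallelize(sequences).map(posterior_decode).collect()
print(decoded)
```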


2021 ◽  
Vol 5 (4) ◽  
pp. 672-679
Author(s):  
Viny Gilang Ramadhan ◽  
Yuliant Sibaroni

In 2020, the world was shaken by the outbreak of a rapidly spreading disease: the Coronavirus. To curb its spread, the Indonesian government carried out rapid early-detection testing. These measures were rejected in several areas because people consume hoax news on social media, and Twitter is widely used by Indonesians in conversations about the Coronavirus. In previous research, large-scale data affected the performance of the topic extraction method, and the classification used achieved poor accuracy. LDA, which finds the probability of topics in existing documents, excels at large-scale data processing and is more consistent in generating topic proportion values and word probabilities. Aspect-based sentiment analysis of public opinion about the rapid test on Twitter, using LDA, can determine both the aspects discussed and the public's opinion of the test. The experiments in this study used 7,000 tweets; topic extraction with LDA yielded four aspects, and the best accuracy, 95%, was obtained with the RBF kernel. The sentiment of the Indonesian people toward the rapid test is positive, with 4,305 positive sentiments.
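
As an illustration of the two-stage pipeline the abstract describes (LDA for aspect extraction, then an RBF-kernel SVM for sentiment classification), here is a minimal scikit-learn sketch. The tiny corpus, labels, and parameter choices are placeholders, not the study's dataset of 7,000 tweets.

```python
# Illustrative pipeline: topic proportions from LDA feed an RBF-kernel SVM.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVC

tweets = ["rapid test quick result", "rapid test inaccurate hoax",
          "government policy rapid test", "rapid test helps detection"]
labels = [1, 0, 1, 1]                          # 1 = positive, 0 = negative

counts = CountVectorizer().fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=4, random_state=0)
topic_proportions = lda.fit_transform(counts)  # per-tweet mixture over 4 aspects

clf = SVC(kernel="rbf").fit(topic_proportions, labels)
print(clf.predict(topic_proportions[:2]))
```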


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4737
Author(s):  
Yunfeng Zhu ◽  
Yi Gao ◽  
Qinghe Zeng ◽  
Jin Liao ◽  
Zhen Liu ◽  
...  

Engineering disasters can easily occur during the use of a long-span converter station steel structure, and structural monitoring is an important way to reduce hoisting risk. In previous engineering cases, structural monitoring of long-span converter station steel structure hoisting is rare, so no relevant hoisting experience can be referenced. Traditional monitoring methods have a narrow scope of application, which makes it difficult to coordinate monitoring with construction control, and the monitoring process raises many problems, such as complicated installation, large-scale data processing, and large installation errors. A real-time structural monitoring system can track the mechanical changes in the structure throughout the hoisting process, providing real-time warning of engineering disasters, timely identification of engineering issues, and support for rapid decision-making, thus avoiding engineering disasters. Based on this concept, automatic monitoring and manual measurement of the mechanical changes in the longest long-span converter station steel structure in the world were carried out, and the monitoring results were compared with corresponding numerical simulation results in order to develop a real-time structural monitoring system for the whole structure's multi-point lifting process. The system collects the monitoring data and outputs the deflection, stress, strain, wind force, and temperature of the structure in real time, ensuring the safety of the lifting process. This research offers a new method and basis for the structural monitoring of the multi-point hoisting of long-span converter station steel structures.
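
The abstract gives no implementation details, but the core real-time check such a system performs can be sketched simply: compare each incoming sensor reading against a design limit and raise an early warning before it is exceeded. The channel names and thresholds below are invented for illustration; the actual system and limits are not described at this level of detail.

```python
# Hypothetical real-time threshold check over a stream of sensor readings.
LIMITS = {"stress_mpa": 235.0, "deflection_mm": 80.0, "wind_m_s": 13.8}

def check_reading(channel: str, value: float) -> str:
    limit = LIMITS[channel]
    if value >= limit:
        return f"ALARM {channel}: {value} exceeds limit {limit}"
    if value >= 0.8 * limit:
        return f"WARNING {channel}: {value} within 20% of limit {limit}"
    return f"ok {channel}: {value}"

# A stream of (channel, value) readings as they arrive from the sensors.
for reading in [("stress_mpa", 190.4), ("deflection_mm", 81.2), ("wind_m_s", 9.1)]:
    print(check_reading(*reading))
```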


Author(s):  
Jianfei Zhang ◽  
◽  
Yuchen Jiang ◽  
Yan Liu

Data centers are fundamental facilities that support high-performance computing and large-scale data processing. To guarantee that a data center provides good expansion and routing properties, its interconnection network must be designed carefully. Herein, we propose a novel structure for the interconnection network of data centers that can be expanded with a variable coefficient, also known as a variable expanding structure (VES). A VES is designed hierarchically and built iteratively; with only a few layers, it can include hundreds of thousands, or even millions, of servers. Meanwhile, a VES has an extremely short diameter, which implies better routing performance between every pair of servers. Furthermore, we design an address space for the servers and switches in a VES, and we propose a construction algorithm and a routing algorithm based on that address space. Simulation results and analysis verify that the expanding rate of a VES depends on three factors: n, the number of ports on a switch; m, the expanding speed; and k, the number of layers. Among these, m has the strongest effect. Hence, a VES can be designed around the factor m to achieve the expected expanding rate and server scale set by the initial planning objectives.
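
The VES construction and its exact growth formula are not reproduced here; the sketch below only illustrates how the three design factors n, m, and k could be swept to find a configuration that reaches a target server scale. The growth rule inside assumed_scale is a placeholder assumption, not the paper's recurrence.

```python
# Placeholder parameter sweep over the three VES design factors.
def assumed_scale(n: int, m: int, k: int) -> int:
    servers = n                 # assumed layer-0 cell: one switch, n servers
    for _ in range(k):
        servers *= m            # placeholder rule: each layer replicates m times
    return servers

target = 1_000_000
for m in range(2, 9):
    for k in range(1, 8):
        if assumed_scale(48, m, k) >= target:
            print(f"m={m}, k={k}: ~{assumed_scale(48, m, k):,} servers")
            break
```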


2020 ◽  
Vol 7 (3) ◽  
pp. 230
Author(s):  
Saifullah Saifullah ◽  
Nani Hidayati

Data mining is a method that is often needed in large-scale data processing, so it plays an important role in many fields of life, including industry, finance, weather, and science and technology. Data mining techniques include classification, clustering, regression, variable selection, and market basket analysis. Illiteracy is one of the factors that hinder the quality of human resources, and eradicating illiteracy in the community is one of the basic requirements for improving that quality. The purpose of this study is to cluster illiterate communities by province in Indonesia. Clustering the illiteracy data for ages 15-44 yields 1 province in the high cluster, 27 provinces in the low cluster, and 6 provinces in the medium cluster. These results serve as input for the government in determining province-level illiteracy eradication policies in Indonesia.

Keywords: Illiteracy, Data Mining, K-Means Clustering
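
The study's clustering step can be illustrated with scikit-learn's KMeans. The province names and illiteracy rates below are made-up placeholders, not the provincial data used in the paper; with k=3, the fitted labels correspond to the high, medium, and low groups the abstract reports.

```python
# Illustrative K-Means clustering of per-province illiteracy rates.
import numpy as np
from sklearn.cluster import KMeans

provinces = ["A", "B", "C", "D", "E"]
illiteracy_rate = np.array([[1.2], [0.4], [6.8], [0.9], [2.1]])  # percent

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(illiteracy_rate)
for name, label in zip(provinces, km.labels_):
    print(name, "-> cluster", label)
```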

