Large-Scale Data Processing
Recently Published Documents


TOTAL DOCUMENTS

95
(FIVE YEARS 26)

H-INDEX

8
(FIVE YEARS 2)

Author(s):  
Surabhi Kumari

Abstract: Multi-party computation (MPC) is a broad cryptographic concept that can be used to perform computations while keeping the data private. MPC allows a group of parties to jointly compute a function without revealing any party's plaintext input or output. Privacy-preserving voting, arithmetic computation, and large-scale data processing are just a few of the applications of MPC. From a system perspective, each MPC party can run on a single computing node. The computing nodes of the different parties may be homogeneous or heterogeneous; nevertheless, the distributed workloads of MPC protocols are always homogeneous (symmetric). In this paper, we investigate the system performance of a representative MPC framework and a collection of MPC applications. We describe the complete online computation workflow of a state-of-the-art MPC protocol on homogeneous and heterogeneous computing nodes and examine the root causes of its stall time and performance limitations. Keywords: Cloud Computing, IoT, MPC, Amazon Service, Virtualization.
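
The abstract does not spell out the protocol it studies; as a minimal, self-contained sketch of the core idea, the following Python snippet implements additive secret sharing, a standard building block behind many MPC protocols. The party names, the modulus, and the secure-sum use case are illustrative assumptions, not details from the paper.

```python
# A minimal sketch of additive secret sharing: each party splits its private
# input into random shares, and the group can compute the sum of all inputs
# without any party ever seeing another party's plaintext value.
import random

MODULUS = 2**61 - 1  # arithmetic is done modulo a large prime

def share(secret: int, n_parties: int) -> list[int]:
    """Split a secret into n additive shares that sum to it mod MODULUS."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % MODULUS

# Three hypothetical parties jointly compute the sum of their private inputs.
inputs = {"alice": 42, "bob": 17, "carol": 99}
all_shares = {name: share(x, 3) for name, x in inputs.items()}

# Party i locally adds the i-th share of every input; a single share is
# uniformly random and reveals nothing about the underlying value.
partial_sums = [sum(all_shares[name][i] for name in inputs) % MODULUS
                for i in range(3)]
assert reconstruct(partial_sums) == sum(inputs.values())
```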


2021 ◽  
pp. 1-9
Author(s):  
Andrew Cormack

Europe’s General Data Protection Regulation (GDPR) has a fearsome reputation as “the law that can fine you €20 million.” But behind that scary slogan lies a text that can be a very helpful guide to designing data processing systems. This paper explores that side of the GDPR: how understanding it can produce more effective, and more trustworthy, systems. Three popular myths often take designers down the wrong track: that the GDPR is about stopping processing, that it is about users, and that it is about consent. Instead we consider, from a design perspective, the GDPR’s source material, its Principles, and its Lawful Bases for processing. Three examples, drawn from the field of education but widely applicable, show how “thinking with GDPR” has improved both the effectiveness and the safety of large-scale data processing systems.


GigaScience ◽  
2021 ◽  
Vol 10 (9) ◽  
Author(s):  
Jaclyn Smith ◽  
Yao Shi ◽  
Michael Benedikt ◽  
Milos Nikolic

Abstract
Background: Targeted diagnosis and treatment options depend on insights drawn from multi-modal analysis of large-scale biomedical datasets. Advances in genomics sequencing, image processing, and medical data management have supported data collection and management within medical institutions. These efforts have produced large-scale datasets and have enabled integrative analyses that provide a more thorough view of the impact of a disease on the underlying system. The integration of large-scale biomedical data commonly involves several complex data transformation steps, such as combining datasets to build feature vectors for learning analysis. Thus, scalable data integration solutions play a key role in the future of targeted medicine.
Solution: Though large-scale data processing frameworks have shown promising performance for many domains, they fail to support scalable processing of complex datatypes. To address these issues and achieve scalable processing of multi-modal biomedical data, we present TraNCE, a framework that automates the difficult aspects of designing distributed analyses with complex biomedical data types.
Performance: We outline research and clinical applications for the platform, including data integration support for building feature sets for classification. We show that the system outperforms the common alternative, based on “flattening” complex data structures, and runs efficiently where alternative approaches are unable to perform at all.
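
TraNCE's query compiler is not reproduced here, but the contrast the abstract draws can be sketched in a few lines of Python. The records and field names below are hypothetical; the point is that a "flattening" pipeline must explode nested collections into flat tuples and regroup them, which is exactly the extra work that nested-aware processing avoids at scale.

```python
# Illustrative sketch only: hypothetical nested biomedical records, each
# sample carrying a nested list of variants.
samples = [
    {"id": "s1", "variants": [{"gene": "TP53", "impact": 0.9},
                              {"gene": "BRCA1", "impact": 0.4}]},
    {"id": "s2", "variants": [{"gene": "TP53", "impact": 0.2}]},
]

# Nested-style analysis: build a per-sample gene-to-impact feature map
# without ever leaving the nested representation.
features = {
    s["id"]: {v["gene"]: v["impact"] for v in s["variants"]}
    for s in samples
}

# The "flattening" alternative: explode the nested lists into flat tuples,
# then regroup. The explode/regroup round trip is the overhead that
# nested-aware frameworks try to eliminate on distributed engines.
flat = [(s["id"], v["gene"], v["impact"])
        for s in samples for v in s["variants"]]
regrouped: dict[str, dict[str, float]] = {}
for sid, gene, impact in flat:
    regrouped.setdefault(sid, {})[gene] = impact
assert regrouped == features
```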


Author(s):  
Imad Sassi ◽  
Samir Anter ◽  
Abdelkrim Bekkhoucha

Hidden Markov models (HMMs) are among the machine learning algorithms that have been widely used and have demonstrated their efficiency in many conventional applications. This paper proposes a modified posterior decoding algorithm for the HMM decoding problem, based on the MapReduce paradigm and Spark's resilient distributed dataset (RDD) concept, for large-scale data processing. The objective of this work is to improve the performance of HMMs in meeting big data challenges. The proposed algorithm greatly reduces time complexity and achieves good results in terms of running time, speedup, and parallelization efficiency for large amounts of data, i.e., large numbers of states and of sequences.
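
As a rough sketch of how posterior decoding parallelizes over Spark RDDs (not the authors' modified algorithm, whose details the abstract does not give), the following Python code decodes each observation sequence independently using the scaled forward-backward recursions. The toy model parameters and sequences are assumptions for illustration.

```python
# Minimal sketch: posterior decoding of many sequences in parallel with
# Spark RDDs. Each sequence is decoded independently, so the RDD map
# distributes the work with no cross-partition communication.
import numpy as np
from pyspark.sql import SparkSession

pi = np.array([0.6, 0.4])                  # initial state distribution
A = np.array([[0.7, 0.3], [0.4, 0.6]])     # state transition matrix
B = np.array([[0.5, 0.5], [0.1, 0.9]])     # emission probabilities

def posterior_decode(obs):
    """Scaled forward-backward: most probable state at each position."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K)); beta = np.ones((T, K))
    alpha[0] = pi * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()             # per-step scaling for stability
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta                   # scaling cancels in the argmax
    return list(np.argmax(gamma, axis=1))

spark = SparkSession.builder.appName("hmm-posterior").getOrCreate()
sequences = [[0, 1, 1, 0], [1, 1, 0]]      # toy observation sequences
decoded = spark.sparkContext.parallelize(sequences).map(posterior_decode).collect()
print(decoded)
```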


2021 ◽  
Vol 5 (4) ◽  
pp. 672-679
Author(s):  
Viny Gilang Ramadhan ◽  
Yuliant Sibaroni

In 2020, the world was shaken by the outbreak of a rapidly spreading disease: the Coronavirus. To curb its spread, the Indonesian government carried out rapid early-detection testing. These measures were rejected in several areas because people consume hoax news on social media, and Twitter is widely used by Indonesians in conversations about the Coronavirus. In previous research, large-scale data affected the performance of the topic extraction method, and the classification used achieved poor accuracy. LDA, which finds the probability of topics in existing documents, excels at large-scale data processing and is more consistent in generating topic proportion values and word probabilities. Aspect-based sentiment analysis of public opinion about the rapid test on Twitter, using LDA, can determine both the aspects discussed and the public's opinion of the test. The experiments in this study used 7,000 tweets; topic extraction with LDA yielded four aspects, and the best accuracy, 95%, was obtained with the RBF kernel. The sentiment of the Indonesian people toward the rapid test is positive, with 4,305 positive sentiments.
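
As an illustration of the two-stage pipeline the abstract describes (LDA for aspect extraction, then an RBF-kernel SVM for sentiment classification), here is a minimal scikit-learn sketch. The tiny corpus, labels, and parameter choices are placeholders, not the study's dataset of 7,000 tweets.

```python
# Illustrative pipeline: topic proportions from LDA feed an RBF-kernel SVM.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVC

tweets = ["rapid test quick result", "rapid test inaccurate hoax",
          "government policy rapid test", "rapid test helps detection"]
labels = [1, 0, 1, 1]                          # 1 = positive, 0 = negative

counts = CountVectorizer().fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=4, random_state=0)
topic_proportions = lda.fit_transform(counts)  # per-tweet mixture over 4 aspects

clf = SVC(kernel="rbf").fit(topic_proportions, labels)
print(clf.predict(topic_proportions[:2]))
```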


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4737
Author(s):  
Yunfeng Zhu ◽  
Yi Gao ◽  
Qinghe Zeng ◽  
Jin Liao ◽  
Zhen Liu ◽  
...  

Engineering disasters can easily occur during the use of a long-span converter station steel structure, and structural monitoring is an important way to reduce hoisting risk. In previous engineering cases, structural monitoring of long-span converter station steel structure hoisting is rare, so no relevant hoisting experience can be referenced. Traditional monitoring methods have a narrow scope of application, which makes it difficult to coordinate monitoring with construction control, and the monitoring process raises many problems, such as complicated installation, large-scale data processing, and large installation errors. A real-time structural monitoring system can track the mechanical changes in the structure throughout the hoisting process, providing real-time warning of engineering disasters, timely identification of engineering issues, and support for rapid decision-making, thus avoiding engineering disasters. Based on this concept, automatic monitoring and manual measurement of the mechanical changes in the longest long-span converter station steel structure in the world were carried out, and the monitoring results were compared with corresponding numerical simulation results in order to develop a real-time structural monitoring system for the whole structure's multi-point lifting process. The system collects the monitoring data and outputs the deflection, stress, strain, wind force, and temperature of the structure in real time, ensuring the safety of the lifting process. This research offers a new method and basis for the structural monitoring of the multi-point hoisting of long-span converter station steel structures.
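
The abstract gives no implementation details, but the core real-time check such a system performs can be sketched simply: compare each incoming sensor reading against a design limit and raise an early warning before it is exceeded. The channel names and thresholds below are invented for illustration; the actual system and limits are not described at this level of detail.

```python
# Hypothetical real-time threshold check over a stream of sensor readings.
LIMITS = {"stress_mpa": 235.0, "deflection_mm": 80.0, "wind_m_s": 13.8}

def check_reading(channel: str, value: float) -> str:
    limit = LIMITS[channel]
    if value >= limit:
        return f"ALARM {channel}: {value} exceeds limit {limit}"
    if value >= 0.8 * limit:
        return f"WARNING {channel}: {value} within 20% of limit {limit}"
    return f"ok {channel}: {value}"

# A stream of (channel, value) readings as they arrive from the sensors.
for reading in [("stress_mpa", 190.4), ("deflection_mm", 81.2), ("wind_m_s", 9.1)]:
    print(check_reading(*reading))
```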


Author(s):  
Jianfei Zhang ◽  
◽  
Yuchen Jiang ◽  
Yan Liu

Data centers are fundamental facilities that support high-performance computing and large-scale data processing. To guarantee that a data center provides good expansion and routing properties, its interconnection network must be designed carefully. Herein, we propose a novel structure for the interconnection network of data centers that can be expanded with a variable coefficient, also known as a variable expanding structure (VES). A VES is designed hierarchically and built iteratively; with only a few layers, it can include hundreds of thousands, or even millions, of servers. Meanwhile, a VES has an extremely short diameter, which implies better routing performance between every pair of servers. Furthermore, we design an address space for the servers and switches in a VES, and we propose a construction algorithm and a routing algorithm based on that address space. Simulation results and analysis verify that the expanding rate of a VES depends on three factors: n, the number of ports on a switch; m, the expanding speed; and k, the number of layers. Among these, m has the strongest effect. Hence, a VES can be designed around the factor m to achieve the expected expanding rate and server scale set by the initial planning objectives.
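
The VES construction and its exact growth formula are not reproduced here; the sketch below only illustrates how the three design factors n, m, and k could be swept to find a configuration that reaches a target server scale. The growth rule inside assumed_scale is a placeholder assumption, not the paper's recurrence.

```python
# Placeholder parameter sweep over the three VES design factors.
def assumed_scale(n: int, m: int, k: int) -> int:
    servers = n                 # assumed layer-0 cell: one switch, n servers
    for _ in range(k):
        servers *= m            # placeholder rule: each layer replicates m times
    return servers

target = 1_000_000
for m in range(2, 9):
    for k in range(1, 8):
        if assumed_scale(48, m, k) >= target:
            print(f"m={m}, k={k}: ~{assumed_scale(48, m, k):,} servers")
            break
```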


2020 ◽  
Vol 7 (3) ◽  
pp. 230
Author(s):  
Saifullah Saifullah ◽  
Nani Hidayati

Data mining is a method that is often needed in large-scale data processing, so it plays an important role in many fields of life, including industry, finance, weather, and science and technology. Data mining techniques include classification, clustering, regression, variable selection, and market basket analysis. Illiteracy is one of the factors that hinder the quality of human resources, and eradicating illiteracy in the community is one of the basic requirements for improving that quality. The purpose of this study is to cluster illiterate communities by province in Indonesia. Clustering the illiteracy data for ages 15-44 yields 1 province in the high cluster, 27 provinces in the low cluster, and 6 provinces in the medium cluster. These results serve as input for the government in determining province-level illiteracy eradication policies in Indonesia.

Keywords: Illiteracy, Data Mining, K-Means Clustering
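
The study's clustering step can be illustrated with scikit-learn's KMeans. The province names and illiteracy rates below are made-up placeholders, not the provincial data used in the paper; with k=3, the fitted labels correspond to the high, medium, and low groups the abstract reports.

```python
# Illustrative K-Means clustering of per-province illiteracy rates.
import numpy as np
from sklearn.cluster import KMeans

provinces = ["A", "B", "C", "D", "E"]
illiteracy_rate = np.array([[1.2], [0.4], [6.8], [0.9], [2.1]])  # percent

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(illiteracy_rate)
for name, label in zip(provinces, km.labels_):
    print(name, "-> cluster", label)
```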

