distributed processing
Recently Published Documents

Total documents: 1422 (five years: 271)
H-index: 46 (five years: 7)

2022 ◽  
Vol 25 (1) ◽  
pp. 1-25
Author(s):  
Sibghat Ullah Bazai ◽  
Julian Jang-Jaccard ◽  
Hooman Alavizadeh

Multi-dimensional data anonymization approaches (e.g., Mondrian) ensure more fine-grained data privacy by applying a different anonymization strategy to each attribute. Many variations of multi-dimensional anonymization have been implemented on different distributed processing platforms (e.g., MapReduce, Spark) to take advantage of their scalability and parallelism support. According to our critical analysis of overheads, neither the existing iteration-based nor the recursion-based approaches provide effective mechanisms for creating the optimal number and relative size of resilient distributed datasets (RDDs), and thus they suffer heavily from performance overheads. To solve this issue, we propose a novel hybrid approach for effectively implementing a multi-dimensional data anonymization strategy (e.g., Mondrian) that is scalable and provides high performance. Our hybrid approach provides a mechanism to create far fewer RDDs, with smaller partitions attached to each RDD, than existing approaches. This optimal RDD creation and operations approach is critical for many multi-dimensional data anonymization applications that create tremendous execution complexity. The new mechanism in our proposed hybrid approach can dramatically reduce the critical overheads involved in re-computation cost, shuffle operations, message exchange, and cache management.
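
For readers unfamiliar with Mondrian, the following is a minimal single-machine sketch of the basic recursive partitioning idea, not the hybrid Spark approach the abstract proposes. It assumes numeric quasi-identifiers and a strict median cut; the names `mondrian`, `generalize`, and the sample rows are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Record = Tuple[int, ...]  # numeric quasi-identifiers only (an assumption)


def mondrian(records: List[Record], k: int) -> List[List[Record]]:
    """Recursively split records into partitions of at least k rows."""
    if len(records) < 2 * k:
        return [records]  # cannot split without breaking k-anonymity
    dims = range(len(records[0]))
    # Try attributes in order of decreasing value range.
    by_range = sorted(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in by_range:
        ordered = sorted(records, key=lambda r: r[d])
        median = ordered[len(ordered) // 2][d]
        left = [r for r in ordered if r[d] < median]
        right = [r for r in ordered if r[d] >= median]
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k) + mondrian(right, k)
    return [records]  # no allowable cut on any attribute


def generalize(partition: List[Record]) -> List[Tuple[Tuple[int, int], ...]]:
    """Replace each value with the (min, max) range of its partition."""
    dims = range(len(partition[0]))
    ranges = tuple(
        (min(r[d] for r in partition), max(r[d] for r in partition)) for d in dims
    )
    return [ranges for _ in partition]


# Hypothetical (age, postcode) rows.
rows = [(25, 1000), (27, 1010), (31, 2000), (35, 2050), (52, 3000), (58, 3100)]
partitions = mondrian(rows, k=2)
anonymized = [g for p in partitions for g in generalize(p)]
```

Each recursive call in this sketch would correspond to new intermediate datasets in a naive distributed port, which is the kind of overhead the abstract says the hybrid approach is designed to reduce.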


In cloud-based big data applications, Hadoop has been widely adopted for distributed processing of large-scale data sets. However, energy wastage in data centers remains an important research topic because of resource overuse and extra overhead costs. Dynamic scaling of resources in a Hadoop YARN cluster is a practical way to address this challenge. This paper proposes a dynamic scaling approach in Hadoop YARN (DSHYARN) that adds or removes nodes automatically based on workload. It is based on two algorithms (scaling up and scaling down) that automate the scaling process in the cluster. The aim is to ensure the energy efficiency and performance of Hadoop YARN clusters. To validate the effectiveness of DSHYARN, a case study on sentiment analysis of tweets about the COVID-19 vaccine is provided; the goal is to analyze tweets that people posted on Twitter. The results showed improvements in CPU utilization, RAM utilization, and job completion time. In addition, energy consumption was reduced by 16% under average workload.
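
The abstract does not give the scaling algorithms themselves, so the following is only a sketch of what a workload-driven scale-up/scale-down rule could look like. The thresholds, node limits, and the names `ClusterState`, `scaling_decision`, and `run` are all assumptions for illustration, and the code does not call any YARN API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterState:
    nodes: int
    cpu_utilization: float  # cluster-wide average, 0.0 to 1.0


def scaling_decision(
    state: ClusterState,
    upper: float = 0.80,  # hypothetical scale-up threshold
    lower: float = 0.30,  # hypothetical scale-down threshold
    min_nodes: int = 1,
    max_nodes: int = 10,
) -> int:
    """Return +1 to add a node, -1 to remove one, 0 to leave the cluster as is."""
    if state.cpu_utilization > upper and state.nodes < max_nodes:
        return 1
    if state.cpu_utilization < lower and state.nodes > min_nodes:
        return -1
    return 0


def run(trace: List[float], nodes: int = 3) -> List[int]:
    """Apply the decision rule to a utilization trace and record the node count."""
    history = []
    for load in trace:
        nodes += scaling_decision(ClusterState(nodes, load))
        history.append(nodes)
    return history


history = run([0.9, 0.85, 0.5, 0.2, 0.1])
```

Keeping separate upper and lower thresholds leaves a band in which the cluster size is stable, which avoids adding and removing nodes on every small fluctuation.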


Author(s):  
Yihao Tian

Data management is an administrative mechanism that involves the acquisition, validation, storage, protection, and processing of data needed by its users to ensure that data are accessible, reliable, and timely. Managing protection for information assets is a challenging task. With the emphasis on distributed systems and Internet-accessible systems, the need for efficient information security management is increasingly important. In this paper, artificial intelligence-assisted dynamic modeling (AI-DM) is used for data management in a distributed system. Distributed processing is an effective way to enhance the efficiency of database systems. Therefore, each distributed database structure's functionality depends significantly on its proper architecture in implementing fragmentation, allocation, and replication processes. The proposed model is a dynamically distributed internet database architecture. It enables complex decision-making on fragmentation, distribution, and duplication, and it gives users access to the distributed database from anywhere. AI-DM has an improved allocation and replication strategy for the case where no query performance information is accessible at the initial stage of the distributed database design. The AI-DM findings show that the proposed database model improves the reliability and efficiency of the system. The final results report a dynamic modeling ratio of 87.6%, an increasing decision support ratio of 88.7%, a logistic regression ratio of 84.5%, a data reliability ratio of 82.2%, and a system ratio of 93.8%.


2021 ◽  
Vol 12 (1) ◽  
pp. 122
Author(s):  
Jongtae Lim ◽  
Byounghoon Kim ◽  
Hyeonbyeong Lee ◽  
Dojin Choi ◽  
Kyoungsoo Bok ◽  
...  

Various distributed processing schemes have been studied to efficiently utilize large-scale RDF graphs in semantic web services. This paper proposes a new distributed SPARQL query processing scheme that considers communication costs in Spark environments to reduce I/O costs during SPARQL query processing. To process a query on an RDF graph stored in a distributed environment, we divide the SPARQL query into several subqueries using its WHERE clause. The proposed scheme reduces data communication costs by using the index to group the divided subqueries on related nodes and processing them there; for the grouped subqueries, the cost of all possible query execution paths is calculated to select an efficient one. The efficient query execution path is selected by an algorithm that considers the data parsing cost of all possible query execution paths, the amount of data communication, and the queue time per node. Various performance evaluations show that the proposed scheme outperforms the existing schemes.
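
The path-selection step can be illustrated with a small sketch. The cost model below (parsing cost plus queue time per subquery, plus the intermediate result size whenever consecutive subqueries run on different nodes) is an assumption made for illustration, not the formula from the paper, and `Subquery`, `path_cost`, and `best_path` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class Subquery:
    name: str
    node: str           # node holding the matching triples
    parse_cost: float   # hypothetical cost units
    result_size: float  # data shipped if the next step runs on another node


def path_cost(path: Sequence[Subquery], queue_time: Dict[str, float]) -> float:
    """Sum parsing, queueing, and cross-node communication costs along a path."""
    cost = 0.0
    for i, sq in enumerate(path):
        cost += sq.parse_cost + queue_time[sq.node]
        if i + 1 < len(path) and path[i + 1].node != sq.node:
            cost += sq.result_size  # intermediate results cross the network
    return cost


def best_path(
    subqueries: Sequence[Subquery], queue_time: Dict[str, float]
) -> Tuple[Subquery, ...]:
    """Enumerate every execution order and keep the cheapest one."""
    return min(permutations(subqueries), key=lambda p: path_cost(p, queue_time))


subqueries = [
    Subquery("a", "n1", 1.0, 10.0),
    Subquery("b", "n1", 1.0, 2.0),
    Subquery("c", "n2", 1.0, 5.0),
]
queues = {"n1": 0.0, "n2": 0.0}
chosen = best_path(subqueries, queues)
chosen_names = [sq.name for sq in chosen]
chosen_cost = path_cost(chosen, queues)
```

Exhaustive enumeration grows factorially with the number of subqueries, so this sketch is only practical for the handful of subqueries a single WHERE clause typically yields.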


2021 ◽  
Author(s):  
Kyuhyun Choi ◽  
Eugenio Piasini ◽  
Luigim Cifuentes Vargas ◽  
Nathan T Henderson ◽  
Edgar Diaz Hernandez ◽  
...  

Fronto-striatal circuits have been extensively implicated in the cognitive control of behavioral output for both social and appetitive rewards. The functional diversity of prefrontal cortical populations is strongly dependent on their synaptic targets, with control of motor output strongly mediated by connectivity to the dorsal striatum. Despite evidence for functional diversity along the anterior-posterior axis of the dorsomedial striatum (DMS), it is unclear how distinct fronto-striatal sub-circuits support neural computations essential for action selection. Here we identify prefrontal populations targeting distinct DMS subregions and characterize their functional roles. We first performed neural circuit tracing to reveal segregated prefrontal populations defined by anterior/posterior dorsomedial striatal target. We then probed the functional relevance of these parallel circuits via in vivo calcium imaging and temporally precise causal manipulations during a feedback-based 2-alternative choice task. Single-photon imaging revealed circuit-specific representations of task-relevant information: prelimbic neurons targeting anterior DMS (PL::A-DMS) uniquely encoded choices and responses to negative outcomes, while prelimbic neurons targeting posterior DMS (PL::P-DMS) encoded internal representations of value and positive outcomes contingent on prior choice. Consistent with this distributed coding, optogenetic inhibition of PL::A-DMS circuits strongly impacted choice monitoring and behavioral control in response to negative outcomes, while perturbation of PL::P-DMS signals impaired task engagement and strategies following positive outcomes. Di-synaptic retrograde tracing uncovered differences in afferent connectivity that may underlie these pathways' functional divergence. Together, our data uncover novel PL populations engaged in distributed processing for action control.


2021 ◽  
Vol 4 ◽  
Author(s):  
Toon Albers ◽  
Elena Lazovik ◽  
Mostafa Hadadian Nejad Yousefi ◽  
Alexander Lazovik

Distributed data processing systems have become the standard means for big data analytics. These systems are based on processing pipelines where operations on data are performed in a chain of consecutive steps. Normally, the operations performed by these pipelines are set at design time, and any changes to their functionality require the applications to be restarted. This is not always acceptable, for example, when we cannot afford downtime or when a long-running calculation would lose significant progress. The introduction of variation points to distributed processing pipelines allows for on-the-fly updating of individual analysis steps. In this paper, we extend such basic variation point functionality to provide fully automated reconfiguration of the processing steps within a running pipeline through an automated planner. We have enabled pipeline modeling through constraints. Based on these constraints, we not only ensure that configurations are compatible with type but also verify that expected pipeline functionality is achieved. Furthermore, automating the reconfiguration process simplifies its use, in turn allowing users with less development experience to make changes. The system can automatically generate and validate pipeline configurations that achieve a specified goal, selecting from operation definitions available at planning time. It then automatically integrates these configurations into the running pipeline. We verify the system through the testing of a proof-of-concept implementation. The proof of concept also shows promising results when reconfiguration is performed frequently.
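
The basic variation-point idea, on which the paper's automated planner builds, can be sketched as a pipeline step whose implementation is swapped while the stream keeps running. This is an illustrative single-process sketch only: `VariationPoint` and `run_pipeline` are hypothetical names, the type check stands in for the paper's constraint model, and no planner is included.

```python
from typing import Any, Callable, Iterable, Iterator, List


class VariationPoint:
    """A pipeline step whose implementation can be replaced while data flows."""

    def __init__(self, in_type: type, out_type: type, fn: Callable[[Any], Any]):
        self.in_type, self.out_type = in_type, out_type
        self._fn = fn

    def update(self, fn: Callable[[Any], Any], in_type: type, out_type: type) -> None:
        # Constraint check: a replacement must keep the step's declared types.
        if (in_type, out_type) != (self.in_type, self.out_type):
            raise TypeError("replacement is not type-compatible with this step")
        self._fn = fn

    def __call__(self, item: Any) -> Any:
        return self._fn(item)


def run_pipeline(source: Iterable[Any], steps: List[VariationPoint]) -> Iterator[Any]:
    """Lazily push each item through the chain of steps."""
    for item in source:
        for step in steps:
            item = step(item)
        yield item


scale = VariationPoint(int, int, lambda x: x * 2)
stream = run_pipeline(iter([1, 2, 3, 4]), [scale])
first = [next(stream), next(stream)]      # processed with x * 2
scale.update(lambda x: x * 10, int, int)  # swap without restarting the stream
rest = list(stream)                       # processed with x * 10
```

Because the generator holds its position in the source, items already consumed are not reprocessed after the swap, which mirrors the goal of changing functionality without losing progress.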


2021 ◽  
Author(s):  
Seyed Reza Mir Alavi

Communication is performed by transmitting signals through a medium. It is common for signals originating from different sources to be mixed in the transport medium. The operation of separating source signals without prior information about the sources is referred to as blind source separation (BSS). Blind source separation for wireless sensor networks has recently received attention because of low cost and the easy coverage of large areas. Distributed processing is attractive as it is scalable and consumes low power. Existing distributed BSS algorithms either require a fully connected pattern of connectivity, to ensure good performance, or require a high computational load at each sensor node, to enhance scalability. This motivates us to develop distributed BSS algorithms that can be implemented over any arbitrary graph with fully shared computations and with good performance. This thesis presents three studies on distributed algorithms. The first two studies are on existing distributed algorithms that are used in linearly constrained convex optimization problems, which are common in signal processing and machine learning. The studies are aimed at improving the algorithms in terms of computational complexity, communication cost, processor coordination, and scalability. This makes them more suitable for implementation on sensor networks, thus forming a basis for the development of distributed BSS algorithms on sensor networks in our third study. In the first study, we consider constrained problems in which the constraint includes a weighted sum of all the decision variables. By formulating a constrained dual problem associated with the original constrained problem, we were able to develop a distributed algorithm that can be run both synchronously and asynchronously on any arbitrary graph with lower communication cost than traditional distributed algorithms. In the second study, we consider constrained problems in which the constraint is separable. By making use of the augmented Lagrangian function and splitting the dual variable (Lagrange multiplier) associated with each partial constraint, we were able to develop a distributed fully asynchronous algorithm with lower computational complexity than traditional distributed algorithms. The simplicity of the algorithm is the consequence of approximating the constraint on the equality of the decoupled dual variables. We also provide a measure of the effect of the inaccuracy in such an approximation on the optimal value of the primal objective function. Finally, in the third study, we investigate distributed processing solutions for BSS on sensor networks. We propose two distributed processing schemes for BSS that we refer to as scheme 1 and scheme 2. In scheme 1, each sensor node estimates one specific source signal, while in scheme 2, by formulating a consensus optimization problem, each sensor node estimates all source signals in a fully shared computation manner. Our proposed algorithms have the following features: low computational complexity, low power consumption, low data transmission rate, scalability, and excellent performance over arbitrary graphs. Although all of our proposed algorithms share these properties, each of them is superior to the others in one or more of the features. Comparative experimental results show that among all our proposed distributed BSS algorithms, a variant of scheme 1 performs best when all features are considered. This is achieved by making use of the concept of pairwise mutual information along with adding a sparsity assumption on the parameters of the model that is used in BSS.
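
As background for the consensus formulation mentioned in scheme 2, the sketch below shows the simplest consensus problem: nodes on an arbitrary connected graph agree on the average of their local values by exchanging data only with neighbors. It is not one of the thesis's BSS algorithms. The Metropolis weighting, the function name `consensus_average`, and the sample graph are assumptions chosen for illustration.

```python
from typing import Dict, List


def consensus_average(
    values: Dict[int, float], neighbors: Dict[int, List[int]], rounds: int = 200
) -> Dict[int, float]:
    """Run synchronous neighbor-only averaging with Metropolis weights."""
    current = dict(values)
    for _ in range(rounds):
        nxt = {}
        for i, x in current.items():
            update = x
            for j in neighbors[i]:
                # Metropolis weight: depends only on the two nodes' degrees.
                w = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
                update += w * (current[j] - x)
            nxt[i] = update
        current = nxt
    return current


# Path graph 0 - 1 - 2 - 3: not fully connected, yet consensus is reached.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
readings = {0: 1.0, 1: 2.0, 2: 3.0, 3: 6.0}
estimates = consensus_average(readings, graph)
```

Every node ends up close to the global mean of 3.0 even though no node ever sees more than its immediate neighbors, which is the property that makes consensus formulations attractive on sparsely connected sensor networks.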



