Parallel Computation of Rough Set Approximations in Information Systems with Missing Decision Data

The paper discusses the use of parallel computation to obtain rough set approximations from large-scale information systems in which data are missing in both condition and decision attributes. To date, many studies have focused on missing condition data, but very few have accounted for missing decision data, especially in growing datasets. One approach for dealing with missing data in condition attributes is twofold rough approximations. The paper extends this approach to deal with missing data in the decision attribute. In addition, computing twofold rough approximations is computationally intensive, so the approach is not suitable when input datasets are large. We propose parallel algorithms to compute twofold rough approximations in large-scale datasets. Our method is based on MapReduce, a distributed programming model for processing large-scale data. We first introduce the original sequential algorithm and then its parallel version. An experimental comparison of the two approaches shows that the proposed parallel algorithms are suitable for, and perform efficiently on, large-scale datasets that have missing data in condition and decision attributes.
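As a rough illustration of how rough set approximations fit a map/reduce pattern, the sketch below computes the classical (Pawlak) lower and upper approximations of one decision class on complete data. It is not the paper's twofold approximation and it does not handle missing values; the function names, the toy table, and the in-memory `shuffle` step that stands in for a MapReduce framework are all assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(rows, condition_attrs, decision_attr):
    """Emit (condition-value tuple, (object id, decision)) for every object."""
    for obj_id, row in rows.items():
        key = tuple(row[a] for a in condition_attrs)
        yield key, (obj_id, row[decision_attr])


def shuffle(pairs):
    """Group values by key, as a MapReduce framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, target):
    """Build the lower and upper approximations of the decision class `target`."""
    lower, upper = set(), set()
    for members in groups.values():
        ids = {obj_id for obj_id, _ in members}
        decisions = {decision for _, decision in members}
        if target in decisions:
            upper |= ids          # the class overlaps the target concept
            if decisions == {target}:
                lower |= ids      # the class lies entirely inside the target
    return lower, upper


# Hypothetical toy decision table: condition attributes "a", "b"; decision "d".
rows = {
    1: {"a": 0, "b": 1, "d": "yes"},
    2: {"a": 0, "b": 1, "d": "no"},
    3: {"a": 1, "b": 0, "d": "yes"},
    4: {"a": 1, "b": 1, "d": "no"},
}
lower, upper = reduce_phase(shuffle(map_phase(rows, ["a", "b"], "d")), "yes")
print(lower, upper)
```

Objects 1 and 2 are indiscernible on the condition attributes but disagree on the decision, so they appear only in the upper approximation. Each equivalence class is handled independently in the reduce step, which is what makes this kind of computation a natural fit for distribution.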

Optimization of large-scale water transfer networks: Conic integer programming model and distributed parallel algorithms

AIChE Journal ◽ 10.1002/aic.15505 ◽ 2016 ◽ Vol 63 (5) ◽ pp. 1566-1581 ◽ Cited By ~ 2

Author(s): Li-Juan Li ◽ Rui-Jie Zhou

Keyword(s): Integer Programming ◽ Parallel Algorithms ◽ Large Scale ◽ Programming Model ◽ Water Transfer ◽ Integer Programming Model

A Parallel Apriori-Transaction Reduction Algorithm Using Hadoop-Mapreduce in Cloud

Asian Journal of Research in Computer Science ◽ 10.9734/ajrcos/2018/v1i124719 ◽ 2018 ◽ pp. 1-24

Author(s): A. L. Sayeth Saabith ◽ Elankovan Sundararajan ◽ Azuraliza Abu Bakar

Keyword(s): Parallel Algorithms ◽ Execution Time ◽ Large Scale ◽ Programming Model ◽ Main Memory ◽ Frequent Itemset ◽ Massive Data ◽ Apriori Algorithm ◽ Hadoop Mapreduce ◽ High Level

The Apriori algorithm is a classical association rule mining (ARM) algorithm that is widely used for generating frequent itemsets. However, the original Apriori algorithm has limitations: it must scan the dataset many times to discover all frequent itemsets, and it generates a huge number of candidate itemsets. To overcome these limitations, researchers have proposed many improvements to Apriori, such as candidate generation, approaches without candidate generation, transaction reduction, partitioning, and sampling. When mining massive data, however, these algorithms fail to remain efficient because of limits on processing capacity, storage capacity, and main memory. Therefore, parallel and distributed algorithms have been developed to perform large-scale ARM computation on multiple processors. However, most parallel and distributed frameworks suffer from the overhead of managing a distributed system, the lack of a high-level parallel programming language, and node failures. Hadoop-MapReduce is an efficient, scalable, and simplified programming model for massive data processing, and it is also available in cloud environments. Cloud computing offers huge computing resources and capacity to solve big data challenges. Recently, many parallel algorithms have been proposed on Hadoop-MapReduce to enhance the performance of the Apriori algorithm, but they share a drawback: because multiple scans over the dataset are needed to generate candidate itemsets, they consume more execution time. The aim of this study is to propose a parallel Transaction Reduction MapReduce Apriori algorithm (TRMR-Apriori), which removes unnecessary transaction values and transactions from the dataset in parallel to overcome these problems.
The experiments show that TRMR-Apriori achieves better execution time for discovering frequent itemsets than previous sequential ARM algorithms such as Apriori, AprioriTid, Eclat, and FP-Growth, and than previous parallel algorithms such as PApriori, MRApriori, and Modified Apriori, under different conditions in a homogeneous computing environment using the Hadoop-MapReduce platform in the cloud. Overall, TRMR-Apriori shows the strength to extract frequent itemsets from massive datasets in the cloud.
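The transaction reduction idea mentioned in the abstract can be sketched sequentially as follows. This is not the TRMR-Apriori algorithm itself, which runs on Hadoop-MapReduce; it is a minimal single-process illustration under two assumptions: support is an absolute count, and after each pass we drop items that appear in no frequent k-itemset and drop transactions too short to hold a (k+1)-itemset. The function name and the toy transactions are hypothetical.

```python
from itertools import combinations


def apriori_with_reduction(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:
        # Transaction reduction: keep only items that occur in some frequent
        # k-itemset, then drop transactions that cannot hold a (k+1)-itemset.
        live_items = frozenset().union(*frequent)
        transactions = [t & live_items for t in transactions]
        transactions = [t for t in transactions if len(t) > k]
        k += 1

        # Candidate generation: join frequent (k-1)-itemsets and prune any
        # candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count candidates over the reduced transaction list only.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
    return result


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori_with_reduction(baskets, min_support=3)
print(len(result))
```

The reduction is safe because every item of a frequent (k+1)-itemset must belong to some frequent k-itemset, and a transaction with at most k remaining items cannot contain any (k+1)-itemset. Later passes therefore scan less data without changing any support count.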

A Bridge Too Far?

10.1093/oso/9780198733249.003.0006 ◽ 2018

Author(s): Charlotte P. Lee ◽ Kjeld Schmidt

Keyword(s): Information Systems ◽ Science And Technology Studies ◽ Large Scale ◽ Cooperative Work ◽ Computer Supported Cooperative Work ◽ Practical Action ◽ Systems Research ◽ Information Infrastructures ◽ Information Systems Research ◽ Critical Investigation

The study of computing infrastructures has grown significantly due to the rapid proliferation and ubiquity of large-scale IT-based installations. At the same time, recognition has also grown of the usefulness of such studies as a means for understanding computing infrastructures as material complements of practical action. Subsequently the concept of “infrastructure” (or “information infrastructures,” “cyberinfrastructures,” and “infrastructuring”) has gained increasing importance in the area of Computer-Supported Cooperative Work (CSCW) as well as in neighboring areas such as Information Systems research (IS) and Science and Technology Studies (STS). However, as such studies have unfolded, the very concept of “infrastructure” is being applied in different discourses, for different purposes, in myriad different senses. Consequently, the concept of “infrastructure” has become increasingly muddled and needs clarification. The chapter presents a critical investigation of the vicissitudes of the concept of “infrastructure” over the last 35 years.

Multiple Granulation Rough Set Approach to Interval-Valued Intuitionistic Fuzzy Ordered Information Systems

Symmetry ◽ 10.3390/sym13060949 ◽ 2021 ◽ Vol 13 (6) ◽ pp. 949

Author(s): Zhen Li ◽ Xiaoyan Zhang

Keyword(s): Information Theory ◽ Pattern Recognition ◽ Information System ◽ Information Systems ◽ Rough Set ◽ Fuzzy Set ◽ Equivalence Relation ◽ Target Concept ◽ Actual Case ◽ Interval Valued

As a further extension of the fuzzy set and the intuitionistic fuzzy set, the interval-valued intuitionistic fuzzy set (IIFS) is a more effective tool for dealing with uncertain problems. However, the classical rough set is based on the equivalence relation, which does not apply to the IIFS. In this paper, we combine the IIFS with the ordered information system to obtain the interval-valued intuitionistic fuzzy ordered information system (IIFOIS). On this basis, three types of multiple granulation rough set models based on the dominance relation are established to overcome this limitation; the work belongs to the interdisciplinary area of information theory in mathematics and pattern recognition. First, for an IIFOIS, we put forward a multiple granulation rough set (MGRS) model from two completely symmetric positions, optimistic and pessimistic. We then discuss the approximation representation and a few essential characteristics for the target concept, as well as several significant rough measures for the two symmetric MGRS models. Furthermore, a more general MGRS model named the generalized MGRS (GMGRS) model is proposed in an IIFOIS, and some important properties and rough measures are also investigated. Finally, the relationships and differences between the single granulation rough set and the three types of MGRS are discussed carefully by comparing their rough measures in an IIFOIS. To better apply the theory to realistic problems, an actual case showing how the MGRS models are used in an IIFOIS is given in this paper.
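To make the optimistic and pessimistic positions concrete, the sketch below computes multi-granulation lower and upper approximations in the simplest crisp setting, where each granulation is an equivalence relation induced by a set of attributes. This is a simplification and an assumption on our part: the paper works with dominance relations on interval-valued intuitionistic fuzzy data, which this example does not model. The function names and the toy table are hypothetical.

```python
def block_of(x, universe, table, attrs):
    """Equivalence class of x: objects that agree with x on every attribute in attrs."""
    return {y for y in universe if all(table[y][a] == table[x][a] for a in attrs)}


def mgrs_lower(universe, table, granulations, target, mode="optimistic"):
    """Optimistic: x's class fits inside target under at least one granulation.
    Pessimistic: x's class fits inside target under every granulation."""
    quantifier = any if mode == "optimistic" else all
    return {
        x
        for x in universe
        if quantifier(block_of(x, universe, table, g) <= target for g in granulations)
    }


def mgrs_upper(universe, table, granulations, target, mode="optimistic"):
    """Upper approximation by duality: complement of the lower approximation
    of the complement of the target."""
    complement = universe - target
    return universe - mgrs_lower(universe, table, granulations, complement, mode)


# Hypothetical table with two single-attribute granulations.
table = {
    1: {"a": 0, "b": 0},
    2: {"a": 0, "b": 1},
    3: {"a": 1, "b": 1},
    4: {"a": 1, "b": 0},
}
universe = set(table)
granulations = [["a"], ["b"]]
target = {1, 2, 3}

optimistic = mgrs_lower(universe, table, granulations, target, "optimistic")
pessimistic = mgrs_lower(universe, table, granulations, target, "pessimistic")
upper = mgrs_upper(universe, table, granulations, target, "optimistic")
print(optimistic, pessimistic, upper)
```

The only difference between the two modes is the quantifier over granulations, so the pessimistic lower approximation is always contained in the optimistic one.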

Unpacking the semantics of source and usage to perform semantic reconciliation in large-scale information systems

ACM SIGMOD Record ◽ 10.1145/309844.309878 ◽ 1999 ◽ Vol 28 (1) ◽ pp. 26-31 ◽ Cited By ~ 14

Author(s): Ken Smith ◽ Leo Obrst

Keyword(s): Information Systems ◽ Large Scale

Kernel weighted least square approach for imputing missing values of metabolomics data

Scientific Reports ◽ 10.1038/s41598-021-90654-0 ◽ 2021 ◽ Vol 11 (1)

Author(s): Nishith Kumar ◽ Md. Aminul Hoque ◽ Masahiro Sugimoto

Keyword(s): Missing Data ◽ Large Scale ◽ Missing Values ◽ Kernel Weight ◽ Least Square ◽ Data Matrix ◽ Data Imputation ◽ Metabolomics Data ◽ Missing Value ◽ Missing Data Imputation

Mass spectrometry is a modern and sophisticated high-throughput analytical technique that enables large-scale metabolomic analyses. It yields a high-dimensional, large-scale matrix (samples × metabolites) of quantified data that often contains missing cells as well as outliers, which arise for several reasons, including technical and biological sources. Although several missing data imputation techniques are described in the literature, conventional techniques address only the missing value problem and do not handle outliers. Therefore, outliers in the dataset decrease the accuracy of the imputation. We developed a new kernel weight function-based missing data imputation technique that resolves the problems of both missing values and outliers. We evaluated the performance of the proposed method and of other conventional and recently developed missing data imputation techniques using both artificially generated data and experimentally measured data, in both the absence and presence of different rates of outliers. Performance on both artificial data and real metabolomics data indicates the superiority of the proposed kernel weight-based missing data imputation technique over the existing alternatives. For user convenience, an R package of the proposed kernel weight-based missing value imputation technique was developed, which is available at https://github.com/NishithPaul/tWLSA.

A robust-heuristic optimization approach to a green supply chain design with consideration of assorted vehicle types and carbon policies under uncertainty

Annals of Operations Research ◽ 10.1007/s10479-021-03985-6 ◽ 2021

Author(s): Zahra Homayouni ◽ Mir Saman Pishvaee ◽ Hamed Jahani ◽ Dmitry Ivanov

Keyword(s): Supply Chains ◽ Large Scale ◽ Programming Model ◽ Supply Chain Design ◽ Heuristic Optimization ◽ Optimization Approach ◽ Economic Uncertainty ◽ Cap And Trade ◽ Sustainable Logistics ◽ Goal Programming Model

Adoption of carbon regulation mechanisms facilitates an evolution toward green and sustainable supply chains, accompanied by increased complexity. Through the development and use of a multi-choice goal programming model solved by an improved algorithm, this article investigates sustainability strategies for carbon regulation mechanisms. We first propose a sustainable logistics model that considers assorted vehicle types and the gas emissions involved in product transportation. We then construct a bi-objective model that minimizes total cost as the first objective function and addresses environmental considerations in the second. With our novel robust-heuristic optimization approach, we seek to support decision-makers in comparing and selecting carbon emission policies for supply chains in complex settings with assorted vehicle types and demand and economic uncertainty. We deploy our model in a case study to evaluate and analyse two carbon reduction policies, namely carbon-tax and cap-and-trade policies. The results demonstrate that our robust-heuristic methodology can efficiently deal with demand and economic uncertainty, especially in large-scale problems. Our findings suggest that governmental incentives for a cap-and-trade policy would be more effective for supply chains in lowering pollution by investing in cleaner technologies and adopting greener practices.

Reliable Dissemination For Large-Scale Wide-Area Information Systems

10.1109/hpcs.1995.662014 ◽ 2005

Author(s): R. Yavatkar ◽ J. Griffioen

Keyword(s): Information Systems ◽ Large Scale ◽ Wide Area

Solution of the mixed integer large scale unit commitment problem by means of a continuous Stochastic linear programming model

Energy Systems ◽ 10.1007/s12667-013-0107-z ◽ 2013 ◽ Vol 5 (2) ◽ pp. 269-284 ◽ Cited By ~ 10

Author(s): D. Siface ◽ M. T. Vespucci ◽ A. Gelmini

Keyword(s): Linear Programming ◽ Large Scale ◽ Unit Commitment ◽ Programming Model ◽ Linear Programming Model ◽ Mixed Integer ◽ Stochastic Linear Programming ◽ Unit Commitment Problem ◽ Scale Unit ◽ Commitment Problem

Research on RS-WNN Integrated Modeling

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.105-107.2169 ◽ 2011 ◽ Vol 105-107 ◽ pp. 2169-2173

Author(s): Zong Chang Xu ◽ Xue Qin Tang ◽ Shu Feng Huang

Keyword(s): Neural Network ◽ Rough Set ◽ Attribute Reduction ◽ Wavelet Neural Network ◽ Integrated Modeling ◽ Generalization Ability ◽ Decision Attributes ◽ Optimum Decision ◽ Structure Complexity ◽ Modeling Algorithm

Wavelet Neural Network (WNN) integrated modeling based on Rough Set (RS) theory is studied. An integrated modeling algorithm named RS-WNN is proposed, which first applies a heuristic recursive attribute reduction algorithm to determine the optimum decision attributes and then conducts WNN modeling. The method eliminates redundant attributes more effectively and lowers the structural complexity of the WNN, which reduces training time and improves the generalization ability of the WNN. The experimental results show that the method is superior and efficient.
