Parallel Computation of Rough Set Approximations in Information Systems with Missing Decision Data

Computers ◽  
2018 ◽  
Vol 7 (3) ◽  
pp. 44
Author(s):  
Thinh Cao ◽  
Koichi Yamada ◽  
Muneyuki Unehara ◽  
Izumi Suzuki ◽  
Do Nguyen

The paper discusses the use of parallel computation to obtain rough set approximations from large-scale information systems in which data are missing in both condition and decision attributes. To date, many studies have focused on missing condition data, but very few have accounted for missing decision data, especially in growing datasets. One approach for dealing with missing data in condition attributes is twofold rough approximations. The paper extends this approach to handle missing data in the decision attribute. Because computing twofold rough approximations is computationally intensive, the approach as it stands is not suitable for large input datasets. We therefore propose parallel algorithms to compute twofold rough approximations in large-scale datasets. Our method is based on MapReduce, a distributed programming model for processing large-scale data. We first introduce the original sequential algorithm and then the parallel version. An experimental comparison of the two approaches shows that the proposed parallel algorithms are suitable for, and perform efficiently on, large-scale datasets that have missing data in condition and decision attributes.
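
As a point of reference for the abstract above, the following minimal sketch computes classical (Pawlak) lower and upper approximations on a small, complete decision table. It is not the paper's twofold rough approximation and does not handle missing values; the function names, attribute names, and sample rows are all hypothetical. Grouping objects by their condition-attribute values plays roughly the role a map and shuffle phase would play in a MapReduce setting, and the per-class subset tests play the role of a reduce phase.

```python
from collections import defaultdict


def equivalence_classes(rows, condition_attrs):
    """Group object ids by their tuple of condition-attribute values."""
    classes = defaultdict(set)
    for obj_id, row in rows.items():
        key = tuple(row[a] for a in condition_attrs)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(rows, condition_attrs, decision_attr, decision_value):
    """Return (lower, upper) approximations of one decision class."""
    target = {o for o, r in rows.items() if r[decision_attr] == decision_value}
    lower, upper = set(), set()
    for block in equivalence_classes(rows, condition_attrs):
        if block <= target:   # class lies entirely inside the target set
            lower |= block
        if block & target:    # class overlaps the target set
            upper |= block
    return lower, upper


# Hypothetical toy decision table (complete data, no missing values).
rows = {
    1: {"temp": "high", "cough": "yes", "flu": "yes"},
    2: {"temp": "high", "cough": "yes", "flu": "no"},
    3: {"temp": "low", "cough": "no", "flu": "no"},
    4: {"temp": "high", "cough": "no", "flu": "yes"},
}
lower, upper = approximations(rows, ["temp", "cough"], "flu", "yes")
print(sorted(lower), sorted(upper))
```

Objects 1 and 2 share the same condition values but differ in their decision, so they fall in the upper approximation of `flu = yes` but not in the lower one.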

Author(s):  
A. L. Sayeth Saabith ◽  
Elankovan Sundararajan ◽  
Azuraliza Abu Bakar

The Apriori algorithm is a classical association rule mining (ARM) algorithm that is widely used for generating frequent itemsets. However, the original Apriori algorithm has limitations: it must scan the dataset many times to discover all frequent itemsets, and it generates a huge number of candidate itemsets. To overcome these limitations, researchers have proposed many improvements to Apriori, such as candidate generation, mining without candidate generation, transaction reduction, partitioning, and sampling. When mining massive data, however, these algorithms are not efficient because of limited processing capacity, storage capacity, and main memory. Therefore, parallel and distributed algorithms have been developed to perform large-scale ARM computation on multiple processors. The problems with most parallel and distributed frameworks, however, are the overhead of managing a distributed system, the lack of a high-level parallel programming language, and node failures. Hadoop-MapReduce is an efficient, scalable, and simplified programming model for massive data processing, and it is also available in cloud environments. Cloud computing offers huge computing resources and capacity to solve big data challenges. Recently, many parallel algorithms have been proposed on Hadoop-MapReduce to enhance the performance of the Apriori algorithm, but they share a drawback: because multiple scans over the dataset are needed to generate candidate itemsets, they consume more execution time. The aim of this study is to propose a parallel Transaction Reduction MapReduce Apriori algorithm (TRMR-Apriori), which removes unnecessary transaction values and transactions from the dataset in parallel to overcome these problems.
The experiments show that TRMR-Apriori achieves better execution time in discovering frequent itemsets than previous sequential ARM algorithms such as Apriori, AprioriTid, Eclat, and FP-Growth, and than previous parallel algorithms such as PApriori, MRApriori, and Modified Apriori, under different conditions in a homogeneous computing environment using the Hadoop-MapReduce platform in the cloud. Overall, TRMR-Apriori demonstrates the ability to extract frequent itemsets from massive datasets in the cloud.
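
The idea of transaction reduction mentioned in the abstract above can be illustrated with a small sequential sketch. This is not TRMR-Apriori itself and involves no Hadoop or MapReduce; the function name and sample data are hypothetical. After each level, it drops items that occur in no frequent k-itemset (they cannot occur in any frequent (k+1)-itemset) and drops transactions left with too few items to contain a (k+1)-itemset.

```python
from itertools import combinations


def apriori_with_reduction(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    frequent = {}

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}

    k = 1
    while current:
        frequent.update(current)

        # Transaction reduction: keep only items that still occur in some
        # frequent k-itemset, then drop transactions that are too short
        # to contain any (k+1)-itemset.
        live_items = frozenset().union(*current)
        transactions = [t & live_items for t in transactions]
        transactions = [t for t in transactions if len(t) > k]

        # Candidate generation: join frequent k-itemsets and prune any
        # candidate that has an infrequent k-subset.
        k += 1
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in current for s in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting over the reduced transaction list.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}

    return frequent


# Hypothetical toy dataset.
data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
result = apriori_with_reduction(data, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this toy run the item `d` is removed after the first level and the two-item transactions are removed after the second, so the final pass scans only two transactions.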


Author(s):  
Charlotte P. Lee ◽  
Kjeld Schmidt

The study of computing infrastructures has grown significantly due to the rapid proliferation and ubiquity of large-scale IT-based installations. At the same time, recognition has also grown of the usefulness of such studies as a means for understanding computing infrastructures as material complements of practical action. Subsequently the concept of “infrastructure” (or “information infrastructures,” “cyberinfrastructures,” and “infrastructuring”) has gained increasing importance in the area of Computer-Supported Cooperative Work (CSCW) as well as in neighboring areas such as Information Systems research (IS) and Science and Technology Studies (STS). However, as such studies have unfolded, the very concept of “infrastructure” is being applied in different discourses, for different purposes, in myriad different senses. Consequently, the concept of “infrastructure” has become increasingly muddled and needs clarification. The chapter presents a critical investigation of the vicissitudes of the concept of “infrastructure” over the last 35 years.


Symmetry ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 949
Author(s):  
Zhen Li ◽  
Xiaoyan Zhang

As a further extension of the fuzzy set and the intuitive fuzzy set, the interval-valued intuitive fuzzy set (IIFS) is a more effective tool for dealing with uncertain problems. However, the classical rough set is based on the equivalence relation, which does not apply to the IIFS. In this paper, we combine the IIFS with the ordered information system to obtain the interval-valued intuitive fuzzy ordered information system (IIFOIS). On this basis, three types of multiple granulation rough set models based on the dominance relation are established to overcome the limitation mentioned above; this work belongs to the interdisciplinary area of information theory in mathematics and pattern recognition. First, for an IIFOIS, we put forward a multiple granulation rough set (MGRS) model from two completely symmetric positions, optimistic and pessimistic. We then discuss the approximation representation and a few essential characteristics of the target concept, as well as several significant rough measures for the two kinds of symmetric MGRS models. Furthermore, a more general MGRS model, the generalized MGRS (GMGRS) model, is proposed in an IIFOIS, and some of its important properties and rough measures are also investigated. Finally, the relationships and differences between the single granulation rough set and the three types of MGRS are discussed by comparing their rough measures in an IIFOIS. To better apply the theory to realistic problems, an actual case illustrating the MGRS models in an IIFOIS is given.
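
To make the optimistic and pessimistic positions in the abstract above concrete, here is a minimal sketch of multigranulation lower and upper approximations on crisp data. It assumes the commonly used definitions (optimistic lower: the granule of x lies inside the target under at least one granulation; pessimistic lower: under every granulation; upper approximations obtained by duality) and uses plain partitions rather than the dominance relations on interval-valued data that the paper works with. All names and the toy universe are hypothetical.

```python
def from_partition(blocks):
    """Build a granulation: map each object to the block that contains it."""
    return {x: frozenset(b) for b in blocks for x in b}


def mgrs_lower(universe, granulations, target, mode):
    """Optimistic: some granule fits inside target. Pessimistic: all do."""
    quantifier = any if mode == "optimistic" else all
    return {
        x for x in universe
        if quantifier(g[x] <= target for g in granulations)
    }


def mgrs_upper(universe, granulations, target, mode):
    """Upper approximation by duality with the lower approximation."""
    complement = universe - target
    return universe - mgrs_lower(universe, granulations, complement, mode)


# Hypothetical toy universe with two granulations.
universe = {1, 2, 3, 4}
g1 = from_partition([{1, 2}, {3, 4}])
g2 = from_partition([{1}, {2, 3}, {4}])
target = {1, 2, 3}

opt_lower = mgrs_lower(universe, [g1, g2], target, "optimistic")
pes_lower = mgrs_lower(universe, [g1, g2], target, "pessimistic")
opt_upper = mgrs_upper(universe, [g1, g2], target, "optimistic")
pes_upper = mgrs_upper(universe, [g1, g2], target, "pessimistic")
print(sorted(pes_lower), sorted(opt_lower), sorted(opt_upper), sorted(pes_upper))
```

Because each granulation is just a mapping from an object to its granule, the same two functions would also accept dominance classes in place of partition blocks.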


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Nishith Kumar ◽  
Md. Aminul Hoque ◽  
Masahiro Sugimoto

Mass spectrometry is a modern and sophisticated high-throughput analytical technique that enables large-scale metabolomic analyses. It yields a high-dimensional, large-scale matrix (samples × metabolites) of quantified data that often contains missing cells as well as outliers that originate for several reasons, including technical and biological sources. Although several missing data imputation techniques are described in the literature, conventional techniques solve only the missing value problem; they do not address outliers. As a result, outliers in the dataset decrease the accuracy of the imputation. We developed a new kernel weight function-based missing data imputation technique that resolves the problems of both missing values and outliers. We evaluated the performance of the proposed method and of other conventional and recently developed imputation techniques using both artificially generated data and experimentally measured data, in both the absence and presence of different rates of outliers. Performance on both artificial data and real metabolomics data indicates the superiority of the proposed kernel weight-based technique over the existing alternatives. For user convenience, an R package implementing the proposed technique was developed and is available at https://github.com/NishithPaul/tWLSA.


Author(s):  
Zahra Homayouni ◽  
Mir Saman Pishvaee ◽  
Hamed Jahani ◽  
Dmitry Ivanov

Adoption of carbon regulation mechanisms facilitates an evolution toward green and sustainable supply chains, accompanied by increased complexity. Through the development and use of a multi-choice goal programming model solved by an improved algorithm, this article investigates sustainability strategies for carbon regulation mechanisms. We first propose a sustainable logistics model that considers assorted vehicle types and the gas emissions involved in product transportation. We then construct a bi-objective model that minimizes total cost as the first objective function and addresses environmental considerations in the second. With our novel robust-heuristic optimization approach, we seek to support decision-makers in comparing and selecting carbon emission policies for supply chains in complex settings with assorted vehicle types and with demand and economic uncertainty. We deploy our model in a case study to evaluate and analyse two carbon reduction policies, namely carbon-tax and cap-and-trade policies. The results demonstrate that our robust-heuristic methodology can efficiently deal with demand and economic uncertainty, especially in large-scale problems. Our findings suggest that governmental incentives for a cap-and-trade policy would be more effective for supply chains in lowering pollution by investing in cleaner technologies and adopting greener practices.


2011 ◽  
Vol 105-107 ◽  
pp. 2169-2173
Author(s):  
Zong Chang Xu ◽  
Xue Qin Tang ◽  
Shu Feng Huang

Wavelet Neural Network (WNN) integration modeling based on Rough Set (RS) is studied. An integration modeling algorithm named RS-WNN is proposed, which first applies a heuristic recursive attribute reduction algorithm to determine the optimum decision attributes and then conducts WNN modeling. This method eliminates redundant attributes more effectively and lowers the structural complexity of the WNN, which reduces training time and improves the generalization ability of the WNN. Experimental results show that the method is superior and efficient.

