parallel mining
Recently Published Documents


Total documents: 99 (five years: 20)
H-index: 11 (five years: 2)

2021 ◽  
Author(s):  
Jalal Khalil ◽  
Da Yan ◽  
Guimu Guo ◽  
Lyuheng Yuan

2021 ◽  
Author(s):  
Long Chen ◽  
Xiaoming Hu ◽  
Ge Wang ◽  
Dongpu Cao ◽  
Lingxi Li ◽  
...  

Author(s):  
Nesma Youssef ◽  
Hatem Abdulkader ◽  
Amira Abdelwahab

mSystems ◽  
2021 ◽  
Vol 6 (2) ◽  
Author(s):  
Zhongyou Li ◽  
Katja Koeppen ◽  
Victoria I. Holden ◽  
Samuel L. Neff ◽  
Liviu Cengher ◽  
...  

ABSTRACT The NCBI Gene Expression Omnibus (GEO) provides tools to query and download transcriptomic data. However, less than 4% of microbial experiments include the sample group annotations required to assess differential gene expression for high-throughput reanalysis, and data deposited after 2014 universally lack these annotations. Our algorithm GAUGE (general annotation using text/data group ensembles) automatically annotates GEO microbial data sets, including microarray and RNA sequencing studies, increasing the percentage of data sets amenable to analysis from 4% to 33%. Eighty-nine percent of GAUGE-annotated studies matched group assignments generated by human curators. To demonstrate how GAUGE annotation can lead to scientific insight, we created GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface to analyze 73 GAUGE-annotated P. aeruginosa studies, three times more than previously available. GAPE analysis revealed that PA3923, a gene of unknown function, was differentially expressed in more than 50% of studies and significantly coregulated with genes involved in biofilm formation. Follow-up wet-bench experiments demonstrate that PA3923 mutants are indeed defective in biofilm formation, consistent with predictions facilitated by GAUGE and GAPE. We anticipate that GAUGE and GAPE, which we have made freely available, will make publicly available microbial transcriptomic data easier to reuse and lead to new data-driven hypotheses.

IMPORTANCE GEO archives transcriptomic data from over 5,800 microbial experiments and allows researchers to answer questions not directly addressed in published papers. However, less than 4% of the microbial data sets include the sample group annotations required for high-throughput reanalysis. This limitation blocks a considerable amount of microbial transcriptomic data from being reused easily. Here, we demonstrate that the GAUGE algorithm could make 33% of microbial data accessible to parallel mining and reanalysis. GAUGE annotations increase statistical power and, thereby, make consistent patterns of differential gene expression easier to identify. In addition, we developed GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface that performs parallel analyses on P. aeruginosa and E. coli compendia. Source code for GAUGE and GAPE is freely available and can be repurposed to create compendia for other bacterial species.


Webology ◽  
2020 ◽  
Vol 17 (2) ◽  
pp. 31-43
Author(s):  
Vandna Dahiya ◽  
Sandeep Dalal

Utility Itemset Mining (UIM) is a fundamental technique for finding itemsets using interestingness measures in addition to their quantity. It helps find valuable items that frequent itemset mining cannot track. Many techniques mine itemsets based on their utilities, but the pressing need is to mine them from larger datasets. This paper presents a brief overview of approaches to utility mining that use parallel frameworks to speed up computation. The paper concludes with a discussion of challenges and opportunities in the field of parallel mining and provides a way forward for further development of the prevailing big data methodologies.
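As a rough illustration of the idea, and not any specific algorithm surveyed in the paper, the sketch below assumes the usual definition of utility (purchased quantity times unit profit) and a dataset already split into partitions. The names `partial_utilities`, `high_utility_itemsets`, `profit`, and `min_utility` are hypothetical. Each partition is scored independently, which is the step a parallel framework would distribute across workers; real algorithms also prune candidates instead of enumerating them by brute force.

```python
from collections import Counter
from itertools import combinations


def partial_utilities(transactions, profit, max_size=2):
    """Sum the utility of every itemset (up to max_size items) in one partition.

    Each transaction maps an item to its purchased quantity; the utility of an
    itemset in a transaction is the sum of quantity * unit profit over its items.
    """
    totals = Counter()
    for tx in transactions:
        items = sorted(tx)
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                totals[itemset] += sum(tx[i] * profit[i] for i in itemset)
    return totals


def high_utility_itemsets(partitions, profit, min_utility, max_size=2):
    """Merge per-partition totals and keep itemsets that reach min_utility."""
    merged = Counter()
    # The partitions are processed one after another here for clarity; each call
    # is independent, so it could run on its own worker.
    for part in partitions:
        merged.update(partial_utilities(part, profit, max_size))
    return {s: u for s, u in merged.items() if u >= min_utility}


# Hypothetical data: unit profits and two partitions of transactions.
profit = {"a": 5, "b": 1, "c": 3}
partitions = [
    [{"a": 1, "b": 2}, {"b": 4, "c": 1}],
    [{"a": 2, "c": 1}],
]
result = high_utility_itemsets(partitions, profit, min_utility=10)
```

With this toy data, item `b` appears in two of the three transactions but contributes little utility, while `a` appears equally often and carries most of the profit, which is the kind of distinction frequent itemset mining alone does not capture.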


2020 ◽  
Vol 12 (8) ◽  
pp. 125 ◽  
Author(s):  
Shihab Shahriar Hazari ◽  
Qusay H. Mahmoud

A blockchain is a distributed ledger forming a distributed consensus on a history of transactions, and is the underlying technology for the Bitcoin cryptocurrency. Its applications extend far beyond the financial sector. The transaction verification process for cryptocurrencies is much slower than that of traditional digital transaction systems. One approach to improving scalability, or the speed at which transactions are processed, is to design a solution that offers faster Proof of Work. In this paper, we propose a method for accelerating Proof of Work based on parallel mining rather than solo mining. The goal is to ensure that no two or more miners put the same effort into solving a specific block. The proposed method includes a process for selecting a manager, distributing work, and a reward system. This method has been implemented in a test environment that contains all the characteristics needed to perform Proof of Work for Bitcoin and has been tested, using a variety of case scenarios, by varying the difficulty level and number of validators. Experimental evaluations were performed locally and in a cloud environment, and experimental results demonstrate the feasibility of the proposed method.
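One way to picture the work-distribution step is a manager that splits the nonce space into disjoint ranges so that no two miners test the same nonce. The sketch below is a simplified illustration under stated assumptions, not the authors' implementation: it uses a single SHA-256 over a string header with a leading-zero-hex-digit difficulty rather than Bitcoin's actual block format and target, and the function names (`partition_nonces`, `search_range`, `mine`) are hypothetical. Manager selection and the reward system are omitted.

```python
import hashlib


def partition_nonces(total, miners):
    """Split the nonce space [0, total) into disjoint contiguous ranges, one per miner."""
    step, extra = divmod(total, miners)
    ranges, start = [], 0
    for i in range(miners):
        end = start + step + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def search_range(header, nonce_range, difficulty):
    """Return the first nonce in the range whose digest has `difficulty` leading hex zeros."""
    prefix = "0" * difficulty
    for nonce in range(*nonce_range):
        digest = hashlib.sha256(f"{header}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce
    return None


def mine(header, miners=4, total=1_000_000, difficulty=4):
    """Manager role: hand each miner its own range and collect any solutions found."""
    ranges = partition_nonces(total, miners)
    # The ranges are searched one after another here for clarity; because they
    # do not overlap, each search could be dispatched to a separate miner.
    results = [search_range(header, r, difficulty) for r in ranges]
    found = [n for n in results if n is not None]
    return min(found) if found else None
```

Because the ranges are disjoint and together cover the whole space, the total number of hashes computed across miners never exceeds what a single exhaustive search would need.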


2020 ◽  
pp. 106-117
Author(s):  
Ahmed Sultan Alhegami ◽  
Hussein Alkhader Alsaeedi

Association rule mining plays a very important role in the distributed environment for Big Data analysis. The massive volume of data creates an urgent need for novel, parallel, and incremental association rule mining algorithms that can handle Big Data. In this paper, a framework is proposed for an incremental, parallel, interesting association rule mining algorithm for Big Data. The framework incorporates interestingness measures during the mining process. It processes incremental data, which usually arrives at different times, and explores the user's important knowledge by processing only the new data, without having to start again from scratch. One of the main features of this framework is that it considers the user's domain knowledge, which increases monotonically. The model that incorporates the users' beliefs during the extraction of patterns is attractive, effective and efficient. The proposed framework is implemented on public datasets and evaluated based on the interesting results that are found.
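The incremental part of this idea can be sketched as follows: keep running support counts and update them from each new batch only, so earlier data is never rescanned. This is a minimal illustration restricted to single items and pairs, with a hypothetical class name (`IncrementalPairCounter`) and hypothetical thresholds; it does not model the user-belief or interestingness components described in the abstract, and it is not the authors' framework.

```python
from collections import Counter
from itertools import combinations


class IncrementalPairCounter:
    """Keeps support counts for single items and item pairs across batches."""

    def __init__(self):
        self.n = 0               # number of transactions seen so far
        self.counts = Counter()  # itemset (as a sorted tuple) -> support count

    def add_batch(self, transactions):
        """Update the counts from new transactions only; old data is not revisited."""
        for tx in transactions:
            items = sorted(set(tx))
            self.n += 1
            for k in (1, 2):
                self.counts.update(combinations(items, k))

    def rules(self, min_support, min_confidence):
        """Return (antecedent, consequent, support, confidence) for qualifying pairs."""
        out = []
        for itemset, count in self.counts.items():
            if len(itemset) != 2 or count / self.n < min_support:
                continue
            a, b = itemset
            for x, y in ((a, b), (b, a)):
                confidence = count / self.counts[(x,)]
                if confidence >= min_confidence:
                    out.append((x, y, count / self.n, confidence))
        return sorted(out)


# Hypothetical usage: two batches arriving at different times.
miner = IncrementalPairCounter()
miner.add_batch([["bread", "milk"], ["bread", "butter"]])
miner.add_batch([["bread", "milk"], ["milk"]])
found = miner.rules(min_support=0.5, min_confidence=0.6)
```

Keeping every count, rather than discarding infrequent itemsets early, is what lets the totals stay exact after each batch; a production system would need to bound that memory.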

