Data mining-based hierarchical transaction model for multi-level consistency management in large-scale replicated databases

We present a high-throughput computational study to identify novel polyimides (PIs) with exceptional refractive index (RI) values for use as optic or optoelectronic materials. Our study utilizes an RI prediction protocol based on a combination of first-principles and data modeling developed in previous work, which we employ on a large-scale PI candidate library generated with the ChemLG code. We deploy the virtual screening software ChemHTPS to automate the assessment of this extensive pool of PI structures in order to determine the performance potential of each candidate. This rapid and efficient approach yields a number of highly promising lead compounds. Using the data mining and machine learning program package ChemML, we analyze the top candidates with respect to prevalent structural features and feature combinations that distinguish them from less promising ones. In particular, we explore the utility of various strategies that introduce highly polarizable moieties into the PI backbone to increase its RI yield. The derived insights provide a foundation for rational and targeted design that goes beyond traditional trial-and-error searches.

Multi-GPU approach to global induction of classification trees for large-scale data mining

Applied Intelligence ◽

10.1007/s10489-020-01952-5 ◽

2021 ◽

Author(s):

Krzysztof Jurczuk ◽

Marcin Czajkowski ◽

Marek Kretowski

Keyword(s):

Data Mining ◽

Large Scale ◽

Real Life ◽

Population Based ◽

Tree Structure ◽

Global Approach ◽

Data Parallel ◽

Large Scale Data ◽

The Impact ◽

Scale Data

Abstract: This paper concerns the evolutionary induction of decision trees (DT) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and tests simultaneously and thus improves the prediction and size of the resulting classifiers in many situations. However, it is a population-based, iterative approach that can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems for large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient utilization of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. When the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.
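
The data-parallel idea in the abstract above (one candidate tree evaluated against disjoint slices of the data, with partial results reduced afterwards) can be illustrated with a minimal CPU-only sketch. Everything here is an assumption for illustration: the dictionary tree layout, the function names, plain accuracy as the fitness measure, and list slices standing in for GPUs. It is not the authors' CUDA implementation.

```python
# Hypothetical sketch: list slices stand in for per-GPU data chunks.

def predict(tree, row):
    """Route one instance down the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


def partial_correct(tree, rows, labels):
    """Work done by one device: count correct predictions on its chunk."""
    return sum(1 for row, y in zip(rows, labels) if predict(tree, row) == y)


def split_chunks(n_items, n_devices):
    """Return (start, stop) ranges that divide the data almost evenly."""
    base, extra = divmod(n_items, n_devices)
    bounds, start = [], 0
    for d in range(n_devices):
        stop = start + base + (1 if d < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def fitness(tree, rows, labels, n_devices=4):
    """Accuracy of the tree: per-chunk counts followed by a sum reduction."""
    counts = [
        partial_correct(tree, rows[a:b], labels[a:b])
        for a, b in split_chunks(len(rows), n_devices)
    ]
    return sum(counts) / len(rows)


# A one-split tree: feature 0 <= 0.5 predicts class 0, otherwise class 1.
tree = {"feature": 0, "threshold": 0.5, "left": 0, "right": 1}
rows = [[0.1], [0.9], [0.4], [0.7], [0.2]]
labels = [0, 1, 1, 1, 0]
print(fitness(tree, rows, labels, n_devices=2))
```

Because each chunk only needs the tree and its own slice of the data, the per-chunk counts are independent, which is what makes this step a natural candidate for delegation to separate devices.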

The three paradoxes of the energy transition - Assessing sustainability of large-scale solar photovoltaic through multi-level and multi-scalar perspective in Rwanda

Journal of Cleaner Production ◽

10.1016/j.jclepro.2020.125519 ◽

2021 ◽

Vol 288 ◽

pp. 125519

Author(s):

Carole Brunet ◽

Oumarou Savadogo ◽

Pierre Baptiste ◽

Michel A. Bouchard ◽

Céline Cholez ◽

...

Keyword(s):

Large Scale ◽

Energy Transition ◽

Solar Photovoltaic ◽

Multi Level

A new multi-level algorithm for balanced partition problem on large scale directed graphs

Advances in Aerodynamics ◽

10.1186/s42774-021-00074-x ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Xianyue Li ◽

Yufei Pang ◽

Chenxia Zhao ◽

Yang Liu ◽

Qingzhen Dong

Keyword(s):

Large Scale ◽

Directed Graphs ◽

Vlsi Design ◽

Graph Partition ◽

Partition Problem ◽

Multi Level ◽

The Stability ◽

Recursive Partition ◽

Balanced Partition ◽

Partition Method

Abstract: Graph partition is a classical problem in combinatorial optimization and graph theory, with many applications such as scientific computing, VLSI design, and clustering. In this paper, we study the partition problem on large-scale directed graphs under a new objective function, which gives a new instance of the graph partition problem. We first model the problem, then design an algorithm based on a multi-level strategy and a recursive partition method, and finally carry out extensive simulation experiments. The experimental results verify the stability of our algorithm and show that it performs as well as METIS. In addition, our algorithm is better than METIS on the unbalanced ratio.
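
To make the terms in the abstract above concrete, here is a minimal sketch of recursive bisection together with two common quality measures, the directed edge cut and an imbalance ratio. This is only the recursive-partition ingredient: it has no coarsening or refinement, so it is not a multi-level algorithm. The abstract does not state the paper's objective function, so the measures below (largest part size divided by the ideal size n / k, and the number of cut edges) are assumptions, as are all function names.

```python
# Hypothetical sketch: naive recursive bisection plus two partition metrics.

def recursive_bisect(vertices, k, first_id=0):
    """Assign part ids by repeatedly splitting the vertex list (no refinement)."""
    if k == 1:
        return {v: first_id for v in vertices}
    left_k = k // 2
    cut = len(vertices) * left_k // k  # size proportional to the parts it holds
    part = recursive_bisect(vertices[:cut], left_k, first_id)
    part.update(recursive_bisect(vertices[cut:], k - left_k, first_id + left_k))
    return part


def edge_cut(edges, part):
    """Count directed edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def imbalance(part, k):
    """Largest part size divided by the ideal size n / k (1.0 is balanced)."""
    sizes = [0] * k
    for p in part.values():
        sizes[p] += 1
    return max(sizes) * k / len(part)


# A directed 6-cycle split into two parts of three vertices each.
vertices = list(range(6))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
part = recursive_bisect(vertices, 2)
print(part, edge_cut(edges, part), imbalance(part, 2))
```

A multi-level method would wrap a step like this between a coarsening phase that shrinks the graph and a refinement phase that improves the cut while the graph is expanded again.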

The use of cultural algorithms with evolutionary programming to control the data mining of large-scale spatio-temporal databases

1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation ◽

10.1109/icsmc.1997.637338 ◽

2002 ◽

Cited By ~ 1

Author(s):

R. Reynolds ◽

H. Al-Shehri

Keyword(s):

Data Mining ◽

Large Scale ◽

Evolutionary Programming ◽

Temporal Databases ◽

Cultural Algorithms ◽

Spatio Temporal

Large scale gene regulatory network inference with a multi-level strategy

Molecular BioSystems ◽

10.1039/c5mb00560d ◽

2016 ◽

Vol 12 (2) ◽

pp. 588-597 ◽

Cited By ~ 14

Author(s):

Jun Wu ◽

Xiaodong Zhao ◽

Zongli Lin ◽

Zhifeng Shao

Keyword(s):

Gene Regulatory Network ◽

Regulatory Network ◽

Large Scale ◽

Network Inference ◽

Biological Processes ◽

Molecular Processes ◽

Gene Regulatory Network Inference ◽

Cell Functions ◽

Multi Level ◽

Gene Regulatory

Transcriptional regulation is a basis of many crucial molecular processes and an accurate inference of the gene regulatory network is a helpful and essential task to understand cell functions and gain insights into biological processes of interest in systems biology.
