Understanding the Memory Performance of Data-Mining Workloads on Small, Medium, and Large-Scale CMPs Using Hardware-Software Co-simulation

We present a high-throughput computational study to identify novel polyimides (PIs) with exceptional refractive index (RI) values for use as optical or optoelectronic materials. Our study applies an RI prediction protocol, developed in previous work, that combines first-principles calculations and data modeling, and we employ it on a large-scale library of PI candidates generated with the ChemLG code. We use the virtual screening software ChemHTPS to automate the assessment of this extensive pool of PI structures and determine the performance potential of each candidate. This rapid and efficient approach yields a number of highly promising lead compounds. Using the data mining and machine learning package ChemML, we analyze the top candidates with respect to the prevalent structural features and feature combinations that distinguish them from less promising ones. In particular, we explore the utility of various strategies that introduce highly polarizable moieties into the PI backbone to increase its RI. The derived insights provide a foundation for rational, targeted design that goes beyond traditional trial-and-error searches.
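The final analysis step, comparing the top candidates with the rest of the library in terms of their structural features, can be illustrated with a simple frequency comparison. This is a minimal sketch under stated assumptions, not ChemML's API: each candidate is assumed to be a pair of a predicted RI and a set of hypothetical structural-feature labels, and "top" is taken to mean the highest-ranked fraction `top_fraction` of the library.

```python
from collections import Counter


def feature_enrichment(candidates, top_fraction=0.1):
    """Compare how often each structural feature occurs among the top-ranked
    candidates versus the remainder of the library.

    candidates: list of (predicted_ri, set_of_feature_labels) tuples
                (hypothetical representation, for illustration only).
    Returns {feature: (frequency_in_top, frequency_in_rest)}, ordered so that
    the features most over-represented in the top group come first.
    """
    # Rank by predicted RI, highest first, and split off the top fraction.
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    top, rest = ranked[:n_top], ranked[n_top:]

    def frequencies(group):
        # Fraction of candidates in the group that carry each feature.
        counts = Counter(f for _, feats in group for f in feats)
        size = max(1, len(group))
        return {f: c / size for f, c in counts.items()}

    top_freq, rest_freq = frequencies(top), frequencies(rest)
    features = set(top_freq) | set(rest_freq)
    result = {f: (top_freq.get(f, 0.0), rest_freq.get(f, 0.0)) for f in features}
    # Sort by how much more common a feature is in the top group than elsewhere.
    return dict(sorted(result.items(), key=lambda kv: kv[1][0] - kv[1][1], reverse=True))
```

A plain frequency difference is only a first filter; feature combinations, as mentioned in the abstract, would need pairs or larger sets of labels to be counted in the same way.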

Multi-GPU approach to global induction of classification trees for large-scale data mining

Applied Intelligence ◽

10.1007/s10489-020-01952-5 ◽

2021 ◽

Author(s):

Krzysztof Jurczuk ◽

Marcin Czajkowski ◽

Marek Kretowski

Keyword(s):

Data Mining ◽

Large Scale ◽

Real Life ◽

Population Based ◽

Tree Structure ◽

Global Approach ◽

Data Parallel ◽

Large Scale Data ◽

The Impact ◽

Scale Data

This paper concerns the evolutionary induction of decision trees (DTs) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and the tests simultaneously and thus, in many situations, improves both the prediction quality and the size of the resulting classifiers. However, as a population-based, iterative approach, it can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient utilization of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. As the number of GPUs grows, nearly linear scalability is observed, which suggests that the data size boundaries for evolutionary DT mining are fading.
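The abstract delegates fitness calculation to GPUs through data-parallel decomposition: every device scores the same candidate tree on its own slice of the training set, and the CPU combines the partial results. The sketch below illustrates only that decomposition, not the authors' CUDA implementation. It assumes a toy tuple encoding of trees, uses Python threads as stand-ins for GPUs (in CPython they give no real speedup), and uses an illustrative fitness of error rate plus a hypothetical per-leaf penalty `size_penalty`; the paper's actual fitness function is not given in the abstract.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tree encoding: a leaf is ("leaf", label);
# an internal node is ("split", feature_index, threshold, left, right).


def predict(tree, x):
    """Route one instance from the root to a leaf and return the leaf's label."""
    node = tree
    while node[0] == "split":
        _, feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node[1]


def count_errors(tree, shard):
    """Work assigned to one device: misclassified instances in its data shard."""
    return sum(1 for x, y in shard if predict(tree, x) != y)


def count_leaves(tree):
    if tree[0] == "leaf":
        return 1
    return count_leaves(tree[3]) + count_leaves(tree[4])


def fitness(tree, data, n_workers=4, size_penalty=0.01):
    """Lower is better: training error rate plus an (assumed) penalty per leaf."""
    if not data:
        raise ValueError("data must be non-empty")
    # Data-parallel decomposition: contiguous shards, one per worker ("GPU").
    shard_size = -(-len(data) // n_workers)  # ceiling division
    shards = [data[i:i + shard_size] for i in range(0, len(data), shard_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker evaluates the same candidate tree on its own shard.
        errors = sum(pool.map(lambda shard: count_errors(tree, shard), shards))
    # Host-side reduction of the partial counts into a single fitness value.
    return errors / len(data) + size_penalty * count_leaves(tree)
```

Only small integer error counts travel back from each worker, so the reduction on the host stays cheap regardless of how many instances each shard holds.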

The use of cultural algorithms with evolutionary programming to control the data mining of large-scale spatio-temporal databases

1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation ◽

10.1109/icsmc.1997.637338 ◽

2002 ◽

Cited By ~ 1

Author(s):

R. Reynolds ◽

H. Al-Shehri

Keyword(s):

Data Mining ◽

Large Scale ◽

Evolutionary Programming ◽

Temporal Databases ◽

Cultural Algorithms ◽

Spatio Temporal

Classification and metaclassification in large scale data mining application for estimation of software projects

2010 IEEE 9th International Conference on Cybernetic Intelligent Systems ◽

10.1109/ukricis.2010.5898136 ◽

2010 ◽

Cited By ~ 1

Author(s):

Dorota Dzega ◽

Wieslaw Pietruszkiewicz

Keyword(s):

Data Mining ◽

Large Scale ◽

Software Projects ◽

Large Scale Data ◽

Data Mining Application ◽

Scale Data

Large-Scale Data Mining to Optimize Patient-Centered Scheduling at Health Centers

Journal of Healthcare Informatics Research ◽

10.1007/s41666-018-0030-0 ◽

2018 ◽

Vol 3 (1) ◽

pp. 1-18

Author(s):

Kislaya Kunjan ◽

Huanmei Wu ◽

Tammy R. Toscos ◽

Bradley N. Doebbeling

Keyword(s):

Data Mining ◽

Large Scale ◽

Health Centers ◽

Patient Centered ◽

Large Scale Data ◽

Scale Data

A Review of Machine Learning and Data Mining Approaches for Business Applications in Social Networks

International Journal of E-Business Research ◽

10.4018/jebr.2013010103 ◽

2013 ◽

Vol 9 (1) ◽

pp. 36-53

Author(s):

Evis Trandafili ◽

Marenglen Biba

Keyword(s):

Machine Learning ◽

Data Mining ◽

Social Networks ◽

Large Scale ◽

Viral Marketing ◽

Social Network Mining ◽

Business Applications ◽

Mining Community ◽

Recent Developments ◽

Analysis Of Social Networks

Social networks have outstanding marketing value, and developing data mining methods for viral marketing is a hot topic in the research community. However, most social networks cannot be fully analyzed and understood because of their prohibitive size and the inability of traditional machine learning and data mining approaches to handle the new dimension that the large-scale environment in which the data are produced adds to the learning process. On the one hand, the birth and evolution of such networks have posed outstanding challenges for the learning and mining community; on the other, they have opened the possibility of very powerful business applications. Yet little is understood about these business applications and the potential of social network mining to boost marketing. This paper presents a review of the most important state-of-the-art approaches in the machine learning and data mining community for the analysis of social networks and their business applications. The authors review the problems related to social networks and describe recent developments in the area, discussing important achievements in the analysis of social networks and outlining future work. The review focuses not only on the technical aspects of the learning and mining approaches applied to social networks but also on the business potential of such methods.

Construction of a century solar chromosphere data set for solar activity related research

Solar-Terrestrial Physics ◽

10.12737/stp-3220171 ◽

2017 ◽

Vol 3 (2) ◽

pp. 5-8

Author(s):

Lin Ganghua ◽

Wang Xiao Fan ◽

Yang Xiao ◽

...

Keyword(s):

Data Mining ◽

Solar Activity ◽

Space Weather ◽

Large Scale ◽

Abnormal Behavior ◽

Solar Chromosphere ◽

Solar Cycles ◽

Data Set ◽

Related Research ◽

Data Mining Algorithms

This article introduces our ongoing project “Construction of a Century Solar Chromosphere Data Set for Solar Activity Related Research”. Solar activity is the major source of space weather, which affects human life. Serious consequences of space weather include, for instance, the interruption of space communication and navigation, threats to the safety of astronauts and satellites, and damage to power grids. Therefore, solar activity research has both scientific and social impact. The main database is built from digitized and standardized film data obtained by several observatories around the world and covers a time span of more than 100 years. After careful calibration, we will develop feature extraction and data mining tools and provide them, together with the comprehensive database, to the astronomical community. Our final goal is to address several physical issues: filament behavior in solar cycles, the abnormal behavior of solar cycle 24, large-scale solar eruptions, and sympathetic remote brightenings. Significant progress is expected in data mining algorithms and software development, which will benefit scientific analysis and ultimately advance our understanding of solar cycles.

Demand Supply Oriented Taxi Suggestion System for Vehicular Social Networks with Fuel Charging Mechanism

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit19515 ◽

2019 ◽

pp. 38-44

Author(s):

Selvi C ◽

Keerthana D

Keyword(s):

Data Mining ◽

Real Time ◽

Large Scale ◽

Recommendation System ◽

Important Research ◽

Taxi Drivers ◽

Charging Station ◽

GPS Trajectories ◽

Charging Mechanism ◽

Supply Level

Data mining based on large-scale taxi traces is an important research topic. A vital direction in analyzing taxi GPS datasets is suggesting cruising areas to taxi drivers. The project first investigates the real-time demand-supply level for taxis and then makes an adaptive tradeoff between the utilities of drivers and passengers for different hotspots. It constructs a recommendation system that jointly considers the profits of both drivers and passengers. Finally, the qualified candidates are suggested to drivers based on this analysis. The project also provides a real-time charging station recommendation system for electric-vehicle (EV) taxis via large-scale GPS data mining. By combining each EV taxi’s historical recharging actions with its real-time GPS trajectory, the system predicts the current operational state of each taxi. Based on this information, when an EV taxi requests a recommendation, the system recommends the charging station that minimizes the total time before its recharging starts.
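The charging-station objective, minimizing the total time before recharging starts, can be sketched as travel time plus expected queueing time at each station. The `Station` fields and the queueing estimate below are hypothetical simplifications, not the project's model: the predicted travel time and queue length are assumed to have been derived already from the GPS and historical recharging data, and every charging session is assumed to take the same average time.

```python
from dataclasses import dataclass


@dataclass
class Station:
    # All fields are hypothetical inputs, assumed to be predicted upstream.
    name: str
    travel_minutes: float   # predicted driving time from the taxi's current position
    chargers: int           # number of charging points at the station
    queue: int              # taxis charging or waiting, incl. those predicted to arrive first
    charge_minutes: float   # average duration of one charging session


def expected_wait(s: Station) -> float:
    """Rough queueing estimate (assumption): zero if a charger is free; otherwise
    the number of full charging rounds ahead of the taxi times the session length."""
    if s.queue < s.chargers:
        return 0.0
    rounds_ahead = (s.queue - s.chargers) // s.chargers + 1
    return rounds_ahead * s.charge_minutes


def recommend(stations: list[Station]) -> Station:
    """Pick the station minimizing travel time plus expected wait before charging."""
    return min(stations, key=lambda s: s.travel_minutes + expected_wait(s))
```

For example, with station A 5 minutes away (2 chargers, 4 taxis queued, 30-minute sessions), B 15 minutes away (4 chargers, 1 queued, 30-minute sessions), and C 8 minutes away (1 charger, 1 queued, 20-minute sessions), the totals are 65, 15, and 28 minutes, so `recommend` picks B: the nearest station is not the best choice once its queue is taken into account.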
