Tools for Aggregating, Analyzing and Mining Combinatorial Data

2009, Vol 1159
Author(s): Wesley Jones, Changwon Suh, Peter A Graf, Daniel Korytina, Craig Swank, ...

Abstract: We demonstrate how data mining techniques can be applied to complex combinatorial data sets and how data from multiple sources can be aggregated via the scientific data management system we developed. An example is shown for aggregated combinatorial data used to study the composition, processing, structure, and property relationships of transparent conducting oxides, analyzed with data mining techniques such as principal component analysis. Data mappings of the mined results are shown to effectively enable visualization of data trends, identification of anomalies in Fourier transform infrared spectroscopy patterns, and identification of scientifically interesting libraries and spectral regions.
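
The abstract names principal component analysis but does not say how anomalous FTIR patterns were flagged. A minimal sketch, assuming one common rule (rank each spectrum by its reconstruction error after projection onto the leading principal components), could look like this in Python with NumPy. The synthetic spectra, band positions, noise level, injected anomaly and the choice of two components are all hypothetical.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the column mean and the leading principal axes of X (rows = spectra)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def reconstruction_error(X, mean, components):
    """Squared residual of each spectrum outside the retained principal subspace."""
    centred = X - mean
    scores = centred @ components.T        # coordinates on the principal axes
    reconstructed = scores @ components    # back-projection to the spectral axis
    return np.sum((centred - reconstructed) ** 2, axis=1)


def synthetic_spectra(n_samples=30, n_points=200, seed=0):
    """Hypothetical FTIR-like spectra: mixtures of two Gaussian bands plus noise."""
    rng = np.random.default_rng(seed)
    axis = np.linspace(0.0, 1.0, n_points)

    def band(centre):
        return np.exp(-((axis - centre) / 0.03) ** 2)

    weights = rng.uniform(0.5, 1.5, size=(n_samples, 2))
    spectra = weights[:, :1] * band(0.3) + weights[:, 1:] * band(0.7)
    spectra += rng.normal(0.0, 0.01, size=spectra.shape)
    spectra[7] += 0.8 * band(0.5)          # one spectrum with an unexpected band
    return spectra


spectra = synthetic_spectra()
mean, components = pca_fit(spectra, n_components=2)
errors = reconstruction_error(spectra, mean, components)
print("most anomalous spectrum:", int(np.argmax(errors)))
```

With two underlying bands, two components describe the normal spectra almost completely, so the spectrum carrying the injected extra band should have by far the largest residual. On real spectra the number of components would have to be chosen, for example from the explained-variance curve.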

2012, Vol 1425
Author(s): Changwon Suh, Kristin Munch, David Biagioni, Stephen Glynn, John Scharf, ...

Abstract: We discuss our current research focus on photovoltaic (PV) informatics, which is dedicated to enhancing the functionality of solar materials through data management and data mining-aided integrated computational materials engineering (ICME), enabling rapid screening and identification of multi-scale processing/structure/property/performance relationships. Our current PV informatics research ranges from transparent conducting oxides (TCOs) to solar absorber materials. As a test bed, we report examples of our current data management system for PV research and of advanced data mining aimed at improving the performance of solar cells such as CuInₓGa₁₋ₓSe₂ (CIGS), with the goal of low-cost, high-rate processing. For PV data management, we show recent developments in a strategy for data modeling, data collection and aggregation methods, and the construction of data interfaces, which together enable proper archiving and data handling for data mining. For scientific data mining, we demonstrate the value of high-dimensional visualizations and non-linear dimensionality reduction for quantitatively assessing how process conditions and properties are interconnected in the development of Al-doped ZnO (AZO) thin films as the TCO layers for CIGS devices. Such processing-property relationships for TCOs lead to optimal process design and, in turn, to enhanced performance of CIGS cells and devices.
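
The abstract does not name the non-linear dimensionality reduction method used. As an illustration only, the sketch below implements Isomap, one standard non-linear technique (k-nearest-neighbour graph, shortest-path geodesic distances, then classical multidimensional scaling), in Python with NumPy. The spiral is a hypothetical stand-in for a curved relationship among process variables, and `n_neighbors=6` is an assumed setting.

```python
import numpy as np


def isomap(X, n_neighbors=6, n_components=2):
    """Minimal Isomap: k-NN graph -> geodesic distances -> classical MDS."""
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances

    # Keep only edges to each sample's k nearest neighbours (symmetrised).
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        graph[i, nearest[i]] = dist[i, nearest[i]]
        graph[nearest[i], i] = dist[i, nearest[i]]

    # Floyd-Warshall shortest paths approximate geodesic distances on the manifold.
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])
    if np.isinf(graph).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")

    # Classical MDS on the squared geodesic distances.
    centering = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * centering @ (graph ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(B)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))


t = np.linspace(np.pi, 4 * np.pi, 150)             # hidden 1-D parameter
spiral = np.column_stack([t * np.cos(t), t * np.sin(t)])
embedding = isomap(spiral, n_neighbors=6, n_components=1)[:, 0]
```

Any linear projection of this spiral oscillates as `t` increases, whereas the geodesic embedding should order the samples along the curve. The dense distance matrix and the O(n³) Floyd-Warshall step limit this sketch to small data sets.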


Author(s): Edy Irwansyah, Ebiet Salim Pratama, Margaretha Ohyver

Cardiovascular disease is the leading cause of death worldwide: according to the WHO, around 31% of deaths in the world are caused by cardiovascular diseases, and more than 75% of these deaths occur in developing countries. The treatment of patients with cardiovascular disease produces many medical records that can be used for further patient management. This study develops a data mining method that groups patients with cardiovascular disease into two clusters in order to determine the level of patient complications in each. The method applies principal component analysis (PCA) to reduce the dimensionality of the large available data set, followed by cluster analysis with the K-Medoids algorithm. PCA reduced the data to five new components with a cumulative proportion of variance of 0.8311. These five components were then clustered with the K-Medoids algorithm, yielding two clusters with a silhouette coefficient of 0.35. Combining PCA-based data reduction with K-Medoids clustering offers a new way to group the data of patients with cardiovascular disease according to the level of patient complications in each resulting cluster.
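
The two-stage procedure above (PCA, then K-Medoids on the retained components, evaluated with the silhouette coefficient) can be sketched in Python with NumPy. This is a minimal sketch, not the authors' code: the standardisation step, the 0.8 cumulative-variance cutoff, the simple alternating K-Medoids update (rather than the full PAM swap search) and the randomly generated stand-in records are assumptions, so the printed numbers will not match the reported 0.8311 and 0.35.

```python
import numpy as np


def pca_by_variance(X, threshold=0.8):
    """Standardise X, then keep the fewest PCs whose cumulative variance >= threshold."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)   # cumulative proportion of variance
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(s))
    return Z @ Vt[:k].T, float(cumulative[k - 1])


def k_medoids(X, k, n_iter=100, seed=0):
    """Alternating (Voronoi-iteration) K-Medoids on Euclidean distances."""
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        updated = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            # New medoid: the member with the smallest total distance to its cluster.
            updated[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(updated, medoids):
            break
        medoids = updated
    labels = np.argmin(D[:, medoids], axis=1)
    return labels, medoids, D


def silhouette(D, labels):
    """Mean silhouette coefficient from a precomputed distance matrix."""
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        n_same = same.sum() - 1
        if n_same == 0:
            scores.append(0.0)                     # convention for singleton clusters
            continue
        a = D[i, same].sum() / n_same              # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


# Hypothetical stand-in for patient records: 120 patients x 10 numeric attributes.
rng = np.random.default_rng(42)
records = rng.normal(size=(120, 10))
records[60:, :4] += 3.0                            # a subgroup with shifted measurements
scores, cum_var = pca_by_variance(records, threshold=0.8)
labels, medoids, D = k_medoids(scores, k=2)
print(f"{scores.shape[1]} components, cumulative variance {cum_var:.4f}")
print(f"silhouette coefficient {silhouette(D, labels):.2f}")
```

Because medoids are actual records, each cluster centre can be read as a representative patient, which is one reason K-Medoids is often preferred over K-Means for interpretability.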


2012, Vol 23 (4), pp. 289-296
Author(s): Ivana Ćavar, Zvonko Kavran, Marjana Petrović

Official road classification serves general purposes but is not sufficient for in-depth traffic analysis. Today there are efficient ways to collect large amounts of data from multiple sources that can be used for different purposes. These large amounts of data cannot be analysed with traditional methods, so new state-of-the-art algorithms are needed. The paper presents a methodology for urban road classification based on GPS (Global Positioning System) vehicle tracks and data on the infrastructural characteristics of road subsegments. The process of defining road categories includes data collection and analysis, data cleansing and fusion, multiple regression, principal component analysis (PCA), cross-validation, and a k-nearest neighbour (kNN) classification procedure. The results of this process can serve as a basis for further traffic analyses such as travel time prediction and optimal route detection.
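
The PCA, cross-validation and kNN stages of this methodology can be sketched in Python with NumPy as follows. The multiple regression step is omitted, and the six synthetic subsegment features, the three hypothetical road categories, two components, five neighbours and five folds are illustrative assumptions rather than settings from the paper.

```python
import numpy as np


def pca_fit(X, n_components):
    """Column mean and leading principal axes of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def knn_predict(X_train, y_train, X_test, k=5):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(row).argmax() for row in y_train[nearest]])


def cross_validate(X, y, n_folds=5, n_components=2, k=5, seed=0):
    """k-fold CV accuracy; PCA is refitted on each training fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        mean, comps = pca_fit(X[train_idx], n_components)
        Z_train = (X[train_idx] - mean) @ comps.T
        Z_test = (X[test_idx] - mean) @ comps.T
        pred = knn_predict(Z_train, y[train_idx], Z_test, k)
        accuracies.append(np.mean(pred == y[test_idx]))
    return float(np.mean(accuracies))


# Hypothetical subsegment data: 3 road categories x 50 subsegments x 6 features.
rng = np.random.default_rng(1)
categories = np.repeat([0, 1, 2], 50)
centres = np.zeros((3, 6))
centres[:, 0] = [0.0, 3.0, 6.0]
centres[:, 1] = [0.0, 3.0, 6.0]
features = centres[categories] + rng.normal(size=(150, 6))
print(f"5-fold CV accuracy: {cross_validate(features, categories):.3f}")
```

Refitting PCA inside each training fold is a design choice in this sketch: fitting it on all data before splitting would let test-fold information leak into the projection.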


2017, Vol 25 (6), pp. 949-966
Author(s): G Asencio-Cortés, F Martínez-Álvarez, A Morales-Esteban, J Reyes, A Troncoso

Abstract: Increasing attention has been paid to the prediction of earthquakes with data mining techniques during the last decade. Several works have already proposed the use of certain features as inputs for supervised classifiers. So far, however, these features have been used successfully without any further transformation. In this work, the use of principal component analysis (PCA) to reduce data dimensionality and generate new datasets is proposed. In particular, this step is inserted into an earthquake prediction methodology that has already been used successfully. Tokyo, one of the Japanese cities most threatened by the occurrence of large earthquakes, is studied. Several well-known classifiers are combined with PCA, and a noticeable improvement in the results is reported.
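
The idea of inserting PCA in front of an existing classifier can be sketched in Python with NumPy: PCA is fitted on the training period only and the same projection is applied to the test period, producing new, lower-dimensional datasets. The seven synthetic input features, the binary target, the chronological 70/30 split, three components and the nearest-centroid classifier (a stand-in for the well-known classifiers the paper evaluates) are all assumptions.

```python
import numpy as np


def pca_datasets(X_train, X_test, n_components):
    """Fit standardisation and PCA on the training period, project both periods."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0                          # guard against constant features
    Z_train = (X_train - mean) / std
    Z_test = (X_test - mean) / std
    _, s, Vt = np.linalg.svd(Z_train, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)          # variance ratio per component
    W = Vt[:n_components].T
    return Z_train @ W, Z_test @ W, explained[:n_components]


def nearest_centroid(Z_train, y_train, Z_test):
    """Stand-in classifier: assign each sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[np.argmin(d, axis=1)]


# Hypothetical time-ordered data: rows are consecutive periods.
rng = np.random.default_rng(7)
n = 300
latent = rng.normal(size=n)                       # hypothetical underlying state
features = latent[:, None] + 0.5 * rng.normal(size=(n, 7))
target = (latent > 0.8).astype(int)               # hypothetical "large event follows" label
split = int(0.7 * n)                              # train on the past, test on the future
Z_train, Z_test, explained = pca_datasets(features[:split], features[split:], 3)
pred = nearest_centroid(Z_train, target[:split], Z_test)
accuracy = float(np.mean(pred == target[split:]))
print("explained variance ratios:", np.round(explained, 3))
print(f"test accuracy: {accuracy:.3f}")
```

Fitting the standardisation and the principal axes on past data only keeps future observations out of the transformation, which matters for time-ordered data such as earthquake catalogues.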


Author(s): Dimitrios Katsaros, Gökhan Yavas, Alexandros Nanopoulos, Murat Karakaya, Özgür Ulusoy, ...

Over the past years, we have witnessed explosive growth in our ability both to generate and to collect data. Advances in scientific data collection, the computerization of many businesses, and the recording (logging) of clients’ accesses to networked resources have generated vast amounts of data. Various data mining techniques have been proposed and widely employed to discover valid, novel and potentially useful patterns in these data.


2019, Vol 15 (2), pp. 275-280
Author(s): Agus Setiyono, Hilman F Pardede

It is now common for a cellphone to receive spam messages. The large number of received messages makes it difficult for people to classify them manually as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate three data mining techniques, namely Support Vector Machine (SVM), Multinomial Naïve Bayes and Decision Tree, for automatic spam detection. Our experimental results show that SVM performs best of the three evaluated algorithms: SVM achieves 98.33% accuracy, Multinomial Naïve Bayes 98.13%, and Decision Tree 97.10%.
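
Of the three algorithms compared, Multinomial Naïve Bayes is the shortest to write down. A minimal sketch in pure Python, assuming whitespace tokenisation and Laplace smoothing (`alpha=1.0`), follows; the six-message corpus is invented for illustration and says nothing about the accuracies reported above.

```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc.lower().split()}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in d.lower().split())
            total = sum(counts.values()) + self.alpha * len(self.vocab)
            # Smoothed log P(word | class) for every word in the vocabulary.
            self.log_lik[c] = {w: math.log((counts[w] + self.alpha) / total) for w in self.vocab}
        return self

    def predict(self, doc):
        words = [w for w in doc.lower().split() if w in self.vocab]  # skip unseen words
        scores = {c: self.log_prior[c] + sum(self.log_lik[c][w] for w in words)
                  for c in self.classes}
        return max(scores, key=scores.get)


train_docs = ["win a free prize now", "free cash claim now", "urgent claim your free reward",
              "are we meeting for lunch", "see you at the meeting tomorrow",
              "can you send the report"]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict("claim your free cash prize"))
print(model.predict("send me the meeting report"))
```

Working in log space avoids underflow when many word probabilities are multiplied, and the smoothing term keeps a single unseen-in-class word from zeroing out a class.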


2019, Vol 1 (1), pp. 121-131
Author(s): Ali Fauzi

The big data on Indonesian FDI (foreign direct investment)/CDI (capital direct investment) has not yet been exploited to generate further insights or to provide a basis for decision making. Examples of exploiting such data with data mining techniques include clustering/labeling with K-Means and classification/prediction of DCI categories with Naïve Bayes. One form of DCI is ‘Quick-Wins’, a.k.a. ‘Low-Hanging-Fruits’, Direct Capital Investment (DCI), abbreviated as QWDI. Despite its known unfavorable factors, e.g. exploitation of natural resources, low value-added creation, low-skill and low-wage employment, and environmental impacts, QWDI contributes greatly to rapid, large-scale job creation, export market penetration, and the potential for technological advancement. By using some basic data mining techniques to complement the usual statistical/query analyses and the analyses of similar studies, this study aims to support government planners, start-up companies, and financial institutions in further CDI development. Business intelligence orientation and knowledge generation scenarios also form a valuable basis. In turn, enablement through Information and Communication Technology (ICT) will play a strategic role in the growth of Indonesian enterprises and serve as a foundation for a ‘knowledge-based economy’ in Indonesia.
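
The two-step use of K-Means for labelling and Naïve Bayes for prediction can be sketched in Python with NumPy. The abstract does not specify the features, the number of clusters or the Naïve Bayes variant, so the three standardised investment attributes, `k=2`, the farthest-point initialisation and the Gaussian Naïve Bayes model are hypothetical choices.

```python
import numpy as np


def k_means(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with farthest-point initialisation; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    while len(centroids) < k:
        # Next initial centroid: the sample farthest from all centroids chosen so far.
        d = np.linalg.norm(X[:, None, :] - np.array(centroids)[None, :, :], axis=-1)
        centroids.append(X[np.argmax(d.min(axis=1))])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = np.argmin(np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1), axis=1)
        updated = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                            for c in range(k)])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    labels = np.argmin(np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1), axis=1)
    return labels, centroids


class GaussianNaiveBayes:
    """Gaussian Naive Bayes: an independent normal likelihood per feature and class."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mean = np.stack([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.stack([X[y == c].var(axis=0) for c in self.classes]) + 1e-9
        self.log_prior = np.log([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # Log-likelihood of each sample under each class, summed over features.
        diff = X[:, None, :] - self.mean[None, :, :]
        ll = -0.5 * (np.log(2 * np.pi * self.var)[None, :, :]
                     + diff ** 2 / self.var[None, :, :]).sum(axis=-1)
        return self.classes[np.argmax(ll + self.log_prior[None, :], axis=1)]


# Hypothetical standardised investment attributes (e.g. value, jobs, export share).
rng = np.random.default_rng(3)
records = np.vstack([rng.normal(0.0, 0.5, size=(30, 3)),
                     rng.normal(5.0, 0.5, size=(30, 3))])
categories, _ = k_means(records, k=2)                 # unsupervised labelling step
model = GaussianNaiveBayes().fit(records, categories)  # supervised prediction step
new_record = np.array([[4.8, 5.1, 5.0]])
print("predicted category:", model.predict(new_record)[0])
```

The clusters found by K-Means become the training labels for Naïve Bayes, which can then assign a category to a new investment record without rerunning the clustering.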


Author(s): S. K. Saravanan, G. N. K. Suresh Babu

Nowadays, most secure data transfer takes place over the internet. At the same time, the risks to secure data transfer are also increasing. With the rise and rapid growth of e-commerce, the use of credit cards (CC) for online transactions has also increased dramatically. Using credit cards for secure balance transfers has become a necessity. Credit card fraud detection is critically important because the number of fraudsters grows every day. The aim of this survey is to analyze the issues associated with credit card fraud using data mining methodologies. Data mining is a well-defined process that takes data as input and produces output in the form of models or patterns. This survey can help credit card providers choose a suitable solution to their problems and give researchers a comprehensive review of the literature in this field.


Author(s): Jean Claude Turiho, Wilson Cheruiyot, Anne Kibe, Irénée Mungwarakarama, ...
