Using Data Mining for Forecasting Data Management Needs

2008 ◽  
pp. 2088-2104
Author(s):  
Qingyu Zhang ◽  
Richard S. Segall

This chapter illustrates the use of data mining as a computational intelligence methodology for forecasting data management needs. Specifically, it discusses the use of data mining with multidimensional databases to determine data management needs for two selected biotechnology data sets: forest cover data (63,377 rows and 54 attributes) and human lung cancer data (12,600 rows of transcript sequences and 156 columns of gene types). The data mining is performed using four software packages: SAS® Enterprise Miner™, Megaputer PolyAnalyst® 5.0, NeuralWare Predict®, and BioDiscovery GeneSight®. The analysis and results will be used to enhance the intelligence capabilities of biotechnology research by improving data visualization and forecasting for organizations. The tools and techniques discussed here may be representative of those applicable in a typical manufacturing and production environment. Screenshots of each of the four software packages are presented, along with conclusions and future directions.


Author(s):  
Anindita Desarkar ◽  
Ajanta Das

Healthcare transactions generate a huge amount of data that is complex, voluminous, and heterogeneous in nature. This large data set can serve as an ideal store to be analyzed for knowledge discovery as well as for various future predictions. Data mining is therefore becoming increasingly popular, as it offers a set of innovative tools and techniques for handling this kind of data set, whereas traditional methods have limitations. In summary, providing better patient care and reducing healthcare costs are the two major goals of applying data mining in healthcare. Initially, this chapter explores the various types of eHealth data and their characteristics. Subsequently, it explores various domains in the healthcare sector and shows how data mining plays a major role in those domains. Finally, it describes a few common data mining techniques and their applications in the eHealth domain.


2012 ◽  
Vol 32 (1) ◽  
pp. 184-196 ◽  
Author(s):  
Rubens A. C. Lamparelli ◽  
Jerry A. Johann ◽  
Éder R. dos Santos ◽  
Julio C. D. M. Esquerdo ◽  
Jansle V. Rocha

This study aimed to identify different conditions of coffee plants after the harvesting period, using data mining and spectral behavior profiles from the Hyperion/EO1 sensor. The Hyperion image, with a spatial resolution of 30 m, was acquired on August 28, 2008, at the end of the coffee harvest season in the studied area. For image pre-processing, atmospheric and signal/noise effect corrections were carried out using the FLAASH and MNF (Minimum Noise Fraction Transform) algorithms, respectively. Spectral behavior profiles (38) of different coffee varieties were generated from 150 Hyperion bands. The spectral behavior profiles were analyzed with the Expectation-Maximization (EM) algorithm considering 2, 3, 4, and 5 clusters. A t-test at the 5% significance level was used to verify the similarity among the wavelength cluster means. The results demonstrated that it is possible to separate five different clusters, which comprised different coffee crop conditions, making it possible to improve future intervention actions.
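The EM clustering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a one-dimensional Gaussian mixture rather than full 150-band profiles, and the function name `em_gmm_1d` and the sample reflectance values are hypothetical.

```python
import math


def em_gmm_1d(values, k, iterations=50):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances). Simplified illustration only.
    """
    data = sorted(values)
    n = len(data)
    # Deterministic start: means at evenly spaced quantiles of the data.
    means = [data[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, 1e-6)] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids collapsed components
    return weights, means, variances


# Hypothetical reflectance values for a single band (not data from the study).
sample = [0.10, 0.12, 0.11, 0.09, 0.13, 0.50, 0.52, 0.49, 0.51, 0.48]
weights, means, variances = em_gmm_1d(sample, k=2)
```

Running the same function with `k` set to 2, 3, 4, and 5 would mirror the cluster counts considered in the study, although comparing the resulting models would require a separate criterion.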


Author(s):  
Robab Saadatdoost ◽  
Alex Tze Hiang Sim ◽  
Hosein Jafarkarimi ◽  
Jee Mei Hee

This project presents the patterns and relations between attributes of Iran Higher Education (IHE) data, obtained by using data mining techniques to discover knowledge and apply it in the IHE decision-making system. The large IHE data sets are difficult to analyze and display, yet they are significant for decision making in IHE. This study used the well-known data mining software Weka, together with SOM, to mine and visualize IHE data. To discover worthwhile patterns, we used clustering techniques and visualized the results. The selected data comprise five medical universities in Tehran as a small data set and the universities of the Ministry of Science, Research and Technology as a larger data set. Knowledge discovery and visualization are necessary for analyzing these data sets. Our analysis reveals knowledge about higher education related to program of study, degree in each program, learning style, study mode, and other IHE attributes. This study helps IHE discover knowledge visually; experts in the higher education field can examine our results further to assess and evaluate them.


2017 ◽  
Vol 9 (1) ◽  
pp. 38-49
Author(s):  
Fatma Önay Koçoğlu ◽  
İlkim Ecem Emre ◽  
Çiğdem Selçukcan Erol

The aim of this study is to analyze success in e-learning with data mining methods and to find potential patterns. In this context, 374,073 records from the 2013-14 period, taken from an institution serving the e-learning field in Turkey, are used. The data set, collected from the information technology, banking, and pharmaceutical industries, includes the employees' success and industry, the trainings they complete, whether the trainings are completed, first login and last logout dates, training completion date, and duration of experience in training. Using this data set, the success status of participants is examined with data mining methods (C5.0, Random Forest, and Gini). Based on the performance evaluation criteria of accuracy, error rate, specificity, and F-score, C5.0 was chosen as the algorithm that gives the best performance. According to the results of the study, the employees' sectors are not important; rather, the important factors are the completion status, the duration of experience, and training.
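The four evaluation criteria named in the abstract can each be derived from a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts are hypothetical and are not results from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, error rate, specificity, and F-score (F1)
    from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    f_denominator = 2 * tp + fp + fn
    f_score = 2 * tp / f_denominator if f_denominator else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "specificity": specificity,
        "f_score": f_score,
    }


# Hypothetical counts for one classifier (illustration only).
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Computing the same dictionary for each candidate classifier gives a like-for-like basis for the kind of comparison the abstract describes.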

