A Pattern Storage System using Pattern Warehouse along with Sources of Pattern Generation and Applications

Nowadays, various data mining algorithms can generate a specific set of data, known as patterns, from a huge data repository, but there is no infrastructure or system that provides persistent storage for the generated patterns. A pattern warehouse provides a foundation for keeping these patterns safe in a dedicated environment for long-term use. Most organizations are more interested in information or patterns than in raw, unprocessed data, because extracted knowledge plays a vital role in making the right decisions for an organization's growth. We have examined the sources of patterns generated from large data sets. In this paper, we briefly discuss the application areas of patterns and the idea of a pattern warehouse, then present the architecture of the pattern warehouse, the correlation between data warehouses and data mining, the association between data mining and pattern warehouses, and a critical evaluation of existing, theoretically published approaches, with more emphasis on review elements related to association rules. We also compare the pattern warehouse and the data warehouse with respect to factors such as storage space, type of storage unit, and characteristics, and identify several research domains.

d'CARTESIAN, 2014, Vol 3 (1), pp. 1
Author(s): M. Zainal Mahmudin, Altien Rindengan, Winsy Weku

Abstract: The high demand for information is sometimes not matched by the supply of adequate information, so the information must be mined again from large data sets. Association rule techniques can extract information from large data such as university records. The purpose of this research is to determine patterns in the study duration of students at F-MIPA UNSRAT using the association rule method of data mining, and to compare the Apriori algorithm with a hash-based algorithm. The master student data of F-MIPA UNSRAT were processed with association rule mining using the Apriori algorithm and a hash-based algorithm, with a minimum support of 1% and a minimum confidence of 1%. Both algorithms produced the same result: 49 combinations of 2-itemsets. Among the patterns found, 7.5% of graduates came from the mathematics major and studied for more than 5 years, with a confidence value of 38.5%. Keywords: Apriori algorithm, hash-based algorithm, association rule, data mining.
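The support and confidence figures quoted in this abstract can be illustrated with a minimal sketch of Apriori-style mining restricted to 2-itemsets. The function name `mine_pair_rules`, the item labels, and the toy records are hypothetical and chosen only for illustration; they are not the study's data or implementation, and the hash-based variant is not shown.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.01, min_confidence=0.01):
    """Return rules (lhs, rhs, support, confidence) built from frequent 2-itemsets."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]

    # Pass 1: count single items and keep those meeting minimum support.
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}

    # Pass 2 (Apriori pruning): only pairs of frequent items can be frequent.
    pair_counts = Counter()
    for basket in baskets:
        pair_counts.update(combinations(sorted(basket & frequent_items), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all records containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: share of records with lhs that also contain rhs.
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical toy records, not the F-MIPA UNSRAT data.
records = [
    {"major=math", "duration>5y"},
    {"major=math", "duration<=5y"},
    {"major=physics", "duration<=5y"},
    {"major=math", "duration>5y"},
]
for lhs, rhs, support, confidence in mine_pair_rules(records, 0.25, 0.5):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Read this way, a rule such as "mathematics major -> more than 5 years" with support 7.5% and confidence 38.5% would mean that 7.5% of all records contain both items, and 38.5% of the records containing the left-hand item also contain the right-hand one.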


2021, Vol 2021, pp. 1-9
Author(s): Zhihui Wang, Jinyu Wang

Data mining and big data technologies could be of utmost importance for investigating outbound and case datasets in police records. New findings and useful information may be obtained through data preprocessing and multidimensional modeling. Public security data is a kind of “big data,” with characteristics such as large volume, rapid growth, varied structures, large-scale storage, low density, and time sensitivity. In this paper, a police data warehouse is constructed and a public security information analysis system is proposed. The proposed system comprises two modules: (i) case management and (ii) public security information mining. The former is responsible for the collection and processing of case information. The latter preprocesses the data of major cases that have occurred in the past ten years and uses the resulting model to create a data warehouse based on needs. By separating measures from dimensions, the system analyzes and predicts criminals’ characteristics and the case environment and establishes the relationships between them. In the process of mining and processing crime data, data mining algorithms can quickly find the relevant information in the data. Furthermore, the system can find relevant trends and regularities to detect criminal cases faster than other methods. This can reduce the emergence of new crimes and provide a practically significant basis for decision-making in the public security department.


2005, Vol 15 (1), pp. 125-145
Author(s): Milija Suknovic, Milutin Cupic, Milan Martic, Darko Krulj

This paper presents the design and implementation of a data warehouse, as well as the use of data mining algorithms for knowledge discovery as the basic resource for an adequate business decision-making process. The project was carried out for the Student's Service Department of the Faculty of Organizational Sciences (FOS), University of Belgrade, Serbia and Montenegro. The system provides a good basis for analysis and prediction in the coming period, supporting quality business decision-making by top management. The first part of the paper describes the steps in designing and developing the data warehouse of this business system. The second part describes the implementation of data mining algorithms for deducing rules, patterns, and knowledge as a resource for supporting the decision-making process.


Agriculture is the main source of employment and resources in our country, and it will remain one of the most important fields in the coming years. Agriculture plays a vital role in the economic development of India; half of the Indian population depends mainly on it as a source of livelihood, and it is important in everyday life. Compared with previous years, agriculture is now in poor condition. The most important reason is the lack of proper guidance for farmers. Owing to these problems, coriander yield suffers from a lack of knowledge about coriander cultivation methods: the season in which to cultivate coriander, the soil best suited to it under the given weather conditions, and the time to harvest for the best yield. If farmers are aware of coriander cultivation and harvesting methods, it will be more helpful to people in the real world and will increase coriander productivity. Data mining is the process of finding new patterns in large data sets; the technology is used to infer useful, actionable knowledge from vast amounts of data. Climate data is one kind of meteorological data that is rich in important knowledge. This paper presents a brief comparative study of the various techniques used for coriander yield estimation. The data mining techniques in use for coriander yield estimation include K-Means.
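The K-Means technique named in this abstract can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the two features (say, rainfall and temperature, rescaled to comparable ranges), the data points, and the choice of the first k points as initial centroids are all hypothetical and are not taken from the paper.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-Means: assign each point to its nearest centroid, then re-average."""
    # Deterministic initialisation (an assumption for this sketch): first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical observations: (rescaled rainfall, rescaled temperature).
observations = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2), (0.9, 1.1), (9.1, 8.9)]
centroids, labels = kmeans(observations, k=2)
print(labels)
```

In a yield-estimation setting, the resulting clusters would group seasons or plots with similar conditions; how the paper maps clusters to yield estimates is not stated in the abstract.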


2020, Vol 35 (3), pp. 182-194
Author(s): Gary Smith

The scientific method is based on the rigorous testing of falsifiable conjectures. Data mining, in contrast, puts data before theory by searching for statistical patterns without being constrained by prespecified hypotheses. Artificial intelligence and machine learning systems, for example, often rely on data-mining algorithms to construct models with little or no human guidance. However, a plethora of patterns are inevitable in large data sets, and computer algorithms have no effective way of assessing whether the patterns they unearth are truly useful or meaningless coincidences. While data mining sometimes discovers useful relationships, the data deluge has caused the number of possible patterns that can be discovered relative to the number that are genuinely useful to grow exponentially—which makes it increasingly likely that what data mining unearths is likely to be fool’s gold.
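The claim that patterns are inevitable in large data sets can be illustrated with a small simulation. This is a sketch under assumptions that are not from the article: the variables are independent Gaussian noise, the "pattern" is a Pearson correlation, and the function names are hypothetical. Because every variable is pure noise, any strong correlation found is a coincidence by construction.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def max_abs_correlation(n_vars, n_obs, seed=0):
    """Strongest pairwise correlation among n_vars independent noise variables."""
    rng = random.Random(seed)
    variables = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return max(abs(pearson(x, y)) for x, y in combinations(variables, 2))


# The number of candidate pairs grows as n_vars * (n_vars - 1) / 2,
# so searching more variables can only raise the best coincidental correlation.
for n_vars in (5, 50, 200):
    pairs = n_vars * (n_vars - 1) // 2
    print(n_vars, pairs, round(max_abs_correlation(n_vars, n_obs=20), 2))
```

With a fixed seed, the smaller variable sets are prefixes of the larger ones, so the strongest correlation reported never decreases as more variables are searched, even though none of the relationships is real.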


2015, Vol 719-720, pp. 924-928
Author(s): Xiao Chun Sheng, Xiao Feng Xue, Yan Ping Cheng

Cloud computing distributes computing tasks across the resources of a large number of computers in a subnet to provide users with cheap and efficient computing power, storage capacity, and service capabilities. Data mining finds useful information in large data repositories. Quickly and accurately finding frequent items in large data flows is an important basis for forecasting and decision-making. Therefore, a parallelized frequent-item data mining strategy in a cloud computing environment, which provides an efficient solution for storing and analyzing vast amounts of data, has important theoretical significance and application value.


2020, Vol 17 (1), pp. 6-9
Author(s): Ramya G. Franklin, B. Muthukumar

The growth of science is a priceless asset to individuals and society. The plethora of high-end machines has made life more sophisticated, which in turn is paid back in health issues. Health care data are complex and large. These heterogeneous data are used to diagnose patients' diseases. Predicting diseases at an early stage can save lives and gives an upper hand in controlling them. Data mining approaches are very useful for analyzing complex, heterogeneous, and large data sets. The mining algorithms extract the essential data from the raw data. This paper presents a survey of the various data mining algorithms used to predict a disease that is very common in day-to-day life, diabetes mellitus. Over 246 million people in the world are diabetic, with a majority of them being women. The WHO reports that by 2025 this number is expected to rise to over 380 million.


Author(s): R. Catherine Stephina Mary, B. Satheesh Kumar

Data mining is a field of computer science used to discover new patterns in large data sets. Classification is an important task in data mining. In different areas of medicine, data mining has helped improve results alongside other methodologies. Gestational diabetes is a condition characterized by high blood sugar (glucose) levels that is first recognized during pregnancy. Diabetes is a disease in which levels of blood glucose, also called blood sugar, are above normal. People with diabetes have problems converting food to energy. Normally, after a meal, the body breaks food down into glucose, which the blood carries to cells throughout the body. Cells use insulin, a hormone made in the pancreas, to help them convert blood glucose into energy. During the second and third trimesters, a mother's diabetes can lead to over-nutrition and excess growth of the baby. Having a large baby increases risks during labour and delivery. For example, large babies often require caesarean deliveries, and those delivered vaginally are at increased risk of trauma to the shoulder. In addition, when foetal over-nutrition occurs and hyperinsulinemia results, the baby's blood sugar can drop very low after birth, since the baby no longer receives the high blood sugar from the mother. However, with proper treatment, a mother with gestational diabetes can deliver a healthy baby. In this paper, classification algorithms such as J48, simple CART, and Naïve Bayes are used to diagnose diabetes in pregnant women and are compared for their accuracy levels.


2014, Vol 667, pp. 218-225
Author(s): Yan Wang, Kun Yang, Xiang Jing, Huang Long Jin

The KDD Cup 99 dataset is not only the most widely used dataset in intrusion detection, but also the de facto benchmark for evaluating the performance of intrusion detection systems. Nevertheless, this dataset has many issues that cannot be ignored. To build good data mining models for intrusion detection and to find the appropriate features of network intrusion attack types, researchers need a sound understanding of this dataset. In this paper, we first make an in-depth analysis of the problems that exist in the dataset and give related solutions. Second, we carry out extensive data preprocessing on the 10% subset of the KDD Cup 99 training set, which gives better results in the subsequent processing. Finally, by comparing 10 common data mining algorithms in our experiment, we analyze and conclude that data preprocessing plays a vital role in the performance of data mining algorithms.

