Pattern Identification and Predictions in Data Analysis

Data mining is an analytic process that explores data in search of consistent patterns and/or systematic relationships between variables, and then validates the findings by applying the detected patterns to new sets of data. The main goal of data mining is prediction. Predictive data mining is important because it has the most direct business applications. The paper briefly explains the data mining process, which consists of three stages: (1) initial exploration, (2) pattern identification with validation, and (3) deployment (application of the model to new data in order to generate predictions). Data mining is carried out to recognize patterns and relationships in data analysis, with an emphasis on large observational databases. From a statistical perspective, data mining is viewed as computer-automated exploratory data analysis for large data sets, and it poses major research challenges in India and abroad. Machine learning methods form the core of data mining and decision tree learning. Data mining work is integrated within an existing user environment, including environments that already make use of data warehousing and Online Analytical Processing (OLAP). The paper describes how data mining tools predict future trends and behavior, which allows proactive, knowledge-driven decisions.
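
The three stages named in the abstract can be illustrated with a minimal sketch. The observations, the one-variable least-squares model, and the names `fit_line` and `predict` are assumptions chosen for illustration; the paper itself does not prescribe a particular model.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def predict(model, x):
    a, b = model
    return a + b * x

# Hypothetical observations: (x, y) pairs.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1), (6, 12.0)]

# Stage 1: initial exploration (simple summary statistics).
xs, ys = zip(*data)
summary = {"n": len(data), "mean_x": mean(xs), "mean_y": mean(ys)}

# Stage 2: pattern identification, validated on held-out data.
train, holdout = data[:4], data[4:]
model = fit_line(*zip(*train))
validation_error = mean(abs(predict(model, x) - y) for x, y in holdout)

# Stage 3: deployment, i.e. applying the model to new data.
new_prediction = predict(model, 7)
```

Holding out the last two observations keeps the validation step separate from the fitting step, which is the point of stage (2).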

ANALYSIS OF A LARGE VOLUME OF DATA ON THE STATE OF HIGH-TECH EQUIPMENT

Transport development ◽

10.33082/td.2017.1-1.09 ◽

2017 ◽

pp. 90-95

Author(s):

Д.С. ШИБАЕВ ◽

В.В. ВЫЧУЖАНИН ◽

Н.О. ШИБАЕВА

Keyword(s):

Data Mining ◽

Data Analysis ◽

Data Storage ◽

Large Volume ◽

Modern Architecture ◽

Map Reduce ◽

High Tech ◽

Short Term ◽

Use Of Data ◽

Time Of Operation

The study analyzes data obtained from the operation of a large number of items of high-tech equipment. The data is distributed across databases according to various characteristics. The complexity of subsequent processing depends on the amount of information to be processed and on the architecture of the data storage. Data mining technology makes it possible to significantly improve the analysis of information and the subsequent short-term value search. The use of this technology will improve the efficiency of archives of marine indicators covering the entire operating life of a vessel. Data analysis technology is not yet mature and requires continual modification to increase its efficiency. Adding a modern architecture for the data in the databases will increase the efficiency of analyzing data that consists of a large number of indicators of the condition of the vessel and its equipment. One such architecture is Map-Reduce.
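
As a rough illustration of the Map-Reduce idea mentioned above, the following single-process sketch averages readings per piece of equipment. The readings and the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical; a real deployment would distribute these phases across nodes using a framework.

```python
from collections import defaultdict

# Hypothetical sensor readings: (equipment_id, temperature).
readings = [("pump", 71.0), ("engine", 88.5), ("pump", 73.0),
            ("engine", 91.5), ("radar", 40.0)]

def map_phase(record):
    """Emit a (key, value) pair for each input record."""
    equipment_id, value = record
    return equipment_id, value

def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse all values for one key into a single result (here, the mean)."""
    return key, sum(values) / len(values)

grouped = shuffle(map(map_phase, readings))
averages = dict(reduce_phase(k, v) for k, v in grouped.items())
```

Because each key is reduced independently, the reduce step can in principle run in parallel over keys.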

Decision Support System for Diabetes Classification Using Data Mining Techniques

Research Anthology on Decision Support Systems and Decision Management in Healthcare, Business, and Engineering ◽

10.4018/978-1-7998-9023-2.ch053 ◽

2021 ◽

pp. 1091-1113

Author(s):

Ahmad M. Al-Khasawneh

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Mining Algorithms ◽

Use Of Data ◽

Predictive Data Mining ◽

Severity Of The Disease ◽

Using Data

The use of data mining algorithms in health information systems has played a significant role in developing applications that help to diagnose different diseases. The type of disease determines the selection of the algorithm, the parameters to be used, the dataset pre-processing steps, and so on. This chapter targets the diagnosis of diabetes mellitus, which has gained significant attention in the last few decades due to the increased severity of the disease. Four predictive data mining models were implemented to diagnose diabetes from the PIMA dataset: k-nearest neighbor, support vector machine, multilayer perceptron neural network, and naive Bayesian network. The support vector machine outperformed the others, giving the highest classification accuracy at 78.83%.
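
Of the four models listed, k-nearest neighbor is the simplest to sketch. The example below is an illustration only: the records are invented toy data, not the PIMA dataset, the two features are an assumption, and no feature scaling is applied.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (glucose, BMI) records; 1 = diabetic, 0 = not diabetic.
train = [((85, 22.0), 0), ((90, 24.5), 0), ((100, 26.0), 0),
         ((150, 33.0), 1), ((165, 35.5), 1), ((180, 38.0), 1)]

low_risk = knn_predict(train, (95, 25.0))
high_risk = knn_predict(train, (170, 36.0))
```

In practice the features would usually be standardized first, since Euclidean distance is dominated by whichever feature has the largest range.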

Decision Support System for Diabetes Classification Using Data Mining Techniques

Advances in Healthcare Information Systems and Administration - Handbook of Research on Emerging Perspectives on Healthcare Information Systems and Informatics ◽

10.4018/978-1-5225-5460-8.ch012 ◽

2018 ◽

pp. 281-303

Author(s):

Ahmad M. Al-Khasawneh

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Mining Algorithms ◽

Use Of Data ◽

Predictive Data Mining ◽

Severity Of The Disease ◽

Using Data

Sentimental Analysis on Twitter using Pig and Hive

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b7051.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 438-441

Keyword(s):

Data Analysis ◽

Data Science ◽

Research Area ◽

Large Set ◽

Future Trends ◽

Large Sets ◽

Analytic Process ◽

New Challenges

Data science is the analytic process of exploring collected data to find new patterns and make predictions. Data analysis is performed on large databases, in which patterns can be formed and then recognized. This is helpful for predicting new challenges and circumstances. From a statistical perspective, the analysis of large observational databases poses many challenges, which has made it a research area in India and abroad. Various tools are available on the market to process and analyze large data sets in order to predict future trends and support knowledge-driven decisions; big data and Hadoop are among them. In this paper, we collected more than 5,000 tweets, pre-processed them, and performed sentiment analysis to separate negative from positive tweets, and then used the results to predict people's sentiments toward a particular person.
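
The positive/negative labelling step can be sketched with a simple word-count approach. This is not the paper's Pig and Hive pipeline; the word lists, the sample tweets, and the `sentiment` function are hypothetical and serve only to show the idea.

```python
# Hypothetical word lists; a real lexicon would be far larger.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}

def sentiment(tweet):
    """Label a tweet by counting positive and negative words."""
    words = [w.strip(".,!?#@").lower() for w in tweet.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

tweets = ["Great speech, love it!", "Terrible policy. Bad idea.", "He spoke today."]
labels = [sentiment(t) for t in tweets]
```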

Association Rule Hiding Methods

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch012 ◽

2011 ◽

pp. 71-75

Author(s):

Vassilios S. Verykios

Keyword(s):

Data Mining ◽

Data Analysis ◽

Research Area ◽

Privacy And Security ◽

The Public ◽

Use Of Data ◽

Processing Power ◽

Interesting Approach ◽

Data Auditing ◽

New Research

The enormous expansion of data collection and storage facilities has created an unprecedented increase in the need for data analysis and processing power. Data mining has long been the catalyst for automated and sophisticated data analysis and interrogation. Recent advances in data mining and knowledge discovery have had a controversial impact in both scientific and technological arenas. On the one hand, data mining is capable of analyzing vast amounts of information within a minimum amount of time, an analysis that has exceeded the expectations of even the most imaginative scientists of the last decade. On the other hand, the excessive processing power of the intelligent algorithms that this new research area brings puts at risk sensitive and confidential information that resides in large and distributed data stores. Privacy and security risks arising from the use of data mining techniques were first investigated in an early paper by O'Leary (1991). Clifton & Marks (1996) were the first to propose possible remedies for protecting sensitive data and sensitive knowledge from the use of data mining. In particular, they suggested a variety of measures such as controlled access to the data, fuzzification of the data, elimination of unnecessary groupings in the data, data augmentation, and data auditing. A subsequent paper by Clifton (2000) made early results in the area concrete by demonstrating an interesting approach to privacy protection that relies on sampling. A main result of Clifton's paper was to show how to determine the right sample size of the public data (data to be disclosed to the public, from which sensitive information has been trimmed), while estimating the error that the sampling introduces into the significance of the rules.
Agrawal and Srikant (2000) were the first to establish a new research area, privacy preserving data mining, whose goal is to consider the privacy and confidentiality issues that originate in the mining of data. The authors proposed an approach known as data perturbation, which relies on disclosing a modified database with noisy data instead of the original database. The modified database can produce patterns very similar to those of the original database.
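
The noise-addition step of data perturbation can be sketched as follows. This shows only that aggregate statistics survive the added noise; it does not reproduce the authors' full method, and the attribute, noise scale, and `perturb` function are assumptions for illustration.

```python
import random
from statistics import mean

def perturb(values, scale, seed=0):
    """Return a copy of `values` with zero-mean Gaussian noise added to each entry."""
    rng = random.Random(seed)
    return [v + rng.gauss(0, scale) for v in values]

# Hypothetical sensitive attribute (e.g. ages) for 10,000 records.
source_rng = random.Random(42)
original = [source_rng.uniform(20, 60) for _ in range(10_000)]

# The released database contains only the noisy values.
released = perturb(original, scale=10.0)

# Individual values change, but the aggregate is approximately preserved.
mean_shift = abs(mean(original) - mean(released))
```

With zero-mean noise, the error in the released mean shrinks as the number of records grows, which is why aggregate patterns remain recoverable.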

Data mining for marine data analysis

Russian journal of resources conservation and recycling ◽

10.15862/06inor121 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Valery Maximov ◽

Kseniya Reznikova ◽

Dmitry Popov

Keyword(s):

Data Mining ◽

Data Analysis ◽

Information Technologies ◽

Software Tool ◽

Production Costs ◽

Highly Qualified ◽

Use Of Data ◽

Marine Data ◽

Definition Of ◽

Effective Use

There is practically no industry left in which modern information technologies are not used. Data mining approaches are very popular today. This technology makes it possible to transform huge amounts of data into useful information. In the article, the authors present a definition of data mining technology and its frequently used methods. Popular data mining techniques include classification, clustering, machine learning, and prediction. The authors pay special attention to the k-means clustering method, whose essence is to partition the dataset into clusters. The results can be visualized so that scatter, which implies heterogeneity in the data, can be detected by the naked eye. By further investigating these variations, the analyst can find errors and weaknesses in the study area according to the task at hand. Accurate and complete data is essential in maritime activities. In shipbuilding, data analysis and well-made operational decisions can affect the speed and quality of ship construction or even reduce production costs. In shipping and logistics, they can be used to optimize routes and improve the safety of seafarers. Effective use of data mining usually requires highly qualified database specialists and programmers. In this work, the authors demonstrate one way of using the Orange Data Mining software tool. This program does not require programming skills, which makes it useful for people without coding experience. The article explores the application of Orange Data Mining to the automated mining of marine data. The results show that the program can be used effectively in maritime activities.
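
For readers who do want to see the k-means idea in code, a minimal sketch follows. It is independent of the Orange tool used in the article; the points, the starting centroids, and the `kmeans` function are assumptions for illustration.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and updating centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical 2-D measurements forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the starting centroids, so implementations commonly run the algorithm several times from different starting points.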

Use of Data Mining Techniques for Network Data Analysis

10.1109/kit52904.2021.9583755 ◽

2021 ◽

Author(s):

Julius Barath ◽

Miroslav Liska

Keyword(s):

Data Mining ◽

Data Analysis ◽

Network Data ◽

Data Mining Techniques ◽

Use Of Data

Using Predictive Data Mining Models for Data Analysis in a Logistics Company

Information Systems Architecture and Technology: Proceedings of 38th International Conference on Information Systems Architecture and Technology – ISAT 2017 - Advances in Intelligent Systems and Computing ◽

10.1007/978-3-319-67220-5_15 ◽

2017 ◽

pp. 161-170 ◽

Cited By ~ 1

Author(s):

Miroslava Muchová ◽

Ján Paralič ◽

Michael Nemčík

Keyword(s):

Data Mining ◽

Data Analysis ◽

Logistics Company ◽

Predictive Data Mining

Research of the possibilities of application of the Data Warehouse in the construction area

MATEC Web of Conferences ◽

10.1051/matecconf/201825103062 ◽

2018 ◽

Vol 251 ◽

pp. 03062 ◽

Cited By ~ 7

Author(s):

Alexandr Konikov ◽

Ekaterina Kulikova ◽

Olga Stifeeva

Keyword(s):

Data Mining ◽

Big Data ◽

Data Analysis ◽

Data Warehouse ◽

Information Technologies ◽

Use Of Data ◽

Building Area ◽

Olap Analysis ◽

Rapid Processing

Today, in information technologies, the field associated with the use of the Data Warehouse (DW) is evolving very dynamically. A DW makes two types of data analysis possible: OLAP analysis, a set of technologies for the rapid processing of data presented as a multidimensional cube; and data mining, an intelligent, deep analysis of data to detect previously unknown, practically useful patterns (in our case, in the construction area). It is noted that, of all the methods used in data mining, cluster analysis is especially useful for the construction area. At present, the role of the DW has increased significantly because many data mining methods and approaches have formed the basis of a new, promising approach, Big Data. Processing data from the Data Warehouse with Big Data technology makes it possible to raise research in the construction area to a higher level. The purpose of this work is to research the possibilities of applying the Data Warehouse in the construction area. The article suggests a new approach to data analysis in the construction area, based on the use of Big Data technology and elements of OLAP analysis. The "Discussion" section considers the possibility of a promising new business in the construction field, based on the application of the Data Warehouse and Big Data technology.
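
The multidimensional-cube view behind OLAP analysis can be sketched as a roll-up over a small fact table. The dimensions, the cost figures, and the `roll_up` function are hypothetical and are not taken from the article.

```python
from collections import defaultdict

# Hypothetical fact table: (region, material, year, cost).
facts = [
    ("north", "concrete", 2017, 120.0),
    ("north", "steel", 2017, 80.0),
    ("south", "concrete", 2017, 100.0),
    ("south", "concrete", 2018, 110.0),
    ("north", "steel", 2018, 90.0),
]
DIMENSIONS = ("region", "material", "year")

def roll_up(facts, keep):
    """Sum the cost measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

by_region = roll_up(facts, keep=("region",))
by_material_year = roll_up(facts, keep=("material", "year"))
```

Each call corresponds to viewing the same cube along a different combination of dimensions.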

Fraud Detection in Healthcare System using Symbolic Data Analysis

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.h9269.0710921 ◽

2021 ◽

Vol 10 (9) ◽

pp. 1-7

Author(s):

Sahana Munavalli ◽

Sanjeevakumar M. Hatture ◽

Keyword(s):

Data Mining ◽

Health Insurance ◽

Data Analysis ◽

Claim Data ◽

Symbolic Data Analysis ◽

Standard Data ◽

Symbolic Data ◽

Large Sets ◽

Strenuous Work ◽

Insurance Claim Data

In the era of digitization, fraud is found in all categories of health insurance. It is committed through deliberate deception or misrepresentation in order to obtain an improper benefit in the form of health expenditures. Big data analysis can be used to recognize fraud in large sets of insurance claim data. Based on a few cases that are known or suspected to be fraudulent, the anomaly detection technique computes how close each record is to being fraudulent by examining previous insurance claims. Investigators can then take a closer look at the cases flagged by the data mining software. One of the issues is abuse of medical insurance systems. Manual detection of fraud in the healthcare industry is strenuous work. Fraud and abuse in the healthcare system have become a significant concern within health insurance organizations in recent years because of increasing revenue losses. Handling medical claims has become an exhausting manual task, carried out by a few medical experts who are responsible for approving, adjusting, or rejecting the payments requested within a limited period after receiving them. Standard data mining techniques no longer sufficiently address the complexity of the real world. Symbolic Data Analysis is a newer kind of data analysis that makes it possible to represent the complexity of the real world and to recognize fraud in the dataset.
