Violation of Homogeneity: A Methodologic Issue in the Use of Data Mining Tools

Drug Safety ◽  
2003 ◽  
Vol 26 (5) ◽  
pp. 363-364 ◽  
Author(s):  
David E Lilienfeld ◽  
Savian Nicholas ◽  
Daniel J Macneil ◽  
Olga Kurjatkin ◽  
Thomas Gelardin
Author(s):  
Yehuda Lindell

The increasing use of data mining tools in both the public and private sectors raises concerns regarding the potentially sensitive nature of much of the data being mined. The utility to be gained from widespread data mining seems to come into direct conflict with an individual’s need and right to privacy. Privacy preserving data mining solutions achieve the somewhat paradoxical property of enabling a data mining algorithm to use data without ever actually “seeing” it. Thus, the benefits of data mining can be enjoyed, without compromising the privacy of concerned individuals.


Author(s):  
Yehuda Lindell

The increasing use of data-mining tools in both the public and private sectors raises concerns regarding the potentially sensitive nature of much of the data being mined. The utility to be gained from widespread data mining seems to come into direct conflict with an individual’s need and right to privacy. Privacy-preserving data-mining solutions achieve the somewhat paradoxical property of enabling a data-mining algorithm to use data without ever actually seeing it. Thus, the benefits of data mining can be enjoyed without compromising the privacy of concerned individuals.


2022 ◽  
Vol 21 (4) ◽  
pp. 346-363
Author(s):  
Hubert Anysz

The use of data mining and machine learning tools is becoming increasingly common. Their usefulness is most noticeable with large datasets, where sought-after information or new relationships must be extracted from noise. As these tools develop, datasets with far fewer records, usually associated with specific phenomena, are also being explored. This specificity most often makes it impossible to increase the number of cases, which would otherwise facilitate the search for dependences in the phenomena under study. The paper discusses the features of applying selected tools to a small dataset. It presents methods of data preparation and methods for calculating the performance of the tools that take into account the specifics of databases with a small number of records. The author proposes techniques that helped to break deadlocks in calculations, i.e., situations in which the results were much worse than expected. The need for methods that improve the accuracy of forecasts and of classification arose from the small amount of analysed data. This paper is not a review of popular methods of machine learning and data mining; nevertheless, the collected and presented material will help the reader to shorten the path to obtaining satisfactory results when using the described computational methods.


Author(s):  
Satish Kumar David ◽  
Amr T. M. Saeb ◽  
Mohamed Rafiullah ◽  
Khalid Rubeaan

Increasing volumes of data, together with the increased availability of information, mandate the use of data mining techniques to gather useful information from datasets. In this chapter, data mining techniques are described with a special emphasis on classification, an important supervised learning technique. Bioinformatics tools for medical applications, especially in medical microbiology, are discussed. This chapter presents WEKA software as a tool of choice for performing classification analysis on different kinds of available data. Uses of WEKA data mining tools for biological applications such as genomic analysis and for medical applications such as diabetes are discussed. Data mining offers novel tools for medical applications in infectious diseases; it can help in identifying the pathogen and analyzing the drug resistance pattern. For non-communicable diseases such as diabetes, it provides excellent options for analyzing large volumes of data from many clinical studies.


2020 ◽  
pp. 13-19
Author(s):  
R.T. Alimkhanov ◽  
V.V. Rozhkova ◽  
R.F. Mazitov ◽  
...  

2018 ◽  
Vol 173 ◽  
pp. 03013
Author(s):  
Igor Kirilyuk ◽  
Anna Kuznetsova ◽  
Oleg Senko

The paper discusses problems associated with the use of data mining tools to study discrepancies between countries with different types of institutional matrices, using a variety of potential explanatory variables: climate, economic, or infrastructure indicators. An approach is presented that is based on the search for statistically valid regularities describing the dependence of the institutional type on a single variable or a pair of variables. Examples of regularities are given.

