Modelling of Cancer Patient Records: A Structured Approach to Data Mining and Visual Analytics

Author(s):  
Jing Lu ◽  
Alan Hales ◽  
David Rew

Author(s):  
Katrina E. Barkwell ◽  
Alfredo Cuzzocrea ◽  
Carson K. Leung ◽  
Ashley A. Ocran ◽  
Jennifer M. Sanderson ◽  
...  

2020 ◽  
Author(s):  
Alessandra Maciel Paz Milani ◽  
Fernando V. Paulovich ◽  
Isabel Harb Manssour

Analyzing and managing raw data remains a challenging part of the data analysis process, particularly during data preprocessing. Although some studies propose design implications or recommendations for visualization solutions within data analysis, they do not focus on the challenges of the preprocessing phase. Likewise, current Visual Analytics processes do not treat preprocessing as an equally important stage. With this study, we therefore aim to contribute to the discussion of how visualization and data mining methods can be used and combined to assist data analysts during preprocessing activities. To that end, we introduce the Preprocessing Profiling Model for Visual Analytics, which comprises a set of features intended to inspire the implementation of new solutions. These features were designed from a list of insights obtained in an interview study with thirteen data analysts. Our contributions can be summarized as offering resources to promote a shift toward visual preprocessing.


2021 ◽  
Author(s):  
Ekaterina Chuprikova ◽  
Abraham Mejia Aguilar ◽  
Roberto Monsorno

Growing challenges in agricultural production, such as climate change, environmental concerns, energy demands, and rising consumer expectations, have triggered the need for innovation using data-driven approaches such as visual analytics. Although the visual analytics concept was introduced more than a decade ago, the latest developments in data mining capacities have made it possible to fully exploit the potential of this approach and gain insights into highly complex datasets (multi-source, multi-scale, and covering different stages). The current study focuses on developing a prototypical visual analytics system for an apple variety testing program in South Tyrol, Italy. The work aims (1) to establish a visual analytics interface able to integrate and harmonize information about apple variety testing and its interaction with climate by designing a semantic model; and (2) to create a single visual analytics user interface that can turn the data into knowledge for domain experts.

This study extends the visual analytics approach with a structured way of organizing data (ontologies), data mining, and visualization techniques to retrieve knowledge from an extensive collection of apple variety testing program and environmental data. The prototype stands on three main components: ontology, data analysis, and data visualization. Ontologies provide a representation of expert knowledge and create standard concepts for data integration, opening the possibility of sharing knowledge using a unified terminology and allowing for inference. Building upon relevant semantic models (e.g., agri-food experiment ontology, plant trait ontology, GeoSPARQL), we propose to extend them based on the apple variety testing and climate data. Data integration and harmonization through an ontology-based model provides a framework for integrating relevant concepts and the relationships between them, combining data sources from different repositories, and defining a precise specification for knowledge retrieval. In addition, because the variety testing is performed at different locations, the geospatial component can enrich the analysis with spatial properties. Furthermore, the visual narratives designed within this study will give a better-integrated view of the relations between data entities and of meaningful patterns and clusters based on semantic concepts.

The proposed approach is therefore designed to improve decision-making about variety management through an interactive visual analytics system that can answer "what" and "why" questions about fruit-growing activities. The prototype has the potential to go beyond traditional ways of organizing data by creating an advanced information system able to manage heterogeneous data sources and to provide a framework for more collaborative scientific data analysis. This study unites various interdisciplinary aspects, in particular Big Data analytics in the agricultural sector and visual methods; thus, the findings will contribute to the EU priority program in digital transformation in the European agricultural sector.

This project has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 894215.


Author(s):  
Carson K. Leung ◽  
Christopher L. Carmichael ◽  
Yaroslav Hayduk ◽  
Fan Jiang ◽  
Vadim V. Kononov ◽  
...  

Web Services ◽  
2019 ◽  
pp. 618-638
Author(s):  
Goran Klepac ◽  
Kristi L. Berg

This chapter proposes a new analytical approach that consolidates the traditional analytical approach to problems such as churn detection, fraud detection, predictive modeling, and segmentation modeling with data sources and analytical techniques from the big data area. It presents solutions that offer a structured approach for integrating different concepts into one, which helps analysts as well as managers to use the potential of different areas in a systematic way. By using this concept, companies have the opportunity to bring big data potential into everyday data mining projects. As the chapter shows, neglecting big data potential often leads to incomplete analytical results, which imply incomplete information for business decisions and can lead to bad business decisions. The chapter also provides suggestions on how to recognize useful data sources from the big data area and how to analyze them along with traditional data sources to obtain higher-quality information for business decisions.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Konstantinos F. Xylogiannopoulos ◽  
Panagiotis Karampelas ◽  
Reda Alhajj

Abstract

Background: The first half of 2020 was marked by the COVID-19 pandemic, which affected almost every aspect of daily life worldwide, from the societal to the economic. To prevent the spread of COVID-19, countries implemented diverse policies regarding Non-Pharmaceutical Intervention (NPI) measures. This is because, in the first stage, countries had limited knowledge about the virus and its contagiousness, and there were no effective medications or vaccines. This paper studies the effectiveness of the implemented policies and measures against the deaths attributed to the virus between January and May 2020.

Methods: Data from the European Centre for Disease Prevention and Control on the identified cases and deaths of COVID-19 in 48 countries were used. Additionally, data on the NPI-related policies implemented by the 48 countries and on the capacity of their health care systems were collected manually from their national gazettes and official institutes. Data mining, time series analysis, pattern detection, machine learning, clustering methods and visual analytics techniques were applied to analyze the collected data and discover possible relationships between the implemented NPIs and COVID-19 spread and mortality. Further, we recorded and analyzed the responses of the countries to the COVID-19 pandemic, mainly in urban areas, which are over-populated and where COVID-19 therefore has the potential to spread more easily among humans.

Results: The data mining and clustering analysis of the collected data showed that implementing the NPI measures before the first death case seems to be very effective in controlling the spread of the disease. In other words, delaying the implementation of the NPI measures until after the first death case has little practical effect on limiting the spread of the disease. The success of the NPI measures further depends on the way each government monitored their application. Countries with stricter policing of the measures seem to be more effective in controlling the transmission of the disease.

Conclusions: The comparative data mining study provides insights into the correlation between the early implementation of the NPI measures and the control of COVID-19 contagiousness and mortality. We report a number of observations that could be helpful to decision makers or epidemiologists regarding the rapid implementation and monitoring of NPI measures in case of a future wave of COVID-19 or when dealing with other unknown infectious pandemics. Regardless, after the first wave of COVID-19, most countries decided to lift the restrictions and return to normal. This resulted in a severe second wave in some countries, a situation that requires re-evaluating the whole process and drawing lessons for the future.


Author(s):  
Giovanni Felici ◽  
Klaus Truemper

The method described in this chapter is designed for data mining and learning on logic data. This type of data is composed of records that can be described by the presence or absence of a finite number of properties. Formally, such records can be described by variables that may assume only the values true or false, usually referred to as logic (or Boolean) variables. In real applications, it may also happen that the presence or absence of some property cannot be verified for some record; in such a case we consider that variable to be unknown (the capability to formally treat data with missing values is a feature of logic-based methods). For example, to describe patient records in medical diagnosis applications, one may use the logic variables healthy, old, has_high_temperature, among many others. A very common data mining task is to find, based on training data, the rules that separate two subsets of the available records, or that explain why a record belongs to one subset or the other. For example, one may wish to find a rule that, based on the many variables observed in patient records, is able to distinguish healthy patients from sick ones. Such a rule, if sufficiently precise, may then be used to classify new data and/or to gain information from the available data. This task is often referred to as machine learning or pattern recognition and accounts for a significant portion of the research conducted in the data mining community. When the data considered is in logic form or can be transformed into it by some reasonable process, it is of great interest to determine explanatory rules in the form of combinations of logic variables, or logic formulas. In the example above, a rule derived from data could be: if (has_high_temperature is true) and (running_nose is true) then (the patient is not healthy).
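The example rule above can be sketched in Python as a small illustration. This is not the chapter's method: it only shows how one record with true, false, or unknown values might be evaluated against a single rule. Representing unknown as None and combining values with three-valued (Kleene) conjunction are assumptions made here, as are the helper names and3 and not_healthy_rule.

```python
from typing import Dict, Optional

# Assumption: a record maps each logic variable to True, False, or None (unknown).
Record = Dict[str, Optional[bool]]


def and3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued (Kleene) conjunction: False dominates, otherwise unknown propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def not_healthy_rule(record: Record) -> Optional[bool]:
    """Evaluate: has_high_temperature AND running_nose => patient is not healthy.

    Returns True if the rule fires, False if it cannot fire,
    and None if missing values leave the outcome undecided.
    """
    return and3(record.get("has_high_temperature"), record.get("running_nose"))


# Hypothetical patient records; a variable absent from a record is also treated as unknown.
records = [
    {"has_high_temperature": True, "running_nose": True},
    {"has_high_temperature": True, "running_nose": False},
    {"has_high_temperature": True, "running_nose": None},
    {"has_high_temperature": False, "running_nose": None},
]

results = [not_healthy_rule(r) for r in records]
print(results)  # [True, False, None, False]
```

Note that the fourth record is decided even though running_nose is unknown, because a single false conjunct is enough to rule the formula out.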


Author(s):  
Dario Antonelli ◽  
Elena Baralis ◽  
Giulia Bruno ◽  
Silvia Chiusano ◽  
Naeem A. Mahoto ◽  
...  

With the introduction of electronic medical records, a large amount of patients' medical data has become available. A current problem in this domain is to reverse engineer the medical treatment process in order to highlight the medical pathways typically adopted for specific health conditions. This chapter addresses the ability of sequential data mining techniques to reconstruct the actual medical pathways followed by patients. Detected medical pathways take the form of sets of exams frequently done together, sequences of exam sets frequently followed by patients, and frequent correlations between exam sets. The analysis shows that the majority of the extracted pathways are consistent with the medical guidelines, but it also reveals some unexpected results, which can be useful both to enrich existing guidelines and to improve the public health service.
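As a rough illustration of two of the pathway forms mentioned above (exams frequently done together, and exams frequently following one another), the following Python sketch counts them by brute force. It is not the chapter's algorithm, and real sequential pattern miners are far more efficient. The patient data, the exam names, and the functions frequent_exam_pairs and frequent_exam_sequences are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each patient has an ordered list of visits,
# and each visit is the set of exams performed during it.
patients = {
    "p1": [{"glucose", "hba1c"}, {"eye_exam"}],
    "p2": [{"glucose", "hba1c"}, {"eye_exam", "ecg"}],
    "p3": [{"glucose"}, {"ecg"}],
}


def frequent_exam_pairs(patients, min_support):
    """Pairs of exams done in the same visit by at least min_support patients."""
    counts = Counter()
    for visits in patients.values():
        pairs = set()
        for visit in visits:
            pairs.update(combinations(sorted(visit), 2))
        counts.update(pairs)  # each pair counts once per patient
    return {pair: n for pair, n in counts.items() if n >= min_support}


def frequent_exam_sequences(patients, min_support):
    """Ordered (earlier_exam, later_exam) pairs across visits with enough support."""
    counts = Counter()
    for visits in patients.values():
        seqs = set()
        for i, earlier in enumerate(visits):
            for later in visits[i + 1:]:
                seqs.update((a, b) for a in earlier for b in later)
        counts.update(seqs)  # each sequence counts once per patient
    return {seq: n for seq, n in counts.items() if n >= min_support}


pairs = frequent_exam_pairs(patients, min_support=2)
seqs = frequent_exam_sequences(patients, min_support=2)
print(pairs)
print(seqs)
```

Support is counted per patient rather than per visit, so a pathway repeated many times by one patient does not appear frequent on its own. That choice is an assumption of this sketch.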

