Preprocessing Profiling Model for Visual Analytics

Author(s):  
Alessandra Maciel Paz Milani ◽  
Fernando V. Paulovich ◽  
Isabel Harb Manssour

Analyzing and managing raw data remain a challenging part of the data analysis process, particularly during data preprocessing. Although some studies propose design implications or recommendations for visualization solutions in data analysis, they do not focus on the challenges of the preprocessing phase. Likewise, current Visual Analytics processes do not treat preprocessing as an equally important stage. With this study, we aim to contribute to the discussion of how visualization and data mining methods can be used and combined to assist data analysts during preprocessing activities. To that end, we introduce the Preprocessing Profiling Model for Visual Analytics, which comprises a set of features intended to inspire the implementation of new solutions. These features were designed from a list of insights we obtained during an interview study with thirteen data analysts. Our contributions can be summarized as offering resources to promote a shift to visual preprocessing.

2021 ◽  
Author(s):  
Ekaterina Chuprikova ◽  
Abraham Mejia Aguilar ◽  
Roberto Monsorno

Increasing challenges in agricultural production, such as climate change, environmental concerns, energy demands, and growing consumer expectations, have triggered the need for innovation using data-driven approaches such as visual analytics. Although the visual analytics concept was introduced more than a decade ago, the latest developments in data mining capacities have made it possible to fully exploit the potential of this approach and gain insights into high-complexity datasets (multi-source, multi-scale, and different stages). The current study focuses on developing prototypical visual analytics for an apple variety testing program in South Tyrol, Italy. The work aims (1) to establish a visual analytics interface able to integrate and harmonize information about apple variety testing and its interaction with climate by designing a semantic model; and (2) to create a single visual analytics user interface that can turn the data into knowledge for domain experts.

This study extends the visual analytics approach with a structured way of organizing data (ontologies), data mining, and visualization techniques to retrieve knowledge from an extensive collection of apple variety testing program and environmental data. The prototype stands on three main components: ontology, data analysis, and data visualization. Ontologies provide a representation of expert knowledge and create standard concepts for data integration, opening the possibility to share the knowledge using a unified terminology and allowing for inference. Building upon relevant semantic models (e.g., agri-food experiment ontology, plant trait ontology, GeoSPARQL), we propose to extend them based on the apple variety testing and climate data. Data integration and harmonization through an ontology-based model provides a framework for integrating relevant concepts and the relationships between them, combining data sources from different repositories, and defining a precise specification for knowledge retrieval. In addition, as the variety testing is performed at different locations, the geospatial component can enrich the analysis with spatial properties. Furthermore, the visual narratives designed within this study will give a better-integrated view of the relations among data entities and of meaningful patterns and clustering based on semantic concepts.

Therefore, the proposed approach is designed to improve decision-making about variety management through an interactive visual analytics system that can answer "what" and "why" questions about fruit-growing activities. The prototype has the potential to go beyond traditional ways of organizing data by creating an advanced information system able to manage heterogeneous data sources and to provide a framework for more collaborative scientific data analysis. This study unites various interdisciplinary aspects, in particular Big Data analytics in the agricultural sector and visual methods; thus, the findings will contribute to the EU priority program in digital transformation in the European agricultural sector.

This project has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 894215.


2021 ◽  
Vol 13 (0203) ◽  
pp. 78-81
Author(s):  
Ashish P. Joshi ◽  
Biraj V. Patel

Models and patterns for real-time data mining play an important role in decision making. Meaningful real-time data mining depends on the quality of the data, whereas the data available in a warehouse are raw. Warehouse data can be in any format; they may be huge or unstructured. Such data require some processing to improve the efficiency of data analysis. The process of making data ready to use is called data preprocessing. It can involve many activities, such as data transformation, data cleaning, data integration, data optimization, and data conversion, which are used to convert raw data into quality data. Data preprocessing techniques are a vital step in data mining: the analysis results are only as good as the data quality. This paper discusses the different data preprocessing techniques that can be used to prepare quality data for analysis from the available raw data.
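
Two of the activities named in this abstract, data cleaning and data transformation, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' method: the record fields (`age`, `city`), the sample values, and the helper names `clean_records` and `min_max_scale` are all hypothetical.

```python
def clean_records(records):
    """Data cleaning: drop rows with missing values and exact duplicates."""
    seen = set()
    cleaned = []
    for row in records:
        if any(value is None or value == "" for value in row.values()):
            continue  # incomplete row
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # duplicate of a row already kept
        seen.add(key)
        cleaned.append(row)
    return cleaned


def min_max_scale(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - low) / (high - low) for v in values]


# Made-up raw records (hypothetical fields and values).
raw = [
    {"age": 20, "city": "A"},
    {"age": None, "city": "A"},  # missing value
    {"age": 20, "city": "A"},    # duplicate of the first row
    {"age": 40, "city": "B"},
    {"age": 30, "city": "B"},
]
cleaned = clean_records(raw)
scaled_ages = min_max_scale([row["age"] for row in cleaned])
```

Cleaning runs before scaling so that missing values and duplicates do not distort the minimum and maximum used for the transformation.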


2008 ◽  
pp. 449-468 ◽  
Author(s):  
Bernd Knobloch

This chapter introduces a framework for organizational data analysis suited for data-driven and hypotheses-driven problems. It shows why knowledge discovery and hypothesis verification are complementary approaches and how they can be chained together. It presents a methodology for organizational data analysis including a comprehensive processing scheme. Employing a plug-in metaphor, data analysis process engineering is introduced as a way to set up data analysis processes based on taxonomies of tasks that have to be performed during data analysis and on the idea of re-using experience from past data analysis projects. The framework aims at increasing the benefits of data mining and other data analysis approaches, by allowing a wider range of business problems to be tackled and by providing the users with structured guidance for planning and running analyses.


2011 ◽  
pp. 334-356 ◽  
Author(s):  
Bernd Knobloch

This chapter introduces a framework for organizational data analysis suited for data-driven and hypotheses-driven problems. It shows why knowledge discovery and hypothesis verification are complementary approaches and how they can be chained together. It presents a methodology for organizational data analysis including a comprehensive processing scheme. Employing a plug-in metaphor, data analysis process engineering is introduced as a way to set up data analysis processes based on taxonomies of tasks that have to be performed during data analysis and on the idea of re-using experience from past data analysis projects. The framework aims at increasing the benefits of data mining and other data analysis approaches, by allowing a wider range of business problems to be tackled and by providing the users with structured guidance for planning and running analyses.


Author(s):  
Umadevi S ◽  
NirmalaSugirthaRajini

Nowadays, data mining concepts are applied in various fields, such as medicine, agriculture, and production. Creating clusters is one of the major problems in the data analysis process. Various clustering algorithms are used for data analysis, depending on the application. DBSCAN is a well-known clustering method. This article describes the DBSCAN clustering concept applied to a production database. The main objective of this research article is to collect and group related data from a large amount of data and to remove unwanted data. The clustering algorithm removes the unwanted attributes and groups the related data based on density value.
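
The density-based grouping described in this abstract can be illustrated with a compact, pure-Python version of the textbook DBSCAN procedure. It is a sketch, not the article's implementation: the sample points and the parameter values (`eps=0.5`, `min_pts=3`) are assumptions chosen for illustration, and the names `dbscan` and `region_query` are hypothetical.

```python
import math

NOISE = -1


def region_query(points, index, eps):
    """Return indices of all points within distance eps of points[index]."""
    return [
        j for j, other in enumerate(points)
        if math.dist(points[index], other) <= eps
    ]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Made-up 2-D points: two dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
    (5.0, 5.0), (5.1, 5.2), (4.9, 5.1),
    (9.0, 0.0),
]
labels = dbscan(points, eps=0.5, min_pts=3)
```

Points labeled `NOISE` belong to no dense region and correspond to the "unwanted data" that the abstract says is removed.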


2018 ◽  
Vol 3 (1) ◽  
pp. 001
Author(s):  
Zulhendra Zulhendra ◽  
Gunadi Widi Nurcahyo ◽  
Julius Santony

This study uses a Data Mining technique, namely K-Means Clustering. Data Mining can be used to analyze a sufficiently large body of data, here with the aim of enabling Indocomputer to understand and classify service data based on customer complaints using the Weka software. The study uses the K-Means Clustering algorithm to predict or classify complaints about hardware damage at Indocomputer Payakumbuh. It also identifies the laptop brands most often serviced at Indocomputer Payakumbuh, as one of the recommendations to consumers for selecting a laptop.
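
The study itself ran K-Means in Weka. As a rough illustration of what the algorithm does, the following plain-Python sketch implements the standard assign-and-update loop (Lloyd's algorithm). Everything specific here is an assumption: the two numeric features (device age in years, number of prior repairs), the sample values, the function name `k_means`, and the naive choice of the first `k` points as initial centroids.

```python
import math


def k_means(points, k, iterations=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    # Naive deterministic initialization (an assumption made for this sketch).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if not members:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
                continue
            new_centroids.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
            )
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


# Made-up service records: (device age in years, number of prior repairs).
complaints = [
    (1.0, 0.0), (1.5, 1.0), (1.0, 1.0),
    (6.0, 4.0), (7.0, 5.0), (6.5, 4.0),
]
centroids, labels = k_means(complaints, k=2)
```

Real complaint data would first need to be encoded as numeric features, and results depend on the initial centroids, so a practical run would try several initializations.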


2020 ◽  
Vol 4 ◽  
pp. 97-100
Author(s):  
A.P. Pronichev ◽  

The article discusses the architecture of a system for collecting and analyzing heterogeneous data from social networks. This architecture is a distributed system of subsystem modules, each of which is responsible for a separate task. The system also allows you to use external systems for data analysis, providing the necessary interface abstraction for connection. This allows for more flexible customization of the data analysis process and reduces development, implementation and support costs.
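
The idea of an interface abstraction for connecting external analysis systems can be sketched as follows. This is a hypothetical illustration, not the architecture from the article: the names `Analyzer`, `PostCountAnalyzer`, `Pipeline`, and the two collector functions are invented for the example.

```python
from typing import Callable, Protocol


class Analyzer(Protocol):
    """Hypothetical interface an external analysis system would implement."""

    def analyze(self, records: list[dict]) -> dict:
        ...


class PostCountAnalyzer:
    """Toy analyzer: counts collected posts per source network."""

    def analyze(self, records: list[dict]) -> dict:
        counts: dict = {}
        for record in records:
            counts[record["source"]] = counts.get(record["source"], 0) + 1
        return counts


class Pipeline:
    """Wires independent collector modules to pluggable analyzers."""

    def __init__(
        self,
        collectors: list[Callable[[], list[dict]]],
        analyzers: dict[str, Analyzer],
    ):
        self.collectors = collectors
        self.analyzers = analyzers

    def run(self) -> dict:
        # Each collector module is responsible only for its own source.
        records = [r for collect in self.collectors for r in collect()]
        # Every analyzer sees the same records through the same interface.
        return {name: a.analyze(records) for name, a in self.analyzers.items()}


def collect_network_a() -> list[dict]:
    return [{"source": "network_a", "text": "hello"}]


def collect_network_b() -> list[dict]:
    return [
        {"source": "network_b", "text": "hi"},
        {"source": "network_b", "text": "hey"},
    ]


pipeline = Pipeline(
    [collect_network_a, collect_network_b],
    {"post_count": PostCountAnalyzer()},
)
report = pipeline.run()
```

Because the pipeline depends only on the `analyze` method, an external system can be swapped in by wrapping it in a class with that method, without changing the collectors.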

