Visualization of Clusters in an Educational Data Set Based on Convex-Hull Shape Preservation Algorithm

Author(s):  
Marcelo Keese Albertini ◽  
André Ricardo Backes

We study the problem of visualizing clusters in an educational data set using a convex-hull shape preservation algorithm. The problem concerns multidimensional data with pre-established classes, with the requirement that elements of different classes be presented in distinct regions. Such problems are commonly found in economic and social data, where visualization is important for understanding a phenomenon before further analysis. In this paper, we propose an algorithm that uses a nonlinear transformation to preserve some distance properties of the data and display them in a format convenient for interpretation. The proposed visualization algorithm is a partition-conforming projection, as defined by Kleinberg [An impossibility theorem for clustering, Adv. Neural Inform. Processing Syst. 15: Proc. 2002 Conf., 2003, The MIT Press, p. 463.], and completely separates the convex hulls of the data classes by applying locally linear operations. We applied this algorithm to visualize data from the Exame Nacional do Ensino Médio (ENEM), an important exam taken by over four million students of the Brazilian educational system. Results show that the proposed algorithm successfully separates otherwise unintelligible data and presents them in a form more accessible to further visual analysis.
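To make the separation property concrete, here is a minimal sketch of how one might check whether the convex hulls of two classes are disjoint once the data have been projected to 2D. It is not the authors' algorithm: it uses Andrew's monotone chain to build each hull and the separating axis theorem to test for overlap. The function names and sample points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def hulls_separated(hull_a: Sequence[Point], hull_b: Sequence[Point]) -> bool:
    """Separating axis test: True if some edge normal separates the two hulls.

    Assumes both hulls are non-degenerate (at least three vertices each).
    """
    if len(hull_a) < 3 or len(hull_b) < 3:
        raise ValueError("each hull needs at least three non-collinear points")
    for poly in (hull_a, hull_b):
        for i in range(len(poly)):
            x1, y1 = poly[i]
            x2, y2 = poly[(i + 1) % len(poly)]
            nx, ny = y2 - y1, x1 - x2  # normal of the edge (x1, y1) -> (x2, y2)
            proj_a = [nx * x + ny * y for x, y in hull_a]
            proj_b = [nx * x + ny * y for x, y in hull_b]
            if max(proj_a) < min(proj_b) or max(proj_b) < min(proj_a):
                return True
    return False


# Hypothetical projected classes: A and B are far apart, A and C overlap.
class_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
class_b = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0), (4.0, 4.0)]
class_c = [(0.5, 0.5), (2.0, 0.5), (0.5, 2.0), (2.0, 2.0)]

a_b_separated = hulls_separated(convex_hull(class_a), convex_hull(class_b))
a_c_separated = hulls_separated(convex_hull(class_a), convex_hull(class_c))
```

A check like this only verifies the outcome of a projection. It says nothing about how the projection preserves distances, which is the substance of the proposed method.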

2021 ◽  
pp. 1-11
Author(s):  
Yanan Huang ◽  
Yuji Miao ◽  
Zhenjing Da

Existing methods for multi-modal English event detection under a single data source, and for isomorphic event detection across different English data sources based on transfer learning, still need to be improved. To improve the efficiency of English event detection, this paper proposes, on the basis of a transfer learning algorithm, multi-modal event detection under a single data source and isomorphic event detection for different data sources. Moreover, by stacking multiple classification models, the paper fuses the features with one another and performs adversarial training based on the discrepancy between two classifiers, so that the distributions of data from different sources become more similar. To verify the proposed algorithm, a multi-source English event detection data set is collected. The data set is then used to evaluate the proposed method and to compare it with current mainstream transfer learning methods. Experimental analysis, convergence analysis, visual analysis and parameter evaluation demonstrate the effectiveness of the proposed algorithm.


Obesity Facts ◽  
2021 ◽  
pp. 1-11
Author(s):  
Marijn Marthe Georgine van Berckel ◽  
Saskia L.M. van Loon ◽  
Arjen-Kars Boer ◽  
Volkher Scharnhorst ◽  
Simon W. Nienhuijs

Introduction: Bariatric surgery results in both intentional and unintentional metabolic changes. In a high-volume bariatric center, extensive laboratory panels are used to monitor these changes pre- and postoperatively. Consecutive measurements of relevant biochemical markers allow exploration of the health state of bariatric patients and comparison of different patient groups. Objective: The objective of this study is to compare biomarker distributions over time between 2 common bariatric procedures, i.e., sleeve gastrectomy (SG) and gastric bypass (RYGB), using visual analytics. Methods: Both pre- and postsurgical (6, 12, and 24 months) data of all patients who underwent primary bariatric surgery were collected retrospectively. The distribution and evolution of different biochemical markers were compared before and after surgery using asymmetric beanplots in order to evaluate the effect of primary SG and RYGB. A beanplot is an alternative to the boxplot that allows an easy and thorough visual comparison of univariate data. Results: In total, 1,237 patients (659 SG and 578 RYGB) were included. The sleeve and bypass groups were comparable in terms of age and the prevalence of comorbidities. The mean presurgical BMI and the percentage of males were higher in the sleeve group. The effect of surgery on lowering of glycated hemoglobin was similar for both surgery types. After RYGB surgery, the decrease in the cholesterol concentration was larger than after SG. The enzymatic activity of aspartate aminotransferase, alanine aminotransferase, and alkaline phosphatase in sleeve patients was higher presurgically but lower postsurgically compared to bypass values. Conclusions: Beanplots allow intuitive visualization of population distributions. Analysis of this large population-based data set using beanplots suggests comparable efficacies of both types of surgery in reducing diabetes. RYGB surgery reduced dyslipidemia more effectively than SG. The trend toward a larger decrease in liver enzyme activities following SG is a subject for further investigation.


2019 ◽  
Vol 2 ◽  
pp. 1-6
Author(s):  
Wenjuan Lu ◽  
Aiguo Liu ◽  
Chengcheng Zhang

With the development of geographic information technology, the ways of obtaining geographic information are constantly expanding and the volume of spatiotemporal data is exploding, and more and more scholars have begun to work on spatiotemporal data processing and analysis. Traditional data visualization techniques, such as simple pie charts and histograms, are popular and easy to understand and can reveal the characteristics of the data themselves, but they still cannot be combined well with maps to display hidden temporal and spatial information and realize its application value. How to fully explore the spatiotemporal information contained in massive data and accurately explore the spatial distribution and variation rules of geographical things and phenomena is a key research problem at present. Based on this, this paper designs and constructs a universal thematic data visual analysis system that supports the full range of functions: data warehousing, data management, data analysis and data visualization. Taking Weifang city as the research area and starting from rainfall interpolation analysis and comprehensive population analysis, among other aspects, the authors achieve fast and efficient display of big data sets and fully present the characteristics of spatial and temporal data through thematic data visualization. The system also adopts the Cassandra distributed database to store, manage and analyze big data, which to a certain extent reduces the pressure of front-end map drawing and provides good query analysis efficiency and fast processing.


2021 ◽  
Author(s):  
Jan Michalek ◽  
Kuvvet Atakan ◽  
Christian Rønnevik ◽  
Helga Indrøy ◽  
Lars Ottemøller ◽  
...  

The European Plate Observing System (EPOS) is a European project about building a pan-European infrastructure for accessing solid Earth science data, governed now by EPOS ERIC (European Research Infrastructure Consortium). The EPOS-Norway project (EPOS-N; RCN-Infrastructure Programme - Project no. 245763) is a Norwegian project funded by the National Research Council. The aim of the Norwegian EPOS e-infrastructure is to integrate data from the seismological and geodetic networks, as well as the data from the geological and geophysical data repositories. Among the six EPOS-N project partners, four institutions are providing data: University of Bergen (UIB), Norwegian Mapping Authority (NMA), Geological Survey of Norway (NGU) and NORSAR.

In this contribution, we present the EPOS-Norway Portal as an online, open access, interactive tool, allowing visual analysis of multidimensional data. It supports maps and 2D plots with linked visualizations. Currently access is provided to more than 300 datasets (18 web services, 288 map layers and 14 static datasets) from four subdomains of Earth science in Norway. New datasets are planned to be integrated in the future. The EPOS-N Portal can access remote datasets via web services such as FDSNWS for seismological data and OGC services (e.g. WMS) for geological and geophysical data. Standalone datasets are available through preloaded data files. Users can also simply add another WMS server or upload their own dataset for visualization and comparison with other datasets. The portal provides a unique way (the first of its kind in Norway) to explore various geoscientific datasets in one common interface. One of its key aspects is quick simultaneous visual inspection of data from various disciplines and testing of scientific or geohazard-related hypotheses. One such example is the spatio-temporal correlation of earthquakes (1980 until now) with existing critical infrastructures (e.g. pipelines), geological structures, submarine landslides or unstable slopes.

The EPOS-N Portal is implemented by adapting Enlighten-web, a server-client program developed by NORCE. Enlighten-web facilitates interactive visual analysis of large multidimensional data sets, and supports interactive mapping of millions of points. The Enlighten-web client runs inside a web browser. An important element in the Enlighten-web functionality is brushing and linking, which is useful for exploring complex data sets to discover correlations and interesting properties hidden in the data. The views are linked to each other, so that highlighting a subset in one view automatically leads to the corresponding subsets being highlighted in all other linked views.
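The brushing-and-linking behaviour described above can be sketched with a shared selection model that notifies every linked view. This is only an illustration of the general idea, not the Enlighten-web implementation or its API; the class names, method names and sample values are all hypothetical.

```python
from typing import Callable, Iterable, List, Set


class LinkedSelection:
    """Shared selection: stores brushed record indices and notifies linked views."""

    def __init__(self) -> None:
        self.selected: Set[int] = set()
        self._listeners: List[Callable[[Set[int]], None]] = []

    def link(self, listener: Callable[[Set[int]], None]) -> None:
        self._listeners.append(listener)

    def brush(self, indices: Iterable[int]) -> None:
        self.selected = set(indices)
        for listener in self._listeners:
            listener(self.selected)


class View:
    """One view over a single attribute of the same underlying records."""

    def __init__(self, name: str, values: Iterable[float],
                 selection: LinkedSelection) -> None:
        self.name = name
        self.values = list(values)
        self.highlighted: List[float] = []
        self._selection = selection
        selection.link(self._on_selection)

    def _on_selection(self, selected: Set[int]) -> None:
        # Highlight this view's values for the records brushed in any view.
        self.highlighted = [self.values[i] for i in sorted(selected)]

    def brush_range(self, low: float, high: float) -> None:
        # Brushing here selects records by index, so all linked views update.
        self._selection.brush(
            i for i, v in enumerate(self.values) if low <= v <= high
        )


# Hypothetical earthquake records: one magnitude and one depth per record.
selection = LinkedSelection()
magnitude_view = View("magnitude", [2.1, 4.5, 3.3, 5.0], selection)
depth_view = View("depth_km", [10.0, 35.0, 12.0, 8.0], selection)

# Brush magnitudes between 4 and 6; the depth view highlights the same records.
magnitude_view.brush_range(4.0, 6.0)
```

Keeping the selection as a set of record indices, rather than per-view values, is what lets views over different attributes stay in sync.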


2016 ◽  
Vol 78 (11) ◽  
Author(s):  
Amolkumar Narayan Jadhav ◽  
Gomathi N.

Clustering finds a variety of applications across a wide range of disciplines because it groups similar data objects together. Owing to this wide applicability, different algorithms have been presented in the literature for segmenting large multidimensional data into discernible representative clusters. In this paper, a kernel-based exponential grey wolf optimizer (KEGWO) is developed for rapid centroid estimation in data clustering. KEGWO searches for the cluster centroids using a new objective function that considers two parameters: a logarithmic kernel function and the distance difference between the two top clusters. Based on the new objective function and the modified KEGWO algorithm, centroids are encoded as position vectors and the optimal location is found for the final clustering. The proposed KEGWO algorithm is evaluated on the banknote authentication, iris and wine data sets using four metrics: mean square error, F-measure, Rand coefficient and Jaccard coefficient. The results show that the proposed KEGWO algorithm outperformed the existing algorithms.
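Two of the evaluation metrics named above, the Rand and Jaccard coefficients, can be sketched as follows, assuming the standard pair-counting definitions; the paper may define or normalize them differently. The function names and the toy labels are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Sequence, Tuple


def pair_counts(true_labels: Sequence[Hashable],
                pred_labels: Sequence[Hashable]) -> Tuple[int, int, int, int]:
    """Count point pairs by agreement between reference classes and clusters.

    Returns (ss, sd, ds, dd):
      ss: same class, same cluster      sd: same class, different cluster
      ds: different class, same cluster dd: different class, different cluster
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            ss += 1
        elif same_true:
            sd += 1
        elif same_pred:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def rand_coefficient(true_labels, pred_labels) -> float:
    """Fraction of pairs on which the two partitions agree."""
    ss, sd, ds, dd = pair_counts(true_labels, pred_labels)
    total = ss + sd + ds + dd
    return (ss + dd) / total if total else 1.0


def jaccard_coefficient(true_labels, pred_labels) -> float:
    """Like Rand, but ignores pairs separated in both partitions."""
    ss, sd, ds, _ = pair_counts(true_labels, pred_labels)
    denom = ss + sd + ds
    # Convention chosen here: 0.0 when no pair is grouped in either partition.
    return ss / denom if denom else 0.0


# Toy example: six points, one point assigned to the wrong cluster.
reference = [0, 0, 0, 1, 1, 1]
clusters = [0, 0, 1, 1, 1, 1]
rand = rand_coefficient(reference, clusters)        # 10 of 15 pairs agree
jaccard = jaccard_coefficient(reference, clusters)  # 4 of 9 grouped pairs agree
```

Both scores depend only on which points are grouped together, so relabeling the clusters does not change them.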


2013 ◽  
Vol 1 (1) ◽  
pp. 7 ◽  
Author(s):  
Casimiro S. Munita ◽  
Lúcia P. Barroso ◽  
Paulo M.S. Oliveira

Several analytical techniques are often used in archaeometric studies, and when used in combination, these techniques can be used to assess 30 or more elements. Multivariate statistical methods are frequently used to interpret archaeometric data, but their application can be problematic and their results difficult to interpret because of the large number of variables. In general, the analyst first measures several variables, many of which may be found to be uninformative; this is naturally very time consuming and expensive. In subsequent studies the analyst may wish to measure fewer variables while attempting to minimize the loss of essential information. Such multidimensional data sets must be closely examined to draw useful information. This paper aims to describe and illustrate a stopping rule for identifying redundant variables and selecting subsets of variables that preserve the multivariate data structure, using Procrustes analysis to select those variables that are in some sense adequate for discrimination purposes. We provide an illustrative example of the procedure using a data set of 40 archaeological ceramic samples in which the concentrations of As, Ce, Cr, Eu, Fe, Hf, La, Na, Nd, Sc, Sm, Th, and U were determined by instrumental neutron activation analysis (INAA). The results showed that for this data set, only eight variables (As, Cr, Fe, Hf, La, Nd, Sm, and Th) are required to interpret the data without substantial loss of information.


2013 ◽  
Vol 12 (3-4) ◽  
pp. 291-307 ◽  
Author(s):  
Ilir Jusufi ◽  
Andreas Kerren ◽  
Falk Schreiber

Ontologies and hierarchical clustering are both important tools in biology and medicine to study high-throughput data such as transcriptomics and metabolomics data. Enrichment of ontology terms in the data is used to identify statistically overrepresented ontology terms, giving insight into relevant biological processes or functional modules. Hierarchical clustering is a standard method to analyze and visualize data to find relatively homogeneous clusters of experimental data points. Both methods support the analysis of the same data set but are usually considered independently. However, often a combined view is desired: visualizing a large data set in the context of an ontology under consideration of a clustering of the data. This article proposes new visualization methods for this task. They allow for interactive selection and navigation to explore the data under consideration as well as visual analysis of mappings between ontology- and cluster-based space-filling representations. In this context, we discuss our approach together with specific properties of the biological input data and identify features that make our approach easily usable for domain experts.
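The abstract does not say which statistical test is used to identify overrepresented ontology terms. A common choice is a one-sided hypergeometric test, sketched below as an assumption rather than as the authors' method. The function name, parameter names and example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population_size: int, annotated_in_population: int,
                       sample_size: int, annotated_in_sample: int) -> float:
    """One-sided hypergeometric p-value for overrepresentation of a term.

    Probability of drawing at least `annotated_in_sample` annotated items when
    `sample_size` items are drawn without replacement from a population of
    `population_size` items, of which `annotated_in_population` carry the term.
    """
    if not 0 <= annotated_in_population <= population_size:
        raise ValueError("annotated_in_population must lie within the population")
    if not 0 <= sample_size <= population_size:
        raise ValueError("sample_size must lie within the population")
    upper = min(annotated_in_population, sample_size)
    if not 0 <= annotated_in_sample <= upper:
        raise ValueError("annotated_in_sample is outside the possible range")
    not_annotated = population_size - annotated_in_population
    # math.comb(n, k) returns 0 when k > n, so impossible draws contribute 0.
    favourable = sum(
        comb(annotated_in_population, i) * comb(not_annotated, sample_size - i)
        for i in range(annotated_in_sample, upper + 1)
    )
    return favourable / comb(population_size, sample_size)


# Hypothetical numbers: 10 measured genes, 4 annotated with the term,
# a cluster of 3 genes, 2 of which carry the term.
p_value = enrichment_p_value(10, 4, 3, 2)
```

In practice many terms are tested at once, so the resulting p-values would normally be corrected for multiple testing before any term is reported as enriched.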

