visual data mining
Recently Published Documents


TOTAL DOCUMENTS

252
(FIVE YEARS 20)

H-INDEX

18
(FIVE YEARS 2)

Author(s):  
Anusha Apparaju

The ultimate aim of every industry and organization is to make a profit by attracting a greater number of customers. To achieve this, they need to analyse customers' priorities. This is usually done by marketing everywhere: on social media, in newspapers, on websites, and so on. Marketers keep advertising on many platforms without knowing whether the advertisement is effective there, and heavy investment in advertising on the wrong platforms leads to lower profits. Companies also need to deliver customer service in a way that preserves the customer's interest and trust and maintains a healthy relationship. If the same level of service is offered to every customer, the company may end up serving low-profit customers while keeping high-profit customers waiting. To avoid such discrepancies, customers can be categorised by priority. This can be achieved with the k-means clustering algorithm: since the customer data are unlabelled, k-means can group them into clusters. If labelled data are available, the priority of a new customer can also be predicted with the k-nearest neighbours algorithm. Exploratory data analysis with visual representations makes the data easier to understand [3][4][6] and reduces the time the analysis takes.
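
A minimal, hypothetical sketch of the segmentation idea described above: k-means groups unlabelled customer records into priority clusters, and, once cluster labels exist, k-nearest neighbours can assign a new customer to a group. The feature names (annual_spend, visit_frequency), the synthetic data, and k = 3 are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic customer records: columns are [annual_spend, visit_frequency].
low  = rng.normal([500, 5],   [100, 2], size=(100, 2))
mid  = rng.normal([2000, 20], [300, 5], size=(100, 2))
high = rng.normal([5000, 50], [500, 8], size=(100, 2))
customers = np.vstack([low, mid, high])

# Standardize so both features contribute comparably to the distance metric.
scaler = StandardScaler().fit(customers)
X = scaler.transform(customers)

# Group customers into priority clusters (k = 3 chosen here for illustration).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster sizes:", np.bincount(kmeans.labels_))

# With labels available, k-nearest neighbours can place a new customer
# into one of the existing priority groups.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, kmeans.labels_)
new_customer = scaler.transform([[3500, 35]])
print("Predicted priority cluster:", knn.predict(new_customer)[0])
```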


2021 ◽  
Vol 117 (3/4) ◽  
Author(s):  
Annah V. Bengesai ◽  
Jonathan Pocock

Globally, there is growing concern about student progression in most higher education institutions. In this study, we examined patterns of persistence among students who began their engineering degree at the University of KwaZulu-Natal (UKZN) in 2012 and 2013. The sample was restricted to 1370 incoming students who were tracked to 2019, allowing a 7-year graduation window for the initial cohort. The data were analysed using descriptive statistics as well as the decision tree approach, a highly visual data-mining technique that helps identify subgroups and relationships which are often difficult to detect through traditional statistical methods. The results indicate that up to 50% of students enrolled in the School of Engineering had chosen engineering as their first choice. Approximately 40% had persisted in engineering, 50% had withdrawn by the time of this survey, and the remaining 10% were still registered in the engineering programme. Departure from engineering occurs most often in the first year, while graduation most likely occurs after five years of registration. Student persistence in engineering can also be classified based on first-year accumulated credits, admission point scores, race, and financial aid, of which first-year accumulated credits is the most critical factor. Overall, our study suggests that understanding failure in the first year might be the missing link in our understanding of student persistence in engineering.
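
As an illustration of the decision-tree approach described above, the sketch below fits a shallow tree to synthetic student records. The column names echo factors named in the abstract (first-year accumulated credits, admission point score, financial aid), but the data, thresholds, and outcome model are invented for demonstration only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "first_year_credits": rng.integers(0, 129, n),
    "admission_points":   rng.integers(28, 49, n),
    "financial_aid":      rng.integers(0, 2, n),
})
# Synthetic outcome: persistence made more likely by higher first-year credits.
p = 1 / (1 + np.exp(-(df["first_year_credits"] - 64) / 20))
df["persisted"] = rng.random(n) < p

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="persisted"), df["persisted"], random_state=1)

# A shallow tree keeps the splits readable, mirroring the "visual" appeal
# of the technique for identifying subgroups.
tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_train, y_train)
print(export_text(tree, feature_names=list(X_train.columns)))
print("Held-out accuracy:", tree.score(X_test, y_test))
```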


Metabolites ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 44
Author(s):  
Zhiqiang Pang ◽  
Guangyan Zhou ◽  
Jasmine Chong ◽  
Jianguo Xia

The novel coronavirus SARS-CoV-2 has spread across the world since 2019, causing a global pandemic. The pathogenesis of the viral infection and the associated clinical presentations depend primarily on host factors such as age and immunity, rather than the viral load or its genetic variations. A growing number of omics studies have been conducted to characterize the host immune and metabolic responses underlying the disease progression. Meta-analyses of these datasets have great potential to identify robust molecular signatures to inform clinical care and to facilitate therapeutics development. In this study, we performed a comprehensive meta-analysis of publicly available global metabolomics datasets obtained from three countries (United States, China and Brazil). To overcome high heterogeneity inherent in these datasets, we have (a) implemented a computational pipeline to perform consistent raw spectra processing; (b) conducted meta-analyses at pathway levels instead of individual feature levels; and (c) performed visual data mining on consistent patterns of change between disease severities for individual studies. Our analyses have yielded several key metabolic signatures characterizing disease progression and clinical outcomes. Their biological interpretations were discussed within the context of the current literature. To the best of our knowledge, this is the first comprehensive meta-analysis of global metabolomics datasets of COVID-19.
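
The study itself relies on a dedicated pipeline for raw spectra processing and pathway-level scoring. As a loose, hypothetical illustration of the pathway-level meta-analysis idea, the sketch below combines per-study pathway p-values with Fisher's method; the pathway names and p-values are invented placeholders, and Fisher's method is one common choice rather than necessarily the method used in the paper.

```python
from scipy.stats import combine_pvalues

# Placeholder p-values per pathway from three hypothetical studies
# (e.g., the US, China, and Brazil cohorts mentioned in the abstract).
pathway_pvalues = {
    "pathway_A": [0.003, 0.020, 0.041],
    "pathway_B": [0.300, 0.450, 0.120],
}

# Fisher's method pools the evidence across studies for each pathway.
for pathway, pvals in pathway_pvalues.items():
    stat, p_combined = combine_pvalues(pvals, method="fisher")
    print(f"{pathway}: combined p = {p_combined:.4f}")
```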


2020 ◽  
Vol 9 (3) ◽  
pp. 91-95
Author(s):  
Chen Qian ◽  
Jayesh P. Rai ◽  
Jianmin Pan ◽  
Aruni Bhatnagar ◽  
Craig J. McClain ◽  
...  

Machine learning has become a trending topic that almost every research area would like to incorporate into its studies. In this paper, we demonstrate several machine learning models using two different data sets. One data set is the thermogram time-series data from a cancer study conducted at the University of Louisville Hospital, and the other is from the world-renowned Framingham Heart Study. Thermograms can be used to determine a patient's health status, yet the difficulty of analyzing such high-dimensional data means they are rarely applied, especially in cancer research. Previously, Rai et al.1 proposed an approach for data reduction along with a comparison of parametric, non-parametric (KNN), and semiparametric (DTW-KNN) methods for group classification. They concluded that two-group classification performs better than three-group classification, and that classification between types of cancer is somewhat challenging. The Framingham Heart Study is a well-known longitudinal dataset that includes risk factors which could potentially lead to heart disease. Previously, Weng et al.2 and Alaa et al.3 concluded that machine learning could significantly improve the accuracy of cardiovascular risk prediction. Since the original Framingham data have been thoroughly analyzed, it is interesting to see how machine learning models could improve prediction. In this manuscript, we further analyze both the thermogram and Framingham Heart Study datasets with several learning models, such as gradient boosting, neural networks, and random forests, using SAS Visual Data Mining and Machine Learning on SAS Viya. Each method is briefly discussed along with a model comparison, and the best learning model is selected based on Youden's index and the misclassification rate. For big-data inference, SAS Visual Data Mining and Machine Learning on SAS Viya, a cloud-based statistical computing solution, may become a platform of choice.
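
The paper performs its model comparison in SAS Visual Data Mining and Machine Learning on SAS Viya; the sketch below is an open-source stand-in using scikit-learn and synthetic data, comparing gradient boosting, a random forest, and a neural network by Youden's index and misclassification rate, as the abstract describes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve

# Synthetic binary-classification data standing in for the real cohorts.
X, y = make_classification(n_samples=2000, n_features=20, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

models = {
    "gradient boosting": GradientBoostingClassifier(random_state=2),
    "random forest":     RandomForestClassifier(random_state=2),
    "neural network":    MLPClassifier(max_iter=1000, random_state=2),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    prob = model.predict_proba(X_te)[:, 1]
    fpr, tpr, _ = roc_curve(y_te, prob)
    youden = np.max(tpr - fpr)              # Youden's J = sensitivity + specificity - 1
    misclass = 1 - model.score(X_te, y_te)  # misclassification rate at the 0.5 cutoff
    print(f"{name}: Youden's J = {youden:.3f}, misclassification = {misclass:.3f}")
```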


Constant streaming of data at such high volumes provides insight for many organizations, but analysing and identifying patterns in these huge volumes is difficult when the data remain in raw form. Information visualization and visual data mining help to deal with this flood of information: visual representation conveys the data and the results of the data-mining process to all stakeholders in a meaningful way. Recent developments have produced a large number of information-visualization techniques for exploring large data sets and converting them into useful information and knowledge. Observations and inspection data gathered from the chemical and gas industries pile up daily as raw data, and continuous analysis is an emerging practice in which streaming data are analysed for real-time insight and live prediction. In this paper, the use of various graphing models for the information obtained from the organization is discussed and justified, along with the value that graph and chart representations add to decision-making. Heatmaps, scattergrams, and customized radar plots present the analysed data in the required format to visualize predictions of occupational incidents in the chemical and gas industries. As a result, the graphical representation provides a higher level of confidence in the findings of the analysis, with faster processing and easier understanding.
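
A small, hypothetical sketch of the kind of heatmap and radar plot the paper describes, drawn with matplotlib over invented incident categories and counts (the actual industrial inspection data are not reproduced here); a scattergram would follow the same pattern with ax.scatter.

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented placeholder data: monthly counts of hypothetical incident types.
incident_types = ["Leak", "Fire", "Fall", "Exposure", "Equipment"]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
rng = np.random.default_rng(3)
counts = rng.integers(0, 15, size=(len(incident_types), len(months)))

fig = plt.figure(figsize=(10, 4))

# Heatmap: incident counts per type and month.
ax1 = fig.add_subplot(1, 2, 1)
im = ax1.imshow(counts, aspect="auto", cmap="Reds")
ax1.set_xticks(range(len(months)))
ax1.set_xticklabels(months)
ax1.set_yticks(range(len(incident_types)))
ax1.set_yticklabels(incident_types)
fig.colorbar(im, ax=ax1, label="incident count")

# Radar plot: total incidents per type, drawn on a polar axis.
ax2 = fig.add_subplot(1, 2, 2, projection="polar")
totals = counts.sum(axis=1).astype(float)
angles = np.linspace(0, 2 * np.pi, len(incident_types), endpoint=False)
# Close the polygon by repeating the first point.
ax2.plot(np.append(angles, angles[0]), np.append(totals, totals[0]))
ax2.set_xticks(angles)
ax2.set_xticklabels(incident_types)

plt.tight_layout()
plt.show()
```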

