Categorical Data Skyline Using Classification Tree

Author(s):  
Wookey Lee ◽  
Justin JongSu Song ◽  
Carson K. -S. Leung
2020 ◽  
Vol 90 (3-4) ◽  
pp. 195-199 ◽  
Author(s):  
Gaelle Chevallereau ◽  
Mathilde Legeay ◽  
Guillaume T. Duval ◽  
Spyridon N. Karras ◽  
Bruno Fantino ◽  
...  

Abstract. Despite the high prevalence of hypovitaminosis D in older adults, universal vitamin D supplementation is not recommended because of the potential risk of intoxication. Our aim here was to determine the clinical profiles of older community-dwellers with hypovitaminosis D, with a view to building novel strategies to screen for and supplement those with hypovitaminosis D. A classification tree (CHAID analysis) was performed on multiple datasets collected in a standardized manner from 1991 older French community-dwelling volunteers aged ≥ 65 years in 2009–2012. Hypovitaminosis D was defined as serum 25-hydroxyvitamin D ≤ 50 nmol/L. CHAID analysis retained 5 clinical profiles of older community-dwellers with different risks of hypovitaminosis D, up to 87.3%, based on various combinations of the following characteristics: polymorbidity, obesity, sadness and gait disorders. For instance, the probability of hypovitaminosis D was 1.42-fold higher [95% CI: 1.27–1.59] for those with polymorbidity and gait disorders compared to those with no polymorbidity, no obesity and no sadness. In conclusion, these easily recordable measures may be used in clinical routine to identify older community-dwellers for whom vitamin D supplementation should be initiated.
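
The "1.42-fold higher [95% CI: 1.27–1.59]" figure above is a ratio of two probabilities with a confidence interval. The sketch below shows one common way such a ratio can be computed, a risk ratio with a Wald interval on the log scale. The abstract does not say which method the authors used, so this is an assumption, and the function name `risk_ratio` and the counts are hypothetical and not taken from the study.

```python
import math

def risk_ratio(events_exposed, n_exposed, events_reference, n_reference, z=1.96):
    """Return the risk ratio and a Wald confidence interval computed on the log scale."""
    risk_exposed = events_exposed / n_exposed
    risk_reference = events_reference / n_reference
    rr = risk_exposed / risk_reference
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_reference - 1 / n_reference
    )
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts, invented for illustration; NOT data from the study.
rr, lower, upper = risk_ratio(80, 100, 40, 100)
print(f"RR = {rr:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

An interval that excludes 1, as the reported one does, indicates that the difference between the two profiles is unlikely to be due to chance alone at the chosen confidence level.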


2020 ◽  
Vol 64 (4) ◽  
pp. 40404-1-40404-16
Author(s):  
I.-J. Ding ◽  
C.-M. Ruan

Abstract. With rapid developments in techniques related to the internet of things, smart service applications such as voice-command-based speech recognition and smart care applications such as context-aware emotion recognition will gain much attention and may become a requirement in smart home or office environments. In such intelligent applications, recognizing the identity of a specific member in an indoor space will be a crucial issue. In this study, a combined audio-visual identity recognition approach was developed. In this approach, visual information obtained from face detection was incorporated into acoustic Gaussian likelihood calculations for constructing speaker classification trees to significantly enhance the Gaussian mixture model (GMM)-based speaker recognition method. This study considered the privacy of the monitored person and reduced the degree of surveillance. Moreover, the popular Kinect sensor device containing a microphone array was adopted to obtain acoustic voice data from the person. The proposed audio-visual identity recognition approach deploys only two cameras in a specific indoor space to perform face detection conveniently and to determine quickly the total number of people in the space. The number of people in the indoor space, obtained using face detection, was used to regulate the design of the GMM speaker classification tree. Two face-detection-regulated speaker classification tree schemes are presented for the GMM speaker recognition method in this study: the binary speaker classification tree (GMM-BT) and the non-binary speaker classification tree (GMM-NBT). The proposed GMM-BT and GMM-NBT methods achieve identity recognition rates of 84.28% and 83%, respectively; both values are higher than the rate of the conventional GMM approach (80.5%). Moreover, as the extremely complex calculations of face recognition in general audio-visual speaker recognition tasks are not required, the proposed approach is rapid and efficient, with only a slight increment of 0.051 s in the average recognition time.


2016 ◽  
Vol 7 (2) ◽  
pp. 75-80
Author(s):  
Adhi Kusnadi ◽  
Risyad Ananda Putra

Indonesia is a country with a relatively large population. Over a five-year period, the government runs an annual program to procure 1 million FLPP housing units. This program aims to provide decent homes for low-income people. FLPP housing development requires good precision and speed on the part of the developer, but it is often hampered by the bank process, because the results and speed of data processing at the bank are difficult to predict. Knowing whether consumers are able to obtain subsidized credit has many advantages: among others, developers can plan a better cash flow and can replace consumers who would be rejected before they enter the bank process. For that reason, a system was built to help developers. Many methods can be used to create this application; one of them is data mining with a classification tree. Under 10-fold cross-validation, the application achieved an accuracy of 92%. Index Terms: Data Mining, Classification Tree, Housing, FLPP, 10-fold Cross-Validation, Consumer Capability
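
The 10-fold cross-validation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' system: the classifier is a depth-1 classification tree (a stump) standing in for a full tree, and the feature names, records, and helper functions (`k_fold_indices`, `train_stump`, `predict_stump`, `cross_validate`) are all hypothetical.

```python
from collections import Counter, defaultdict

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def train_stump(samples, labels):
    """Fit a depth-1 tree: keep the feature whose per-value majority vote fits the training data best."""
    default = Counter(labels).most_common(1)[0][0]
    best = None
    for f in range(len(samples[0])):
        votes = defaultdict(Counter)
        for x, y in zip(samples, labels):
            votes[x[f]][y] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in votes.items()}
        hits = sum(rule[x[f]] == y for x, y in zip(samples, labels))
        if best is None or hits > best[0]:
            best = (hits, f, rule)
    return best[1], best[2], default

def predict_stump(model, x):
    """Predict with the stump; unseen feature values fall back to the overall majority label."""
    feature, rule, default = model
    return rule.get(x[feature], default)

def cross_validate(samples, labels, k=10):
    """Average held-out accuracy over k folds, retraining on the other k-1 folds each time."""
    scores = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = train_stump([samples[i] for i in train_idx],
                            [labels[i] for i in train_idx])
        correct = sum(predict_stump(model, samples[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

# Hypothetical applicants (income band, job type); invented for illustration only.
samples = [("low", "contract"), ("high", "permanent"),
           ("high", "contract"), ("low", "permanent")] * 5
labels = ["approved" if income == "high" else "rejected" for income, _ in samples]
accuracy = cross_validate(samples, labels, k=10)
print(f"mean 10-fold accuracy: {accuracy:.2f}")
```

Because every record is held out exactly once, the averaged score estimates accuracy on unseen applicants rather than on the data used for training.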


2014 ◽  
Vol 24 (11) ◽  
pp. 2628-2641 ◽  
Author(s):  
Li-Fei CHEN ◽  
Gong-De GUO

2020 ◽  
Vol 13 (5) ◽  
pp. 1020-1030
Author(s):  
Pradeep S. ◽  
Jagadish S. Kallimani

Background: With the advent of data analysis and machine learning, there is a growing impetus for analyzing and generating models on historic data. The data comes in numerous forms and shapes, with an abundance of challenges. The most sought-after form of data for analysis is numerical data. With the plethora of algorithms and tools available, it is quite manageable to deal with such data. Another form of data is categorical in nature, which is subdivided into ordinal (order wise) and nominal (name wise). This data can be broadly classified as sequential and non-sequential. Sequential data is easier to preprocess using algorithms. Objective: This paper addresses the challenge of applying machine learning algorithms to categorical data of a non-sequential nature. Methods: Applying several data analysis algorithms to such data yields biased results, which makes it impossible to generate a reliable predictive model. In this paper, we address this problem by walking through a handful of techniques which, during our research, helped us deal with large categorical data of a non-sequential nature. In subsequent sections, we discuss the possible implementable solutions and the shortfalls of these techniques. Results: The methods are applied to sample datasets available in the public domain, and the results with respect to classification accuracy are satisfactory. Conclusion: The best pre-processing technique we observed in our research is one-hot encoding, which breaks the categorical features down into binary features and feeds them into an algorithm to predict the outcome. The example that we took is not abstract; it is a real-time production services dataset with many complex variations of categorical features. Our future work includes creating a robust model on such data and deploying it in industry-standard applications.
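
One-hot encoding, the pre-processing technique named in the conclusion above, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function `one_hot_encode`, the column names, and the records are hypothetical.

```python
def one_hot_encode(rows, columns):
    """Expand each categorical column into one binary column per observed category."""
    # Collect the sorted set of categories seen in each column.
    categories = {col: sorted({row[col] for row in rows}) for col in columns}
    header = [f"{col}={cat}" for col in columns for cat in categories[col]]
    encoded = [
        [1 if row[col] == cat else 0 for col in columns for cat in categories[col]]
        for row in rows
    ]
    return header, encoded

# Hypothetical records, invented for illustration only.
records = [
    {"service": "billing", "region": "north"},
    {"service": "search", "region": "south"},
    {"service": "billing", "region": "south"},
]
header, encoded = one_hot_encode(records, ["service", "region"])
for name, column in zip(header, zip(*encoded)):
    print(name, list(column))
```

Each row ends up with exactly one 1 per original column, so no artificial ordering is imposed on nominal categories. The cost is that the number of columns grows with the number of distinct categories.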


1986 ◽  
Vol 62 (2) ◽  
pp. 192 ◽  
Author(s):  
Joel L. Horowitz ◽  
Neil Wrigley
