Research on Latent Factor Model and its Optimization Algorithms of Machine Learning

2015 ◽  
Vol 734 ◽  
pp. 495-498
Author(s):  
Qing Feng Li

There are bottleneck problems in both supervised and unsupervised machine learning. In view of these problems, this paper attempts some meaningful exploration. The main work is as follows: the statistical analysis methods of factor analysis and latent variables are studied together with some valuable research results of typical machine learning; unsupervised analysis methods and the factor-analysis and hidden-variable methods of supervised learning are connected with typical analysis techniques; and the common characteristics of latent factor models are summarized, together with their help and contribution in revealing hidden data structures.

2020 ◽  
Vol 17 (1) ◽  
pp. 35-43
Author(s):  
A. A. Mikryukov ◽  
M. S. Gasparian ◽  
D. S. Karpov

The purpose of the study. The purpose of the study is to develop scientifically based proposals to increase the university performance indicators in the international institutional rating QS to the required values, taking into account the presence of a combination of latent (hidden) factors, the degree of achievement of the set values of the basic indicators and, as a result, the university ranking level.

Materials and methods. To achieve this goal, methods of statistical analysis (correlation-regression and factor analysis) were used, which made it possible to identify the degree of influence of latent factors on basic indicators and the main indicator (rating functional). During the study, the following tasks were solved: identification of latent factors affecting the basic indicators of the university, an assessment of their significance and degree of influence on the basic indicators, as well as their grouping. Based on the results of the correlation-regression and factor analysis, measures are formulated to achieve the specified values of the QS university institutional rating indicators.

Results. An approach to solving the problem of providing conditions for achieving the required values of university performance indicators in the international institutional ranking QS using models developed based on the methods of correlation-regression and factor analysis is proposed. Estimates of the relationship of indicators and university ranking based on the methods of correlation and regression analysis are obtained. A comparative analysis of the results obtained at the universities of the reference group is made. The problem of identifying factors that influence the change in the values of indicators is solved; the degree of this influence is assessed. Based on the results obtained, reasonable proposals have been developed to achieve the required values of the basic indicators and the rating functional of the university.

Conclusion. The results obtained in the course of the study made it possible to justify the measures necessary to solve the problem of achieving the specified performance indicators of the university. Based on the correlation model, correlation dependencies between the rating functional and basic indicators are obtained. Interpretation of the results of factor analysis allowed us to identify a set of factors that have a significant impact on the basic indicators. It is shown that measures to achieve the specified indicators must be carried out taking into account the revealed correlation dependencies between factors and basic indicators, as well as the interpretation results of the developed factor model.
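The correlation-regression step described above can be illustrated on synthetic indicator data. The sketch below is a minimal stand-in, assuming a hypothetical rating functional that is a noisy weighted sum of four basic indicators; the indicator count, weights, and sample size are invented, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for university indicator data: rows are universities,
# columns are basic indicators (hypothetical, not the QS dataset).
n_universities, n_indicators = 50, 4
X = rng.normal(size=(n_universities, n_indicators))

# A hypothetical rating functional: a weighted sum of indicators plus noise.
true_weights = np.array([0.4, 0.3, 0.2, 0.1])
rating = X @ true_weights + rng.normal(scale=0.05, size=n_universities)

# Correlation analysis: how strongly each indicator tracks the rating.
correlations = [np.corrcoef(X[:, j], rating)[0, 1] for j in range(n_indicators)]

# Regression analysis: recover the indicator weights by ordinary least squares.
weights, *_ = np.linalg.lstsq(X, rating, rcond=None)
```

With a low noise level, the recovered weights closely match the true ones, which is the sense in which regression quantifies each indicator's influence on the rating functional.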


2019 ◽  
Vol 8 (11) ◽  
pp. e298111473
Author(s):  
Hugo Kenji Rodrigues Okada ◽  
Andre Ricardo Nascimento das Neves ◽  
Ricardo Shitsuka

Decision trees are data structures or computational methods that enable nonparametric supervised machine learning and are used in classification and regression tasks. The aim of this paper is to present a comparison between the decision tree induction algorithms C4.5 and CART. A quantitative study is performed in which the two methods are compared by analyzing the following aspects: operation and complexity. The experiments showed practically equal hit percentages; in execution time for tree induction, however, the CART algorithm was approximately 46.24% slower than C4.5, and was nonetheless considered the more effective.
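A rough comparison in this spirit can be sketched with scikit-learn, whose DecisionTreeClassifier implements an optimized CART. Switching the split criterion to entropy gives only a C4.5-like stand-in (C4.5's gain ratio and rule-based pruning are not reproduced), so this is an assumption-laden sketch, not a reproduction of the paper's experiment:

```python
import time
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for name, criterion in [("C4.5-like", "entropy"), ("CART", "gini")]:
    start = time.perf_counter()
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
    tree.fit(X_train, y_train)                 # tree induction
    elapsed = time.perf_counter() - start
    scores[name] = (tree.score(X_test, y_test), elapsed)  # (hit rate, time)
```

On a small benchmark dataset like Iris, the two criteria typically yield near-identical hit percentages, mirroring the paper's finding that the differences between the algorithms show up mainly in induction time rather than accuracy.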


Animals ◽  
2020 ◽  
Vol 10 (9) ◽  
pp. 1687
Author(s):  
Giovanni P. Burrai ◽  
Andrea Gabrieli ◽  
Valentina Moccia ◽  
Valentina Zappulli ◽  
Ilaria Porcellato ◽  
...  

Canine mammary tumors (CMTs) represent a serious issue in worldwide veterinary practice, and several risk factors are variably implicated in the biology of CMTs. The present study examines the relationship between risk factors and histological diagnosis in a large CMT dataset from three academic institutions by classical statistical analysis and supervised machine learning methods. Epidemiological, clinical, and histopathological data of 1866 CMTs were included. Dogs with malignant tumors were significantly older than dogs with benign tumors (9.6 versus 8.7 years, p < 0.001). Malignant tumors were significantly larger than their benign counterparts (2.69 versus 1.7 cm, p < 0.001). Interestingly, 18% of malignant tumors were smaller than 1 cm in diameter, providing compelling evidence that the size of the tumor should be reconsidered during the assessment of the TNM-WHO clinical staging. The application of logistic regression and the machine learning model identified age and tumor size as the best predictors, with an overall diagnostic accuracy of 0.63, suggesting that these risk factors are sufficient but not exhaustive indicators of the malignancy of CMTs. This multicenter study increases the general knowledge of the main epidemiological and clinical risk factors involved in the onset of CMTs and paves the way for further investigations of these factors in association with CMTs and for the application of machine learning technology.
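The logistic-regression step lends itself to a short sketch. The code below fits a malignancy classifier on synthetic age and tumor-size data; the distributions, coefficients, and sample size are invented to mimic the modest predictive power reported (accuracy around 0.63), not drawn from the study's dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500

# Hypothetical dogs: age in years, tumor size in cm (invented stand-ins).
age = rng.normal(9.0, 2.0, n)
size = rng.gamma(2.0, 1.2, n)

# Malignancy made only weakly dependent on age and size, so the fitted
# model has limited accuracy, echoing the study's overall 0.63.
logits = 0.15 * (age - 9.0) + 0.3 * (size - 2.0)
malignant = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X = np.column_stack([age, size])
model = LogisticRegression().fit(X, malignant)
accuracy = model.score(X, malignant)
```

The weak dependence built into the synthetic labels is exactly why accuracy stays modest: when risk factors shift the malignancy probability only slightly, even a correctly specified model cannot separate the classes well.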


2015 ◽  
Vol 14 (4) ◽  
pp. 101-108
Author(s):  
Pinchao Meng ◽  
Weishi Yin ◽  
Yanzhong Li

Abstract In this paper, 12 economic indices of the software industry in 30 cities/provinces in China are used to set up an evaluation system for the competitiveness of the regional software industry. Using factor analysis, a statistical analysis method, an evaluation model of the comprehensive competitiveness of the software industry for each city/province is built. Taking Beijing and Shanghai as reference examples, the comprehensive competitiveness and the problems of the software industry in Jilin Province are compared and analyzed.


2020 ◽  
Vol 25 (4) ◽  
pp. 174-189 ◽  
Author(s):  
Guillaume  Palacios ◽  
Arnaud Noreña ◽  
Alain Londero

Introduction: Subjective tinnitus (ST) and hyperacusis (HA) are common auditory symptoms that may become incapacitating in a subgroup of patients who thereby seek medical advice. Both conditions can result from many different mechanisms, and as a consequence, patients may report a vast repertoire of associated symptoms and comorbidities that can dramatically reduce the quality of life and even lead to suicide attempts in the most severe cases. The present exploratory study is aimed at investigating patients’ symptoms and complaints using an in-depth statistical analysis of patients’ natural narratives in a real-life environment in which, thanks to the anonymization of contributions and the peer-to-peer interaction, it is supposed that the wording used is totally free of any self-limitation and self-censorship. Methods: We applied a purely statistical, non-supervised machine learning approach to the analysis of patients’ verbatim posts exchanged on an Internet forum. After automated data extraction, the dataset was preprocessed to make it suitable for statistical analysis. We used a variant of the Latent Dirichlet Allocation (LDA) algorithm to reveal clusters of symptoms and complaints of HA patients (topics). The probability distribution of words within a topic uniquely characterizes it. The convergence of the log-likelihood of the LDA model was reached after 2,000 iterations. Several statistical parameters were tested for topic modeling and the word relevance factor within each topic. Results: Despite a rather small dataset, this exploratory study demonstrates that patients’ free speech available on the Internet constitutes valuable material for machine learning and statistical analysis aimed at categorizing ST/HA complaints. The LDA model with K = 15 topics seems to be the most relevant in terms of relative weights and correlations, with the capability to individualize subgroups of patients displaying specific characteristics.
The study of the relevance factor may be useful to unveil weak but important signals that are present in patients’ narratives. Discussion/Conclusion: We claim that the LDA non-supervised approach would make it possible to gain knowledge of the patterns of ST- and HA-related complaints and of patient-centered domains of interest. The merits and limitations of the LDA algorithm are compared with other natural language processing methods and with more conventional methods of qualitative analysis of patients’ output. Future directions and research topics emerging from this innovative algorithmic analysis are proposed.
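The LDA pipeline described above can be sketched with scikit-learn. The toy corpus below stands in for the forum posts (the real study used patient narratives and K = 15, far too many topics for a six-post example); everything here is an invented illustration of the technique, not the paper's implementation:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-in for anonymized forum posts (invented examples).
posts = [
    "ringing in my ears keeps me awake at night",
    "loud sounds are painful and I avoid concerts",
    "the ringing got worse after stress at work",
    "everyday noise feels unbearably loud to me",
    "sleep is difficult because of the constant ringing",
    "I wear earplugs because ordinary sounds hurt",
]

# Bag-of-words counts, the usual input representation for LDA.
counts = CountVectorizer(stop_words="english").fit_transform(posts)

# Fit LDA with a small K; the study settled on K = 15 for its corpus.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each topic is a probability distribution over the vocabulary.
topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
doc_topic = lda.transform(counts)  # per-post topic mixture
```

The topic-word distributions are what "uniquely characterize" each topic in the abstract's sense; ranking words within a topic by these probabilities (or by a relevance-weighted variant) recovers the clusters of complaints.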


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yu Wu ◽  
Ying Wang ◽  
Jiazhen Hu ◽  
Yan Dang ◽  
Yuanyuan Zhang ◽  
...  

Abstract

Background: Breastfeeding plays an important role in the early stages of human life and throughout the development process. Breastfeeding competency is a pregnant woman’s self-assessment of her overall competency to breastfeed, which can predict her breastfeeding behaviours. However, a valid and reliable scale for assessing breastfeeding competency has not yet been developed and validated. This study was conducted to develop and validate an assessment scale designed to assess pregnant women’s breastfeeding competency in the third trimester: the Breastfeeding Competency Scale (BCS).

Methods: The BCS was developed and validated over three phases between September 2018 and September 2019, and these phases included item statistical analysis, exploratory factor analysis (EFA), content validation, internal consistency assessment, split-half reliability assessment and confirmatory factor analysis (CFA).

Results: The item statistical analysis and EFA resulted in 38 items and 4 factors that explained 66.489% of the total variance. The Cronbach’s α coefficients for the total scale and the 4 factors were 0.970, 0.960, 0.940, 0.822 and 0.931. The split-half reliability of the BCS was 0.894 and 0.890. The CFA model showed that the 4-factor model fits the data well.

Conclusions: The BCS is a new valid and reliable instrument for assessing the breastfeeding competency of pregnant women in the third trimester.
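The internal-consistency statistic used here, Cronbach's α, is easy to compute from an item-score matrix. The sketch below uses synthetic Likert-style responses driven by a single common trait (so α should be high); the data are invented, not the BCS validation sample:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
n, k = 200, 5

# Hypothetical items all driven by one common trait plus item-specific
# noise, so the scale is internally consistent (alpha well above 0.7).
trait = rng.normal(size=(n, 1))
items = trait + 0.5 * rng.normal(size=(n, k))
alpha = cronbach_alpha(items)
```

Because every item shares the same underlying trait, the item covariances inflate the total-score variance relative to the sum of item variances, which is precisely what pushes α toward 1.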


Author(s):  
Qiao Dong ◽  
Xueqin Chen ◽  
Shi Dong ◽  
Jun Zhang

Abstract This study extracted 16 climatic data variables, including annual temperature, freeze-thaw, precipitation, and snowfall conditions, from the Long-Term Pavement Performance (LTPP) program database to evaluate the climatic regionalization for pavement infrastructure. The effect and significance of climate change were first evaluated using time as the only predictor and a t-test. It was found that both temperature and humidity increased in most states. Around one third of the 800 weather stations recorded variation of freeze and precipitation classifications, and a few of them showed a significant change of classification over time based on the results of logistic regression analyses. Three unsupervised machine learning methods, namely Principal Component Analysis (PCA), factor analysis, and cluster analysis, were conducted to identify the main components and common factors of the climatic variables and then classify the datasets into different groups. Then, two supervised machine learning methods, Fisher’s discriminant analysis and Artificial Neural Networks (ANN), were adopted to predict the climatic regions from the climatic data. Results of PCA and factor analysis show that temperature and humidity are the first two principal components and common factors, accounting for 71.6% of the variance. The 4-means clusters are wet no-freeze, dry no-freeze, dry freeze, and snow freeze. The best k-means clustering suggested 9 clusters, with more temperature-based clusters. Both Fisher’s linear discriminant analysis and ANN can effectively predict climatic regions from multiple climatic variables. ANN performs better, with a higher R-squared and a lower misclassification rate, especially with more layers and nodes.
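The unsupervised part of this pipeline (standardize, PCA, then k-means on the component scores) can be sketched as follows. The climate-like data are synthetic, generated from two latent drivers so that the first two components dominate, loosely echoing the study's 71.6% figure; none of it comes from the LTPP database:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for weather-station records: six observed variables
# (think temperature, humidity, precipitation, freeze index, ...) driven
# by two latent factors plus noise. All invented for illustration.
n_stations = 300
base = rng.normal(size=(n_stations, 2))      # two latent drivers
loadings = rng.normal(size=(2, 6))
climate = base @ loadings + 0.3 * rng.normal(size=(n_stations, 6))

X = StandardScaler().fit_transform(climate)

# PCA: variance explained by the first two components.
pca = PCA(n_components=2).fit(X)
explained = pca.explained_variance_ratio_.sum()

# k-means on the PCA scores, echoing the paper's 4-cluster solution.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    pca.transform(X)
)
```

Standardizing first matters because climatic variables live on very different scales; without it, the components would be dominated by whichever variable has the largest raw variance.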


Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 1012
Author(s):  
Sebastian Ciobanu ◽  
Liviu Ciortuz

Linear regression (LR) is a core model in supervised machine learning performing a regression task. One can fit this model using either an analytic/closed-form formula or an iterative algorithm. Fitting it via the analytic formula becomes a problem when the number of predictors is greater than the number of samples because the closed-form solution contains a matrix inverse that is not defined when having more predictors than samples. The standard approach to solve this issue is using the Moore–Penrose inverse or the L2 regularization. We propose another solution starting from a machine learning model that, this time, is used in unsupervised learning performing a dimensionality reduction task or just a density estimation one—factor analysis (FA)—with one-dimensional latent space. The density estimation task represents our focus since, in this case, it can fit a Gaussian distribution even if the dimensionality of the data is greater than the number of samples; hence, we obtain this advantage when creating the supervised counterpart of factor analysis, which is linked to linear regression. We also create its semisupervised counterpart and then extend it to be usable with missing data. We prove an equivalence to linear regression and create experiments for each extension of the factor analysis model. The resulting algorithms are either a closed-form solution or an expectation–maximization (EM) algorithm. The latter is linked to information theory by optimizing a function containing a Kullback–Leibler (KL) divergence or the entropy of a random variable.
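The problem the abstract starts from, and the two standard fixes it mentions, can be shown in a few lines of numpy. This sketch illustrates only the baseline approaches (Moore-Penrose pseudoinverse and L2 regularization), not the paper's FA-based alternative:

```python
import numpy as np

rng = np.random.default_rng(0)

# More predictors than samples: X^T X is singular, so the textbook
# closed form (X^T X)^{-1} X^T y is undefined.
n_samples, n_predictors = 10, 25
X = rng.normal(size=(n_samples, n_predictors))
w_true = rng.normal(size=n_predictors)
y = X @ w_true

# Standard fix 1: Moore-Penrose pseudoinverse (minimum-norm solution).
w_pinv = np.linalg.pinv(X) @ y

# Standard fix 2: L2 (ridge) regularization makes the matrix invertible.
lam = 1e-3
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n_predictors), X.T @ y)
```

Both fixes interpolate the training data here (the system is underdetermined, so exact fits exist); they differ in which of the infinitely many solutions they select, which is the degree of freedom the paper's factor-analysis construction exploits differently.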


2017 ◽  
Vol 6 (6) ◽  
pp. 35 ◽  
Author(s):  
Karl Schweizer ◽  
Stefan Troche ◽  
Siegbert Reiß

The paper reports an investigation of whether sums of squared factor loadings obtained in confirmatory factor analysis correspond to eigenvalues of exploratory factor analysis. The sum of squared factor loadings reflects the variance of the corresponding latent variable if the variance parameter of the confirmatory factor model is set equal to one. Hence, the computation of the sum implies a specific type of scaling of the variance. While the investigation of the theoretical foundations suggested the expected correspondence between sums of squared factor loadings and eigenvalues, the necessity of procedural specifications in the application, such as the estimation method, revealed external influences on the outcome. A simulation study was conducted that demonstrated the possibility of exact correspondence if the same estimation method was applied. However, in the majority of realized specifications the estimates showed similar sizes but no correspondence.
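The theoretical correspondence has a simple numeric core, shown below for the principal-component case: loadings scaled so the latent variance is one are eigenvectors times the square root of the eigenvalue, so the sum of squared loadings per component reproduces the eigenvalue exactly. This is a simplified illustration on a synthetic correlation matrix, not the paper's CFA/EFA simulation with differing estimation methods:

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlation matrix of synthetic standardized data.
data = rng.normal(size=(500, 6))
data[:, 1] += 0.8 * data[:, 0]          # induce some correlation
R = np.corrcoef(data, rowvar=False)

# Eigendecomposition of the correlation matrix (exploratory side).
eigvals, eigvecs = np.linalg.eigh(R)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order

# Loadings scaled so each latent variable has variance one:
# eigenvector times sqrt(eigenvalue).
factor_loadings = eigvecs * np.sqrt(eigvals)

# Sum of squared loadings per component recovers the eigenvalue.
ssl = (factor_loadings ** 2).sum(axis=0)
```

In practice the paper finds this identity survives only when both sides use the same estimation method; different estimators produce loadings on slightly different scalings, which is why the empirical sums are similar in size to the eigenvalues but not identical.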

