Missing data analysis using machine learning methods to predict the performance of technical students

2020 ◽  
Vol 12 (2) ◽  
pp. 134-143
Author(s):  
Gilberto De Melo Junior ◽  
Symone G. Soares Alcalá ◽  
Geovanne Pereira Furriel ◽  
Sílvio L. Vieira

Machine learning (ML) has become an emerging technology capable of solving problems in many areas, including education, medicine, robotics, and aerospace. ML is a specific field of artificial intelligence that designs computational models capable of learning from data. However, developing an ML model requires ensuring data quality, because real-world data are incomplete, noisy, and inconsistent. This article evaluates advanced methods for handling missing data, using ML algorithms to classify the performance of high school students at the Instituto Federal de Goiânia in Brazil. The goal is to provide an efficient computational tool that supports educational performance by allowing educators to check a student's tendency to fail. The results indicate that the ignore-and-discard method outperforms the other methods for handling missing data. In addition, the tests show that Sequential Minimal Optimization, Neural Networks, and Bagging outperform the other ML algorithms, such as Naive Bayes and Decision Tree, in terms of classification accuracy.
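The contrast between discarding incomplete records and filling them in can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: the record layout, the values, and the helper names `discard_incomplete` and `mean_impute` are hypothetical, and mean imputation merely stands in for the alternative methods, which the abstract does not name.

```python
from statistics import mean

def discard_incomplete(rows):
    """Ignore-and-discard (listwise deletion): keep only fully observed rows."""
    return [row for row in rows if all(v is not None for v in row)]

def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]

# Hypothetical student records: [grade, attendance rate]; None marks a missing value.
records = [
    [7.5, 0.90],
    [None, 0.80],
    [6.0, None],
    [8.5, 0.70],
]

complete = discard_incomplete(records)  # two fully observed rows remain
imputed = mean_impute(records)          # four rows, gaps filled with column means
```

Discarding keeps the data unaltered but shrinks the training set, while imputation keeps every record at the cost of introducing estimated values. Which trade-off wins depends on the dataset.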

Author(s):  
Ebru Efeoglu ◽  
Gurkan Tuna

In this chapter, traditional and innovative approaches used in hazardous liquid detection are reviewed, and a novel approach for the detection of hazardous liquids is presented. The proposed system is based on electromagnetic response measurements of liquids in the microwave frequency band. With this technique, liquids can be classified quickly without pouring the liquid out of its bottle or opening the lid. The system can detect solutions with hazardous liquid concentrations of 70% or more, as well as pure hazardous liquids. Because the system relies on machine learning methods, whose success depends on the data type and dataset provided, a performance evaluation study was carried out to find the most suitable method. In this study, naive Bayes and sequential minimal optimization were evaluated, and the results showed that naive Bayes is more suitable for liquid classification.


2021 ◽  
Author(s):  
Hong Zhao ◽  
Wanling li ◽  
Junsheng Li ◽  
Li Li ◽  
Hang Wang ◽  
...  

Abstract. Purpose: To use machine learning methods (MLMs) to predict stone-free status after percutaneous nephrolithotomy (PCNL) and to compare their performance with Guy’s stone score and the S.T.O.N.E score system. Materials and Methods: Data from 222 patients (90 females, 41%) who underwent PCNL at our center were used. Twenty-six parameters, including individual variables, renal and stone factors, and surgical factors, were used as input data for the MLMs. We evaluated the efficacy of four techniques: Lasso-logistic (LL), random forest (RF), support vector machine (SVM), and Naive Bayes. Model performance was evaluated using the area under the curve (AUC) and compared with Guy’s stone score and the S.T.O.N.E score system. Results: The overall stone-free rate was 50% (111/222). For predicting stone-free status, the receiver operating characteristic curves of all four MLMs were above the curve for Guy’s stone score. The AUCs of LL, RF, SVM, and Naive Bayes were 0.879, 0.803, 0.818, and 0.803, respectively. These values were higher than the AUC of Guy’s stone score, 0.800. The accuracies of the MLMs (0.803–0.818) were also superior to that of the S.T.O.N.E score system (0.788). Among the MLMs, Lasso-logistic showed the most favorable AUC. Conclusion: Machine learning methods can predict the stone-free rate with AUCs not inferior to those of Guy’s stone score and the S.T.O.N.E score system.
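For readers unfamiliar with the AUC used as the comparison metric here, the sketch below computes it from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The function name `roc_auc` and the four labels and scores are made up for illustration and are unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up example: 1 = stone-free, 0 = residual stones; scores are model outputs.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc(labels, scores)  # 3 of the 4 positive/negative pairs are ordered correctly
```

The quadratic pairwise loop is chosen for clarity. For large cohorts a rank-based computation gives the same value more efficiently.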


2016 ◽  
Vol 100 ◽  
pp. 731-738 ◽  
Author(s):  
A. Salcedo-Bernal ◽  
M.P. Villamil-Giraldo ◽  
A.D. Moreno-Barbosa

2018 ◽  
Vol 12 (2) ◽  
pp. 66-71
Author(s):  
A. V. Zolotaryuk ◽  
I. A. Chechneva

The authors consider the problems associated with the activities of microfinance institutions (MFIs) and ways to eliminate them. The subject of the study is the need to introduce machine learning to solve these urgent problems. Machine learning methods are increasingly being used to analyze financial and economic information, which reduces or eliminates some of the difficulties. Although these methods are not yet widely used in the field of MFIs, there are opportunities for their application. The aim of the work is to determine the prospects for the use of these methods in MFIs. The article describes the subject area of the research as it relates to MFIs. The authors identify the main groups of problems related to MFIs, consider the possibility of introducing machine learning for data analysis in this area, and determine the main directions for its possible use in MFIs. The authors conclude that such methods are applicable for assessing the performance of MFIs.


2019 ◽  
Vol 109 (2) ◽  
pp. 251-277 ◽  
Author(s):  
Nastasiya F. Grinberg ◽  
Oghenejokpeme I. Orhobor ◽  
Ross D. King

Abstract. In phenotype prediction, the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance because they are central to medicine, crop breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods (elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM)) with two state-of-the-art classical statistical genetics methods: genomic BLUP and a two-step sequential method based on linear regression. Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. In the wheat and rice studies, the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forest. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure. This suggests that standard machine learning methods need to be refined to include population structure information when it is present.
We conclude that the application of machine learning methods to phenotype prediction problems holds great promise, but that determining which methods are likely to perform well on any given problem is elusive and non-trivial.


Polymers ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 825
Author(s):  
Kaixin Liu ◽  
Zhengyang Ma ◽  
Yi Liu ◽  
Jianguo Yang ◽  
Yuan Yao

An increasing number of machine learning methods are being applied to infrared non-destructive assessment of internal defects in composite materials. However, most of them extract only linear features, which does not match the nonlinear characteristics of infrared data. Moreover, the limited number of infrared images tends to restrict the data analysis capabilities of machine learning methods. In this work, a novel generative kernel principal component thermography (GKPCT) method is proposed for defect detection in carbon fiber reinforced polymer (CFRP) composites. Specifically, a spectral normalization generative adversarial network is used to augment the thermograms for model construction. Subsequently, the KPCT method maps the features of all thermogram data using kernel principal component analysis, which allows defects and background to be differentiated in the dimensionality-reduced data. Additionally, a defect-background separation metric is designed to support the performance evaluation of data analysis methods. Experimental results on CFRP demonstrate the feasibility and advantages of the proposed GKPCT method.
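The kernel principal component analysis step mentioned above can be illustrated with a small standard-library sketch. It is not the GKPCT method: there is no generative augmentation, the RBF kernel and the `gamma` value are assumptions, the four 2-D points are made-up stand-ins for thermogram features, and the helper names are hypothetical. The leading component is found by power iteration to avoid external dependencies.

```python
import math

def rbf_kernel(points, gamma):
    """Pairwise RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in points]
        for x in points
    ]

def center_kernel(K):
    """Center the (symmetric) kernel matrix in feature space."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    grand_mean = sum(row_means) / n
    return [
        [K[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]

def first_kernel_pc(points, gamma=0.5, iterations=200):
    """Scores of each point on the first kernel principal component."""
    Kc = center_kernel(rbf_kernel(points, gamma))
    n = len(Kc)
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iterations):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("centered kernel matrix has no variance to extract")
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v estimates the leading eigenvalue.
    eigenvalue = sum(v[i] * sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n))
    # The projection of training point i onto the component is sqrt(eigenvalue) * v[i].
    return [math.sqrt(max(eigenvalue, 0.0)) * x for x in v]

# Made-up features: two "background-like" points and two "defect-like" points.
points = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
scores = first_kernel_pc(points)
```

In this toy case the two groups land on opposite sides of zero along the first component, which is the kind of defect-background separation the abstract describes in the reduced data.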


2021 ◽  
Vol 22 (8) ◽  
pp. 4107
Author(s):  
Ambarish Nag ◽  
Alida Gerritsen ◽  
Crissa Doeppke ◽  
Anne E. Harman-Ware

High-throughput analysis of biomass is necessary to ensure consistent and uniform feedstocks for agricultural and bioenergy applications and is needed to inform genomics and systems biology models. Analyses based on pyrolysis followed by mass spectrometry, such as pyrolysis-molecular beam mass spectrometry (py-MBMS), are becoming increasingly popular for the rapid analysis of biomass cell wall composition and typically require different data analysis tools depending on the need and application. Here, the authors report the py-MBMS analysis of several types of lignocellulosic biomass to gain an understanding of spectral patterns and their variation with biomass composition, and use machine learning approaches to classify, differentiate, and predict biomass types on the basis of py-MBMS spectra. Py-MBMS spectra were also corrected for instrumental variance using generalized linear modeling (GLM), with the relative abundances of select ions serving as spike-in controls. Machine learning classification algorithms, namely random forest, k-nearest neighbor, decision tree, Gaussian Naïve Bayes, gradient boosting, and multilayer perceptron classifiers, were used. The k-nearest neighbors (k-NN) classifier generally performed the best for classifications using raw spectral data, and the decision tree classifier performed the worst. After normalization of spectra to account for instrumental variance, all the classifiers had comparable and generally acceptable performance for predicting the biomass types, although the k-NN and decision tree classifiers were not as accurate for prediction of specific sample types. Gaussian Naïve Bayes (GNB) and extreme gradient boosting (XGB) classifiers performed better than the k-NN and decision tree classifiers for the prediction of biomass mixtures.
The data analysis workflow reported here could be applied and extended to compare biomass samples of varying types, species, phenotypes, and/or genotypes, or samples subjected to different treatments, environments, etc. Such comparisons could further elucidate the sources of spectral variance and patterns and allow compositional information to be inferred from spectral analysis, particularly for data analyzed without a priori knowledge of the feedstock composition or identity.
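A normalize-then-classify workflow of this general shape can be sketched as follows. Note the substitutions: the study corrects spectra with a GLM based on spike-in control ions, whereas this sketch uses plain total-intensity normalization as a simpler stand-in. The three-channel "spectra", the labels, and the helper names `normalize_total` and `knn_predict` are all hypothetical.

```python
import math
from collections import Counter

def normalize_total(spectrum):
    """Scale a spectrum so its intensities sum to 1 (total-intensity normalization)."""
    total = sum(spectrum)
    if total == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [x / total for x in spectrum]

def knn_predict(train_spectra, train_labels, query, k=3):
    """Majority vote among the k training spectra closest to the query (Euclidean)."""
    neighbors = sorted(
        (math.dist(spectrum, query), label)
        for spectrum, label in zip(train_spectra, train_labels)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

# Made-up ion intensities for two biomass types.
raw_train = [
    [10, 30, 60],
    [12, 28, 60],
    [50, 30, 20],
    [55, 25, 20],
]
train_labels = ["pine", "pine", "switchgrass", "switchgrass"]

# A query with a pine-like profile recorded at roughly twice the overall intensity.
raw_query = [24, 56, 120]

train = [normalize_total(s) for s in raw_train]
predicted = knn_predict(train, train_labels, normalize_total(raw_query), k=3)
```

Normalizing before computing distances means the vote depends on the shape of the spectrum and not on its overall signal level, which is the motivation for correcting instrumental variance before classification.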


Metabolites ◽  
2020 ◽  
Vol 10 (6) ◽  
pp. 243 ◽  
Author(s):  
Ulf W. Liebal ◽  
An N. T. Phan ◽  
Malvika Sudhakar ◽  
Karthik Raman ◽  
Lars M. Blank

The metabolome of an organism depends on environmental factors and intracellular regulation and provides information about the physiological conditions. Metabolomics helps to understand disease progression in clinical settings or to estimate metabolite overproduction for metabolic engineering. The most popular analytical metabolomics platform is mass spectrometry (MS). However, MS metabolome data analysis is complicated, since metabolites interact nonlinearly, and the data structures themselves are complex. Machine learning methods have become immensely popular for statistical analysis due to their inherent nonlinear data representation and their ability to process large and heterogeneous data rapidly. In this review, we address recent developments in using machine learning for processing MS spectra and show how machine learning generates new biological insights. In particular, supervised machine learning has great potential in metabolomics research because of its ability to supply quantitative predictions. We review commonly used tools, such as random forest, support vector machines, artificial neural networks, and genetic algorithms. During processing steps, supervised machine learning methods assist with peak picking, normalization, and missing data imputation. For knowledge-driven analysis, machine learning contributes to biomarker detection, classification and regression, biochemical pathway identification, and carbon flux determination. Of particular relevance is the combination of different omics data to identify the contributions of the various regulatory levels. Our overview of recent publications also highlights that data quality determines analysis quality, but also adds to the challenge of choosing the right model for the data. Machine learning methods applied to MS-based metabolomics ease data analysis and can support clinical decisions, guide metabolic engineering, and stimulate fundamental biological discoveries.


Author(s):  
T. I. Nurgaliev

This review briefly describes modern approaches of data analysis in psychiatry using machine learning and gives possible prospects and common obstacles of this approach.


2019 ◽  
Vol 1 (88) ◽  
pp. 27-38
Author(s):  
G.G. Rapakov ◽  
G.T. Banshchikov ◽  
V.A. Gorbunov ◽  
L.L. Malygin ◽  
I.M. Revelev
