Class Center-Based Firefly Algorithm for Handling Missing Data

Author(s): Heru Nugroho, Nugraha Priya Utama, Kridanto Surendro

Abstract Estimating missing data in a dataset is an important step in the data cleaning stage, because improper handling of missing data can lead to inaccurate results in subsequent analysis. Most research on missing data estimation ignores the correlation between attributes. However, an adaptive search procedure can help find estimates of the missing data when correlations between attributes are taken into account. The Firefly Algorithm (FA) implements such an adaptive search procedure for imputation by finding the estimated value that is closest to the known values in other data. Therefore, this study proposes a class center-based adaptive approach for missing data that considers attribute correlation in the imputation process (C3-FA). The experiments show that the class center-based firefly algorithm is an efficient technique for recovering the actual values of missing data, as indicated by Pearson correlation coefficients (r) close to 1 and root mean squared error (RMSE) values generally close to 0. In addition, the proposed method preserves the true distribution of the data values: the Kolmogorov–Smirnov statistic DKS is generally close to 0 for most attributes in the dataset. Finally, an accuracy evaluation using three classifiers showed that the proposed method yields good accuracy.
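The abstract does not spell out C3-FA's update rules or its objective, so the following is only a minimal sketch of the basic firefly algorithm that such a method builds on: each candidate (firefly) moves toward every candidate with a lower cost, with attractiveness beta0 * exp(-gamma * r^2) decaying with squared distance r^2, plus a random step that shrinks each generation. The function name `firefly_minimize`, all parameter defaults, and the toy objective are illustrative assumptions. In an imputation setting the objective would score a candidate value for a missing cell (for example against a class center), but that cost function is not given in the source.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def firefly_minimize(
    objective: Callable[[List[float]], float],
    lower: Sequence[float],
    upper: Sequence[float],
    n_fireflies: int = 20,
    n_generations: int = 50,
    beta0: float = 1.0,
    gamma: float = 0.01,
    alpha: float = 0.2,
    alpha_decay: float = 0.97,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Basic firefly algorithm (hypothetical sketch): minimize `objective` inside [lower, upper]."""
    rng = random.Random(seed)

    def clamp(x: List[float]) -> List[float]:
        # Keep every coordinate inside its bounds.
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

    # Random initial swarm; a lower cost means a brighter firefly.
    swarm = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    best = min(range(n_fireflies), key=lambda k: cost[k])
    best_x, best_f = list(swarm[best]), cost[best]

    for _ in range(n_generations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter, so firefly i moves toward it
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    swarm[i] = clamp([
                        xi + beta * (xj - xi) + alpha * (rng.random() - 0.5)
                        for xi, xj in zip(swarm[i], swarm[j])
                    ])
                    cost[i] = objective(swarm[i])
                    if cost[i] < best_f:  # remember the best point ever visited
                        best_x, best_f = list(swarm[i]), cost[i]
        alpha *= alpha_decay  # shrink the random step each generation
    return best_x, best_f
```

As a usage example, `firefly_minimize(lambda v: (v[0] - 3.0) ** 2, [0.0], [10.0])` searches [0, 10] for the minimizer of a toy quadratic whose optimum is 3.0; a real imputation cost would replace the lambda.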

2021, Vol 8 (1)
Author(s): Heru Nugroho, Nugraha Priya Utama, Kridanto Surendro

Abstract Estimating missing data is a significant step in the data cleaning stage. Studies have shown that improper data handling leads to inaccurate analysis. Furthermore, most studies handle missing data without considering the correlation between attributes. However, an adaptive search procedure helps determine estimates of the missing data when correlations between attributes are considered in the process. The Firefly Algorithm (FA) implements an adaptive search procedure for imputing missing data by determining the estimated value closest to the known values of other data. Therefore, this study proposes a class center-based adaptive approach for retrieving missing data that considers attribute correlation in the imputation process (C3-FA). The results showed that the class center-based FA is an efficient technique for recovering the actual values of missing data, with the Pearson correlation coefficient (r) close to 1 and the root mean squared error (RMSE) close to 0. In addition, the proposed method can preserve the true distribution of data values, as indicated by the Kolmogorov–Smirnov test: the DKS value for most attributes in the dataset is generally close to 0. Furthermore, accuracy evaluation using three classifiers showed that the proposed method produces good accuracy.
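The abstract evaluates imputation quality with Pearson's r, RMSE, and the Kolmogorov–Smirnov distance DKS. A minimal sketch of the three measures follows, assuming that r and RMSE compare the true values of artificially removed cells with their imputed values, and that DKS compares an attribute's values before and after imputation; the abstract does not state the exact pairing, and the function names are illustrative.

```python
import math
from bisect import bisect_right
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between true and imputed values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    """Root mean squared error between true and imputed values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def ks_distance(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in a + b:  # both ECDFs are step functions that only jump at pooled sample points
        gap = abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        d = max(d, gap)
    return d
```

Values of r near 1, RMSE near 0, and DKS near 0 correspond to the "good imputation" reading described in the abstract.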


2021, Vol 11 (1)
Author(s): Daisuke Miyamori, Takeshi Uemura, Wenliang Zhu, Kei Fujikawa, Takaaki Nakaya, ...

Abstract The recent increase in the number of unidentified cadavers has become a serious problem throughout the world. As a simple and objective method for age estimation, we attempted to use Raman spectrometry for forensic identification. Raman spectroscopy is an optical vibrational spectroscopic technique that provides detailed information on a sample's molecular composition and structure. Building on our previous proof-of-concept study, we measured the Raman spectra of abdominal skin samples from 132 autopsy cases and determined the protein-folding intensity ratio, RPF, defined as the ratio between the Raman signals from a random coil and an α-helix. There was a strong negative correlation between age and RPF, with a Pearson correlation coefficient of r = −0.878. Four models, based on linear (RPF), squared (RPF²), sex, and RPF-by-sex interaction terms, were examined. Cross-validation suggested that the second model, which includes the linear and squared terms, was the best, with the lowest root mean squared error (11.3 years of age) and the highest coefficient of determination (0.743). Our results indicate a high correlation between age and RPF, and that the Raman biological clock of protein folding can be used as a simple and objective forensic age estimation method for unidentified cadavers.
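The best model in the abstract regresses age on linear and squared RPF terms and is judged by cross-validated RMSE. The sketch below fits that functional form by ordinary least squares (normal equations, standard library only) and scores it with k-fold cross-validation. The fold assignment (every k-th case, no shuffling), k = 5, and the function names are assumptions; the abstract does not describe the authors' cross-validation scheme, and no real RPF data are used.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense linear system."""
    n = len(b)
    m = [row[:] + [bv] for row, bv in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(rpf: Sequence[float], age: Sequence[float]) -> List[float]:
    """Least squares for age ~ b0 + b1*RPF + b2*RPF^2 via the normal equations."""
    rows = [[1.0, r, r * r] for r in rpf]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, age)) for i in range(3)]
    return solve(xtx, xty)


def predict(coef: Sequence[float], r: float) -> float:
    return coef[0] + coef[1] * r + coef[2] * r * r


def cv_rmse(rpf: Sequence[float], age: Sequence[float], k: int = 5) -> float:
    """k-fold cross-validated RMSE; fold f holds out every k-th case starting at f."""
    n = len(rpf)
    sq_errors = []
    for f in range(k):
        held_out = set(range(f, n, k))
        train_x = [rpf[i] for i in range(n) if i not in held_out]
        train_y = [age[i] for i in range(n) if i not in held_out]
        coef = fit_quadratic(train_x, train_y)
        sq_errors += [(predict(coef, rpf[i]) - age[i]) ** 2 for i in sorted(held_out)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

Solving the normal equations directly is adequate for three coefficients on well-scaled data; a QR-based least-squares routine would be the more robust choice for larger or ill-conditioned designs.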


2015, Vol 4 (2), pp. 74
Author(s): Made Susilawati, Kartika Sari

Missing data often occur in agricultural and animal husbandry experiments, and missing observations in an experimental design make the information obtained less complete. In this research, missing data were estimated with the Yates method and the Expectation-Maximization (EM) algorithm. The basic idea of the Yates method is to minimize the error sum of squares (JKG), whereas the EM algorithm maximizes the likelihood function. The research applied a balanced lattice design with 9 treatments, 4 replications, and 3 blocks (groups) in each replication. The estimation results showed that the Yates method performed better for two missing values located in the same treatment, in the same column, or at random positions, whereas the EM algorithm performed better for estimating a single missing value and for two missing values located in the same block or the same replication. A comparison of the JKG values from the ANOVA showed that the JKG of the incomplete data was larger than the JKG of the incomplete data supplemented with the estimated values. This suggests that the missing data need to be estimated.
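The Yates approach fills a missing plot with the value that minimizes the error sum of squares. The study's balanced lattice design needs block-adjusted formulas that the abstract does not give, so the sketch below deliberately switches to the simpler randomized complete block design (RCBD) with one missing plot. `yates_estimate` applies the classical closed form y = (tT + bB − G) / ((t − 1)(b − 1)), where t and b are the numbers of treatments and blocks, and T, B, G are the observed treatment, block, and grand totals. `em_estimate` alternately fills the gap and refits the additive treatment-plus-block model; for a single missing value in an RCBD its fixed point coincides with the Yates estimate. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Rows are treatments, columns are blocks; None marks the single missing plot.
Grid = List[List[Optional[float]]]


def missing_cell(y: Grid) -> Tuple[int, int]:
    cells = [(i, j) for i, row in enumerate(y) for j, v in enumerate(row) if v is None]
    if len(cells) != 1:
        raise ValueError("this sketch handles exactly one missing value")
    return cells[0]


def yates_estimate(y: Grid) -> float:
    """Yates's closed-form estimate for one missing value in an RCBD."""
    t, b = len(y), len(y[0])  # number of treatments, number of blocks
    mi, mj = missing_cell(y)
    T = sum(v for v in y[mi] if v is not None)  # observed total of the affected treatment
    B = sum(y[i][mj] for i in range(t) if y[i][mj] is not None)  # observed total of the affected block
    G = sum(v for row in y for v in row if v is not None)  # observed grand total
    return (t * T + b * B - G) / ((t - 1) * (b - 1))


def em_estimate(y: Grid, max_iter: int = 500, tol: float = 1e-12) -> float:
    """EM-style iteration: fill the gap, refit the additive model, repeat until stable."""
    t, b = len(y), len(y[0])
    mi, mj = missing_cell(y)
    observed = [v for row in y for v in row if v is not None]
    est = sum(observed) / len(observed)  # start from the observed mean
    for _ in range(max_iter):
        filled = [[est if v is None else v for v in row] for row in y]
        treat_mean = sum(filled[mi]) / b
        block_mean = sum(filled[i][mj] for i in range(t)) / t
        grand_mean = sum(map(sum, filled)) / (t * b)
        new = treat_mean + block_mean - grand_mean  # fitted value under the additive model
        if abs(new - est) < tol:
            return new
        est = new
    return est
```

On perfectly additive data the missing cell is recovered exactly, and on other data the two estimates agree to numerical precision, which illustrates why the two methods can only differ in more complex designs or with several missing values.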


2019, Vol 25 (3), pp. 325-335
Author(s): Maria Zefanya Sampe, Eko Ariawan, I Wayan Ariawan

Employee turnover is a common issue in any company. High turnover is a serious problem that inevitably affects company performance. Measuring employee turnover can therefore help employers improve retention rates and anticipate turnover early. A study analyzing employee loyalty was carried out using logistic regression (LR) and artificial neural network (ANN) models. Predictor variables such as satisfaction level, number of projects, average monthly working hours, employment period, work accidents, promotion in the last 5 years, department, and salary level were used to model employee turnover. Metrics such as accuracy, precision, sensitivity, the Kolmogorov-Smirnov statistic, and mean squared error (MSE) were used to compare the two models.
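The comparison metrics listed above can all be computed from a model's predicted turnover probabilities. A minimal sketch follows, assuming label 1 means the employee left, a 0.5 decision threshold, MSE taken between predicted probabilities and 0/1 outcomes, and KS taken as the largest gap between the true-positive and false-positive rates over all thresholds (equivalently, the two-sample KS distance between the score distributions of leavers and stayers). None of these choices is confirmed by the abstract, and the function names are illustrative.

```python
from typing import Dict, Sequence


def ks_statistic(y_true: Sequence[int], p_hat: Sequence[float]) -> float:
    """KS statistic of a binary scorer: max |TPR - FPR| over all score thresholds."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    best = 0.0
    for t in sorted(set(p_hat)):
        tpr = sum(p >= t for p in pos) / len(pos)
        fpr = sum(p >= t for p in neg) / len(neg)
        best = max(best, abs(tpr - fpr))
    return best


def turnover_metrics(
    y_true: Sequence[int], p_hat: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, sensitivity, KS, and MSE for predicted turnover probabilities."""
    pred = [1 if p >= threshold else 0 for p in p_hat]
    pairs = list(zip(y_true, pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))
    n = len(pairs)
    return {
        "accuracy": (tp + tn) / n,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "ks": ks_statistic(y_true, p_hat),
        "mse": sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / n,
    }
```

The same function can be applied to the LR and ANN outputs on a shared test set so that the two models are compared on identical cases.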


2018, Vol 9 (3)
Author(s): Arethuza De Melo Brito Carvalho, Juliana Araújo Cardoso, Francisca Aline Amaral Da Silva, Jefferson Abraão Caetano Lira, Samuel Moura Carvalho

QUALITY OF LIFE IN THE WORK OF THE SURGICAL CENTER NURSING TEAM
Objective: to evaluate the quality of working life of the surgical center nursing staff. Methodology: a descriptive, cross-sectional study with a quantitative approach, conducted with 70 nursing professionals from the surgical center of a referral hospital in Teresina (PI), using a sociodemographic questionnaire and the Quality Working Life Questionnaire-bref. The analysis was performed in SPSS version 21.0 using Pearson correlation, Cronbach's alpha, the Kolmogorov-Smirnov and chi-square tests, and multiple linear regression, with a 95% confidence interval. Results: most participants (62.9%) reported a lower impact on quality of working life; however, family income and academic training were highly significant. Conclusion: although work in the surgical center has a low impact on quality of life, the psychological domain showed a considerable impact, emphasizing that concern for worker health and appreciation of the nursing team still need to advance. Descriptors: Quality of life; Operating Room Nursing; Occupational Health.
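Cronbach's alpha, one of the analyses listed, measures the internal consistency of a questionnaire's items: for k items, alpha = k / (k − 1) · (1 − Σ var(item) / var(total score)). A minimal sketch for a respondents-by-items score matrix follows; the matrix layout and function name are assumptions and do not reflect the actual QWLQ-bref scoring rules.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per respondent and one column per item."""
    items = list(zip(*scores))  # one tuple of scores per questionnaire item
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])  # variance of each respondent's total score
    return k / (k - 1) * (1 - item_var / total_var)
```

Because item and total variances use the same (sample) definition, the choice of sample versus population variance cancels out of the ratio.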


10.2196/27386, 2021, Vol 9 (12), pp. e27386
Author(s): Qingyu Chen, Alex Rankine, Yifan Peng, Elaheh Aghaarabi, Zhiyong Lu

Background: Semantic textual similarity (STS) measures the degree of relatedness between sentence pairs. The Open Health Natural Language Processing (OHNLP) Consortium released an expertly annotated STS data set and called for participation in the National Natural Language Processing Clinical Challenges. This work describes our entry, an ensemble model that leverages a range of deep learning (DL) models. Our team from the National Library of Medicine obtained a Pearson correlation of 0.8967 on the official test set of the 2019 National Natural Language Processing Clinical Challenges/Open Health Natural Language Processing shared task and ranked second. Objective: Although our models correlate strongly with the manual annotations, annotator-level agreement was only moderate (weighted Cohen κ = 0.60). We are cautious about the potential use of DL models in production systems and argue that it is more critical to evaluate the models in depth, especially those with extremely high correlations. In this study, we benchmark the effectiveness and efficiency of top-ranked DL models, quantifying their robustness and inference times to validate their usefulness in real-time applications. Methods: We benchmarked five DL models that are top-ranked systems for STS tasks: Convolutional Neural Network, BioSentVec, BioBERT, BlueBERT, and ClinicalBERT. We evaluated a random forest model as an additional baseline. For each model, we repeated the experiment 10 times using the official training and testing sets. We reported the 95% CI of the Wilcoxon rank-sum test on the average Pearson correlation (the official evaluation metric) and running time. We further evaluated Spearman correlation, R², and mean squared error as additional measures. Results: Using only the official training set, all models obtained highly effective results. BioSentVec and BioBERT achieved the highest average Pearson correlations (0.8497 and 0.8481, respectively). BioSentVec also had the highest results in 3 of 4 effectiveness measures, followed by BioBERT. However, their robustness to sentence pairs of different similarity levels varies significantly. In particular, the BERT models made the most errors (a mean squared error of over 2.5) on highly similar sentence pairs: they could not effectively capture highly similar pairs that differ in negation terms or word order. In addition, the time-efficiency results differ dramatically from the effectiveness results. On average, the BERT models were approximately 20 times and 50 times slower than the Convolutional Neural Network and BioSentVec models, respectively, which poses challenges for real-time applications. Conclusions: Despite the excitement of further improving Pearson correlations on this data set, our results highlight that evaluating both the effectiveness and the efficiency of STS models is critical. In the future, we suggest more evaluations of the models' generalization capability and user-level testing. We call for community efforts to create more biomedical and clinical STS data sets from different perspectives to reflect the multifaceted notion of sentence relatedness.
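The Methods list Spearman correlation, R², and mean squared error as secondary measures alongside Pearson correlation. A minimal sketch of how these could be computed for predicted versus gold similarity scores follows. It assumes R² means 1 − SS_res / SS_tot (the abstract does not say which definition was used) and computes Spearman correlation as the Pearson correlation of average ranks so that tied scores are handled. The function names are illustrative.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(gold: Sequence[float], pred: Sequence[float]) -> float:
    return pearson(average_ranks(gold), average_ranks(pred))


def mse(gold: Sequence[float], pred: Sequence[float]) -> float:
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)


def r_squared(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (one common definition)."""
    mean = sum(gold) / len(gold)
    ss_res = sum((g - p) ** 2 for g, p in zip(gold, pred))
    ss_tot = sum((g - mean) ** 2 for g in gold)
    return 1 - ss_res / ss_tot
```

Spearman correlation only rewards getting the ordering right, whereas MSE and R² also penalize miscalibrated magnitudes, which is one reason a model can rank pairs well yet still show large errors on a particular similarity band.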


2017, Vol 1 (1), pp. 43
Author(s): Masoud Tosifyan, Saeed Tosifyan

This research aimed to investigate the effect of social media on the tendency toward entrepreneurship and business establishment. The research is applied in purpose and uses a descriptive survey method. A standard questionnaire was used to collect the data, and the reliability of each questionnaire was estimated at 0.779, 0.806, and 0.798, respectively. The study population consists of Iranian entrepreneurs who are active on social media, whose number is unknown; a sample of 120 active Iranian entrepreneurs was selected, and the questionnaire was distributed among them. The data needed to evaluate the research hypotheses were collected with the questionnaire and analyzed with SPSS and LISREL software. At the inferential level, the Kolmogorov-Smirnov test of normality, the Pearson correlation test, and structural equation modelling were used to test the hypotheses. Based on the results, the hypotheses were accepted.
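The Kolmogorov-Smirnov normality check mentioned above compares a sample's empirical CDF with a normal CDF. A minimal sketch of the statistic follows, assuming the normal's mean and standard deviation are estimated from the sample itself; the abstract does not say how the test was configured in SPSS. When parameters are estimated this way, the standard KS critical values are conservative and a Lilliefors-type correction is normally used, so the sketch returns only the statistic D. The function name is illustrative.

```python
from statistics import NormalDist, fmean, stdev
from typing import Sequence


def ks_normality_statistic(sample: Sequence[float]) -> float:
    """KS distance between the sample's ECDF and a normal CDF fitted by mean and SD."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # The ECDF jumps from (i - 1) / n to i / n at x, so check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

A small D means the questionnaire scores are close to normal, which is the usual justification for then applying Pearson correlation and maximum-likelihood structural equation modelling.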

