Detecting Careless Responding in Survey Data Using Stochastic Gradient Boosting

2021 ◽  
pp. 001316442110047
Author(s):  
Ulrich Schroeders ◽  
Christoph Schmidt ◽  
Timo Gnambs

Careless responding is a bias in survey responses that disregards the actual item content, constituting a threat to the factor structure, reliability, and validity of psychological measurements. Different approaches have been proposed to detect aberrant responses such as probing questions that directly assess test-taking behavior (e.g., bogus items), auxiliary or paradata (e.g., response times), or data-driven statistical techniques (e.g., Mahalanobis distance). In the present study, gradient boosted trees, a state-of-the-art machine learning technique, are introduced to identify careless respondents. The performance of the approach was compared with established techniques previously described in the literature (e.g., statistical outlier methods, consistency analyses, and response pattern functions) using simulated data and empirical data from a web-based study, in which diligent versus careless response behavior was experimentally induced. In the simulation study, gradient boosting machines outperformed traditional detection mechanisms in flagging aberrant responses. However, this advantage did not transfer to the empirical study. In terms of precision, the results of both traditional and the novel detection mechanisms were unsatisfactory, although the latter incorporated response times as additional information. The comparison between the results of the simulation and the online study showed that responses in real-world settings seem to be much more erratic than can be expected from the simulation studies. We critically discuss the generalizability of currently available detection methods and provide an outlook on future research on the detection of aberrant response patterns in survey research.
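As a rough illustration of the technique named in this abstract, the following is a minimal from-scratch sketch of stochastic gradient boosting, not the authors' implementation. It assumes squared-error loss on 0/1 labels (rather than a logistic loss), one-split trees (stumps), and a random subsample of respondents for each tree, which is what makes the boosting "stochastic". The two features (a within-person response variability score and mean seconds per item) and all data values are hypothetical.

```python
import random

def fit_stump(X, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # candidate threshold between two observed values
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, left_mean, right_mean)
    if best is None:  # no feature varies in the sample: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value

def fit_sgb(X, y, n_trees=50, learning_rate=0.1, subsample=0.75, seed=0):
    """Boost stumps on residuals, each fitted to a random subsample of rows."""
    rng = random.Random(seed)
    base = sum(y) / len(y)           # initial prediction: the label mean
    preds = [base] * len(y)
    stumps = []
    k = max(2, int(subsample * len(y)))
    for _ in range(n_trees):
        idx = rng.sample(range(len(y)), k)       # the stochastic step
        sample_X = [X[i] for i in idx]
        sample_res = [y[i] - preds[i] for i in idx]
        stump = fit_stump(sample_X, sample_res)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict_sgb(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

# Hypothetical toy data: [response variability, mean seconds per item]; 1 = careless
X = [[0.2, 1.5], [0.3, 1.2], [0.1, 1.0], [0.4, 1.8],
     [1.2, 6.0], [1.5, 7.5], [1.1, 5.5], [1.4, 8.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

model = fit_sgb(X, y)
flag_new_respondent = predict_sgb(model, [0.25, 1.3]) > 0.5  # 0.5 is an assumed cutoff
```

The toy data are cleanly separable, which is exactly the kind of prototypical pattern the abstract warns about: real responses were found to be far more erratic than simulated ones, so a classifier that looks perfect here says little about performance on empirical data.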

2020 ◽  
Author(s):  
Ulrich Schroeders ◽  
Christoph Schmidt ◽  
Timo Gnambs

Careless responding is a bias in survey responses that disregards the actual item content, constituting a threat to the factor structure, reliability, and validity of psychological measurements. Different approaches have been proposed to detect aberrant responses such as probing questions that directly assess test-taking behavior (e.g., bogus items), auxiliary or paradata (e.g., response times), or data-driven statistical techniques (e.g., Mahalanobis distance). In the present study, gradient boosted trees, a state-of-the-art machine learning technique, are introduced to identify careless responders. The performance of the approach was compared to established techniques previously described in the literature (e.g., statistical outlier methods, consistency analyses, and response pattern functions) using simulated data and empirical data from a web-based study, in which diligent vs. careless response behavior was induced. The comparison between the results of the simulation and the online study showed that simulations that rely on prototypical patterns of careless responses tend to overestimate the classification accuracy. Gradient boosted trees outperform traditional detection mechanisms in flagging aberrant responses, especially when response times are included as paradata, but should not be misunderstood as a panacea for data cleaning. We critically discuss the results with regard to their generalizability and provide recommendations for the detection of aberrant response patterns in survey research.


Author(s):  
Jason L. Huang ◽  
Zhonghao Wang

Careless responding, also known as insufficient effort responding, refers to survey/test respondents providing random, inattentive, or inconsistent answers to question items due to lack of effort in conforming to instructions, interpreting items, and/or providing accurate responses. Researchers often use these two terms interchangeably to describe deviant behaviors in survey/test responding that threaten data quality. Careless responding threatens the validity of research findings by bringing in random and systematic errors. Specifically, careless responding can reduce measurement reliability, while under specific circumstances it can also inflate the substantive relations between variables. Numerous factors can explain why careless responding happens (or does not happen), such as individual difference characteristics (e.g., conscientiousness), survey characteristics (e.g., survey length), and transient psychological states (e.g., positive and negative affect). To identify potential careless responding, researchers can use procedural detection methods and post hoc statistical methods. For example, researchers can insert detection items (e.g., infrequency items, instructed response items) into the questionnaire, monitor participants’ response time, and compute statistical indices, such as psychometric antonym/synonym, Mahalanobis distance, individual reliability, individual response variability, and model fit statistics. Application of multiple detection methods would be better able to capture careless responding given convergent evidence. Comparison of results based on data with and without careless respondents can help evaluate the degree to which the data are influenced by careless responding. To handle data contaminated by careless responding, researchers may choose to filter out identified careless respondents, recode careless responses as missing data, or include careless responding as a control variable in the analysis. 
To prevent careless responding, researchers have tried utilizing various deterrence methods developed from motivational and social interaction theories. These methods include giving warning, rewarding, or educational messages, proctoring the process of responding, and designing user-friendly surveys. Interest in careless responding has been growing not only in business and management but also in other related disciplines. Future research and practice on careless responding in the business and management areas can also benefit from findings in other related disciplines.
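One of the post hoc indices listed in this abstract, individual response variability, can be sketched in a few lines. This is a minimal illustration that assumes the index is computed as the within-person standard deviation across a respondent's item responses; the cutoff of 0.5 and the sample responses are hypothetical, not values from the source.

```python
from statistics import pstdev

def individual_response_variability(responses):
    """Within-person standard deviation across one respondent's item responses."""
    return pstdev(responses)

def flag_low_variability(data, cutoff=0.5):
    """Flag respondents whose variability falls below an assumed cutoff."""
    return [individual_response_variability(r) < cutoff for r in data]

# Hypothetical 5-point Likert responses for three respondents
data = [
    [3, 3, 3, 3, 3, 3],   # straightlining: identical answer to every item
    [1, 4, 2, 5, 3, 4],   # varied answers
    [2, 2, 3, 2, 2, 2],   # nearly constant answers
]
flags = flag_low_variability(data)
```

A low value only indicates uniform answering. In line with the abstract's point about convergent evidence, such a flag is better combined with other indicators, such as detection items or response times, than used on its own.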


2017 ◽  
Vol 76 (3) ◽  
pp. 91-105 ◽  
Author(s):  
Vera Hagemann

Abstract. The individual attitudes of every single team member are important for team performance. Studies show that each team member’s collective orientation – that is, propensity to work in a collective manner in team settings – enhances the team’s interdependent teamwork. In the German-speaking countries, there was previously no instrument to measure collective orientation. So, I developed and validated a German-language instrument to measure collective orientation. In three studies (N = 1028), I tested the validity of the instrument in terms of its internal structure and relationships with other variables. The results confirm the reliability and validity of the instrument. The instrument also predicts team performance in terms of interdependent teamwork. I discuss differences in established individual variables in team research and the role of collective orientation in teams. In future research, the instrument can be applied to diagnose teamwork deficiencies and evaluate interventions for developing team members’ collective orientation.


2021 ◽  
pp. 152483802098554
Author(s):  
Stephanie Gusler ◽  
Jessy Guler ◽  
Rachel Petrie ◽  
Heather Marshall ◽  
Daryl Cooley ◽  
...  

Although evidence suggests that individuals’ appraisals (i.e., subjective interpretations) of adverse or traumatic life events may serve as a mechanism accounting for differences in adversity exposure and psychological adjustment, understanding this mechanism is contingent on our ability to reliably and consistently measure appraisals. However, measures have varied widely between studies, making conclusions about how best to measure appraisal a challenge for the field. To address this issue, the present study reviewed 88 articles from three research databases, assessing adults’ appraisals of adversity. To be included in the scoping review, articles had to meet the following criteria: (1) published no earlier than 1999, (2) available in English, (3) published as a primary source manuscript, and (4) included a measure assessing adults’ (over the age of 18) subjective primary and/or secondary interpretations of adversity. Each article was thoroughly reviewed and coded based on the following information: study demographics, appraisal measurement tool(s), category of appraisal, appraisal dimensions (e.g., self-blame, impact, and threat), and the tool’s reliability and validity. Further, information was coded according to the type of adversity appraised, the time at which the appraised event occurred, and which outcomes were assessed in relation to appraisal. Results highlight the importance of continued examination of adversity appraisals and reveal which appraisal tools, categories, and dimensions are most commonly assessed. These results provide guidance to researchers on how to examine adversity appraisals and on which gaps in the measurement of adversity appraisal need to be addressed in future research.


Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 517
Author(s):  
Seong-heum Kim ◽  
Youngbae Hwang

Owing to recent advancements in deep learning methods and relevant databases, it is becoming increasingly easy to recognize 3D objects using only RGB images from single viewpoints. This study investigates the major breakthroughs and current progress in deep learning-based monocular 3D object detection. For relatively low-cost data acquisition systems without depth sensors or cameras at multiple viewpoints, we first consider existing databases with 2D RGB photos and their relevant attributes. Based on this simple sensor modality for practical applications, deep learning-based monocular 3D object detection methods that overcome significant research challenges are categorized and summarized. We present the key concepts and detailed descriptions of representative single-stage and multiple-stage detection solutions. In addition, we discuss the effectiveness of the detection models on their baseline benchmarks. Finally, we explore several directions for future research on monocular 3D object detection.


Author(s):  
Marcelo N. de Sousa ◽  
Ricardo Sant’Ana ◽  
Rigel P. Fernandes ◽  
Julio Cesar Duarte ◽  
José A. Apolinário ◽  
...  

Abstract. In outdoor RF localization systems, particularly where line of sight cannot be guaranteed or where multipath effects are severe, information about the terrain may improve the position estimate’s performance. Given the difficulties in obtaining real data, a ray-tracing fingerprint is a viable option. Nevertheless, despite good simulation results, the performance of systems trained with simulated features only suffers degradation when employed to process real-life data. This work intends to improve the localization accuracy when using ray-tracing fingerprints and a few field data obtained from an adverse environment where a large number of measurements is not an option. We employ a machine learning (ML) algorithm to explore the multipath information. We selected the random forest and gradient boosting algorithms, both considered efficient tools in the literature. In a strict simulation scenario (simulated data for training, validating, and testing), we obtained the same good results found in the literature (error around 2 m). In a real-world system (simulated data for training, real data for validating and testing), both ML algorithms resulted in a mean positioning error around 100 m. We have also obtained experimental results for noisy (artificially added Gaussian noise) and mismatched (with a null subset of) features. From the simulations carried out in this work, our study revealed that enhancing the ML model with a few real-world data improves localization’s overall performance. Of the ML algorithms employed herein, we also observed that, under noisy conditions, the random forest algorithm achieved a slightly better result than the gradient boosting algorithm. However, they achieved similar results in a mismatch experiment. 
This work’s practical implication is that multipath information, once rejected in old localization techniques, now represents a significant source of information whenever we have prior knowledge to train the ML algorithm.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Louis Ehwerhemuepha ◽  
Theodore Heyming ◽  
Rachel Marano ◽  
Mary Jane Piroutek ◽  
Antonio C. Arrieta ◽  
...  

Abstract. This study was designed to develop and validate an early warning system for sepsis based on a predictive model of critical decompensation. Data from the electronic medical records for 537,837 visits to a pediatric Emergency Department (ED) from March 2013 to December 2019 were collected. A multiclass stochastic gradient boosting model was built to identify early warning signs associated with death, severe sepsis, non-severe sepsis, and bacteremia. Model features included triage vital signs, previous diagnoses, medications, and healthcare utilizations within 6 months of the index ED visit. There were 483 patients who had severe sepsis and/or died, 1102 had non-severe sepsis, 1103 had positive bacteremia tests, and the remaining had none of the events. The most important predictors were age, heart rate, length of stay of previous hospitalizations, temperature, systolic blood pressure, and prior sepsis. The one-versus-all areas under the receiver operator characteristic curve (AUROC) were 0.979 (0.967, 0.991), 0.990 (0.985, 0.995), 0.976 (0.972, 0.981), and 0.968 (0.962, 0.974) for death, severe sepsis, non-severe sepsis, and bacteremia without sepsis, respectively. The multi-class macro average AUROC and area under the precision recall curve were 0.977 and 0.316, respectively. The study findings were used to develop an automated early warning decision tool for sepsis. Implementation of this model in pediatric EDs will allow sepsis-related critical decompensation to be predicted accurately after a few seconds of triage.
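The one-versus-all and macro-average AUROC figures reported in this abstract can be illustrated with a small sketch. It assumes the standard pairwise definition of AUROC (the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counted as one half) and an unweighted mean over classes for the macro average. The class names, labels, and predicted probabilities below are hypothetical and are not the study's data.

```python
def auroc(is_positive, scores):
    """AUROC as the share of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_all_auroc(y_true, prob_rows, classes):
    """Treat each class in turn as 'positive' and all other classes as 'negative'."""
    result = {}
    for k, cls in enumerate(classes):
        is_positive = [label == cls for label in y_true]
        result[cls] = auroc(is_positive, [row[k] for row in prob_rows])
    return result

# Hypothetical labels and predicted class probabilities for six visits
classes = ["none", "sepsis", "bacteremia"]
y_true = ["none", "none", "sepsis", "bacteremia", "none", "sepsis"]
prob_rows = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.2, 0.5],
    [0.4, 0.5, 0.1],
    [0.5, 0.4, 0.1],
]

per_class = one_vs_all_auroc(y_true, prob_rows, classes)
macro_auroc = sum(per_class.values()) / len(per_class)  # unweighted mean over classes
```

The sketch also hints at why the abstract reports the area under the precision-recall curve alongside AUROC: when events are rare, as with a few thousand events among several hundred thousand visits, AUROC can be high while precision remains low.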


Sensors ◽  
2019 ◽  
Vol 19 (22) ◽  
pp. 4916 ◽  
Author(s):  
Qiaoyun Wu ◽  
Yunzhe Zhang ◽  
Qian Yang ◽  
Ning Yuan ◽  
Wei Zhang

The vital importance of rapid and accurate detection of foodborne pathogens has driven the development of biosensors to prevent foodborne illness outbreaks. Electrochemical DNA biosensors offer such merits as rapid response, high sensitivity, low cost, and ease of use. This review covers the following three aspects: foodborne pathogens and conventional detection methods, the design and fabrication of electrochemical DNA biosensors, and several techniques for improving the sensitivity of biosensors. We highlight the main bioreceptors and immobilization methods on the sensing interface, electrochemical techniques, electrochemical indicators, nanotechnology, and nucleic acid-based amplification. Finally, in view of the existing shortcomings of electrochemical DNA biosensors in the field of foodborne pathogen detection, we also outline future research directions in the following five aspects: specific bioreceptors (improving specificity), nanomaterials (enhancing sensitivity), microfluidic chip technology (realizing automated operation), paper-based biosensors (reducing detection cost), and smartphones or other mobile devices (simplifying signal reading devices).


Mathematics ◽  
2021 ◽  
Vol 9 (15) ◽  
pp. 1820
Author(s):  
Ekaterina V. Orlova

This research deals with the challenge of reducing banks’ credit risks associated with the insolvency of borrowing individuals. To solve this challenge, we propose a new approach, methodology and models for assessing individual creditworthiness, with additional data about borrowers’ digital footprints to implement comprehensive analysis and prediction of a borrower’s credit profile. We suggest a model for borrowers’ clustering based on the method of hierarchical clustering and the k-means method, which groups actual borrowers having similar creditworthiness and similar credit risks into homogeneous clusters. We also design the model for borrowers’ classification based on the stochastic gradient boosting (SGB) method, which reliably determines the cluster number and therefore the risk level for a new borrower. The developed models are the basis for decision making about lending value, interest rates and lending terms for each risk-homogeneous group of borrowers. The modified version of the methodology for assessing individual creditworthiness is presented, which aims to reduce the credit risks and to increase the stability and profitability of financial organizations.
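The clustering step described in this abstract can be sketched with a plain k-means (Lloyd's algorithm) implementation. This is a minimal illustration under stated assumptions, not the author's model: the initial centroids are supplied by the caller (in the abstract's setup they could plausibly come from a hierarchical clustering pass), the two borrower features are hypothetical, and a new borrower is assigned to the nearest centroid as a simple stand-in for the SGB classifier the abstract uses for that step.

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                # coordinate-wise mean of the cluster members
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:     # assignments are stable: converged
            break
        centroids = new_centroids
    return centroids

def assign(point, centroids):
    """Index of the nearest centroid (stand-in for the classification step)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

# Hypothetical borrowers described by two illustrative risk features
borrowers = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids = kmeans(borrowers, [(0, 0), (10, 10)])

risk_group = assign((1.5, 1.5), centroids)  # cluster index for a new borrower
```

Nearest-centroid assignment is used here only to keep the sketch short. A trained classifier, as in the abstract, can draw on features that were not part of the clustering itself.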

