Effects of gamification on participation and data quality in a real-world market research domain

Author(s):  
Jared Cechanowicz ◽  
Carl Gutwin ◽  
Briana Brownell ◽  
Larry Goodfellow
2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yiqing Zhao ◽  
Saravut J. Weroha ◽  
Ellen L. Goode ◽  
Hongfang Liu ◽  
Chen Wang

Abstract Background: Next-generation sequencing provides comprehensive information about individuals’ genetic makeup and is commonplace in oncology clinical practice. However, the utility of genetic information in the clinical decision-making process has not been examined extensively from a real-world, data-driven perspective. By mining real-world data (RWD) from clinical notes, we could extract patients’ genetic information and associate treatment decisions with it. Methods: We proposed a real-world evidence (RWE) study framework that incorporates context-based natural language processing (NLP) methods and data quality examination before final association analysis. The framework was demonstrated in a Foundation-tested women’s cancer cohort (N = 196). After retrieving patients’ genetic information with the NLP system, we assessed the completeness of genetic data captured in unstructured clinical notes according to a genetic data model. We examined the distribution of different topics regarding BRCA1/2 throughout patients’ treatment process, and then analyzed the association between BRCA1/2 mutation status and the discussion/prescription of targeted therapy. Results: We identified seven topics in the clinical context of genetic mentions: Information, Evaluation, Insurance, Order, Negative, Positive, and Variants of unknown significance. Our rule-based system achieved a precision of 0.87, recall of 0.93 and F-measure of 0.91. Our machine learning system achieved a precision of 0.901, recall of 0.899 and F-measure of 0.9 for four-topic classification and a precision of 0.833, recall of 0.823 and F-measure of 0.82 for seven-topic classification. We found that, in result-containing sentences, the capture of BRCA1/2 mutation information was 75%, but detailed variant information (e.g. variant types) was largely missing. Using cleaned RWD, significant associations were found between BRCA1/2 positive mutation and targeted therapies.
Conclusions: We demonstrated a framework to generate RWE using RWD from different clinical sources. The rule-based NLP system achieved the best performance for resolving contextual variability when extracting RWD from unstructured clinical notes. Data quality issues such as incompleteness and discrepancies exist; thus, manual data cleaning is needed before further analysis can be performed. Finally, we were able to use cleaned RWD to evaluate the real-world utility of genetic information for initiating a prescription of targeted therapy.
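
For readers unfamiliar with the metric, the F-measure quoted in the abstract is conventionally the harmonic mean of precision and recall. The sketch below is an illustration only, not the authors' evaluation code; the function name `f_measure` is hypothetical, and it assumes the balanced F1 form. Values recomputed from rounded precision and recall need not match a reported F-measure exactly, for example when scores are averaged across classes.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention: define F1 as 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract (already rounded there).
reported = {
    "rule-based": (0.87, 0.93),
    "ML, four-topic": (0.901, 0.899),
    "ML, seven-topic": (0.833, 0.823),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 from rounded P/R = {f_measure(p, r):.3f}")
```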


2017 ◽  
Author(s):  
Amelia McNamara ◽  
Nicholas J Horton

Data wrangling is a critical foundation of data science, and wrangling of categorical data is an important component of this process. However, categorical data can introduce unique issues in data wrangling, particularly in real-world settings with collaborators and periodically-updated dynamic data. This paper discusses common problems arising from categorical variable transformations in R, demonstrates the use of factors, and suggests approaches to address data wrangling challenges. For each problem, we present at least two strategies for management, one in base R and the other from the ‘tidyverse.’ We consider several motivating examples, suggest defensive coding strategies, and outline principles for data wrangling to help ensure data quality and sound analysis.


2020 ◽  
Author(s):  
Yiqing Zhao ◽  
Saravut J Weroha ◽  
Ellen Goode ◽  
Hongfang Liu ◽  
Chen Wang

Abstract Background: Next-generation sequencing provides comprehensive information about individuals’ genetic makeup and is commonplace in oncology clinical practice. However, the utility of genetic information in the clinical decision-making process has not been examined extensively from a real-world, data-driven perspective. By mining real-world data (RWD) from clinical notes, we could extract patients’ genetic information and associate treatment decisions with it. Methods: We proposed a real-world evidence (RWE) study framework that incorporates context-based natural language processing (NLP) methods and data quality examination before final association analysis. The framework was demonstrated on a Foundation-tested women’s cancer cohort (N=196). After retrieving patients’ genetic information with the NLP system, we assessed the completeness of genetic data captured in unstructured clinical notes according to a genetic data model. We examined the distribution of different topics regarding BRCA1/2 throughout patients’ treatment process, and then analyzed the association between BRCA1/2 mutation status and the discussion/prescription of targeted therapy. Results: We identified seven topics in the clinical context of genetic mentions: Information, Evaluation, Insurance, Order, Negative, Positive, and Variants of unknown significance (VUS). Our rule-based system achieved a precision of 0.87, recall of 0.93 and F-measure of 0.91. Our machine learning system achieved a precision of 0.901, recall of 0.899 and F-measure of 0.9 for four-topic classification and a precision of 0.833, recall of 0.823 and F-measure of 0.82 for seven-topic classification. We found that, in result-containing sentences, the capture of BRCA1/2 mutation information was 75%, but detailed variant information (e.g. variant types) was largely missing. Using cleaned RWD, significant associations were found between BRCA1/2 positive mutation and targeted therapies.
Conclusions: We demonstrated a framework to generate RWE using RWD from different clinical sources. The rule-based NLP system achieved the best performance for resolving contextual variability when extracting RWD from unstructured clinical notes. Data quality issues such as incompleteness and discrepancies exist; thus, manual data cleaning is needed before further analysis can be performed. Finally, we were able to use cleaned RWD to evaluate the real-world utility of genetic information for initiating prescription of targeted therapy.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Xiaoting Zhong ◽  
Brian Gallagher ◽  
Keenan Eves ◽  
Emily Robertson ◽  
T. Nathan Mundhenk ◽  
...  

Abstract Machine-learning (ML) techniques hold the potential to enable efficient quantitative micrograph analysis, but the robustness of ML models to real-world variations in micrograph quality has not been carefully evaluated. We collected thousands of scanning electron microscopy (SEM) micrographs of molecular solid materials, in which image pixel intensities vary with both the microstructure content and the microscope instrument conditions. We then built ML models to predict the ultimate compressive strength (UCS) of consolidated molecular solids, both by encoding micrographs with different image feature descriptors and training a random forest regressor, and by training an end-to-end deep-learning (DL) model. Results show that instrument-induced pixel intensity signals can affect ML model predictions in a consistently negative way. As a remedy, we explored intensity normalization techniques. Intensity normalization helps to improve micrograph data quality and ML model robustness, but microscope-induced intensity variations can be difficult to eliminate.
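
The abstract does not say which intensity normalization techniques were explored. As one common possibility, the sketch below applies per-image z-score normalization in plain Python; the function name `zscore_normalize` and the toy 2x2 "micrographs" are hypothetical. It shows that a pure brightness/contrast (offset and gain) difference between two images of the same content is removed, which is the easy case; instrument effects that are not a simple offset and gain would survive this step.

```python
from statistics import mean, pstdev


def zscore_normalize(image):
    """Rescale one image's pixel intensities to zero mean and unit variance."""
    pixels = [p for row in image for p in row]
    mu = mean(pixels)
    sigma = pstdev(pixels)  # population standard deviation over all pixels
    if sigma == 0:
        # A constant image carries no contrast; map it to all zeros.
        return [[0.0 for _ in row] for row in image]
    return [[(p - mu) / sigma for p in row] for row in image]


# Hypothetical example: same content, different instrument settings.
dark = [[10, 20], [30, 40]]
bright = [[120, 140], [160, 180]]  # equals 2 * dark + 100 (gain and offset)

norm_dark = zscore_normalize(dark)
norm_bright = zscore_normalize(bright)

# After normalization the two images agree up to floating-point error.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(norm_dark, norm_bright)
    for a, b in zip(row_a, row_b)
)
print(f"largest pixel difference after normalization: {max_diff:.2e}")
```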


2020 ◽  
Vol 2020 ◽  
pp. 1-18 ◽  
Author(s):  
Haniyeh Dastyar ◽  
Daniel Rippel ◽  
Michael Freitag

Over the last decades, supplier development has become an increasingly important concept for remaining competitive in today’s markets. Manufacturers therefore invest resources in their suppliers to increase their abilities and, ultimately, to reduce their product prices. Most approaches found in the literature focus on long-term supplier development programs. Nevertheless, today’s volatile and dynamic markets require flexible approaches to deal with this complexity. We apply Model Predictive Control to optimize the number of supplier development projects in order to achieve flexibility while maintaining a certain level of security for all parties. The article focuses on a multi-manufacturer scenario, where two manufacturers aim to develop the same supplier. These manufacturers can establish different levels of horizontal collaboration. While previous results already show the benefits of applying this approach to a static scenario, this article extends the formulation by introducing market dynamics into the numerical simulations as well as into the optimization approach. To this end, the article proposes to derive regression models using real-world data. The article evaluates the effects of real-world market dynamics on two use cases: an automotive use case and a use case from the mobile phone sector. The results show that assuming market dynamics during the optimization leads to increased or at least close-to-equal revenues across the involved partners. The average increase ranges from approximately 1% to 5% depending on the type and magnitude of the dynamics. The results differ depending on the selected collaboration scheme. While a full-cooperative collaboration scheme benefits the least from considering dynamics in the optimization, it results in the highest overall revenue across all partners.


2021 ◽  
pp. 1-12
Author(s):  
Jing Wang ◽  
Jie Wei ◽  
Long Li ◽  
Lijian Zhang

With the rapid development of evidence-based medicine, translational medicine, and pharmacoeconomics in China, as well as the country’s strong commitment to clinical research, the demand for physicians’ research continues to increase. In recent years, real-world studies (RWS) have attracted more and more attention in the field of health care. As a method of post-marketing re-evaluation of drugs, RWS can better reflect the effects of drugs in real clinical settings. However, because of the large sample size required and the large amount of medical data involved, such studies are not only time-consuming and labor-intensive but also prone to human error, making it difficult to ensure data quality and efficiency of research implementation. This paper analyzes and summarizes the existing application systems of big data analytics platforms, and concludes that big data research analytics platforms using natural language processing, machine learning and other artificial intelligence technologies can help RWS to quickly complete the collection, integration, processing, statistics and analysis of large amounts of medical data, and deeply mine the intrinsic value of the data. Such platforms have broad application prospects for real-world research in new drug development and drug discovery, and for multi-level and multi-angle needs such as economics, medical insurance cost control, indications/contraindications evaluation, and clinical guidance.


2010 ◽  
Vol 50 (1) ◽  
pp. 152-163 ◽  
Author(s):  
Adir Even ◽  
G. Shankaranarayanan ◽  
Paul D. Berger
