A Study on Analyzing Data from Designed Experiments with Missing Values

2021 ◽  
Vol 47 (3) ◽  
pp. 321-325
Author(s):  
Jai-Hyun Byun
Marketing ZFP ◽  
2019 ◽  
Vol 41 (4) ◽  
pp. 21-32
Author(s):  
Dirk Temme ◽  
Sarah Jensen

Missing values are ubiquitous in empirical marketing research. If missing data are not dealt with properly, this can lead to a loss of statistical power and distorted parameter estimates. While traditional approaches for handling missing data (e.g., listwise deletion) are still widely used, researchers can nowadays choose among various advanced techniques such as multiple imputation analysis or full-information maximum likelihood estimation. Due to the available software, using these modern missing data methods does not pose a major obstacle. Still, their application requires a sound understanding of the prerequisites and limitations of these methods as well as a deeper understanding of the processes that have led to missing values in an empirical study. This article is Part 1 and first introduces Rubin’s classical definition of missing data mechanisms and an alternative, variable-based taxonomy, which provides a graphical representation. Secondly, a selection of visualization tools available in different R packages for the description and exploration of missing data structures is presented.
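
As a minimal sketch of why listwise deletion costs statistical power, the following Python snippet counts missing entries per variable and then keeps only complete cases. The toy data, the variable names, and the helper functions `missing_counts` and `listwise_delete` are assumptions made up for illustration; they do not come from the article, which presents visualization tools from R packages.

```python
# Toy survey data; None marks a missing answer.
# All values are invented for illustration only.
rows = [
    {"age": 34, "income": 52000, "satisfaction": 4},
    {"age": 41, "income": None, "satisfaction": 5},
    {"age": None, "income": 61000, "satisfaction": 3},
    {"age": 29, "income": 48000, "satisfaction": None},
    {"age": 55, "income": 75000, "satisfaction": 2},
]


def missing_counts(rows):
    """Count missing entries per variable (hypothetical helper)."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (value is None)
    return counts


def listwise_delete(rows):
    """Keep only complete cases, i.e. rows without any missing entry."""
    return [row for row in rows if all(v is not None for v in row.values())]


counts = missing_counts(rows)
complete = listwise_delete(rows)
print(counts)
print(len(complete), "of", len(rows), "rows survive listwise deletion")
```

In this toy example each variable is missing in only one of five rows, yet listwise deletion discards three of the five rows, because the missing entries fall in different rows.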


2017 ◽  
Author(s):  
Natalia Sizochenko ◽  
Alicja Mikolajczyk ◽  
Karolina Jagiello ◽  
Tomasz Puzyn ◽  
Jerzy Leszczynski ◽  
...  

Predictive modeling approaches can help solve the problem of missing data. Many studies have investigated the effects of missing values on qualitative or quantitative modeling, but only a few publications have discussed them in the context of nanotechnology-related data. The current project aimed to develop a multi-nano-read-across modeling technique that helps predict toxicity for different species: bacteria, algae, protozoa, and mammalian cell lines. In this study, the experimental toxicity of 184 metal and silica oxide nanoparticles (30 unique chemical types) from 15 experimental datasets was analyzed. A hybrid quantitative multi-nano-read-across approach that combines interspecies correlation analysis and self-organizing map analysis was developed. In the first step, hidden patterns of toxicity among the nanoparticles were identified using a combination of methods. Then the developed model, which is based on categorization of metal oxide nanoparticles’ toxicity outcomes, was evaluated by means of a combination of supervised and unsupervised machine learning techniques to find underlying factors responsible for toxicity.


2019 ◽  
Author(s):  
Víctor Gabriel Baldovino Medrano ◽  
Karen V. Caballero ◽  
Hernando Guerrero-Amaya

Turnover rates for glycerol esterification with acetic acid over Amberlyst-35 were measured at different temperatures, reactant and active-site concentrations, and catalyst particle sizes. Data were collected in a batch reactor. The experiments followed a sequence of factorial experimental designs.


Author(s):  
E. Widener ◽  
S. Tatti ◽  
P. Schani ◽  
S. Crown ◽  
B. Dunnigan ◽  
...  

A new 0.5 µm 1 Megabit SRAM was introduced that employed a double metal, triple poly CMOS process with Tungsten plug metal to poly/silicon contacts. During burn-in of this product, high currents, apparently due to electrical overstress, were experienced. Electrical analysis showed abnormal supply current characteristics at high voltages. Failure analysis identified the sites of the high currents in the burn-in rejects and discovered cracks in the glue layer prior to Tungsten deposition as the root cause of the failure. The glue layer cracks allowed a reaction with the poly/silicon, causing opens at the bottom of contacts. These floating nodes caused high currents and often latch-up during burn-in. Designed experiments in the wafer fab identified an improved glue layer process, which has been implemented. The new process shows improvement in burn-in performance as well as outgoing product quality.


2020 ◽  
Vol 21 ◽  
Author(s):  
Sukanya Panja ◽  
Sarra Rahem ◽  
Cassandra J. Chu ◽  
Antonina Mitrofanova

Background: In recent years, the availability of high throughput technologies, establishment of large molecular patient data repositories, and advancement in computing power and storage have allowed elucidation of complex mechanisms implicated in therapeutic response in cancer patients. The breadth and depth of such data, alongside experimental noise and missing values, requires a sophisticated human-machine interaction that would allow effective learning from complex data and accurate forecasting of future outcomes, ideally embedded in the core of machine learning design. Objective: In this review, we will discuss machine learning techniques utilized for modeling of treatment response in cancer, including Random Forests, support vector machines, neural networks, and linear and logistic regression. We will overview their mathematical foundations and discuss their limitations and alternative approaches all in light of their application to therapeutic response modeling in cancer. Conclusion: We hypothesize that the increase in the number of patient profiles and potential temporal monitoring of patient data will define even more complex techniques, such as deep learning and causal analysis, as central players in therapeutic response modeling.


Author(s):  
B. Mathura Bai ◽  
N. Mangathayaru ◽  
B. Padmaja Rani ◽  
Shadi Aljawarneh

Missing attribute values are one of the most common problems faced when mining medical datasets. Estimating missing values is a major challenge in the pre-processing of datasets. Any wrong estimate of missing attribute values can lead to inefficient and improper classification, resulting in lower classifier accuracies. Similarity measures play a key role during the imputation process. The use of an appropriate similarity measure can help to achieve better imputation and improved classification accuracies. This paper proposes a novel imputation measure for finding similarity between missing and non-missing instances in medical datasets. Experiments are carried out by applying both the proposed imputation technique and popular existing benchmark imputation techniques. Classification is carried out using KNN, J48, SMO and RBFN classifiers. Experimental analysis showed that after imputation of medical records using the proposed imputation technique, the classification accuracies reported by the classifiers KNN, J48 and SMO improved compared to other existing benchmark imputation techniques.
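
To illustrate the general idea of similarity-based imputation, here is a minimal Python sketch of nearest-neighbor imputation. It is not the paper's proposed measure, which the abstract does not specify. It assumes a plain Euclidean distance over the features observed in both records, and the names `observed_distance` and `knn_impute` as well as the toy records are hypothetical.

```python
import math


def observed_distance(a, b):
    """Euclidean distance over features observed in both records,
    rescaled by the share of features used (an assumed, generic measure)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no overlapping features: treat as maximally distant
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(records, k=2):
    """Fill each missing value with the mean of that feature
    over the k nearest complete records."""
    complete = [r for r in records if all(v is not None for v in r)]
    if not complete:
        raise ValueError("at least one complete record is required")
    imputed = []
    for r in records:
        if all(v is not None for v in r):
            imputed.append(list(r))
            continue
        neighbors = sorted(complete, key=lambda c: observed_distance(r, c))[:k]
        imputed.append([
            v if v is not None else sum(n[j] for n in neighbors) / len(neighbors)
            for j, v in enumerate(r)
        ])
    return imputed


# Invented toy records; None marks a missing attribute value.
records = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 5.0],
    [9.0, 9.0, 9.0],
    [1.0, 2.0, None],
]
result = knn_impute(records, k=2)
print(result[3])
```

Here the incomplete record matches the first two complete records on its observed features, so its missing value is filled with the mean of their third feature. Swapping in a different similarity measure only requires replacing `observed_distance`.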

